How we used four AI agents to turn Jira tickets into reviewed PRs for about $12 each
An event-driven engineering workflow where four specialized Hermes agents handle ticket intake, coding, review, and CI — while humans keep merge authority. Routine tickets go from intake to reviewed PR in about four hours for roughly $12 in AI spend.
Agents in this flow
Intake & Gate Agent
Senior Coder
Code Reviewer
CI / Lint / Style Gate
Before we rebuilt our engineering workflow, our team faced a classic problem: ticket intake → development → review → merge → QA was manual, slow, and created friction at every handoff.
Developers were:
- Manually reading Jira tickets
- Creating branches by hand
- Waiting for code reviews (which took time)
- Manually moving tickets through statuses
- Pushing to QA manually
- Losing context between Jira and GitHub
The cost? 20-30% of dev time spent on ceremony instead of coding. Plus, when QA found bugs, the ticket status in Jira would lag behind what was actually happening in GitHub, creating confusion.
At ~50 routine tickets per quarter, the old workflow consumed roughly 325 engineering hours: 50 tickets × 6.5 hours per ticket. That is about 8 full-time engineering weeks, or roughly 2 months of engineering time. With agents, the human time drops to a few minutes per ticket, while production merge authority stays with a human.
We wanted autonomous agents handling the routine work, while keeping humans in control of the final decision (merging to production). Here's what we built.
1. The Architecture
Our system uses four specialized AI agents running on Hermes and Jira webhooks as event triggers.
The Four Named Agents
1. Mark — The Intake & Gate Agent (Claude 3.5 Haiku)
Job: When a new Jira ticket arrives (DB-* on the Development Board project board), Mark wakes up.
Tasks:
- Validate the ticket is assigned to "Luke The Dev"
- Check if a PR already exists for this ticket (risk gate)
- Check if a similar ticket is already in progress (duplicate gate)
- Create a fresh GitHub branch from origin/prod (never from another feature branch)
- Decide: Is this ticket safe to implement, or are there blockers?
Cost: Cheap — Haiku is ~95% accurate for structured tasks like reading + gating.
Output: If safe → triggers Andrew. If blocked → comments on Jira with blocker reason.
2. Andrew — The Senior Coder (OpenAI 5.5 Pro, fallback Claude Opus 4.8)
Job: Write the actual code.
Tasks:
- Implement the feature/fix based on ticket description + acceptance criteria
- Write tests
- Self-review the code
- Push to GitHub
- Open a PR and link it to the Jira ticket
Cost: Expensive but worth it.
Why the fallback? When 5.5 is rate-limited or unavailable, Claude Opus still produces high-quality code.
Quality gate: Requires exact commit SHA and date in Jira before Mark approves.
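The primary/fallback arrangement can be sketched as a small wrapper. This is a minimal illustration, not the actual Hermes configuration: `RateLimited`, `generate_with_fallback`, and the two stand-in callables are hypothetical names for whatever client code calls each model.

```python
from typing import Callable, Tuple


class RateLimited(Exception):
    """Hypothetical error raised when a model is rate-limited or unavailable."""


def generate_with_fallback(
    prompt: str,
    primary: Callable[[str], str],
    fallback: Callable[[str], str],
) -> Tuple[str, str]:
    """Try the primary model once; if it is rate-limited, use the fallback.

    Returns (which_model_answered, output).
    """
    try:
        return "primary", primary(prompt)
    except RateLimited:
        return "fallback", fallback(prompt)


# Stand-in callables for demonstration only (no real API calls are made).
def flaky_primary(prompt: str) -> str:
    raise RateLimited("primary model unavailable")


def steady_fallback(prompt: str) -> str:
    return f"patch for: {prompt}"


used, output = generate_with_fallback("DB-1234", flaky_primary, steady_fallback)
```

Returning which model answered makes it possible to log the fallback rate per ticket, which matters if the two models differ in cost.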
3. Rev — The Code Reviewer (Claude 3.5 Haiku)
Job: Review the PR that Andrew opened.
Tasks:
- Check for security issues
- Verify tests actually test the feature
- Run smoke tests (if applicable)
- Leave inline comments on the PR
- If passing → approves PR and moves ticket to "Ready for Human Merge"
Cost: Cheap — Haiku is sufficient for pattern-matching (security anti-patterns, test completeness).
Human override: The PR can't be merged without Luke's manual approval.
4. Mr. Pipeline — The CI/Lint/Style Gate (Claude Haiku)
Job: Runs after every commit.
Tasks:
- Verify code passes Codacy linting rules
- Check test coverage meets minimum (e.g., >75%)
- Validate commit messages follow format
- Run style checks
- Report back to GitHub + Jira
Cost: Very cheap — mostly subprocess calls to existing linters.
Output: Either "ready to merge" or "fix these issues".
Communication Paths
Jira Webhook Event
→ Mark (Gate Check)
→ (If safe) → Andrew (Code)
→ (Diff complete) → Rev (Review)
→ (Approved) → Mr. Pipeline (CI Gate)
→ (Passed) → Jira status: "Ready for Human Merge"
→ (Telegram notification to Luke)
2. The Event-Driven Flow
Step 1: Ticket Created in Jira
- Developer/PM creates a Jira ticket in the DB project board
- Assigns it to "Luke The Dev" (our development filter)
- Webhook fires to our local Jira proxy at 127.0.0.1:XXXX (exposed via Tailscale Funnel)
Step 2: Mark Intake (Orchestration)
Mark runs immediately:
- Is this ticket assigned to "Luke The Dev"? → No? Exit silently (not our workflow)
- Does a GitHub PR already exist for this ticket? → Yes? Gate: "Existing PR found" → Jira comment + wait for completion
- Is a similar ticket already in progress? → Yes? Gate: "Duplicate/in-progress" → Jira comment + escalate to Luke
- Status check: Is the ticket ready to implement? → No? Gate: "Missing acceptance criteria" → Jira comment
- If all gates pass: → Create branch feature/DB-1234-ticket-name from fresh origin/prod → Trigger Andrew to start coding → Jira status: "In Progress"
Example Jira comment from Mark:
All gates passed. Triggering code generation...
- Branch: feature/DB-1234-new-payment-flow
- Assigned to: Andrew (Senior Coder)
- ETA: ~5-10 minutes
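The gate sequence in Step 2 can be sketched as an ordered series of early returns. This is a simplified illustration under stated assumptions: the `Ticket` and `GateResult` types and their field names are hypothetical, and the real checks would query Jira and GitHub rather than read booleans.

```python
from dataclasses import dataclass


@dataclass
class Ticket:
    # Hypothetical snapshot of the facts Mark needs; in practice these
    # would come from Jira and GitHub lookups.
    key: str
    assignee: str
    has_open_pr: bool
    similar_in_progress: bool
    has_acceptance_criteria: bool


@dataclass
class GateResult:
    action: str       # "ignore", "blocked", or "proceed"
    reason: str = ""  # blocker text to post as a Jira comment


def run_gates(ticket: Ticket) -> GateResult:
    """Apply the gates in the order described in Step 2, stopping at the first hit."""
    if ticket.assignee != "Luke The Dev":
        return GateResult("ignore")  # not our workflow: exit silently
    if ticket.has_open_pr:
        return GateResult("blocked", "Existing PR found")
    if ticket.similar_in_progress:
        return GateResult("blocked", "Duplicate/in-progress")
    if not ticket.has_acceptance_criteria:
        return GateResult("blocked", "Missing acceptance criteria")
    return GateResult("proceed")


ready = Ticket("DB-1234", "Luke The Dev", False, False, True)
result = run_gates(ready)
```

Ordering the cheapest and most decisive checks first means a ticket that is not ours never triggers the GitHub lookups at all.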
Step 3: Andrew Codes (Implementation)
Andrew gets the ticket details + repo context:
- Pull the branch + read CLAUDE.md / AGENTS.md / .cursorrules
- Understand the acceptance criteria
- Write code + tests
- Self-review (security, performance, test quality)
- Push to GitHub
- Open PR, link to Jira ticket DB-1234
- Report completion to Mark
Example PR description (auto-generated):
Fixes DB-1234: New Payment Flow
## Acceptance Criteria
- [ ] Payment form validates card details
- [ ] Supports Stripe + PayPal
- [ ] Handles timeout gracefully
## Tests
- 8 new unit tests
- 2 integration tests (Stripe sandbox)
- Manual test: Can complete checkout end-to-end
## Changes
- app/payment/processor.py (+120 lines)
- app/payment/test_processor.py (+200 lines)
- requirements.txt (added stripe==8.0.0)
Step 4: Rev Reviews (Quality Gate)
Rev automatically reviews the PR:
- Read PR diff
- Check for security issues (SQL injection, XSS, secrets in code)
- Validate tests: Does test count match complexity of change? Are tests actually testing the feature?
- Run smoke tests (if configured)
- Leave detailed comments
- If passing → GitHub approve + Jira status: "Ready for Human Merge"
Example review comment:
Approved (with notes)
Security: Stripe API key properly injected via env var. OK
Tests: 10 tests cover payment flows well. OK
Coverage: 86% (above 75% threshold). OK
Minor: Consider adding timeout test for slow networks.
Step 5: Mr. Pipeline Checks (CI/CD Gate)
Every commit triggers Mr. Pipeline:
- Run Codacy linting rules
- Verify test coverage
- Check code style (Prettier/Black)
- Run unit tests
- Report status to GitHub
Status check:
All CI gates passed
- Linting: OK (0 issues)
- Coverage: 86% OK
- Tests: 12 passed in 45s OK
- Ready to merge when approved
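The pass/fail decision behind that status check can be sketched as a pure function over the numbers the linters and test runner report. This is an assumption-laden sketch: `ci_gate` and its parameters are hypothetical, and the 75% default comes from the ">75%" example earlier in this post.

```python
from typing import List


def ci_gate(
    lint_issues: int,
    coverage_pct: float,
    tests_failed: int,
    min_coverage: float = 75.0,
) -> List[str]:
    """Return a list of problems; an empty list means all CI gates passed."""
    problems: List[str] = []
    if lint_issues > 0:
        problems.append(f"Linting: {lint_issues} issue(s)")
    # The threshold is strict: coverage must be above the minimum, not equal to it.
    if coverage_pct <= min_coverage:
        problems.append(f"Coverage: {coverage_pct:.0f}% (needs >{min_coverage:.0f}%)")
    if tests_failed > 0:
        problems.append(f"Tests: {tests_failed} failed")
    return problems


passing = ci_gate(lint_issues=0, coverage_pct=86.0, tests_failed=0)
failing = ci_gate(lint_issues=2, coverage_pct=70.0, tests_failed=1)
```

Keeping the decision deterministic is what lets the model's role here stay small: it only has to format the report.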
Step 6: Human Approval & Merge
Luke (the human) sees the Telegram notification:
DB-1234: New Payment Flow
Code ready for review
Andrew completed implementation
Rev approved PR
All CI gates passed
Ready for merge: [Link to PR]
Luke manually clicks "Merge" on GitHub. This is intentional. We don't auto-merge — merging to production is a human decision.
Step 7: QA Handoff
Once merged:
- Jira status: "In QA" (auto-transitioned)
- Telegram notification to QA team
- QA tests in staging environment
- If bug found: QA creates a new Jira ticket (QA-*) linked to DB-1234
- When QA approves: Jira status: "Done"
3. The Token Economy (How We Save Money)
I spend approximately $8–$18 per ticket on AI agents. Here's why it's still cheap compared with manual engineering time.
Token Breakdown Per Ticket
| Agent | Model | Tokens | Cost | Why Cheap |
|---|---|---|---|---|
| Mark | Claude Haiku 3.5 | 1K–2K | ~$0.01 | Structured tasks: gating, branch creation |
| Andrew | 5.5 Pro | 80K–150K input; 25K–50K output/reasoning | $7–$14 | Expensive model used once, only after gates pass |
| Rev | Claude Haiku 3.5 | 10K–25K | ~$0.03–$0.10 | Pattern-matching: security, test quality |
| Mr. Pipeline | Claude Haiku 3.5 | 1K–3K | ~$0.01–$0.03 | Mostly subprocess calls: linters |
| Kanban Notification | Haiku | 500–1K | ~$0.01 | Just formatting + Telegram/Jira post |
Total per ticket: ~120K–230K tokens, ~$8–$18.
4. Token Optimization Strategies
Use Cheap Models for Gating. Mark (Haiku) does gate checks, not code generation. Haiku is 95% accurate on structured tasks and costs 1/10th of Opus. We use expensive models (Andrew/o5.5 Pro) only for open-ended code generation.
Never Regenerate, Get It Right Once. Andrew writes code, submits it once. Rev reviews once, doesn't iterate with Andrew. If Rev finds issues, we escalate to Luke (human decision). This prevents token churn from multi-turn loops.
Context Reuse via CLAUDE.md. Every repo has a CLAUDE.md file (guidelines for AI). Mark references it when creating branches. Andrew reads it once, uses it to guide code style. No need to repeat context in every prompt.
Parallel Execution. Mark runs immediately on webhook. If gates pass, Andrew starts (no waiting). Rev reviews in parallel with testing. Mr. Pipeline runs on commit (not dependent on Rev). Parallelism = faster + same token cost.
Stateless Agents. Each agent is independent (no shared state between them). No need for context switching or long-running sessions. Each agent reads Jira + GitHub directly, processes, and exits. Stateless = no wasted tokens on state management.
Kanban Notifications Are Optional. Sending Telegram + Jira comments adds ~$0.01 per notification. For teams that want zero notification overhead, this is opt-in. We batch notifications (don't send one per action).
Skip Redundant Work. If a PR already exists for a ticket, Mark gates it (doesn't generate again). If a ticket is blocked, Mark doesn't trigger Andrew. Short-circuits prevent token waste on dead-end work.
Real Cost Example
A typical feature ticket (DB-1234):
- Mark intake check: 1K tokens ($0.01)
- Andrew implementation: ~80K–150K input tokens + 25K–50K output/reasoning tokens ($7–$14)
- Rev review: 10K–25K tokens ($0.03–$0.10)
- Mr. Pipeline CI: 1K–3K tokens ($0.01–$0.03)
- Kanban notifications: 500–1K tokens ($0.01)
Total: ~120K–230K tokens, usually ~$8–$18 per ticket.
Using $12 as the average, 50 tickets/quarter costs about $600/quarter in AI spend, or roughly $200/month. At a heavier run rate of 20 tickets/week, the same system would cost about $240/week, $1,040/month, or $12,480/year for autonomous code generation + review. For a team of 5 devs, that heavier run rate is roughly $208/month per dev in AI labor.
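The run-rate arithmetic above is easy to reproduce. The sketch below uses only the figures from this section; the variable names are mine, and the monthly figure for the heavier run rate is derived as the yearly total divided by 12.

```python
AVG_COST_PER_TICKET = 12.0  # USD, average AI spend per ticket

# Baseline: 50 routine tickets per quarter.
quarterly_cost = 50 * AVG_COST_PER_TICKET        # 600.0
monthly_cost = quarterly_cost / 3                # 200.0

# Heavier run rate: 20 tickets per week.
weekly_cost_heavy = 20 * AVG_COST_PER_TICKET     # 240.0
yearly_cost_heavy = weekly_cost_heavy * 52       # 12480.0
monthly_cost_heavy = yearly_cost_heavy / 12      # 1040.0

# Spread across a team of 5 developers.
per_dev_monthly_heavy = monthly_cost_heavy / 5   # 208.0

print(f"Baseline: ${quarterly_cost:,.0f}/quarter, ${monthly_cost:,.0f}/month")
print(f"Heavy: ${weekly_cost_heavy:,.0f}/week, ${monthly_cost_heavy:,.0f}/month, "
      f"${yearly_cost_heavy:,.0f}/year, ${per_dev_monthly_heavy:,.0f}/month per dev")
```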
5. The GitHub Branch Strategy
I enforce a strict branching invariant.
Rule: Always Branch from origin/prod
# CORRECT
git checkout -b feature/DB-1234-name origin/prod
# WRONG (creates hidden dependencies)
git checkout -b feature/DB-1234-name feature/DB-999-other
Why? If you branch from another feature branch (DB-999), your PR now implicitly depends on DB-999's PR being merged first. This breaks parallelism and creates merge conflicts.
Naming Convention
Branch names follow the pattern:
feature/DB-1234-short-description
bugfix/DB-1234-short-description
hotfix/DB-1234-short-description
bau/DB-1234-short-description (business-as-usual)
parent/DB-1234 (epic parent branch)
chore/TICKET-1234-description (no prefix for chores)
Rejected patterns (GitHub branch protection rules reject):
fix/DB-1234-name (ambiguous: bugfix or hotfix?)
DB-1234-name (no type prefix)
my-feature-fix (no Jira ID)
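The naming convention can also be checked locally before a branch is pushed. The sketch below is one possible encoding of the patterns listed above, not the actual branch protection rules: the regular expressions, including the assumption that descriptions are lowercase words joined by hyphens, and the `is_valid_branch` helper are mine.

```python
import re

# One pattern per family of allowed names (assumed character sets).
BRANCH_PATTERNS = [
    # feature/bugfix/hotfix/bau branches: type prefix, DB ticket key, description
    re.compile(r"(feature|bugfix|hotfix|bau)/DB-\d+-[a-z0-9]+(-[a-z0-9]+)*"),
    # epic parent branches: ticket key only
    re.compile(r"parent/DB-\d+"),
    # chores: any uppercase project key, then a description
    re.compile(r"chore/[A-Z]+-\d+-[a-z0-9]+(-[a-z0-9]+)*"),
]


def is_valid_branch(name: str) -> bool:
    """True if the whole branch name matches one of the allowed patterns."""
    return any(p.fullmatch(name) for p in BRANCH_PATTERNS)


for candidate in ["feature/DB-1234-new-payment-flow", "fix/DB-1234-name", "my-feature-fix"]:
    print(candidate, "->", "ok" if is_valid_branch(candidate) else "rejected")
```

Using `fullmatch` rather than `search` matters here: a name like `DB-1234-name` contains a valid ticket key but must still be rejected for lacking a type prefix.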
PR Verification Before Merge
Before merging, Luke verifies:
- Commit history contains ONLY DB-1234 changes
- Changed files are relevant to the ticket
- No accidental merge commits
- No stray files from other tickets
- Commit SHA matches what Mark/Andrew reported
This ensures we never accidentally merge unrelated code.
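Parts of this checklist can be pre-screened mechanically before Luke looks at the PR. The sketch below is hypothetical: it assumes every commit subject mentions the ticket key (a convention this post does not state), and the `Commit` type and `verify_pr` helper are illustrative names. Whether the changed files are relevant still needs a human eye.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Commit:
    sha: str
    subject: str
    parent_count: int  # more than 1 parent means a merge commit


def verify_pr(ticket_key: str, commits: List[Commit], reported_sha: str) -> List[str]:
    """Return a list of problems found in the PR's commit history."""
    problems: List[str] = []
    # Word boundaries stop DB-1234 from matching inside DB-12345.
    key_pattern = re.compile(rf"\b{re.escape(ticket_key)}\b")
    for commit in commits:
        if commit.parent_count > 1:
            problems.append(f"{commit.sha}: merge commit")
        if not key_pattern.search(commit.subject):
            problems.append(f"{commit.sha}: subject does not mention {ticket_key}")
    # Commits are assumed to be ordered oldest first, so the last one is the head.
    if not commits or commits[-1].sha != reported_sha:
        problems.append("head SHA does not match the reported SHA")
    return problems


history = [
    Commit("a1b2c3d", "DB-1234: add payment processor", 1),
    Commit("e4f5a6b", "DB-1234: add processor tests", 1),
]
issues = verify_pr("DB-1234", history, reported_sha="e4f5a6b")
```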
6. Jira Status Automation
I sync Jira statuses with Kanban progress automatically using jira-transition:
jira-transition DB-1234 "In Progress"
jira-transition DB-1234 "Ready for QA"
jira-transition DB-1234 "Done"
Status Flow
Unstarted
↓
Mark triggers Andrew
↓
In Progress (Mark sets this)
↓
Andrew pushes code
↓
Ready for Human Merge (Rev sets this when PR approved)
↓
Luke merges manually
↓
PR merged → Jira auto-transitions to "In QA"
↓
QA approves
↓
Done
Why auto-transition? Without it, the status in Jira lags behind reality (PR is merged in GitHub, but Jira still says "In Progress"). This confuses team members and causes duplicate work.
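The status flow can be expressed as a lookup from workflow events to Jira statuses. In this sketch the event names and the `transition_command` helper are hypothetical; the command shape simply mirrors the `jira-transition DB-1234 "In Progress"` usage shown above, and nothing is executed.

```python
from typing import Dict, List

# Hypothetical event names mapped to the statuses in the flow above.
TRANSITIONS: Dict[str, str] = {
    "gates_passed": "In Progress",
    "pr_approved": "Ready for Human Merge",
    "pr_merged": "In QA",
    "qa_approved": "Done",
}


def transition_command(ticket_key: str, event: str) -> List[str]:
    """Build the argument list for jira-transition; raises KeyError on unknown events."""
    status = TRANSITIONS[event]
    return ["jira-transition", ticket_key, status]


command = transition_command("DB-1234", "pr_merged")
print(command)
```

Keeping the mapping in one table means a renamed Jira status is a one-line change, and an unknown event fails loudly instead of silently leaving the ticket stale.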
7. The Telegram Notification System
Every major event sends a Telegram notification to Luke's home channel:
Event: Mark gates passed
"DB-1234 ready for code generation. Triggering Andrew."
Event: Andrew completed code
"DB-1234 complete. PR: github.com/smartways/tms/pull/456. Rev reviewing now..."
Event: Rev approved
"DB-1234 approved. All CI gates passed. Ready to merge: [Link]"
Event: Blocker found
"DB-1234 blocked: Duplicate with DB-999. Please resolve and re-trigger."
Why Telegram?
- Notifications arrive immediately (not email)
- Easy to click through to GitHub/Jira
- Can reply with voice messages (important for busy directors)
- Creates an audit trail (all decisions are in chat)
8. Security Boundaries
The agents do not have production authority.
- Agents can create branches and PRs, but cannot merge protected branches.
- Production merges require Luke's manual approval.
- GitHub branch protection still requires CI to pass.
- Agent credentials are scoped to the minimum permissions needed.
- Secrets are injected through environment/config systems, not pasted into prompts.
- Jira, GitHub, and Telegram create the audit trail for every action.
9. Edge Cases & Escalations
The system handles ~95% of tickets autonomously. Here's what escalates to Luke:
| Scenario | Trigger | Action |
|---|---|---|
| Duplicate ticket | Mark finds existing PR | Comment on Jira, wait for Luke decision |
| Ticket missing criteria | Mark can't parse requirements | Comment on Jira, flag for clarification |
| Code review blocked | Rev finds security issue | Comment on PR, don't approve, escalate |
| CI fails | Mr. Pipeline reports failures | Comment on PR + Jira |
| API rate limit | Agent hits token limit | Queue and retry: exponential backoff |
| Git conflict | Branch diverged from origin/prod | Mark rebases, retries |
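The "queue and retry" row refers to exponential backoff, which can be sketched as follows. The base delay, cap, retry count, and the `RateLimitError` type are assumptions chosen for illustration, not values from our setup.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical error an agent raises when it hits a rate limit."""


def backoff_delays(retries: int, base: float = 1.0, cap: float = 60.0) -> List[float]:
    """Delays that double on each retry, capped at `cap` seconds."""
    return [min(cap, base * 2 ** attempt) for attempt in range(retries)]


def call_with_backoff(
    fn: Callable[[], T],
    retries: int = 5,
    base: float = 1.0,
    cap: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, sleeping with exponential backoff after each rate-limit error."""
    for delay in backoff_delays(retries, base, cap):
        try:
            return fn()
        except RateLimitError:
            sleep(delay)
    # Final attempt: if this also fails, the error propagates to the caller.
    return fn()


print(backoff_delays(5))
```

The `sleep` parameter is injectable so the retry logic can be tested without actually waiting. Production versions often add random jitter to the delays so that several agents do not retry in lockstep.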
Key principle: Agents make bounded, reversible decisions. Humans make ambiguous, architectural, and production decisions.
10. Cost Comparison
Before (All Manual)
- 1 feature ticket: ~4 hours of dev time
- Code review: ~1 hour
- Testing: ~1.5 hours
- Total: 6.5 hours/ticket at $150/hour (loaded cost)
- Cost per ticket: ~$975
After (Hermes + Agents)
- Agent time: ~15 min wall-clock, parallelized
- AI cost: $8–$18 per ticket, averaging about $12
- Human review: ~5 min, just the merge decision
- Human review cost: ~$12.50 at $150/hour
- Cost per ticket: $21–$31, usually about $25
Savings: roughly a 97% reduction in cost per routine ticket. (One caveat: this works best for "routine" features. Complex architectural changes still need human design first.)
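The before/after comparison can be reproduced from the figures in this section. The variable names below are mine; note that the unrounded totals come out at $20.50–$30.50 with an average of $24.50, which the text rounds up.

```python
HOURLY_RATE = 150.0  # USD, loaded cost per engineering hour

# Before: development + code review + testing, all manual.
manual_hours = 4 + 1 + 1.5                         # 6.5
manual_cost = manual_hours * HOURLY_RATE           # 975.0

# After: AI spend plus about 5 minutes of human review.
human_review_cost = HOURLY_RATE * 5 / 60           # 12.5
ai_low, ai_avg, ai_high = 8.0, 12.0, 18.0
after_low = ai_low + human_review_cost             # 20.5
after_avg = ai_avg + human_review_cost             # 24.5
after_high = ai_high + human_review_cost           # 30.5

savings = 1 - after_avg / manual_cost

print(f"Before: ${manual_cost:.2f} per ticket")
print(f"After:  ${after_low:.2f} to ${after_high:.2f}, average ${after_avg:.2f}")
print(f"Savings: {savings:.1%}")
```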
11. Monitoring & Observability
I track three key metrics.
1. Agent Success Rate
- Mark (intake): 98% (gates work correctly)
- Andrew (code): 92% (working code on first attempt)
- Rev (review): 95% (catches issues Rev should catch)
- Mr. Pipeline: 99% (CI is deterministic)
2. Time to Reviewed PR
- Before: 2-3 days, mostly waiting for handoffs and review
- After: ~4 hours from ticket intake to reviewed PR
3. Token Spend
- Tracked weekly: avg ~$12/ticket
- Alert if: >$25/ticket, which usually means regeneration, excessive context loading, retry loops, or an unusually large code change.
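The weekly check can be as simple as a filter over per-ticket spend. In this sketch the `tickets_over_budget` helper is hypothetical and the spend figures are made-up sample values for illustration, not real measurements.

```python
from typing import Dict, List

ALERT_THRESHOLD_USD = 25.0


def tickets_over_budget(
    spend_by_ticket: Dict[str, float],
    threshold: float = ALERT_THRESHOLD_USD,
) -> List[str]:
    """Return the ticket keys whose AI spend exceeded the threshold, sorted."""
    return sorted(key for key, usd in spend_by_ticket.items() if usd > threshold)


# Made-up sample data: one ticket blew past the threshold.
sample_week = {"DB-1234": 11.40, "DB-1235": 27.10, "DB-1236": 9.80}
flagged = tickets_over_budget(sample_week)
print("Investigate:", flagged)
```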
12. The Future
I am exploring:
- Multi-ticket features: Let Andrew handle 2-3 related tickets in sequence
- Rebase automation: When origin/prod moves, auto-rebase PRs
- QA bot integration: Rev could run actual Selenium tests (not just code review)
- Canary deployments: Auto-promote "low-risk" tickets to staging → prod
- Model iteration: Track which model (o5.5 vs. Opus) produces better code, optimize selection
13. Key Takeaways
- Name your agents. Mark, Andrew, Rev, Mr. Pipeline — each has a role. Makes debugging easier.
- Gate early. Let Mark check for existing PRs, duplicates, and blockers before triggering expensive Andrew. Saves 80% of failed work.
- Use cheap models for filtering. Haiku 3.5 is 95% accurate for structured tasks. Reserve o5.5 Pro for open-ended reasoning.
- Never merge automatically. Humans own the merge button. Agents prepare the code; humans deploy it.
- One shot, one agent. Don't iterate 10 times. Write once, review once, merge once.
- Event-driven execution. Intake, coding, review, CI, and notifications are triggered automatically instead of waiting for humans at every handoff. CI and review can overlap where possible. Same token cost, much less wall-clock time.
- Context files (CLAUDE.md). Write once, reuse forever. Saves repetition and token cost.
- Status sync matters. Keep Jira in sync with GitHub reality using jira-transition. Prevents duplicate work and confusion.
- Telegram is your cockpit. Route all notifications there. Easy to scan, easy to act on.
- Monitor your cost. Track tokens per ticket. Anything above $25 for a routine ticket is a red flag: regeneration, infinite loops, excessive repo context, or a larger-than-expected code change.
14. Conclusion
I've built a system that can handle 260+ tickets/quarter (the 20 tickets/week run rate) at an average AI cost of about $12 per ticket while keeping humans in control of the final decisions. The key is specialization: each agent does one thing well, gates prevent wasted work, and Telegram keeps everyone aligned.
The workflow isn't magic. It's boring, deterministic, and parallel. That's exactly what we want from production automation.
Thank you Hermes.