Engineering Automation · Advanced

How we used four AI agents to turn Jira tickets into reviewed PRs for about $12 each

An event-driven engineering workflow where four specialized Hermes agents handle ticket intake, coding, review, and CI — while humans keep merge authority. Routine tickets go from intake to reviewed PR in about four hours for roughly $12 in AI spend.

Luke · 12 min read · 17 Jun 2026

Agents in this flow

Mark (Claude 3.5 Haiku): Intake & Gate Agent

Andrew (OpenAI 5.5 Pro, fallback Claude Opus 4.8): Senior Coder

Rev (Claude 3.5 Haiku): Code Reviewer

Mr. Pipeline (Claude Haiku): CI / Lint / Style Gate

Before we rebuilt our engineering workflow, our team faced a classic problem: ticket intake → development → review → merge → QA was manual, slow, and created friction at every handoff.

Developers were:

Manually reading Jira tickets
Creating branches by hand
Waiting for code reviews (which took time)
Manually moving tickets through statuses
Pushing to QA manually
Losing context between Jira and GitHub

The cost? 20-30% of dev time spent on ceremony instead of coding. Plus, when QA found bugs, the ticket status in Jira would lag behind what was actually happening in GitHub, creating confusion.

At ~50 routine tickets per quarter, the old workflow consumed roughly 325 engineering hours: 50 tickets × 6.5 hours per ticket. That is about 8 full-time engineering weeks, or roughly 2 months of engineering time. With agents, the human time drops to a few minutes per ticket, while production merge authority stays with a human.

We wanted autonomous agents handling the routine work, while keeping humans in control of the final decision (merging to production). Here's what we built.

1. The Architecture

Our system uses four specialized AI agents running on Hermes and Jira webhooks as event triggers.

The Four Named Agents

1. Mark — The Intake & Gate Agent (Claude 3.5 Haiku)

Job: When a new Jira ticket arrives (DB-* on the Development Board project board), Mark wakes up.

Tasks:

Validate the ticket is assigned to "Luke The Dev"
Check if a PR already exists for this ticket (risk gate)
Check if a similar ticket is already in progress (duplicate gate)
Create a fresh GitHub branch from origin/prod (never from another feature branch)
Decide: Is this ticket safe to implement, or are there blockers?

Cost: Cheap — Haiku is ~95% accurate for structured tasks like reading + gating.

Output: If safe → triggers Andrew. If blocked → comments on Jira with blocker reason.

2. Andrew — The Senior Coder (OpenAI 5.5 Pro, fallback Claude Opus 4.8)

Job: Write the actual code.

Tasks:

Implement the feature/fix based on ticket description + acceptance criteria
Write tests
Self-review the code
Push to GitHub
Open a PR and link it to the Jira ticket

Cost: Expensive but worth it.

Why the fallback? When 5.5 is rate-limited or unavailable, Claude Opus still produces high-quality code.
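The fallback rule can be sketched in a few lines. This is a minimal illustration, not our actual Hermes configuration: `ModelUnavailable`, `generate_with_fallback`, and the two stand-in model functions are hypothetical names, and the real model calls are replaced by plain callables.

```python
from typing import Callable


class ModelUnavailable(Exception):
    """Hypothetical error for a rate-limited or unreachable model."""


def generate_with_fallback(
    prompt: str,
    primary: Callable[[str], str],
    fallback: Callable[[str], str],
) -> tuple[str, str]:
    """Return (which_model, output), trying the primary model first."""
    try:
        return "primary", primary(prompt)
    except ModelUnavailable:
        # Primary is rate-limited or down: send the same prompt to the fallback.
        return "fallback", fallback(prompt)


def flaky_primary(prompt: str) -> str:
    # Stand-in for the primary model when it is rate-limited.
    raise ModelUnavailable("rate limited")


def steady_fallback(prompt: str) -> str:
    # Stand-in for the fallback model.
    return f"patch for: {prompt}"


which, output = generate_with_fallback("DB-1234", flaky_primary, steady_fallback)
print(which, output)
```

Only the "unavailable" error triggers the fallback here; any other exception still surfaces, so a genuine bug is not silently hidden behind a second model call.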

Quality gate: Andrew must record the exact commit SHA and commit date in Jira before Mark approves.

3. Rev — The Code Reviewer (Claude 3.5 Haiku)

Job: Review the PR that Andrew opened.

Tasks:

Check for security issues
Verify tests actually test the feature
Run smoke tests (if applicable)
Leave inline comments on the PR
If passing → approves PR and moves ticket to "Ready for Human Merge"

Cost: Cheap — Haiku is sufficient for pattern-matching (security anti-patterns, test completeness).

Human override: The PR can't be merged without Luke's manual approval.

4. Mr. Pipeline — The CI/Lint/Style Gate (Claude Haiku)

Job: Runs after every commit.

Tasks:

Verify code passes Codacy linting rules
Check test coverage meets minimum (e.g., >75%)
Validate commit messages follow format
Run style checks
Report back to GitHub + Jira

Cost: Very cheap — mostly subprocess calls to existing linters.

Output: Either "ready to merge" or "fix these issues".
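Mr. Pipeline's two possible outputs boil down to a small aggregation over the tool results. The sketch below assumes the linter, coverage tool, and test runner have already run and reported three numbers; `CiResult` and `ci_verdict` are hypothetical names, and the 75% threshold is the example minimum mentioned above.

```python
from dataclasses import dataclass

MIN_COVERAGE = 75.0  # example minimum from the task list above


@dataclass
class CiResult:
    lint_issues: int
    coverage_percent: float
    tests_failed: int


def ci_verdict(result: CiResult) -> tuple[str, list[str]]:
    """Collapse individual checks into one verdict plus a list of problems."""
    problems: list[str] = []
    if result.lint_issues > 0:
        problems.append(f"{result.lint_issues} lint issue(s)")
    if result.coverage_percent <= MIN_COVERAGE:
        problems.append(
            f"coverage {result.coverage_percent:.0f}% is not above {MIN_COVERAGE:.0f}%"
        )
    if result.tests_failed > 0:
        problems.append(f"{result.tests_failed} failing test(s)")
    verdict = "ready to merge" if not problems else "fix these issues"
    return verdict, problems


print(ci_verdict(CiResult(lint_issues=0, coverage_percent=86.0, tests_failed=0)))
```

Keeping the verdict logic deterministic like this is what lets the model's role stay small: it only has to format the report for GitHub and Jira.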

Communication Paths

Jira Webhook Event
  → Mark (Gate Check)
  → (If safe) → Andrew (Code)
  → (Diff complete) → Rev (Review)
  → (Approved) → Mr. Pipeline (CI Gate)
  → (Passed) → Jira status: "Ready for Human Merge"
  → (Telegram notification to Luke)

2. The Event-Driven Flow

Step 1: Ticket Created in Jira

Developer/PM creates a Jira ticket in the DB project board
Assigns it to "Luke The Dev" (our development filter)
Webhook fires to our local Jira proxy at 127.0.0.1:XXXX (exposed via Tailscale Funnel)
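For illustration, here is roughly what the proxy needs to pull out of the webhook body before handing the ticket to Mark. This is a sketch under assumptions: the field paths (`issue.key`, `issue.fields.summary`, `issue.fields.assignee.displayName`) follow the usual Jira issue webhook shape, so check them against a real payload from your instance. `TicketEvent` and `parse_ticket_event` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TicketEvent:
    key: str
    summary: str
    assignee: Optional[str]


def parse_ticket_event(payload: dict) -> Optional[TicketEvent]:
    """Extract the fields intake needs; return None for tickets outside the DB project."""
    issue = payload.get("issue") or {}
    key = issue.get("key", "")
    if not key.startswith("DB-"):
        return None
    fields = issue.get("fields") or {}
    # An unassigned ticket has a null assignee, so guard before reading the name.
    assignee = (fields.get("assignee") or {}).get("displayName")
    return TicketEvent(key=key, summary=fields.get("summary", ""), assignee=assignee)


example = {
    "webhookEvent": "jira:issue_created",
    "issue": {
        "key": "DB-1234",
        "fields": {
            "summary": "New Payment Flow",
            "assignee": {"displayName": "Luke The Dev"},
        },
    },
}
print(parse_ticket_event(example))
```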

Step 2: Mark Intake (Orchestration)

Mark runs immediately:

Is this ticket assigned to "Luke The Dev"? → No? Exit silently (not our workflow)
Does a GitHub PR already exist for this ticket? → Yes? Gate: "Existing PR found" → Jira comment + wait for completion
Is a similar ticket already in progress? → Yes? Gate: "Duplicate/in-progress" → Jira comment + escalate to Luke
Status check: Is the ticket ready to implement? → No? Gate: "Missing acceptance criteria" → Jira comment
If all gates pass: → Create branch feature/DB-1234-ticket-name from fresh origin/prod → Trigger Andrew to start coding → Jira status: "In Progress"

Example Jira comment from Mark:

All gates passed. Triggering code generation...
- Branch: feature/DB-1234-new-payment-flow
- Assigned to: Andrew (Senior Coder)
- ETA: ~5-10 minutes
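The gate order in Step 2 can be written as a short-circuiting function. This is a simplified sketch: in the real flow Mark answers each question by querying Jira and GitHub, whereas here the answers arrive as precomputed booleans. `GateInput` and `run_gates` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GateInput:
    assignee: Optional[str]
    has_existing_pr: bool
    similar_in_progress: bool
    has_acceptance_criteria: bool


def run_gates(ticket: GateInput) -> tuple[str, Optional[str]]:
    """Return (decision, reason); decision is 'skip', 'blocked', or 'proceed'."""
    if ticket.assignee != "Luke The Dev":
        return "skip", None  # not our workflow: exit silently, no Jira comment
    if ticket.has_existing_pr:
        return "blocked", "Existing PR found"
    if ticket.similar_in_progress:
        return "blocked", "Duplicate/in-progress"
    if not ticket.has_acceptance_criteria:
        return "blocked", "Missing acceptance criteria"
    return "proceed", None  # safe to create the branch and trigger Andrew


print(run_gates(GateInput("Luke The Dev", False, False, True)))
```

The first failing gate wins, so a blocked ticket never reaches the expensive coding step.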

Step 3: Andrew Codes (Implementation)

Andrew gets the ticket details + repo context:

Pull the branch + read CLAUDE.md / AGENTS.md / .cursorrules
Understand the acceptance criteria
Write code + tests
Self-review (security, performance, test quality)
Push to GitHub
Open PR, link to Jira ticket DB-1234
Report completion to Mark

Example PR description (auto-generated):

Fixes DB-1234: New Payment Flow

## Acceptance Criteria
- [ ] Payment form validates card details
- [ ] Supports Stripe + PayPal
- [ ] Handles timeout gracefully

## Tests
- 8 new unit tests
- 2 integration tests (Stripe sandbox)
- Manual test: Can complete checkout end-to-end

## Changes
- app/payment/processor.py (+120 lines)
- app/payment/test_processor.py (+200 lines)
- requirements.txt (added stripe==8.0.0)

Step 4: Rev Reviews (Quality Gate)

Rev automatically reviews the PR:

Read PR diff
Check for security issues (SQL injection, XSS, secrets in code)
Validate tests: Does test count match complexity of change? Are tests actually testing the feature?
Run smoke tests (if configured)
Leave detailed comments
If passing → GitHub approve + Jira status: "Ready for Human Merge"

Example review comment:

Approved (with notes)

Security: Stripe API key properly injected via env var. OK
Tests: 10 tests cover payment flows well. OK
Coverage: 86% (above 75% threshold). OK

Minor: Consider adding timeout test for slow networks.

Step 5: Mr. Pipeline Checks (CI/CD Gate)

Every commit triggers Mr. Pipeline:

Run Codacy linting rules
Verify test coverage
Check code style (Prettier/Black)
Run unit tests
Report status to GitHub

Status check:

All CI gates passed
- Linting: OK (0 issues)
- Coverage: 86% OK
- Tests: 12 passed in 45s OK
- Ready to merge when approved

Step 6: Human Approval & Merge

Luke (the human) sees the Telegram notification:

DB-1234: New Payment Flow
Code ready for review
Andrew completed implementation
Rev approved PR
All CI gates passed
Ready for merge: [Link to PR]

Luke manually clicks "Merge" on GitHub. This is intentional. We don't auto-merge — merging to production is a human decision.

Step 7: QA Handoff

Once merged:

Jira status: "In QA" (auto-transitioned)
Telegram notification to QA team
QA tests in staging environment
If bug found: QA creates a new Jira ticket (QA-*) linked to DB-1234
When QA approves: Jira status: "Done"

3. The Token Economy (How We Save Money)

I spend approximately $8–$18 per ticket on AI agents. Here's why it's still cheap compared with manual engineering time.

Token Breakdown Per Ticket

Agent	Model	Tokens	Cost	Rationale
Mark	Claude 3.5 Haiku	1K–2K	~$0.01	Structured tasks: gating, branch creation
Andrew	OpenAI 5.5 Pro	80K–150K input; 25K–50K output/reasoning	$7–$14	Expensive model used once, only after gates pass
Rev	Claude 3.5 Haiku	10K–25K	~$0.03–$0.10	Pattern-matching: security, test quality
Mr. Pipeline	Claude Haiku	1K–3K	~$0.01–$0.03	Mostly subprocess calls: linters
Kanban Notification	Claude Haiku	500–1K	~$0.01	Just formatting + Telegram/Jira post

Total per ticket: ~120K–230K tokens, ~$8–$18.

4. Token Optimization Strategies

Use Cheap Models for Gating. Mark (Haiku) does gate checks, not code generation. Haiku is ~95% accurate on structured tasks and costs about 1/10th of Opus. We use the expensive model (Andrew on OpenAI 5.5 Pro) only for open-ended code generation.

Never Regenerate, Get It Right Once. Andrew writes code, submits it once. Rev reviews once, doesn't iterate with Andrew. If Rev finds issues, we escalate to Luke (human decision). This prevents token churn from multi-turn loops.

Context Reuse via CLAUDE.md. Every repo has a CLAUDE.md file (guidelines for AI). Mark references it when creating branches. Andrew reads it once, uses it to guide code style. No need to repeat context in every prompt.

Parallel Execution. Mark runs immediately on webhook. If gates pass, Andrew starts (no waiting). Rev reviews in parallel with testing. Mr. Pipeline runs on commit (not dependent on Rev). Parallelism = faster + same token cost.

Stateless Agents. Each agent is independent (no shared state between them). No need for context switching or long-running sessions. Each agent reads Jira + GitHub directly, processes, and exits. Stateless = no wasted tokens on state management.

Kanban Notifications Are Optional. Sending Telegram + Jira comments adds ~$0.01 per notification. For teams that want zero notification overhead, this is opt-in. We batch notifications (don't send one per action).

Skip Redundant Work. If a PR already exists for a ticket, Mark gates it (doesn't generate again). If a ticket is blocked, Mark doesn't trigger Andrew. Short-circuits prevent token waste on dead-end work.

Real Cost Example

A typical feature ticket (DB-1234):

Mark intake check: ~1K tokens (~$0.01)
Andrew implementation: ~80K–150K input tokens + ~25K–50K output/reasoning tokens (~$7–$14)
Rev review: ~10K–25K tokens (~$0.03–$0.10)
Mr. Pipeline CI: ~1K–3K tokens (~$0.01–$0.03)
Kanban notifications: ~500–1K tokens (~$0.01)

Total: ~120K–230K tokens, usually ~$8–$18 per ticket.

Using $12 as the average, 50 tickets/quarter costs about $600/quarter in AI spend, or roughly $200/month. At a heavier run rate of ~20 tickets/week, the same system would cost about $240/week, $1,040/month, or $12,480/year for autonomous code generation + review. For a team of 5 devs, that heavier run rate is roughly $208/month per dev in AI labor.
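The run-rate arithmetic is easy to rerun for your own volume. A small sketch, assuming the $12 average and a 52-week year; `run_rate` is a hypothetical helper, not part of our tooling.

```python
AVG_COST_PER_TICKET = 12.0  # average AI spend per ticket, in USD


def run_rate(tickets_per_week: float, devs: int = 5) -> dict[str, float]:
    """Project AI spend from a weekly ticket volume."""
    weekly = tickets_per_week * AVG_COST_PER_TICKET
    yearly = weekly * 52
    monthly = yearly / 12
    return {
        "weekly": weekly,
        "monthly": monthly,
        "yearly": yearly,
        "monthly_per_dev": monthly / devs,
    }


quarterly_at_current_volume = 50 * AVG_COST_PER_TICKET  # 50 tickets/quarter
heavier = run_rate(20)  # 20 tickets/week

print(f"Current volume: ${quarterly_at_current_volume:,.0f}/quarter")
for label, amount in heavier.items():
    print(f"{label}: ${amount:,.0f}")
```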

5. The GitHub Branch Strategy

I enforce a strict branching invariant.

Rule: Always Branch from origin/prod

# CORRECT
git checkout -b feature/DB-1234-name origin/prod

# WRONG (creates hidden dependencies)
git checkout -b feature/DB-1234-name feature/DB-999-other

Why? If you branch from another feature branch (DB-999), your PR now implicitly depends on DB-999's PR being merged first. This breaks parallelism and creates merge conflicts.

Naming Convention

Branch names follow the pattern:

feature/DB-1234-short-description
bugfix/DB-1234-short-description
hotfix/DB-1234-short-description
bau/DB-1234-short-description (business-as-usual)
parent/DB-1234 (epic parent branch)
chore/TICKET-1234-description (chores do not require the DB- project prefix)

Rejected patterns (blocked by GitHub branch protection rules):

fix/DB-1234-name (ambiguous: bugfix or hotfix?)
DB-1234-name (no type prefix)
my-feature-fix (no Jira ID)
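The naming convention can also be checked locally, for example before Mark pushes a new branch. The regular expression below is my reading of the patterns listed above, not the exact rule configured in GitHub; in particular, the lowercase-and-hyphens description format and the generic uppercase key for chores are assumptions. `is_valid_branch` is a hypothetical helper.

```python
import re

# Accepted shapes, per the naming convention:
#   feature|bugfix|hotfix|bau / DB-<number> - <description>
#   parent / DB-<number>
#   chore / <KEY>-<number> - <description>
DESCRIPTION = r"[a-z0-9]+(?:-[a-z0-9]+)*"
BRANCH_PATTERN = re.compile(
    rf"(?:feature|bugfix|hotfix|bau)/DB-\d+-{DESCRIPTION}"
    rf"|parent/DB-\d+"
    rf"|chore/[A-Z]+-\d+-{DESCRIPTION}"
)


def is_valid_branch(name: str) -> bool:
    """True when the whole branch name matches one accepted shape."""
    return BRANCH_PATTERN.fullmatch(name) is not None


for candidate in [
    "feature/DB-1234-short-description",
    "parent/DB-1234",
    "fix/DB-1234-name",
    "DB-1234-name",
    "my-feature-fix",
]:
    print(candidate, is_valid_branch(candidate))
```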

PR Verification Before Merge

Before merging, Luke verifies:

Commit history contains ONLY DB-1234 changes
Changed files are relevant to the ticket
No accidental merge commits
No stray files from other tickets
Commit SHA matches what Mark/Andrew reported

This ensures we never accidentally merge unrelated code.

6. Jira Status Automation

I sync Jira statuses with Kanban progress automatically using jira-transition:

jira-transition DB-1234 "In Progress"
jira-transition DB-1234 "Ready for Human Merge"
jira-transition DB-1234 "Done"

Status Flow

Unstarted
  ↓
Mark triggers Andrew
  ↓
In Progress (Mark sets this)
  ↓
Andrew pushes code
  ↓
Ready for Human Merge (Rev sets this when PR approved)
  ↓
Luke merges manually
  ↓
PR merged → Jira auto-transitions to "In QA"
  ↓
QA approves
  ↓
Done

Why auto-transition? Without it, the status in Jira lags behind reality (PR is merged in GitHub, but Jira still says "In Progress"). This confuses team members and causes duplicate work.
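One way to keep automated transitions honest is to treat the status flow as a small state machine and refuse any jump that skips a step. This is a sketch of that idea, not how jira-transition works internally; `ALLOWED_TRANSITIONS` and `next_status` are hypothetical names, and the statuses are the ones in the flow above.

```python
ALLOWED_TRANSITIONS: dict[str, set[str]] = {
    "Unstarted": {"In Progress"},                # Mark triggers Andrew
    "In Progress": {"Ready for Human Merge"},    # Rev approves the PR
    "Ready for Human Merge": {"In QA"},          # Luke merges manually
    "In QA": {"Done"},                           # QA approves
    "Done": set(),
}


def next_status(current: str, target: str) -> str:
    """Return the target status if the move is allowed, else raise ValueError."""
    if target not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition: {current!r} -> {target!r}")
    return target


status = "Unstarted"
for step in ["In Progress", "Ready for Human Merge", "In QA", "Done"]:
    status = next_status(status, step)
print(status)
```

Rejecting out-of-order moves means a misfiring agent cannot mark a ticket "Done" before it has been merged and tested.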

7. The Telegram Notification System

Every major event sends a Telegram notification to Luke's home channel:

Event: Mark gates passed
"DB-1234 ready for code generation. Triggering Andrew."

Event: Andrew completed code
"DB-1234 complete. PR: github.com/smartways/tms/pull/456. Rev reviewing now..."

Event: Rev approved
"DB-1234 approved. All CI gates passed. Ready to merge: [Link]"

Event: Blocker found
"DB-1234 blocked: Duplicate with DB-999. Please resolve and re-trigger."

Why Telegram?

Notifications arrive immediately (not email)
Easy to click through to GitHub/Jira
Can reply with voice messages (important for busy directors)
Creates an audit trail (all decisions are in chat)
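If you want to reproduce the notifications without Hermes, the Telegram Bot API's `sendMessage` method is enough. The sketch below only builds the HTTP request and does not send it. How Hermes delivers messages is not shown here, and `build_telegram_request`, `blocker_message`, the token, and the chat ID are placeholders.

```python
import json
import urllib.request


def blocker_message(ticket: str, reason: str) -> str:
    """Format the 'Blocker found' notification text."""
    return f"{ticket} blocked: {reason}. Please resolve and re-trigger."


def build_telegram_request(bot_token: str, chat_id: str, text: str) -> urllib.request.Request:
    """Build a POST to the Bot API sendMessage endpoint with a JSON body."""
    url = f"https://api.telegram.org/bot{bot_token}/sendMessage"
    body = json.dumps({"chat_id": chat_id, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_telegram_request(
    bot_token="PLACEHOLDER_TOKEN",  # placeholder, load from the environment in practice
    chat_id="PLACEHOLDER_CHAT",     # placeholder chat ID
    text=blocker_message("DB-1234", "Duplicate with DB-999"),
)
# urllib.request.urlopen(request) would actually send it.
print(request.full_url)
```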

8. Security Boundaries

The agents do not have production authority.

Agents can create branches and PRs, but cannot merge protected branches.
Production merges require Luke's manual approval.
GitHub branch protection still requires CI to pass.
Agent credentials are scoped to the minimum permissions needed.
Secrets are injected through environment/config systems, not pasted into prompts.
Jira, GitHub, and Telegram create the audit trail for every action.

9. Edge Cases & Escalations

The system handles ~95% of tickets autonomously. Here's what escalates to Luke:

Scenario	Trigger	Action
Duplicate ticket	Mark finds existing PR	Comment on Jira, wait for Luke decision
Ticket missing criteria	Mark can't parse requirements	Comment on Jira, flag for clarification
Code review blocked	Rev finds security issue	Comment on PR, don't approve, escalate
CI fails	Mr. Pipeline reports failures	Comment on PR + Jira
API rate limit	Agent hits token limit	Queue and retry: exponential backoff
Git conflict	Branch diverged from origin/prod	Mark rebases, retries

Key principle: Agents make bounded, reversible decisions. Humans make ambiguous, architectural, and production decisions.
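The "queue and retry" row for rate limits is plain exponential backoff. A minimal sketch, assuming the agent call raises a dedicated exception when it is rate-limited; `RateLimited` and `retry_with_backoff` are hypothetical names, and the delays are illustrative rather than our production values.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical error raised when a model API rejects a call for rate limits."""


def retry_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], object] = time.sleep,
) -> T:
    """Retry `call`, doubling the wait after each rate-limit error."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: escalate to a human
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")


# Usage with a fake call that fails twice, recording delays instead of sleeping.
attempts = {"count": 0}
recorded_delays: list[float] = []


def flaky_call() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RateLimited()
    return "ok"


print(retry_with_backoff(flaky_call, sleep=recorded_delays.append), recorded_delays)
```

Capping the attempts matters as much as the backoff itself: an uncapped retry loop is exactly the kind of token churn the cost alert is meant to catch.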

10. Cost Comparison

Before (All Manual)

1 feature ticket: ~4 hours of dev time
Code review: ~1 hour
Testing: ~1.5 hours
Total: 6.5 hours/ticket at $150/hour (loaded cost)
Cost per ticket: ~$975

After (Hermes + Agents)

Agent time: ~15 min wall-clock, parallelized
AI cost: ~$8–$18 per ticket, averaging ~$12
Human review: ~5 min, just the merge decision
Human review cost: ~$12.50 at $150/hour
Cost per ticket: ~$21–$31, usually ~$25

Savings: roughly 97% lower cost per routine ticket (~$25 vs. ~$975). One caveat: this works best for routine features. Complex architectural changes still need human design first.
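The before/after comparison reduces to a few lines of arithmetic, shown here so you can plug in your own hourly rate and review time. The figures are the ones quoted above; `after_cost` is a hypothetical helper.

```python
HOURLY_RATE = 150.0  # loaded engineering cost, USD/hour

# Before: dev time + code review + testing, in hours.
manual_hours = 4 + 1 + 1.5
manual_cost = manual_hours * HOURLY_RATE


def after_cost(ai_cost: float, review_minutes: float = 5) -> float:
    """AI spend plus the human time spent on the merge decision."""
    return ai_cost + review_minutes * HOURLY_RATE / 60


low, typical, high = after_cost(8), after_cost(12), after_cost(18)
savings = 1 - typical / manual_cost

print(f"Manual: ${manual_cost:.2f} per ticket")
print(f"With agents: ${low:.2f} to ${high:.2f}, typically ${typical:.2f}")
print(f"Savings at the typical cost: {savings:.1%}")
```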

11. Monitoring & Observability

I track three key metrics.

1. Agent Success Rate

Mark (intake): 98% (gates work correctly)
Andrew (code): 92% (working code on first attempt)
Rev (review): 95% (catches the issues a reviewer should catch)
Mr. Pipeline: 99% (CI is deterministic)

2. Time to Reviewed PR

Before: 2-3 days, mostly waiting for handoffs and review
After: ~4 hours from ticket intake to reviewed PR

3. Token Spend

Tracked weekly: avg ~$12/ticket
Alert if: >$25/ticket, which usually means regeneration, excessive context loading, retry loops, or an unusually large code change.
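The spend alert is a simple threshold check over per-ticket totals. A sketch with made-up numbers: the ticket keys and costs below are illustrative only, not real data from our tracker, and `average_cost` and `tickets_over_budget` are hypothetical helpers.

```python
ALERT_THRESHOLD = 25.0  # USD per ticket


def average_cost(costs: dict[str, float]) -> float:
    """Mean AI spend per ticket for the period."""
    return sum(costs.values()) / len(costs)


def tickets_over_budget(costs: dict[str, float], threshold: float = ALERT_THRESHOLD) -> list[str]:
    """Ticket keys whose AI spend exceeded the alert threshold."""
    return sorted(key for key, cost in costs.items() if cost > threshold)


# Illustrative weekly figures (not real data).
weekly_costs = {"DB-2001": 11.0, "DB-2002": 13.0, "DB-2003": 31.0}

print(f"Average: ${average_cost(weekly_costs):.2f}")
print("Over budget:", tickets_over_budget(weekly_costs))
```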

12. The Future

I am exploring:

Multi-ticket features: Let Andrew handle 2-3 related tickets in sequence
Rebase automation: When origin/prod moves, auto-rebase PRs
QA bot integration: Rev could run actual Selenium tests (not just code review)
Canary deployments: Auto-promote "low-risk" tickets to staging → prod
Model iteration: Track which model (OpenAI 5.5 Pro vs. Claude Opus) produces better code, optimize selection

13. Key Takeaways

Name your agents. Mark, Andrew, Rev, Mr. Pipeline — each has a role. Makes debugging easier.
Gate early. Let Mark check for existing PRs, duplicates, and blockers before triggering expensive Andrew. Saves 80% of failed work.
Use cheap models for filtering. Claude 3.5 Haiku is ~95% accurate for structured tasks. Reserve OpenAI 5.5 Pro for open-ended reasoning.
Never merge automatically. Humans own the merge button. Agents prepare the code; humans deploy it.
One shot, one agent. Don't iterate 10 times. Write once, review once, merge once.
Event-driven execution. Intake, coding, review, CI, and notifications are triggered automatically instead of waiting for humans at every handoff. CI and review can overlap where possible. Same token cost, much less wall-clock time.
Context files (CLAUDE.md). Write once, reuse forever. Saves repetition and token cost.
Status sync matters. Keep Jira in sync with GitHub reality using jira-transition. Prevents duplicate work and confusion.
Telegram is your cockpit. Route all notifications there. Easy to scan, easy to act on.
Monitor your cost. Track tokens per ticket. Anything above $25 for a routine ticket is a red flag: regeneration, infinite loops, excessive repo context, or a larger-than-expected code change.

14. Conclusion

I've built a system that turns routine tickets into reviewed PRs at an average AI cost of about $12 per ticket, at our current volume of ~50 tickets/quarter and with costs modeled up to 260 tickets/quarter (20/week), while keeping humans in control of the final decisions. The key is specialization: each agent does one thing well, gates prevent wasted work, and Telegram keeps everyone aligned.

The workflow isn't magic. It's boring, deterministic, and parallel. That's exactly what we want from production automation.

Thank you Hermes.

This flow was shared by a community member. The Hermes Bible is an unofficial, community-built resource and is not affiliated with Nous Research.
