8 Loops Inside Hermes Agent (And Why They Compound)
A complete map of the eight loops Hermes Agent runs simultaneously — from the millisecond core loop to the weekly Curator — how they nest across timescales, and what breaks when any one of them fails.
Overview
Most agent frameworks have one loop: prompt → response → repeat. Hermes Agent runs 8 loops simultaneously at different timescales, from milliseconds to weeks. Each loop serves a different purpose, and each one makes the others more effective. Stacked together, they create a compounding system that improves with every session.
This flow maps every loop inside Hermes Agent, explains how they nest, and shows what breaks when any of them fails. All technical details are verified against the official Hermes Agent developer documentation.
What is a loop in agent architecture?
A loop is a cycle: do → check → decide → repeat or stop.
Every agent has at least one. The core loop sends a message to the model, gets a response, checks for tool calls, executes them, and loops back. Without it, there is no agent — only a single API call.
What separates frameworks is how many loops they run, at what timescales, and whether those loops feed into each other. Four types of loops exist in agent systems:
| Loop type | What it does |
|---|---|
| Retry loops | Run again after failure. The simplest form. |
| Reflection loops | One agent critiques the output before the next pass. |
| Memory loops | Store a lesson that influences a future run. |
| Skill loops | Encode a procedure that changes how future runs execute. |
Most frameworks implement types 1 and 2. A few implement type 3. Hermes implements all four natively, plus orchestration loops that coordinate across agents and time.
Loop 1 — The core agent loop
Timescale: milliseconds to minutes per turn.
This is the heartbeat. Everything else runs on top of it. The core loop lives in run_agent.py (the AIAgent class). Each turn follows this sequence:
- Receive user message (or continuation from the
/goaljudge) - Append to conversation history
- Build or reuse the cached system prompt (
prompt_builder.py) - Check if compression is needed (>50% context)
- Build API messages from history
- Inject ephemeral prompt layers (budget warnings, context pressure)
- Apply prompt caching markers
- Make an interruptible API call
- Parse response — tool calls? Execute, append results, go to step 5. Text response? Persist session, flush memory, return.
Tool execution: A single tool call runs in the main thread. Multiple tool calls run concurrently via ThreadPoolExecutor, with results reinserted in original call order regardless of completion order.
Iteration budget: Default 90 iterations per session (configurable via agent.max_turns). At 100%, the agent stops and returns a summary. Subagents get independent budgets capped at delegation.max_iterations (default 50).
Interruptible calls: API requests run in a background thread while monitoring an interrupt event. When interrupted, the API thread is abandoned and no partial response enters history.
What breaks without this loop: everything. This is the kernel.
Loop 2 — The Ralph loop (/goal)
Timescale: minutes to hours per goal.
The core idea: keep a goal alive across turns. An auxiliary judge model evaluates after each turn — done or continue?
```text User sets /goal → Turn 1: agent works toward objective Judge evaluates: done? → no ↻ Continuing toward goal (1/20): [judge's reason] Turn 2: agent takes next step ... Turn N: agent completes Judge evaluates: done? → yes ✓ Goal achieved: [reason] ```
Key details:
- Default
max_turns: 20 (configurable viagoals.max_turns) /goal resumeresets the turn counter to zero and continues/subgoaladds acceptance criteria mid-loop without resetting- The judge prompt rewrites to include all subgoals — the goal is only done when the original objective and every subgoal are met
- Goal state persists in
SessionDB.state_meta - The judge runs on the auxiliary client (can be a cheaper model)
```text /goal [description] # start /goal status # check progress /goal pause # pause, preserve context /goal resume # continue, reset counter /goal clear # end /subgoal [text] # add criteria mid-run /undo [N] # take back last N turns ```
What breaks without this loop: the agent completes one turn and stops. No multi-step reasoning, no persistent objectives. Every task must be supervised turn by turn.
Loop 3 — The self-improvement loop
Timescale: runs after completed tasks (minutes to hours).
This is the loop that makes Hermes different. Official documentation describes it as "a closed learning loop."
- Agent completes a task
- Agent reviews what worked
- Agent identifies reusable patterns
- Agent saves the procedure as a skill file →
~/.hermes/skills/[skill-name].md - Next similar task: the agent finds the skill via search
- Agent loads the skill body into context
- Agent executes faster using the documented procedure
- If the procedure improves during use, the agent updates the skill
Skills are not prompt templates — they are full procedures containing trigger conditions, a step-by-step procedure, known pitfalls, verification steps, and required tools. The agent creates and updates them with the skill_manage tool.
The compounding math: From verified user benchmarks, agents with 20+ self-created skills cut research-task time by ~40% compared to a fresh instance. Each completed task potentially creates or refines a skill, so month 3 looks different from day 1.
Nudge system: The loop is triggered by "nudges" — periodic checks that spawn a background fork of AIAgent. The fork runs in its own prompt cache and never touches the active conversation.
What breaks without this loop: every session starts from zero. Day 90 output quality equals day 1.
Loop 4 — The Curator loop
Timescale: runs every 7 days (default), during idle periods.
Skills accumulate. Without maintenance you end up with dozens of narrow near-duplicates that pollute the catalog and waste tokens. The Curator solves this — when both interval_hours has elapsed and the agent has been idle for min_idle_hours, it spawns a background fork that scans skills, archives unused ones, consolidates related procedures, and optimizes descriptions for searchability.
```yaml curator: interval_hours: 168 # 7 days min_idle_hours: 2 # only runs when idle prune_builtins: true # can archive unused built-in skills archive_after_days: 30 # unused threshold ```
```text hermes curator status # check last run hermes curator pause # skip next run hermes curator resume # re-enable ```
Important guarantees: it's triggered by an inactivity check (not a cron daemon), the first run defers by one full interval on new installs, it never auto-deletes (worst case is recoverable archival), and Hub-installed skills are always off-limits.
What breaks without this loop: skill bloat. The agent accumulates hundreds of overlapping skills, context gets polluted, and search returns wrong results.
Loop 5 — The memory loop
Timescale: after each session and periodically during sessions.
Memory operates across three layers:
- Layer 1 — Session memory: the conversation history of the current session, living in RAM and SQLite.
- Layer 2 — Persistent memory (
MEMORY.md+USER.md): facts, preferences, and insights that survive across sessions, auto-written when the agent identifies important information. - Layer 3 — Session recall (FTS5): every CLI and messaging session stored in SQLite (
~/.hermes/state.db) with full-text search that returns actual messages — no LLM summarization, no truncation.
```yaml memory: memory_enabled: true user_profile_enabled: true memory_char_limit: 2200 # ~800 tokens, injected every turn user_char_limit: 1375 # ~500 tokens, injected every turn ```
External memory providers (8 plugins): Mem0 (knowledge graph + semantic retrieval), Honcho (two-peer dialectic), Hindsight, Holographic, RetainDB, ByteRover, Supermemory, and OpenViking. Built-in memory continues working alongside them — the external provider is additive.
What breaks without this loop: the agent forgets everything between sessions. You re-explain your preferences and projects every time.
Loop 6 — The Kanban dispatcher loop
Timescale: every 60 seconds.
The Kanban system is the orchestration layer that coordinates multiple agents and tasks. Every 60 seconds it scans the board (~/.hermes/kanban.db), finds Ready tasks, assigns them to workers, tracks heartbeats on Running tasks, detects and reclaims zombie cards, checks retry budgets, and reports blocked tasks for human review.
Statuses: Triage → To-Do → Ready → Running → Blocked → Done → Archived.
```text hermes kanban swarm ```
The swarm spawns a root orchestrator + parallel workers + a gated verifier + a gated synthesizer + a shared blackboard. When a task enters Blocked, execution pauses for human input (approval buttons are native in Telegram and Slack).
Kanban is deliberately single-host — kanban.db is a local SQLite file and the dispatcher spawns workers on the same machine. For multi-host setups, run an independent board per host and bridge them with delegate_task or a message queue.
What breaks without this loop: multi-agent work becomes manual coordination. Crashed tasks go unnoticed, with no retry and no visibility.
Loop 7 — The compression loop
Timescale: fires when context usage exceeds thresholds.
Hermes runs a dual compression system: a Gateway Session Hygiene safety net at 85% (rough character-based estimate, fires before the agent processes a message) and the Agent ContextCompressor at 50% (the primary system, with access to accurate API-reported token counts).
The algorithm has four phases:
- Prune old tool results (cheap, no LLM call) — results >200 chars outside the protected tail get replaced with a placeholder.
- Check if Phase 1 was enough — re-estimate; if below threshold, done.
- Summarize middle turns — an LLM call summarizes the compressible region. Protected: first 3 messages + last 20. Tool call/result pairs are never split.
- Create new session lineage — compression creates a "child" session ID; memory is flushed to disk before compression to prevent data loss.
```yaml compression: enabled: true threshold: 0.50 # compress at 50% of context window target_ratio: 0.20 # how much of threshold to keep as tail protect_last_n: 20 # recent messages always preserved
context: engine: "compressor" # default, lossy summarization
engine: "lcm" # plugin, lossless context management
```
What breaks without this loop: long sessions hit context limits, API calls fail, and multi-turn
/goalruns become impossible beyond 15-20 turns.
Loop 8 — The sub-agent loop
Timescale: minutes per sub-agent, parallel execution.
delegate_task spawns child agents with isolated context. Each child runs its own core loop (Loop 1) independently, can use /goal, create skills, write to memory, and run compression. Children return summaries to the parent, keeping the parent's context light.
```text delegation: max_concurrent_children: 3 max_iterations: 50 # budget per sub-agent max_spawn_depth: 2 # orchestrator nesting limit
Roles: leaf (default): cannot re-delegate orchestrator: can spawn its own workers ```
```text
Batch (parallel):
delegate_task(tasks=[ {goal: "research topic A", ...}, {goal: "research topic B", ...}, {goal: "research topic C", ...} ]) ```
Token cost note: each sub-agent runs its own full Loop 1 session — 3 concurrent sub-agents ≈ 3x your single-session cost. Use cheaper models for routine sub-agent work and reserve expensive models for the parent orchestrator.
What breaks without this loop: every task runs sequentially in one context. Parallel research, multi-angle analysis, and simultaneous code review all bottleneck on a single agent.
How the loops nest
The loops do not run independently — they nest inside each other and across timescales:
```text WEEKLY: Loop 4 (Curator) runs → cleans skills from Loop 3 → improves accuracy of Loop 7 (Tool Search in skills)
DAILY: Cron job fires → Loop 6 (Kanban) assigns task → Loop 2 (/goal) starts on the task → Loop 1 (Core) executes each turn → Loop 7 (Compression) fires if context grows → Loop 8 (Sub-agents) spawn for parallel work → Each sub-agent runs its own Loop 1 Loop 3 (Self-improvement) fires after task completes → New skill saved Loop 5 (Memory) writes persistent facts
EVERY SESSION: Loop 5 (Memory) injects MEMORY.md + USER.md Loop 1 (Core) runs turns Loop 7 (Compression) manages context Loop 3 (Self-improvement) reviews and saves ```
The compounding chain: Skills (Loop 3) make /goal (Loop 2) faster. The Curator (Loop 4) keeps skills clean and searchable. Memory (Loop 5) gives the core loop context about you. Kanban (Loop 6) orchestrates parallel goals. Compression (Loop 7) keeps long runs affordable. Sub-agents (Loop 8) multiply capacity. Remove any single loop and the others degrade.
How Hermes compares to other loop architectures
Not every framework implements the same loops:
- GenericAgent (12.4K stars) uses minimal seed code (~3K lines, 9 atomic tools) that self-evolves. Its goal mode uses time budgets instead of turn budgets, with reportedly 6x lower token consumption.
- DSPy (25K+ stars, Stanford NLP) treats prompts as programs and optimizes them against metrics — it optimizes the prompt through compilation, where Hermes optimizes the procedure through skill creation.
Hermes's advantage: all 8 loops are native, integrated, and designed to feed each other. Most frameworks implement 2-3 and leave the rest to the user.
Token cost per loop
Not all loops cost tokens equally. Cheapest: Kanban (zero), Curator (minimal), Compression (a net saver). Most expensive: Sub-agents (a multiplier), /goal (up to 20x core turns), and the Core loop (base cost).
Optimization priorities: use the auxiliary model for the /goal judge and compression; lower memory char limits on profiles that don't need deep context; set realistic max_turns per profile (20 for research, 50 only for code); enable Tool Search to avoid loading unused schemas; and run routine cron jobs on cheaper models. Use /usage to measure your actual numbers.
Start here
You don't configure all 8 loops on day one — you start with 2 and the rest come online as your system expands.
- Step 1 — Get Loop 1 + Loop 5 running (5 min): install Hermes, run
hermes setup --portal, start a session, and talk to it. Core and Memory are active from the first message. - Step 2 — Add Loop 2 (10 min): run your first structured
/goalwith an objective, sources, constraints, and a deliverable. Self-improvement (Loop 3) fires automatically after the goal completes. - Step 3 — Add time and orchestration (30 min): set a small cron job (e.g. a morning Telegram news summary). You now have 5 loops running. Kanban, Curator, and Sub-agents activate as usage grows.
The real insight
Agent frameworks are defined by their loops. One loop (prompt → response) is a chat wrapper. Two loops (+ retry) is slightly better. A framework with all 8 is an operating system.
The compounding happens in the intersection of these loops, not in any single one. An agent that improves its own procedures and maintains them and remembers your preferences and orchestrates parallel work and manages its own context is a fundamentally different tool than one that just responds to prompts. That is the loop architecture of Hermes Agent.
Originally written by YanXbt. Technical details verified against the Hermes Agent developer documentation (v0.16.0) and source references including run_agent.py, context_compressor.py, gateway/run.py, and the Curator module.
Related flows
Hermes Agent as a Personal AI Operating System
A layer-by-layer analysis of Hermes mapped to operating-system concepts — memory, profiles, Kanban, cron, /goal, skills, the Curator, Tool Search, the Gateway, voice, and security — plus the compounding effect, token economics, and how it compares to other frameworks.
Hermes Agent FULL GUIDE: Architecture, Setup, and the Self-Improving Loop
A complete walkthrough of how Hermes is put together — installation, model routing, terminal backends, messaging, context and memory engines — and how its self-improving loop turns conversations into permanent upgrades.
Hermes Agent: The Complete Guide — From Zero to Self-Improving AI Employee
An end-to-end guide to running Hermes Agent 24/7: installation, model selection, messaging, the dashboard most people use wrong, use cases, the self-improvement loop, and security.