
Hermes Agent Masterclass

Everything you need to understand and customize Hermes Agent — self-evolving skills, three-tier memory, GEPA optimization, and going from 1 to 10 specialized agents that work for you 24/7.

Akshay Pachaar (@akshay_pachaar) on X · 5 min read · 26 Jun 2026

Agents in this flow

  • Programmer: staff engineer that delegates execution to Claude Code
  • Researcher: runs a scheduled daily AI/ML digest to Telegram
  • Designer: generates illustrations in your visual style via a self-authored skill

What this masterclass covers

Hermes Agent crossed 90,000 GitHub stars in two months. Developers are quietly building personal AI agents that learn their workflow, remember their context, and run 24/7. This guide covers how the learning loop works, what each memory layer does, and how to configure everything from scratch.

By the end, you'll have three fully isolated agents running on your machine: a programmer (which delegates to your Claude Code install), a deep researcher, and a designer. Each has its own personality, memory, skills, and Telegram bot.

Two halves: theory first, hands-on second. Short on time? Skip to Getting up and running — the commands work standalone. But the theory pays off: knowing how skills self-evolve, how memory composes, and when GEPA earns its keep is the difference between using Hermes as a chatbot with notes and using it as something that compounds.

What Hermes actually is

The one-line pitch: an agent that gets better the longer you use it.

What makes that real is that three usually separate capabilities sit in one framework: runtime skill learning, persistent multi-layer memory, and an optional weight-training pipeline. No other open-source agent ships all three.

The closest comparison in the open ecosystem is OpenClaw. Both are persistent and messaging-friendly, but they make opposite architectural choices. A clean framing: "Hermes packages a gateway around a learning agent. OpenClaw packages an agent around a messaging gateway."

How it's built

Everything flows through a single AIAgent class in a run_agent.py script. CLI, messaging gateway, batch runner, IDE integration — they're all entry points into the same core agent. That's what makes the platform-agnostic story actually work.

The core loop is ReAct-style and synchronous: build the system prompt, check if compression is needed, make an interruptible API call, execute any tool calls, loop again. A few details that matter later:

  • The agent can run commands in six different places — local terminal, Docker, SSH, Modal, Daytona, or Singularity. Same code, just a config change. Move execution from your laptop to a cloud GPU server without touching anything else.
  • It works with almost any model. A translation layer routes any provider through one of three API formats. Swap from Claude to GPT to Gemini to local Ollama with one command and nothing breaks.
  • The agent has a hard cap of 90 turns per task. Without it, an agent stuck in a loop would silently burn through your credits. Subagents share the same budget, so a runaway delegation chain can't sneak past either.
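
The shared turn budget is easier to see in code. The following is a minimal sketch, not Hermes's actual implementation: TurnBudget and run_task are hypothetical names, and a nested list stands in for a delegated subtask. The only detail taken from the text is the cap of 90 and the fact that subagents draw from the same pool.

```python
from dataclasses import dataclass


@dataclass
class TurnBudget:
    """Shared counter: the parent and its subagents draw from one pool."""
    remaining: int = 90  # cap taken from the text; the class itself is hypothetical

    def spend(self) -> bool:
        """Consume one turn; return False once the budget is exhausted."""
        if self.remaining <= 0:
            return False
        self.remaining -= 1
        return True


def run_task(steps, budget):
    """Run a list of steps; a nested list stands in for a delegated subtask.

    Returns the number of turns executed at this level and below.
    """
    executed = 0
    for step in steps:
        if isinstance(step, list):
            # Delegation: the subagent receives the SAME budget object.
            executed += run_task(step, budget)
        elif budget.spend():
            executed += 1
        else:
            break  # hard stop instead of looping forever
    return executed


budget = TurnBudget()
# 50 parent turns, then a subagent that tries to take 100 more.
done = run_task(["turn"] * 50 + [["sub-turn"] * 100], budget)
print(done, budget.remaining)
```

Because the subagent is handed the same object rather than a fresh counter, the delegated work stops after 40 turns and the total never exceeds the cap.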

Before memory: who is the agent?

Memory is what the agent knows. Skills are how it does things. But neither tells you who it is when it shows up. Hermes solves this with a single file: SOUL.md.

It lives at ~/.hermes/SOUL.md and occupies slot #1 in the system prompt, before anything else loads. It defines personality, tone, communication style, and hard limits.

# SOUL.md
You are a pragmatic senior engineer with strong taste.
You optimize for truth, clarity, and usefulness
over politeness theater.

SOUL.md is hand-authored and static — write it once, tweak it over time, and it stays consistent across every project and session. If the file is missing, Hermes falls back to a built-in default identity. Everything that follows (the memory the agent writes, the skills it creates) happens through the lens of this identity.

The memory system: three tiers, three speeds

Hermes doesn't have a single "memory." It has three layers, each for a different purpose.

Tier 1 — Two tiny Markdown files. MEMORY.md (2,200 chars max) holds the agent's notes about your environment, conventions, tool quirks, and lessons learned. USER.md (1,375 chars max) holds your profile: name, communication preferences, skill level, things to avoid. Both are injected into the system prompt as a frozen snapshot when a session starts. When memory fills up (~80% capacity, shown in the system prompt header), the agent consolidates — merging related entries into denser versions so only useful information survives.
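
To make the Tier 1 budgets concrete, here is a small sketch of the capacity arithmetic. The limits and the roughly 80% threshold come from the text; the function names are hypothetical and the real consolidation logic is not shown.

```python
MEMORY_LIMIT = 2200    # MEMORY.md character cap, from the text
USER_LIMIT = 1375      # USER.md character cap, from the text
CONSOLIDATE_AT = 0.80  # approximate threshold, from the text


def usage(text: str, limit: int) -> float:
    """Fraction of the character budget currently in use."""
    return len(text) / limit


def needs_consolidation(text: str, limit: int) -> bool:
    """True once the file has reached the consolidation threshold."""
    return usage(text, limit) >= CONSOLIDATE_AT


memory_notes = "x" * 1800  # stand-in for the contents of MEMORY.md
user_notes = "x" * 1000    # stand-in for the contents of USER.md

print(needs_consolidation(memory_notes, MEMORY_LIMIT))
print(needs_consolidation(user_notes, USER_LIMIT))
```

In absolute terms the threshold falls at about 1,760 characters for MEMORY.md and 1,100 for USER.md, which is why entries have to be dense to earn a place.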

Tier 2 — Full-text session search. Every conversation (CLI and messaging) is stored in SQLite with full-text search. The agent can search weeks of past conversations. The tradeoff: Tier 1 is always in context but tiny; Tier 2 has unlimited capacity but requires an active search plus summarization. Critical facts live in memory; everything else is searchable on demand.
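
For a feel of what full-text session search looks like, here is a sketch using Python's built-in sqlite3 module with an FTS5 virtual table. The table name, columns, and sample rows are assumptions for illustration and do not reflect Hermes's real schema. It also assumes your SQLite build includes FTS5, which is the case for most standard Python distributions.

```python
import sqlite3

# In-memory database; the schema below is hypothetical, not Hermes's own.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE messages USING fts5(session_id, role, content)")
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?)",
    [
        ("s1", "user", "the staging deploy failed with a timeout"),
        ("s1", "assistant", "raising the gateway timeout fixed the deploy"),
        ("s2", "user", "draft a weekly digest of new papers"),
    ],
)


def search(query: str, limit: int = 5):
    """Return the best-matching rows; FTS5 treats space-separated terms as AND."""
    rows = conn.execute(
        "SELECT session_id, role, content FROM messages "
        "WHERE messages MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return rows.fetchall()


hits = search("deploy timeout")
for session_id, role, content in hits:
    print(session_id, role, content)
```

The search returns raw rows, which is why the text pairs it with a summarization step: the agent still has to condense the hits before they are useful in context.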

Tier 3 — External memory providers (8 plugins). For deeper persistent memory, Hermes ships with 8 pluggable providers that run alongside built-in memory (never replacing it). Only one can be active at a time. When active, Hermes prefetches relevant memories before each turn, syncs turns after each response, and extracts memories on session end.

Self-evolving skills

Memory handles facts. Skills handle procedures. Skills are Markdown files with YAML frontmatter that function as the agent's procedural memory — not what it knows, but how it does things.

---
name: k8s-pod-debug
description: >
  Activate for crashing pods, CrashLoopBackOff,
  "why is my pod restarting", container failures.
version: 1.2.0
author: agent
platforms: [linux, macos]
---

## Procedure
1. Get pod status → check events → pull logs
2. Look for OOMKilled, ImagePullBackOff, config errors

## Pitfalls
- Forgetting --previous flag on restarted containers

## Verification
- Pod stays Running with 0 restarts for 5+ minutes

To keep token costs low, skills use progressive disclosure: Level 0, the agent sees names + descriptions only (~3k tokens for the full catalog); Level 1, it loads the full skill content when it needs one; Level 2, it can drill into specific reference files.
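
The three disclosure levels can be sketched as three lookups of increasing cost. This is an illustrative model only: Skill and SkillCatalog are hypothetical names, and the reference file flags.md is invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    name: str
    description: str
    body: str
    references: dict[str, str] = field(default_factory=dict)


class SkillCatalog:
    """Hypothetical catalog illustrating the three disclosure levels."""

    def __init__(self, skills):
        self._skills = {s.name: s for s in skills}

    def index(self) -> str:
        """Level 0: one line per skill, names and descriptions only."""
        return "\n".join(f"{s.name}: {s.description}" for s in self._skills.values())

    def load(self, name: str) -> str:
        """Level 1: the full skill body, fetched only when needed."""
        return self._skills[name].body

    def reference(self, name: str, ref: str) -> str:
        """Level 2: a single reference file attached to a skill."""
        return self._skills[name].references[ref]


catalog = SkillCatalog([
    Skill(
        name="k8s-pod-debug",
        description="Activate for crashing pods and container failures.",
        body="## Procedure\n1. Get pod status, check events, pull logs",
        references={"flags.md": "--previous shows logs from the prior container"},
    )
])

print(catalog.index())                # always in the prompt
print(catalog.load("k8s-pod-debug"))  # loaded on demand
```

Only the output of index() is paid for on every turn; the body and reference files cost tokens only in the sessions that use them.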

The self-improvement loop is the core differentiator. The agent creates its own skills autonomously via the skill_manage tool. Creation triggers when it completes a complex task (5+ tool calls), hits errors and finds the working path, gets corrected, or discovers a non-trivial workflow. The loop: encounter a problem → solve it through trial and error → save the successful approach as a SKILL.md → next time, load the skill and follow the proven procedure instead of rediscovering it.

The Curator is garbage collection for skills. Without maintenance, agent-created skills pile up. It runs on an inactivity check: if 7 days have passed since the last run and the agent has been idle for 2+ hours, a background fork spins up with its own prompt cache, never touching the active conversation. It works in two passes. First come deterministic transitions (unused for 30 days → stale, unused for 90 days → archived). Then an LLM review (up to 8 iterations) decides per skill whether to keep, patch, consolidate, or archive. Two constraints: it never touches bundled or hub-installed skills, and it never auto-deletes, so the worst outcome is recoverable archival. You can pin critical skills with hermes curator pin <skill>.
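
The deterministic half of the Curator reduces to two small rules, sketched below. The thresholds come from the text; the function names, the source labels, and the assumption that pinned skills are skipped entirely are illustrative guesses, and the LLM review pass is not modeled.

```python
from datetime import datetime, timedelta

STALE_AFTER = timedelta(days=30)    # from the text
ARCHIVE_AFTER = timedelta(days=90)  # from the text
MIN_GAP = timedelta(days=7)         # time since the last Curator run
MIN_IDLE = timedelta(hours=2)       # time since the agent was last active


def should_run(now, last_run, last_activity) -> bool:
    """Both conditions must hold before the background fork starts."""
    return now - last_run >= MIN_GAP and now - last_activity >= MIN_IDLE


def transition(now, last_used, source="agent", pinned=False) -> str:
    """Return 'keep', 'stale', or 'archived'. Nothing is ever deleted."""
    # Assumption: pinned, bundled, and hub-installed skills are left alone.
    if pinned or source in {"bundled", "hub"}:
        return "keep"
    unused = now - last_used
    if unused >= ARCHIVE_AFTER:
        return "archived"
    if unused >= STALE_AFTER:
        return "stale"
    return "keep"


now = datetime(2026, 6, 26, 12, 0)
print(should_run(now, now - timedelta(days=8), now - timedelta(hours=3)))
print(transition(now, now - timedelta(days=45)))
print(transition(now, now - timedelta(days=120), source="hub"))
```

Keeping these rules deterministic means the cheap cases never reach the LLM review, which only has to reason about skills that are still in play.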

GEPA: evolving skills offline

The in-agent loop has a known weakness: the agent tends toward self-congratulation — it almost always thinks it performed well, even when it didn't. The same system that auto-generates skills can overwrite manual customizations with worse versions.

GEPA (Genetic-Pareto Prompt Evolution) addresses this. It's not built into the runtime: it lives in a companion repo (NousResearch/hermes-agent-self-evolution, MIT licensed) and runs as an offline optimization pipeline. The method comes from an ICLR 2026 Oral paper. Instead of asking the agent "did you do well?", GEPA reads execution traces to understand why things failed, then proposes targeted improvements through evolutionary search.

The pipeline: read the current skill → generate an evaluation dataset (synthetic test cases, real session history, or hand-curated golden sets) → run the optimizer (read traces, understand failures, generate candidate variants) → evaluate with LLM-as-judge rubric scoring → apply constraint gates (full test suite must pass 100%, skills stay under 15KB, caching preserved, purpose doesn't drift) → best variant goes out as a PR against the Hermes repo, never a direct commit. No GPU required — roughly $2–10 per run. Skip it initially, but it's highly effective when you hit a wall and don't want to spend on fine-tuning.
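
The constraint gates act as a filter applied before any score comparison. Here is a minimal sketch covering only the two gates that are easy to express numerically (test pass rate and size). Candidate, passes_gates, and pick_best are hypothetical names, the caching and purpose-drift gates are omitted, and treating 15KB as 15 × 1024 bytes is an assumption.

```python
from dataclasses import dataclass

MAX_SKILL_BYTES = 15 * 1024  # "under 15KB"; assuming 1 KB = 1024 bytes


@dataclass
class Candidate:
    text: str
    tests_passed: int
    tests_total: int
    judge_score: float  # rubric score from an LLM judge, higher is better


def passes_gates(c: Candidate) -> bool:
    """Hard gates: every test passes and the skill stays under the size cap."""
    all_tests = c.tests_total > 0 and c.tests_passed == c.tests_total
    small_enough = len(c.text.encode("utf-8")) < MAX_SKILL_BYTES
    return all_tests and small_enough


def pick_best(candidates):
    """Highest-scoring candidate among those that clear every gate, else None."""
    eligible = [c for c in candidates if passes_gates(c)]
    return max(eligible, key=lambda c: c.judge_score, default=None)


variants = [
    Candidate("short skill A", tests_passed=9, tests_total=10, judge_score=0.90),
    Candidate("short skill B", tests_passed=10, tests_total=10, judge_score=0.80),
    Candidate("x" * 20_000, tests_passed=10, tests_total=10, judge_score=0.95),
]
best = pick_best(variants)
print(best.text if best else "no variant cleared the gates")
```

Note that the winner here is not the highest-scoring variant. A candidate that fails a gate is dropped no matter how well the judge rates it, which is what keeps a persuasive but broken rewrite from reaching the PR stage.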

To summarize: SOUL.md sets the identity, the runtime loop captures experience, the Curator keeps the library clean, and GEPA makes sure what's in the library actually works.

Getting up and running

Hermes runs on Linux, macOS, or WSL2. The installer sets up Python 3.11+, and 8GB of RAM is enough for API-based usage.

curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
source ~/.bashrc   # or ~/.zshrc

Run the setup wizard (provider, API key, model, tools), then start chatting in the terminal:

hermes setup
hermes

Connect it to Telegram: get a bot token from @BotFather (run /newbot), then get your Telegram user ID from @userinfobot. Point Hermes at the bot and you have a working agent on your phone.

What lives in ~/.hermes/

~/.hermes/
├── config.yaml           # Main configuration
├── .env                  # API keys and secrets
├── auth.json             # OAuth provider credentials
├── SOUL.md               # Agent identity (slot #1 in system prompt)
├── memories/
│   ├── MEMORY.md         # Persistent agent facts
│   └── USER.md           # User model
├── skills/               # All skills (bundled, hub, agent-created)
├── sessions/             # Per-platform session metadata
├── state.db              # SQLite session store with FTS5
├── cron/
│   ├── jobs.json         # Scheduled jobs
│   └── output/           # Cron run outputs
├── plugins/              # Custom plugins
├── hooks/                # Lifecycle hooks
├── skins/                # CLI themes
└── logs/                 # agent.log, gateway.log, errors.log

config.yaml is the source of truth for everything non-secret (edit with hermes config edit). .env holds your secrets — Hermes routes secret-looking values here automatically. state.db is the SQLite database backing session search (WAL-mode safe, FTS5-indexed).

Adding new skills

Hermes maintains an official Skills Hub with 687 skills across 18 categories (87 built-in, 79 optional, 16 from Anthropic, 505 from LobeHub). You can also add any GitHub repo as a custom tap:

hermes skills tap add yourname/your-skills-repo
hermes skills install yourname/your-skills-repo/<skill-name>

Going from 1 to 10 agents

One agent is fine. Running multiple specialized agents is where Hermes gets interesting. It has a first-class feature for this called profiles: each profile is a fully isolated instance with its own config, memory, skills, sessions, and SOUL.md. They share nothing by default. We'll set up three.

hermes profile create designer --clone
hermes profile create programmer --clone
hermes profile create researcher --clone
hermes profile list

--clone copies your default profile's config and .env as a starting point.

Give each one its own Telegram bot. Telegram only allows one connection per token, so sharing breaks things. Run /newbot three times with BotFather, then run the gateway wizard once per profile:

hermes -p designer gateway setup
hermes -p programmer gateway setup
hermes -p researcher gateway setup

Give each one a personality via SOUL.md. This is where the agents become genuinely different. For example, the programmer at ~/.hermes/profiles/programmer/SOUL.md:

# Soul

You are my staff engineer. Terse, direct, pragmatic.

You read code before you write code. You write the smallest change
that solves the problem. You prefer standard library over dependencies,
boring tech over shiny tech, and explicit over clever.

Always check: does this already exist in the codebase? Are there
tests? What breaks if this fails? Run the tests before saying "done."

Customizing the programmer: route execution through Claude Code

The programmer is more interesting if it delegates execution to the Claude Code CLI. Hermes orchestrates; Claude Code does the file edits, runs commands, manages git. Start a session and send a single activation prompt:

I already have a Claude Max subscription. You are my staff engineer who
helps me with my day-to-day coding tasks, and under the hood you use
Claude Code for all the executions. Set yourself up accordingly.

The programmer installs the autonomous-ai-agents/claude-code skill on its own, verifies claude is on PATH, and starts using it for code execution. Make sure which claude prints a real binary path before activating.

Customizing the designer: teach it your visual style

Feed it reference designs, let it study them, then ask it to create a skill that generates new images in the same style:

Carefully study these reference illustrations. Note the color palette,
line weight, level of detail, composition, and overall aesthetic.

I want you to create a new skill called "my-design-style" that captures
this visual style. The skill should:

1. Document the style fingerprint in plain language (palette, line
   weights, composition rules, recurring motifs)
2. Include a Python script that takes a text description of a new
   illustration and generates the image using the Nano Banana model
   (google/gemini-2.5-flash-image) via the OpenRouter API in this style
3. Read OPENROUTER_API_KEY from the environment

Use skill_manage to create it. Test the generated script on a sample
prompt before saying it's done.

This is the self-improving loop being used as a setup mechanism — instead of writing a skill by hand, you show the agent good examples and ask it to encode the pattern itself.

Scheduling work: cron in plain English

The researcher's SOUL.md says it's responsible for a daily Telegram digest. Hermes ships with a built-in scheduler — the gateway daemon ticks every 60 seconds, runs due jobs in isolated sessions, and delivers output to whichever platform you specify. Jobs survive restarts. You don't write cron expressions; you describe what you want in English and Hermes converts it.

Every weekday at 8am India time, prepare a deep digest of what's new
in the AI and machine learning space over the last 24 hours. Cover
four streams in this order:

1. Trending GitHub repos (especially new AI/ML tooling)
2. Big tech and lab announcements (Anthropic, OpenAI, Google, Meta, xAI, Nous)
3. Fresh research papers worth reading
4. Social pulse from X, Reddit, and Hacker News

Lead with what changed since yesterday. Cite every claim with a URL.
Keep it under 800 words. Deliver to Telegram.

Set this up as a recurring cron job.

Verify with hermes -p researcher cron list. Other useful patterns:

  • One-shot delays. /cron add 30m "Remind me to check the build" runs once in 30 minutes.
  • Recurring intervals. /cron add "every 2h" "Check server status" runs every two hours.
  • Standard cron expressions. /cron add "0 9 * * 1-5" "..." for weekdays at 9am.
  • Skill attachment. /cron add "every 1h" "Summarize new feed items" --skill blogwatcher loads a skill before running.
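
The first two patterns above (one-shot delays and recurring intervals) boil down to simple time arithmetic. The sketch below shows one way a scheduler could interpret them. It is not Hermes's parser: the grammar is limited to minutes and hours, the function names are hypothetical, and dropping a one-shot job whose time has passed is a design assumption.

```python
import re
from datetime import datetime, timedelta

UNITS = {"m": "minutes", "h": "hours"}
PATTERN = re.compile(r"^(every\s+)?(\d+)([mh])$")


def parse_schedule(spec: str):
    """Return (interval, recurring) for specs like '30m' or 'every 2h'."""
    match = PATTERN.match(spec.strip())
    if match is None:
        raise ValueError(f"unsupported schedule: {spec!r}")
    every, amount, unit = match.groups()
    return timedelta(**{UNITS[unit]: int(amount)}), every is not None


def next_run(spec: str, created: datetime, now: datetime):
    """Next time a job is due, or None if a one-shot job has already passed."""
    interval, recurring = parse_schedule(spec)
    first = created + interval
    if now <= first:
        return first
    if not recurring:
        return None
    elapsed = (now - created) // interval  # whole intervals completed so far
    candidate = created + elapsed * interval
    return candidate if candidate >= now else candidate + interval


created = datetime(2026, 6, 26, 8, 0)
print(next_run("30m", created, datetime(2026, 6, 26, 8, 10)))
print(next_run("every 2h", created, datetime(2026, 6, 26, 11, 0)))
```

A daemon that ticks every 60 seconds only needs to compare each job's next run time against the clock, so a job fires at most about a minute late.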

You can also chain jobs — one cron's output becomes the next cron's input via a context_from flag, useful for multi-stage automations where a research step feeds a writing step.

That's a wrap

You now have the full picture: identity via SOUL.md, a three-tier memory system, self-evolving skills kept clean by the Curator, GEPA for offline optimization, and a team of three isolated specialist agents running on profiles — each with its own bot, personality, and scheduled work. The whole setup takes minutes and everything here is reproducible on your own hardware.

This flow was shared by a community member. The Hermes Bible is an unofficial, community-built resource and is not affiliated with Nous Research.
