Memory & Context · Advanced

Forget About Memory: Building a Context OS for Your Hermes Agent

Most AI memory is a sticky note. This flow breaks down an 11-layer context architecture for Hermes Agent — identity, facts, procedures, session archives, compression, and scheduled routines — and the distinctions that decide whether your agent actually remembers how you work.

Tony (@tonysimons_) on X · 5 min read · 17 Jun 2026

The core idea

Most AI "memory" is a sticky note. You paste a few facts into a system prompt — the model remembers that you prefer bullet points and that your cat is named Mittens — and call it solved. That works until you have more than about 20 things to remember, at which point your context window starts eating itself and the agent gets dumber than when you started.

The reframe in this flow is simple but consequential:

Memory isn't a feature. Memory is infrastructure.

If you treat it as a single toggle ("we remember your conversations"), you get a sticky note. If you treat it as a layered stack — identity, facts, procedures, archives, compression, scheduling, and expansion surfaces — you get something that grows with you. The difference is the difference between "I know what you like" and "I know how you work."

What follows is an autopsy of a real Hermes setup that grew, layer by layer, into something closer to a local context operating system.

How to audit your own memory

Before copying anyone's architecture, audit what you actually have. The trick is to refuse vague answers. A polite agent will summarize; a thorough one will show receipts. Push it with explicit constraints:

No guesses, no assumptions. Only local files, configs, databases, command output, code paths, and evidence. Do not generalize. Show me the files, the byte counts, what is active, what is dormant, and what is broken.

The first pass is usually too soft. Push again until you get a structured, per-surface breakdown rather than a friendly overview. The goal is an honest map of every memory surface — including the ones that are aspirational scaffolding you never finished wiring.
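To cross-check the byte counts an agent reports, a few lines of Python are enough. This is a minimal sketch that assumes everything lives under ~/.hermes, as in the setup described here. It lists files by size but cannot tell you what is active, dormant, or broken. That still takes reading configs and command output.

```python
from __future__ import annotations

import os
from pathlib import Path


def audit_tree(root: Path) -> list[tuple[str, int]]:
    """Return (relative path, size in bytes) for every file under root, largest first."""
    rows: list[tuple[str, int]] = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                size = path.stat().st_size
            except OSError:
                size = -1  # broken symlink or unreadable file: flag it, don't hide it
            rows.append((str(path.relative_to(root)), size))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


if __name__ == "__main__":
    for rel, size in audit_tree(Path.home() / ".hermes"):
        print(f"{size:>12,}  {rel}")
```

Compare this listing against what the agent claims. Any surface it describes that doesn't show up here is aspirational.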

The 11 layers

The memory architecture isn't one thing. In this setup it's at least eleven distinct layers, each with a specific job and a specific failure mode when used for the wrong purpose.

Layer 1 — SOUL.md: the identity file

Located at ~/.hermes/SOUL.md, this is the agent's operating identity: personality, role definition, delegation rules, quality standards, and tone. Roughly 15KB of markdown that says things like be direct, be opinionated, delegate aggressively, verify claims before trusting them, push back when I'm vague, and don't write like a LinkedIn influencer. Without it, Hermes still works but sounds like a generic corporate assistant. This is the one file in the stack you'd never delete.

Layer 2 — MEMORY.md and USER.md: always-on context

These live in ~/.hermes/memories/ and ride along in every turn.

  • MEMORY.md — the notebook. Environment facts, tool quirks, project conventions, and durable lessons (e.g. "Hermes cron expressions are interpreted in America/Chicago, not UTC — always verify with hermes_time.now()."). Capped at ~3,500 characters; older entries get compacted or evicted.
  • USER.md — the user profile. Pets, content strategy, preferred review surfaces, and execution preferences. Capped at ~2,500 characters.

The key design decision: these files are small on purpose. They are the warm cache, not the entire brain. Prompt real estate is expensive — if this layer gets too big, you're doing it wrong.
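The budget-then-evict behaviour can be sketched as follows. This sketch assumes entries are stored one bullet per line and that eviction drops the oldest entry first. The flow says entries are "compacted or evicted", so a real implementation may summarize instead of dropping. The function names are hypothetical.

```python
from __future__ import annotations

MEMORY_CAP = 3_500  # approximate MEMORY.md budget in this setup
USER_CAP = 2_500    # approximate USER.md budget


def render(entries: list[str]) -> str:
    """Serialize entries one bullet per line (an assumed file format)."""
    return "\n".join(f"- {e}" for e in entries)


def add_entry(entries: list[str], new: str, cap: int = MEMORY_CAP) -> list[str]:
    """Append `new`, then evict the oldest entries until the rendered file fits in `cap` chars."""
    if len(render([new])) > cap:
        raise ValueError("entry alone exceeds the budget; it belongs in a larger layer")
    kept = entries + [new]  # copy, so the caller's list is untouched
    while len(render(kept)) > cap:
        kept.pop(0)  # oldest first; a real policy might compact instead of dropping
    return kept
```

The `ValueError` branch captures the design point. Anything too big for the warm cache is a sign it belongs in a different layer.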

Layer 3 — Holographic memory (fact_store): structured facts

A SQLite-backed store at ~/.hermes/memory_store.db that holds discrete claims rather than paragraphs — with entity resolution, trust scoring, and compositional querying. Think "Tony prefers Codex over Claude" or "project hermes-vault uses MCP protocol" as small queryable atoms.
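Here is a rough illustration of facts as queryable atoms, using Python's built-in sqlite3. The schema (entities, aliases, and facts with a trust column defaulting to 0.5) is a hypothetical stand-in, not the actual layout of memory_store.db. Entity resolution here is nothing more than case-insensitive alias matching.

```python
from __future__ import annotations

import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS entities (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE IF NOT EXISTS aliases (
    alias TEXT PRIMARY KEY,                              -- lowercased surface form
    entity_id INTEGER NOT NULL REFERENCES entities(id)
);
CREATE TABLE IF NOT EXISTS facts (
    id INTEGER PRIMARY KEY,
    entity_id INTEGER NOT NULL REFERENCES entities(id),
    claim TEXT NOT NULL,
    trust REAL NOT NULL DEFAULT 0.5                      -- the untrained default
);
"""


def resolve(conn: sqlite3.Connection, surface: str) -> int:
    """Map a surface form ("Tony", "tony") to one entity id, creating the entity if unseen."""
    key = surface.strip().lower()
    row = conn.execute("SELECT entity_id FROM aliases WHERE alias = ?", (key,)).fetchone()
    if row:
        return row[0]
    entity_id = conn.execute("INSERT INTO entities (name) VALUES (?)", (surface.strip(),)).lastrowid
    conn.execute("INSERT INTO aliases (alias, entity_id) VALUES (?, ?)", (key, entity_id))
    return entity_id


def add_alias(conn: sqlite3.Connection, alias: str, entity: str) -> None:
    """Declare that another surface form refers to the same entity."""
    conn.execute(
        "INSERT OR IGNORE INTO aliases (alias, entity_id) VALUES (?, ?)",
        (alias.strip().lower(), resolve(conn, entity)),
    )


def add_fact(conn: sqlite3.Connection, entity: str, claim: str) -> None:
    conn.execute("INSERT INTO facts (entity_id, claim) VALUES (?, ?)", (resolve(conn, entity), claim))


def facts_about(conn: sqlite3.Connection, entity: str) -> list[tuple[str, float]]:
    """Look up claims through the alias table, most trusted first (read-only)."""
    return conn.execute(
        """SELECT f.claim, f.trust FROM facts f
           JOIN aliases a ON a.entity_id = f.entity_id
           WHERE a.alias = ? ORDER BY f.trust DESC, f.id""",
        (entity.strip().lower(),),
    ).fetchall()
```

Usage: `add_fact(conn, "Tony", "prefers Codex over Claude")` followed by `facts_about(conn, "TONY")` returns the claim with its trust score. Because every score starts at 0.5, the trust ordering carries no signal until something actually adjusts it.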

The "holographic" part refers to HRR-style (holographic reduced representation) compositional reasoning — querying across entities to find overlapping facts. Honest caveat: this layer is easy to leave degraded. If a dependency like NumPy is missing, the compositional path falls back to plain relational mode, and untrained trust scores all sit at a default of 0.5. The architecture is there; the optimization usually isn't.
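For readers unfamiliar with HRR, the core operations are binding by circular convolution and unbinding by circular correlation, both cheap to compute via the FFT. The sketch below implements the textbook technique in NumPy. It is not Hermes's code, and the role and entity names are made up. It also shows why the NumPy dependency matters: without the vector math, only the relational lookup remains.

```python
import numpy as np

DIM = 1024
rng = np.random.default_rng(0)


def random_vector(dim: int = DIM) -> np.ndarray:
    """Elements drawn from N(0, 1/dim), the usual HRR initialisation."""
    return rng.normal(0.0, 1.0 / np.sqrt(dim), dim)


def bind(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Circular convolution, computed as a product in the frequency domain."""
    return np.fft.irfft(np.fft.rfft(a) * np.fft.rfft(b), n=len(a))


def unbind(trace: np.ndarray, key: np.ndarray) -> np.ndarray:
    """Circular correlation with `key`: an approximate inverse of bind."""
    return np.fft.irfft(np.fft.rfft(trace) * np.conj(np.fft.rfft(key)), n=len(trace))


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


if __name__ == "__main__":
    vocab = {w: random_vector() for w in ["prefers", "uses", "Codex", "Claude", "pytest"]}
    # One superposed trace holding two role/filler pairs about the same entity.
    tony = bind(vocab["prefers"], vocab["Codex"]) + bind(vocab["uses"], vocab["pytest"])
    # "What does Tony prefer?": unbind the role, then clean up against the known vocabulary.
    guess = unbind(tony, vocab["prefers"])
    print(max(["Codex", "Claude", "pytest"], key=lambda w: cosine(guess, vocab[w])))
```

The decoded vector is noisy, so it is compared against known vectors and the closest match wins. That clean-up step is what makes compositional queries approximate rather than exact.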

Layer 4 — Session database and session_search: the archive

At ~/.hermes/state.db sits a database tracking every conversation — in this setup, 1,047 sessions and 48,422 messages across cron jobs, Telegram DMs, CLI, and TUI sessions. The raw receipts live in ~/.hermes/transcripts/ as JSONL files (~475 MB).

The database does not stuff this into the prompt. It's searchable via session_search — ask "what did we do about the Kiln promo pipeline three weeks ago?" and it retrieves the relevant sessions and summarizes. Storing 48,000 messages isn't the flex; knowing which parts are active, searchable, stale, or deliberately kept out of the prompt is.
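Here is a toy version of archive retrieval. It assumes a messages table with session_id and content columns; the real state.db schema may differ. Sessions are ranked by how many messages contain every query term. The summarization step that session_search performs afterwards would be a model call, so it is left out.

```python
from __future__ import annotations

import sqlite3

SCHEMA = "CREATE TABLE IF NOT EXISTS messages (session_id TEXT, role TEXT, content TEXT)"


def search_sessions(conn: sqlite3.Connection, query: str, limit: int = 3) -> list[tuple[str, int]]:
    """Rank sessions by how many of their messages contain every query term."""
    terms = query.lower().split()
    if not terms:
        return []
    # Note: % and _ inside a term act as LIKE wildcards; acceptable for a sketch.
    where = " AND ".join("lower(content) LIKE ?" for _ in terms)
    sql = (
        f"SELECT session_id, COUNT(*) AS hits FROM messages WHERE {where} "
        "GROUP BY session_id ORDER BY hits DESC, session_id LIMIT ?"
    )
    return conn.execute(sql, (*[f"%{t}%" for t in terms], limit)).fetchall()
```

Only the top few session ids come back. The prompt receives a summary of those sessions, never the 48,000 raw messages. A production version would more likely use a full-text index (for example SQLite FTS5) than LIKE scans.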

Layer 5 — LCM: context compression

Long Context Management (~/.hermes/lcm.db) compresses older turns into hierarchical summary nodes when a session runs long, preserving semantic content while reclaiming context-window space. It also externalizes large payloads (big tool outputs, long file reads) to keep the main context lean.

This is survival gear for the current session, not long-term memory. Confusing LCM for continuity across weeks is like confusing working memory for your notes app.
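The two LCM behaviours described above, folding old turns into summaries and externalizing large payloads, can be sketched like this. Everything here is hypothetical: the 4-characters-per-token estimate, the 2,000-character threshold, and the `summarize` callback, which would be a model call in practice. The sketch also simplifies the hierarchy into a rolling summary node that absorbs the oldest turn on each pass.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Turn:
    role: str
    text: str


def estimate_tokens(text: str) -> int:
    """Crude heuristic: about 4 characters per token. Swap in a real tokenizer if available."""
    return max(1, len(text) // 4)


def externalize(turn: Turn, store: dict[str, str], max_chars: int = 2_000) -> Turn:
    """Move an oversized payload out of the context, leaving a short handle in its place."""
    if len(turn.text) <= max_chars:
        return turn
    handle = f"payload-{len(store)}"
    store[handle] = turn.text
    return Turn(turn.role, f"[{len(turn.text)} chars stored as {handle}]")


def compress(
    turns: list[Turn],
    budget: int,
    summarize: Callable[[Turn, Turn], str],
    keep_recent: int = 4,
) -> list[Turn]:
    """Fold the two oldest entries into one summary node until the turns fit `budget` tokens.

    The newest `keep_recent` turns are never folded. Because the first entry may already be
    a summary, repeated passes produce a rolling summary rather than a full tree.
    """
    turns = list(turns)
    while (
        sum(estimate_tokens(t.text) for t in turns) > budget
        and len(turns) >= keep_recent + 2
    ):
        node = Turn("summary", summarize(turns[0], turns[1]))
        turns = [node] + turns[2:]
    return turns
```

Nothing here survives the session, which is the point of the distinction above: the store and the summary nodes reclaim space now, not continuity next week.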

Layer 6 — Skills: procedural memory

Skills turn "what I know about you" into "how I execute your workflows." Each is a markdown file with YAML frontmatter — a self-contained operating procedure for a specific task (publishing a Google Doc, running an X workflow, smart-home control). A mature setup can have 250+ installed.

The distinction that matters: "Tony uses pytest" is a fact (Layer 3). "Run pytest with these exact flags in this exact order" is a skill. Skills are what turn a chatty assistant into an operator.
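Reading a skill file mostly comes down to splitting the YAML frontmatter from the procedure body. This minimal parser handles only flat `key: value` pairs. The `name` and `description` fields in the usage example are assumptions about what a skill's frontmatter contains. Use a real YAML library such as PyYAML for anything nested.

```python
from __future__ import annotations


def parse_skill(text: str) -> tuple[dict[str, str], str]:
    """Split a skill file into (frontmatter, body); flat `key: value` frontmatter only."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter at all
    meta: dict[str, str] = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":  # closing fence: everything after it is the procedure
            body = "\n".join(lines[i + 1:]).lstrip("\n")
            return meta, body
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            meta[key.strip()] = value.strip().strip('"')
    return {}, text  # unterminated frontmatter: treat the whole file as body
```

Usage: `parse_skill("---\nname: run-tests\ndescription: Run pytest with fixed flags\n---\n1. Run pytest -x -q\n")` yields the metadata dict plus the body `"1. Run pytest -x -q"`. The frontmatter lets the agent decide when to load a skill; the body is the procedure it follows once loaded.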

Layer 7 — Project-local context files

When Hermes enters a project directory, it auto-loads context without polluting global memory:

  • AGENTS.md — project-level agent behavior rules
  • .hermes.md — Hermes-specific project configuration
  • CLAUDE.md / .cursorrules — broader agent conventions
  • SOUL.md — workspace-level identity overrides

This is the memory equivalent of walking into a workshop with your tools laid out exactly where you left them. No global memory required.
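Here is a sketch of scoped, project-local loading: check the project directory for the files listed above and return their contents separately from global memory. The priority order, the character budget, and looking only in the given directory (not parent directories) are assumptions, not documented Hermes behaviour.

```python
from __future__ import annotations

from pathlib import Path

# File names from the list above; this priority order is an assumption.
CONTEXT_FILES = ["AGENTS.md", ".hermes.md", "CLAUDE.md", ".cursorrules", "SOUL.md"]


def load_project_context(project_dir: Path, max_chars: int = 20_000) -> dict[str, str]:
    """Read whichever context files exist in project_dir, within a total character budget."""
    loaded: dict[str, str] = {}
    remaining = max_chars
    for name in CONTEXT_FILES:
        path = project_dir / name
        if remaining <= 0 or not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="replace")[:remaining]
        loaded[name] = text
        remaining -= len(text)
    return loaded  # kept apart from MEMORY.md, so nothing leaks into global memory
```

Returning a separate dict rather than appending to the global notebook is the key choice. When you leave the directory, the context goes with it.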

Layer 8 — Nexus: the second brain

A local knowledge base (~/nexus/, ~11 MB) of wikis, notes, journals, plans, and briefings. It is not auto-injected — 11 MB would annihilate any context window. Instead, workflows access it: a skill loads a wiki, a cron job pulls from the briefing folder, a research task queries raw notes. Nexus is the library; MEMORY.md is the notebook; session_search is the archive; skills are the SOPs. Different retrieval patterns for different purposes.

Layer 9 — Self-improving files: after-action learning

~/self-improving/ stores lessons from corrections, failures, and successful patterns in tiers:

  • memory.md — hot tier, always loaded, capped at 100 lines
  • projects/ and domains/ — warm tier, loaded on context match
  • archive/ — cold tier, decayed patterns

Honest caveat: these files are often write-only from the agent's perspective. The architecture supports automatic promotion and demotion between tiers plus scheduled cleanup, but that wiring is easy to leave undone. The system then depends on manual writes that don't always happen.
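The tiering rule (hot always, warm on match, cold never) fits in one function. This sketch assumes that a "context match" means a warm-tier file name equals one of the current context terms, which is a guess. Promotion and demotion between tiers, the part that tends to go un-wired, is not shown.

```python
from __future__ import annotations

from pathlib import Path

HOT_LINE_CAP = 100  # memory.md is capped at 100 lines


def load_lessons(root: Path, context_terms: set[str]) -> list[str]:
    """Hot tier always; warm tier only on a name match; cold tier (archive/) never."""
    sections: list[str] = []
    hot = root / "memory.md"
    if hot.is_file():
        lines = hot.read_text(encoding="utf-8").splitlines()[:HOT_LINE_CAP]
        sections.append("\n".join(lines))
    for tier in ("projects", "domains"):
        folder = root / tier
        if not folder.is_dir():
            continue
        for path in sorted(folder.glob("*.md")):
            if path.stem.lower() in context_terms:  # e.g. {"hermes-vault", "python"}
                sections.append(path.read_text(encoding="utf-8"))
    # archive/ is deliberately not read: decayed patterns stay out of the prompt.
    return sections
```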

Layer 10 — Cron jobs: scheduled context loops

Scheduled routines that create and consume context rather than store it. A daily planning job generates a structured brief; a Git hygiene job auto-commits dirty repos nightly; a content-radar job turns news into ideas. Each reads from memory (preferences, project state, Nexus) and writes back (new context, artifacts, session entries). Cron jobs are the circulatory system — without them, the brain sits in a jar.
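The read-then-write shape of a scheduled routine can be sketched as a plain function that a scheduler calls. The input file locations follow this flow. The function name, the output naming, and the template standing in for a model-written brief are hypothetical.

```python
from __future__ import annotations

import datetime as dt
from pathlib import Path


def daily_brief(hermes_home: Path, out_dir: Path, today: dt.date) -> Path:
    """Read context (preferences + notebook), then write a dated brief back as a new artifact."""
    def read(rel: str) -> str:
        path = hermes_home / rel
        return path.read_text(encoding="utf-8") if path.is_file() else "(missing)"

    prefs = read("memories/USER.md")
    notes = read("memories/MEMORY.md")
    # In a real job a model would write this; the template only shows the data flow.
    brief = (
        f"# Daily brief {today.isoformat()}\n\n"
        f"## Preferences in play\n{prefs.strip()}\n\n"
        f"## Environment and project notes\n{notes.strip()}\n"
    )
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / f"brief-{today.isoformat()}.md"
    out_path.write_text(brief, encoding="utf-8")
    return out_path
```

Whatever triggers it, whether system cron or Hermes's own scheduler, check which time zone the schedule is evaluated in. That is exactly the kind of lesson MEMORY.md exists to hold.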

Layer 11 — Hooks, plugins, and MCP: expansion surfaces

The architecture isn't sealed. Hooks fire on events (session start, tool call, output generation). Plugins inject new tools and memory surfaces. MCP servers expose external context — databases, APIs, knowledge bases — as queryable endpoints. These are expansion ports: to make Hermes remember a Notion workspace, you point an MCP server at it instead of rewriting the memory system.
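Here is the hook idea in miniature: a registry that maps event names to callbacks, so new behaviour attaches from outside the core. The event names and payload shape are hypothetical and are not Hermes's actual hook API.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Callable

Hook = Callable[[dict], None]
_hooks: dict[str, list[Hook]] = defaultdict(list)


def on(event: str):
    """Decorator: register a function to run whenever `event` fires."""
    def register(fn: Hook) -> Hook:
        _hooks[event].append(fn)
        return fn
    return register


def fire(event: str, payload: dict) -> int:
    """Run every hook for `event` in registration order; return how many ran."""
    for fn in _hooks[event]:
        fn(payload)
    return len(_hooks[event])


# Example extension: record every tool call without touching the memory core.
tool_log: list[str] = []


@on("tool_call")
def log_tool(payload: dict) -> None:
    tool_log.append(payload["tool"])
```

The core only calls `fire`. Everything that listens is added from outside, which is what makes these surfaces expansion ports rather than rewrites.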

The distinctions that actually matter

This is where most "AI memory" content falls apart. Memory isn't one feature with an on/off switch — it's a stack, and using the wrong layer for the wrong job is worse than no memory at all.

| Layer | What it is | What it is not |
|---|---|---|
| MEMORY.md | Warm cache: small, fast, always-on | The whole brain |
| session_search | Searchable archive (retrieval) | Recall / always-on memory |
| Skills | Procedures ("how to") | Facts ("what is") |
| Nexus | Reference surface, workflow-accessed | Auto-injected context |
| LCM | Context compression for the current session | Long-term memory |
| Cron | Scheduled routines that move context | Memory storage |

The recurring theme: more memory isn't automatically better. Stale facts, wrong preferences, and outdated procedures make an agent worse. Remembering everything is a terrible design. The actual superpower is knowing what to remember, where to put it, when to load it, and when to let it decay.
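One simple way to encode "when to let it decay" is exponential decay on time since last use, scaled by trust. The 30-day half-life and 0.1 cutoff are arbitrary illustration values, because the flow does not prescribe a decay policy. Items that fall below the cutoff would be candidates for an archive tier rather than deletion.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Item:
    text: str
    trust: float            # e.g. the 0.5 default from the fact store
    days_since_used: float


def weight(item: Item, half_life_days: float = 30.0) -> float:
    """Exponential decay: an item loses half its weight every `half_life_days` without use."""
    return item.trust * 0.5 ** (item.days_since_used / half_life_days)


def prune(items: list[Item], threshold: float = 0.1, half_life_days: float = 30.0):
    """Split items into (keep, let_decay) by decayed weight."""
    keep = [i for i in items if weight(i, half_life_days) >= threshold]
    let_decay = [i for i in items if weight(i, half_life_days) < threshold]
    return keep, let_decay
```

Using an item resets `days_since_used`, so facts the agent keeps reaching for stay warm, while unused ones fade without anyone deleting them by hand.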

What this actually gets you

With the stack in place, the day-to-day payoff is concrete:

  • You repeat yourself less. Environment, projects, workflows, and preferences are already known — no re-explaining that cron runs in Chicago time.
  • It searches old sessions instead of bloating the prompt. Past bugs and decisions are retrievable on demand.
  • It loads procedures through skills. "Publish this as a Google Doc in the articles folder" follows a documented pipeline instead of guessing.
  • It uses project rules inside a repo. Context switches between projects are automatic.
  • It runs scheduled routines. Daily planning, Git hygiene, and idea radar happen without prompting.
  • It builds continuity without one giant prompt blob. Each layer handles its slice; the agent navigates between them.

The honest caveats

No setup like this is perfect, and pretending otherwise is the tell of a demo rather than a daily driver:

  • Holographic trust scoring may be untrained — every fact sitting at the 0.5 default, with no signal about reliability.
  • HRR compositional reasoning degrades to a relational fallback when dependencies (e.g. NumPy) are missing.
  • Some self-improving files are manual-write only; heartbeat scaffolding can exist with no signal feeding it.
  • Large transcript archives are a receipts drawer, not an indexed library — most of it will never be queried.
  • The boundary between "what the agent knows" and "what it can find" stays fuzzy.

This is the difference between architecture and optimization. The architecture is solid. The optimization — training trust scores, installing missing deps, wiring the heartbeat, pruning stale data — is the boring work that's easy to defer.

Why it matters

The industry treats "persistent memory" like a solved checkbox. It isn't. Treat memory as a feature and you get a sticky note. Treat it as layered infrastructure and you get a system that grows with you: each layer handles its slice, the agent navigates between them, and context flows through the system instead of pooling in one giant prompt.

One is a sticky note. The other is a context operating system.

This flow was shared by a community member. The Hermes Bible is an unofficial, community-built resource and is not affiliated with Nous Research.
