Remote OpenClaw Blog
Hermes Persistent Memory: Architecture and Implementation Guide
Hermes persistent memory operates through three distinct layers: frozen system prompt memory (MEMORY.md and USER.md injected before every session), episodic skill memory (structured markdown records of past task outcomes), and session search via a SQLite FTS5 database that indexes every conversation the agent has ever had. These layers work on different timescales — seconds, hours, and weeks — and developers building on the Hermes framework need to understand how each one stores, compresses, and retrieves context to build effective agent applications.
The Three-Layer Memory Architecture
Hermes does not have a single memory system: it has three core layers, plus an optional external provider, each operating at a different timescale and serving a different purpose. Understanding the boundaries between these layers is essential for developers who want to build agents that genuinely remember.
| Layer | Storage | Timescale | Retrieval | Size Limit |
|---|---|---|---|---|
| System Prompt (MEMORY.md + USER.md) | Markdown files on disk | Persistent (survives restarts) | Always loaded — injected before first message | ~3,575 chars total (~1,300 tokens) |
| Episodic Skills | Markdown files in ~/.hermes/skills/ | Persistent (survives restarts) | On-demand — loaded by semantic similarity match | Unbounded (one file per skill) |
| Session Search | SQLite FTS5 (~/.hermes/state.db) | Persistent (survives restarts) | On-demand — agent invokes session_search tool | Unbounded (grows with usage) |
| External Provider (optional) | Varies by provider | Persistent | On-demand — provider-specific retrieval | Varies by provider |
The critical distinction: Layer 1 is always present in every conversation. Layers 2 and 3 (plus the optional external provider) are demand-loaded: the agent must actively decide to search them. This means the agent's behavior depends heavily on what fits in the bounded Layer 1 files.
Layer 1: System Prompt Memory (Frozen Snapshot)
MEMORY.md and USER.md are loaded into the system prompt before the conversation begins. The agent does not need to search for this information — it already knows it, the same way it knows its own instructions.
The architecture imposes hard character limits: approximately 2,200 characters for MEMORY.md and 1,375 characters for USER.md. These limits exist to prevent memory from consuming too much of the model's context window. At approximately 1,300 tokens combined, both files take up less than 1% of a 200K-token context window.
The agent manages these files through curated summarization. After conversations where the agent learns something new, it rewrites the memory files to incorporate the new information. When files approach their limits, the agent compresses older entries — merging related facts, dropping superseded information, and prioritizing the most operationally useful knowledge. This is not simple truncation; the agent uses its reasoning capabilities to decide what to keep.
The write process is atomic: the agent generates a complete new version of the file and writes it in one operation, avoiding partial writes or corruption. Memory writes happen between turns, not during the middle of a response.
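That write pattern (generate the full file, then swap it into place in one step) can be sketched with the standard temp-file-plus-rename idiom. The helper name and the limit check are illustrative, not the Hermes implementation:

```python
import os
import tempfile

MEMORY_LIMIT = 2200  # approximate MEMORY.md character limit described above


def write_memory_atomic(path: str, content: str, limit: int = MEMORY_LIMIT) -> None:
    """Write a complete new memory file in one atomic operation."""
    if len(content) > limit:
        raise ValueError(f"memory content is {len(content)} chars, limit is {limit}")
    # Write to a temp file in the same directory, then rename over the target.
    # os.replace is atomic on POSIX, so readers never observe a partial file.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".memory-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(content)
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise
```

Writing to a temp file in the same directory matters: rename is only atomic within a single filesystem.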
For developers extending Hermes, the key implementation detail is that these files are read once at session initialization. If another process modifies MEMORY.md mid-session, the agent will not see the change until the next session starts. The file path is controlled by the HERMES_HOME environment variable, which defaults to ~/.hermes.
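A sketch of that read-once initialization, assuming only the defaults described above (the function name and the section headers it emits are illustrative, not Hermes API):

```python
import os


def load_system_memory() -> str:
    """Read MEMORY.md and USER.md once at session start.

    Paths resolve under HERMES_HOME (default ~/.hermes). The returned text
    is injected into the system prompt; later edits to the files are not
    seen until the next session.
    """
    home = os.environ.get("HERMES_HOME", os.path.expanduser("~/.hermes"))
    sections = []
    for name in ("MEMORY.md", "USER.md"):
        path = os.path.join(home, name)
        if os.path.exists(path):
            with open(path, encoding="utf-8") as fh:
                sections.append(f"## {name}\n{fh.read()}")
    return "\n\n".join(sections)
```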
Layer 2: Episodic Skill Memory (On-Demand Retrieval)
After completing complex tasks, Hermes writes structured records of what it tried, what succeeded, and what failed into markdown files stored in ~/.hermes/skills/. These skill records are the agent's episodic memory — its accumulated experience from past work.
On new tasks, the agent embeds the current task description and runs a semantic similarity search against its skill library. High-similarity matches are injected into the planning prompt as context, which reads something like: "Last time you tried approach X on this type of task, step 3 failed because Y. Consider Z instead."
This creates a feedback loop: the agent gets better at tasks it has attempted before, even if the specific details differ. A skill learned while debugging a Python import error can inform how it approaches a Node.js module resolution failure, because the structural pattern — "dependency not found, check paths and versions" — transfers across languages.
Skill files are not bounded in the same way as MEMORY.md. Each skill is a separate markdown document, and the library grows over time. The retrieval mechanism prevents the library from overwhelming context — only the most relevant skills are loaded, and they are truncated if they exceed a context budget.
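The retrieval step reduces to a top-k cosine search over skill embeddings with a simple character budget. A minimal sketch, in which the embedding vectors, the budget value, and the function names are all placeholders rather than Hermes defaults:

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve_skills(task_vec, skills, top_k=3, char_budget=4000):
    """Rank skill records by similarity and truncate to a context budget.

    `skills` is a list of (embedding, markdown_text) pairs; only the
    closest matches are injected into the planning prompt.
    """
    ranked = sorted(skills, key=lambda s: cosine(task_vec, s[0]), reverse=True)
    selected, used = [], 0
    for _, text in ranked[:top_k]:
        take = text[: char_budget - used]  # truncate if over budget
        if not take:
            break
        selected.append(take)
        used += len(take)
    return selected
```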
For a walkthrough of how skills work in practice, see our Hermes Agent Skills Guide.
Layer 3: Session Search (FTS5 + Summarization Pipeline)
The session search system stores every conversation in a SQLite database at ~/.hermes/state.db. The database uses WAL (Write-Ahead Logging) mode for concurrent read access across Hermes's multi-platform architecture (CLI, Telegram, web).
The database schema consists of four tables:
- sessions — Session metadata including start time, platform, token counts, and billing data.
- messages — Full message history with role (user/assistant/tool), content, timestamp, and session foreign key.
- messages_fts — An FTS5 virtual table that indexes message content for full-text search.
- schema_version — Single-row migration tracking table.
When the agent invokes the session_search tool, the retrieval pipeline executes in five steps:
- FTS5 searches matching messages ranked by relevance.
- Results are grouped by session.
- The top N unique sessions (default 3) are loaded.
- Each session's conversation is truncated to approximately 100K characters centered on the match positions.
- A fast summarization model generates focused summaries with metadata and surrounding context.
The summarization step is architecturally important. Rather than dumping raw conversation history into the context window, the system uses a lightweight model to extract only the relevant portions. This keeps the context budget manageable even when the session database contains months of conversations.
After each turn, the agent writes the conversation, tool calls, and results into the database. The FTS5 index updates automatically. There is no separate indexing step or batch process.
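One standard way SQLite delivers that "no separate indexing step" behavior is an external-content FTS5 table kept current by triggers. The sketch below follows the documented FTS5 pattern; whether Hermes uses exactly this mechanism is an assumption:

```python
import sqlite3

SYNC_SCHEMA = """
CREATE TABLE IF NOT EXISTS messages(
    id INTEGER PRIMARY KEY, role TEXT, content TEXT);
-- External-content FTS table: stores only the index, reads text from messages.
CREATE VIRTUAL TABLE IF NOT EXISTS messages_fts
    USING fts5(content, content='messages', content_rowid='id');
-- Trigger keeps the index current as each turn is written.
CREATE TRIGGER IF NOT EXISTS messages_ai AFTER INSERT ON messages BEGIN
    INSERT INTO messages_fts(rowid, content) VALUES (new.id, new.content);
END;
"""

db = sqlite3.connect(":memory:")
db.executescript(SYNC_SCHEMA)
# Writing a turn updates the FTS index automatically via the trigger.
db.execute("INSERT INTO messages(role, content) VALUES ('assistant', 'fixed the import error')")
hits = db.execute(
    "SELECT rowid FROM messages_fts WHERE messages_fts MATCH 'import'").fetchall()
```

A production schema would also add UPDATE and DELETE triggers so edits and deletions stay in sync with the index.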
Context Compression and Window Management
Context compression is the mechanism that prevents long conversations from exceeding the model's context window. When the active conversation grows too large, Hermes triggers LLM-powered summarization that compresses earlier messages while preserving the most relevant information.
The compression process is not simple truncation. The agent sends the conversation history to a summarization model, which produces a condensed version that retains key decisions, facts, and context. The compressed summary replaces the original messages in the active context, freeing tokens for new conversation turns.
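The triggering logic amounts to a budget check followed by a summarize-and-replace. A minimal sketch in which the token heuristic, the threshold, and the `summarize` callable all stand in for the real model call:

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: roughly 4 characters per token.
    return len(text) // 4


def compress_if_needed(messages, budget_tokens, summarize, keep_recent=4):
    """Replace older messages with one summary message when over budget.

    `messages` is a list of {"role", "content"} dicts; `summarize` is a
    callable standing in for the LLM summarization call.
    """
    total = sum(estimate_tokens(m["content"]) for m in messages)
    if total <= budget_tokens or len(messages) <= keep_recent:
        return messages
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = summarize("\n".join(m["content"] for m in old))
    return [{"role": "system",
             "content": f"Summary of earlier turns: {summary}"}] + recent
```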
ByteRover's memory provider hooks into this process with a pre-compression extraction hook that fires specifically before Hermes compresses context. This ensures that in-flight facts — information mentioned in the conversation but not yet saved to persistent memory — are captured before they are compressed away.
For developers implementing custom memory providers, the compression event is the critical integration point. If your provider needs to extract information from the conversation, it must do so before compression discards the raw messages.
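That integration point can be modeled as a callback fired with the raw messages before compression runs. The hook name, class shapes, and the placeholder `compress` below are an illustration of the idea, not the actual Hermes extension API:

```python
from typing import Dict, List

Message = Dict[str, str]


class MemoryProvider:
    """Base class for providers that extract facts before compression."""

    def on_pre_compression(self, messages: List[Message]) -> None:
        raise NotImplementedError


class FactExtractor(MemoryProvider):
    def __init__(self):
        self.saved: List[str] = []

    def on_pre_compression(self, messages: List[Message]) -> None:
        # Capture in-flight facts from raw messages before they are summarized away.
        for m in messages:
            if m["role"] == "user" and "remember" in m["content"].lower():
                self.saved.append(m["content"])


def compress(messages: List[Message], hooks: List[MemoryProvider]) -> List[Message]:
    for hook in hooks:
        hook.on_pre_compression(messages)  # fire before raw messages are discarded
    # ... LLM summarization would happen here ...
    return messages[-2:]  # placeholder for the compressed view
```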
Implementation Patterns for Developers
Developers building on the Hermes framework should consider these architectural patterns when implementing memory in their applications.
Pattern 1: Memory-First Session Design
Design your agent's system prompt to reference MEMORY.md and USER.md explicitly. The agent should know that it has persistent memory and should check it before asking the user to repeat information. Include instructions like "Check MEMORY.md for project context before asking the user" in your agent's prompt template.
Pattern 2: Skill-Driven Task Improvement
After complex multi-step tasks, trigger skill creation explicitly. The default behavior creates skills from sufficiently complex tasks, but you can lower the complexity threshold in configuration to capture more experiences. Over time, this builds a library of past approaches that the agent retrieves automatically on similar future tasks.
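Hermes's actual configuration keys are not documented in this article, so treat the fragment below as a purely hypothetical illustration of the idea; every key name and value is invented:

```toml
# Hypothetical ~/.hermes/config.toml fragment. Key names are invented for
# illustration; check the Hermes docs for the real option names.
[skills]
# Lower the threshold so more moderately complex tasks produce skill records.
complexity_threshold = 0.4
# Cap how many retrieved skills are injected into the planning prompt.
max_retrieved = 3
```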
Pattern 3: Session Search as Fallback
Use session search as a fallback for information that did not make it into MEMORY.md or skills. The bounded nature of MEMORY.md means some facts will be compressed away over time. Session search provides a safety net — the raw conversations are always available in SQLite, even if the summarized memory has moved on.
Pattern 4: External Provider for Domain Knowledge
When your agent needs domain-specific long-term memory (customer interactions, project histories, technical documentation), add an external provider. Honcho works well for user-facing agents where understanding the user over time is the goal. Holographic works well for technical agents where precise fact recall from an offline store matters. For a complete comparison of all provider options, see Hermes Agent Persistent Memory Methods.
Limitations and Tradeoffs
The three-layer architecture involves real tradeoffs that affect how developers should design agent applications.
Layer 1 is bounded, which means lossy. The agent must decide what to keep and what to compress. Over weeks of heavy use, early information will be summarized down or dropped entirely. There is no mechanism to mark specific memories as "never delete."
Layer 2 retrieval is only as good as the embedding model. If the semantic similarity between a new task and a past skill is low — even when the underlying pattern is transferable — the skill will not be retrieved. The agent cannot do creative analogical reasoning across skills; it relies on vector proximity.
Layer 3 is keyword-based, not semantic. FTS5 finds text matches, not conceptual matches. If you discussed "database optimization" but search for "query performance," FTS5 may not find the relevant session. Adding an external provider with semantic search closes this gap but adds complexity.
No cross-layer consistency guarantees. MEMORY.md may contain a summarized fact that contradicts the raw conversation in the session database, because the summary was written by the agent's best interpretation at the time. There is no referential integrity between layers.
When not to use the full stack: simple chatbot applications with short, stateless interactions do not benefit from this architecture. The overhead of memory curation, skill creation, and session indexing adds latency and cost that is only justified when persistent recall genuinely improves the user experience.
Related Guides
- Hermes Agent Persistent Memory: Methods That Survive Restarts
- Hermes Agent Memory System Explained
- What Is Hermes Agent?
- OpenClaw Persistent Memory Methods
FAQ
How does Hermes persistent memory differ from OpenClaw MEMORY.md?
Both use markdown files for persistent memory, but Hermes adds bounded curation (the agent decides what to keep within character limits), a separate USER.md file, episodic skill memory from past task outcomes, and FTS5 session search across all conversations. OpenClaw uses a single unbounded MEMORY.md that the agent and user both edit directly.
What happens when Hermes MEMORY.md reaches its character limit?
The agent compresses older entries using LLM-powered summarization. It merges related facts, drops superseded information, and prioritizes operationally useful knowledge. This is lossy — early information may be condensed or removed entirely. There is no way to mark specific memories as permanent.
Can developers access the SQLite session database directly?
Yes. The database at ~/.hermes/state.db is a standard SQLite file that can be queried with any SQLite client. The schema includes sessions, messages, messages_fts, and schema_version tables. Developers can build custom analytics, export conversation history, or integrate with external tools by querying this database directly.
How does Hermes episodic skill memory improve over time?
After complex tasks, Hermes writes structured records of what it tried, what worked, and what failed. On future tasks with similar characteristics, it retrieves these records by semantic similarity and uses them to adjust its approach before execution. The skill library grows with each complex task, creating a feedback loop where the agent performs better on problems it has encountered before.
Does context compression lose important information?
Context compression is inherently lossy. The summarization model attempts to preserve key decisions, facts, and context, but nuances and details from earlier in the conversation may be dropped. External memory providers with pre-compression hooks (like ByteRover) can capture in-flight facts before compression discards the raw messages.