
Remote OpenClaw Blog

Long-Term Memory for AI Agents: Vector Databases and Embeddings in OpenClaw


Why Long-Term Memory Matters

Every AI model has a context window — a maximum amount of text it can process in a single request. Claude's largest context window is 200K tokens (roughly 150,000 words). GPT-4o supports 128K tokens. That sounds like a lot, but it fills up fast when your agent has been running for weeks or months.

Without long-term memory, your OpenClaw agent is like someone with amnesia. It can have a brilliant conversation right now, but ask it about something from three weeks ago and it's blank. It can't remember that client who emailed about a project in February, the research it did on competitors last month, or the workflow preferences you've been refining over time.

Long-term memory solves this by storing information outside the context window and retrieving it when relevant. The agent doesn't need to remember everything — it just needs to find the right information when it needs it. This is where vector databases and embeddings come in.

For an overview of all memory options in OpenClaw, see the memory configuration guide.


How Embeddings Work

An embedding is a list of numbers (a vector) that represents the meaning of a piece of text. When you embed the sentence "The client wants to reschedule the meeting to Friday," the resulting vector captures the semantic content — it's about scheduling, clients, meetings, and a specific day.

The key insight is that semantically similar texts produce similar vectors. "Can we move the call to next Friday?" and "The client wants to reschedule the meeting to Friday" are different strings but produce vectors that are close together in mathematical space. This enables semantic search — finding information based on meaning rather than exact keyword matching.
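"Close together in mathematical space" is usually measured with cosine similarity: the cosine of the angle between two vectors, where 1.0 means identical direction. A toy sketch with hand-picked 3-dimensional vectors (real embedding models output hundreds or thousands of dimensions, but the math is the same):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional vectors; real models produce 768+ dimensions.
reschedule = [0.9, 0.1, 0.2]   # "The client wants to reschedule the meeting"
move_call  = [0.8, 0.2, 0.3]   # "Can we move the call to next Friday?"
invoice    = [0.1, 0.9, 0.7]   # "Please pay invoice #1042"

print(cosine_similarity(reschedule, move_call))  # high: similar meaning
print(cosine_similarity(reschedule, invoice))    # low: unrelated
```

The two scheduling sentences score far higher against each other than against the unrelated invoice text, which is exactly the property semantic search relies on.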

In practical terms, here's what happens when your OpenClaw agent uses embeddings:

  1. Storage: Every conversation, document, or piece of context is converted to an embedding vector and stored in the vector database.
  2. Query: When the agent needs to recall something, the current query is also converted to an embedding.
  3. Retrieval: The vector database finds the stored embeddings most similar to the query embedding and returns the original text.
  4. Augmentation: The retrieved text is added to the agent's context window, giving it relevant background for the current task.

This process is fast — vector similarity search on tens of thousands of documents takes milliseconds — and it's accurate enough to find relevant context even when the wording is completely different from the original.
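The four steps above can be sketched as a minimal in-memory store. The `embed()` here is a crude bag-of-words stand-in for a real embedding model, so this illustrates the storage/query/retrieval loop rather than true semantic matching:

```python
import math
from collections import Counter

def embed(text):
    """Stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())

def similarity(a, b):
    """Cosine similarity over sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

store = []  # list of (vector, original_text) pairs

def remember(text):                # Step 1: storage
    store.append((embed(text), text))

def recall(query, top_k=1):        # Steps 2-3: embed query, rank by similarity
    q = embed(query)
    ranked = sorted(store, key=lambda item: similarity(q, item[0]), reverse=True)
    return [text for _, text in ranked[:top_k]]

remember("The client asked to reschedule the February project kickoff")
remember("Competitor research: pricing pages collected last month")

# Step 4: the retrieved text would be prepended to the agent's prompt.
print(recall("what did the client say about the kickoff meeting?"))
```

A real vector database does the same ranking with approximate nearest-neighbor indexes instead of a linear scan, which is what keeps search fast at scale.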


What Is RAG (Retrieval-Augmented Generation)?

RAG stands for Retrieval-Augmented Generation. It's the pattern of retrieving relevant information from an external source and adding it to the AI model's context before generating a response. The term was popularized by a 2020 paper from Facebook AI Research, but the concept is straightforward: give the model better context, get better answers.

In the OpenClaw context, RAG means the agent embeds each incoming query, searches the vector store for the most similar past conversations and documents, and injects the matches into its context before generating a response.

RAG is not the only approach to permanent memory in OpenClaw, but it's the most scalable one. File-based memory (storing facts in YAML or JSON files) works well for a few hundred items. Once you need to search across thousands of documents or conversations, vector-based RAG becomes necessary.
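For contrast, file-based memory is exact-key lookup. A minimal sketch (the file name and schema here are illustrative, not OpenClaw's actual on-disk format):

```python
import json
import os
import tempfile

# Illustrative schema, not OpenClaw's actual memory file format.
facts = {
    "client_acme": {"contact": "jane@acme.example", "prefers": "Friday calls"},
    "workflow": {"report_day": "Monday"},
}

path = os.path.join(tempfile.gettempdir(), "memory.json")
with open(path, "w") as f:
    json.dump(facts, f, indent=2)

# Lookup is exact-key only: there is no semantic search, which is why
# this approach stops scaling once you need "find everything about scheduling".
with open(path) as f:
    memory = json.load(f)
print(memory["client_acme"]["prefers"])  # Friday calls
```

The limitation is visible in the last line: you must already know the key. Vector retrieval removes that requirement.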


Choosing a Vector Database

Several vector databases work with OpenClaw. Here's how the main options compare:

| Database | Self-Hosted | Managed Cloud | Best For | Cost |
|----------|-------------|---------------|----------|------|
| ChromaDB | Yes | No | Most OpenClaw operators | Free (open source) |
| Pinecone | No | Yes | Cloud-native, large scale | Free tier + paid plans |
| Weaviate | Yes | Yes | Advanced search features | Free (open source) or cloud |
| Qdrant | Yes | Yes | Performance-critical setups | Free (open source) or cloud |
| pgvector | Yes | Via Postgres hosts | Existing Postgres users | Free (extension) |

For most OpenClaw operators, ChromaDB is the right choice. It's free, self-hosted (your data stays on your server), runs as a lightweight Docker container, and handles collections of several hundred thousand documents without performance issues. Unless you have a specific reason to use something else, start with ChromaDB.

ChromaDB Setup with OpenClaw

Setting up ChromaDB alongside OpenClaw requires adding a service to your Docker Compose file and configuring OpenClaw to use it as a memory backend.

Step 1: Add ChromaDB to Docker Compose

Add the following service to your existing docker-compose.yml:

  chromadb:
    image: chromadb/chroma:latest
    container_name: openclaw-chromadb
    volumes:
      - chromadb_data:/chroma/chroma
    ports:
      - "8000:8000"
    restart: unless-stopped

volumes:
  chromadb_data:

Step 2: Configure OpenClaw memory backend

In your OpenClaw configuration file (config.yaml), add the vector memory settings:

memory:
  backend: chromadb
  chromadb:
    host: chromadb
    port: 8000
    collection: openclaw_memory
  embedding:
    provider: openai
    model: text-embedding-3-small

Step 3: Restart and verify

Run docker compose up -d to start both containers. OpenClaw will automatically create the collection and begin embedding new conversations. You can verify the connection by checking the OpenClaw logs for a "ChromaDB connected" message.

Step 4: Backfill existing data (optional)

If you have existing conversation history you want to make searchable, use the built-in migration command:

docker exec openclaw npm run memory:backfill

This reads your existing conversation files and embeds them into ChromaDB. Depending on the volume of data and your embedding provider's rate limits, this can take anywhere from a few seconds to several minutes. For more on persistence methods, see our dedicated guide.


Choosing an Embedding Model

The embedding model determines how well your vector database understands the semantic meaning of stored text. Better embeddings mean more accurate retrieval, which means the agent gets more relevant context.

Common embedding models used with OpenClaw:

  • OpenAI text-embedding-3-small: Best balance of quality and cost. $0.02 per million tokens. Fast, accurate, and sufficient for most use cases.
  • OpenAI text-embedding-3-large: Higher dimensional embeddings for marginally better accuracy. $0.13 per million tokens. Worth it only if you have very large, diverse document collections.
  • Ollama local embeddings: Free, runs on your hardware, no API calls required. Quality is lower than OpenAI's models but acceptable for many use cases. Use nomic-embed-text or mxbai-embed-large via Ollama.
  • Cohere embed-v3: Competitive with OpenAI on quality. Good option if you're already using Cohere for other purposes.

For most operators, text-embedding-3-small is the right choice. The cost is negligible (a typical agent generating 100 conversations per day would cost less than $0.10/month in embedding API calls), and the quality is excellent.
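That estimate is easy to sanity-check. Assuming roughly 1,000 tokens per conversation (an illustrative figure), the monthly arithmetic looks like this:

```python
PRICE_PER_MILLION = 0.02          # USD per 1M tokens, text-embedding-3-small
conversations_per_day = 100
tokens_per_conversation = 1_000   # assumption: a typical short conversation

monthly_tokens = conversations_per_day * tokens_per_conversation * 30
monthly_cost = monthly_tokens / 1_000_000 * PRICE_PER_MILLION
print(f"${monthly_cost:.2f}/month")  # $0.06/month
```

Even at ten times that volume, embedding costs stay well under a dollar a month.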

If you want to keep everything self-hosted with zero API dependencies, use Ollama with nomic-embed-text. The quality tradeoff is minor for personal use and small-team deployments.


When Vector Memory Actually Helps

Vector databases and RAG add complexity to your OpenClaw setup. They're worth it in specific scenarios — and overkill in others. Here's a practical guide:

Use vector memory when:

  • Your agent has been running for months and you need it to recall information from early conversations.
  • You've loaded the agent with a large knowledge base (company docs, product catalogs, research papers) that exceeds the context window.
  • The agent handles customer-facing interactions where recalling past support tickets or preferences improves response quality.
  • You're running business workflows where the agent needs to reference historical data (past invoices, project timelines, meeting notes).

Skip vector memory when:

  • You're using OpenClaw for simple, stateless tasks (translations, text formatting, one-off research).
  • Your agent handles fewer than 50 conversations per week and you're comfortable with file-based memory.
  • You're still experimenting with OpenClaw and haven't settled on your use case yet. Add vector memory later when you know you need it.

The memory configuration guide covers all memory options, including the simpler file-based approaches that work well for many operators.


Common Mistakes

After working with hundreds of OpenClaw operators, these are the most common mistakes with vector memory setups:

  • Embedding everything. Not all data is worth embedding. System logs, error messages, and routine status updates add noise to the vector store without improving retrieval quality. Be selective about what gets stored.
  • Ignoring chunk size. Embedding entire conversations as single documents produces poor retrieval. Split content into meaningful chunks — individual messages, paragraphs, or logical sections. A chunk size of 500-1000 tokens works well for most conversational data.
  • Not setting relevance thresholds. Vector search always returns results, even when nothing is truly relevant. Set a minimum similarity threshold (typically 0.7-0.8) so the agent only gets context that's actually useful, rather than tangentially related noise.
  • Skipping backups. ChromaDB stores data on disk, but it's just another Docker volume. Include it in your backup strategy. Losing your vector store means losing your agent's long-term memory. See our guide on persistent memory methods for backup approaches.
  • Using the wrong embedding model for your language. OpenAI's embedding models work well for English and major European languages. If your agent operates primarily in another language, test retrieval quality and consider multilingual embedding models like Cohere's multilingual option.
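To make the chunk-size point concrete, here is a sketch of fixed-size chunking with overlap. Word counts stand in for tokens; a real pipeline would count with the model's tokenizer:

```python
def chunk_words(text, chunk_size=200, overlap=40):
    """Split text into overlapping word-window chunks.

    Overlap keeps sentences that straddle a chunk boundary retrievable
    from either side. Word counts approximate tokens here; a real
    pipeline would measure chunks with the embedding model's tokenizer.
    """
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks

doc = " ".join(f"word{i}" for i in range(500))
chunks = chunk_words(doc)
print(len(chunks))  # 3
```

Each chunk is embedded and stored separately, so a query matches the specific passage it relates to instead of an entire conversation.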

Frequently Asked Questions

What is a vector database and why does OpenClaw need one?

A vector database stores information as numerical embeddings — mathematical representations of text that capture semantic meaning. OpenClaw needs one for long-term memory because the context window of any AI model is limited. A vector database lets the agent store thousands of past conversations and documents, then retrieve only the relevant ones when answering a new query. This is called RAG (Retrieval-Augmented Generation). See the memory configuration guide for the full picture.

Which vector database should I use with OpenClaw?

ChromaDB is the recommended choice for most OpenClaw operators. It runs locally as a Docker container alongside OpenClaw, requires no cloud account or API key, and handles collections up to several hundred thousand documents without performance issues. For larger-scale deployments or cloud-native setups, Pinecone or Weaviate are solid alternatives.

Do I need a vector database for basic OpenClaw use?

No. OpenClaw's built-in memory system (file-based persistence using YAML or JSON) works fine for most personal and small business use cases. Vector databases become valuable when you need the agent to recall information from hundreds of past conversations, search through large document collections, or maintain accurate recall over months of operation. Check our permanent memory guide for simpler alternatives.

How much does it cost to run ChromaDB with OpenClaw?

ChromaDB itself is free and open source. Running it alongside OpenClaw adds approximately 512MB-1GB of RAM usage to your server. If you're already on a VPS with 8GB RAM, this fits comfortably. The only additional cost is the embedding API calls — using OpenAI's text-embedding-3-small model costs roughly $0.02 per million tokens, which translates to pennies per month for most operators.