Remote OpenClaw Blog
Long-Term Memory for AI Agents: Vector Databases and Embeddings in OpenClaw
8 min read
Every AI model has a context window — a maximum amount of text it can process in a single request. Claude's largest context window is 200K tokens (roughly 150,000 words). GPT-4o supports 128K tokens. That sounds like a lot, but it fills up fast when your agent has been running for weeks or months.
Without long-term memory, your OpenClaw agent is like someone with amnesia. It can have a brilliant conversation right now, but ask it about something from three weeks ago and it's blank. It can't remember that client who emailed about a project in February, the research it did on competitors last month, or the workflow preferences you've been refining over time.
Long-term memory solves this by storing information outside the context window and retrieving it when relevant. The agent doesn't need to remember everything — it just needs to find the right information when it needs it. This is where vector databases and embeddings come in.
For an overview of all memory options in OpenClaw, see the memory configuration guide.
An embedding is a list of numbers (a vector) that represents the meaning of a piece of text. When you embed the sentence "The client wants to reschedule the meeting to Friday," the resulting vector captures the semantic content — it's about scheduling, clients, meetings, and a specific day.
The key insight is that semantically similar texts produce similar vectors. "Can we move the call to next Friday?" and "The client wants to reschedule the meeting to Friday" are different strings but produce vectors that are close together in mathematical space. This enables semantic search — finding information based on meaning rather than exact keyword matching.
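To make "close together in mathematical space" concrete, here is a minimal sketch of cosine similarity, the standard closeness measure for embedding vectors. The three-dimensional toy vectors are invented for illustration; real embedding models produce vectors with hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" (real models use hundreds to thousands of dimensions).
reschedule = [0.9, 0.1, 0.2]   # "The client wants to reschedule the meeting to Friday"
move_call  = [0.8, 0.2, 0.3]   # "Can we move the call to next Friday?"
recipe     = [0.1, 0.9, 0.7]   # "Add two cups of flour to the bowl"

print(cosine_similarity(reschedule, move_call))  # high: similar meaning
print(cosine_similarity(reschedule, recipe))     # low: unrelated meaning
```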
In practical terms, here's what happens when your OpenClaw agent uses embeddings:

1. When a conversation or document is saved, the text is sent to an embedding model, which returns a vector.
2. The vector is stored in a vector database alongside the original text.
3. When the agent needs to recall something, the query is embedded the same way.
4. The database returns the stored texts whose vectors are closest to the query vector.
5. Those texts are added to the model's context for the current request.
This process is fast — vector similarity search on tens of thousands of documents takes milliseconds — and it's accurate enough to find relevant context even when the wording is completely different from the original.
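As a sketch of that lookup, assuming the store is just an in-memory list of (text, vector) pairs with the same toy-sized vectors (a real setup would use a vector database and a real embedding model):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def top_k(query_vec: list[float], store: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Return the k stored texts whose vectors are closest to the query vector."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

store = [
    ("Client asked to reschedule to Friday", [0.9, 0.1, 0.1]),
    ("Competitor research notes from March", [0.1, 0.9, 0.2]),
    ("Preferred report format: bullet points", [0.2, 0.2, 0.9]),
]

# A query about scheduling retrieves the scheduling memory, despite different wording.
print(top_k([0.8, 0.2, 0.2], store, k=1))
```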
RAG stands for Retrieval-Augmented Generation. It's the pattern of retrieving relevant information from an external source and adding it to the AI model's context before generating a response. The term was popularized by a 2020 paper from Facebook AI Research, but the concept is straightforward: give the model better context, get better answers.
In the OpenClaw context, RAG means:

- Conversations, documents, and notes are embedded and stored in a vector database as they happen.
- When a new request arrives, the agent retrieves the stored items most relevant to it.
- The retrieved items are injected into the model's context before it generates a response.
RAG is not the only approach to permanent memory in OpenClaw, but it's the most scalable one. File-based memory (storing facts in YAML or JSON files) works well for a few hundred items. Once you need to search across thousands of documents or conversations, vector-based RAG becomes necessary.
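The retrieval-then-generate flow reduces to a few lines once retrieval exists. A hedged sketch; the function name and prompt layout are illustrative, not OpenClaw's actual internals:

```python
def build_rag_prompt(question: str, retrieved: list[str]) -> str:
    """Prepend retrieved memories to the user's question before calling the model."""
    context = "\n".join(f"- {snippet}" for snippet in retrieved)
    return (
        "Relevant memories:\n"
        f"{context}\n\n"
        f"Question: {question}"
    )

# The retrieved snippet would come from the vector database in practice.
prompt = build_rag_prompt(
    "When did the client want to meet?",
    ["Client asked to reschedule the meeting to Friday"],
)
print(prompt)
```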
Several vector databases work with OpenClaw. Here's how the main options compare:
| Database | Self-Hosted | Managed Cloud | Best For | Cost |
|---|---|---|---|---|
| ChromaDB | Yes | No | Most OpenClaw operators | Free (open source) |
| Pinecone | No | Yes | Cloud-native, large scale | Free tier + paid plans |
| Weaviate | Yes | Yes | Advanced search features | Free (open source) or cloud |
| Qdrant | Yes | Yes | Performance-critical setups | Free (open source) or cloud |
| pgvector | Yes | Via Postgres hosts | Existing Postgres users | Free (extension) |
For most OpenClaw operators, ChromaDB is the right choice. It's free, self-hosted (your data stays on your server), runs as a lightweight Docker container, and handles collections of several hundred thousand documents without performance issues. Unless you have a specific reason to use something else, start with ChromaDB.
Setting up ChromaDB alongside OpenClaw requires adding a service to your Docker Compose file and configuring OpenClaw to use it as a memory backend.
Add the following service to your existing `docker-compose.yml`:
```yaml
services:
  chromadb:
    image: chromadb/chroma:latest
    container_name: openclaw-chromadb
    volumes:
      - chromadb_data:/chroma/chroma
    ports:
      - "8000:8000"
    restart: unless-stopped

volumes:
  chromadb_data:
```
In your OpenClaw configuration file (`config.yaml`), add the vector memory settings:
```yaml
memory:
  backend: chromadb
  chromadb:
    host: chromadb
    port: 8000
    collection: openclaw_memory
  embedding:
    provider: openai
    model: text-embedding-3-small
```
Run `docker compose up -d` to start both containers. OpenClaw will automatically create the collection and begin embedding new conversations. You can verify the connection by checking the OpenClaw logs for a "ChromaDB connected" message.
If you have existing conversation history you want to make searchable, use the built-in migration command:
```shell
docker exec openclaw npm run memory:backfill
```
This reads your existing conversation files and embeds them into ChromaDB. Depending on the volume of data and your embedding provider's rate limits, this can take anywhere from a few seconds to several minutes. For more on persistence methods, see our dedicated guide.
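Backfills like this typically split long conversations into overlapping chunks before embedding, because embedding models cap input length and smaller chunks retrieve more precisely. A minimal chunker sketch; the 1,000-character size and 100-character overlap are illustrative defaults, not OpenClaw settings:

```python
def chunk_text(text: str, size: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into overlapping chunks so context isn't lost at chunk boundaries."""
    if size <= overlap:
        raise ValueError("size must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap  # step forward, keeping `overlap` chars of context
    return chunks

# 2,500 characters with 1,000-char chunks and 100-char overlap yields 3 chunks.
chunks = chunk_text("x" * 2500, size=1000, overlap=100)
print(len(chunks))
```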
The embedding model determines how well your vector database understands the semantic meaning of stored text. Better embeddings mean more accurate retrieval, which means the agent gets more relevant context.
Common embedding models used with OpenClaw:

- text-embedding-3-small (OpenAI): the default hosted option, cheap and high quality.
- nomic-embed-text or mxbai-embed-large via Ollama: fully self-hosted alternatives with no API dependency.

For most operators, text-embedding-3-small is the right choice. The cost is negligible (a typical agent generating 100 conversations per day would cost less than $0.10/month in embedding API calls), and the quality is excellent.
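That cost claim checks out with back-of-the-envelope arithmetic. Assuming roughly 500 tokens per conversation (an illustrative figure) and OpenAI's quoted $0.02 per million tokens:

```python
conversations_per_day = 100
tokens_per_conversation = 500        # illustrative average, not a measured figure
price_per_million_tokens = 0.02      # text-embedding-3-small, USD

monthly_tokens = conversations_per_day * tokens_per_conversation * 30
monthly_cost = monthly_tokens / 1_000_000 * price_per_million_tokens
print(f"${monthly_cost:.2f}/month")  # → $0.03/month
```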
If you want to keep everything self-hosted with zero API dependencies, use Ollama with nomic-embed-text. The quality tradeoff is minor for personal use and small-team deployments.
Vector databases and RAG add complexity to your OpenClaw setup. They're worth it in specific scenarios — and overkill in others. Here's a practical guide:

Worth it when:

- The agent needs to recall details from hundreds or thousands of past conversations.
- You want to search large document collections by meaning rather than by keywords.
- The agent runs for months and accurate long-range recall matters.

Overkill when:

- Your agent works with a small, stable set of facts that fits in a YAML or JSON file.
- Conversations are short-lived and rarely referenced later.
- You're still prototyping and don't yet know what the agent needs to remember.
The memory configuration guide covers all memory options, including the simpler file-based approaches that work well for many operators.
After working with hundreds of OpenClaw operators, these are the most common mistakes with vector memory setups:

- Switching embedding models after data is stored. Vectors from different models aren't comparable, so retrieval silently degrades; re-embed the whole collection whenever you change models.
- Embedding entire conversations as single documents. Coarse chunks retrieve poorly; split long text into smaller pieces before embedding.
- Never pruning the collection. Stale and duplicate memories crowd relevant results out of the top matches.
What is a vector database, and why does OpenClaw need one?

A vector database stores information as numerical embeddings — mathematical representations of text that capture semantic meaning. OpenClaw needs one for long-term memory because the context window of any AI model is limited. A vector database lets the agent store thousands of past conversations and documents, then retrieve only the relevant ones when answering a new query. This is called RAG (Retrieval-Augmented Generation). See the memory configuration guide for the full picture.

Which vector database should I use with OpenClaw?

ChromaDB is the recommended choice for most OpenClaw operators. It runs locally as a Docker container alongside OpenClaw, requires no cloud account or API key, and handles collections up to several hundred thousand documents without performance issues. For larger-scale deployments or cloud-native setups, Pinecone or Weaviate are solid alternatives.

Do I need a vector database to give OpenClaw long-term memory?

No. OpenClaw's built-in memory system (file-based persistence using YAML or JSON) works fine for most personal and small business use cases. Vector databases become valuable when you need the agent to recall information from hundreds of past conversations, search through large document collections, or maintain accurate recall over months of operation. Check our permanent memory guide for simpler alternatives.

How much does running ChromaDB with OpenClaw cost?

ChromaDB itself is free and open source. Running it alongside OpenClaw adds approximately 512MB-1GB of RAM usage to your server. If you're already on a VPS with 8GB RAM, this fits comfortably. The only additional cost is the embedding API calls — using OpenAI's text-embedding-3-small model costs roughly $0.02 per million tokens, which translates to pennies per month for most operators. Browse the marketplace for memory-optimized skills and personas.