OpenClaw is genuinely powerful. It's also genuinely expensive if you're not managing it carefully. People routinely exceed $200 in token costs in their first week of tinkering — not because they're doing anything extravagant, but because they don't yet understand where the tokens are actually going.
This guide covers everything: how OpenClaw's token consumption works, the specific settings that matter most, smart model routing, heartbeat management, local models, and an underrated technique involving n8n that can reduce recurring task costs to near zero.
First: How OpenClaw Actually Spends Your Tokens
Most people assume they're paying for the back-and-forth of a conversation. In reality, every single message you send triggers a much larger API call.
What gets sent with every request:
- Core OpenClaw system instructions (you can't change these)
- Your agents.md file (general agent behaviour instructions)
- Your soul.md file (agent personality)
- Your memory.md file (accumulated memories from all previous sessions)
- The entire conversation history from the current session
So a simple message like "What should I have for lunch?" isn't just a short question. It's that question, plus several kilobytes of config files, plus every previous message in your current session — all sent to the API simultaneously.
Why session length matters so much: If you've been in the same Telegram thread for a week without resetting, every new message drags all seven days of conversation history into the API call. What started as a 10-cent message is now approaching 20-30 cents because of accumulated context.
Autonomous tasks compound this further. When you ask your agent to "restart OpenClaw to the latest version," that's not one API call. It's five: pull updated info, restart the server, run a health check, action any issues found, report back to you. Five cycles, each one carrying the full context load.
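The compounding is easy to quantify. Here is a rough sketch of the arithmetic; the config sizes, history length, and pricing below are placeholder assumptions, not measured OpenClaw numbers:

```python
# Hypothetical numbers to illustrate why multi-step tasks compound cost:
# each of the five cycles re-sends the config files plus the growing history.
CONFIG_TOKENS = 6_000       # agents.md + soul.md + memory.md (assumed)
HISTORY_TOKENS = 40_000     # existing session history (assumed)
STEP_OUTPUT_TOKENS = 1_500  # tokens each cycle adds to the history (assumed)
PRICE_PER_M_INPUT = 15.0    # $/1M input tokens, roughly top-tier pricing

def task_input_tokens(cycles: int) -> int:
    """Total input tokens sent over a multi-cycle autonomous task."""
    total = 0
    history = HISTORY_TOKENS
    for _ in range(cycles):
        total += CONFIG_TOKENS + history  # full context goes out every cycle
        history += STEP_OUTPUT_TOKENS     # each cycle's output joins the context
    return total

tokens = task_input_tokens(5)
print(tokens, f"${tokens * PRICE_PER_M_INPUT / 1_000_000:.2f}")
```

With these assumed numbers, a five-cycle task sends roughly a quarter-million input tokens, even though the user typed a single sentence.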
Once you understand this, every optimization strategy makes immediate sense.
Part 1: Context Window Management
Check Your Session Status Regularly
Use /status to see exactly how many tokens your current session is consuming. If you've never reset a session, you may be shocked by the number.
Three Commands You Should Know
/compact — Summarises your conversation history into a compressed version without wiping it entirely. Use this when you're mid-session on a complex task and don't want to lose context, but the session has grown too large. A session using 800,000 tokens can often be compacted down to 100,000.
/new (or /reset) — Wipes the current session entirely and starts fresh. The most token-efficient option when you're done with a topic. Before running this on a session where you've done significant work, ask your agent to write a temporary summary file first — then you can hand that file to the new session as a starting point.
/model — Switch models mid-conversation without starting a new session. Useful for doing expensive setup work with a high-capability model, then switching to a cheaper model for ongoing use within the same context.
Keep Your Config Files Lean
Your agents.md, soul.md, and memory.md files get sent with every single request. Bloat in these files directly increases your costs.
Take a snapshot of your file sizes today. Check again in a week. If any file has grown significantly, open it up and review for:
- Repeated entries about the same topic (agents sometimes log the same thing twice across sessions)
- Outdated information that no longer applies
- Unnecessary detail that could be summarised more concisely
Paste bloated files into Claude or ChatGPT and ask for a condensed version that preserves all meaningful information.
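A quick way to take that snapshot is a few lines of Python; the workspace path below is an assumption, so point it at wherever your files actually live:

```python
import os

# Hypothetical path -- adjust to wherever your OpenClaw workspace lives.
WORKSPACE = os.path.expanduser("~/openclaw")
CONFIG_FILES = ["agents.md", "soul.md", "memory.md"]

def snapshot_sizes(workspace: str = WORKSPACE) -> dict:
    """Return each config file's size in bytes (0 if missing)."""
    sizes = {}
    for name in CONFIG_FILES:
        path = os.path.join(workspace, name)
        sizes[name] = os.path.getsize(path) if os.path.exists(path) else 0
    return sizes

if __name__ == "__main__":
    for name, size in snapshot_sizes().items():
        # ~4 characters per token is a rough rule of thumb for English text
        print(f"{name}: {size:,} bytes (~{size // 4:,} tokens)")
```

Run it today, save the output, and run it again next week; any file that has doubled is a candidate for the condensing pass described above.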
Add These Instructions to Your agents.md
A few lines in your agent config that pay off in token savings every day:
- "Respond in 1–2 paragraphs. Be concise. I'll ask for more detail if I need it." — Long AI responses feel thorough, but most of the time you don't need eight paragraphs. Shorter responses mean smaller output tokens and less context in the next turn.
- "Don't narrate what you're about to do. Just do it." — "Let me check that for you!" followed by the actual check is two outputs where one would do. Cutting narration saves a cycle per action.
- "Spin off sub-agents for large tasks rather than running them in the main session." — More on this below.
Part 2: Smart Model Routing
The biggest single lever for cost reduction is using the right model for each task.
The mistake most people make: Using a top-tier model like Opus for absolutely everything, including tasks that don't need it.
A practical routing framework:
| Task Type | Recommended Model |
|-----------|-------------------|
| Complex reasoning, architecture decisions, difficult code | Opus or equivalent |
| Standard coding tasks | GPT-4.1 or Sonnet |
| Research, summarisation, content drafting | Sonnet or Gemini Flash |
| Heartbeats and background checks | Lightweight model or local |
| Simple reminders and status updates | Cheapest available or local |
Kimi K2.5 (available via OpenRouter) has become a popular choice for primary model duty — near Opus-level capability at a fraction of the cost. Worth testing for your typical use patterns.
OpenRouter is the practical solution here. One API key, one endpoint, access to 600+ models across all major providers. You can assign different models to different agents and different task types without managing multiple accounts or API keys.
To set up model routing: open your openclaw.json and assign models per agent, or simply ask your agent to update its own config based on the routing rules you describe.
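As a sketch, a per-agent routing block might look like the following. The exact openclaw.json schema depends on your version, and the key names here are illustrative; the model IDs follow OpenRouter's provider/model naming convention:

```json
{
  "agents": {
    "main": { "model": "moonshotai/kimi-k2" },
    "developer": { "model": "anthropic/claude-3.5-sonnet" },
    "researcher": { "model": "google/gemini-flash-1.5" }
  }
}
```

If editing JSON by hand feels risky, the describe-the-rules-to-your-agent approach works just as well and avoids syntax mistakes.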
Part 3: Heartbeat Optimisation
Heartbeats are one of the most overlooked sources of runaway costs.
By default, heartbeats wake your agent every 30 minutes to check if there's anything it should be doing. If your primary model is Opus, that's roughly 48 wake-ups per day, each one sending your full context load. The math gets ugly fast — potentially $50/month just from the agent sitting idle.
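The back-of-envelope arithmetic behind that estimate looks like this; the per-wake-up context size and pricing are assumptions, so substitute your own /status figures:

```python
# Back-of-envelope arithmetic for idle heartbeat cost.
# All numbers are assumptions -- plug in your own /status figures.
INTERVAL_MIN = 30
CONTEXT_TOKENS = 2_500      # tokens sent per wake-up (configs + history, assumed)
PRICE_PER_M_INPUT = 15.0    # $/1M input tokens, top-tier pricing (assumed)

wakeups_per_day = 24 * 60 // INTERVAL_MIN  # 48 at the default interval
cost_per_wakeup = CONTEXT_TOKENS * PRICE_PER_M_INPUT / 1_000_000
monthly = wakeups_per_day * cost_per_wakeup * 30
print(f"{wakeups_per_day} wake-ups/day, ~${monthly:.0f}/month while idle")
```

Even with a modest assumed context, the default interval lands in the tens of dollars per month for doing nothing.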
What to do:
Reduce heartbeat frequency. Change the heartbeat interval in your openclaw.json from every 30 minutes to every hour or longer. For most use cases, the agent doesn't need to check in that often.
Set active hours. Configure heartbeats to only run during the hours you're actually working, or at night if you specifically want background automation while you sleep.
Use a cheap model for heartbeats. Heartbeat tasks are simple — "check if anything needs doing." They don't need your best model. Assign a lightweight, inexpensive model specifically for heartbeat checks.
Use a local model for heartbeats (cost: $0). This is the most aggressive option.
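All three settings can live in one heartbeat block. The key names in this sketch are illustrative, not a guaranteed openclaw.json schema, so check your version's documentation before copying it:

```json
{
  "heartbeat": {
    "interval": "60m",
    "model": "ollama/llama3.1",
    "activeHours": { "start": "08:00", "end": "22:00" }
  }
}
```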
Part 4: Using Local Models with Ollama
Running local models eliminates API costs for the tasks you assign to them.
Setup:
- Go to ollama.com and download the installer for your OS (on macOS, drag the app into Applications)
- Open Ollama and download a model (it walks you through the selection)
- Ask your OpenClaw agent: "I just installed Ollama and downloaded [model name]. Can you update my config to use this local model for heartbeats, keeping my current primary model for everything else?"
Your agent will handle the configuration. The result: free heartbeats, with your premium model reserved for tasks that actually need it.
Important caveat: Small local models (30B parameters and below) have weak context handling and poor agentic tool use. They'll drop tasks, miss instructions, and generally frustrate you if you try to use them for real work. Local models are excellent for simple, repetitive background tasks. They're not a substitute for a capable model when you need actual reasoning.
Part 5: Sub-Agent Architecture
One of the most impactful token-saving changes you can make is stopping your main agent from doing large tasks directly.
When your main agent codes a full application, the entire codebase and all intermediate outputs end up in your main session's context. That context stays and grows, making every subsequent message more expensive.
The better pattern: spin off a sub-agent for large tasks. The sub-agent does all the heavy work in its own isolated session, then returns only the completed output to the main session. Your main context only sees the summary, not the entire working process.
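The isolation pattern itself is simple. This toy sketch is not the real OpenClaw sub-agent API; it just shows why the main session stays cheap: intermediate outputs live and die in the sub-agent's private context, and only a summary crosses the boundary:

```python
# Toy illustration of the sub-agent isolation pattern (not OpenClaw's API):
# the sub-agent accumulates its own working context, and only a short
# summary ever enters the main session.

def run_subagent(task: str) -> str:
    """Do the heavy work in a private context; return only a summary."""
    private_context = [task]
    for step in ("plan", "implement", "test"):   # intermediate outputs
        private_context.append(f"{step}: ...")   # stay inside the sub-agent
    return f"Done: {task} (3 steps, details discarded)"

main_context = ["user: build the app"]
main_context.append(run_subagent("build the app"))
# main_context now holds 2 short entries, not the whole working history
```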
Using Codex for development tasks: If you have a ChatGPT subscription, you can install Codex on your agent machine and authenticate with your existing subscription — no additional API costs for development work. Your agent spawns a Codex session for coding tasks, uses its included tokens, and returns the result.
Setting this up: authenticate Codex on your agent machine with your ChatGPT login, then instruct your main agent to route all coding tasks to a Codex sub-agent automatically.
Part 6: Use n8n for Recurring Tasks Instead of OpenClaw
This is the most underrated cost-saving technique, and it's genuinely counterintuitive at first.
The insight: Not every automated task needs to run through OpenClaw's full context stack. Many recurring tasks — daily news summaries, weather reports, email monitoring, scheduled notifications — are simple enough that running them as lightweight n8n workflows is dramatically cheaper.
The difference in practice:
An OpenClaw cron job for a daily report:
- Loads full context (agents.md, soul.md, memory.md, session history)
- Uses your primary model
- Costs 10–50 cents depending on context size and model
An n8n workflow for the same daily report:
- Sends a lean API call with just a system message and today's prompt
- Uses a cheap model (Minimax 2.5, for example)
- Costs less than a cent
You can run three different daily reports — morning, midday, evening — for the same cost as one OpenClaw cron job.
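To make the "lean API call" concrete, here is what such a request looks like as raw HTTP against OpenRouter's chat completions endpoint (the endpoint is real; the model ID and prompt are examples):

```python
import json
import urllib.request

def build_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Build a lean chat request: one system message, one prompt, no history."""
    payload = {
        "model": "google/gemini-flash-1.5",  # any cheap model works here
        "messages": [
            {"role": "system", "content": "You write terse daily reports."},
            {"role": "user", "content": prompt},
        ],
    }
    return urllib.request.Request(
        "https://openrouter.ai/api/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

# To actually send it:
# resp = urllib.request.urlopen(build_request(KEY, "Summarise today's news"))
```

Note what is absent: no agents.md, no soul.md, no memory.md, no session history. That absence is the entire saving.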
n8n also solves the email monitoring problem elegantly. Instead of having OpenClaw check Gmail every hour (burning tokens each time, even when there's nothing new), write a simple n8n workflow that checks the mailbox and only wakes OpenClaw when there's actually something to handle. OpenClaw tokens spent: zero, unless there's a real task.
Setting up n8n with OpenClaw: n8n can send messages directly into your OpenClaw agent's session thread (in Telegram or Discord) — giving your agent context from external workflows without you lifting a finger.
If running a server or editing config files isn't your comfort zone, n8n's visual interface makes this approachable. It's used for automation by millions of non-technical users, and most of the connection patterns you'd want with OpenClaw are documented.
Part 7: Spending Limits and Token Accountability
Set API key credit limits. In your OpenRouter (or Anthropic API) account, you can set daily or monthly spending caps per API key. A $5/day hard limit prevents any single misconfiguration from becoming a $200 surprise.
Review weekly. Every week, spend ten minutes looking at where your tokens actually went. Which agents are most expensive? Which tasks are costing more than they're worth? Small adjustments based on real data make a bigger difference than optimising in theory.
Build a usage dashboard. If you're comfortable with a bit of development work, having your agent log token consumption by task and model to a database — then visualise it — gives you the observability you need to make intelligent optimisation decisions.
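A minimal version of that ledger fits in a few lines of SQLite. The schema here is our own invention, not an OpenClaw feature; the agent (or your wrapper code) would call `log_usage` after each task:

```python
import sqlite3

# Minimal usage ledger: one row per completed task, then aggregate by
# model to see where the money goes. Schema is a sketch, not OpenClaw's.
con = sqlite3.connect(":memory:")  # use a file path in practice
con.execute("""CREATE TABLE usage (
    task TEXT, model TEXT, input_tokens INT, output_tokens INT)""")

def log_usage(task, model, input_tokens, output_tokens):
    con.execute("INSERT INTO usage VALUES (?, ?, ?, ?)",
                (task, model, input_tokens, output_tokens))

def by_model():
    """Total tokens per model, most expensive first."""
    return con.execute(
        "SELECT model, SUM(input_tokens + output_tokens) "
        "FROM usage GROUP BY model ORDER BY 2 DESC").fetchall()

log_usage("daily report", "gemini-flash", 1_200, 300)
log_usage("refactor", "opus", 45_000, 6_000)
print(by_model())
```

From there, any spreadsheet or charting tool can turn the aggregates into the weekly review described above.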
What a Sensible Setup Looks Like
After applying these principles, a reasonable OpenClaw setup looks something like this:
- Primary model: Kimi K2 or Sonnet (not Opus) for everyday tasks
- Developer sub-agent: GPT-4.1 or equivalent for coding, potentially using a subscription plan's included tokens
- Heartbeat model: Ollama local model (free)
- Recurring tasks: Handled by n8n where possible, only escalating to OpenClaw when real action is needed
- Session hygiene: Regular /compact or /new to prevent context bloat
- Lean config files: Reviewed monthly, free of redundant entries
- Spending cap: API key daily limit set as a safety net
The combined effect of these changes can be an 80–90% reduction in token costs compared to an unoptimised setup, with no meaningful loss in capability for the tasks that matter.
OpenClaw is powerful. It doesn't have to be ruinously expensive.
Related guides: Multi-Agent Setup with OpenClaw | OpenClaw Skills Guide | 5 Free Tools for OpenClaw