Remote OpenClaw Blog
OpenClaw Rate Limits and Token Costs: How to Stay Under Budget
10 min read
Most operators expect OpenClaw to cost about the same as a Claude Pro subscription — $20/month. Then they get their first API bill and it is $40, $60, or $120. The gap between expectation and reality comes from three sources that are not obvious until you understand how OpenClaw uses tokens.
OpenClaw runs a heartbeat check every few minutes to maintain state and process incoming messages. Each heartbeat sends context to the LLM, even when no new messages have arrived. On a deployment with a large SOUL.md, 4-5 active skills, and persistent memory, a single heartbeat can consume 2,000-4,000 tokens.
At one heartbeat every 3 minutes, that is 480 heartbeats per day, consuming roughly 1-2 million tokens daily on context alone — before you even ask it to do anything.
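That arithmetic can be checked directly. A small sketch using the interval and per-heartbeat token range quoted above:

```python
# Estimate daily heartbeat token burn for a given interval and context size.
def daily_heartbeat_tokens(interval_minutes: int, tokens_per_beat: int) -> int:
    beats_per_day = 24 * 60 // interval_minutes
    return beats_per_day * tokens_per_beat

# 3-minute interval, 2,000-4,000 tokens per beat (figures from this article)
low = daily_heartbeat_tokens(3, 2_000)   # 480 beats * 2,000 = 960,000
high = daily_heartbeat_tokens(3, 4_000)  # 480 beats * 4,000 = 1,920,000
print(f"{low:,} to {high:,} tokens/day")
```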
Every conversation with OpenClaw includes the full SOUL.md, active skill definitions, memory context, and conversation history in the prompt. A typical deployment sends 8,000-15,000 tokens of context with every API call. When the LLM generates a 500-token response, you are paying for 15,500 tokens total — and only 500 of those are the actual answer.
When OpenClaw performs a complex task, it chains multiple tool calls together. A single request like "research this competitor and update my CRM" might trigger five or more separate LLM calls: for example, a web search, page summarization, data extraction, a CRM lookup, and a CRM update. Each call includes the full context, so five calls at 15,000 tokens of context each means 75,000 tokens consumed for what felt like a single request.
Understanding per-token pricing is essential for budgeting. Here are representative rates for the Claude models commonly used with OpenClaw (as of April 2026; confirm against your provider's current pricing page):

| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| Claude Haiku | $0.25 | $1.25 |
| Claude Sonnet | $3.00 | $15.00 |
| Claude Opus | $15.00 | $75.00 |

The price difference between Opus and Haiku is 60x for both input and output. That gap is where model routing delivers massive savings.
Model routing sends each task to the cheapest model capable of handling it. Simple tasks (classification, formatting, yes/no decisions) go to Haiku or a local model. Complex tasks (research, long-form writing, multi-step reasoning) go to Sonnet or Opus.
```yaml
# In your OpenClaw config
model_routing:
  default: claude-sonnet
  rules:
    - task_type: classification
      model: claude-haiku
    - task_type: formatting
      model: claude-haiku
    - task_type: simple_qa
      model: claude-haiku
    - task_type: summarization
      model: claude-sonnet
    - task_type: research
      model: claude-sonnet
    - task_type: long_form_writing
      model: claude-sonnet
    - task_type: complex_reasoning
      model: claude-opus
    - task_type: code_generation
      model: claude-sonnet
```
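Internally, routing like this amounts to a table lookup with a fallback default. A minimal sketch in Python (the task-type names mirror the config above; the function itself is illustrative, not OpenClaw's API):

```python
# Hypothetical sketch of model routing: map each task type to the
# cheapest capable model, falling back to a default for unknown types.
ROUTING_RULES = {
    "classification": "claude-haiku",
    "formatting": "claude-haiku",
    "simple_qa": "claude-haiku",
    "summarization": "claude-sonnet",
    "research": "claude-sonnet",
    "long_form_writing": "claude-sonnet",
    "complex_reasoning": "claude-opus",
    "code_generation": "claude-sonnet",
}
DEFAULT_MODEL = "claude-sonnet"

def route(task_type: str) -> str:
    return ROUTING_RULES.get(task_type, DEFAULT_MODEL)

print(route("classification"))      # claude-haiku
print(route("unknown_task_type"))   # claude-sonnet (default)
```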
In a typical OpenClaw deployment, roughly 60-70% of tasks are simple enough for Haiku (classification, formatting, short Q&A); the rest need Sonnet or Opus. Without model routing, all of it goes to your default model (usually Sonnet). With routing, that 60-70% of your token spend drops by 12x. For a deployment spending $40/month on Sonnet across the board, routing cuts costs to approximately $16-22/month.
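As a rough sanity check on those numbers, using the 12x Sonnet-to-Haiku price ratio and the $40/month baseline from this article:

```python
# Estimate monthly cost after routing a fraction of spend to a model
# that is `ratio` times cheaper. Figures below are from this article.
def routed_cost(baseline: float, routed_fraction: float, ratio: float) -> float:
    return baseline * (1 - routed_fraction) + baseline * routed_fraction / ratio

# 65% of a $40/month Sonnet bill routed to Haiku (12x cheaper)
print(round(routed_cost(40.0, 0.65, 12), 2))  # 16.17
```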
Spending caps prevent runaway costs from infinite loops, misconfigured workflows, or unexpectedly verbose tasks. Without a cap, a single misconfigured cron job can burn through $50-100 in a single day before you notice.
```yaml
# In your OpenClaw config
spend_cap_daily_usd: 5.00
spend_cap_monthly_usd: 100.00
spend_cap_action: pause  # Options: pause, notify, downgrade

# When the cap is hit:
#   pause     → stops all API calls until next day/month
#   notify    → sends Telegram alert but continues running
#   downgrade → switches all tasks to cheapest model
```
The downgrade action is the most practical choice for production deployments. When you hit your daily cap, OpenClaw switches to Haiku (or a local model if configured) for the rest of the day. Quality degrades for complex tasks, but the agent stays running for critical operations.
Start with spend_cap_daily_usd: 5.00 for a single-user deployment. Monitor your actual daily spend for two weeks, then adjust. Most operators settle between $2 and $8 per day depending on workload.
Set the monthly cap below a full 30-31x of your daily cap; the example above uses 20x ($5 daily, $100 monthly). Most days come in well under the daily cap, so a tighter monthly ceiling still leaves headroom for occasional high-usage days while keeping total spend within budget.
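The downgrade behavior can be sketched as a small wrapper around model selection (a hypothetical illustration, not OpenClaw's actual implementation):

```python
# Hypothetical sketch: once today's spend crosses the cap, force every
# task onto the cheapest model instead of pausing the agent entirely.
DAILY_CAP_USD = 5.00
CHEAPEST_MODEL = "claude-haiku"

def pick_model(preferred: str, spent_today_usd: float) -> str:
    if spent_today_usd >= DAILY_CAP_USD:
        return CHEAPEST_MODEL  # downgrade: keep running, accept lower quality
    return preferred

print(pick_model("claude-opus", 2.40))  # claude-opus (under cap)
print(pick_model("claude-opus", 5.10))  # claude-haiku (cap hit)
```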
Ollama lets you run open-source language models on your own hardware. For tasks that do not require frontier-model intelligence — classification, formatting, template filling, simple Q&A — local models perform adequately at zero marginal cost.
```shell
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Pull a model (Llama 3 8B is a good starting point)
ollama pull llama3:8b

# Verify it's running
ollama list
```
Then configure OpenClaw to use Ollama for specific task types:
```yaml
# In your OpenClaw config
model_routing:
  default: claude-sonnet
  rules:
    - task_type: classification
      model: ollama/llama3:8b
    - task_type: formatting
      model: ollama/llama3:8b
    - task_type: simple_qa
      model: ollama/llama3:8b
    - task_type: heartbeat
      model: ollama/llama3:8b

ollama:
  endpoint: http://localhost:11434
  timeout: 30s
```
Moving heartbeat checks to a local model alone can reduce API costs by 30-40% because heartbeats account for a disproportionate share of total token usage.
Rate limits cap how many API requests you can make per minute. Hitting a rate limit does not cost money — the request is rejected — but it causes your agent to stall until the limit resets.
Each provider publishes per-tier limits on both requests per minute and tokens per minute. The exact figures for Anthropic (Claude) and OpenAI depend on your account tier, so check your provider's rate limit documentation for the numbers that apply to you.
Reduce heartbeat frequency. If your agent does not need to respond within seconds, increase the heartbeat interval from 3 minutes to 5 or 10 minutes. That cuts heartbeat API requests by 40-70% (from 480/day to 288/day or 144/day).
```yaml
# In your OpenClaw config
heartbeat_interval: 300  # 5 minutes (in seconds)
# Default is usually 180 (3 minutes)
```
Batch related tasks. Instead of making 5 separate API calls for 5 emails, batch them into a single call that processes all 5 at once. This works well for classification tasks.
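Batching wins because the fixed context is paid once instead of once per item. A sketch of the token math (the context and per-email figures are illustrative, in line with the sizes quoted earlier in this article):

```python
# Compare token usage for N separate classification calls vs one batched
# call. The fixed context (SOUL.md, skills, memory) is repaid every call.
def tokens_separate(n_items: int, context: int, per_item: int) -> int:
    return n_items * (context + per_item)

def tokens_batched(n_items: int, context: int, per_item: int) -> int:
    return context + n_items * per_item

ctx, per_email = 15_000, 400  # illustrative figures
print(tokens_separate(5, ctx, per_email))  # 77,000 tokens
print(tokens_batched(5, ctx, per_email))   # 17,000 tokens
```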
Use request queuing. OpenClaw supports a request queue that spaces out API calls to stay under rate limits. Enable it if you run high-volume workflows:
```yaml
# In your OpenClaw config
rate_limit_queue:
  enabled: true
  max_requests_per_minute: 40  # Stay under your API tier limit
  retry_on_429: true
  retry_delay: 5s
```
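Under the hood, a queue like this just enforces a minimum spacing between requests and retries on HTTP 429. A minimal sketch (illustrative, not OpenClaw's internals):

```python
import time

# Hypothetical sketch of a rate-limited request queue: space calls so we
# never exceed max_rpm, and back off briefly when the API returns 429.
class RateLimitQueue:
    def __init__(self, max_rpm: int, retry_delay: float = 5.0):
        self.min_interval = 60.0 / max_rpm  # seconds between requests
        self.retry_delay = retry_delay
        self.last_sent = 0.0

    def send(self, request_fn):
        wait = self.min_interval - (time.monotonic() - self.last_sent)
        if wait > 0:
            time.sleep(wait)
        self.last_sent = time.monotonic()
        status, body = request_fn()
        if status == 429:  # rate limited: wait, then retry once
            time.sleep(self.retry_delay)
            self.last_sent = time.monotonic()
            status, body = request_fn()
        return status, body

q = RateLimitQueue(max_rpm=40)
print(q.min_interval)  # 1.5 seconds between requests
```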
The Cost Optimizer skill from the Remote OpenClaw marketplace automates the cost-saving strategies described in this guide. It monitors your token usage in real time and makes adjustments automatically.
```shell
# Install from the marketplace
cp cost-optimizer.md ~/.openclaw/skills/

# The skill activates automatically and begins tracking usage.
# Send a message to your agent to verify:
#   "Show me today's API cost breakdown"
```
Most operators see a 15-25% cost reduction within the first week of running the Cost Optimizer, primarily from context pruning and waste detection identifying workflows that were consuming tokens unnecessarily.
Here is a real cost optimization case from a single-user OpenClaw deployment running email triage, daily briefings, CRM enrichment, and competitor monitoring.
Step 1: Model routing. Switched classification and formatting tasks to Haiku. This moved ~65% of API calls to a model that costs 12x less. Monthly impact: -$18.
Step 2: Heartbeat interval. Increased from 3 minutes to 5 minutes. Reduced heartbeat API calls from 480/day to 288/day. Monthly impact: -$5.
Step 3: Context pruning. Trimmed inactive skill definitions from the prompt. Reduced context from 12,000 to 8,000 tokens. Monthly impact: -$3.
Step 4: Local model for heartbeats. Moved heartbeat checks to Ollama (Llama 3 8B). Eliminated the single largest cost driver. Monthly impact: -$4.
Step 5: Spending cap. Set spend_cap_daily_usd: 1.00 with downgrade action. Catches any remaining waste.
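Summing the per-step impacts above (figures from the case study; step 5 is a safety net rather than a direct saving):

```python
# Monthly savings from each optimization step in the case study.
savings = {
    "model_routing": 18,       # step 1
    "heartbeat_interval": 5,   # step 2
    "context_pruning": 3,      # step 3
    "local_heartbeats": 4,     # step 4
}
print(sum(savings.values()))  # 30 (dollars/month saved)
```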
That is a 70% reduction with zero loss of functionality. Every automation still runs. Every briefing still arrives on time. The agent just uses the right model for each task instead of sending everything to the most expensive option.
Anthropic offers prompt caching for Claude, which reduces costs for repeated context. If your SOUL.md and skill definitions are the same across most calls (they usually are), cached tokens cost 90% less than uncached tokens.
```yaml
# In your OpenClaw config
prompt_caching:
  enabled: true
  cache_static_context: true  # Cache SOUL.md and skill definitions
  cache_ttl: 300              # Cache for 5 minutes
```
Prompt caching is the most underutilized cost optimization in the OpenClaw ecosystem. Operators who enable it report 25-40% cost reductions on top of model routing savings.
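The savings follow directly from the 90% discount on cached tokens. A rough cost model (it ignores any cache-write surcharge some providers apply on the first write; the 10,000-token static context is an illustrative figure):

```python
# Cost of input context with and without prompt caching, assuming cached
# reads cost 90% less than uncached input tokens (figure from this article).
def input_cost(tokens: int, price_per_m: float, cached: bool) -> float:
    rate = price_per_m * (0.1 if cached else 1.0)
    return tokens / 1_000_000 * rate

ctx = 10_000       # static context tokens (SOUL.md + skills), illustrative
sonnet_input = 3.00  # $/M input tokens
print(round(input_cost(ctx, sonnet_input, cached=False), 4))  # 0.03 per call
print(round(input_cost(ctx, sonnet_input, cached=True), 4))   # 0.003 per call
```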
Large memory contexts inflate every API call. Compress your memory by summarizing older entries and archiving details that are not needed for real-time decisions:
```yaml
# memory-management.md
compression_rules:
  - entries older than 7 days → summarize to 1-2 sentences
  - entries older than 30 days → archive (exclude from prompt)
  - keep full detail for: active projects, client preferences, credentials
```
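Applied in code, those rules are just an age-based filter over memory entries. A hypothetical sketch (the entry schema and category names are invented for illustration; the string truncation stands in for an LLM-generated summary):

```python
from datetime import datetime, timedelta

# Hypothetical sketch of the compression rules: summarize entries older
# than 7 days, exclude entries older than 30 days from the prompt, and
# always keep protected categories in full detail.
PROTECTED = {"active_project", "client_preference", "credential"}

def compress(entries: list[dict], now: datetime) -> list[dict]:
    kept = []
    for e in entries:
        age = now - e["created"]
        if e["category"] in PROTECTED:
            kept.append(e)                    # full detail, regardless of age
        elif age > timedelta(days=30):
            continue                          # archived: excluded from prompt
        elif age > timedelta(days=7):
            # Truncation stands in for a real 1-2 sentence summary
            kept.append({**e, "text": e["text"][:120]})
        else:
            kept.append(e)                    # recent: keep as-is
    return kept
```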
Run non-urgent workflows during off-peak hours when rate limits are less likely to be hit. Batch daily tasks into 2-3 consolidated runs instead of spreading them throughout the day:
```yaml
# Consolidated schedule (fewer total API calls)
schedule:
  morning_batch: "0 7 * * *"   # Email triage + briefing + CRM check
  midday_batch: "0 12 * * *"   # Competitor monitoring + lead scoring
  evening_batch: "0 18 * * *"  # Invoice follow-up + meeting summaries
```
For the full cost optimization deep dive, see Reducing OpenClaw Token Costs (Up to 90% Cheaper).
OpenClaw runs autonomously 24/7, which means it makes API calls even when you are not actively using it. Heartbeat checks, scheduled workflows, memory retrieval, and multi-tool chains all consume tokens. A single daily briefing workflow might trigger 4-6 API calls across research, formatting, and delivery steps. The subscription chat interfaces (Claude Pro, ChatGPT Plus) absorb this cost into their monthly fee, but with OpenClaw you pay per token.
The cheapest production setup uses model routing (Claude Haiku for simple tasks, Sonnet for complex ones), a daily spending cap of $5, and Ollama for local model fallback on classification and formatting tasks. This combination typically costs $10-15 per month for a single-user deployment. Adding a VPS at $5-10 per month brings total costs to $15-25 per month.
OpenClaw can run entirely on local models via Ollama, but with trade-offs. Doing so eliminates API costs but requires hardware capable of running 7B+ parameter models (minimum 16GB RAM, ideally a GPU). Local models handle classification, formatting, and simple Q&A well, but fall short on complex reasoning, long-form writing, and multi-step planning. Most operators use a hybrid approach: local models for 60-70% of tasks and cloud APIs for the rest.