Remote OpenClaw Blog

OpenClaw Rate Limits and Token Costs: How to Stay Under Budget

10 min read · 6 April 2026

Why OpenClaw Costs Spiral

Most operators expect OpenClaw to cost about the same as a Claude Pro subscription — $20/month. Then they get their first API bill and it is $40, $60, or $120. The gap between expectation and reality comes from three sources that are not obvious until you understand how OpenClaw uses tokens.

Heartbeat Checks

OpenClaw runs a heartbeat check every few minutes to maintain state and process incoming messages. Each heartbeat sends context to the LLM, even when no new messages have arrived. On a deployment with a large SOUL.md, 4-5 active skills, and persistent memory, a single heartbeat can consume 2,000-4,000 tokens.

At one heartbeat every 3 minutes, that is 480 heartbeats per day, consuming roughly 1-2 million tokens daily on context alone — before you even ask it to do anything.

Long Context Windows

Every conversation with OpenClaw includes the full SOUL.md, active skill definitions, memory context, and conversation history in the prompt. A typical deployment sends 8,000-15,000 tokens of context with every API call. When the LLM generates a 500-token response, you are paying for 15,500 tokens total — and only 500 of those are the actual answer.

Multi-Tool Chains

When OpenClaw performs a complex task, it chains multiple tool calls together. A single request like "research this competitor and update my CRM" might trigger:

Web browse to competitor site (API call 1)
Extract and summarize findings (API call 2)
Read current CRM record (API call 3)
Write updated CRM entry (API call 4)
Confirm completion via Telegram (API call 5)

Each call includes the full context. Five calls at 15,000 tokens of context each means 75,000 tokens consumed for what felt like a single request.

Cost by Model: The 2026 Pricing Table

Understanding per-token pricing is essential for budgeting. Here are the current rates for models commonly used with OpenClaw (as of April 2026):

Anthropic Claude Models

Claude Opus — $15.00 per million input tokens, $75.00 per million output tokens
Claude Sonnet — $3.00 per million input tokens, $15.00 per million output tokens
Claude Haiku — $0.25 per million input tokens, $1.25 per million output tokens

OpenAI Models

GPT-4o — $2.50 per million input tokens, $10.00 per million output tokens
GPT-4o-mini — $0.15 per million input tokens, $0.60 per million output tokens

Local Models (Ollama)

All models — $0.00 per token (runs on your own hardware)

The price difference between Opus and Haiku is 60x for input and 60x for output. That gap is where model routing delivers massive savings.

Model Routing: The Single Biggest Cost Saver

Model routing sends each task to the cheapest model capable of handling it. Simple tasks (classification, formatting, yes/no decisions) go to Haiku or a local model. Complex tasks (research, long-form writing, multi-step reasoning) go to Sonnet or Opus.

How to Configure Model Routing

# In your OpenClaw config
model_routing:
  default: claude-sonnet
  rules:
    - task_type: classification
      model: claude-haiku
    - task_type: formatting
      model: claude-haiku
    - task_type: simple_qa
      model: claude-haiku
    - task_type: summarization
      model: claude-sonnet
    - task_type: research
      model: claude-sonnet
    - task_type: long_form_writing
      model: claude-sonnet
    - task_type: complex_reasoning
      model: claude-opus
    - task_type: code_generation
      model: claude-sonnet

Impact on Costs

In a typical OpenClaw deployment, task distribution breaks down roughly as follows:

60-70% of tasks are simple (classification, formatting, routing) — these can run on Haiku
25-35% of tasks are moderate (summarization, research, writing) — these need Sonnet
5-10% of tasks are complex (multi-step reasoning, nuanced analysis) — these need Opus

Without model routing, all tasks go to your default model (usually Sonnet). With routing, 60-70% of your token spend drops by 12x. For a deployment spending $40/month on Sonnet across the board, routing cuts costs to approximately $16-22/month.

Spending Caps: Your Safety Net

Spending caps prevent runaway costs from infinite loops, misconfigured workflows, or unexpectedly verbose tasks. Without a cap, a single misconfigured cron job can burn through $50-100 in a single day before you notice.

How to Set a Spending Cap

# In your OpenClaw config
spend_cap_daily_usd: 5.00
spend_cap_monthly_usd: 100.00
spend_cap_action: pause  # Options: pause, notify, downgrade

# When cap is hit:
# pause → stops all API calls until next day/month
# notify → sends Telegram alert but continues running
# downgrade → switches all tasks to cheapest model

The downgrade action is the most practical choice for production deployments. When you hit your daily cap, OpenClaw switches to Haiku (or a local model if configured) for the rest of the day. Quality degrades for complex tasks, but the agent stays running for critical operations.

Setting the Right Cap

Start with spend_cap_daily_usd: 5.00 for a single-user deployment. Monitor your actual daily spend for two weeks, then adjust. Most operators settle between $2-8 per day depending on workload.

The monthly cap should be set at roughly 25x your daily cap (not 30x) to leave headroom for high-usage days without exceeding your budget.

Local Models With Ollama: $0 API Cost

Ollama lets you run open-source language models on your own hardware. For tasks that do not require frontier-model intelligence — classification, formatting, template filling, simple Q&A — local models perform adequately at zero marginal cost.

Setting Up Ollama With OpenClaw

# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Pull a model (Llama 3 8B is a good starting point)
ollama pull llama3:8b

# Verify it's running
ollama list

Then configure OpenClaw to use Ollama for specific task types:

# In your OpenClaw config
model_routing:
  default: claude-sonnet
  rules:
    - task_type: classification
      model: ollama/llama3:8b
    - task_type: formatting
      model: ollama/llama3:8b
    - task_type: simple_qa
      model: ollama/llama3:8b
    - task_type: heartbeat
      model: ollama/llama3:8b

ollama:
  endpoint: http://localhost:11434
  timeout: 30s

Hardware Requirements

Minimum: 16GB RAM, modern CPU (no GPU required for 7B models)
Recommended: 32GB RAM or a GPU with 8GB+ VRAM for faster inference
Mac Mini M-series: runs 7B-13B models natively using Metal acceleration

What Local Models Handle Well

Email classification (urgent/normal/spam)
Data formatting and template filling
Simple yes/no decisions
Text extraction and parsing
Heartbeat processing (the single biggest cost saver)

What Local Models Struggle With

Complex multi-step reasoning
Long-form content writing (quality drops noticeably)
Nuanced conversation and personality
Tasks requiring large context windows (>4K tokens)

Moving heartbeat checks to a local model alone can reduce API costs by 30-40% because heartbeats account for a disproportionate share of total token usage.

Understanding Rate Limits

Rate limits cap how many API requests you can make per minute. Hitting a rate limit does not cost money — the request is rejected — but it causes your agent to stall until the limit resets.

Current Rate Limits (April 2026)

Anthropic (Claude)

Cost Optimizer

Cost Optimizer is the easiest first purchase when you want lower model spend without rebuilding your workflow stack.

Start With Cost Optimizer →Compare Best Fits →

Free tier: 5 requests per minute
Build tier: 50 requests per minute, 40,000 tokens per minute
Scale tier: 1,000 requests per minute, 200,000 tokens per minute

OpenAI

Tier 1: 60 requests per minute, 150,000 tokens per minute
Tier 2: 100 requests per minute, 300,000 tokens per minute
Tier 3+: higher limits based on spend history

How to Avoid Rate Limits

Reduce heartbeat frequency. If your agent does not need to respond within seconds, increase the heartbeat interval from 3 minutes to 5 or 10 minutes. This cuts API requests by 50-70%.

# In your OpenClaw config
heartbeat_interval: 300  # 5 minutes (in seconds)
# Default is usually 180 (3 minutes)

Batch related tasks. Instead of making 5 separate API calls for 5 emails, batch them into a single call that processes all 5 at once. This works well for classification tasks.

Use request queuing. OpenClaw supports a request queue that spaces out API calls to stay under rate limits. Enable it if you run high-volume workflows:

# In your OpenClaw config
rate_limit_queue:
  enabled: true
  max_requests_per_minute: 40  # Stay under your API tier limit
  retry_on_429: true
  retry_delay: 5s

OpenClaw API costs and budget key statistics — Key numbers to know

The Cost Optimizer Skill

The Cost Optimizer skill from the Remote OpenClaw marketplace automates the cost-saving strategies described in this guide. It monitors your token usage in real time and makes adjustments automatically.

What It Does

Usage tracking — logs every API call with token count, model used, and estimated cost
Automatic model downgrade — switches to cheaper models as you approach your daily cap
Context pruning — trims conversation history and inactive skill definitions from the prompt to reduce input tokens
Daily cost report — sends a Telegram summary with total spend, top cost drivers, and optimization recommendations
Waste detection — identifies repeated API calls, unnecessarily long prompts, and idle workflows consuming tokens

Installation

# Install from the marketplace
cp cost-optimizer.md ~/.openclaw/skills/

# The skill activates automatically and begins tracking usage
# Send a message to your agent to verify:
"Show me today's API cost breakdown"

Most operators see a 15-25% cost reduction within the first week of running the Cost Optimizer, primarily from context pruning and waste detection identifying workflows that were consuming tokens unnecessarily.

Real Example: $40/Month to $12/Month

Here is a real cost optimization case from a single-user OpenClaw deployment running email triage, daily briefings, CRM enrichment, and competitor monitoring.

Before Optimization

Model: Claude Sonnet for all tasks
Heartbeat interval: 3 minutes (default)
Context size: ~12,000 tokens per call (full SOUL.md + 5 skills + memory)
Daily API calls: ~600
Monthly cost: $40.20

Optimization Steps

Step 1: Model routing. Switched classification and formatting tasks to Haiku. This moved ~65% of API calls to a model that costs 12x less. Monthly impact: -$18.

Step 2: Heartbeat interval. Increased from 3 minutes to 5 minutes. Reduced heartbeat API calls from 480/day to 288/day. Monthly impact: -$5.

Step 3: Context pruning. Trimmed inactive skill definitions from the prompt. Reduced context from 12,000 to 8,000 tokens. Monthly impact: -$3.

Step 4: Local model for heartbeats. Moved heartbeat checks to Ollama (Llama 3 8B). Eliminated the single largest cost driver. Monthly impact: -$4.

Step 5: Spending cap. Set spend_cap_daily_usd: 1.00 with downgrade action. Catches any remaining waste.

After Optimization

Model routing: Haiku for 65% of tasks, Sonnet for 30%, Ollama for 5%
Heartbeat interval: 5 minutes, processed locally
Context size: ~8,000 tokens per call
Daily API calls (cloud): ~200
Monthly cost: $11.80

That is a 70% reduction with zero loss of functionality. Every automation still runs. Every briefing still arrives on time. The agent just uses the right model for each task instead of sending everything to the most expensive option.

Advanced Cost Strategies

Prompt Caching

Anthropic offers prompt caching for Claude, which reduces costs for repeated context. If your SOUL.md and skill definitions are the same across most calls (they usually are), cached tokens cost 90% less than uncached tokens.

# In your OpenClaw config
prompt_caching:
  enabled: true
  cache_static_context: true  # Cache SOUL.md and skill definitions
  cache_ttl: 300  # Cache for 5 minutes

Prompt caching is the most underutilized cost optimization in the OpenClaw ecosystem. Operators who enable it report 25-40% cost reductions on top of model routing savings.

Memory Compression

Large memory contexts inflate every API call. Compress your memory by summarizing older entries and archiving details that are not needed for real-time decisions:

# memory-management.md
compression_rules:
  - entries older than 7 days → summarize to 1-2 sentences
  - entries older than 30 days → archive (exclude from prompt)
  - keep full detail for: active projects, client preferences, credentials

Workflow Scheduling

Run non-urgent workflows during off-peak hours when rate limits are less likely to be hit. Batch daily tasks into 2-3 consolidated runs instead of spreading them throughout the day:

# Consolidated schedule (fewer total API calls)
schedule:
  morning_batch: "0 7 * * *"   # Email triage + briefing + CRM check
  midday_batch: "0 12 * * *"   # Competitor monitoring + lead scoring
  evening_batch: "0 18 * * *"  # Invoice follow-up + meeting summaries

For the full cost optimization deep dive, see Reducing OpenClaw Token Costs (Up to 90% Cheaper).

Frequently Asked Questions

Why does OpenClaw cost so much more than using Claude or ChatGPT directly?

OpenClaw runs autonomously 24/7, which means it makes API calls even when you are not actively using it. Heartbeat checks, scheduled workflows, memory retrieval, and multi-tool chains all consume tokens. A single daily briefing workflow might trigger 4-6 API calls across research, formatting, and delivery steps. The subscription chat interfaces (Claude Pro, ChatGPT Plus) absorb this cost into their monthly fee, but with OpenClaw you pay per token.

What is the cheapest way to run OpenClaw in production?

The cheapest production setup uses model routing (Claude Haiku for simple tasks, Sonnet for complex ones), a daily spending cap of $5, and Ollama for local model fallback on classification and formatting tasks. This combination typically costs $10-15 per month for a single-user deployment. Adding a VPS at $5-10 per month brings total costs to $15-25 per month.

Can I run OpenClaw entirely on local models with zero API cost?

Yes, but with trade-offs. Running OpenClaw entirely on local models via Ollama eliminates API costs but requires hardware capable of running 7B+ parameter models (minimum 16GB RAM, ideally a GPU). Local models handle classification, formatting, and simple Q&A well, but fall short on complex reasoning, long-form writing, and multi-step planning. Most operators use a hybrid approach: local models for 60-70% of tasks and cloud APIs for the rest.

Ready to choose the right OpenClaw workflow?

Cost OptimizerCost Optimizer is the easiest first purchase when you want lower model spend without rebuilding your workflow stack.Compare Best FitsUse the marketplace filters to choose the right bundle, persona, or skill without browsing blind.More GuidesBrowse 200+ free OpenClaw guides, tutorials, and comparisons.

Loading article