Remote OpenClaw Blog
How to Cut OpenClaw Bazaar Skill Token Costs by Up to 90%
7 min read
Every skill you install from OpenClaw Bazaar adds weight to your token bill. Not because the skills cost money to download, but because each active skill injects its instructions into every single API call your agent makes. A well-curated skills stack runs lean and fast. A bloated one turns every simple query into an expensive operation.
This guide covers the specific techniques that reduce skill-related token consumption by 80-90% without sacrificing the capabilities you installed those skills for in the first place.
How Bazaar Skills Actually Consume Your Tokens
Before optimizing, you need to understand the mechanics. When your agent processes a message, the API payload includes several components stacked together.
What gets sent with every request when skills are loaded:
- OpenClaw core system instructions (fixed overhead, roughly 800-1,200 tokens)
- Active skill definitions from your Bazaar installs (variable, 500-2,000 tokens per skill)
- Memory and persona configuration (variable, 1,000-5,000 tokens)
- Conversation history from the current session (grows over time, 2,000-20,000+ tokens)
- The user's actual message (typically 50-500 tokens)
If you have ten Bazaar skills loaded, the skill definitions alone might account for 8,000-15,000 tokens per request. That is context you pay for on every single API call, whether the current message needs those skills or not.
A developer with ten active skills processing 100 messages per day on Claude Sonnet pays roughly $2.40-4.50/day just in skill definition overhead. That adds up to $72-135/month in token costs for context that often goes unused.
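That arithmetic is easy to reproduce. A minimal sketch, assuming roughly $3 per million input tokens (Claude Sonnet class pricing at the time of writing; check your provider's current rates):

```python
# Rough estimate of daily skill-definition overhead cost.
# PRICE_PER_MTOK is an assumption (~Claude Sonnet input pricing);
# adjust it to your provider's current rate.

PRICE_PER_MTOK = 3.00  # USD per 1M input tokens (assumption)

def skill_overhead_cost(num_skills, tokens_per_skill, messages_per_day):
    """Daily cost of re-sending skill definitions with every request."""
    tokens_per_request = num_skills * tokens_per_skill
    daily_tokens = tokens_per_request * messages_per_day
    return daily_tokens / 1_000_000 * PRICE_PER_MTOK

# Ten skills at 800-1,500 tokens each, 100 messages/day:
low = skill_overhead_cost(10, 800, 100)    # 800k tokens/day
high = skill_overhead_cost(10, 1500, 100)  # 1.5M tokens/day
print(f"${low:.2f}-{high:.2f}/day, ${low * 30:.0f}-{high * 30:.0f}/month")
```

Plug in your own skill count and message volume to see what your stack costs before any optimization.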
Part 1: Aggressive Skill Lifecycle Management
The highest-impact optimization is brutally simple: stop loading skills you are not actively using.
The rule of three. At any given moment, most developers are doing one of three things: writing code, reviewing code, or debugging. Each activity needs a different set of skills. Instead of loading all your skills simultaneously, organize them into task-based groups and swap between them.
Create skill profiles. Set up three to four skill profiles that match your common workflows:
- Writing profile: Code generation skill, framework-specific skill (React, Python, etc.), testing skill
- Review profile: Code review skill, security audit skill, documentation checker
- Debug profile: Error analysis skill, logging skill, performance profiler
- Research profile: Web search skill, documentation lookup skill, comparison skill
Switch profiles when you switch tasks. This keeps your active skill count at three to four instead of ten to fifteen, cutting skill-related context by 60-75%.
Automate profile switching. Advanced users configure their agent to detect the current task type and load the appropriate skill profile automatically. A message starting with "review this PR" triggers the review profile. A message like "implement a new feature" triggers the writing profile. This removes the friction of manual switching.
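The detection logic can be as simple as keyword matching. A minimal sketch, where the profile names, skill names, and trigger keywords are all illustrative rather than an OpenClaw API:

```python
# Keyword-based skill profile selection (a sketch; profile and skill
# names here are hypothetical, not built-in OpenClaw identifiers).

SKILL_PROFILES = {
    "writing":  ["code-gen", "framework-helper", "test-writer"],
    "review":   ["code-review", "security-audit", "doc-checker"],
    "debug":    ["error-analysis", "logging", "profiler"],
    "research": ["web-search", "doc-lookup", "comparison"],
}

TRIGGERS = {
    "review":   ["review", "audit", "pull request"],
    "debug":    ["error", "bug", "traceback", "crash"],
    "research": ["compare", "look up", "which library"],
}

def pick_profile(message: str, default: str = "writing") -> str:
    """Return the first profile whose trigger appears in the message."""
    text = message.lower()
    for profile, keywords in TRIGGERS.items():
        if any(kw in text for kw in keywords):
            return profile
    return default

print(pick_profile("review this PR for style issues"))  # review
print(pick_profile("implement a new feature"))          # writing
```

In practice you would hook `pick_profile` into whatever pre-message step loads `SKILL_PROFILES[profile]` as the active skill set.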
Part 2: Skill Instruction Optimization
Most Bazaar skills ship with comprehensive instructions designed to cover every edge case. That thoroughness costs tokens on every invocation.
Audit your installed skills. Open each skill's configuration file and look for:
- Long example sections that demonstrate usage patterns you already understand
- Edge case handling for scenarios that do not apply to your codebase
- Verbose explanations of concepts you are already familiar with
- Duplicate instructions that overlap with other loaded skills
Trim without losing functionality. A skill with 1,800 tokens of instructions can often be condensed to 600-800 tokens by removing examples, shortening explanations, and cutting edge cases you never encounter. That is a 55-67% reduction in per-skill context cost.
Merge overlapping skills. Two separate skills that both include instructions about code style, error handling, or testing patterns create redundant context. Either consolidate them into a single custom skill or remove the overlapping sections from one.
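To find trimming candidates, it helps to measure first. A sketch of an audit script, assuming skills live as markdown files under a skills directory (the path is hypothetical) and using the common ~4 characters per token heuristic:

```python
# Rough audit of per-skill instruction size. SKILLS_DIR is an
# illustrative path; point it at wherever your skill files live.

from pathlib import Path

SKILLS_DIR = Path("~/.openclaw/skills").expanduser()  # hypothetical path

def estimate_tokens(text: str) -> int:
    return len(text) // 4  # rough heuristic, not a real tokenizer

def audit_skills(skills_dir: Path):
    """List (filename, estimated tokens), largest first."""
    report = []
    for f in sorted(skills_dir.glob("**/*.md")):
        tokens = estimate_tokens(f.read_text(encoding="utf-8"))
        report.append((f.name, tokens))
    return sorted(report, key=lambda item: -item[1])

if __name__ == "__main__":
    for name, tokens in audit_skills(SKILLS_DIR):
        flag = "  <- trim candidate" if tokens > 1000 else ""
        print(f"{tokens:>6}  {name}{flag}")
```

Anything over roughly 1,000 estimated tokens is worth opening up and trimming by hand.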
Part 3: Model Routing by Skill Complexity
Different skills need different models. Paying premium rates for a formatting skill is like hiring a surgeon to apply a bandage.
Tier your skills by required model capability:
| Skill Category | Recommended Model Tier | Cost per Invocation |
|---|---|---|
| Formatting, linting, simple templates | Budget (DeepSeek V3, GPT-4o Mini) | $0.001-0.005 |
| Documentation, test generation, refactoring | Mid-tier (Claude Haiku, GPT-4o) | $0.005-0.02 |
| Architecture analysis, security review, complex reasoning | Premium (Claude Sonnet) | $0.02-0.06 |
Configure per-skill model assignment. OpenClaw supports routing different skills to different models. Set your formatting skill to always use DeepSeek. Set your security review skill to always use Claude Sonnet. Everything in between gets Claude Haiku.
This tiered approach typically reduces total model costs by 70-80% compared to running everything through a premium model. The quality difference for simple skills is negligible because budget models handle structured, well-defined tasks just as well as premium ones.
Part 4: Response Caching for Repetitive Skill Operations
Many skill invocations produce similar or identical results for similar inputs. A linting skill running on the same codebase patterns produces the same feedback repeatedly. A documentation skill generating boilerplate for similar function signatures produces near-identical output each time.
Enable response caching to store and reuse results from previous skill invocations. When a new request closely matches a cached one, the cached response is returned without making an API call at all.
Skills that benefit most from caching:
- Code formatters (identical input produces identical output)
- Boilerplate generators (similar patterns, similar output)
- FAQ and documentation lookups (stable content, repeated queries)
- Linting and style checking (same rules, similar feedback)
Skills where caching is less useful:
- Code review (unique code each time)
- Creative content generation (variety is the point)
- Real-time data analysis (stale cached data is worse than no cache)
With caching enabled, expect a 20-40% reduction in total API calls for a typical development workflow. That translates directly to 20-40% savings on model costs.
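For deterministic skills like formatters, even an exact-match cache captures most of the benefit. A minimal sketch keyed on a hash of skill name plus input (`call_model` is a stand-in for the real API call; fuzzy or semantic matching is left out here):

```python
# Exact-match response cache for deterministic skill invocations.
# call_model is a placeholder for the actual model API call.

import hashlib

_cache: dict[str, str] = {}

def cache_key(skill: str, prompt: str) -> str:
    return hashlib.sha256(f"{skill}\x00{prompt}".encode()).hexdigest()

def run_skill(skill: str, prompt: str, call_model) -> str:
    """Return a cached result when this exact input was seen before."""
    key = cache_key(skill, prompt)
    if key in _cache:
        return _cache[key]          # no API call, no token cost
    result = call_model(skill, prompt)
    _cache[key] = result
    return result

calls = 0
def fake_model(skill, prompt):
    global calls
    calls += 1
    return f"formatted:{prompt}"

run_skill("formatter", "x=1", fake_model)
run_skill("formatter", "x=1", fake_model)   # served from cache
print(calls)  # 1
```

Add a time-to-live or an explicit invalidation hook before using this pattern on anything whose correct output can change between calls.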
Part 5: MCP Server Token Optimization
MCP server skills connect your agent to databases, file systems, and external APIs. The token cost is not in the connection itself but in the data these servers feed back into the agent's context.
Set retrieval limits. If your database MCP server returns full row data by default, configure it to return only the columns your agent actually needs. Dropping from twenty columns to five relevant ones can reduce per-query context by 75%.
Use summary responses. Configure MCP server skills to return summarized results rather than raw data. Instead of dumping fifty rows into the context, have the MCP server return a count, key statistics, and the top five most relevant rows. The agent can request full details only when needed.
Implement result pagination. For large result sets, configure the MCP server to return paginated results. The agent processes one page at a time instead of loading everything into context simultaneously. This prevents single queries from consuming massive token allocations.
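The summary-first pattern can live on the server side as a small transform. A sketch with an illustrative row shape and field names, not a real MCP server interface:

```python
# Summary-first response for an MCP-style data source: return a
# count, key statistics, and the top rows instead of the full set.
# Row structure and field names are illustrative.

def summarize_rows(rows, key_field, top_n=5):
    """Compress a large result set into a compact context payload."""
    values = [r[key_field] for r in rows]
    return {
        "row_count": len(rows),
        "min": min(values),
        "max": max(values),
        "top_rows": sorted(rows, key=lambda r: r[key_field], reverse=True)[:top_n],
    }

rows = [{"id": i, "latency_ms": i * 10} for i in range(1, 51)]  # 50 rows
summary = summarize_rows(rows, "latency_ms")
print(summary["row_count"], summary["min"], summary["max"])  # 50 10 500
print(len(summary["top_rows"]))  # 5
```

The agent sees five rows plus the statistics instead of fifty rows, and can issue a follow-up query for full detail only when the summary warrants it.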
Part 6: Session Hygiene for Skill-Heavy Workflows
Skills amplify the session bloat problem. Each skill invocation adds its output to the conversation history. A session with fifty skill invocations might carry 30,000-50,000 tokens of history, most of which is no longer relevant.
Reset sessions after intensive skill operations. If you just ran a batch code review across twenty files, the entire review output sits in your conversation history. Start a fresh session before moving to a different task. The old context is dead weight you pay for on every subsequent message.
Use the compact command between skill-heavy tasks. If you need to preserve some context but the session has grown large, compaction summarizes the history into a fraction of its original size. A 50,000-token session often compresses to 5,000-8,000 tokens without losing meaningful information.
Instruct your agent to be concise when executing skills. Add instructions like "When executing skills, return only the actionable output. Do not explain what you are about to do or narrate the process." This eliminates verbose preambles that add tokens to every skill invocation without adding value.
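The compaction decision can also be automated rather than remembered. A sketch of a threshold check, using the same ~4 chars/token heuristic; the `compact` hook and the 30,000-token threshold are assumptions standing in for OpenClaw's compact command:

```python
# Compact the session once estimated history size passes a threshold.
# COMPACT_AT and the compact() hook are illustrative assumptions.

COMPACT_AT = 30_000  # tokens of history before compacting

def estimate_tokens(history: list[str]) -> int:
    return sum(len(m) for m in history) // 4  # rough heuristic

def maybe_compact(history, compact):
    """Replace the history with one summary message once it grows large."""
    if estimate_tokens(history) >= COMPACT_AT:
        return [compact(history)]
    return history

history = ["x" * 1000] * 150        # ~37,500 estimated tokens
history = maybe_compact(history, lambda h: f"summary of {len(h)} messages")
print(history)  # ['summary of 150 messages']
```

Run a check like this between tasks and the 50,000-token runaway session from the example above never accumulates in the first place.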
What an Optimized Bazaar Skills Stack Looks Like
After applying these techniques, a well-optimized setup looks like this:
- 3-4 active skills loaded at any time, swapped via task-based profiles
- Per-skill model routing sending simple skills to budget models and complex skills to premium ones
- Response caching enabled for repetitive skill operations
- MCP server retrieval limits configured to return minimal necessary data
- Session resets after intensive skill operations
- Trimmed skill instructions condensed to essential content only
- Weekly usage review identifying which skills cost the most and whether they deliver proportional value
The combined effect of these optimizations is an 80-90% reduction in skill-related token costs. A setup that costs $135/month unoptimized drops to $15-25/month with the same skill capabilities and no meaningful quality loss on the tasks that matter.
Browse the Skills Directory
Find the right skill for your workflow. The OpenClaw Bazaar skills directory has over 2,300 community-rated skills — searchable, sortable, and free to install.
Try a Pre-Built Persona
Don't want to configure everything from scratch? OpenClaw personas come pre-loaded with skills, memory templates, and workflows designed for specific roles. Compare personas →