Remote OpenClaw Blog
Best Ollama Models for Running OpenClaw Bazaar Skills Locally: Ranked and Tested
Running marketplace skills from OpenClaw Bazaar on local hardware with Ollama means zero API costs, complete data privacy, and no rate limits. The trade-off is that not every local model can handle skill execution reliably. Skills depend on tool calling, large context windows, and precise instruction following — capabilities that separate the best local models from the rest.
This guide ranks the Ollama models that actually work for Bazaar skill execution, based on real-world testing across dozens of skill categories. We cover hardware requirements, speed expectations, and which models to avoid entirely.
What Marketplace Skills Demand From a Local Model
OpenClaw Bazaar skills are more demanding than general chat. Every skill execution involves injecting 15-20K tokens of workspace context, skill definitions, and memory into the prompt before the actual task even begins. The model then needs to follow structured instructions and execute tool calls accurately.
Four requirements separate usable models from unusable ones:
- Tool calling support — Bazaar skills trigger actions through function calls. If the model cannot format tool invocations reliably, skills fail silently or produce garbage outputs.
- 64K+ token context window — With 15-20K tokens of injected context plus the skill definition plus conversation history, anything less than 64K truncates critical information.
- Precise instruction adherence — Skills contain step-by-step instructions. Models that paraphrase, skip steps, or hallucinate actions break the workflow.
- Adequate inference speed — Streaming is disabled for Ollama by default in skill execution. The full response must complete before the next step begins. Slow models cause timeout failures.
The benchmark that matters most for skill compatibility is BFCL-V4 (Berkeley Function Calling Leaderboard), which measures tool-call reliability — the core interaction pattern for every Bazaar skill.
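The context-window requirement above is simple arithmetic. Here is a minimal sketch using the article's 15-20K injection figure; the other token counts (skill definition, history, response reserve) are illustrative assumptions, not measured values:

```python
# Context budget for a single skill execution, using the article's estimates.
# All numbers are rough planning figures, not measurements.

WORKSPACE_INJECTION = 20_000   # upper end of the 15-20K injected context
SKILL_DEFINITION = 4_000       # assumed size of a typical skill definition
CONVERSATION_HISTORY = 8_000   # assumed running conversation history
RESPONSE_RESERVE = 4_000       # room the model needs to write its answer

def remaining_task_budget(context_window: int) -> int:
    """Tokens left for the actual task after fixed overhead."""
    overhead = (WORKSPACE_INJECTION + SKILL_DEFINITION
                + CONVERSATION_HISTORY + RESPONSE_RESERVE)
    return context_window - overhead

print(remaining_task_budget(32_000))   # negative: a 32K window is already over budget
print(remaining_task_budget(64_000))   # 28000: a 64K window leaves real working room
```

Under these assumptions a 32K model truncates before the task even starts, which is why 64K is the floor rather than a nice-to-have.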
Tier 1: Premium Skill Execution
Qwen3.5 27B — Highest Quality for Skill Workflows
| Spec | Detail |
|---|---|
| VRAM Required | ~20 GB |
| Speed (RTX 4090) | ~40 tokens/sec |
| Context Window | 128K |
| Skill Compatibility | Excellent |
This is the best local model for running Bazaar skills in 2026. Qwen3.5 27B handles complex multi-step skills — research pipelines, code review workflows, document analysis chains — with reliability that approaches cloud models. Its tool calling accuracy on BFCL-V4 (72.2) means marketplace skills that chain three or four tool invocations execute cleanly.
Best for: Operators with an RTX 4090, RTX 3090, or a 32 GB Apple Silicon Mac who want to run the full Bazaar skill catalog locally without quality compromises.
ollama pull qwen3.5:27b
Qwen3 Coder Plus 72B — Maximum Local Capability
| Spec | Detail |
|---|---|
| VRAM Required | 48 GB+ |
| Speed (RTX 4090) | ~25 tokens/sec |
| Context Window | 128K |
| Skill Compatibility | Excellent |
The most powerful model you can run locally for Bazaar skills. Excels at coding skills, multi-file refactoring workflows, and skills that require synthesizing information from multiple sources. Demands serious hardware — dual GPUs or an M2 Ultra with 64 GB+ memory.
Best for: Power users running the most compute-intensive skills in the directory, particularly code-focused and research-heavy skill categories.
ollama pull qwen3-coder-plus:72b
Tier 2: Best Value for Everyday Skills
Qwen3.5 35B-A3B (MoE) — Speed King for High-Volume Skills
| Spec | Detail |
|---|---|
| VRAM Required | ~16 GB |
| Speed (RTX 3090) | ~112 tokens/sec |
| Context Window | 128K |
| Skill Compatibility | Very Good |
The Mixture-of-Experts architecture activates only 3B parameters per forward pass despite having 35B total. The result is blazing inference speed — 112 tokens/sec on an RTX 3090. For skills that run frequently throughout the day (email triage, message classification, quick summaries), this model eliminates timeout issues and feels nearly instant.
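Because streaming is disabled during skill execution, throughput translates directly into wall-clock wait time. A rough sketch, assuming a hypothetical 120-second timeout (not an OpenClaw default) and ignoring prompt-processing time:

```python
# Rough end-to-end latency for a non-streamed skill response.
# The 120-second timeout is a hypothetical illustration, not a documented default.

TIMEOUT_SECONDS = 120

def response_seconds(output_tokens: int, tokens_per_sec: float) -> float:
    """Time to generate the full response (prompt processing ignored)."""
    return output_tokens / tokens_per_sec

# A long 4,000-token skill response:
fast = response_seconds(4_000, 112)   # ~36 s on the 35B-A3B MoE
slow = response_seconds(4_000, 25)    # 160 s on the 72B dense model

print(fast < TIMEOUT_SECONDS)   # True  -> completes comfortably
print(slow < TIMEOUT_SECONDS)   # False -> would trip the timeout
```

This is why the MoE model "eliminates timeout issues" for high-volume skills: at 112 tokens/sec even long responses finish with margin to spare.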
Best for: Operators running high-volume skill workloads where speed matters more than peak reasoning depth. Ideal for personas that bundle many lightweight skills.
ollama pull qwen3.5:35b-a3b
GLM-4.7 Flash — The Reliable Default
| Spec | Detail |
|---|---|
| VRAM Required | ~12 GB |
| Speed | Fast |
| Context Window | 128K |
| Skill Compatibility | Good |
This is the officially recommended starting model for first-time Bazaar skill users. Solid tool calling, reliable instruction following, and modest hardware requirements. It handles the majority of marketplace skills — productivity workflows, writing assistants, simple automation — without issues.
Best for: Getting started with Bazaar skills on machines with 16 GB RAM/VRAM. The "it just works" choice when you want to install a few skills and start seeing results immediately.
ollama pull glm-4.7-flash
Tier 3: Entry-Level Skill Execution
Qwen3.5 9B — Minimum Hardware, Maximum Reach
| Spec | Detail |
|---|---|
| VRAM Required | ~8 GB |
| Speed (RTX 4090) | ~80 tokens/sec |
| Context Window | 128K |
| Skill Compatibility | Moderate |
The best option for 8 GB VRAM setups. Handles basic Bazaar skills — scheduling, file organization, simple email processing — acceptably. Struggles with skills that chain multiple complex tool calls or require nuanced reasoning.
Best for: Testing marketplace skills on budget hardware before deciding whether to invest in a more powerful setup.
ollama pull qwen3.5:9b
Models to Avoid for Bazaar Skills
Not every popular Ollama model works with marketplace skills. Skip these:
- Any model under 7B parameters — Cannot handle the tool calling and context injection that skills require.
- Older Mistral and Llama models (pre-3.3) — Inconsistent tool call formatting breaks skill workflows.
- Reasoning-mode models without configuration changes — DeepSeek-R1 and similar models generate internal chain-of-thought that interferes with tool execution. Set "reasoning": false in your config if you must use them.
- Models with under 64K context — The 15-20K token context injection from skill definitions leaves insufficient room for the actual task.
- Heavily quantized models (Q2, Q3) — Save memory but degrade tool calling reliability below usable thresholds for skills.
Hardware Guide for Skill Workloads
| Your Hardware | Best Model | Skill Experience |
|---|---|---|
| 8 GB VRAM / 8 GB Mac | Qwen3.5 9B | Basic skills, some limitations |
| 12 GB VRAM / 16 GB Mac | GLM-4.7 Flash | Solid daily skill execution |
| 16 GB VRAM / 16 GB Mac | Qwen3.5 35B-A3B (MoE) | Fast, handles most skill categories |
| 20-24 GB VRAM / 32 GB Mac | Qwen3.5 27B | Near cloud-quality skill execution |
| 48 GB+ VRAM / 64 GB+ Mac | Qwen3 Coder Plus 72B | Full Bazaar catalog, no compromises |
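The VRAM figures in the table follow a common back-of-envelope rule for 4-bit quantized models: roughly half a gigabyte per billion parameters, plus overhead. A sketch of that rule of thumb only; real usage varies with quantization scheme, KV-cache size, and context length:

```python
# Back-of-envelope VRAM estimate for a Q4-quantized model.
# Rule of thumb only: actual usage depends on quantization format,
# KV-cache size, and how much context you actually fill.

def est_vram_gb(params_billion: float, bits: int = 4, overhead: float = 1.2) -> float:
    """params * (bits / 8) bytes per parameter, plus ~20% runtime overhead."""
    return params_billion * (bits / 8) * overhead

print(round(est_vram_gb(27), 1))   # ~16.2 -> fits the ~20 GB row with headroom
print(round(est_vram_gb(72), 1))   # ~43.2 -> consistent with the "48 GB+" row
print(round(est_vram_gb(9), 1))    # ~5.4  -> fits an 8 GB card
```

Useful when sizing a model that is not in the table: estimate first, then leave a few gigabytes free for the KV cache at long contexts.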
Configuration Tips for Skill Reliability
Proper Ollama configuration prevents the most common skill failures:
Use the native Ollama API endpoint:
{
"baseUrl": "http://localhost:11434"
}
Never use the /v1 endpoint — the OpenAI-compatible layer breaks tool calling with Bazaar skills.
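For reference, here is the general shape of a native Ollama /api/chat request with a tool attached. The "create_event" tool is a hypothetical skill action for illustration, not a real Bazaar API:

```python
import json

# Shape of a native Ollama /api/chat request with one tool attached.
# "create_event" is a hypothetical example tool, not part of any real skill.

payload = {
    "model": "glm-4.7-flash",   # must match the name in `ollama list` exactly
    "stream": False,            # skill execution needs the full response at once
    "messages": [
        {"role": "user", "content": "Schedule a review for Friday at 3pm."}
    ],
    "tools": [{
        "type": "function",
        "function": {
            "name": "create_event",
            "description": "Create a calendar event",
            "parameters": {
                "type": "object",
                "properties": {
                    "title": {"type": "string"},
                    "start": {"type": "string", "description": "ISO 8601 time"},
                },
                "required": ["title", "start"],
            },
        },
    }],
}

# POST this body to http://localhost:11434/api/chat (the native endpoint, not /v1)
print(json.dumps(payload, indent=2))
```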
Disable reasoning mode for models that support it. Reasoning chains interfere with tool execution in skill workflows.
Match model names exactly. Run ollama list and copy the precise name. A mismatch produces a confusing "model not allowed" error rather than a helpful "model not found" message.
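The exact-match rule can be seen in a toy allow-list check. validate_model below is a hypothetical helper, not part of OpenClaw or Ollama; it just shows why near-miss names fail hard:

```python
# Why "match model names exactly" matters: allow-lists compare names verbatim.
# validate_model is a hypothetical illustration, not a real OpenClaw function.

def validate_model(configured: str, installed: list[str]) -> str:
    if configured in installed:
        return "ok"
    # A near-miss (tag omitted, case changed) is still a hard mismatch.
    return "model not allowed"

installed = ["qwen3.5:27b", "glm-4.7-flash"]   # as reported by `ollama list`

print(validate_model("qwen3.5:27b", installed))    # ok
print(validate_model("qwen3.5", installed))        # model not allowed (tag missing)
print(validate_model("GLM-4.7-Flash", installed))  # model not allowed (case differs)
```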
The Hybrid Strategy: Local Skills Plus Cloud Fallback
The most effective Bazaar skill setup uses local models for routine skills (60-70% of your workload) and routes complex skills to cloud models (30-40%):
| Skill Type | Model | Rationale |
|---|---|---|
| Email triage, scheduling | Local (GLM-4.7 Flash) | Fast, private, zero cost |
| File management, tagging | Local (Qwen3.5 9B+) | Simple operations, instant |
| Research synthesis | Cloud (Claude/Gemini) | Needs strong reasoning |
| Multi-file code refactoring | Cloud (Claude/GPT-5) | Local models struggle with complex code skills |
| Long-document analysis | Cloud (Gemini 2.5 Pro) | Benefits from 1M context |
This hybrid approach gives you the cost savings and privacy of local execution for everyday skills, with cloud quality available on demand for the skills that genuinely need it.
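The routing table above can be sketched as a simple lookup. Categories and model names here are illustrative, not an OpenClaw API:

```python
# A minimal local-vs-cloud router mirroring the hybrid-strategy table.
# Skill categories and model identifiers are illustrative assumptions.

ROUTES = {
    "email_triage": ("local", "glm-4.7-flash"),
    "file_management": ("local", "qwen3.5:9b"),
    "research_synthesis": ("cloud", "claude"),
    "code_refactoring": ("cloud", "gpt-5"),
    "long_document_analysis": ("cloud", "gemini-2.5-pro"),
}

def route_skill(category: str) -> tuple[str, str]:
    """Unknown categories default to the local workhorse model."""
    return ROUTES.get(category, ("local", "glm-4.7-flash"))

print(route_skill("email_triage"))        # ('local', 'glm-4.7-flash')
print(route_skill("research_synthesis"))  # ('cloud', 'claude')
print(route_skill("something_new"))       # falls back to ('local', 'glm-4.7-flash')
```

Defaulting unknown skills to the local model keeps the routine 60-70% of the workload free and private, escalating to the cloud only for the categories that need it.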
Browse the Skills Directory
Find the right skill for your workflow. The OpenClaw Bazaar skills directory has over 2,300 community-rated skills — searchable, sortable, and free to install.
Try a Pre-Built Persona
Don't want to configure everything from scratch? OpenClaw personas come pre-loaded with skills, memory templates, and workflows designed for specific roles. Compare personas →