Remote OpenClaw Blog

Best Ollama Models for Running OpenClaw Bazaar Skills Locally: Ranked and Tested

6 min read

Running marketplace skills from OpenClaw Bazaar on local hardware with Ollama means zero API costs, complete data privacy, and no rate limits. The trade-off is that not every local model can handle skill execution reliably. Skills depend on tool calling, large context windows, and precise instruction following — capabilities that separate the best local models from the rest.

This guide ranks the Ollama models that actually work for Bazaar skill execution, based on real-world testing across dozens of skill categories. We cover hardware requirements, speed expectations, and which models to avoid entirely.

What Marketplace Skills Demand From a Local Model

OpenClaw Bazaar skills are more demanding than general chat. Every skill execution involves injecting 15-20K tokens of workspace context, skill definitions, and memory into the prompt before the actual task even begins. The model then needs to follow structured instructions and execute tool calls accurately.

Four requirements separate usable models from unusable ones:

  1. Tool calling support — Bazaar skills trigger actions through function calls. If the model cannot format tool invocations reliably, skills fail silently or produce garbage outputs.
  2. 64K+ token context window — With 15-20K tokens of injected context plus the skill definition plus conversation history, anything less than 64K truncates critical information.
  3. Precise instruction adherence — Skills contain step-by-step instructions. Models that paraphrase, skip steps, or hallucinate actions break the workflow.
  4. Adequate inference speed — Streaming is disabled for Ollama by default in skill execution. The full response must complete before the next step begins. Slow models cause timeout failures.

The benchmark that matters most for skill compatibility is BFCL-V4 (Berkeley Function Calling Leaderboard), which measures tool-call reliability — the core interaction pattern for every Bazaar skill.
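The context-window requirement can be sanity-checked with simple arithmetic. A minimal sketch, using the rough injected-context figure quoted above; the skill-definition, history, and output budgets are illustrative assumptions, not measured values:

```python
def fits_context(context_window: int,
                 injected: int = 20_000,    # workspace context + memory (upper bound from above)
                 skill_def: int = 4_000,    # assumed typical skill definition size
                 history: int = 8_000,      # assumed conversation-history budget
                 output: int = 4_000) -> bool:
    """Return True if the model's context window leaves room for the actual task."""
    return context_window >= injected + skill_def + history + output

# A 32K model already fails the budget; 64K+ clears it with headroom.
fits_context(32_000)   # False
fits_context(64_000)   # True
```

This is why the 64K floor is a hard cutoff rather than a soft preference: the injection happens before the task even starts, so a smaller window silently truncates whatever arrives last.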

Tier 1: Premium Skill Execution

Qwen3.5 27B — Highest Quality for Skill Workflows

VRAM Required: ~20 GB
Speed (RTX 4090): ~40 tokens/sec
Context Window: 128K
Skill Compatibility: Excellent

This is the best local model for running Bazaar skills in 2026. Qwen3.5 27B handles complex multi-step skills — research pipelines, code review workflows, document analysis chains — with reliability that approaches cloud models. Its tool calling accuracy on BFCL-V4 (72.2) means marketplace skills that chain three or four tool invocations execute cleanly.

Best for: Operators with an RTX 4090, RTX 3090, or a 32 GB Apple Silicon Mac who want to run the full Bazaar skill catalog locally without quality compromises.

ollama pull qwen3.5:27b

Qwen3 Coder Plus 72B — Maximum Local Capability

VRAM Required: 48 GB+
Speed (RTX 4090): ~25 tokens/sec
Context Window: 128K
Skill Compatibility: Excellent

The most powerful model you can run locally for Bazaar skills. Excels at coding skills, multi-file refactoring workflows, and skills that require synthesizing information from multiple sources. Demands serious hardware — dual GPUs or an M2 Ultra with 64 GB+ memory.

Best for: Power users running the most compute-intensive skills in the directory, particularly code-focused and research-heavy skill categories.

ollama pull qwen3-coder-plus:72b

Tier 2: Best Value for Everyday Skills

Qwen3.5 35B-A3B (MoE) — Speed King for High-Volume Skills

VRAM Required: ~16 GB
Speed (RTX 3090): ~112 tokens/sec
Context Window: 128K
Skill Compatibility: Very Good

The Mixture-of-Experts architecture activates only 3B parameters per forward pass despite having 35B total. The result is blazing inference speed — 112 tokens/sec on an RTX 3090. For skills that run frequently throughout the day (email triage, message classification, quick summaries), this model eliminates timeout issues and feels nearly instant.
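Because streaming is disabled during skill execution, raw tokens/sec translates directly into wall-clock wait time for each step. A quick estimate, using the speeds quoted in this guide (the 2,000-token response length is an assumed example):

```python
def completion_seconds(output_tokens: int, tokens_per_sec: float) -> float:
    """Estimate wall-clock time for a non-streamed skill response."""
    return output_tokens / tokens_per_sec

# A 2,000-token skill response:
round(completion_seconds(2000, 112), 1)  # 17.9 s on the 35B-A3B MoE
round(completion_seconds(2000, 25), 1)   # 80.0 s on the 72B -- timeout territory
```

For a skill that fires dozens of times a day, the difference between an 18-second step and an 80-second step is the difference between "instant" and a timeout failure.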

Best for: Operators running high-volume skill workloads where speed matters more than peak reasoning depth. Ideal for personas that bundle many lightweight skills.

ollama pull qwen3.5:35b-a3b

GLM-4.7 Flash — The Reliable Default

VRAM Required: ~12 GB
Speed: Fast
Context Window: 128K
Skill Compatibility: Good

This is the officially recommended starting model for first-time Bazaar skill users. Solid tool calling, reliable instruction following, and modest hardware requirements. It handles the majority of marketplace skills — productivity workflows, writing assistants, simple automation — without issues.

Best for: Getting started with Bazaar skills on machines with 16 GB RAM/VRAM. The "it just works" choice when you want to install a few skills and start seeing results immediately.

ollama pull glm-4.7-flash

Tier 3: Entry-Level Skill Execution

Qwen3.5 9B — Minimum Hardware, Maximum Reach

VRAM Required: ~8 GB
Speed (RTX 4090): ~80 tokens/sec
Context Window: 128K
Skill Compatibility: Moderate

The best option for 8 GB VRAM setups. Handles basic Bazaar skills — scheduling, file organization, simple email processing — acceptably. Struggles with skills that chain multiple complex tool calls or require nuanced reasoning.

Best for: Testing marketplace skills on budget hardware before deciding whether to invest in a more powerful setup.

ollama pull qwen3.5:9b

Models to Avoid for Bazaar Skills

Not every popular Ollama model works with marketplace skills. Skip these:

  • Any model under 7B parameters — Cannot handle the tool calling and context injection that skills require.
  • Older Mistral and Llama models (pre-3.3) — Inconsistent tool call formatting breaks skill workflows.
  • Reasoning-mode models without configuration changes — DeepSeek-R1 and similar models generate internal chain-of-thought that interferes with tool execution. Set "reasoning": false in your config if you must use them.
  • Models with under 64K context — The 15-20K token context injection from skill definitions leaves insufficient room for the actual task.
  • Heavily quantized models (Q2, Q3) — Save memory but degrade tool calling reliability below usable thresholds for skills.
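The avoid-list above amounts to a simple gate you can apply to any candidate model. A sketch (the thresholds come straight from the list; the parameter names are illustrative):

```python
def skill_ready(params_b: float, context: int, quant: str, tool_calling: bool) -> bool:
    """Apply the avoid-list rules as a pass/fail gate on a candidate model."""
    if params_b < 7:                    # under 7B: can't handle tool calls + context injection
        return False
    if context < 64_000:                # under 64K: injection leaves no room for the task
        return False
    if quant.upper() in {"Q2", "Q3"}:   # heavy quantization degrades tool-call reliability
        return False
    return tool_calling                 # no tool-call support means skills fail outright

skill_ready(27, 128_000, "Q4", True)   # True  (e.g. Qwen3.5 27B at Q4)
skill_ready(27, 128_000, "Q2", True)   # False (too heavily quantized)
```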

Hardware Guide for Skill Workloads

  • 8 GB VRAM / 8 GB Mac: Qwen3.5 9B. Basic skills, some limitations.
  • 12 GB VRAM / 16 GB Mac: GLM-4.7 Flash. Solid daily skill execution.
  • 16 GB VRAM / 16 GB Mac: Qwen3.5 35B-A3B (MoE). Fast, handles most skill categories.
  • 20-24 GB VRAM / 32 GB Mac: Qwen3.5 27B. Near cloud-quality skill execution.
  • 48 GB+ VRAM / 64 GB+ Mac: Qwen3 Coder Plus 72B. Full Bazaar catalog, no compromises.

Configuration Tips for Skill Reliability

Proper Ollama configuration prevents the most common skill failures:

Use the native Ollama API endpoint:

{
  "baseUrl": "http://localhost:11434"
}

Never use the /v1 endpoint — the OpenAI-compatible layer breaks tool calling with Bazaar skills.
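For reference, this is the general request shape for a tool-calling call against Ollama's native /api/chat endpoint. The tool name and its parameters here are hypothetical illustrations, not a real Bazaar skill definition:

```python
import json

# Request payload for POST http://localhost:11434/api/chat (never the /v1 path).
payload = {
    "model": "glm-4.7-flash",
    "stream": False,                       # skill execution waits for the full response
    "messages": [{"role": "user", "content": "Triage my inbox"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "label_email",         # hypothetical skill action
            "description": "Apply a label to an email",
            "parameters": {
                "type": "object",
                "properties": {"label": {"type": "string"}},
                "required": ["label"],
            },
        },
    }],
}

body = json.dumps(payload)
```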

Disable reasoning mode for models that support it. Reasoning chains interfere with tool execution in skill workflows.

Match model names exactly. Run ollama list and copy the precise name. A mismatch produces a confusing "model not allowed" error rather than a helpful "model not found" message.
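If you want to catch the mismatch programmatically, you can parse the first column of `ollama list` output; the column layout below reflects its typical format, and the sample rows are illustrative:

```python
def installed_models(ollama_list_output: str) -> set:
    """Extract model names from `ollama list` output (first column, header skipped)."""
    lines = ollama_list_output.strip().splitlines()[1:]
    return {line.split()[0] for line in lines if line.strip()}

sample = """NAME              ID              SIZE     MODIFIED
qwen3.5:27b       abc123def456    20 GB    2 days ago
glm-4.7-flash     789abc012def    12 GB    5 hours ago"""

models = installed_models(sample)
"qwen3.5:27b" in models   # True
"qwen3.5-27b" in models   # False -- a near-miss name triggers "model not allowed"
```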

The Hybrid Strategy: Local Skills Plus Cloud Fallback

The most effective Bazaar skill setup uses local models for routine skills (60-70% of your workload) and routes complex skills to cloud models (30-40%):

  • Email triage, scheduling: Local (GLM-4.7 Flash). Fast, private, zero cost.
  • File management, tagging: Local (Qwen3.5 9B+). Simple operations, instant.
  • Research synthesis: Cloud (Claude/Gemini). Needs strong reasoning.
  • Multi-file code refactoring: Cloud (Claude/GPT-5). Local models struggle with complex code skills.
  • Long-document analysis: Cloud (Gemini 2.5 Pro). Benefits from 1M context.

This hybrid approach gives you the cost savings and privacy of local execution for everyday skills, with cloud quality available on demand for the skills that genuinely need it.
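The routing logic behind the hybrid strategy can be sketched in a few lines. The skill-type names and the local/cloud split mirror the breakdown above; the mapping itself is illustrative, not an OpenClaw API:

```python
# Routine skill types that stay on the local model (from the table above).
LOCAL_SKILLS = {"email_triage", "scheduling", "file_management", "tagging"}

def route(skill_type: str) -> str:
    """Send routine skills to the local model, everything else to the cloud."""
    return "local" if skill_type in LOCAL_SKILLS else "cloud"

route("email_triage")         # "local"
route("research_synthesis")   # "cloud"
```

In practice a real router would also weigh context length and expected output size, but even this coarse split captures the 60-70% of workload that never needs to leave your machine.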


Browse the Skills Directory

Find the right skill for your workflow. The OpenClaw Bazaar skills directory has over 2,300 community-rated skills — searchable, sortable, and free to install.

Browse Skills →

Try a Pre-Built Persona

Don't want to configure everything from scratch? OpenClaw personas come pre-loaded with skills, memory templates, and workflows designed for specific roles. Compare personas →