
Remote OpenClaw Blog

Best Ollama Models for OpenClaw: Local LLMs Ranked and Tested [2026]



Author: Zac Frulloni

The best Ollama models for OpenClaw in 2026, ranked by performance, speed, and hardware requirements. Includes benchmarks, RAM/VRAM needs, and which models to avoid.

Not every Ollama model works with OpenClaw. Some are too small. Some can't handle tool calling. Some run out of context halfway through a task.

OpenClaw is demanding. It injects 15-20K tokens of workspace files, skill descriptions, and memory into every single request before your actual prompt even begins. It needs reliable tool calling to execute actions. And it needs a context window of at least 64K tokens to function properly.

That narrows the field significantly.
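To see why, run the rough arithmetic. The numbers below are illustrative, assuming roughly 20K tokens of injected context:

```shell
# Context budget with a 32K-context model, assuming ~20K tokens of
# injected workspace files, skills, and memory (illustrative figures).
CONTEXT=32768
INJECTED=20000
echo $((CONTEXT - INJECTED))   # tokens left for the prompt and response: 12768
```

With only ~12K tokens left over, a few conversational turns exhaust the window — which is why 64K is the practical floor.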

This guide ranks the best Ollama models for OpenClaw based on real-world testing — covering performance, speed, hardware requirements, and which models to avoid entirely.


Marketplace

Free skills and AI personas for OpenClaw — deploy a pre-built agent in 15 minutes.

Browse the Marketplace →

Join the Community

Join 500+ OpenClaw operators sharing deployment guides, security configs, and workflow automations.

What Makes a Good OpenClaw Model?

OpenClaw requires models with reliable tool calling, 64K+ context windows, precise instruction following, and fast inference speeds to avoid timeouts during autonomous task execution.

Before the rankings, here's what OpenClaw actually needs from a model:

  1. Tool calling support — OpenClaw executes actions (read files, send emails, browse web, manage calendar) through tool calls. If the model can't reliably format and execute tool calls, it's useless for OpenClaw.
  2. 64K+ token context window — OpenClaw injects workspace files on every request. With a smaller context window, the model loses track of the conversation or truncates critical instructions.
  3. Instruction following — The model needs to follow OpenClaw's system prompts precisely, not hallucinate actions or deviate from the requested task.
  4. Speed — OpenClaw disables streaming by default for Ollama. The entire response must complete before anything is returned. Slow models cause timeouts.

The benchmark that matters most here is BFCL-V4 (Berkeley Function Calling Leaderboard) — it measures how reliably a model can handle tool/function calls, which is OpenClaw's primary interaction pattern.


How Do the Best Ollama Models for OpenClaw Rank?

The Qwen3.5 family dominates local model performance for OpenClaw in 2026, with the 27B dense model offering the best quality-to-size ratio and the 35B-A3B MoE variant leading on speed.

Tier 1: Best Overall Performance

Qwen3.5 27B — Best Quality-to-Size Ratio

| Spec | Detail |
| --- | --- |
| Parameters | 27B |
| VRAM Required | ~20 GB |
| Speed (RTX 4090) | ~40 tokens/sec |
| SWE-bench | 72.4% |
| BFCL-V4 | 72.2 |
| Context Window | 128K |

This is the model to beat. Qwen3.5 27B hits 72.4% on SWE-bench — the same range as GPT-5 Mini — while running on a single RTX 4090 or a 32 GB Apple Silicon Mac. Tool calling is reliable and consistent.

Best for: Developers with an RTX 4090, RTX 3090, or a Mac with 32 GB+ unified memory who want the closest thing to cloud-model quality running locally.

The catch: At 40 tokens/sec, it's not blazing fast. Complex queries with long outputs can bump against OpenClaw's timeout limits. But for most tasks, it's more than fast enough.
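The latency is easy to estimate. With streaming disabled, the full response must complete before OpenClaw sees anything (response length here is hypothetical):

```shell
# Seconds before OpenClaw receives any output, since the whole response
# must finish first (streaming is disabled by default for Ollama).
TOKENS=2000   # a long response, e.g. a generated file
RATE=40       # tokens per second on an RTX 4090
echo $((TOKENS / RATE))   # 50 seconds
```

Short answers return in a few seconds; it's the long, file-generating responses that brush up against timeouts.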

ollama pull qwen3.5:27b

Qwen3 Coder Plus 72B — Maximum Capability

| Spec | Detail |
| --- | --- |
| Parameters | 72B |
| VRAM Required | 48 GB+ |
| Speed (RTX 4090) | ~25 tokens/sec |
| SWE-bench | 70.6% |
| Context Window | 128K |

The most capable local model you can run for OpenClaw. Excels at complex coding tasks, multi-step reasoning, and long-context analysis. But it demands serious hardware — you need dual GPUs, an A100, or an M2 Ultra with 64 GB+ memory.

Best for: Power users with high-end hardware who need maximum local performance and can't or won't use cloud APIs.

The catch: Slow (25 t/s) and requires premium hardware. For most people, Qwen3.5 27B delivers 90% of the quality at half the hardware cost.

ollama pull qwen3-coder-plus:72b

Tier 2: Best Bang for the Buck

Qwen3.5 35B-A3B (MoE) — Fastest Local Option

| Spec | Detail |
| --- | --- |
| Parameters | 35B total (3B active per token) |
| VRAM Required | ~16 GB |
| Speed (RTX 3090) | ~112 tokens/sec |
| Context Window | 128K |

This is the speed king. The 35B-A3B is a Mixture-of-Experts model that only activates 3 billion parameters per forward pass, despite having 35 billion total. The result is dramatically faster inference — 112 tokens/sec on an RTX 3090 — while using far less memory than its parameter count suggests.
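Note that the low memory figure comes mostly from quantization — all 35B weights still have to sit in memory; sparse activation buys speed, not footprint. A rough estimate at 4-bit quantization (~0.5 bytes per parameter, with KV cache and runtime overhead on top):

```shell
# Approximate weight memory for a 35B-parameter model quantized to ~4 bits
# per weight (~0.5 bytes/param). KV cache and overhead are extra.
awk 'BEGIN { printf "%.1f GB\n", 35e9 * 0.5 / 1e9 }'   # 17.5 GB
```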

Best for: Users who want the fastest possible local experience. Ideal for high-volume automation tasks where speed matters more than peak reasoning quality. Eliminates timeout issues that plague denser models.

The catch: Quality is slightly lower than the dense 27B model on complex reasoning tasks. But for routine OpenClaw work (email, calendar, file management, simple coding), the difference is negligible.

ollama pull qwen3.5:35b-a3b

GLM-4.7 Flash — The Official Default

| Spec | Detail |
| --- | --- |
| Parameters | ~14B |
| VRAM Required | ~12 GB |
| Speed | Fast |
| Context Window | 128K |

This is the model that OpenClaw and Ollama officially recommend as the default local option. Good balance of speed, capability, and hardware requirements. Reliable tool calling, solid instruction following.

Best for: First-time OpenClaw users, machines with 16 GB RAM/VRAM, anyone who wants the "it just works" option.

The catch: Noticeably less capable than the Qwen3.5 27B on complex tasks. Fine for daily automation, but you'll feel the gap on anything that requires sophisticated reasoning.

ollama pull glm-4.7-flash

Qwen3 32B — Balanced Performer

| Spec | Detail |
| --- | --- |
| Parameters | 32B |
| VRAM Required | ~24 GB |
| Speed (RTX 4090) | ~30 tokens/sec |
| Context Window | 128K |

Solid all-rounder that sits between the 27B and 72B options. Good tool calling, reliable instruction following, handles multi-step tasks well.

Best for: Users with an RTX 4090 or 32 GB+ Apple Silicon who want a bit more headroom than the 27B model.

The catch: The Qwen3.5 27B outperforms it on SWE-bench despite being smaller. Unless you have a specific reason, the 27B is the better choice.

ollama pull qwen3:32b

Tier 3: Entry-Level Hardware

Qwen3.5 9B — Best for 8 GB VRAM

| Spec | Detail |
| --- | --- |
| Parameters | 9B |
| VRAM Required | ~8 GB |
| Speed (RTX 4090) | ~80 tokens/sec |
| Context Window | 128K |

The best option if you're constrained to 8 GB VRAM (GTX 1080, RTX 3060, or a base-model Mac with 8 GB unified memory). Fast, handles basic tool calling, and stays within context limits.

Best for: Budget hardware, laptops, older GPUs. Good enough for basic automation tasks — email summaries, simple scheduling, file organization.

The catch: Struggles with complex multi-step reasoning. Tool calling works but is less reliable than the larger Qwen3.5 models. You'll notice quality limitations on anything beyond routine tasks.

ollama pull qwen3.5:9b

Llama 3.3 8B — Minimum Viable Agent

| Spec | Detail |
| --- | --- |
| Parameters | 8B |
| VRAM Required | ~8 GB |
| Speed | Fast |
| Context Window | 128K |

The best starting point if you want to test OpenClaw on minimal hardware. Handles general task instructions reliably and fits in 8 GB RAM.

Best for: Testing, experimentation, getting a feel for OpenClaw before committing to better hardware.

The catch: Tool calling is less consistent than Qwen models. Not recommended as a daily driver for serious automation. Upgrade to a Qwen3.5 model as soon as your hardware allows.

ollama pull llama3.3:8b

Cloud Models Through Ollama

If your hardware can't run local models well, Ollama also provides access to cloud-hosted models. These require ollama signin but still use the same Ollama interface.

| Model | Best For | Notes |
| --- | --- | --- |
| kimi-k2.5:cloud | Multimodal reasoning | 1 trillion parameters. The most capable option through Ollama |
| minimax-m2.5:cloud | Fast productivity | Quick responses for routine tasks |
| glm-5:cloud | Reasoning and code | Strong general-purpose option |

Cloud models have per-token costs but no hardware requirements. Good fallback for complex tasks that local models can't handle.


What Hardware Do You Need for Each Ollama Model?

Your hardware determines which Ollama models you can run effectively with OpenClaw — from 8 GB entry-level setups to 64 GB+ configurations for maximum local capability.

| Your Hardware | Best Model | Expected Experience |
| --- | --- | --- |
| 8 GB VRAM / 8 GB Mac | Qwen3.5 9B | Basic automation, some limitations |
| 12 GB VRAM / 16 GB Mac | GLM-4.7 Flash | Solid daily driver for routine tasks |
| 16 GB VRAM / 16 GB Mac | Qwen3.5 35B-A3B (MoE) | Fast and capable through sparse activation |
| 20-24 GB VRAM / 32 GB Mac | Qwen3.5 27B | Near cloud-quality performance |
| 48 GB+ VRAM / 64 GB+ Mac | Qwen3 Coder Plus 72B | Maximum local capability |
| Any hardware | kimi-k2.5:cloud | Best performance, requires internet |


Which Ollama Models Should You Avoid for OpenClaw?

Not all popular Ollama models work well with OpenClaw — models under 7B parameters, older Mistral and Llama builds, and anything with less than 64K context should be skipped entirely.

  • Any model under 7B parameters — Cannot reliably handle OpenClaw's tool-calling requirements. Nanbeige4.1-3B scores okay on BFCL-V4 but falls apart in practice on multi-step tasks.
  • Older Mistral models — Inconsistent tool-calling format. Newer Mistral models are better but still not as reliable as Qwen for OpenClaw.
  • Older Llama models (pre-3.3) — Tool calling is unreliable. Llama 3.3+ is usable but Qwen remains the safer choice.
  • Reasoning-mode models without config changes — Models like DeepSeek-R1 can interfere with tool execution. Set "reasoning": false in your model config if you must use them.
  • Models with under 64K context — OpenClaw injects 15-20K tokens of workspace context on every request. Anything less than 64K leaves insufficient room for actual conversation.

How Should You Configure Ollama for Best OpenClaw Performance?

Proper Ollama configuration prevents the most common OpenClaw issues — use the native API endpoint, disable reasoning mode, match model names exactly, and monitor memory usage to avoid silent crashes.

1. Use the Native Ollama API

{
  "baseUrl": "http://localhost:11434"
}

Never use http://localhost:11434/v1. The OpenAI-compatible endpoint breaks tool calling with OpenClaw.

2. Disable Reasoning Mode

If using a model that supports reasoning/thinking modes:

{
  "reasoning": false
}

Reasoning mode generates internal chain-of-thought that can interfere with tool execution.

3. Match Model Names Exactly

Model names must match character-for-character between your Ollama installation and OpenClaw config. Run ollama list and copy-paste the exact name. A mismatch produces a confusing "model not allowed" error, not "model not found."

4. Monitor Memory Usage

OpenClaw's ~15-20K token context injection hits local models hard. If you're seeing crashes without clear error messages, it's likely OOM (out of memory). Drop to a smaller model or close memory-intensive background apps.
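Putting the section's rules together, a minimal provider block might look like this. This is a sketch only — the wrapper structure around these keys depends on your OpenClaw version, and the model name must match your ollama list output exactly:

```json
{
  "baseUrl": "http://localhost:11434",
  "model": "qwen3.5:27b",
  "reasoning": false
}
```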


What Is the Best Hybrid Local and Cloud Strategy for OpenClaw?

The best real-world OpenClaw setup uses local models for routine tasks like email and calendar (60-70% of workload) and reserves cloud models for complex reasoning and long-context analysis (30-40%).

| Task Type | Model | Why |
| --- | --- | --- |
| Email triage, summaries | Local (GLM-4.7 Flash) | Fast, private, no cost |
| Calendar management | Local (GLM-4.7 Flash) | Routine task, doesn't need peak intelligence |
| File organization | Local (Qwen3.5 9B+) | Simple operations, fast execution |
| Web research | Cloud (kimi-k2.5) | Needs strong reasoning for synthesis |
| Complex coding tasks | Cloud (Claude/GPT-4) | Local models struggle with multi-file refactors |
| Long-context analysis | Cloud (Claude/GPT-4) | Quality degrades past 32K context on consumer hardware |

Configure multiple providers in OpenClaw and route tasks to the right model. Use local models for the 60-70% of tasks that are routine. Save cloud models (and their costs) for the 30-40% that actually need them.
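As a sketch, a two-provider setup might look like the following. The key names here (providers, routing) are illustrative, not OpenClaw's actual schema — check the config reference for your version:

```json
{
  "providers": {
    "local": { "baseUrl": "http://localhost:11434", "model": "glm-4.7-flash" },
    "cloud": { "model": "kimi-k2.5:cloud" }
  },
  "routing": {
    "email": "local",
    "calendar": "local",
    "research": "cloud",
    "coding": "cloud"
  }
}
```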


Frequently Asked Questions

What's the single best model for OpenClaw with Ollama?

Qwen3.5 27B if your hardware supports it (~20 GB VRAM). Best overall quality with reliable tool calling. If you can't run it, GLM-4.7 Flash is the safe default.

Can I run multiple models simultaneously?

Yes, but each model consumes VRAM. On a 24 GB GPU, you can run one 20 GB model or two smaller ones. Ollama handles model loading and unloading automatically, but switching models takes a few seconds.

Do quantized models work with OpenClaw?

Yes. Most Ollama models are already quantized (Q4_K_M is common). Heavier quantization (Q2, Q3) saves memory but degrades tool-calling reliability. Stick with Q4 or higher for OpenClaw.

How much does running local models cost in electricity?

Roughly $0.10-$0.50 per day for a desktop GPU running moderate workloads. Far less than cloud API costs for equivalent usage.
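Those figures check out under simple assumptions — say a 350 W GPU under load for 8 hours a day at $0.15/kWh (your wattage, duty cycle, and electricity rate will differ):

```shell
# Daily electricity cost: watts * hours / 1000 = kWh, times price per kWh.
awk 'BEGIN { printf "$%.2f/day\n", 350 * 8 / 1000 * 0.15 }'   # $0.42/day
```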

Will local models ever match cloud models for OpenClaw?

They're closing the gap fast. Qwen3.5 27B already matches GPT-5 Mini on SWE-bench. But cloud models like Claude Opus 4.6 still significantly outperform on complex multi-step reasoning. For routine OpenClaw tasks, the gap is already negligible.

Can I fine-tune a model specifically for OpenClaw?

Technically possible but not recommended. OpenClaw's prompting and tool-calling format changes with updates, which would break fine-tuned behavior. Better to use the best general-purpose model with strong tool calling.

How often should I update my models?

Check monthly. The Qwen and GLM families release frequent updates with improved tool calling. Run ollama pull <model> to update to the latest version.


What Is the Bottom Line on Ollama Models for OpenClaw?

For most OpenClaw users running Ollama locally in 2026, the Qwen3.5 family offers the best combination of tool-calling reliability, context handling, and speed-to-quality ratio available.

  • If you have 20+ GB VRAM: Run Qwen3.5 27B. It's the closest thing to cloud-quality AI running on your own hardware.
  • If you want maximum speed: Run Qwen3.5 35B-A3B. The MoE architecture gives you 112 tokens/sec with less memory than you'd expect.
  • If you're on budget hardware: Start with GLM-4.7 Flash or Qwen3.5 9B. Both are usable and free.
  • If quality matters most: Use cloud models through Ollama (kimi-k2.5:cloud) or configure a cloud provider (Claude/GPT-4) for complex tasks.

Pick the model that fits your hardware, start with routine tasks, and upgrade when you hit the ceiling.


Last updated: March 2026