Remote OpenClaw Blog
GLM-5.1 on OpenClaw: Setup, Benchmarks, and What Changed
GLM-5.1 is Z.ai's (formerly Zhipu AI) latest open-source model, released on April 7, 2026, scoring 58.4 on SWE-bench Pro, a new state-of-the-art result that outperforms GPT-5.4, Claude Opus 4.6, and Gemini 3.1 Pro on that benchmark. The model improves coding performance by 28% over GLM-5 through post-training optimization alone, using the same 744-billion-parameter Mixture-of-Experts architecture. For the complete GLM model lineup, see our Best GLM Models for OpenClaw guide.
What Changed from GLM-5 to GLM-5.1
GLM-5.1 is a post-training upgrade, not an architectural change. The base model is identical — 744 billion total parameters in a Mixture-of-Experts layout with 40 billion active parameters, a 200K token context window, and 131K token maximum output. According to Z.ai's developer documentation, all improvements come from Z.ai's "progressive alignment" pipeline: multi-task supervised fine-tuning, multi-stage reinforcement learning, and cross-stage distillation.
The improvements are concentrated in coding and agentic tasks. Reasoning benchmarks are essentially flat. The key practical gains include stronger self-debugging loops (runs linters, catches errors, iterates until tasks are complete), better planning (absorbs full context before generating code), and more reliable interleaved thinking for long-horizon tasks.
GLM-5.1 can work autonomously on a single task for up to 8 hours — completing the full cycle from planning through execution, testing, fixing, and delivery. According to The Decoder, it is the first Chinese model to reach this sustained execution capability.
Benchmark Comparison
GLM-5.1 shows substantial improvements over GLM-5 on coding benchmarks while maintaining comparable performance on reasoning tasks. The biggest jump is CyberGym at +20.4 points.
| Benchmark | GLM-5.1 | GLM-5 | Change | Claude Opus 4.6 |
|---|---|---|---|---|
| SWE-bench Pro | 58.4 | — | New SOTA | Lower than 58.4 |
| SWE-bench Verified | 77.8% | — | — | 80.8% |
| Z.ai Coding Eval | 45.3 | 35.4 | +28% | 48.1 |
| CyberGym | — | — | +20.4 pts | 66.6% |
| Terminal-Bench 2.0 | — | — | +7.3 pts | — |
| NL2Repo | — | — | +6.8 pts | — |
| Context window | 200K | 200K | No change | 200K |
| Max output | 131K | 131K | No change | 128K |
At 77.8% SWE-bench Verified, GLM-5.1 is within 3 points of Claude Opus 4.6 (80.8%) while costing roughly 80% less per token. On SWE-bench Pro — which tests harder, multi-file issues — GLM-5.1 at 58.4 outperforms all major closed-source competitors.
OpenClaw Setup
GLM-5.1 works with OpenClaw through three routes: the Z.ai API directly, OpenRouter, or local deployment with Ollama. The Z.ai API provides the lowest latency; OpenRouter provides the simplest billing.
Option 1: Z.ai API (Direct)
Run openclaw onboard to launch the setup wizard. Select a custom/OpenAI-compatible provider and configure as follows:
```shell
# Via the setup wizard
openclaw onboard

# When prompted:
#   Provider: Custom (OpenAI-compatible)
#   API Base URL: https://api.z.ai/v1
#   API Key: your Z.ai API key
#   Model: glm-5.1
```
For manual configuration, edit ~/.openclaw/openclaw.json:
```json
{
  "models": {
    "providers": {
      "zai": {
        "baseUrl": "https://api.z.ai/v1",
        "apiKey": "YOUR_ZAI_API_KEY",
        "models": ["glm-5.1"]
      }
    }
  }
}
```
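Before pointing OpenClaw at the endpoint, you can sanity-check the request shape yourself. A minimal sketch, assuming the endpoint follows the standard OpenAI-compatible chat-completions convention (the `/chat/completions` path and payload fields are that convention's, not something confirmed by Z.ai's docs):

```python
import json

# Base URL and model name taken from the config above.
BASE_URL = "https://api.z.ai/v1"

def build_chat_request(prompt: str, model: str = "glm-5.1") -> tuple[str, str]:
    """Return (url, json_body) for a single-turn chat completion request.

    Assumes an OpenAI-compatible endpoint; send the body with any HTTP
    client, with your API key in the Authorization header.
    """
    url = f"{BASE_URL}/chat/completions"
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    })
    return url, body

url, body = build_chat_request("Write a hello-world in Go.")
print(url)  # https://api.z.ai/v1/chat/completions
```

If a manual POST of this body returns a completion, the same `baseUrl`, key, and model name will work in `openclaw.json`.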
Option 2: OpenRouter
If you already use OpenRouter, GLM-5.1 is available as z-ai/glm-5.1:
```shell
openclaw onboard
# Provider: OpenRouter
# Model: z-ai/glm-5.1
```
Option 3: Local with Ollama
GLM-5.1 weights are available on Ollama, but running the full model locally requires substantial hardware — at minimum 32GB RAM for a q4 quantization of the 40B active parameters:
```shell
ollama pull glm-5.1
openclaw onboard
# Provider: Ollama
# Model: glm-5.1
```
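The 32GB figure follows from simple arithmetic on the active parameters. A rough back-of-the-envelope estimate, where the bytes-per-parameter and overhead factor are assumptions rather than Ollama's official sizing (note also that an MoE model may need more memory if all expert weights stay resident):

```python
# Rough memory estimate for GLM-5.1's 40B active parameters at q4 quantization.
# Assumptions: 4 bits (0.5 bytes) per parameter, plus ~30% overhead for the
# KV cache, activations, and runtime buffers. Not an official sizing guide.
ACTIVE_PARAMS = 40e9
BYTES_PER_PARAM_Q4 = 0.5   # 4-bit quantization = 0.5 bytes/parameter
OVERHEAD = 1.3             # KV cache + runtime overhead (assumption)

weights_gb = ACTIVE_PARAMS * BYTES_PER_PARAM_Q4 / 1e9
total_gb = weights_gb * OVERHEAD

print(f"weights: ~{weights_gb:.0f} GB, with overhead: ~{total_gb:.0f} GB")
# weights: ~20 GB, with overhead: ~26 GB
```

Under these assumptions a 32GB machine leaves headroom, which matches the minimum the article quotes.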
Pricing and Provider Options
GLM-5.1 is one of the most cost-effective frontier-adjacent models available as of April 2026. The open-source weights (MIT license) also allow self-hosting at zero marginal token cost.
| Provider | Input (per 1M tokens) | Output (per 1M tokens) | Notes |
|---|---|---|---|
| Z.ai API (direct) | Varies by plan | Varies by plan | Coding Plan from $3/mo (promo) or $10/mo (standard) |
| OpenRouter | $0.95 | $3.15 | Pay-per-token, no subscription |
| Self-hosted (Ollama/HuggingFace) | $0 (hardware cost only) | $0 (hardware cost only) | MIT license, 32GB+ RAM for q4 |
| Claude Opus 4.6 (comparison) | $5.00 | $25.00 | 5-8x more expensive per token |
At OpenRouter rates, GLM-5.1 delivers roughly 94% of Claude Opus 4.6's coding performance (measured by independent benchmarks) at approximately 80% lower cost per token.
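The cost gap is easy to quantify from the pricing table. A worked example for a hypothetical month of usage, where the 10M-input / 2M-output volume is purely illustrative:

```python
# Monthly cost at OpenRouter's GLM-5.1 rates vs Claude Opus 4.6,
# using the per-1M-token prices from the table above.
GLM_IN, GLM_OUT = 0.95, 3.15     # $ per 1M tokens
OPUS_IN, OPUS_OUT = 5.00, 25.00

input_mtok, output_mtok = 10, 2  # hypothetical monthly volume (assumption)

glm_cost = input_mtok * GLM_IN + output_mtok * GLM_OUT
opus_cost = input_mtok * OPUS_IN + output_mtok * OPUS_OUT
savings = 1 - glm_cost / opus_cost

print(f"GLM-5.1: ${glm_cost:.2f}, Opus 4.6: ${opus_cost:.2f}, savings: {savings:.0%}")
# GLM-5.1: $15.80, Opus 4.6: $100.00, savings: 84%
```

The exact percentage shifts with your input/output mix, since the output-price gap (roughly 8x) is wider than the input-price gap (roughly 5x).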
Migrating from GLM-5
If you are already running GLM-5 on OpenClaw, migrating to GLM-5.1 requires only a model name change. The API is compatible — no prompt format changes or configuration restructuring needed.
Quick Migration Steps
1. Open `~/.openclaw/openclaw.json`
2. Change `"glm-5"` to `"glm-5.1"` in your model configuration
3. Restart OpenClaw

```shell
# Verify the model is active
openclaw status
# Should show: Active model: glm-5.1
```
Z.ai maintains backward compatibility between GLM-5 and GLM-5.1 at the API level. System prompts, tool definitions, and skills that work with GLM-5 will work with GLM-5.1 without modification.
One behavioral change to expect: GLM-5.1 tends to plan more before generating code. This means slightly longer time-to-first-token but better results on complex, multi-file tasks. For short, simple tasks, the difference is negligible.
Limitations and Tradeoffs
GLM-5.1 is a strong coding model, but it has real constraints that matter for production use.
It is text-only — no vision or audio support. If your workflow requires image understanding or multimodal inputs, you need a different model.
English output, while improved, still occasionally shows awkward phrasing on complex tasks. The model's training corpus has a larger proportion of Chinese-language data, which makes it excellent for Chinese or multilingual workflows but slightly behind Western models on English nuance.
Inference speed on OpenRouter tops out around 70 tokens per second, which is slower than some competing models. For latency-sensitive applications, this matters.
The 28% coding improvement is real but concentrated. Reasoning benchmarks are essentially flat between GLM-5 and GLM-5.1. If your primary use case is not coding or agentic tasks, the upgrade provides minimal benefit.
When NOT to upgrade: if you need multimodal support, if your tasks are primarily reasoning or creative writing (not coding), or if you are cost-constrained and GLM-5 already meets your quality bar.
Related Guides
- GLM-5 on OpenClaw: Setup Guide
- Best GLM Models for OpenClaw
- Best Chinese Models for OpenClaw
- Best Cheap Models for OpenClaw
FAQ
Is GLM-5.1 better than Claude Opus 4.6 for coding?
On SWE-bench Pro, GLM-5.1 scores 58.4, which outperforms Claude Opus 4.6. On SWE-bench Verified, Opus 4.6 leads at 80.8% vs 77.8%. Overall, GLM-5.1 delivers roughly 94% of Opus 4.6's coding performance at about 80% lower cost, making it an excellent choice for cost-conscious teams.
Can I run GLM-5.1 locally?
Yes. The weights are available on HuggingFace under the MIT license and on Ollama. Running locally requires at least 32GB RAM for a q4 quantization. The full model at FP16 requires significantly more — a multi-GPU server is recommended for production local deployment.
What is the difference between GLM-5 and GLM-5.1?
GLM-5.1 uses the same 744B MoE architecture as GLM-5 but with improved post-training alignment focused on coding tasks. The result is a 28% improvement on Z.ai's coding eval (45.3 vs 35.4), with the biggest gains in CyberGym (+20.4 pts), Terminal-Bench 2.0 (+7.3), and NL2Repo (+6.8). Reasoning benchmarks are essentially unchanged.
How much does GLM-5.1 cost on OpenRouter?
GLM-5.1 costs $0.95 per million input tokens and $3.15 per million output tokens on OpenRouter as of April 2026. Z.ai also offers a Coding Plan starting at $3/month (promotional) or $10/month (standard) for direct API access.
Do my existing OpenClaw skills work with GLM-5.1?
Yes. GLM-5.1 is API-compatible with GLM-5. System prompts, tool definitions, and skills that work with GLM-5 will work with GLM-5.1 without modification. Migration requires only changing the model name in your configuration file.