Remote OpenClaw Blog
GLM-5 on OpenClaw: Setup Guide, Benchmarks, and When to Use It
8 min read
GLM-5 is the flagship large language model from Zhipu AI, a Beijing-based AI research lab that has been building the GLM (General Language Model) series since 2022. Released in February 2026 under the MIT license, GLM-5 is one of the largest open-weight Mixture of Experts models publicly available — 744 billion total parameters with 40 billion active per inference pass.
What makes GLM-5 notable beyond its raw size is the hardware story. It was trained entirely on Huawei Ascend 910B chips, making it the highest-performing model trained without any NVIDIA hardware. For operators tracking the AI hardware supply chain, this is a meaningful data point about what non-NVIDIA silicon can deliver.
For OpenClaw operators, GLM-5 offers a practical middle ground: frontier-class coding and math performance at a fraction of the cost of Claude or GPT-4. It is available through three different integration paths, making it one of the more accessible open models for agent backends.
GLM-5 uses a Mixture of Experts (MoE) architecture, which means the model has 744 billion total parameters but only activates approximately 40 billion on each forward pass. This design gives GLM-5 the knowledge capacity of a much larger model while keeping inference costs manageable.
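The routing idea behind this can be sketched in a few lines of plain Python. This is a toy illustration of top-k expert routing, not GLM-5's actual implementation: a gating network scores every expert, but only the top-k run on a given token, so most parameters sit idle on each forward pass.

```python
import math
import random

def moe_forward(x, gate, experts, k=2):
    """Toy MoE layer: score every expert, but run only the top-k."""
    scores = [sum(g * xi for g, xi in zip(row, x)) for row in gate]
    top = sorted(range(len(experts)), key=lambda i: scores[i])[-k:]
    exps = [math.exp(scores[i]) for i in top]
    weights = [e / sum(exps) for e in exps]   # softmax over selected experts
    out = [0.0] * len(x)
    for w, i in zip(weights, top):            # only k experts do any work
        hidden = [sum(r * xi for r, xi in zip(row, x)) for row in experts[i]]
        out = [o + w * h for o, h in zip(out, hidden)]
    return out, top

random.seed(0)
dim, n_experts = 8, 16
x = [random.gauss(0, 1) for _ in range(dim)]
gate = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_experts)]
experts = [[[random.gauss(0, 1) for _ in range(dim)] for _ in range(dim)]
           for _ in range(n_experts)]
y, used = moe_forward(x, gate, experts, k=2)
print(len(used), "of", n_experts, "experts active")  # 2 of 16 experts active
```

With k=2 of 16 experts, only 12.5% of expert parameters touch any one token; GLM-5's ratio (40B of 744B) is roughly 5%.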
| Specification | Value |
|---|---|
| Total Parameters | 744 billion |
| Active Parameters | ~40 billion per forward pass |
| Architecture | Mixture of Experts (MoE) |
| Training Hardware | Huawei Ascend 910B |
| Release Date | February 2026 |
| License | MIT |
| Developer | Zhipu AI (Z.ai) |
| Modalities | Text only |
| Context Window | 128K tokens |
The MIT license is the most permissive license any model of this scale has been released under. Unlike Meta's Llama license (which restricts companies above 700M monthly active users) or various custom licenses from Chinese labs, MIT imposes no usage restrictions whatsoever. You can use GLM-5 commercially, modify it, redistribute it, and build proprietary products on top of it.
GLM-5 delivers strong results across coding and mathematical reasoning benchmarks. Here are the headline numbers:
| Benchmark | GLM-5 Score | Context |
|---|---|---|
| SWE-bench Verified | 77.8% | Top-tier for open models; competitive with Claude Sonnet 4 |
| AIME 2024 | 92.7% | Near-perfect on competition-level math |
| HumanEval | 91.2% | Strong code generation from natural language |
| MMLU | 88.4% | Broad knowledge coverage across 57 subjects |
The SWE-bench Verified score of 77.8% is the number that matters most for OpenClaw operators running coding agents. SWE-bench measures the ability to resolve real-world GitHub issues end-to-end — reading the issue description, locating the relevant code, generating a fix, and producing a valid patch. A 77.8% score means GLM-5 can handle roughly four out of five real software engineering tasks autonomously.
The AIME 2024 score of 92.7% demonstrates that GLM-5 handles advanced mathematical reasoning at competition level. This translates well to tasks like data analysis, financial modeling, and any workflow that requires step-by-step quantitative logic.
GLM-5 is available through multiple providers at different price points:
| Provider | Input (per 1M tokens) | Output (per 1M tokens) | Free Tier |
|---|---|---|---|
| Ollama Cloud | Free | Free | Yes (rate-limited) |
| Z.ai API | ~$0.50 | ~$1.80 | Yes (generous dev tier) |
| OpenRouter | $0.72 | $2.30 | No |
For comparison, Claude Sonnet 4 on OpenRouter costs $3.00 per million input tokens and $15.00 per million output tokens. GLM-5 on OpenRouter runs at roughly 24% the input cost and 15% the output cost of Claude Sonnet — a significant saving for high-volume agent workflows.
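The saving is easiest to see as a quick bill estimate. A small sketch using the OpenRouter prices above (the traffic volumes are illustrative; plug in your own):

```python
# Rough monthly bill at the OpenRouter prices quoted above
PRICES = {  # (input $/M tokens, output $/M tokens)
    "glm-5": (0.72, 2.30),
    "claude-sonnet-4": (3.00, 15.00),
}

def monthly_cost(model, input_m_tokens, output_m_tokens):
    """Cost in dollars for a month of traffic, volumes in millions of tokens."""
    p_in, p_out = PRICES[model]
    return input_m_tokens * p_in + output_m_tokens * p_out

# Example workload: 500M input + 100M output tokens per month
for model in PRICES:
    print(f"{model}: ${monthly_cost(model, 500, 100):,.2f}")
# glm-5: $590.00
# claude-sonnet-4: $3,000.00
```

At that volume the same workload costs about five times less on GLM-5.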
Ollama Cloud provides free hosted inference for GLM-5, making it the fastest way to test the model with OpenClaw. No API key is needed for rate-limited access.
```bash
# macOS / Linux
curl -fsSL https://ollama.ai/install.sh | sh

# Verify installation
ollama --version

# Pull the model for local/cloud use
ollama pull glm5

# Verify the model is available
ollama list
```
```yaml
# In your OpenClaw config (e.g., ~/.openclaw/config.yaml)
llm:
  provider: ollama
  model: glm5
  base_url: http://localhost:11434
  temperature: 0.7
  max_tokens: 8192
```
```bash
# Verify Ollama is serving GLM-5
curl http://localhost:11434/api/generate -d '{
  "model": "glm5",
  "prompt": "Hello, are you running?",
  "stream": false
}'

# Start OpenClaw
openclaw start
```
Note that Ollama Cloud's free tier has rate limits. For production use with sustained traffic, the OpenRouter or Z.ai routes are more reliable.
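If you do stay on the free tier for testing, wrapping your calls in a retry with exponential backoff smooths over rate-limit errors. A minimal sketch; the `flaky` function below is a stand-in for your actual HTTP request:

```python
import random
import time

def with_backoff(call, max_retries=5, base_delay=1.0):
    """Retry a rate-limited call with exponential backoff plus jitter."""
    for attempt in range(max_retries):
        try:
            return call()
        except RuntimeError:               # stand-in for an HTTP 429 response
            if attempt == max_retries - 1:
                raise                      # out of retries: surface the error
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            time.sleep(delay)

# Usage with a fake call that fails twice, then succeeds
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("429 Too Many Requests")
    return "ok"

print(with_backoff(flaky, base_delay=0.01))  # ok
```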
OpenRouter provides a unified API that routes to GLM-5 along with dozens of other models. This is the most flexible option if you want to switch between models without reconfiguring your setup.
Sign up at openrouter.ai and generate an API key from the dashboard. Add credits to your account — even $5 will last thousands of requests at GLM-5's pricing.
```yaml
# In your OpenClaw config (e.g., ~/.openclaw/config.yaml)
llm:
  provider: openrouter
  model: zhipu/glm-5
  api_key: your-openrouter-api-key
  temperature: 0.7
  max_tokens: 8192
```

```bash
openclaw start
```
OpenRouter automatically handles load balancing and failover. If GLM-5 has a service interruption on one backend, OpenRouter routes to another — giving you higher uptime than a direct API connection.
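If you ever call providers directly instead, the same failover idea is easy to replicate client-side: try an ordered list of backends and fall through on failure. A hypothetical sketch with illustrative backend names:

```python
def first_working(providers, send):
    """Try providers in order; return (provider, response) for the first success.

    providers -- ordered list of provider ids to try
    send      -- function that calls one provider, raising on failure
    """
    last_error = None
    for provider in providers:
        try:
            return provider, send(provider)
        except Exception as err:           # fail over on any provider error
            last_error = err
    raise RuntimeError(f"all providers failed: {last_error}")

# Usage with a fake transport where the first backend is down
def fake_send(provider):
    if provider == "backend-a":
        raise ConnectionError("503 Service Unavailable")
    return "response from " + provider

used, reply = first_working(["backend-a", "backend-b"], fake_send)
print(used)  # backend-b
```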
Z.ai is Zhipu AI's own inference platform. It offers the lowest per-token pricing and a generous free developer tier. This is the best option if GLM-5 is your primary model and you want the lowest possible cost.
Sign up at z.ai and generate an API key. The free tier includes enough credits for substantial testing before you need to add funds.
```yaml
# In your OpenClaw config (e.g., ~/.openclaw/config.yaml)
llm:
  provider: openai-compatible
  model: glm-5
  api_key: your-zai-api-key
  base_url: https://open.bigmodel.cn/api/paas/v4
  temperature: 0.7
  max_tokens: 8192
```

```bash
openclaw start
```
The Z.ai API follows the OpenAI-compatible format, which means OpenClaw's OpenAI provider works out of the box — just point the base URL to Z.ai's endpoint.
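Because the endpoint speaks the OpenAI chat-completions format, the request body is identical regardless of provider. A minimal sketch of what such a request looks like, assuming the standard `/chat/completions` path under the base URL from the config above (the actual POST with an `Authorization: Bearer <key>` header is left out):

```python
import json

ZAI_ENDPOINT = "https://open.bigmodel.cn/api/paas/v4/chat/completions"

def build_chat_request(prompt, model="glm-5", temperature=0.7, max_tokens=8192):
    """Build an OpenAI-compatible chat-completions request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }

body = build_chat_request("Summarize this diff in one sentence.")
print(json.dumps(body, indent=2))
```

The same body works against Ollama's and OpenRouter's OpenAI-compatible endpoints; only the URL and key change.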
Here is how GLM-5 stacks up against the major proprietary models that OpenClaw operators commonly use:
| Metric | GLM-5 | Claude Sonnet 4 | GPT-4.1 |
|---|---|---|---|
| SWE-bench Verified | 77.8% | ~79% | ~78% |
| AIME 2024 | 92.7% | ~88% | ~90% |
| Input Cost (OpenRouter) | $0.72/M | $3.00/M | $2.00/M |
| Output Cost (OpenRouter) | $2.30/M | $15.00/M | $8.00/M |
| Vision | No | Yes | Yes |
| Max Throughput | ~69 tok/s | ~120 tok/s | ~100 tok/s |
| License | MIT (open) | Proprietary | Proprietary |
The takeaway: GLM-5 is within striking distance of Claude and GPT on coding benchmarks at a fraction of the price. It actually beats both on AIME 2024 math scores. The trade-offs are no vision support, slower throughput, and less polished English output on nuanced tasks.
GLM-5 is the right model for your OpenClaw setup in these scenarios:

- High-volume coding agents, where its near-frontier SWE-bench score and low per-token cost compound
- Math-heavy and quantitative workflows, where its AIME 2024 score leads both Claude Sonnet 4 and GPT-4.1
- Multilingual and Chinese-language tasks, where it excels
- License-sensitive or self-hosted deployments, since the MIT license imposes no usage restrictions
GLM-5 has clear limitations that you should understand before making it your primary OpenClaw backend:

- Text only: no vision or audio support
- Slower throughput (~69 tokens per second on OpenRouter) than Claude Sonnet 4 or GPT-4.1
- Occasionally awkward English phrasing on nuanced or creative tasks
- Free-tier access is rate-limited, so sustained production traffic needs a paid route
Yes. GLM-5 is available on Ollama Cloud for free inference, and you can pull quantized versions for local execution. However, with 744 billion total parameters, even a 4-bit quantization needs on the order of 370GB of memory for the weights alone (744B parameters at roughly half a byte each), and because expert routing varies per token, the full weight set generally needs to be resident even though only 40 billion parameters are active per pass. That puts local execution beyond a typical workstation. For most operators, the OpenRouter or Z.ai API routes are more practical.
GLM-5 scores 77.8% on SWE-bench Verified, which is competitive with Claude Sonnet 4 and GPT-4.1. For coding-heavy workflows, GLM-5 performs well. For creative writing, nuanced reasoning, and complex multi-step agent tasks, Claude Sonnet still has an edge. GLM-5's main advantage is cost: at $0.72 per million input tokens on OpenRouter, it is roughly 75% cheaper than Claude Sonnet.
Partially. GLM-5 is available for free on Ollama Cloud and Z.ai offers a generous free tier for developers. On OpenRouter, you pay $0.72 per million input tokens and $2.30 per million output tokens. The model weights are released under the MIT license, so you can self-host at zero marginal cost if you have the hardware.
GLM-5 is text-only — no vision or audio support. Inference speed tops out around 69 tokens per second on OpenRouter, which is slower than competing models like Gemma 4 or Llama 3.3. The model also has a smaller English-language training corpus compared to Western models, so it occasionally produces awkward phrasing on complex English tasks. For multilingual or Chinese-language workflows, however, it excels.