GLM-5 on OpenClaw: Setup Guide, Benchmarks, and When to Use It


What Is GLM-5?

GLM-5 is the flagship large language model from Zhipu AI, a Beijing-based AI research lab that has been building the GLM (General Language Model) series since 2022. Released in February 2026 under the MIT license, GLM-5 is one of the largest open-weight Mixture of Experts models publicly available — 744 billion total parameters with 40 billion active per inference pass.

What makes GLM-5 notable beyond its raw size is the hardware story. It was trained entirely on Huawei Ascend 910B chips, making it the highest-performing model trained without any NVIDIA hardware. For operators tracking the AI hardware supply chain, this is a meaningful data point about what non-NVIDIA silicon can deliver.

For OpenClaw operators, GLM-5 offers a practical middle ground: frontier-class coding and math performance at a fraction of the cost of Claude or GPT-4. It is available through three different integration paths, making it one of the more accessible open models for agent backends.

Architecture and Specifications

GLM-5 uses a Mixture of Experts (MoE) architecture, which means the model has 744 billion total parameters but only activates approximately 40 billion on each forward pass. This design gives GLM-5 the knowledge capacity of a much larger model while keeping inference costs manageable.
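
GLM-5's exact gating scheme is not published in this post, but the general mechanism can be sketched: a router scores every expert per token, keeps only the top-k, and renormalizes their weights. This is a minimal pure-Python illustration of generic top-k MoE routing, not GLM-5's actual implementation (the expert count and k here are made up):

```python
import math
import random

def top_k_gating(logits, k=2):
    """Pick the k experts with the highest router scores and
    renormalize their weights with a softmax over just those k."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    exps = [math.exp(logits[i]) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

# Toy router: 16 experts, only 2 run per token.
random.seed(0)
router_logits = [random.gauss(0, 1) for _ in range(16)]
active = top_k_gating(router_logits, k=2)
print(len(active))  # 2 experts activated, the rest stay idle
```

The same idea at GLM-5's scale is what lets 744B parameters of capacity run at roughly 40B parameters of per-token compute.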

| Specification | Value |
| --- | --- |
| Total Parameters | 744 billion |
| Active Parameters | ~40 billion per forward pass |
| Architecture | Mixture of Experts (MoE) |
| Training Hardware | Huawei Ascend 910B |
| Release Date | February 2026 |
| License | MIT |
| Developer | Zhipu AI (Z.ai) |
| Modalities | Text only |
| Context Window | 128K tokens |

The MIT license is the most permissive license any model of this scale has been released under. Unlike Meta's Llama license (which restricts companies above 700M monthly active users) or various custom licenses from Chinese labs, MIT imposes no usage restrictions whatsoever. You can use GLM-5 commercially, modify it, redistribute it, and build proprietary products on top of it.


Benchmarks and Performance

GLM-5 delivers strong results across coding and mathematical reasoning benchmarks. Here are the headline numbers:

| Benchmark | GLM-5 Score | Context |
| --- | --- | --- |
| SWE-bench Verified | 77.8% | Top-tier for open models; competitive with Claude Sonnet 4 |
| AIME 2024 | 92.7% | Near-perfect on competition-level math |
| HumanEval | 91.2% | Strong code generation from natural language |
| MMLU | 88.4% | Broad knowledge coverage across 57 subjects |

The SWE-bench Verified score of 77.8% is the number that matters most for OpenClaw operators running coding agents. SWE-bench measures the ability to resolve real-world GitHub issues end-to-end — reading the issue description, locating the relevant code, generating a fix, and producing a valid patch. A 77.8% score means GLM-5 can handle roughly four out of five real software engineering tasks autonomously.

The AIME 2024 score of 92.7% demonstrates that GLM-5 handles advanced mathematical reasoning at competition level. This translates well to tasks like data analysis, financial modeling, and any workflow that requires step-by-step quantitative logic.


Pricing Across Providers

GLM-5 is available through multiple providers at different price points:

| Provider | Input (per 1M tokens) | Output (per 1M tokens) | Free Tier |
| --- | --- | --- | --- |
| Ollama Cloud | Free | Free | Yes (rate-limited) |
| Z.ai API | ~$0.50 | ~$1.80 | Yes (generous dev tier) |
| OpenRouter | $0.72 | $2.30 | No |

For comparison, Claude Sonnet 4 on OpenRouter costs $3.00 per million input tokens and $15.00 per million output tokens. GLM-5 on OpenRouter runs at roughly 24% the input cost and 15% the output cost of Claude Sonnet — a significant saving for high-volume agent workflows.
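
To make the saving concrete, here is a small cost estimator using the OpenRouter prices above. The workload numbers (500 requests/day, 4K input and 1K output tokens per request) are illustrative assumptions, not measurements:

```python
def monthly_cost(requests_per_day, in_tokens, out_tokens,
                 in_price_per_m, out_price_per_m, days=30):
    """Estimate monthly spend from per-request token counts and
    per-million-token prices."""
    total_in = requests_per_day * in_tokens * days
    total_out = requests_per_day * out_tokens * days
    return (total_in * in_price_per_m + total_out * out_price_per_m) / 1_000_000

# Hypothetical agent workload: 500 requests/day, 4K in + 1K out each.
glm5   = monthly_cost(500, 4000, 1000, 0.72, 2.30)
claude = monthly_cost(500, 4000, 1000, 3.00, 15.00)
print(f"GLM-5:  ${glm5:,.2f}/mo")   # $77.70/mo
print(f"Claude: ${claude:,.2f}/mo") # $405.00/mo
```

At this volume the gap is over $300 per month for a single agent; it scales linearly with traffic.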


Setup Method 1: Ollama Cloud (Free)

Ollama Cloud provides free hosted inference for GLM-5, making it the fastest way to test the model with OpenClaw. No API key is needed for rate-limited access.

Step 1: Install Ollama

# macOS / Linux
curl -fsSL https://ollama.ai/install.sh | sh

# Verify installation
ollama --version

Step 2: Pull GLM-5

# Pull the model for local/cloud use
ollama pull glm5

# Verify the model is available
ollama list

Step 3: Configure OpenClaw

# In your OpenClaw config (e.g., ~/.openclaw/config.yaml)
llm:
  provider: ollama
  model: glm5
  base_url: http://localhost:11434
  temperature: 0.7
  max_tokens: 8192

Step 4: Test the Connection

# Verify Ollama is serving GLM-5
curl http://localhost:11434/api/generate -d '{
  "model": "glm5",
  "prompt": "Hello, are you running?",
  "stream": false
}'

# Start OpenClaw
openclaw start

Note that Ollama Cloud's free tier has rate limits. For production use with sustained traffic, the OpenRouter or Z.ai routes are more reliable.
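
If you stay on the free tier anyway, wrapping your request calls in an exponential-backoff retry softens the rate limits. This is a generic sketch (the `RuntimeError` stands in for whatever rate-limit error your HTTP client raises; adapt the exception type to your stack):

```python
import time

def with_backoff(fn, max_retries=5, base_delay=1.0):
    """Call fn(); on a rate-limit error, wait 1s, 2s, 4s, ... and retry."""
    for attempt in range(max_retries):
        try:
            return fn()
        except RuntimeError:  # stand-in for an HTTP 429 from the API
            if attempt == max_retries - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)

# Demo with a fake call that fails twice, then succeeds.
calls = {"n": 0}
def flaky_request():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("429 rate limited")
    return "ok"

print(with_backoff(flaky_request, base_delay=0.01))  # ok
```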


Setup Method 2: OpenRouter API

OpenRouter provides a unified API that routes to GLM-5 along with dozens of other models. This is the most flexible option if you want to switch between models without reconfiguring your setup.

Step 1: Get an OpenRouter API Key

Sign up at openrouter.ai and generate an API key from the dashboard. Add credits to your account; even $5 covers thousands of requests at GLM-5's pricing.

Step 2: Configure OpenClaw

# In your OpenClaw config (e.g., ~/.openclaw/config.yaml)
llm:
  provider: openrouter
  model: zhipu/glm-5
  api_key: your-openrouter-api-key
  temperature: 0.7
  max_tokens: 8192

Step 3: Start OpenClaw

openclaw start

OpenRouter automatically handles load balancing and failover. If GLM-5 has a service interruption on one backend, OpenRouter routes to another — giving you higher uptime than a direct API connection.


Setup Method 3: Z.ai API (Direct)

Z.ai is Zhipu AI's own inference platform. It offers the lowest per-token pricing and a generous free developer tier. This is the best option if GLM-5 is your primary model and you want the lowest possible cost.


Step 1: Create a Z.ai Account

Sign up at z.ai and generate an API key. The free tier includes enough credits for substantial testing before you need to add funds.

Step 2: Configure OpenClaw

# In your OpenClaw config (e.g., ~/.openclaw/config.yaml)
llm:
  provider: openai-compatible
  model: glm-5
  api_key: your-zai-api-key
  base_url: https://open.bigmodel.cn/api/paas/v4
  temperature: 0.7
  max_tokens: 8192

Step 3: Start OpenClaw

openclaw start

The Z.ai API follows the OpenAI-compatible format, which means OpenClaw's OpenAI provider works out of the box — just point the base URL to Z.ai's endpoint.
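
For a quick sanity check outside OpenClaw, you can build the same OpenAI-style chat request by hand. This sketch only constructs the request (it does not send it); the API key is a placeholder, and the exact path suffix (`/chat/completions`) is the standard OpenAI-compatible convention, which Z.ai's endpoint is assumed to follow:

```python
import json
import urllib.request

API_KEY = "your-zai-api-key"  # placeholder
BASE_URL = "https://open.bigmodel.cn/api/paas/v4"

payload = {
    "model": "glm-5",
    "messages": [{"role": "user", "content": "Hello, are you running?"}],
}
req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    method="POST",
)
# urllib.request.urlopen(req) would fire the request; any OpenAI-style
# client (e.g. openai-python with base_url=BASE_URL) works the same way.
```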


GLM-5 vs Claude vs GPT

Here is how GLM-5 stacks up against the major proprietary models that OpenClaw operators commonly use:

| Metric | GLM-5 | Claude Sonnet 4 | GPT-4.1 |
| --- | --- | --- | --- |
| SWE-bench Verified | 77.8% | ~79% | ~78% |
| AIME 2024 | 92.7% | ~88% | ~90% |
| Input Cost (OpenRouter) | $0.72/M | $3.00/M | $2.00/M |
| Output Cost (OpenRouter) | $2.30/M | $15.00/M | $8.00/M |
| Vision | No | Yes | Yes |
| Max Throughput | ~69 tok/s | ~120 tok/s | ~100 tok/s |
| License | MIT (open) | Proprietary | Proprietary |

The takeaway: GLM-5 is within striking distance of Claude and GPT on coding benchmarks at a fraction of the price. It actually beats both on AIME 2024 math scores. The trade-offs are no vision support, slower throughput, and less polished English output on nuanced tasks.


When GLM-5 Is the Right Choice

GLM-5 is the right model for your OpenClaw setup in these scenarios:

  • Cost-sensitive coding agents: If your agent primarily writes, reviews, or debugs code, GLM-5 delivers ~98% of Claude's SWE-bench performance at ~24% of the input cost. For high-volume workflows processing hundreds of requests per day, this adds up fast.
  • Math and data analysis: GLM-5's 92.7% AIME score means it handles quantitative reasoning tasks — financial modeling, data transformation, statistical analysis — at or above the level of proprietary models.
  • Chinese-language workflows: GLM-5 was developed by a Chinese lab and trained on extensive Chinese-language data. For operators serving Chinese-speaking users or processing Chinese documents, GLM-5 is the strongest available option.
  • Open-weight requirement: If you need to self-host the model for compliance, privacy, or air-gapped deployment, GLM-5's MIT license gives you full freedom. You can run it on your own hardware with zero dependency on any API provider.
  • Budget experimentation: With Ollama Cloud offering free access and Z.ai providing a free developer tier, GLM-5 is one of the best options for testing agent workflows before committing to a paid model.

Limitations

GLM-5 has clear limitations that you should understand before making it your primary OpenClaw backend:

  • Text-only: GLM-5 does not support vision or audio input. If your agent needs to process screenshots, images, PDFs with visual elements, or audio, you will need a different model. Gemma 4 or Claude are better options for multimodal workflows.
  • Throughput ceiling: At approximately 69 tokens per second on OpenRouter, GLM-5 is noticeably slower than Claude (~120 tok/s) and GPT-4.1 (~100 tok/s). For latency-sensitive applications where users are waiting for responses, this may be a dealbreaker.
  • English phrasing: On complex English writing tasks — long-form content, nuanced explanations, creative writing — GLM-5 occasionally produces phrasing that feels unnatural. It is accurate but can read as translated. For code and structured output, this is not a problem; for user-facing English text, it may be.
  • Smaller Western ecosystem: GLM-5 has fewer community resources, fine-tuned variants, and integration guides compared to Llama, Gemma, or Qwen models. You may need to do more configuration work yourself.
  • Hardware for self-hosting: With 744B total parameters, self-hosting GLM-5 at full precision requires significant infrastructure. Quantized versions (q4) bring the memory requirement down to ~48-64GB RAM for the active parameters, but this is still beyond most consumer hardware.

Frequently Asked Questions

Can I run GLM-5 locally with Ollama?

Yes. GLM-5 is available on Ollama Cloud for free inference, and you can pull quantized versions for local execution. However, with 744 billion total parameters (40 billion active), running locally requires roughly 48-64GB of RAM even at q4 quantization, which means a capable workstation or server. For most operators, the OpenRouter or Z.ai API routes are more practical.

How does GLM-5 compare to Claude Sonnet for OpenClaw tasks?

GLM-5 scores 77.8% on SWE-bench Verified, which is competitive with Claude Sonnet 4 and GPT-4.1. For coding-heavy workflows, GLM-5 performs well. For creative writing, nuanced reasoning, and complex multi-step agent tasks, Claude Sonnet still has an edge. GLM-5's main advantage is cost: at $0.72 per million input tokens on OpenRouter, it runs at roughly a quarter of Claude Sonnet's input price.

Is GLM-5 free to use?

Partially. GLM-5 is available for free on Ollama Cloud and Z.ai offers a generous free tier for developers. On OpenRouter, you pay $0.72 per million input tokens and $2.30 per million output tokens. The model weights are released under the MIT license, so you can self-host at zero marginal cost if you have the hardware.

What are GLM-5's main limitations?

GLM-5 is text-only — no vision or audio support. Inference speed tops out around 69 tokens per second on OpenRouter, which is slower than competing models like Gemma 4 or Llama 3.3. The model also has a smaller English-language training corpus compared to Western models, so it occasionally produces awkward phrasing on complex English tasks. For multilingual or Chinese-language workflows, however, it excels.


Further Reading