Remote OpenClaw Blog
GLM-5 on OpenClaw: Setup Guide, Benchmarks, and When to Use It
8 min read
GLM-5 is the flagship large language model from Zhipu AI, a Beijing-based AI research lab that has been building the GLM (General Language Model) series since 2022. Released in February 2026 under the MIT license, GLM-5 is one of the largest open-weight Mixture of Experts models publicly available — 744 billion total parameters with 40 billion active per inference pass.
What makes GLM-5 notable beyond its raw size is the hardware story. It was trained entirely on Huawei Ascend 910B chips, making it the highest-performing model trained without any NVIDIA hardware. For operators tracking the AI hardware supply chain, this is a meaningful data point about what non-NVIDIA silicon can deliver.
For OpenClaw operators, GLM-5 offers a practical middle ground: frontier-class coding and math performance at a fraction of the cost of Claude or GPT-4. It is available through three different integration paths, making it one of the more accessible open models for agent backends.
GLM-5 uses a Mixture of Experts (MoE) architecture, which means the model has 744 billion total parameters but only activates approximately 40 billion on each forward pass. This design gives GLM-5 the knowledge capacity of a much larger model while keeping inference costs manageable.
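The routing idea behind this can be sketched in a few lines of plain Python. This is a toy illustration of top-k expert routing, not GLM-5's actual implementation: a gating network scores every expert, but only the top-k run on a given token, so most parameters sit idle on each forward pass.

```python
import math
import random

def moe_forward(x, gate, experts, k=2):
    """Toy MoE layer: score every expert, but run only the top-k."""
    scores = [sum(g * xi for g, xi in zip(row, x)) for row in gate]
    top = sorted(range(len(experts)), key=lambda i: scores[i])[-k:]
    exps = [math.exp(scores[i]) for i in top]
    weights = [e / sum(exps) for e in exps]   # softmax over selected experts
    out = [0.0] * len(x)
    for w, i in zip(weights, top):            # only k experts do any work
        hidden = [sum(r * xi for r, xi in zip(row, x)) for row in experts[i]]
        out = [o + w * h for o, h in zip(out, hidden)]
    return out, top

random.seed(0)
dim, n_experts = 8, 16
x = [random.gauss(0, 1) for _ in range(dim)]
gate = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_experts)]
experts = [[[random.gauss(0, 1) for _ in range(dim)] for _ in range(dim)]
           for _ in range(n_experts)]
y, used = moe_forward(x, gate, experts, k=2)
print(len(used), "of", n_experts, "experts active")  # 2 of 16 experts active
```

With k=2 of 16 experts, only 12.5% of expert parameters touch any one token; GLM-5's ratio (40B of 744B) is roughly 5%.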
| Specification | Value |
|---|---|
| Total Parameters | 744 billion |
| Active Parameters | ~40 billion per forward pass |
| Architecture | Mixture of Experts (MoE) |
| Training Hardware | Huawei Ascend 910B |
| Release Date | February 2026 |
| License | MIT |
| Developer | Zhipu AI (Z.ai) |
| Modalities | Text only |
| Context Window | 128K tokens |
The MIT license is the most permissive license any model of this scale has been released under. Unlike Meta's Llama license (which restricts companies above 700M monthly active users) or various custom licenses from Chinese labs, MIT imposes no usage restrictions whatsoever. You can use GLM-5 commercially, modify it, redistribute it, and build proprietary products on top of it.
GLM-5 delivers strong results across coding and mathematical reasoning benchmarks. Here are the headline numbers:
| Benchmark | GLM-5 Score | Context |
|---|---|---|
| SWE-bench Verified | 77.8% | Top-tier for open models; competitive with Claude Sonnet 4 |
| AIME 2024 | 92.7% | Near-perfect on competition-level math |
| HumanEval | 91.2% | Strong code generation from natural language |
| MMLU | 88.4% | Broad knowledge coverage across 57 subjects |
The SWE-bench Verified score of 77.8% is the number that matters most for OpenClaw operators running coding agents. SWE-bench measures the ability to resolve real-world GitHub issues end-to-end — reading the issue description, locating the relevant code, generating a fix, and producing a valid patch. A 77.8% score means GLM-5 can handle roughly four out of five real software engineering tasks autonomously.
The AIME 2024 score of 92.7% demonstrates that GLM-5 handles advanced mathematical reasoning at competition level. This translates well to tasks like data analysis, financial modeling, and any workflow that requires step-by-step quantitative logic.
GLM-5 is available through multiple providers at different price points:
| Provider | Input (per 1M tokens) | Output (per 1M tokens) | Free Tier |
|---|---|---|---|
| Ollama Cloud | Free | Free | Yes (rate-limited) |
| Z.ai API | ~$0.50 | ~$1.80 | Yes (generous dev tier) |
| OpenRouter | $0.72 | $2.30 | No |
For comparison, Claude Sonnet 4 on OpenRouter costs $3.00 per million input tokens and $15.00 per million output tokens. GLM-5 on OpenRouter runs at roughly 24% the input cost and 15% the output cost of Claude Sonnet — a significant saving for high-volume agent workflows.
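The saving is easiest to see as a quick bill estimate. A small sketch using the OpenRouter prices above (the traffic volumes are illustrative; plug in your own):

```python
# Rough monthly bill at the OpenRouter prices quoted above
PRICES = {  # (input $/M tokens, output $/M tokens)
    "glm-5": (0.72, 2.30),
    "claude-sonnet-4": (3.00, 15.00),
}

def monthly_cost(model, input_m_tokens, output_m_tokens):
    """Cost in dollars for a month of traffic, volumes in millions of tokens."""
    p_in, p_out = PRICES[model]
    return input_m_tokens * p_in + output_m_tokens * p_out

# Example workload: 500M input + 100M output tokens per month
for model in PRICES:
    print(f"{model}: ${monthly_cost(model, 500, 100):,.2f}")
# glm-5: $590.00
# claude-sonnet-4: $3,000.00
```

At that volume the same workload costs about five times less on GLM-5.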
Ollama Cloud provides free hosted inference for GLM-5, making it the fastest way to test the model with OpenClaw. No API key is needed for rate-limited access.
```bash
# macOS / Linux
curl -fsSL https://ollama.ai/install.sh | sh

# Verify installation
ollama --version

# Pull the model for local/cloud use
ollama pull glm5

# Verify the model is available
ollama list
```
```yaml
# In your OpenClaw config (e.g., ~/.openclaw/config.yaml)
llm:
  provider: ollama
  model: glm5
  base_url: http://localhost:11434
  temperature: 0.7
  max_tokens: 8192
```
```bash
# Verify Ollama is serving GLM-5
curl http://localhost:11434/api/generate -d '{
  "model": "glm5",
  "prompt": "Hello, are you running?",
  "stream": false
}'

# Start OpenClaw
openclaw start
```
Note that Ollama Cloud's free tier has rate limits. For production use with sustained traffic, the OpenRouter or Z.ai routes are more reliable.
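If you do stay on the free tier for testing, wrapping your calls in a retry with exponential backoff smooths over rate-limit errors. A minimal sketch; the `flaky` function below is a stand-in for your actual HTTP request:

```python
import random
import time

def with_backoff(call, max_retries=5, base_delay=1.0):
    """Retry a rate-limited call with exponential backoff plus jitter."""
    for attempt in range(max_retries):
        try:
            return call()
        except RuntimeError:               # stand-in for an HTTP 429 response
            if attempt == max_retries - 1:
                raise                      # out of retries: surface the error
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            time.sleep(delay)

# Usage with a fake call that fails twice, then succeeds
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("429 Too Many Requests")
    return "ok"

print(with_backoff(flaky, base_delay=0.01))  # ok
```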
OpenRouter provides a unified API that routes to GLM-5 along with dozens of other models. This is the most flexible option if you want to switch between models without reconfiguring your setup.
Sign up at openrouter.ai and generate an API key from the dashboard. Add credits to your account — even $5 will last thousands of requests at GLM-5's pricing.
```yaml
# In your OpenClaw config (e.g., ~/.openclaw/config.yaml)
llm:
  provider: openrouter
  model: zhipu/glm-5
  api_key: your-openrouter-api-key
  temperature: 0.7
  max_tokens: 8192
```

```bash
openclaw start
```
OpenRouter automatically handles load balancing and failover. If GLM-5 has a service interruption on one backend, OpenRouter routes to another — giving you higher uptime than a direct API connection.
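If you ever call providers directly instead, the same failover idea is easy to replicate client-side: try an ordered list of backends and fall through on failure. A hypothetical sketch with illustrative backend names:

```python
def first_working(providers, send):
    """Try providers in order; return (provider, response) for the first success.

    providers -- ordered list of provider ids to try
    send      -- function that calls one provider, raising on failure
    """
    last_error = None
    for provider in providers:
        try:
            return provider, send(provider)
        except Exception as err:           # fail over on any provider error
            last_error = err
    raise RuntimeError(f"all providers failed: {last_error}")

# Usage with a fake transport where the first backend is down
def fake_send(provider):
    if provider == "backend-a":
        raise ConnectionError("503 Service Unavailable")
    return "response from " + provider

used, reply = first_working(["backend-a", "backend-b"], fake_send)
print(used)  # backend-b
```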
Z.ai is Zhipu AI's own inference platform. It offers the lowest per-token pricing and a generous free developer tier. This is the best option if GLM-5 is your primary model and you want the lowest possible cost.
Sign up at z.ai and generate an API key. The free tier includes enough credits for substantial testing before you need to add funds.
```yaml
# In your OpenClaw config (e.g., ~/.openclaw/config.yaml)
llm:
  provider: openai-compatible
  model: glm-5
  api_key: your-zai-api-key
  base_url: https://open.bigmodel.cn/api/paas/v4
  temperature: 0.7
  max_tokens: 8192
```

```bash
openclaw start
```
The Z.ai API follows the OpenAI-compatible format, which means OpenClaw's OpenAI provider works out of the box — just point the base URL to Z.ai's endpoint.
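Because the endpoint speaks the OpenAI chat-completions format, the request body is identical regardless of provider. A minimal sketch of what such a request looks like, assuming the standard `/chat/completions` path under the base URL from the config above (the actual POST with an `Authorization: Bearer <key>` header is left out):

```python
import json

ZAI_ENDPOINT = "https://open.bigmodel.cn/api/paas/v4/chat/completions"

def build_chat_request(prompt, model="glm-5", temperature=0.7, max_tokens=8192):
    """Build an OpenAI-compatible chat-completions request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }

body = build_chat_request("Summarize this diff in one sentence.")
print(json.dumps(body, indent=2))
```

The same body works against Ollama's and OpenRouter's OpenAI-compatible endpoints; only the URL and key change.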
Here is how GLM-5 stacks up against the major proprietary models that OpenClaw operators commonly use:
| Metric | GLM-5 | Claude Sonnet 4 | GPT-4.1 |
|---|---|---|---|
| SWE-bench Verified | 77.8% | ~79% | ~78% |
| AIME 2024 | 92.7% | ~88% | ~90% |
| Input Cost (OpenRouter) | $0.72/M | $3.00/M | $2.00/M |
| Output Cost (OpenRouter) | $2.30/M | $15.00/M | $8.00/M |
| Vision | No | Yes | Yes |
| Max Throughput | ~69 tok/s | ~120 tok/s | ~100 tok/s |
| License | MIT (open) | Proprietary | Proprietary |
The takeaway: GLM-5 is within striking distance of Claude and GPT on coding benchmarks at a fraction of the price. It actually beats both on AIME 2024 math scores. The trade-offs are no vision support, slower throughput, and less polished English output on nuanced tasks.
GLM-5 is the right model for your OpenClaw setup in these scenarios:

- High-volume coding agents, where its near-frontier SWE-bench score and low per-token cost compound
- Math-heavy and quantitative workflows, where its AIME 2024 score leads both Claude Sonnet 4 and GPT-4.1
- Multilingual and Chinese-language tasks, where it excels
- License-sensitive or self-hosted deployments, since the MIT license imposes no usage restrictions
GLM-5 has clear limitations that you should understand before making it your primary OpenClaw backend:

- Text only: no vision or audio support
- Slower throughput (~69 tokens per second on OpenRouter) than Claude Sonnet 4 or GPT-4.1
- Occasionally awkward English phrasing on nuanced or creative tasks
- Free-tier access is rate-limited, so sustained production traffic needs a paid route
Yes. GLM-5 is available on Ollama Cloud for free inference, and you can pull quantized versions for local execution. However, with 744 billion total parameters, even a 4-bit quantization needs on the order of 370GB of memory for the weights alone (744B parameters at roughly half a byte each), and because expert routing varies per token, the full weight set generally needs to be resident even though only 40 billion parameters are active per pass. That puts local execution beyond a typical workstation. For most operators, the OpenRouter or Z.ai API routes are more practical.
GLM-5 scores 77.8% on SWE-bench Verified, which is competitive with Claude Sonnet 4 and GPT-4.1. For coding-heavy workflows, GLM-5 performs well. For creative writing, nuanced reasoning, and complex multi-step agent tasks, Claude Sonnet still has an edge. GLM-5's main advantage is cost: at $0.72 per million input tokens on OpenRouter, it is roughly 75% cheaper than Claude Sonnet.
Partially. GLM-5 is available for free on Ollama Cloud and Z.ai offers a generous free tier for developers. On OpenRouter, you pay $0.72 per million input tokens and $2.30 per million output tokens. The model weights are released under the MIT license, so you can self-host at zero marginal cost if you have the hardware.
GLM-5 is text-only — no vision or audio support. Inference speed tops out around 69 tokens per second on OpenRouter, which is slower than competing models like Gemma 4 or Llama 3.3. The model also has a smaller English-language training corpus compared to Western models, so it occasionally produces awkward phrasing on complex English tasks. For multilingual or Chinese-language workflows, however, it excels.