Remote OpenClaw Blog
Kimi K2.5 on OpenClaw: Agent Swarm, Benchmarks, and Setup Guide
8 min read
Kimi K2.5 is the latest flagship model from Moonshot AI, a Beijing-based lab that has built a reputation for pushing the boundaries of agent-capable language models. Released in January 2026 under the Modified MIT license, K2.5 represents a significant leap from its predecessor K2 — scaling to 1 trillion total parameters in a Mixture of Experts architecture with 32 billion active per forward pass.
What sets Kimi K2.5 apart from other frontier models is not just raw benchmark performance but its native Agent Swarm capability. While most models require external orchestration frameworks like LangChain or CrewAI to coordinate multiple agents, K2.5 can internally spawn and manage up to 100 agents from a single inference call. For OpenClaw operators building complex workflows, this eliminates an entire layer of orchestration complexity.
The model also excels at web browsing and research tasks, scoring 74.9% on BrowseComp — a benchmark that measures a model's ability to find specific information on the web, verify it, and synthesize it into accurate answers. This makes K2.5 particularly well-suited for research agents, competitive intelligence workflows, and any task that requires pulling information from multiple online sources.
Agent Swarm is the headline feature of Kimi K2.5 and the primary reason OpenClaw operators should consider it. Here is how it works:
When K2.5 receives a complex task, it can autonomously decompose it into subtasks and spawn independent agents to handle each one. Each agent operates in its own context with its own reasoning chain, and the orchestrator agent synthesizes the results. This happens within a single API call — you send one request and get back one response, but internally K2.5 may have coordinated dozens of specialized workers.
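To make the "one request, many workers" model concrete, here is a minimal sketch of what a swarm-enabled request body could look like. The `agent_swarm` and `max_agents` fields mirror the OpenClaw config keys used later in this guide; the exact wire format Moonshot's API expects is an assumption here.

```python
import json

# Hypothetical request body for a swarm-enabled K2.5 call.
# The extra_params keys mirror the OpenClaw config shown in this
# guide; the actual field names on the wire may differ.
payload = {
    "model": "kimi-k2.5",
    "messages": [
        {
            "role": "user",
            "content": (
                "Research the top 5 vector databases: for each, gather "
                "pricing, benchmarks, and licensing, then produce a "
                "comparison table."
            ),
        }
    ],
    "extra_params": {
        "agent_swarm": True,  # let the model decompose and delegate
        "max_agents": 20,     # cap concurrent workers (hard limit: 100)
    },
}

print(json.dumps(payload, indent=2))
```

The point is that decomposition, delegation, and synthesis all happen server-side: the caller still sees a single chat completion.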
Practical OpenClaw examples include parallel web research across many sources, competitive-intelligence sweeps, and batch data-gathering pipelines — exactly the workloads reflected in K2.5's BrowseComp results.
The practical limit is 100 concurrent agents per request. For most OpenClaw use cases, you will rarely need more than 20-30, but the headroom is there for large-scale batch operations.
```yaml
# In your OpenClaw config (e.g., ~/.openclaw/config.yaml)
llm:
  provider: openrouter
  model: moonshot/kimi-k2.5
  api_key: your-openrouter-api-key
  temperature: 0.7
  max_tokens: 16384
  # Enable Agent Swarm (K2.5-specific)
  extra_params:
    agent_swarm: true
    max_agents: 50
```
Kimi K2.5 uses a Mixture of Experts architecture optimized for agent workloads. The 1 trillion total parameters are distributed across expert modules, with only 32 billion active per inference pass. This design means K2.5 has the knowledge depth of a massive model while keeping per-token costs competitive with much smaller models.
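A quick back-of-envelope calculation, using the figures from the spec table, shows why MoE keeps per-token cost low:

```python
# Per-token compute in an MoE model scales with the ACTIVE parameter
# count, not the total. Figures are from the spec table in this article.
total_params = 1_000_000_000_000   # 1 trillion total
active_params = 32_000_000_000     # 32 billion active per forward pass

active_fraction = active_params / total_params
print(f"Active per token: {active_fraction:.1%} of total weights")
# Only ~3.2% of the network runs on any given token, which is why
# K2.5's per-token cost tracks a ~32B dense model rather than a 1T one.
```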
| Specification | Value |
|---|---|
| Total Parameters | 1 trillion |
| Active Parameters | 32 billion per forward pass |
| Architecture | Mixture of Experts (MoE) |
| Agent Swarm | Up to 100 concurrent agents |
| Release Date | January 2026 |
| License | Modified MIT |
| Developer | Moonshot AI |
| Modalities | Text + Vision |
| Context Window | 256K tokens |
The Modified MIT license is nearly identical to standard MIT. The only additional clause requires attribution when redistributing derivative models — meaning if you fine-tune K2.5 and release the fine-tuned weights publicly, you must credit Moonshot AI. For commercial use within your own products and services, there are no restrictions.
Kimi K2.5 performs competitively across coding, reasoning, and web browsing benchmarks:
| Benchmark | Kimi K2.5 Score | Context |
|---|---|---|
| BrowseComp | 74.9% | Best-in-class for web browsing and research tasks |
| SWE-bench Verified | 72.4% | Solid coding performance; competitive with GPT-4.1 |
| AIME 2024 | 89.3% | Strong mathematical reasoning |
| MMLU | 87.8% | Broad knowledge across 57 subjects |
| HumanEval | 90.1% | Code generation from natural language |
The BrowseComp score of 74.9% is the standout number. BrowseComp tests a model's ability to navigate real websites, extract specific data points, and synthesize information across multiple pages. For OpenClaw operators running research agents, data gathering pipelines, or competitive intelligence workflows, this is the most relevant benchmark. K2.5 outperforms most open models on this metric by a significant margin.
The SWE-bench Verified score of 72.4% is respectable but not class-leading. For pure coding workflows, models like Claude Opus 4.6 (80.8%) or GLM-5 (77.8%) are stronger. Where K2.5 excels is in tasks that combine coding with research — building features that require understanding external APIs, reading documentation, and synthesizing information from multiple sources.
| Provider | Input (per 1M tokens) | Output (per 1M tokens) | Free Tier |
|---|---|---|---|
| Ollama Cloud | Free | Free | Yes (rate-limited) |
| OpenRouter | $0.45 | $2.25 | No |
| Moonshot API (Direct) | $0.60 | $2.50 | Yes (limited) |
OpenRouter is actually cheaper than the direct Moonshot API for K2.5, which is unusual. This is likely due to OpenRouter's volume-based pricing agreements. At $0.45/$2.25 on OpenRouter, K2.5 is one of the most cost-effective frontier-class models available — roughly 85% cheaper than Claude Sonnet 4 on input tokens.
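To see how the per-token prices translate into a monthly bill, here is a small cost calculator using the table above. The token volumes are illustrative assumptions, not measurements:

```python
# Rough monthly-cost comparison using the per-1M-token prices from
# the pricing table above. Token volumes are illustrative assumptions.
PRICES = {  # provider: (input $/M tokens, output $/M tokens)
    "openrouter": (0.45, 2.25),
    "moonshot_direct": (0.60, 2.50),
}

def monthly_cost(provider, input_m_tokens, output_m_tokens):
    inp, out = PRICES[provider]
    return input_m_tokens * inp + output_m_tokens * out

# Example workload: 200M input tokens, 40M output tokens per month.
for name in PRICES:
    print(f"{name}: ${monthly_cost(name, 200, 40):,.2f}")
```

At this hypothetical volume, OpenRouter comes out to $180/month versus $220/month direct — the gap grows with output-heavy workloads.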
Ollama Cloud provides free hosted inference for Kimi K2.5, making it the fastest path to testing the model with OpenClaw.
```bash
# macOS / Linux
curl -fsSL https://ollama.ai/install.sh | sh

# Verify installation
ollama --version

# Pull the model
ollama pull kimi-k2.5

# Verify the model is available
ollama list
```
```yaml
# In your OpenClaw config (e.g., ~/.openclaw/config.yaml)
llm:
  provider: ollama
  model: kimi-k2.5
  base_url: http://localhost:11434
  temperature: 0.7
  max_tokens: 16384
```

```bash
# Verify Ollama is serving K2.5
curl http://localhost:11434/api/generate -d '{
  "model": "kimi-k2.5",
  "prompt": "Hello, are you running?",
  "stream": false
}'

# Start OpenClaw
openclaw start
```
The Ollama Cloud free tier has rate limits — typically 10-20 requests per minute. For production workloads, switch to OpenRouter or the Moonshot direct API.
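If you stay on the free tier, it helps to wrap your requests in an exponential-backoff retry so bursts don't fail outright. A minimal sketch — the `flaky` stub stands in for whatever request function you use against the Ollama endpoint:

```python
import time

def with_backoff(call, retries=5, base_delay=1.0, sleep=time.sleep):
    """Retry `call` with exponential backoff — useful against the
    Ollama Cloud free tier's per-minute rate limits. `call` should
    raise an exception on a 429/rate-limit response."""
    for attempt in range(retries):
        try:
            return call()
        except Exception:
            if attempt == retries - 1:
                raise
            sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...

# Usage sketch: `flaky` is a stand-in for your actual request function.
attempts = []
def flaky():
    attempts.append(1)
    if len(attempts) < 3:
        raise RuntimeError("429 Too Many Requests")
    return "ok"

result = with_backoff(flaky, sleep=lambda s: None)
print(result)  # succeeds on the third attempt
```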
OpenRouter provides the best per-token pricing for K2.5 and the flexibility to switch between models without reconfiguring your stack.
Sign up at openrouter.ai and generate an API key from the dashboard. Add credits to your account — at K2.5's pricing, $5 covers thousands of requests.
```yaml
# In your OpenClaw config (e.g., ~/.openclaw/config.yaml)
llm:
  provider: openrouter
  model: moonshot/kimi-k2.5
  api_key: your-openrouter-api-key
  temperature: 0.7
  max_tokens: 16384
```

```bash
openclaw start
```
OpenRouter handles load balancing and failover automatically. It also provides unified billing across all models you use, which simplifies cost tracking for teams running multiple models.
The Moonshot API gives you direct access to K2.5 with full Agent Swarm support and a free developer tier.
Sign up at platform.moonshot.ai and generate an API key. The free tier includes enough credits for initial testing and prototyping.
```yaml
# In your OpenClaw config (e.g., ~/.openclaw/config.yaml)
llm:
  provider: openai-compatible
  model: kimi-k2.5
  api_key: your-moonshot-api-key
  base_url: https://api.moonshot.ai/v1
  temperature: 0.7
  max_tokens: 16384
```

```bash
openclaw start
```
The Moonshot API follows the OpenAI-compatible format, so OpenClaw's OpenAI provider works without modification — just point the base URL to Moonshot's endpoint.
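Because the endpoint is OpenAI-compatible, a chat completion is just a standard `POST /v1/chat/completions`. The sketch below builds the request with only the standard library; it does not send it (call `urllib.request.urlopen(req)` with a real key to do so), and the path is the conventional OpenAI one, assumed here to match Moonshot's:

```python
import json
import urllib.request

# Build an OpenAI-style chat completion request against Moonshot's
# endpoint. Constructed but not sent; substitute a real API key and
# call urllib.request.urlopen(req) to execute it.
body = json.dumps({
    "model": "kimi-k2.5",
    "messages": [{"role": "user", "content": "Hello from OpenClaw"}],
}).encode()

req = urllib.request.Request(
    url="https://api.moonshot.ai/v1/chat/completions",
    data=body,
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer your-moonshot-api-key",
    },
    method="POST",
)
print(req.get_method(), req.full_url)
```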
| Metric | Kimi K2.5 | Claude Sonnet 4 | GPT-4.1 |
|---|---|---|---|
| BrowseComp | 74.9% | ~65% | ~70% |
| SWE-bench Verified | 72.4% | ~79% | ~78% |
| Agent Swarm | 100 agents | N/A | N/A |
| Input Cost (OpenRouter) | $0.45/M | $3.00/M | $2.00/M |
| Output Cost (OpenRouter) | $2.25/M | $15.00/M | $8.00/M |
| Context Window | 256K | 200K | 1M |
| License | Modified MIT | Proprietary | Proprietary |
The key insight: K2.5 trades some coding performance for best-in-class browsing and native multi-agent orchestration at a fraction of the cost. If your OpenClaw workflow is research-heavy or requires coordinating multiple agent tasks in parallel, K2.5 is the strongest value proposition on the market.
Agent Swarm is Kimi K2.5's built-in multi-agent orchestration system that can coordinate up to 100 independent agents simultaneously. Each agent handles a subtask — research, coding, analysis, writing — and the orchestrator synthesizes results. For OpenClaw operators, this means a single K2.5 call can spawn a coordinated swarm of workers, dramatically increasing throughput on complex tasks without manual orchestration.
Kimi K2.5 scores 74.9% on BrowseComp, which measures a model's ability to find and synthesize information from the web. This is competitive with GPT-5 variants and significantly ahead of most open models. For OpenClaw operators running research or data-gathering agents, K2.5 is one of the strongest options available at its price point.
Kimi K2.5 is available on Ollama Cloud for free rate-limited inference, which is the practical path for most operators. Local execution is a different story: MoE routing only reduces compute per token — all 1 trillion parameters must still be resident in memory, so even an aggressive 4-bit quantization of the full weight set occupies hundreds of gigabytes. Running K2.5 locally is realistic only on server-grade hardware with very large memory pools, not on consumer GPUs.
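A back-of-envelope estimate of the quantized weight footprint makes the hardware requirement clear:

```python
# Weight-memory estimate for local inference. MoE routing reduces
# COMPUTE per token, but all experts must still be resident in
# memory, so the TOTAL parameter count determines the footprint.
total_params = 1_000_000_000_000  # 1 trillion
bits_per_weight = 4               # q4 quantization

bytes_needed = total_params * bits_per_weight / 8
gib = bytes_needed / 2**30
print(f"q4 weights alone: ~{gib:,.0f} GiB")  # roughly 466 GiB
```

This excludes KV cache and activations, which add further memory on top of the weights.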
Yes. Kimi K2.5 is released under the Modified MIT license by Moonshot AI. The Modified MIT license is functionally identical to standard MIT for most commercial use cases — you can use, modify, and redistribute the model freely. The only difference is an attribution requirement in derivative model releases.