Remote OpenClaw Blog

Kimi K2.5 on OpenClaw: Agent Swarm, Benchmarks, and Setup Guide


What Is Kimi K2.5?

Kimi K2.5 is the latest flagship model from Moonshot AI, a Beijing-based lab that has built a reputation for pushing the boundaries of agent-capable language models. Released in January 2026 under the Modified MIT license, K2.5 represents a significant leap from its predecessor K2 — scaling to 1 trillion total parameters in a Mixture of Experts architecture with 32 billion active per forward pass.

What sets Kimi K2.5 apart from other frontier models is not just raw benchmark performance but its native Agent Swarm capability. While most models require external orchestration frameworks like LangChain or CrewAI to coordinate multiple agents, K2.5 can internally spawn and manage up to 100 agents from a single inference call. For OpenClaw operators building complex workflows, this eliminates an entire layer of orchestration complexity.

The model also excels at web browsing and research tasks, scoring 74.9% on BrowseComp — a benchmark that measures a model's ability to find specific information on the web, verify it, and synthesize it into accurate answers. This makes K2.5 particularly well-suited for research agents, competitive intelligence workflows, and any task that requires pulling information from multiple online sources.


Agent Swarm: Multi-Agent Orchestration

Agent Swarm is the headline feature of Kimi K2.5 and the primary reason OpenClaw operators should consider it. Here is how it works:

When K2.5 receives a complex task, it can autonomously decompose it into subtasks and spawn independent agents to handle each one. Each agent operates in its own context with its own reasoning chain, and the orchestrator agent synthesizes the results. This happens within a single API call — you send one request and get back one response, but internally K2.5 may have coordinated dozens of specialized workers.

Practical examples for OpenClaw workflows include a research agent fanning out to verify facts across multiple sources, a build task split across researcher, coder, and reviewer agents, and batch pipelines that process many documents in parallel.

The practical limit is 100 concurrent agents per request. Most OpenClaw use cases rarely need more than 20-30, but the headroom is there for large-scale batch operations.
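Conceptually, the pattern resembles the fan-out/fan-in sketch below. This is a client-side analogy for what K2.5 does internally within one API call, not a real OpenClaw or Moonshot API; the `run_agent` stub stands in for an independent worker agent with its own reasoning chain.

```python
from concurrent.futures import ThreadPoolExecutor

def run_agent(subtask: str) -> str:
    """Stub worker: inside K2.5 this is an independent agent with its own context."""
    return f"result for {subtask!r}"

def swarm(task: str, subtasks: list[str], max_agents: int = 100) -> str:
    """Fan subtasks out to parallel workers, then synthesize the results."""
    if len(subtasks) > max_agents:
        raise ValueError(f"swarm limit is {max_agents} agents per request")
    with ThreadPoolExecutor(max_workers=len(subtasks)) as pool:
        results = list(pool.map(run_agent, subtasks))
    # The orchestrator merges each worker's output into one response.
    return f"{task}: " + "; ".join(results)

print(swarm("market scan", ["find competitors", "check pricing", "draft summary"]))
```

The key property mirrored here is the interface: one request in, one synthesized response out, with the parallelism hidden behind the orchestrator.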

Configuring Agent Swarm in OpenClaw

# In your OpenClaw config (e.g., ~/.openclaw/config.yaml)
llm:
  provider: openrouter
  model: moonshot/kimi-k2.5
  api_key: your-openrouter-api-key
  temperature: 0.7
  max_tokens: 16384
  # Enable Agent Swarm (K2.5-specific)
  extra_params:
    agent_swarm: true
    max_agents: 50

Architecture and Specifications

Kimi K2.5 uses a Mixture of Experts architecture optimized for agent workloads. The 1 trillion total parameters are distributed across expert modules, with only 32 billion active per inference pass. This design means K2.5 has the knowledge depth of a massive model while keeping per-token costs competitive with much smaller models.
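A back-of-the-envelope view of why that design keeps per-token costs low, assuming per-token compute scales with active rather than total parameters:

```python
TOTAL_PARAMS = 1_000_000_000_000   # 1T total parameters across all experts
ACTIVE_PARAMS = 32_000_000_000     # 32B routed through per forward pass

# Per-token compute roughly tracks active parameters, so K2.5 runs at
# about the compute cost of a 32B dense model while drawing on 1T of weights.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
print(f"Fraction of parameters active per token: {active_fraction:.1%}")  # 3.2%
```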

Specification       Value
Total Parameters    1 trillion
Active Parameters   32 billion per forward pass
Architecture        Mixture of Experts (MoE)
Agent Swarm         Up to 100 concurrent agents
Release Date        January 2026
License             Modified MIT
Developer           Moonshot AI
Modalities          Text + Vision
Context Window      256K tokens

The Modified MIT license is nearly identical to standard MIT. The only additional clause requires attribution when redistributing derivative models — meaning if you fine-tune K2.5 and release the fine-tuned weights publicly, you must credit Moonshot AI. For commercial use within your own products and services, there are no restrictions.


Benchmarks and Performance

Kimi K2.5 performs competitively across coding, reasoning, and web browsing benchmarks:

Benchmark            Kimi K2.5 Score   Context
BrowseComp           74.9%             Best-in-class for web browsing and research tasks
SWE-bench Verified   72.4%             Solid coding performance; competitive with GPT-4.1
AIME 2024            89.3%             Strong mathematical reasoning
MMLU                 87.8%             Broad knowledge across 57 subjects
HumanEval            90.1%             Code generation from natural language

The BrowseComp score of 74.9% is the standout number. BrowseComp tests a model's ability to navigate real websites, extract specific data points, and synthesize information across multiple pages. For OpenClaw operators running research agents, data gathering pipelines, or competitive intelligence workflows, this is the most relevant benchmark. K2.5 outperforms most open models on this metric by a significant margin.

The SWE-bench Verified score of 72.4% is respectable but not class-leading. For pure coding workflows, models like Claude Opus 4.6 (80.8%) or GLM-5 (77.8%) are stronger. Where K2.5 excels is in tasks that combine coding with research — building features that require understanding external APIs, reading documentation, and synthesizing information from multiple sources.


Pricing Across Providers

Provider                Input (per 1M tokens)   Output (per 1M tokens)   Free Tier
Ollama Cloud            Free                    Free                     Yes (rate-limited)
OpenRouter              $0.45                   $2.25                    No
Moonshot API (Direct)   $0.60                   $2.50                    Yes (limited)

OpenRouter is actually cheaper than the direct Moonshot API for K2.5, which is unusual. This is likely due to OpenRouter's volume-based pricing agreements. At $0.45/$2.25 on OpenRouter, K2.5 is one of the most cost-effective frontier-class models available — roughly 85% cheaper than Claude Sonnet 4 on input tokens.
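The savings quoted above can be checked directly from the table. The monthly token volumes below are hypothetical, chosen only to illustrate the gap:

```python
k25_in, k25_out = 0.45, 2.25          # Kimi K2.5 on OpenRouter, $ per 1M tokens
sonnet_in, sonnet_out = 3.00, 15.00   # Claude Sonnet 4 on OpenRouter, $ per 1M tokens

input_savings = 1 - k25_in / sonnet_in
print(f"Input-token savings vs Claude Sonnet 4: {input_savings:.0%}")  # 85%

# Hypothetical monthly bill: 200M input + 40M output tokens
def monthly_cost(in_price, out_price, in_m=200, out_m=40):
    return in_price * in_m + out_price * out_m

print(f"K2.5:   ${monthly_cost(k25_in, k25_out):,.2f}")      # $180.00
print(f"Sonnet: ${monthly_cost(sonnet_in, sonnet_out):,.2f}")  # $1,200.00
```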


Setup Method 1: Ollama Cloud (Free)

Ollama Cloud provides free hosted inference for Kimi K2.5, making it the fastest path to testing the model with OpenClaw.


Step 1: Install Ollama

# macOS / Linux
curl -fsSL https://ollama.ai/install.sh | sh

# Verify installation
ollama --version

Step 2: Pull Kimi K2.5

# Pull the model
ollama pull kimi-k2.5

# Verify the model is available
ollama list

Step 3: Configure OpenClaw

# In your OpenClaw config (e.g., ~/.openclaw/config.yaml)
llm:
  provider: ollama
  model: kimi-k2.5
  base_url: http://localhost:11434
  temperature: 0.7
  max_tokens: 16384

Step 4: Test the Connection

# Verify Ollama is serving K2.5
curl http://localhost:11434/api/generate -d '{
  "model": "kimi-k2.5",
  "prompt": "Hello, are you running?",
  "stream": false
}'

# Start OpenClaw
openclaw start

The Ollama Cloud free tier has rate limits — typically 10-20 requests per minute. For production workloads, switch to OpenRouter or the Moonshot direct API.
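The same health check can be scripted with only the Python standard library. The payload mirrors the curl call above; the live request succeeds only if Ollama is serving locally.

```python
import json
from urllib import request

def build_generate_request(model: str, prompt: str) -> dict:
    """Payload for Ollama's /api/generate endpoint (non-streaming)."""
    return {"model": model, "prompt": prompt, "stream": False}

def check_ollama(base_url: str = "http://localhost:11434") -> str:
    payload = json.dumps(build_generate_request("kimi-k2.5", "Hello, are you running?"))
    req = request.Request(
        f"{base_url}/api/generate",
        data=payload.encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req, timeout=30) as resp:
        # Non-streaming responses return the full completion in "response"
        return json.load(resp)["response"]

# Live check (requires Ollama running locally):
#   print(check_ollama())
```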


Setup Method 2: OpenRouter API

OpenRouter provides the best per-token pricing for K2.5 and the flexibility to switch between models without reconfiguring your stack.

Step 1: Get an OpenRouter API Key

Sign up at openrouter.ai and generate an API key from the dashboard. Add credits to your account; $5 covers thousands of requests at K2.5's pricing.

Step 2: Configure OpenClaw

# In your OpenClaw config (e.g., ~/.openclaw/config.yaml)
llm:
  provider: openrouter
  model: moonshot/kimi-k2.5
  api_key: your-openrouter-api-key
  temperature: 0.7
  max_tokens: 16384

Step 3: Start OpenClaw

openclaw start

OpenRouter handles load balancing and failover automatically. It also provides unified billing across all models you use, which simplifies cost tracking for teams running multiple models.


Setup Method 3: Moonshot API (Direct)

The Moonshot API gives you direct access to K2.5 with full Agent Swarm support and a free developer tier.

Step 1: Create a Moonshot Account

Sign up at platform.moonshot.ai and generate an API key. The free tier includes enough credits for initial testing and prototyping.

Step 2: Configure OpenClaw

# In your OpenClaw config (e.g., ~/.openclaw/config.yaml)
llm:
  provider: openai-compatible
  model: kimi-k2.5
  api_key: your-moonshot-api-key
  base_url: https://api.moonshot.ai/v1
  temperature: 0.7
  max_tokens: 16384

Step 3: Start OpenClaw

openclaw start

The Moonshot API follows the OpenAI-compatible format, so OpenClaw's OpenAI provider works without modification — just point the base URL to Moonshot's endpoint.
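Outside OpenClaw, the same endpoint works with any OpenAI-compatible client. A minimal stdlib sketch, using the model name and base URL from the config above and reading the key from a `MOONSHOT_API_KEY` environment variable (an assumption of this example, not an OpenClaw convention):

```python
import json
import os
from urllib import request

def chat_request(messages: list[dict], model: str = "kimi-k2.5") -> request.Request:
    """Build an OpenAI-compatible /chat/completions request for Moonshot."""
    body = json.dumps({"model": model, "messages": messages}).encode()
    return request.Request(
        "https://api.moonshot.ai/v1/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ.get('MOONSHOT_API_KEY', '')}",
        },
    )

req = chat_request([{"role": "user", "content": "Hello from OpenClaw"}])

# Live call (requires a valid key and network access):
#   with request.urlopen(req, timeout=60) as resp:
#       print(json.load(resp)["choices"][0]["message"]["content"])
```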


K2.5 vs Claude vs GPT

Metric                     Kimi K2.5      Claude Sonnet 4   GPT-4.1
BrowseComp                 74.9%          ~65%              ~70%
SWE-bench Verified         72.4%          ~79%              ~78%
Agent Swarm                100 agents     N/A               N/A
Input Cost (OpenRouter)    $0.45/M        $3.00/M           $2.00/M
Output Cost (OpenRouter)   $2.25/M        $15.00/M          $8.00/M
Context Window             256K           200K              1M
License                    Modified MIT   Proprietary       Proprietary

The key insight: K2.5 trades some coding performance for best-in-class browsing and native multi-agent orchestration at a fraction of the cost. If your OpenClaw workflow is research-heavy or requires coordinating multiple agent tasks in parallel, K2.5 is the strongest value proposition on the market.


When K2.5 Is the Right Choice

  • Research and data gathering agents: The 74.9% BrowseComp score makes K2.5 the best choice for agents that need to find, verify, and synthesize information from the web. Competitive intelligence, market research, and lead enrichment workflows all benefit.
  • Multi-agent workflows: If your OpenClaw setup requires coordinating multiple specialists — researcher + coder + reviewer + writer — Agent Swarm handles this natively without external orchestration.
  • Budget-conscious teams: At $0.45/$2.25 on OpenRouter with a free Ollama Cloud tier, K2.5 is one of the most affordable frontier models. Teams running thousands of requests per day save significantly compared to Claude or GPT.
  • Long-context processing: The 256K context window handles large codebases, lengthy documents, and extensive conversation histories without truncation.
  • Open-weight flexibility: The Modified MIT license lets you self-host, fine-tune, and customize K2.5 for your specific domain without licensing constraints.

Frequently Asked Questions

What is Kimi K2.5's Agent Swarm feature?

Agent Swarm is Kimi K2.5's built-in multi-agent orchestration system that can coordinate up to 100 independent agents simultaneously. Each agent handles a subtask — research, coding, analysis, writing — and the orchestrator synthesizes results. For OpenClaw operators, this means a single K2.5 call can spawn a coordinated swarm of workers, dramatically increasing throughput on complex tasks without manual orchestration.

How does Kimi K2.5 compare to GPT-5 on browsing benchmarks?

Kimi K2.5 scores 74.9% on BrowseComp, which measures a model's ability to find and synthesize information from the web. This is competitive with GPT-5 variants and significantly ahead of most open models. For OpenClaw operators running research or data-gathering agents, K2.5 is one of the strongest options available at its price point.

Can I run Kimi K2.5 locally?

Kimi K2.5 is available on Ollama Cloud for free rate-limited inference. True local execution is impractical on consumer hardware: although only 32 billion parameters are active per forward pass, all 1 trillion parameters must be resident in memory, which is roughly 500GB of weights even at 4-bit quantization. Running K2.5 yourself requires multi-GPU server hardware; for everyone else, Ollama Cloud or a hosted API is the realistic path.
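Weight-memory requirements at common quantization levels can be estimated directly (weights only, ignoring KV cache and activations; every expert must be resident even though only 32B parameters are active per token):

```python
TOTAL_PARAMS = 1_000_000_000_000  # all experts must be loaded, not just the active 32B

def weight_memory_gb(params: int, bits_per_param: float) -> float:
    """Memory for model weights alone, in gigabytes."""
    return params * bits_per_param / 8 / 1e9

for bits, label in [(16, "fp16"), (8, "q8"), (4, "q4")]:
    print(f"{label}: {weight_memory_gb(TOTAL_PARAMS, bits):,.0f} GB")
```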

Is Kimi K2.5 open source?

Yes. Kimi K2.5 is released under the Modified MIT license by Moonshot AI. The Modified MIT license is functionally identical to standard MIT for most commercial use cases — you can use, modify, and redistribute the model freely. The only difference is an attribution requirement in derivative model releases.

