Remote OpenClaw Blog

DeepSeek V3.2 on OpenClaw: The Cheapest Frontier Model

7 min read

What Is DeepSeek V3.2?

DeepSeek V3.2 is the latest iteration of DeepSeek's flagship language model, developed by the Hangzhou-based AI lab that has repeatedly proven that frontier performance does not require frontier pricing. With 671 billion total parameters in a Mixture of Experts architecture (37 billion active per inference), V3.2 delivers benchmark scores that rival models costing 100-200x more per token.

The pricing is the story. At $0.028 per million input tokens and $0.10 per million output tokens, DeepSeek V3.2 is the cheapest frontier-class model in existence. To put this in perspective: processing one million input tokens on Claude Opus 4.6 costs $5.00. The same volume on DeepSeek V3.2 costs $0.028. That is not a typo — it is 178x cheaper.

For OpenClaw operators, this pricing unlocks workflows that were previously uneconomical. High-volume batch processing, continuous monitoring agents, large-scale data analysis, and experimental agent pipelines all become viable when per-token costs approach zero. The trade-off is lower peak accuracy compared to Claude or GPT-5, but for many workflows, 67.8% SWE-bench is more than sufficient.

The MIT license adds another dimension: you can download the weights, self-host, fine-tune, and build commercial products without any licensing restrictions. For operators who need air-gapped deployment or want to eliminate API dependency entirely, this is a significant advantage.


Architecture and Specifications

Specification Value
Total Parameters 671 billion
Active Parameters 37 billion per forward pass
Architecture Mixture of Experts (MoE)
Developer DeepSeek
License MIT
Context Window 128K tokens
Modalities Text only
Input Pricing $0.028 per 1M tokens
Output Pricing $0.10 per 1M tokens

The MoE architecture is key to understanding both the performance and pricing. With 671B total parameters spread across expert modules, V3.2 has an enormous knowledge base. But only 37B parameters activate per inference pass, which keeps compute costs low. This is how DeepSeek achieves frontier-class knowledge breadth while maintaining the inference cost of a much smaller model.
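A toy sketch makes the routing idea concrete. This is an illustration of top-k expert gating in general, not DeepSeek's actual implementation; the eight experts and the scores below are made up:

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route_token(expert_scores, k=2):
    """Pick the top-k experts for one token. Only those experts run,
    so per-token compute scales with k, not with the total expert count --
    the same principle that lets V3.2 activate 37B of 671B parameters."""
    probs = softmax(expert_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    # Renormalize the chosen experts' weights so they sum to 1.
    total = sum(probs[i] for i in top)
    return [(i, probs[i] / total) for i in top]

# 8 hypothetical experts; only 2 activate for this token.
active = route_token([0.1, 2.0, -1.0, 0.5, 1.5, -0.3, 0.0, 0.2], k=2)
```

Here `active` holds the two winning expert indices and their renormalized weights; the other six experts contribute no compute at all for this token.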


The Pricing Story

Let's put DeepSeek V3.2's pricing in concrete terms for OpenClaw operators:

Scenario DeepSeek V3.2 Cost Claude Opus 4.6 Cost Savings
1,000 agent requests/day (1K input tokens each) $0.028/day $5.00/day 99.4%
10,000 requests/day $0.28/day $50.00/day 99.4%
1M requests/month $28/month $5,000/month 99.4%

At these prices, the API cost is negligible for nearly any OpenClaw workflow. The operational overhead of managing the API connection costs more than the tokens themselves. This fundamentally changes how you think about agent design — you can afford to have your agent make speculative requests, retry on failures, and process large volumes of data without worrying about cost optimization.
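The arithmetic is simple enough to sketch. The prices below come from the table above; the 800-input/200-output request shape is an assumption for illustration:

```python
INPUT_PRICE = 0.028 / 1_000_000   # dollars per input token (DeepSeek V3.2)
OUTPUT_PRICE = 0.10 / 1_000_000   # dollars per output token

def daily_cost(requests_per_day, in_tokens, out_tokens):
    """Estimated daily API spend for a fixed-shape workload."""
    return requests_per_day * (in_tokens * INPUT_PRICE + out_tokens * OUTPUT_PRICE)

# 10,000 requests/day, each with 800 input and 200 output tokens.
cost = daily_cost(10_000, 800, 200)  # about $0.42/day
```

Even a fairly chatty 10,000-request workload lands under half a dollar a day.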


Benchmarks and Performance

Benchmark DeepSeek V3.2 Context
AIME 2024 96.0% Near-perfect math; beats Claude Opus and GPT-5.4
SWE-bench Verified 67.8% Solid coding; handles 2/3 of real engineering tasks
MMLU 88.1% Strong broad knowledge
HumanEval 89.5% Reliable code generation from descriptions
GPQA Diamond 71.3% Decent graduate-level reasoning

The AIME 2024 score of 96% is remarkable and actually exceeds Claude Opus 4.6 (91.5%) and GPT-5.4 (94.1%). For any workflow involving mathematical reasoning — financial calculations, data analysis, statistical modeling, scientific computation — DeepSeek V3.2 is not just the cheapest option, it is arguably the best option.

The SWE-bench Verified score of 67.8% tells a different story for coding. While V3.2 handles two-thirds of real-world coding tasks, it falls short of Claude Opus (80.8%) and GPT-5.4 (79.5%) on complex software engineering. For routine coding — fixing bugs, writing tests, implementing straightforward features — V3.2 is excellent. For architecturally complex changes that require understanding large codebases, consider routing those tasks to a more capable model.


Setup Method 1: DeepSeek API (Direct)

The DeepSeek API offers the lowest per-token pricing and follows the OpenAI-compatible format, making integration with OpenClaw straightforward.

Step 1: Get a DeepSeek API Key

Sign up at platform.deepseek.com and generate an API key. At V3.2's pricing, even a $1 deposit covers roughly 35 million input tokens (or 10 million output tokens).

Step 2: Configure OpenClaw

# In your OpenClaw config (e.g., ~/.openclaw/config.yaml)
llm:
  provider: openai-compatible
  model: deepseek-v3.2
  api_key: your-deepseek-api-key
  base_url: https://api.deepseek.com/v1
  temperature: 0.7
  max_tokens: 8192

Step 3: Start OpenClaw

openclaw start

The DeepSeek API uses the OpenAI-compatible format, so OpenClaw's OpenAI provider works without modification. Response times are generally fast, though throughput can vary during peak hours due to high global demand for V3.2.
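You can sanity-check the endpoint before wiring it into OpenClaw. The sketch below builds an OpenAI-compatible chat completion request against the base URL and model name from the config above (the request body fields follow the OpenAI chat format; the model name is the one this post uses in its config):

```python
import json
import urllib.request

API_URL = "https://api.deepseek.com/v1/chat/completions"

def build_request(prompt, api_key, model="deepseek-v3.2"):
    """Assemble an OpenAI-compatible chat completion request."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 64,
    }).encode()
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )

# Uncomment to send a real request with your own key:
# with urllib.request.urlopen(build_request("Say hello.", "your-deepseek-api-key")) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

If this round-trips, the same base URL and model name will work in the OpenClaw config unchanged.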


Setup Method 2: OpenRouter

OpenRouter provides V3.2 access with automatic failover and unified billing across all your model providers.

Step 1: Get an OpenRouter API Key

Sign up at openrouter.ai and generate an API key.

Step 2: Configure OpenClaw

# In your OpenClaw config (e.g., ~/.openclaw/config.yaml)
llm:
  provider: openrouter
  model: deepseek/deepseek-v3.2
  api_key: your-openrouter-api-key
  temperature: 0.7
  max_tokens: 8192

Step 3: Start OpenClaw

openclaw start

OpenRouter pricing for V3.2 is slightly higher than direct DeepSeek API pricing, but the difference is negligible at these price levels. The added reliability and failover capabilities are worth the marginal premium.


Cost Comparison: V3.2 vs Everything

Model Input (per 1M) Output (per 1M) vs V3.2 (input)
DeepSeek V3.2 $0.028 $0.10 1x (baseline)
Kimi K2.5 (OpenRouter) $0.45 $2.25 16x more
GLM-5 (OpenRouter) $0.72 $2.30 26x more
GPT-5.4-mini $2.50 $10.00 89x more
Claude Opus 4.6 $5.00 $25.00 178x more
GPT-5.4-max $10.00 $30.00 357x more

The pricing gap is so large that it changes the decision calculus. With most models, you optimize prompts to reduce token usage. With DeepSeek V3.2, you optimize for task success rate because the marginal cost of additional tokens is effectively zero.
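In practice "optimize for task success rate" often means retrying until the output validates. A minimal sketch, where `call_model` and `looks_valid` stand in for your own request and validation functions:

```python
def retry_until_valid(call_model, looks_valid, prompt, max_attempts=5):
    """Re-ask the model until the output passes validation.
    At V3.2 prices, five attempts still cost a fraction of a cent."""
    last = None
    for _ in range(max_attempts):
        last = call_model(prompt)
        if looks_valid(last):
            return last
    return last  # caller decides how to handle persistent failure

# Toy stand-ins: a flaky "model" that succeeds on its third call.
calls = {"n": 0}
def fake_model(prompt):
    calls["n"] += 1
    return "ok" if calls["n"] >= 3 else "garbage"

result = retry_until_valid(fake_model, lambda s: s == "ok", "extract the date")
```

With a $25-per-million-output-token model, a loop like this needs a budget; with V3.2, it is a reasonable default.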


When DeepSeek V3.2 Is the Right Choice

  • High-volume batch processing: If your agent processes thousands of requests per day — data extraction, content classification, email triage — V3.2 keeps costs negligible regardless of volume.
  • Mathematical reasoning: With 96% on AIME, V3.2 is the strongest math model available. Financial modeling, statistical analysis, data transformation, and scientific computation all benefit.
  • Cost-sensitive coding pipelines: For routine coding tasks (bug fixes, test generation, code formatting), V3.2's 67.8% SWE-bench is sufficient at 1/178th the cost of Opus.
  • Experimental workflows: When you are prototyping new agent workflows and want to iterate quickly without worrying about API costs, V3.2 lets you run thousands of tests for pennies.
  • Secondary model in a pipeline: Use V3.2 as the workhorse that handles 80% of routine tasks, and route the remaining 20% of complex tasks to Claude or GPT. This hybrid approach often delivers 95% of peak performance at 10% of the cost.
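The hybrid approach above can start as a trivially simple router. The markers and the file-count threshold below are illustrative assumptions, not tuned values, and the model names mirror the ones used in this post:

```python
CHEAP_MODEL = "deepseek-v3.2"
STRONG_MODEL = "claude-opus-4.6"  # the escalation target used in this post

def pick_model(task_text, files_touched=1):
    """Route routine work to V3.2; escalate complex tasks.
    Thresholds here are placeholders to tune against your own workload."""
    complex_markers = ("refactor", "architecture", "migration")
    if files_touched > 3 or any(m in task_text.lower() for m in complex_markers):
        return STRONG_MODEL
    return CHEAP_MODEL
```

Even a crude router like this captures most of the savings, because the expensive model only sees the minority of tasks that actually need it.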

Limitations

  • Text-only: No vision or audio support. If your agent needs to process images, screenshots, or PDFs with visual elements, use a multimodal model for those specific tasks.
  • 128K context limit: Smaller than Claude Opus (1M) or GPT-5.4 (1M). For very large codebases or documents, you may need to chunk your input.
  • Lower SWE-bench ceiling: At 67.8%, V3.2 fails on about 1 in 3 complex coding tasks. For critical code changes that must be right, consider routing to a more capable model.
  • Throughput variability: The DeepSeek API can experience variable response times during peak usage periods, particularly from Asian time zones. OpenRouter's failover mitigates this somewhat.
  • Data residency: The DeepSeek API routes through infrastructure in China. For operators with strict data residency requirements, self-hosting (MIT license) or using OpenRouter's non-Chinese endpoints may be necessary.
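For the 128K-context limitation, chunking is usually enough. The sketch below splits on characters for brevity (a token-based splitter is more precise); the ~400K-character default is a rough proxy for ~100K tokens, leaving headroom for the prompt and the response:

```python
def chunk_text(text, max_chars=400_000):
    """Split text into pieces that fit the context window,
    preferring to break at a newline so chunks stay coherent."""
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        if end < len(text):
            nl = text.rfind("\n", start, end)
            if nl > start:
                end = nl + 1
        chunks.append(text[start:end])
        start = end
    return chunks
```

Each chunk can then be processed independently (and, at V3.2 prices, in parallel) before the results are merged.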

Frequently Asked Questions

How can DeepSeek V3.2 be so cheap?

DeepSeek V3.2's pricing reflects three factors: the Mixture of Experts architecture activates only 37 billion of 671 billion parameters per forward pass (reducing compute per token), DeepSeek operates its own inference infrastructure in China (lower operational costs), and the MIT license means the model weights are freely available — DeepSeek competes on service quality rather than model exclusivity. The result is frontier-adjacent performance at 1/178th the cost of Claude Opus.

Is DeepSeek V3.2 good enough for coding agents?

DeepSeek V3.2 scores 67.8% on SWE-bench Verified, meaning it can autonomously resolve roughly two-thirds of real-world software engineering tasks. For comparison, Claude Opus 4.6 scores 80.8%. If your coding agent handles routine bug fixes, feature implementations, and code reviews, V3.2 is more than capable at a fraction of the cost. For complex multi-file refactoring or architecturally challenging tasks, a more capable model may be worth the premium.

Can I run DeepSeek V3.2 locally?

In theory, yes — the model is MIT licensed. In practice, the 671 billion total parameters make local deployment challenging. The MoE architecture reduces compute (37B active per pass), but all 671B parameters must still be resident in memory, so even a 4-bit quantization needs roughly 350GB+ of combined RAM/VRAM. For most operators, the DeepSeek API at $0.028/$0.10 is far more practical and cost-effective than self-hosting.

How does DeepSeek V3.2 compare to GPT-5.4-mini?

DeepSeek V3.2 ($0.028/$0.10) is roughly 90x cheaper on input and 100x cheaper on output than GPT-5.4-mini ($2.50/$10.00). On benchmarks, GPT-5.4-mini scores higher on SWE-bench and MMLU, but V3.2 actually beats it on AIME math (96% vs ~88%). For cost-sensitive workflows where good-enough performance matters more than maximum accuracy, DeepSeek V3.2 is the clear winner. For tasks requiring higher reliability or computer use, GPT-5.4-mini justifies its premium.


Further Reading