Remote OpenClaw Blog
GPT-OSS 20B on OpenClaw: OpenAI's Free Open-Weight Model
8 min read
GPT-OSS 20B is OpenAI's first open-weight model, released in August 2025 under the Apache 2.0 license. After years of keeping all model weights proprietary, OpenAI entered the open-source arena with a model that was deliberately designed to compete with Llama, Qwen, and other community favorites.
The name "OSS" stands for Open Source Software, and OpenAI chose the Apache 2.0 license — the most commercially permissive option — to signal serious intent. You can download the weights, run them locally, fine-tune them for your domain, and build commercial products without any restrictions or royalties.
What makes GPT-OSS 20B remarkable is not just that it is free, but that it is genuinely good. It matches o3-mini — OpenAI's paid reasoning model — on most coding and reasoning benchmarks. For OpenClaw operators, this means you can run an agent powered by OpenAI-quality inference at zero cost, either locally on your laptop or free on OpenRouter.
The Mixture of Experts architecture is the key to its efficiency. With 21 billion total parameters but only 3.6 billion active per forward pass, GPT-OSS has the knowledge of a 20B model but the compute requirements of a 4B model. This makes it one of the most hardware-efficient models available, running comfortably on 16GB consumer devices.
OpenAI's decision to release an open-weight model was driven by competitive pressure. By mid-2025, the open-source ecosystem — led by Meta's Llama, Alibaba's Qwen, and DeepSeek — had captured a significant share of the developer market. Many startups and individual developers were building on open models, never touching the OpenAI API.
GPT-OSS 20B is OpenAI's answer: a model good enough to compete with community favorites, carrying the OpenAI brand, and serving as an on-ramp to their paid ecosystem. Developers who start with GPT-OSS often upgrade to GPT-5.3 Codex or GPT-5.4 for production — exactly as intended.
For OpenClaw operators, the motivation does not matter — the result does. GPT-OSS 20B is a high-quality, free, commercially licensed model from the world's most recognized AI lab. That is a useful tool regardless of why it exists.
| Specification | Value |
|---|---|
| Total Parameters | 21 billion |
| Active Parameters | 3.6 billion per forward pass |
| Architecture | Mixture of Experts (MoE) |
| Developer | OpenAI |
| Release Date | August 2025 |
| License | Apache 2.0 |
| Context Window | 128K tokens |
| Modalities | Text only |
| RAM Required (local) | 16GB (q4 quantization) |
| Disk Space | ~12GB (q4 quantization) |
| OpenRouter Price | FREE |
The 3.6B active-parameter count is the number that matters for hardware planning. While the model stores 21B total parameters on disk (~12GB in q4), only 3.6B are computed per token. This makes inference extremely fast on consumer hardware: comparable to running a 4B dense model, but with the accuracy of a much larger one.
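The ~12GB download size in the table above follows from simple arithmetic. As a rough sanity check (assuming ~4.5 bits per weight for q4-style quantization — an approximation, not an official figure — and using a hypothetical helper, `quantized_size_gb`):

```python
def quantized_size_gb(total_params: float, bits_per_weight: float = 4.5) -> float:
    """Rough on-disk size of a quantized model, ignoring file-format overhead."""
    return total_params * bits_per_weight / 8 / 1e9  # bits -> bytes -> GB

# 21B total parameters at ~4.5 bits/weight lands near the ~12GB download
print(round(quantized_size_gb(21e9), 1))  # → 11.8
```

Disk and RAM scale with *total* parameters, while per-token compute scales with *active* parameters, which is why the model downloads like a 20B model but runs like a 4B one.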
| Benchmark | GPT-OSS 20B | o3-mini (paid) | Context |
|---|---|---|---|
| HumanEval | 87.2% | 88.5% | Near-identical code generation |
| MMLU | 82.1% | 83.4% | Close on broad knowledge |
| AIME 2024 | 78.3% | 80.1% | Solid mathematical reasoning |
| GSM8K | 91.5% | 92.0% | Nearly identical math problem-solving |
| SWE-bench Lite | 45.2% | 47.8% | Respectable for a free 20B model |
The benchmark story is clear: GPT-OSS 20B consistently comes within 1-3 percentage points of o3-mini across all major benchmarks. For a free model that runs on a laptop, this is exceptional. The SWE-bench Lite score of 45.2% trails frontier models (Claude Opus hits 80.8% on the full SWE-bench), but for a model with only 3.6B active parameters, it handles routine coding tasks competently.
The practical implication: if o3-mini was "good enough" for your coding tasks before, GPT-OSS 20B will be good enough too — and it costs nothing.
Running GPT-OSS 20B locally gives you completely free, private, offline inference with no API dependency.
```bash
# macOS / Linux
curl -fsSL https://ollama.ai/install.sh | sh

# Verify installation
ollama --version

# Pull the model (~12GB download)
ollama pull gpt-oss:20b

# Verify it downloaded
ollama list

# Interactive chat
ollama run gpt-oss:20b

# Test with a coding prompt
ollama run gpt-oss:20b "Write a Python function to parse CSV files with error handling"
```
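Beyond the CLI, Ollama exposes a local HTTP API on port 11434, which is what OpenClaw talks to. A minimal sketch of assembling a request body for Ollama's `/api/chat` endpoint (`build_chat_request` is a hypothetical helper, not part of Ollama itself):

```python
def build_chat_request(prompt: str, model: str = "gpt-oss:20b") -> dict:
    """Build the JSON body for Ollama's /api/chat endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # return one complete response, not a token stream
    }

body = build_chat_request("Write a Python function to parse CSV files")
# POST it with any HTTP client while `ollama serve` is running, e.g.:
#   requests.post("http://localhost:11434/api/chat", json=body)
```

Setting `"stream": False` is handy for scripting; the default streaming mode returns one JSON object per token, which suits interactive UIs better.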
```yaml
# In your OpenClaw config (e.g., ~/.openclaw/config.yaml)
llm:
  provider: ollama
  model: gpt-oss:20b
  base_url: http://localhost:11434
  temperature: 0.7
  max_tokens: 8192
```

```bash
# Make sure Ollama is running
ollama serve &

# Start OpenClaw
openclaw start
```
The entire setup takes under 10 minutes, most of which is the 12GB model download. Once running, you have a free, private, offline coding agent powered by OpenAI technology.
If you do not want to run models locally, OpenRouter hosts GPT-OSS 20B for free — no credits required.
Sign up at openrouter.ai with your email. No credit card needed.
Create an API key from the OpenRouter dashboard.
```yaml
# In your OpenClaw config (e.g., ~/.openclaw/config.yaml)
llm:
  provider: openrouter
  model: openai/gpt-oss-20b:free
  api_key: your-openrouter-api-key
  temperature: 0.7
  max_tokens: 8192
```

```bash
openclaw start
```
The OpenRouter free tier gives you 20 requests per minute. For development, testing, and light production, this is plenty. For higher volume, add $5 in credits to remove rate limits (GPT-OSS remains free — credits just lift the rate cap).
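OpenRouter speaks the OpenAI-compatible chat completions API, so you can also call it directly. A minimal sketch of assembling such a request by hand (`build_request` is a hypothetical helper; it assumes your key lives in the `OPENROUTER_API_KEY` environment variable):

```python
import os

API_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_request(prompt: str) -> tuple[dict, dict]:
    """Return (headers, payload) for an OpenRouter chat completion call."""
    headers = {
        "Authorization": f"Bearer {os.environ.get('OPENROUTER_API_KEY', '')}",
        "Content-Type": "application/json",
    }
    payload = {
        "model": "openai/gpt-oss-20b:free",  # the free-tier model slug
        "messages": [{"role": "user", "content": prompt}],
    }
    return headers, payload

headers, payload = build_request("Explain this stack trace")
# Send with any HTTP client, e.g.:
#   requests.post(API_URL, headers=headers, json=payload)
```

Because the schema matches OpenAI's, swapping to a paid model later is a one-line change to the `model` field.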
| Hardware | Tokens/Second | Time for 500-word Response |
|---|---|---|
| MacBook Air M2 (16GB) | 20-35 tok/s | ~18 seconds |
| MacBook Pro M3 (32GB) | 35-55 tok/s | ~11 seconds |
| Desktop + RTX 3060 (12GB) | 50-75 tok/s | ~8 seconds |
| Desktop + RTX 4090 (24GB) | 80-120 tok/s | ~5 seconds |
GPT-OSS 20B is notably faster than other models of similar total parameter count because only 3.6B parameters activate per token. On a MacBook Air M2, it runs 30-50% faster than Qwen3 8B (which is a dense 8B model with all parameters active). The MoE architecture gives you more knowledge at lower compute cost.
For comparison, the same model on OpenRouter delivers 60-100 tokens per second, so the cloud route is faster but adds network latency (~100-200ms per request). For interactive use cases, local may actually feel faster due to zero network overhead.
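The response-time column above follows from a rough tokens-per-word heuristic (~1.33 tokens per English word is a common rule of thumb, and `response_seconds` is a hypothetical helper):

```python
def response_seconds(words: int, tokens_per_second: float,
                     tokens_per_word: float = 1.33) -> float:
    """Estimate wall-clock time to generate a response of `words` words."""
    return words * tokens_per_word / tokens_per_second

# A 500-word answer at 35 tok/s (top of the M2 Air's range)
print(round(response_seconds(500, 35)))  # → 19
```

That matches the table's ~18-second estimate for a 16GB MacBook Air to within the precision these figures warrant; real token counts vary with vocabulary and formatting.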
| Metric | GPT-OSS 20B | Qwen3 8B | Llama 3.3 8B |
|---|---|---|---|
| Total Params | 21B (MoE) | 8B (dense) | 8B (dense) |
| Active Params | 3.6B | 8B | 8B |
| RAM Required | 16GB | 16GB | 16GB |
| HumanEval | 87.2% | 82.5% | 84.1% |
| MMLU | 82.1% | 78.3% | 79.8% |
| Languages | ~15 | 119 | ~8 |
| Context Window | 128K | 32K | 128K |
| Inference Speed | Fastest (3.6B active) | Moderate (8B active) | Moderate (8B active) |
| License | Apache 2.0 | Apache 2.0 | Llama License |
| Free on OpenRouter | Yes | Yes (32B version) | Yes (70B version) |
GPT-OSS 20B wins on benchmarks and inference speed despite having fewer active parameters. Its 128K context window matches Llama 3.3 8B's and far exceeds Qwen3 8B's 32K. Qwen3 8B wins on multilingual support (119 vs ~15 languages). Llama 3.3 8B has the most extensive community ecosystem, with more fine-tuned variants available.
For OpenClaw coding agents running in English, GPT-OSS 20B is the strongest free local option. For multilingual agents, Qwen3 8B is better. For agents that need the broadest fine-tuned variant ecosystem, Llama remains the safe choice.
Is GPT-OSS 20B really an OpenAI model?
Yes. GPT-OSS 20B is OpenAI's first open-weight model release, published in August 2025 under the Apache 2.0 license. It represents a strategic shift for OpenAI, which had previously kept all model weights proprietary. The model is available on HuggingFace, Ollama, and free on OpenRouter. OpenAI has confirmed it in official communications, and the weights are distributed through their verified accounts.
Can I run GPT-OSS 20B on my laptop?
Yes, if you have 16GB of RAM. GPT-OSS 20B uses a Mixture of Experts architecture with 21 billion total parameters but only 3.6 billion active per forward pass. This means the actual compute footprint is similar to a 4B model, making it lightweight enough for consumer hardware. On a 16GB MacBook, expect 20-40 tokens per second. On 8GB machines, it will run but may be slow due to memory pressure.
How does GPT-OSS 20B compare to o3-mini?
GPT-OSS 20B matches o3-mini on most coding and reasoning benchmarks, which is remarkable for a free open-weight model. The key differences: o3-mini has a larger context window, performs slightly better on complex multi-step reasoning tasks, and is only available through the paid OpenAI API. GPT-OSS 20B is free everywhere — Ollama, OpenRouter, self-hosted. For most OpenClaw agent tasks, the performance difference is negligible.
Why did OpenAI release an open-weight model?
OpenAI released GPT-OSS 20B as a strategic move to compete with the open-source ecosystem (Llama, Qwen, DeepSeek) that was eroding their developer mindshare. By releasing a competitive free model, OpenAI keeps developers in their ecosystem — many who start with GPT-OSS eventually upgrade to paid GPT-5 variants for production. It also generates goodwill and demonstrates that OpenAI can compete on open weights, not just proprietary APIs.