Remote OpenClaw Blog
GPT-OSS 20B on OpenClaw: OpenAI's Free Open-Weight Model
8 min read
GPT-OSS 20B is OpenAI's first open-weight model, released in August 2025 under the Apache 2.0 license. After years of keeping all model weights proprietary, OpenAI entered the open-source arena with a model that was deliberately designed to compete with Llama, Qwen, and other community favorites.
The name "OSS" stands for Open Source Software, and OpenAI chose the Apache 2.0 license — the most commercially permissive option — to signal serious intent. You can download the weights, run them locally, fine-tune them for your domain, and build commercial products without any restrictions or royalties.
What makes GPT-OSS 20B remarkable is not just that it is free, but that it is genuinely good. It matches o3-mini — OpenAI's paid reasoning model — on most coding and reasoning benchmarks. For OpenClaw operators, this means you can run an agent powered by OpenAI-quality inference at zero cost, either locally on your laptop or free on OpenRouter.
The Mixture of Experts architecture is the key to its efficiency. With 21 billion total parameters but only 3.6 billion active per forward pass, GPT-OSS has the knowledge of a 20B model but the compute requirements of a 4B model. This makes it one of the most hardware-efficient models available, running comfortably on 16GB consumer devices.
OpenAI's decision to release an open-weight model was driven by competitive pressure. By mid-2025, the open-source ecosystem — led by Meta's Llama, Alibaba's Qwen, and DeepSeek — had captured a significant share of the developer market. Many startups and individual developers were building on open models, never touching the OpenAI API.
GPT-OSS 20B is OpenAI's answer: a model good enough to compete with community favorites, carrying the OpenAI brand, and serving as an on-ramp to their paid ecosystem. Developers who start with GPT-OSS often upgrade to GPT-5.3 Codex or GPT-5.4 for production — exactly as intended.
For OpenClaw operators, the motivation does not matter — the result does. GPT-OSS 20B is a high-quality, free, commercially licensed model from the world's most recognized AI lab. That is a useful tool regardless of why it exists.
| Specification | Value |
|---|---|
| Total Parameters | 21 billion |
| Active Parameters | 3.6 billion per forward pass |
| Architecture | Mixture of Experts (MoE) |
| Developer | OpenAI |
| Release Date | August 2025 |
| License | Apache 2.0 |
| Context Window | 128K tokens |
| Modalities | Text only |
| RAM Required (local) | 16GB (q4 quantization) |
| Disk Space | ~12GB (q4 quantization) |
| OpenRouter Price | FREE |
The 3.6B active-parameter count is the number that matters for hardware planning. While the model stores 21B total parameters on disk (~12GB in q4), only 3.6B are computed per token. This makes inference extremely fast on consumer hardware: comparable to running a 4B dense model, but with the accuracy of a much larger one.
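The ~12GB download size in the table above follows from simple arithmetic. As a rough sanity check (assuming ~4.5 bits per weight for q4-style quantization — an approximation, not an official figure — and using a hypothetical helper, `quantized_size_gb`):

```python
def quantized_size_gb(total_params: float, bits_per_weight: float = 4.5) -> float:
    """Rough on-disk size of a quantized model, ignoring file-format overhead."""
    return total_params * bits_per_weight / 8 / 1e9  # bits -> bytes -> GB

# 21B total parameters at ~4.5 bits/weight lands near the ~12GB download
print(round(quantized_size_gb(21e9), 1))  # → 11.8
```

Disk and RAM scale with *total* parameters, while per-token compute scales with *active* parameters, which is why the model downloads like a 20B model but runs like a 4B one.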
| Benchmark | GPT-OSS 20B | o3-mini (paid) | Context |
|---|---|---|---|
| HumanEval | 87.2% | 88.5% | Near-identical code generation |
| MMLU | 82.1% | 83.4% | Close on broad knowledge |
| AIME 2024 | 78.3% | 80.1% | Solid mathematical reasoning |
| GSM8K | 91.5% | 92.0% | Nearly identical math problem-solving |
| SWE-bench Lite | 45.2% | 47.8% | Respectable for a free 20B model |
The benchmark story is clear: GPT-OSS 20B consistently comes within 1-3 percentage points of o3-mini across all major benchmarks. For a free model that runs on a laptop, this is exceptional. The SWE-bench Lite score of 45.2% trails frontier models (Claude Opus hits 80.8% on the full SWE-bench), but for a model with only 3.6B active parameters, it handles routine coding tasks competently.
The practical implication: if o3-mini was "good enough" for your coding tasks before, GPT-OSS 20B will be good enough too — and it costs nothing.
Running GPT-OSS 20B locally gives you completely free, private, offline inference with no API dependency.
```bash
# macOS / Linux
curl -fsSL https://ollama.ai/install.sh | sh

# Verify installation
ollama --version

# Pull the model (~12GB download)
ollama pull gpt-oss:20b

# Verify it downloaded
ollama list

# Interactive chat
ollama run gpt-oss:20b

# Test with a coding prompt
ollama run gpt-oss:20b "Write a Python function to parse CSV files with error handling"
```
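Beyond the CLI, Ollama exposes a local HTTP API on port 11434, which is what OpenClaw talks to. A minimal sketch of assembling a request body for Ollama's `/api/chat` endpoint (`build_chat_request` is a hypothetical helper, not part of Ollama itself):

```python
def build_chat_request(prompt: str, model: str = "gpt-oss:20b") -> dict:
    """Build the JSON body for Ollama's /api/chat endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # return one complete response, not a token stream
    }

body = build_chat_request("Write a Python function to parse CSV files")
# POST it with any HTTP client while `ollama serve` is running, e.g.:
#   requests.post("http://localhost:11434/api/chat", json=body)
```

Setting `"stream": False` is handy for scripting; the default streaming mode returns one JSON object per token, which suits interactive UIs better.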
```yaml
# In your OpenClaw config (e.g., ~/.openclaw/config.yaml)
llm:
  provider: ollama
  model: gpt-oss:20b
  base_url: http://localhost:11434
  temperature: 0.7
  max_tokens: 8192
```

```bash
# Make sure Ollama is running
ollama serve &

# Start OpenClaw
openclaw start
```
The entire setup takes under 10 minutes, most of which is the 12GB model download. Once running, you have a free, private, offline coding agent powered by OpenAI technology.
If you do not want to run models locally, OpenRouter hosts GPT-OSS 20B for free — no credits required.
Sign up at openrouter.ai with your email. No credit card needed.
Create an API key from the OpenRouter dashboard.
```yaml
# In your OpenClaw config (e.g., ~/.openclaw/config.yaml)
llm:
  provider: openrouter
  model: openai/gpt-oss-20b:free
  api_key: your-openrouter-api-key
  temperature: 0.7
  max_tokens: 8192
```

```bash
openclaw start
```
The OpenRouter free tier gives you 20 requests per minute. For development, testing, and light production, this is plenty. For higher volume, add $5 in credits to remove rate limits (GPT-OSS remains free — credits just lift the rate cap).
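OpenRouter speaks the OpenAI-compatible chat completions API, so you can also call it directly. A minimal sketch of assembling such a request by hand (`build_request` is a hypothetical helper; it assumes your key lives in the `OPENROUTER_API_KEY` environment variable):

```python
import os

API_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_request(prompt: str) -> tuple[dict, dict]:
    """Return (headers, payload) for an OpenRouter chat completion call."""
    headers = {
        "Authorization": f"Bearer {os.environ.get('OPENROUTER_API_KEY', '')}",
        "Content-Type": "application/json",
    }
    payload = {
        "model": "openai/gpt-oss-20b:free",  # the free-tier model slug
        "messages": [{"role": "user", "content": prompt}],
    }
    return headers, payload

headers, payload = build_request("Explain this stack trace")
# Send with any HTTP client, e.g.:
#   requests.post(API_URL, headers=headers, json=payload)
```

Because the schema matches OpenAI's, swapping to a paid model later is a one-line change to the `model` field.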
| Hardware | Tokens/Second | Time for 500-word Response |
|---|---|---|
| MacBook Air M2 (16GB) | 20-35 tok/s | ~18 seconds |
| MacBook Pro M3 (32GB) | 35-55 tok/s | ~11 seconds |
| Desktop + RTX 3060 (12GB) | 50-75 tok/s | ~8 seconds |
| Desktop + RTX 4090 (24GB) | 80-120 tok/s | ~5 seconds |
GPT-OSS 20B is notably faster than other models of similar total parameter count because only 3.6B parameters activate per token. On a MacBook Air M2, it runs 30-50% faster than Qwen3 8B (which is a dense 8B model with all parameters active). The MoE architecture gives you more knowledge at lower compute cost.
For comparison, the same model on OpenRouter delivers 60-100 tokens per second, so the cloud route is faster but adds network latency (~100-200ms per request). For interactive use cases, local may actually feel faster due to zero network overhead.
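The response-time column above follows from a rough tokens-per-word heuristic (~1.33 tokens per English word is a common rule of thumb, and `response_seconds` is a hypothetical helper):

```python
def response_seconds(words: int, tokens_per_second: float,
                     tokens_per_word: float = 1.33) -> float:
    """Estimate wall-clock time to generate a response of `words` words."""
    return words * tokens_per_word / tokens_per_second

# A 500-word answer at 35 tok/s (top of the M2 Air's range)
print(round(response_seconds(500, 35)))  # → 19
```

That matches the table's ~18-second estimate for a 16GB MacBook Air to within the precision these figures warrant; real token counts vary with vocabulary and formatting.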
| Metric | GPT-OSS 20B | Qwen3 8B | Llama 3.3 8B |
|---|---|---|---|
| Total Params | 21B (MoE) | 8B (dense) | 8B (dense) |
| Active Params | 3.6B | 8B | 8B |
| RAM Required | 16GB | 16GB | 16GB |
| HumanEval | 87.2% | 82.5% | 84.1% |
| MMLU | 82.1% | 78.3% | 79.8% |
| Languages | ~15 | 119 | ~8 |
| Context Window | 128K | 32K | 128K |
| Inference Speed | Fastest (3.6B active) | Moderate (8B active) | Moderate (8B active) |
| License | Apache 2.0 | Apache 2.0 | Llama License |
| Free on OpenRouter | Yes | Yes (32B version) | Yes (70B version) |
GPT-OSS 20B wins on benchmarks and inference speed despite having fewer active parameters. Its 128K context window matches Llama 3.3 8B's and far exceeds Qwen3 8B's 32K. Qwen3 8B wins on multilingual support (119 vs ~15 languages). Llama 3.3 8B has the most extensive community ecosystem, with more fine-tuned variants available.
For OpenClaw coding agents running in English, GPT-OSS 20B is the strongest free local option. For multilingual agents, Qwen3 8B is better. For agents that need the broadest fine-tuned variant ecosystem, Llama remains the safe choice.
Is GPT-OSS 20B really an OpenAI model?
Yes. GPT-OSS 20B is OpenAI's first open-weight model release, published in August 2025 under the Apache 2.0 license. It represents a strategic shift for OpenAI, which had previously kept all model weights proprietary. The model is available on HuggingFace, Ollama, and free on OpenRouter. OpenAI has confirmed it in official communications, and the weights are distributed through their verified accounts.
Can I run GPT-OSS 20B on my laptop?
Yes, if you have 16GB of RAM. GPT-OSS 20B uses a Mixture of Experts architecture with 21 billion total parameters but only 3.6 billion active per forward pass. This means the actual compute footprint is similar to a 4B model, making it lightweight enough for consumer hardware. On a 16GB MacBook, expect 20-40 tokens per second. On 8GB machines, it will run but may be slow due to memory pressure.
How does GPT-OSS 20B compare to o3-mini?
GPT-OSS 20B matches o3-mini on most coding and reasoning benchmarks, which is remarkable for a free open-weight model. The key differences: o3-mini has a larger context window, performs slightly better on complex multi-step reasoning tasks, and is only available through the paid OpenAI API. GPT-OSS 20B is free everywhere — Ollama, OpenRouter, self-hosted. For most OpenClaw agent tasks, the performance difference is negligible.
Why did OpenAI release an open-weight model?
OpenAI released GPT-OSS 20B as a strategic move to compete with the open-source ecosystem (Llama, Qwen, DeepSeek) that was eroding their developer mindshare. By releasing a competitive free model, OpenAI keeps developers in their ecosystem — many who start with GPT-OSS eventually upgrade to paid GPT-5 variants for production. It also generates goodwill and demonstrates that OpenAI can compete on open weights, not just proprietary APIs.