Remote OpenClaw Blog
GPT-5.3 and GPT-5.4 on OpenClaw: Setup and Configuration Guide
8 min read
OpenAI's GPT-5 generation arrived in two waves. GPT-5.3 (internally codenamed Codex) launched in February 2026 as a coding-focused model. GPT-5.4 followed one month later in March 2026 as the full general-purpose flagship, shipping with five distinct variants to cover different price and performance tiers.
For OpenClaw operators, the GPT-5 family represents the most comprehensive model lineup from any single provider. Whether you need a free nano model for simple classification, a mid-tier model for everyday agent tasks, or a maximum-capability model for complex autonomous workflows, there is a GPT-5 variant that fits.
The standout capability across the GPT-5.4 line is computer use. Scoring 75% on OSWorld — a benchmark that tests a model's ability to autonomously operate a computer desktop, click buttons, fill forms, navigate between applications, and complete multi-step UI tasks — GPT-5.4 sets a new bar for desktop automation. For OpenClaw operators building agents that interact with web applications, CRMs, or internal tools, this is a game-changing capability.
GPT-5.3 launched in February 2026 as OpenAI's dedicated coding model. It is not a general-purpose model — it was specifically trained and optimized for software engineering tasks: writing code, reviewing code, debugging, refactoring, and generating tests.
| Specification | Value |
|---|---|
| Model ID | gpt-5.3-codex |
| Release Date | February 2026 |
| Context Window | 400K tokens |
| Input Pricing | $1.75 per 1M tokens |
| Output Pricing | $14.00 per 1M tokens |
| Specialty | Code generation, review, debugging |
| Modalities | Text + Code |
The 400K context window is large enough to ingest entire medium-sized codebases in a single prompt. For OpenClaw operators using coding agents, this means your agent can see the full project structure, dependencies, and related files when making changes — resulting in more accurate patches that account for the broader codebase context.
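Before sending a whole repository, it helps to estimate whether it actually fits in the window. The sketch below is a rough sanity check using a ~4 characters-per-token heuristic; that ratio is an assumption, not OpenAI's tokenizer, so use a real tokenizer (e.g. tiktoken) when you need exact counts.

```python
# Rough check that a codebase fits in GPT-5.3's 400K-token context window.
# CHARS_PER_TOKEN is a crude heuristic for English text and code, not an
# exact tokenizer; treat the result as an estimate only.
CONTEXT_WINDOW = 400_000
CHARS_PER_TOKEN = 4  # heuristic assumption

def estimate_tokens(text: str) -> int:
    """Approximate token count from character length."""
    return len(text) // CHARS_PER_TOKEN

def fits_in_context(files: dict[str, str], reserve_for_output: int = 16_384) -> bool:
    """True if all file contents fit in the context window while leaving
    reserve_for_output tokens of headroom for the model's response."""
    total = sum(estimate_tokens(src) for src in files.values())
    return total <= CONTEXT_WINDOW - reserve_for_output
```

A repository that estimates well under the limit can go into a single prompt; anything near the boundary deserves an exact count with a real tokenizer first.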
At $1.75/$14 per million tokens, GPT-5.3 is positioned as a premium coding model. The input pricing is competitive, but the output pricing is notable — $14 per million output tokens is on the expensive side, reflecting the computational cost of extended code reasoning.
GPT-5.4 is OpenAI's most capable general-purpose model, launched in March 2026. It features a 1 million token context window, matching the largest of any commercial API model, and ships in five variants spanning from free to premium.
The headline capability is computer use. GPT-5.4 scored 75% on OSWorld, meaning it successfully completes roughly three out of four desktop automation tasks: navigating web applications, filling out forms, clicking through multi-step workflows, extracting data from applications, and interacting with system dialogs. This is not theoretical — it works in production through OpenClaw's computer-use integration.
The 1M context window is the other major differentiator. Claude Opus 4.6 also offers 1M context, making these the only two models at this scale. For OpenClaw operators processing large documents, entire codebases, or long conversation histories, this eliminates the context truncation problem entirely.
| Variant | Input (per 1M) | Output (per 1M) | Context | Best For |
|---|---|---|---|---|
| gpt-5.4-nano | Free | Free | 128K | Classification, routing, simple tasks |
| gpt-5.4-mini | $2.50 | $10.00 | 512K | Everyday agent tasks, balanced cost/quality |
| gpt-5.4 | $5.00 | $20.00 | 1M | Complex reasoning, extended thinking |
| gpt-5.4-turbo | $5.00 | $20.00 | 1M | Low-latency applications, real-time agents |
| gpt-5.4-max | $10.00 | $30.00 | 1M | Maximum quality, complex autonomous tasks |
The nano variant is genuinely free — no credits required, just rate limits. For OpenClaw operators, this is useful as a classifier or router: use nano to analyze incoming tasks and route them to the appropriate specialist model, saving costs on the expensive models.
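A minimal sketch of that router pattern is below. The keyword heuristic in `classify` is a stand-in for an actual gpt-5.4-nano classification call, and the labels and keywords are illustrative assumptions, not a fixed taxonomy.

```python
# Nano-as-router pattern: a cheap classifier decides which model handles
# each request, so only genuinely hard tasks hit the expensive tiers.
ROUTES = {
    "classification": "gpt-5.4-nano",   # free tier handles trivial work
    "general":        "gpt-5.4-mini",   # everyday agent tasks
    "coding":         "gpt-5.3-codex",  # specialist coding model
    "complex":        "gpt-5.4-max",    # maximum-quality reasoning
}

def classify(task: str) -> str:
    """Placeholder for a gpt-5.4-nano call that labels the task.
    The keyword rules here are illustrative assumptions."""
    lowered = task.lower()
    if any(kw in lowered for kw in ("refactor", "debug", "write code")):
        return "coding"
    if any(kw in lowered for kw in ("plan", "multi-step", "autonomous")):
        return "complex"
    if len(task) < 40:
        return "classification"
    return "general"

def route(task: str) -> str:
    """Pick the cheapest model that fits the task."""
    return ROUTES[classify(task)]
```

In production you would replace `classify` with a real nano call; the routing table itself stays this simple.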
The turbo variant has the same pricing as standard gpt-5.4 but is optimized for lower latency at the cost of slightly reduced reasoning depth. For agents that need to respond quickly — chat agents, real-time assistants, interactive workflows — turbo is the right choice.
| Benchmark | GPT-5.3 Codex | GPT-5.4 | Context |
|---|---|---|---|
| OSWorld | N/A | 75.0% | Best computer-use performance of any model |
| SWE-bench Verified | 81.2% | 79.5% | 5.3 edges out 5.4 on pure coding tasks |
| AIME 2024 | 85.0% | 94.1% | 5.4 significantly stronger on math |
| MMLU | 86.5% | 91.2% | 5.4 broader knowledge base |
| HumanEval | 93.8% | 92.1% | Both excellent at code generation |
The OSWorld score of 75% is the most consequential benchmark for agent operators. OSWorld tests complete computer-use workflows — not just recognizing UI elements, but executing multi-step tasks like "open the spreadsheet, sort column B, filter rows where revenue exceeds $10K, and export as CSV." A 75% success rate means GPT-5.4 can handle most routine desktop automation tasks without human intervention.
GPT-5.3 Codex's SWE-bench score of 81.2% makes it the strongest coding model from OpenAI, edging out even GPT-5.4 on pure software engineering tasks. If your OpenClaw agent primarily writes and reviews code, GPT-5.3 is the better choice despite its narrower capabilities.
| Model | Input (per 1M) | Output (per 1M) | Est. Monthly Cost (100K requests, ~1K input + 1K output tokens each) |
|---|---|---|---|
| GPT-5.3 Codex | $1.75 | $14.00 | ~$1,575 |
| GPT-5.4-nano | Free | Free | $0 (rate-limited) |
| GPT-5.4-mini | $2.50 | $10.00 | ~$1,250 |
| GPT-5.4 | $5.00 | $20.00 | ~$2,500 |
| GPT-5.4-turbo | $5.00 | $20.00 | ~$2,500 |
| GPT-5.4-max | $10.00 | $30.00 | ~$4,000 |
For comparison: Claude Opus 4.6 runs $5/$25 with 90% caching savings available. DeepSeek V3.2 runs $0.028/$0.10. The GPT-5 family sits at the premium end of the market, justified by capabilities like computer use and the 1M context window that cheaper models do not offer.
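The monthly figures in the table above can be reproduced with straightforward arithmetic, assuming each request averages roughly 1K input and 1K output tokens (that per-request assumption is what makes the numbers come out as shown; scale it to match your workload):

```python
# Recompute estimated monthly spend from list pricing.
PRICING = {  # model -> (input $/1M tokens, output $/1M tokens)
    "gpt-5.3-codex": (1.75, 14.00),
    "gpt-5.4-mini":  (2.50, 10.00),
    "gpt-5.4":       (5.00, 20.00),
    "gpt-5.4-max":   (10.00, 30.00),
}

def monthly_cost(model: str, requests: int,
                 in_tokens: int = 1_000, out_tokens: int = 1_000) -> float:
    """Estimated monthly spend in dollars for a given request volume,
    assuming fixed average input/output tokens per request."""
    in_price, out_price = PRICING[model]
    return (requests * in_tokens * in_price
            + requests * out_tokens * out_price) / 1_000_000
```

Plugging in 100K requests reproduces the table: about $1,575 for GPT-5.3 Codex, $1,250 for mini, and $4,000 for max.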
The most straightforward way to connect GPT-5 models to OpenClaw is through the OpenAI API directly.
Sign up at platform.openai.com and generate an API key. Add credits to your account based on which variant you plan to use.
```yaml
# In your OpenClaw config (e.g., ~/.openclaw/config.yaml)
llm:
  provider: openai
  model: gpt-5.3-codex
  api_key: your-openai-api-key
  temperature: 0.4
  max_tokens: 16384
```
```yaml
# In your OpenClaw config (e.g., ~/.openclaw/config.yaml)
llm:
  provider: openai
  model: gpt-5.4  # or gpt-5.4-mini, gpt-5.4-turbo, gpt-5.4-max, gpt-5.4-nano
  api_key: your-openai-api-key
  temperature: 0.7
  max_tokens: 16384
```

Then start OpenClaw:

```bash
openclaw start
```
Swap between variants by changing the model field. No other configuration changes are needed — all GPT-5 variants use the same API endpoint and authentication.
OpenRouter lets you access all GPT-5 variants through a single API key, with the added benefit of automatic failover and unified billing across multiple model providers.
Sign up at openrouter.ai and generate an API key.
```yaml
# In your OpenClaw config (e.g., ~/.openclaw/config.yaml)
llm:
  provider: openrouter
  model: openai/gpt-5.4  # or openai/gpt-5.3-codex, openai/gpt-5.4-mini, etc.
  api_key: your-openrouter-api-key
  temperature: 0.7
  max_tokens: 16384
```

Then start OpenClaw:

```bash
openclaw start
```
OpenRouter pricing for GPT-5 variants is typically identical to direct OpenAI pricing. The advantage is unified billing if you use multiple model providers and automatic failover during OpenAI outages.
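If you call OpenAI directly but want a backup path, the same failover idea can be sketched in a few lines. The `call_openai` and `call_openrouter` names in the usage note are hypothetical placeholders for your actual client functions, not real library APIs.

```python
# Generic failover wrapper illustrating what OpenRouter does for you:
# try the primary provider, fall back to a secondary if the call fails.
from typing import Callable

def with_failover(primary: Callable[[str], str],
                  fallback: Callable[[str], str]) -> Callable[[str], str]:
    """Return a function that retries on the fallback provider if the
    primary raises any exception."""
    def call(prompt: str) -> str:
        try:
            return primary(prompt)
        except Exception:
            return fallback(prompt)
    return call
```

Usage would look like `ask = with_failover(call_openai, call_openrouter)`, where both callables take a prompt string and return a completion.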
Here is a common decision framework for OpenClaw operators: use nano as a router, mini for routine tasks, and standard or max for complex tasks. Most requests then hit the cheapest tier, keeping the overall pipeline cost-efficient.
GPT-5.3 (Codex) launched in February 2026 as OpenAI's coding-specialist model with a 400K context window and pricing at $1.75/$14 per million tokens. GPT-5.4 followed a month later in March 2026 as the general-purpose flagship with a 1M context window, five variants ranging from free (nano) to $10/$30 per million tokens (max), and a 75% score on OSWorld. GPT-5.3 is the better choice for pure coding tasks at lower cost; GPT-5.4 is better for complex multi-step agent workflows that require computer use.
For most OpenClaw operators, gpt-5.4-mini ($2.50/$10) offers the best balance of performance and cost. Use gpt-5.4 ($5/$20) for complex agent tasks requiring extended reasoning. Use gpt-5.4-turbo for latency-sensitive applications. Reserve gpt-5.4-max ($10/$30) for maximum-quality tasks where cost is secondary. The free gpt-5.4-nano variant is suitable for simple routing and classification tasks only.
Yes, GPT-5.4 can control a computer: it scored 75% on OSWorld, which measures exactly that ability — clicking buttons, filling forms, navigating applications. OpenClaw can leverage this through its computer-use integration, allowing your agent to interact with desktop applications, web browsers, and system tools autonomously.
GPT-5.4 standard pricing ($5/$20) is comparable to Claude Opus 4.6 ($5/$25) on input but cheaper on output. GPT-5.4-mini ($2.50/$10) is significantly cheaper than any Opus tier. GPT-5.3 Codex ($1.75/$14) is the cheapest option for coding-specific tasks. Claude Opus 4.6 offers 90% caching savings which can dramatically reduce effective costs for repetitive workflows, so the true comparison depends on your usage pattern.
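To see how caching changes that comparison, here is a back-of-envelope blended-price calculation. It assumes cached input tokens bill at 10% of list price (the 90% savings figure above), and it ignores output costs; actual billing rules vary by provider.

```python
# Blended effective input price given a prompt-cache hit rate.
def effective_input_price(list_price: float, cache_hit_rate: float,
                          cached_discount: float = 0.90) -> float:
    """Blended $/1M input tokens: cache hits bill at a discounted rate,
    misses bill at full list price."""
    cached = list_price * (1 - cached_discount)
    return cache_hit_rate * cached + (1 - cache_hit_rate) * list_price
```

For example, at an 80% cache hit rate, Claude Opus 4.6's $5.00 input price blends down to about $1.40 per million tokens, well below GPT-5.4's flat $5.00, which is why workloads with highly repetitive prompts can favor the cached provider despite similar list prices.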