Remote OpenClaw Blog

GPT-5.3 and GPT-5.4 on OpenClaw: Setup and Configuration Guide

8 min read

GPT-5 Family Overview

OpenAI's GPT-5 generation arrived in two waves. GPT-5.3 (internally codenamed Codex) launched in February 2026 as a coding-focused model. GPT-5.4 followed one month later in March 2026 as the full general-purpose flagship, shipping with five distinct variants to cover different price and performance tiers.

For OpenClaw operators, the GPT-5 family represents the most comprehensive model lineup from any single provider. Whether you need a free nano model for simple classification, a mid-tier model for everyday agent tasks, or a maximum-capability model for complex autonomous workflows, there is a GPT-5 variant that fits.

The standout capability across the GPT-5.4 line is computer use. Scoring 75% on OSWorld — a benchmark that tests a model's ability to autonomously operate a computer desktop, click buttons, fill forms, navigate between applications, and complete multi-step UI tasks — GPT-5.4 sets a new bar for desktop automation. For OpenClaw operators building agents that interact with web applications, CRMs, or internal tools, this is a game-changing capability.


GPT-5.3 Codex: The Coding Specialist

GPT-5.3 launched in February 2026 as OpenAI's dedicated coding model. It is not a general-purpose model — it was specifically trained and optimized for software engineering tasks: writing code, reviewing code, debugging, refactoring, and generating tests.

| Specification | Value |
| --- | --- |
| Model ID | gpt-5.3-codex |
| Release Date | February 2026 |
| Context Window | 400K tokens |
| Input Pricing | $1.75 per 1M tokens |
| Output Pricing | $14.00 per 1M tokens |
| Specialty | Code generation, review, debugging |
| Modalities | Text + Code |

The 400K context window is large enough to ingest entire medium-sized codebases in a single prompt. For OpenClaw operators using coding agents, this means your agent can see the full project structure, dependencies, and related files when making changes — resulting in more accurate patches that account for the broader codebase context.
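As a rough sanity check before handing an agent a whole repository, you can estimate whether a codebase fits in the 400K window. The sketch below uses the common heuristic of ~4 characters per token; this is an approximation, not the model's actual tokenizer, and real counts may differ by 10-20%:

```python
# Rough heuristic: ~4 characters per token for typical source code.
CHARS_PER_TOKEN = 4
CODEX_CONTEXT_TOKENS = 400_000  # GPT-5.3 Codex context window

def estimate_tokens(text: str) -> int:
    """Approximate token count; the real tokenizer may differ noticeably."""
    return len(text) // CHARS_PER_TOKEN

def fits_in_context(files: dict[str, str], budget: int = CODEX_CONTEXT_TOKENS) -> bool:
    """Return True if the combined file contents likely fit in the window."""
    total = sum(estimate_tokens(src) for src in files.values())
    return total <= budget
```

Leaving some headroom below the budget for the system prompt and the model's output is advisable in practice.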

At $1.75/$14 per million tokens, GPT-5.3 is positioned as a premium coding model. The input pricing is competitive, but the output pricing is notable — $14 per million output tokens is on the expensive side, reflecting the computational cost of extended code reasoning.


GPT-5.4: The General-Purpose Flagship

GPT-5.4 is OpenAI's most capable general-purpose model, launched in March 2026. It features a 1 million token context window, matching the largest of any commercial API model, and ships in five variants spanning from free to premium.

The headline capability is computer use. GPT-5.4 scored 75% on OSWorld, meaning it successfully completes roughly three out of four desktop automation tasks, such as navigating web applications, filling out forms, clicking through multi-step workflows, extracting data from applications, and interacting with system dialogs. This is not theoretical — it works in production through OpenClaw's computer-use integration.

The 1M context window is the other major differentiator; Claude Opus 4.6 is the only other commercial model offering context at this scale. For OpenClaw operators processing large documents, entire codebases, or long conversation histories, this effectively eliminates the context truncation problem.


All 5 GPT-5.4 Variants Compared

| Variant | Input (per 1M) | Output (per 1M) | Context | Best For |
| --- | --- | --- | --- | --- |
| gpt-5.4-nano | Free | Free | 128K | Classification, routing, simple tasks |
| gpt-5.4-mini | $2.50 | $10.00 | 512K | Everyday agent tasks, balanced cost/quality |
| gpt-5.4 | $5.00 | $20.00 | 1M | Complex reasoning, extended thinking |
| gpt-5.4-turbo | $5.00 | $20.00 | 1M | Low-latency applications, real-time agents |
| gpt-5.4-max | $10.00 | $30.00 | 1M | Maximum quality, complex autonomous tasks |

The nano variant is genuinely free — no credits required, just rate limits. For OpenClaw operators, this is useful as a classifier or router: use nano to analyze incoming tasks and route them to the appropriate specialist model, saving costs on the expensive models.

The turbo variant has the same pricing as standard gpt-5.4 but is optimized for lower latency at the cost of slightly reduced reasoning depth. For agents that need to respond quickly — chat agents, real-time assistants, interactive workflows — turbo is the right choice.


Benchmarks and Performance

| Benchmark | GPT-5.3 Codex | GPT-5.4 | Notes |
| --- | --- | --- | --- |
| OSWorld | N/A | 75.0% | Best computer-use performance of any model |
| SWE-bench Verified | 81.2% | 79.5% | 5.3 edges out 5.4 on pure coding tasks |
| AIME 2024 | 85.0% | 94.1% | 5.4 significantly stronger on math |
| MMLU | 86.5% | 91.2% | 5.4 has the broader knowledge base |
| HumanEval | 93.8% | 92.1% | Both excellent at code generation |

The OSWorld score of 75% is the most consequential benchmark for agent operators. OSWorld tests complete computer-use workflows — not just recognizing UI elements, but executing multi-step tasks like "open the spreadsheet, sort column B, filter rows where revenue exceeds $10K, and export as CSV." A 75% success rate means GPT-5.4 can handle most routine desktop automation tasks without human intervention.

GPT-5.3 Codex's SWE-bench score of 81.2% makes it the strongest coding model from OpenAI, edging out even GPT-5.4 on pure software engineering tasks. If your OpenClaw agent primarily writes and reviews code, GPT-5.3 is the better choice despite its narrower capabilities.


Complete Pricing Breakdown

| Model | Input (per 1M) | Output (per 1M) | Monthly Cost (100K requests) |
| --- | --- | --- | --- |
| GPT-5.3 Codex | $1.75 | $14.00 | ~$1,575 |
| GPT-5.4-nano | Free | Free | $0 (rate-limited) |
| GPT-5.4-mini | $2.50 | $10.00 | ~$1,250 |
| GPT-5.4 | $5.00 | $20.00 | ~$2,500 |
| GPT-5.4-turbo | $5.00 | $20.00 | ~$2,500 |
| GPT-5.4-max | $10.00 | $30.00 | ~$4,000 |

Monthly estimates assume roughly 1K input and 1K output tokens per request.

For comparison: Claude Opus 4.6 runs $5/$25 with 90% caching savings available. DeepSeek V3.2 runs $0.028/$0.10. The GPT-5 family sits at the premium end of the market, justified by capabilities like computer use and the 1M context window that cheaper models do not offer.
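The monthly figures above follow from simple arithmetic. A small helper, using the published per-token prices and assuming roughly 1K input and 1K output tokens per request (an assumption, not a published figure), reproduces them:

```python
# Prices in dollars per 1M tokens, as listed for each model.
PRICING = {
    "gpt-5.3-codex": (1.75, 14.00),
    "gpt-5.4-mini": (2.50, 10.00),
    "gpt-5.4": (5.00, 20.00),
    "gpt-5.4-turbo": (5.00, 20.00),
    "gpt-5.4-max": (10.00, 30.00),
}

def monthly_cost(model: str, requests: int = 100_000,
                 in_tokens: int = 1_000, out_tokens: int = 1_000) -> float:
    """Estimated monthly spend in dollars; per-request token counts
    are an assumption and should be replaced with your own telemetry."""
    in_price, out_price = PRICING[model]
    return requests * (in_tokens * in_price + out_tokens * out_price) / 1_000_000
```

With these defaults, `monthly_cost("gpt-5.3-codex")` returns 1575.0, matching the ~$1,575 estimate; heavier output usage shifts the totals quickly given the higher output prices.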


Setup Method 1: OpenAI API (Direct)

The most straightforward way to connect GPT-5 models to OpenClaw is through the OpenAI API directly.

Step 1: Get an OpenAI API Key

Sign up at platform.openai.com and generate an API key. Add credits to your account based on which variant you plan to use.

Step 2: Configure OpenClaw for GPT-5.3 Codex

# In your OpenClaw config (e.g., ~/.openclaw/config.yaml)
llm:
  provider: openai
  model: gpt-5.3-codex
  api_key: your-openai-api-key
  temperature: 0.4
  max_tokens: 16384

Step 3: Configure OpenClaw for GPT-5.4

# In your OpenClaw config (e.g., ~/.openclaw/config.yaml)
llm:
  provider: openai
  model: gpt-5.4          # or gpt-5.4-mini, gpt-5.4-turbo, gpt-5.4-max, gpt-5.4-nano
  api_key: your-openai-api-key
  temperature: 0.7
  max_tokens: 16384

Step 4: Start OpenClaw

openclaw start

Swap between variants by changing the model field. No other configuration changes are needed — all GPT-5 variants use the same API endpoint and authentication.


Setup Method 2: OpenRouter

OpenRouter lets you access all GPT-5 variants through a single API key, with the added benefit of automatic failover and unified billing across multiple model providers.

Step 1: Get an OpenRouter API Key

Sign up at openrouter.ai and generate an API key.

Step 2: Configure OpenClaw

# In your OpenClaw config (e.g., ~/.openclaw/config.yaml)
llm:
  provider: openrouter
  model: openai/gpt-5.4       # or openai/gpt-5.3-codex, openai/gpt-5.4-mini, etc.
  api_key: your-openrouter-api-key
  temperature: 0.7
  max_tokens: 16384

Step 3: Start OpenClaw

openclaw start

OpenRouter pricing for GPT-5 variants is typically identical to direct OpenAI pricing. The advantage is unified billing if you use multiple model providers and automatic failover during OpenAI outages.
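Since the only configuration differences between the two methods are the provider field and the vendor prefix on the model ID, a small helper (illustrative only, not part of OpenClaw) can derive the OpenRouter form from the direct one:

```python
def to_openrouter_id(model: str, vendor: str = "openai") -> str:
    """Map a direct model ID (e.g. gpt-5.4) to OpenRouter's
    vendor-prefixed form (e.g. openai/gpt-5.4). Hypothetical helper
    for keeping one model list usable with either setup method."""
    if "/" in model:  # already vendor-prefixed
        return model
    return f"{vendor}/{model}"
```

This is handy if you script config generation for both setup methods from a single list of model names.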


Which Variant to Choose

Here is a decision framework for OpenClaw operators:

  • Pure coding agent: Use GPT-5.3 Codex ($1.75/$14). It beats GPT-5.4 on SWE-bench and costs less. The 400K context window handles most codebases.
  • General-purpose agent on a budget: Use GPT-5.4-mini ($2.50/$10). Best balance of capability and cost for everyday tasks — email processing, document analysis, data extraction.
  • Computer use and desktop automation: Use GPT-5.4 ($5/$20). The 75% OSWorld score is only available on the standard and higher tiers. Mini and nano do not support computer use.
  • Real-time interactive agent: Use GPT-5.4-turbo ($5/$20). Same pricing as standard but optimized for lower latency. Best for chat-based agents where response time matters.
  • Maximum autonomy: Use GPT-5.4-max ($10/$30). The highest reasoning depth for complex multi-step tasks that require extended planning and self-correction.
  • Task routing and classification: Use GPT-5.4-nano (free). Let nano analyze the incoming task and route it to the appropriate specialist model.

A common pattern is to use nano as a router, mini for routine tasks, and standard or max for complex tasks — creating a cost-efficient pipeline where most requests hit the cheapest tier.
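That tiered pipeline can be sketched as a simple router. The version below uses keyword matching for illustration; in a real deployment you would let gpt-5.4-nano itself classify the task, and the keyword lists here are hypothetical:

```python
# Hypothetical routing rules for the nano -> mini -> specialist pipeline.
ROUTES = [
    (("refactor", "debug", "review", "test"), "gpt-5.3-codex"),   # coding
    (("desktop", "browser", "click", "form"), "gpt-5.4"),         # computer use
    (("plan", "multi-step", "autonomous"), "gpt-5.4-max"),        # deep reasoning
]

def route_task(description: str) -> str:
    """Pick a model for an incoming task; defaults to the cheap everyday tier."""
    text = description.lower()
    for keywords, model in ROUTES:
        if any(keyword in text for keyword in keywords):
            return model
    return "gpt-5.4-mini"  # routine tasks hit the cheapest paid tier
```

For example, a request containing "review" would route to gpt-5.3-codex, while a plain summarization request falls through to gpt-5.4-mini.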


Frequently Asked Questions

What is the difference between GPT-5.3 and GPT-5.4?

GPT-5.3 (Codex) launched in February 2026 as OpenAI's coding-specialist model with a 400K context window and pricing at $1.75/$14 per million tokens. GPT-5.4 launched a month later in March 2026 as the general-purpose flagship with a 1M context window, five variants ranging from free to $30 per million output tokens, and a 75% score on OSWorld. GPT-5.3 is the better choice for pure coding tasks at lower cost; GPT-5.4 is better for complex multi-step agent workflows that require computer use.

Which GPT-5.4 variant should I use with OpenClaw?

For most OpenClaw operators, gpt-5.4-mini ($2.50/$10) offers the best balance of performance and cost. Use gpt-5.4 ($5/$20) for complex agent tasks requiring extended reasoning. Use gpt-5.4-turbo for latency-sensitive applications. Reserve gpt-5.4-max ($10/$30) for maximum-quality tasks where cost is secondary. The free gpt-5.4-nano variant is suitable for simple routing and classification tasks only.

Can I use GPT-5.4 for computer use tasks in OpenClaw?

Yes. GPT-5.4 scored 75% on OSWorld, which measures the ability to control a computer — clicking buttons, filling forms, navigating applications. OpenClaw can leverage this through its computer-use integration, allowing your agent to interact with desktop applications, web browsers, and system tools autonomously.

How does GPT-5 pricing compare to Claude Opus 4.6?

GPT-5.4 standard pricing ($5/$20) is comparable to Claude Opus 4.6 ($5/$25) on input but cheaper on output. GPT-5.4-mini ($2.50/$10) is significantly cheaper than any Opus tier. GPT-5.3 Codex ($1.75/$14) is the cheapest option for coding-specific tasks. Claude Opus 4.6 offers 90% caching savings which can dramatically reduce effective costs for repetitive workflows, so the true comparison depends on your usage pattern.


Further Reading