Remote OpenClaw


OpenClaw Sandbox Mode: Safe Code Execution Setup Guide

What changed

This post was reviewed and updated to reflect current deployment, security hardening, and operations guidance.


Updated: · Author: Zac Frulloni

This guide explains how to set up OpenClaw sandbox mode for safe code execution: Docker-based isolation, the SSH sandbox (OpenShell), autonomy levels from ReadOnly to Full, and approval gating.

Marketplace

Free skills and AI personas for OpenClaw — deploy a pre-built agent in 15 minutes.



What Does Sandbox Mode Do?

When you give an AI agent the ability to execute code, you are giving it the power to modify files, install packages, make network requests, and interact with your operating system. Without sandboxing, a single bad prompt or hallucinated command could delete important data, expose credentials, or compromise your server.

OpenClaw sandbox mode creates an isolation boundary between the agent's code execution environment and your host system. Think of it as a virtual room where the agent can work freely without being able to reach anything outside that room.

There are two sandbox backends available in OpenClaw as of version 3.22:

  • Docker sandbox: Runs code inside a Docker container on the same machine. The container has its own file system, network namespace, and process space. Fast to set up, low latency, good for development and most production use cases.
  • SSH sandbox (OpenShell): Connects to a remote machine via SSH and runs code there. The agent never executes anything on your OpenClaw host. Stronger isolation at the cost of higher latency. Recommended for production deployments handling untrusted input.

Both backends support resource limits (CPU, memory, disk), network restrictions, and execution timeouts. The agent can still read and write files, run scripts, and install packages — but only inside the sandbox environment.


Docker-Based Sandbox Setup

Docker sandbox is the default and most common setup. It requires Docker to be installed and the OpenClaw process to have access to the Docker socket.

Step 1: Enable sandbox mode in your OpenClaw configuration:

{
  "sandbox": {
    "enabled": true,
    "backend": "docker",
    "image": "openclaw/sandbox:latest",
    "timeout": 30000,
    "memoryLimit": "512m",
    "cpuLimit": "1.0",
    "networkMode": "none"
  }
}

Step 2: Pull the sandbox image:

docker pull openclaw/sandbox:latest

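Step 3: If OpenClaw itself runs in Docker, mount the Docker socket into the OpenClaw service in your docker-compose.yml (openclaw below is a placeholder for your own service name):

services:
  openclaw:
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock

Keep in mind that access to the Docker socket is effectively root access on the host, so grant this mount only to the OpenClaw service.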

Step 4: Test sandbox mode by asking your agent to run a simple command. If the agent can execute echo "Hello from sandbox" and return the output, your sandbox is working.

The key configuration options are:

  • timeout: Maximum execution time in milliseconds. Set this to prevent runaway processes. 30 seconds is a good default.
  • memoryLimit: Maximum RAM the sandbox container can use. 512MB handles most tasks.
  • cpuLimit: CPU cores allocated. 1.0 means one full core.
  • networkMode: Set to "none" to completely isolate the sandbox from the network. Set to "bridge" if the agent needs to make HTTP requests (for web scraping, API calls, etc.).
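To make these four settings concrete, here is a minimal Python sketch that translates the config from Step 1 into an equivalent docker run command line. It assumes the Docker backend maps these fields onto Docker's standard --memory, --cpus and --network flags and that the sandbox image provides sh; neither is confirmed by this guide, and docker_run_args is a hypothetical helper, not part of OpenClaw.

```python
import shlex

# The sandbox block from Step 1, copied as a Python dict.
SANDBOX = {
    "enabled": True,
    "backend": "docker",
    "image": "openclaw/sandbox:latest",
    "timeout": 30000,
    "memoryLimit": "512m",
    "cpuLimit": "1.0",
    "networkMode": "none",
}


def docker_run_args(sandbox: dict, command: str) -> list:
    """Hypothetical mapping from the sandbox settings to docker run flags."""
    return [
        "docker", "run",
        "--rm",                               # remove the container when the command exits
        "--memory", sandbox["memoryLimit"],   # hard RAM cap, e.g. 512m
        "--cpus", sandbox["cpuLimit"],        # 1.0 = one full core
        "--network", sandbox["networkMode"],  # "none" leaves only the loopback interface
        sandbox["image"],
        "sh", "-c", command,                  # assumes the image ships a POSIX shell
    ]


args = docker_run_args(SANDBOX, 'echo "Hello from sandbox"')
print(shlex.join(args))
# docker run --rm --memory 512m --cpus 1.0 --network none openclaw/sandbox:latest sh -c 'echo "Hello from sandbox"'

# docker run has no execution-timeout flag, so whatever launches the container
# has to stop it itself once this many seconds have passed.
timeout_seconds = SANDBOX["timeout"] / 1000
print(timeout_seconds)  # 30.0
```

Running the printed command by hand is a quick way to confirm that a given memory, CPU and network combination still lets your typical tasks finish.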

Important: Setting networkMode to "none" is the safest option but limits what the agent can do inside the sandbox. If your agent needs to install npm packages or make API calls during code execution, you will need "bridge" mode with appropriate firewall rules.


SSH Sandbox (OpenShell) Setup

SSH sandbox, also called OpenShell, was introduced in OpenClaw 3.22. It provides stronger isolation by running code on a completely separate machine. The agent connects via SSH, executes commands, and retrieves results — but never touches your OpenClaw host.

Step 1: Set up a sandbox server. This can be a cheap VPS ($3-5/month), a Raspberry Pi on your local network, or a dedicated sandbox instance:

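# On your sandbox server
sudo adduser --disabled-password openclaw-sandbox
sudo mkdir -p /home/openclaw-sandbox/.ssh
sudo cp /path/to/authorized_keys /home/openclaw-sandbox/.ssh/authorized_keys
sudo chown -R openclaw-sandbox:openclaw-sandbox /home/openclaw-sandbox/.ssh
# sshd ignores keys in directories or files that other users can write to
sudo chmod 700 /home/openclaw-sandbox/.ssh
sudo chmod 600 /home/openclaw-sandbox/.ssh/authorized_keys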

Step 2: Configure OpenClaw to use SSH sandbox:

{
  "sandbox": {
    "enabled": true,
    "backend": "ssh",
    "ssh": {
      "host": "sandbox.example.com",
      "port": 22,
      "username": "openclaw-sandbox",
      "privateKeyPath": "/data/keys/sandbox_key",
      "timeout": 30000
    }
  }
}

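Step 3: Generate an SSH key pair on the OpenClaw host and place the private key where OpenClaw can read it. This example assumes OpenClaw runs in Docker with the host directory /opt/openclaw/data mounted at /data, which is why the config in Step 2 refers to /data/keys/sandbox_key; if OpenClaw runs directly on the host, point privateKeyPath at /opt/openclaw/data/keys/sandbox_key instead.

mkdir -p /opt/openclaw/data/keys
ssh-keygen -t ed25519 -f /opt/openclaw/data/keys/sandbox_key -N ""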

Step 4: Copy the public key (sandbox_key.pub, which ssh-keygen writes next to the private key) into /home/openclaw-sandbox/.ssh/authorized_keys on your sandbox server.
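Before pointing OpenClaw at the server, it can help to confirm from the OpenClaw host that the key works without any prompts. The Python sketch below builds an OpenSSH command from the same fields as the Step 2 config; ssh_check_args and run_check are hypothetical helpers, and the key path is assumed to be the host-side path generated in Step 3.

```python
import shlex
import subprocess

# Same fields as the Step 2 "ssh" block, with the host-side key path from Step 3.
SSH_CFG = {
    "host": "sandbox.example.com",
    "port": 22,
    "username": "openclaw-sandbox",
    "privateKeyPath": "/opt/openclaw/data/keys/sandbox_key",
}


def ssh_check_args(cfg: dict, command: str = "echo ok") -> list:
    """Hypothetical helper: an ssh invocation that fails fast instead of prompting."""
    return [
        "ssh",
        "-i", cfg["privateKeyPath"],
        "-p", str(cfg["port"]),
        "-o", "BatchMode=yes",       # never prompt for a password or passphrase
        "-o", "IdentitiesOnly=yes",  # offer only the key given with -i
        f'{cfg["username"]}@{cfg["host"]}',
        command,
    ]


def run_check(cfg: dict) -> int:
    """Run the check and return ssh's exit code (0 means key-based login worked)."""
    result = subprocess.run(ssh_check_args(cfg), capture_output=True, text=True, timeout=30)
    return result.returncode


# Print the command so it can also be pasted into a shell.
print(shlex.join(ssh_check_args(SSH_CFG)))
```

Calling run_check(SSH_CFG) should return 0. ssh exits with 255 when the connection or authentication itself fails, which usually points at the key, the authorized_keys permissions, or a firewall.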

The SSH sandbox provides several advantages over Docker sandbox:

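  • Separate-host isolation: The sandbox server is a completely separate machine, so even if the agent compromises the sandbox, it cannot touch your OpenClaw host's files or processes. To keep it from reaching the host over the network as well, firewall the sandbox server so it cannot open connections to your OpenClaw host.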
  • Easier resource management: You can choose the exact hardware specs for your sandbox server independently of your OpenClaw server.
  • Snapshot and restore: You can take snapshots of the sandbox server and restore it to a clean state after each execution session.

The trade-off is higher latency (200-800ms per command vs 100-500ms for Docker) and the operational overhead of managing a second server.


Configuring Autonomy Levels

OpenClaw's autonomy levels control what the agent is allowed to do without human approval. There are three levels:

ReadOnly

The agent can read data and generate responses but cannot execute code, modify files, or take any action that changes state. This is the safest mode and is useful for information retrieval agents, chatbots, and research assistants.

{
  "autonomy": {
    "level": "readonly"
  }
}

Supervised

The agent can propose actions (code execution, file changes, API calls) but every action requires human approval before it runs. The agent sends a preview of what it wants to do, you review it, and approve or reject. This is the recommended starting point for all new deployments.

{
  "autonomy": {
    "level": "supervised",
    "approvalChannel": "telegram",
    "approvalTimeout": 300000
  }
}

Full

The agent can execute actions autonomously without waiting for approval. This is the most powerful mode but also the most risky. Only enable Full autonomy after extensive testing in Supervised mode and only for well-understood, repeatable workflows.

{
  "autonomy": {
    "level": "full",
    "allowedActions": ["code_exec", "file_read", "file_write", "http_get"],
    "blockedActions": ["file_delete", "system_command", "ssh_exec"]
  }
}

Even in Full autonomy mode, you can restrict which action types the agent can perform. The allowedActions and blockedActions lists give you granular control.
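How the three levels and the two lists interact can be summarized in a short Python sketch. It is a hypothetical model, not OpenClaw's code: the guide does not say what happens when an action is on both lists or on neither, so the sketch assumes the block list wins and unlisted actions are denied, and it treats only file_read as read-only.

```python
# Hypothetical model of the three autonomy levels described above.
# Which actions count as read-only is an assumption made for this sketch.
READ_ONLY_ACTIONS = {"file_read"}


def decide(level: str, action: str, allowed=(), blocked=()) -> str:
    """Return "allow", "deny", or "needs_approval" for one proposed action."""
    if level == "readonly":
        # ReadOnly: anything that could change state is refused outright.
        return "allow" if action in READ_ONLY_ACTIONS else "deny"
    if level == "supervised":
        # Supervised: every action waits for a human.
        return "needs_approval"
    if level == "full":
        # Full: the block list wins over the allow list (assumption),
        # and actions on neither list are denied (assumption: default-deny).
        if action in blocked:
            return "deny"
        return "allow" if action in allowed else "deny"
    raise ValueError(f"unknown autonomy level: {level!r}")


ALLOWED = ["code_exec", "file_read", "file_write", "http_get"]
BLOCKED = ["file_delete", "system_command", "ssh_exec"]

print(decide("full", "code_exec", ALLOWED, BLOCKED))    # allow
print(decide("full", "file_delete", ALLOWED, BLOCKED))  # deny
print(decide("full", "http_post", ALLOWED, BLOCKED))    # deny (on neither list)
print(decide("supervised", "file_write"))               # needs_approval
```

Default-deny is the conservative reading: a new action type the agent starts using stays blocked until you add it to allowedActions on purpose.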


Approval Gating

Approval gating is the mechanism that makes Supervised mode work. When the agent wants to perform an action, it sends an approval request through your configured channel (Telegram, Discord, Slack, or the web UI). You review the proposed action and tap Approve or Reject.

To configure approval gating:

{
  "approvalGating": {
    "enabled": true,
    "channel": "telegram",
    "approvers": ["your_telegram_user_id"],
    "timeout": 300000,
    "defaultAction": "reject",
    "showFullContext": true
  }
}

Key settings:

  • timeout: How long to wait for approval before taking the default action (in milliseconds). 300000 = 5 minutes.
  • defaultAction: What happens if no one approves within the timeout. Set to "reject" for safety.
  • showFullContext: When true, the approval message includes the full code or command the agent wants to run, not just a summary.
  • approvers: List of user IDs who can approve actions. Restrict this to trusted operators.
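The interplay of timeout, defaultAction and approvers can be illustrated with a minimal Python sketch. The function name, the queue of (user_id, decision) replies and the "approve"/"reject" strings are assumptions for illustration; they are not OpenClaw's API.

```python
import queue
import time


def wait_for_approval(responses, approvers, timeout_ms, default_action="reject"):
    """Hypothetical approval loop: return "approve"/"reject", or default_action on timeout.

    responses is a queue.Queue of (user_id, decision) tuples fed by the chat channel.
    """
    deadline = time.monotonic() + timeout_ms / 1000  # 300000 ms -> 300 s
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return default_action
        try:
            user_id, decision = responses.get(timeout=remaining)
        except queue.Empty:
            return default_action  # nobody answered in time
        if user_id in approvers and decision in ("approve", "reject"):
            return decision
        # Replies from users not on the approvers list are ignored.


inbox = queue.Queue()
inbox.put(("random_user", "approve"))  # not an approver: ignored
inbox.put(("your_telegram_user_id", "approve"))
print(wait_for_approval(inbox, {"your_telegram_user_id"}, timeout_ms=300000))  # approve

empty = queue.Queue()
print(wait_for_approval(empty, {"your_telegram_user_id"}, timeout_ms=100))  # reject
```

Note that an approval from someone outside the approvers list does not shorten the wait; only the deadline or a valid approver ends it, and with defaultAction set to "reject" silence always fails safe.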

You can also set per-action approval rules. For example, allow the agent to read files automatically but require approval for file writes and code execution:

{
  "approvalGating": {
    "rules": [
      {"action": "file_read", "approval": "auto"},
      {"action": "file_write", "approval": "required"},
      {"action": "code_exec", "approval": "required"},
      {"action": "http_get", "approval": "auto"},
      {"action": "http_post", "approval": "required"}
    ]
  }
}
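One way such a rules list might be evaluated is sketched below in Python. The guide does not specify what happens to an action with no rule (file_delete, for example), so the sketch assumes a fail-safe fallback to "required"; approval_for is a hypothetical helper, not part of OpenClaw.

```python
# The "rules" list from the config above, as Python data.
RULES = [
    {"action": "file_read", "approval": "auto"},
    {"action": "file_write", "approval": "required"},
    {"action": "code_exec", "approval": "required"},
    {"action": "http_get", "approval": "auto"},
    {"action": "http_post", "approval": "required"},
]


def approval_for(action: str, rules: list) -> str:
    """Return "auto" or "required" for an action (hypothetical evaluation order)."""
    for rule in rules:
        if rule["action"] == action:
            return rule["approval"]  # first matching rule wins
    return "required"  # assumption: actions without a rule still need a human


print(approval_for("file_read", RULES))    # auto
print(approval_for("code_exec", RULES))    # required
print(approval_for("file_delete", RULES))  # required (no rule, safe fallback)
```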

Sandbox Best Practices

After setting up hundreds of OpenClaw instances, here are the sandbox practices that prevent the most incidents:

  1. Always enable sandbox mode in production. Running without a sandbox is acceptable only during local development on a throwaway machine.
  2. Start with Supervised mode. Watch what your agent does for at least a week before considering Full autonomy.
  3. Set network mode to "none" unless required. An agent that cannot make network requests cannot exfiltrate data or attack external services.
  4. Set execution timeouts. A 30-second timeout prevents runaway processes from consuming resources.
  5. Use SSH sandbox for production. The extra latency is worth the isolation, especially if your agent handles input from external users.
  6. Review logs regularly. OpenClaw logs every sandbox execution. Check these logs weekly for unexpected behavior.
  7. Rotate sandbox environments. For SSH sandbox, periodically destroy and recreate the sandbox server to ensure a clean state.
  8. Restrict file system access. Mount only the directories the agent needs, not your entire file system.
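Practice 8 can be illustrated with plain Docker flags. This guide does not document an OpenClaw setting for sandbox mounts, so the Python sketch below only shows the principle: build docker -v flags from an explicit allow-list, mark each directory read-only or read-write on purpose, and refuse to mount the host root. mount_args and the /srv/agent paths are hypothetical.

```python
from pathlib import PurePosixPath


def mount_args(mounts: dict) -> list:
    """Build docker -v flags for an explicit allow-list of host directories.

    mounts maps a host path to "ro" (read-only) or "rw" (read-write).
    """
    args = []
    for host_path, mode in mounts.items():
        path = PurePosixPath(host_path)
        if not path.is_absolute() or path == PurePosixPath("/"):
            raise ValueError(f"refusing to mount {host_path!r}")
        if mode not in ("ro", "rw"):
            raise ValueError(f"unknown mode {mode!r} for {host_path!r}")
        # Mount each directory at the same path inside the container.
        args += ["-v", f"{path}:{path}:{mode}"]
    return args


print(mount_args({"/srv/agent/input": "ro", "/srv/agent/output": "rw"}))
# ['-v', '/srv/agent/input:/srv/agent/input:ro', '-v', '/srv/agent/output:/srv/agent/output:rw']
```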

Sandbox mode is not a silver bullet. A determined attacker crafting prompts can still cause damage within the sandbox boundaries. The goal is to limit the blast radius so that the worst-case scenario is a corrupted sandbox, not a compromised server.
