Running OpenClaw 100% Locally: The Complete Self-Hosted AI Agent Stack


Why Go Fully Local?

I run my primary OpenClaw instance on a Mac Mini M4 sitting on a shelf in my home office. No AWS bills. No OpenAI API keys. No data leaving my local network. After six months of running this setup, I can tell you the three reasons that actually matter:

Privacy. When OpenClaw processes my emails, calendar events, and client documents, none of that data hits a third-party server. For anyone handling client data, financial information, or anything covered by an NDA, this is not a nice-to-have — it is a requirement. I work with consulting clients who would not be comfortable with their project details being sent to OpenAI's or Anthropic's servers for processing, even under those companies' privacy policies.

Cost predictability. Cloud API costs scale with usage, and they can surprise you. A local setup has a fixed upfront cost (the hardware) and near-zero ongoing cost (electricity). After about 4-6 months of moderate use, the Mac Mini pays for itself compared to cloud API costs. I did the math and share the full breakdown later in this guide.

Control. No rate limits. No API outages at 2 AM when your scheduled skills are supposed to run. No model deprecation notices forcing you to migrate. The model you downloaded today works the same way in six months. As the Contabo blog and Emergent.sh have both covered, self-hosting AI infrastructure is becoming increasingly viable for individuals and small teams.


Hardware Requirements

Here is what I recommend based on my own testing and what I have seen work for other operators in the community:

| Setup | Hardware | RAM | Cost | Best For |
|---|---|---|---|---|
| Budget | Mac Mini M4 | 16GB | $599 | Single 8B model, light workloads |
| Recommended | Mac Mini M4 | 24GB | $799 | 8B model + embedding model simultaneously |
| Power | Mac Mini M4 Pro | 48GB | $1,599 | Multiple models, 30B+ parameter models |
| Linux Budget | Any PC + RTX 3060 | 12GB VRAM | ~$400-600 | 8B models with fast GPU inference |
| Linux Power | Any PC + RTX 4090 | 24GB VRAM | ~$1,600-2,000 | 30B+ models, fastest local inference |

I use the Mac Mini M4 with 24GB. The Apple Silicon unified memory architecture is uniquely good for LLM inference because the CPU and GPU share the same memory pool — no bottleneck copying data between CPU RAM and GPU VRAM. This is why a $799 Mac Mini can match or beat a $1,500 Linux setup with a dedicated GPU for most LLM workloads.

For a detailed Mac Mini setup walkthrough, see the OpenClaw Mac Mini setup guide.


Ollama Setup

Ollama is the runtime that serves local LLMs. It handles model downloading, quantization, and provides an OpenAI-compatible API that OpenClaw connects to.

Install Ollama

# macOS
brew install ollama

# Linux
curl -fsSL https://ollama.ai/install.sh | sh

# Start the Ollama server
ollama serve

Pull Your Models

# Primary reasoning model
ollama pull qwen3:8b

# Fast model for simple tasks
ollama pull glm4:9b

# Embedding model for vector search
ollama pull nomic-embed-text

# Verify all models are available
ollama list
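The same check can be scripted. A minimal Python sketch (the required list mirrors the pulls above; the parsing assumes only the standard shape of Ollama's `/api/tags` response) that reports which models are still missing. It works on a captured JSON body, so it runs without a live server:

```python
import json

REQUIRED = {"qwen3:8b", "glm4:9b", "nomic-embed-text"}

def missing_models(tags_json: str, required: set = REQUIRED) -> set:
    """Return required models absent from an Ollama /api/tags response body."""
    data = json.loads(tags_json)
    # Ollama reports untagged pulls as "<name>:latest"; normalize for comparison.
    installed = {m["name"].removesuffix(":latest") for m in data.get("models", [])}
    return set(required) - installed
```

To use it live, fetch `http://localhost:11434/api/tags` and pass the response body to `missing_models`.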

Configure OpenClaw to Use Ollama

# ~/.openclaw/config.yaml
llm:
  provider: ollama
  model: qwen3:8b
  base_url: http://localhost:11434
  temperature: 0.7
  max_tokens: 4096

embedding:
  provider: ollama
  model: nomic-embed-text
  base_url: http://localhost:11434

That is it. No API keys, no account creation, no billing setup. OpenClaw connects to the local Ollama server and uses whatever models you have pulled.


Recommended Local Models

Not all local models are equal for agent tasks. After testing over a dozen models on my Mac Mini, here are the two I recommend:

Qwen3 8B — Best All-Around

Qwen3 8B is my daily driver for OpenClaw. It handles instruction-following, code generation, summarization, and structured output extraction with quality that genuinely surprised me. On my Mac Mini M4 (24GB), it generates about 40 tokens per second — fast enough that scheduled tasks complete quickly and interactive queries feel responsive.

The key advantage of Qwen3 8B over other 8B models is its instruction-following precision. When an OpenClaw skill asks it to extract specific fields from an email and return them as JSON, Qwen3 8B does it correctly about 95% of the time. Smaller or less capable models often hallucinate extra fields, miss required ones, or return malformed JSON. See the Qwen3 8B OpenClaw guide for benchmarks and tuning tips.
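Whichever model you run, it pays to validate that JSON rather than trust it blindly. A minimal sketch of a defensive parser; the field names here are hypothetical stand-ins, not OpenClaw's actual skill schema. It drops hallucinated extra keys, reports missing required ones, and signals malformed JSON so the caller can retry:

```python
import json

def parse_extraction(reply, required=("sender", "subject", "due_date")):
    """Parse a model's JSON reply defensively.

    Returns (fields, missing): fields is the dict of recognized keys
    (None if the reply was not a JSON object), and missing lists the
    required keys the model failed to produce.
    """
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        return None, list(required)  # malformed JSON: caller should retry
    if not isinstance(data, dict):
        return None, list(required)  # e.g. model returned a bare list or string
    fields = {k: v for k, v in data.items() if k in required}
    missing = [k for k in required if k not in fields]
    return fields, missing
```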

GLM-4.7-flash — Fastest Inference

GLM-4.7-flash is the speed champion. It generates 55-65 tokens per second on my hardware — roughly 50% faster than Qwen3 8B. The quality is slightly lower on complex reasoning tasks, but for high-frequency, low-complexity tasks (email triage, data extraction, simple summaries), the speed advantage makes it the better choice.

I use GLM-4.7-flash for my email scanning and calendar sync skills (which run many times per day) and Qwen3 8B for weekly reports and complex analysis skills. OpenClaw lets you assign different models to different skills, so you can optimize each workflow independently.
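That split amounts to a routing table from skill to model. A Python sketch of the idea (the skill names are mine and the function is illustrative; in OpenClaw itself this mapping lives in per-skill configuration):

```python
# High-frequency, low-complexity skills go to the fast model;
# everything else defaults to the stronger reasoning model.
SKILL_MODELS = {
    "email-scan": "glm4:9b",
    "calendar-sync": "glm4:9b",
    "weekly-report": "qwen3:8b",
}
DEFAULT_MODEL = "qwen3:8b"

def model_for(skill: str) -> str:
    """Pick the Ollama model tag to use for a given skill."""
    return SKILL_MODELS.get(skill, DEFAULT_MODEL)
```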


Docker Compose Configuration

If you prefer a containerized setup (especially useful on Linux), here is a Docker Compose configuration that runs the entire stack:

# docker-compose.yaml

services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    restart: unless-stopped
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]  # Remove this block for CPU-only

  openclaw:
    image: openclaw/openclaw:latest
    container_name: openclaw
    depends_on:
      - ollama
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
      - OPENCLAW_DATA_DIR=/data
    volumes:
      - openclaw_data:/data
      - ./config:/config
    restart: unless-stopped

volumes:
  ollama_data:
  openclaw_data:

Starting the Stack

# Start everything
docker compose up -d

# Pull models inside the Ollama container
docker exec ollama ollama pull qwen3:8b
docker exec ollama ollama pull nomic-embed-text

# Verify OpenClaw is connected
docker exec openclaw openclaw status

The Docker setup is particularly clean on Linux servers. On macOS, I prefer running Ollama natively (via Homebrew) and OpenClaw directly, since Docker on macOS adds a virtualization layer that slightly reduces inference performance.
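One caveat with the Compose file above: `depends_on` waits only for the ollama container to start, not for the server inside it to be ready. Compose healthchecks close that gap. A sketch of the additions (assuming the `ollama` CLI inside the image can reach its own server; swap in another command if your image differs):

```yaml
  ollama:
    # ...same service definition as above, plus:
    healthcheck:
      test: ["CMD", "ollama", "list"]
      interval: 15s
      timeout: 5s
      retries: 5

  openclaw:
    # ...and make OpenClaw wait for a passing healthcheck:
    depends_on:
      ollama:
        condition: service_healthy
```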


Remote Access with Tailscale

A local-only OpenClaw instance is great, but you probably want to access it from outside your home network — checking your agent's status from your phone, triggering a skill while traveling, or monitoring logs from a coffee shop.

Tailscale is the answer. It creates an encrypted mesh VPN between your devices without opening any ports on your router. Your Mac Mini gets a stable Tailscale IP address that you can reach from any device on your Tailscale network.

# Install Tailscale on your Mac Mini
brew install tailscale

# Start Tailscale and authenticate
tailscale up

# Your Mac Mini gets an IP like 100.x.y.z
tailscale ip

Now from any other device on your Tailscale network, you can reach your OpenClaw instance:

# From your laptop or phone (on Tailscale)
curl http://100.x.y.z:11434/api/tags  # Check Ollama models
curl http://100.x.y.z:8080/status     # Check OpenClaw status

I have been using Tailscale for remote access to my home lab for over a year. It is free for personal use, the connection is stable, and setup takes about five minutes per device. For a deeper walkthrough, see the OpenClaw Tailscale remote access guide.


Cost Comparison: Local vs Cloud

Here is the real math, based on my own usage patterns over the past six months:

| Cost Category | Fully Local (Mac Mini M4 24GB) | Cloud API (Claude Sonnet via OpenRouter) |
|---|---|---|
| Hardware | $799 one-time | $0 |
| Monthly electricity | ~$5 (Mac Mini draws 10-30W) | $0 |
| Monthly API/LLM cost | $0 | ~$120-200 (moderate agent usage) |
| Month 1 total | $804 | $150 |
| Month 6 total | $829 | $900 |
| Month 12 total | $859 | $1,800 |
| Break-even | Approximately month 5-6 | — |

The break-even point depends entirely on your usage volume. If you run 10+ skills daily with substantial LLM processing (email scanning, content generation, data analysis), the local setup pays for itself in about five months. If you only run a few light skills, the break-even extends to 8-10 months.

The hidden advantage of local is psychological: you stop self-censoring your usage. With cloud APIs, every request costs money, so you unconsciously limit experimentation. With a local setup, the marginal cost of each request is effectively zero, so you experiment freely — which is how you discover the most valuable automations.
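The break-even figures above are easy to verify or adapt. A short calculation using my numbers ($799 hardware, roughly $5/month electricity, roughly $150/month cloud spend); substitute your own to find your break-even point:

```python
def monthly_totals(months, hardware=799.0, power=5.0, cloud=150.0):
    """Cumulative local vs. cloud cost after a given number of months."""
    return hardware + power * months, cloud * months

def break_even_month(hardware=799.0, power=5.0, cloud=150.0):
    """First month in which cumulative cloud spend matches or exceeds local spend."""
    month = 1
    while True:
        local, remote = monthly_totals(month, hardware, power, cloud)
        if remote >= local:
            return month
        month += 1
```

With the defaults, `break_even_month()` returns 6, consistent with the table's month 5-6 estimate; at lighter usage (say, $80/month of avoided API spend), it stretches toward month 11.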


When Local Is Not Enough

I want to be honest: a fully local setup has real limitations, and for some use cases, cloud APIs are still the better choice.

  • Complex reasoning tasks: An 8B local model is noticeably worse than Claude or GPT-4 on multi-step reasoning, nuanced analysis, and complex code generation. For skills that require high-quality reasoning (legal document analysis, complex financial modeling, sophisticated code review), a cloud model produces better results.
  • Vision and multimodal: If your skills need to process images, screenshots, or PDFs with visual elements, most local models lack vision capabilities. Claude and GPT-4 handle multimodal input natively.
  • Very long context: Local 8B models typically support 32K-128K context windows. If your skills process very long documents (50+ page reports, large codebases), you may hit context limits that larger cloud models handle comfortably.
  • Availability during hardware maintenance: If your Mac Mini restarts, loses power, or needs a macOS update, your agent goes offline. Cloud APIs have built-in redundancy. For mission-critical workflows, consider a hybrid setup or a UPS (uninterruptible power supply).
  • Team access: A local setup is inherently single-machine. If multiple team members need to interact with the agent, a cloud-hosted solution is easier to share.

My recommendation for most operators: start local for daily automation tasks, and add a cloud API as a fallback for complex reasoning skills. OpenClaw's per-skill model configuration makes this hybrid approach straightforward.
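In code terms, that hybrid is a router with a fallback. A sketch (the skill set and the two callables are illustrative stand-ins for however you wire up the local and cloud clients; OpenClaw expresses this declaratively in config):

```python
COMPLEX_SKILLS = {"code-review", "financial-model"}  # illustrative skill names

def run_skill(skill, prompt, local, cloud):
    """Send complex skills straight to the cloud model; otherwise try the
    local model first and fall back to cloud if the local runtime errors."""
    if skill in COMPLEX_SKILLS:
        return cloud(prompt)
    try:
        return local(prompt)
    except Exception:
        return cloud(prompt)  # e.g. Ollama is down or the model is unloaded
```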


Frequently Asked Questions

What is the minimum hardware to run OpenClaw fully locally?

The practical minimum is 16GB of unified memory (RAM) and an Apple M-series chip or a dedicated GPU with at least 8GB VRAM. This lets you run Qwen3 8B or GLM-4.7-flash comfortably. A Mac Mini M4 with 16GB is the most cost-effective option at $599. You can run on less — an 8GB machine can handle smaller 3B models — but the quality drop is significant for agent tasks.

How does local inference speed compare to cloud APIs?

On a Mac Mini M4 with 24GB, Qwen3 8B generates approximately 35-45 tokens per second. Claude via API generates around 80-120 tokens per second. For most OpenClaw tasks — processing emails, generating summaries, running scheduled skills — the local speed is more than adequate. You will notice the difference on long-form generation tasks (1000+ token outputs), but scheduled background tasks do not have a human waiting for them, so speed matters less.
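The wall-clock difference is straightforward to estimate, since generation time is just output length divided by throughput:

```python
def generation_seconds(tokens: int, tokens_per_second: float) -> float:
    """Rough time to generate `tokens` at a given throughput (ignores prompt processing)."""
    return tokens / tokens_per_second
```

A 1,000-token report takes about 25 seconds locally at 40 tok/s versus about 10 seconds at 100 tok/s via API: noticeable interactively, irrelevant for a scheduled background skill.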

Can I mix local and cloud models in the same OpenClaw setup?

Yes, and this is actually the recommended approach for many operators. Use a local model for high-frequency, low-stakes tasks (email scanning, daily digests, data extraction) and a cloud API model for complex reasoning tasks (code generation, strategic analysis). OpenClaw supports per-skill model configuration, so you can assign different models to different skills.

Is Tailscale necessary for remote access?

Not strictly necessary, but strongly recommended. Without Tailscale, you would need to expose ports on your local network to the internet, which creates security risks. Tailscale creates an encrypted mesh VPN that lets you access your local OpenClaw instance from anywhere without opening any ports. It is free for personal use (up to 100 devices) and takes about 5 minutes to set up.

