Remote OpenClaw Blog
Running OpenClaw 100% Locally: The Complete Self-Hosted AI Agent Stack
9 min read
I run my primary OpenClaw instance on a Mac Mini M4 sitting on a shelf in my home office. No AWS bills. No OpenAI API keys. No data leaving my local network. After six months of running this setup, I can tell you the three reasons that actually matter:
Privacy. When OpenClaw processes my emails, calendar events, and client documents, none of that data hits a third-party server. For anyone handling client data, financial information, or anything covered by an NDA, this is not a nice-to-have. It is a requirement. I work with consulting clients who would not be comfortable with their project details being sent to OpenAI's or Anthropic's servers for processing, even given those companies' privacy policies.
Cost predictability. Cloud API costs scale with usage, and they can surprise you. A local setup has a fixed upfront cost (the hardware) and near-zero ongoing cost (electricity). After about 4-6 months of moderate use, the Mac Mini pays for itself compared to cloud API costs. I did the math and share the full breakdown later in this guide.
Control. No rate limits. No API outages at 2 AM when your scheduled skills are supposed to run. No model deprecation notices forcing you to migrate. The model you downloaded today works the same way in six months. As the Contabo blog and Emergent.sh have both covered, self-hosting AI infrastructure is becoming increasingly viable for individuals and small teams.
Here is what I recommend based on my own testing and what I have seen work for other operators in the community:
| Setup | Hardware | RAM | Cost | Best For |
|---|---|---|---|---|
| Budget | Mac Mini M4 | 16GB | $599 | Single 8B model, light workloads |
| Recommended | Mac Mini M4 | 24GB | $799 | 8B model + embedding model simultaneously |
| Power | Mac Mini M4 Pro | 48GB | $1,599 | Multiple models, 30B+ parameter models |
| Linux Budget | Any PC + RTX 3060 | 12GB VRAM | ~$400-600 | 8B models with fast GPU inference |
| Linux Power | Any PC + RTX 4090 | 24GB VRAM | ~$1,600-2,000 | 30B+ models, fastest local inference |
I use the Mac Mini M4 with 24GB. The Apple Silicon unified memory architecture is uniquely good for LLM inference because the CPU and GPU share the same memory pool — no bottleneck copying data between CPU RAM and GPU VRAM. This is why a $799 Mac Mini can match or beat a $1,500 Linux setup with a dedicated GPU for most LLM workloads.
For a detailed Mac Mini setup walkthrough, see the OpenClaw Mac Mini setup guide.
Ollama is the runtime that serves local LLMs. It handles model downloading, quantization, and provides an OpenAI-compatible API that OpenClaw connects to.
```bash
# macOS
brew install ollama

# Linux
curl -fsSL https://ollama.ai/install.sh | sh

# Start the Ollama server
ollama serve
```
```bash
# Primary reasoning model
ollama pull qwen3:8b

# Fast model for simple tasks
ollama pull glm4:9b

# Embedding model for vector search
ollama pull nomic-embed-text

# Verify all models are available
ollama list
```
```yaml
# ~/.openclaw/config.yaml
llm:
  provider: ollama
  model: qwen3:8b
  base_url: http://localhost:11434
  temperature: 0.7
  max_tokens: 4096

embedding:
  provider: ollama
  model: nomic-embed-text
  base_url: http://localhost:11434
```
That is it. No API keys, no account creation, no billing setup. OpenClaw connects to the local Ollama server and uses whatever models you have pulled.
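Because Ollama speaks the OpenAI chat-completions wire format, you can sanity-check the connection with any HTTP client before pointing OpenClaw at it. Here is a minimal sketch in Python, assuming the default Ollama port (11434) and the qwen3:8b model pulled above; the request is only built and inspected here, and the actual send is left commented out so the snippet runs without a live server:

```python
import json
from urllib import request

# Ollama's OpenAI-compatible chat endpoint on the default port
OLLAMA_URL = "http://localhost:11434/v1/chat/completions"

def build_chat_request(model: str, prompt: str) -> request.Request:
    """Build an OpenAI-style chat completion request for the local Ollama server."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
        "max_tokens": 4096,
    }
    return request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("qwen3:8b", "Summarize today's unread email in one line.")
print(req.full_url)

# To actually send (requires a running Ollama server):
# with request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Any language with an OpenAI-compatible client library works the same way: point the base URL at `http://localhost:11434/v1` and use any model name you have pulled.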
Not all local models are equal for agent tasks. After testing over a dozen models on my Mac Mini, here are the two I recommend:
Qwen3 8B is my daily driver for OpenClaw. It handles instruction-following, code generation, summarization, and structured output extraction with quality that genuinely surprised me. On my Mac Mini M4 (24GB), it generates about 40 tokens per second — fast enough that scheduled tasks complete quickly and interactive queries feel responsive.
The key advantage of Qwen3 8B over other 8B models is its instruction-following precision. When an OpenClaw skill asks it to extract specific fields from an email and return them as JSON, Qwen3 8B does it correctly about 95% of the time. Smaller or less capable models often hallucinate extra fields, miss required ones, or return malformed JSON. See the Qwen3 8B OpenClaw guide for benchmarks and tuning tips.
GLM-4.7-flash is the speed champion. It generates 55-65 tokens per second on my hardware — roughly 50% faster than Qwen3 8B. The quality is slightly lower on complex reasoning tasks, but for high-frequency, low-complexity tasks (email triage, data extraction, simple summaries), the speed advantage makes it the better choice.
I use GLM-4.7-flash for my email scanning and calendar sync skills (which run many times per day) and Qwen3 8B for weekly reports and complex analysis skills. OpenClaw lets you assign different models to different skills, so you can optimize each workflow independently.
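As a sketch, that split looks something like the config below. The `skills` keys are illustrative, patterned on the `llm` block earlier in this guide; check your OpenClaw version's documentation for the exact schema:

```yaml
# ~/.openclaw/config.yaml -- per-skill model overrides (illustrative keys)
skills:
  email-triage:
    model: glm4:9b       # fast model, runs many times per day
  calendar-sync:
    model: glm4:9b
  weekly-report:
    model: qwen3:8b      # stronger reasoning, runs once a week
```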
If you prefer a containerized setup (especially useful on Linux), here is a Docker Compose configuration that runs the entire stack:
```yaml
# docker-compose.yaml
version: '3.8'

services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    restart: unless-stopped
    deploy:
      resources:
        reservations:
          devices:
            # Remove this devices block for CPU-only
            - driver: nvidia
              count: all
              capabilities: [gpu]

  openclaw:
    image: openclaw/openclaw:latest
    container_name: openclaw
    depends_on:
      - ollama
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
      - OPENCLAW_DATA_DIR=/data
    volumes:
      - openclaw_data:/data
      - ./config:/config
    restart: unless-stopped

volumes:
  ollama_data:
  openclaw_data:
```
```bash
# Start everything
docker compose up -d

# Pull models inside the Ollama container
docker exec ollama ollama pull qwen3:8b
docker exec ollama ollama pull nomic-embed-text

# Verify OpenClaw is connected
docker exec openclaw openclaw status
```
The Docker setup is particularly clean on Linux servers. On macOS, I prefer running Ollama natively (via Homebrew) and OpenClaw directly, since Docker on macOS adds a virtualization layer that slightly reduces inference performance.
A local-only OpenClaw instance is great, but you probably want to access it from outside your home network — checking your agent's status from your phone, triggering a skill while traveling, or monitoring logs from a coffee shop.
Tailscale is the answer. It creates an encrypted mesh VPN between your devices without opening any ports on your router. Your Mac Mini gets a stable Tailscale IP address that you can reach from any device on your Tailscale network.
```bash
# Install Tailscale on your Mac Mini
brew install tailscale

# Start Tailscale and authenticate
tailscale up

# Your Mac Mini gets an IP like 100.x.y.z
tailscale ip
```
Now from any other device on your Tailscale network, you can reach your OpenClaw instance:
```bash
# From your laptop or phone (on Tailscale)
curl http://100.x.y.z:11434/api/tags   # Check Ollama models
curl http://100.x.y.z:8080/status      # Check OpenClaw status
```
I have been using Tailscale for remote access to my home lab for over a year. It is free for personal use, the connection is stable, and setup takes about five minutes per device. For a deeper walkthrough, see the OpenClaw Tailscale remote access guide.
Here is the real math, based on my own usage patterns over the past six months:
| Cost Category | Fully Local (Mac Mini M4 24GB) | Cloud API (Claude Sonnet via OpenRouter) |
|---|---|---|
| Hardware | $799 one-time | $0 |
| Monthly electricity | ~$5 (Mac Mini draws 10-30W) | $0 |
| Monthly API/LLM cost | $0 | ~$120-200 (moderate agent usage) |
| Month 1 total | $804 | $150 |
| Month 6 total | $829 | $900 |
| Month 12 total | $859 | $1,800 |
| Break-even | Approximately month 5-6 | n/a |
The break-even point depends entirely on your usage volume. If you run 10+ skills daily with substantial LLM processing (email scanning, content generation, data analysis), the local setup pays for itself in about five months. If you only run a few light skills, the break-even extends to 8-10 months.
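The arithmetic behind the table is easy to reproduce and adjust for your own numbers. A quick sketch using the figures above ($799 hardware, roughly $5/month electricity, roughly $150/month cloud spend at moderate usage):

```python
HARDWARE = 799       # Mac Mini M4 24GB, one-time
ELECTRICITY = 5      # per month, 10-30W draw
CLOUD = 150          # per month, moderate agent usage on cloud APIs

def local_total(months: int) -> int:
    """Cumulative cost of the local setup after the given number of months."""
    return HARDWARE + ELECTRICITY * months

def cloud_total(months: int) -> int:
    """Cumulative cost of the cloud-API setup after the given number of months."""
    return CLOUD * months

# First month where cumulative cloud cost meets or exceeds local cost
break_even = next(m for m in range(1, 61) if cloud_total(m) >= local_total(m))
print(break_even, local_total(12), cloud_total(12))
```

Swap in your own monthly API spend to see how the crossover moves: at $50/month of cloud usage the break-even stretches well past a year, while at $200/month it lands around month four.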
The hidden advantage of local is psychological: you stop self-censoring your usage. With cloud APIs, every request costs money, so you unconsciously limit experimentation. With a local setup, the marginal cost of each request is effectively zero, so you experiment freely — which is how you discover the most valuable automations.
I want to be honest: a fully local setup has real limitations, and for some use cases, cloud APIs are still the better choice.
My recommendation for most operators: start local for daily automation tasks, and add a cloud API as a fallback for complex reasoning skills. OpenClaw's per-skill model configuration makes this hybrid approach straightforward.
The practical minimum is 16GB of unified memory (RAM) and an Apple M-series chip or a dedicated GPU with at least 8GB VRAM. This lets you run Qwen3 8B or GLM-4.7-flash comfortably. A Mac Mini M4 with 16GB is the most cost-effective option at $599. You can run on less — an 8GB machine can handle smaller 3B models — but the quality drop is significant for agent tasks.
On a Mac Mini M4 with 24GB, Qwen3 8B generates approximately 35-45 tokens per second. Claude via API generates around 80-120 tokens per second. For most OpenClaw tasks — processing emails, generating summaries, running scheduled skills — the local speed is more than adequate. You will notice the difference on long-form generation tasks (1000+ token outputs), but scheduled background tasks do not have a human waiting for them, so speed matters less.
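Those throughput numbers translate directly into wall-clock wait, which is why the gap only matters for interactive, long-form tasks. A quick check using midpoints of the ranges above (40 tok/s local, 100 tok/s cloud, both from my own rough measurements rather than formal benchmarks):

```python
def generation_seconds(tokens: int, tokens_per_second: float) -> float:
    """Time to generate a response of the given length at a given throughput."""
    return tokens / tokens_per_second

LOCAL_TPS = 40    # Qwen3 8B on Mac Mini M4 24GB (midpoint of 35-45)
CLOUD_TPS = 100   # Claude via API (midpoint of 80-120)

for n_tokens in (200, 1000):
    print(n_tokens, generation_seconds(n_tokens, LOCAL_TPS),
          generation_seconds(n_tokens, CLOUD_TPS))
```

A 200-token email summary finishes in about five seconds locally versus two in the cloud, a difference nobody notices in a scheduled skill; a 1000-token report takes 25 seconds versus 10, which you do feel when waiting interactively.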
Yes, and this is actually the recommended approach for many operators. Use a local model for high-frequency, low-stakes tasks (email scanning, daily digests, data extraction) and a cloud API model for complex reasoning tasks (code generation, strategic analysis). OpenClaw supports per-skill model configuration, so you can assign different models to different skills.
Not strictly necessary, but strongly recommended. Without Tailscale, you would need to expose ports on your local network to the internet, which creates security risks. Tailscale creates an encrypted mesh VPN that lets you access your local OpenClaw instance from anywhere without opening any ports. It is free for personal use (up to 100 devices) and takes about 5 minutes to set up.