You set up OpenClaw, it works great for two days, and then it stops responding. You SSH into the server, restart it, it works again. Two days later, same thing. Sound familiar?
This is the most common complaint from operators who deployed OpenClaw without a proper production setup. The good news: every one of these failure patterns is fixable. Here's what's actually happening and how to stop it.
Failure Pattern 1: The Process Just Dies
Symptom: Your Telegram bot stops responding. You SSH in and run openclaw status — it's not running. No error, it just stopped.
What's happening: Node processes can exit silently for a lot of reasons — unhandled promise rejections, OOM (out of memory) kills, network timeouts that bubble up to the process level. If you started OpenClaw with openclaw start in a terminal, it dies when that terminal session ends or when the first unhandled error hits.
The fix: Run OpenClaw as a systemd service with Restart=always. This is non-negotiable for production:
sudo nano /etc/systemd/system/openclaw.service
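[Unit]
Description=OpenClaw
# Wait until the network is actually up, not just configured
After=network-online.target
Wants=network-online.target

[Service]
User=YOUR_USERNAME
# Confirm the binary path first with: which openclaw
ExecStart=/usr/bin/openclaw
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target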
sudo systemctl daemon-reload
sudo systemctl enable openclaw
sudo systemctl start openclaw
With this in place, if OpenClaw crashes, systemd waits 10 seconds (RestartSec) and then restarts it automatically.
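Restarting hides the crash but doesn't explain it. As a rough sketch, the helper below tallies how each failed run ended, using the "Failed with result '...'" lines that systemd writes to the journal. The function name count_exit_causes is made up for this example, and it only sees failures: a run that exits with status 0 leaves no such line.

```shell
#!/bin/bash
# Read journal lines on stdin and count how each failed run ended.
# Output is one line per cause, most frequent first.
count_exit_causes() {
  grep -o "Failed with result '[a-z-]*'" | sort | uniq -c | sort -rn
}
```

Feed it the service's journal, for example: journalctl -u openclaw --since "7 days ago" --no-pager | count_exit_causes. A pile of oom-kill results points at memory, while exit-code points at an unhandled error inside the process.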
Failure Pattern 2: The Telegram Connection Goes Stale
Symptom: The process is running (systemd shows active), but messages sent to your bot get no response. Checking logs shows something like ETELEGRAM: 409 Conflict or just silence.
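What's happening: Telegram's long-polling connection can go stale, especially after network interruptions on your VPS. The process is alive, but the bot's message stream is in a bad state. A 409 Conflict is more specific: Telegram returns it when a second client is calling getUpdates with the same bot token, or when a webhook is still registered for it, so check for a duplicate OpenClaw instance before anything else.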
The fix: A few approaches, in order of preference:
First, switch to webhook mode if you have a domain with HTTPS. Webhooks are push-based — Telegram sends messages to your server directly, no persistent polling connection to maintain:
channels:
  telegram:
    webhook:
      enabled: true
      url: "https://yourdomain.com/telegram/webhook"
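If you're sticking with polling, add a watchdog. Create a simple script that checks whether your bot is still consuming messages and restarts OpenClaw if not:
#!/bin/bash
# /home/openclaw/watchdog.sh
BOT_TOKEN="YOUR_BOT_TOKEN"
MAX_PENDING=20  # judgment call: unread updates tolerated before a restart
# getWebhookInfo is read-only. Calling getUpdates here would compete with
# OpenClaw's own polling and cause the very 409 Conflict described above.
response=$(curl -s --max-time 10 "https://api.telegram.org/bot${BOT_TOKEN}/getWebhookInfo")
pending=$(echo "$response" | grep -o '"pending_update_count":[0-9]*' | cut -d: -f2)
if [ -z "$pending" ]; then
  echo "Telegram API unreachable, a restart would not help"
elif [ "$pending" -gt "$MAX_PENDING" ]; then
  echo "Bot not consuming updates ($pending pending), restarting OpenClaw"
  systemctl restart openclaw
else
  echo "Bot OK"
fi
Run it every 5 minutes from root's crontab (sudo crontab -e), so systemctl needs no password prompt: */5 * * * * /home/openclaw/watchdog.sh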
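One thing a watchdog like this can get wrong is restarting over and over while the underlying problem (say, a Telegram outage) persists. A possible guard, sketched here under the assumption that a 15-minute cooldown suits your setup, is to record the time of the last restart and refuse to restart again too soon. The function name and the stamp file path are hypothetical.

```shell
#!/bin/bash
# Return 0 (restart allowed) if at least $3 seconds have passed since the
# epoch timestamp stored in file $1. $2 is the current epoch time.
# A missing stamp file counts as "never restarted".
restart_allowed() {
  stamp_file=$1
  now=$2
  cooldown=$3
  last=0
  if [ -f "$stamp_file" ]; then
    last=$(cat "$stamp_file")
  fi
  [ $((now - last)) -ge "$cooldown" ]
}

# Usage inside the watchdog, wrapped around the restart:
#   stamp=/var/tmp/openclaw-watchdog.stamp   # hypothetical path
#   if restart_allowed "$stamp" "$(date +%s)" 900; then
#     date +%s > "$stamp"
#     systemctl restart openclaw
#   fi
```

Passing the current time in as an argument, rather than calling date inside the function, keeps the check easy to try out by hand with fixed numbers.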
Failure Pattern 3: Memory and Disk Eating the Server
Symptom: OpenClaw gets progressively slower over days, then the whole VPS becomes unresponsive. df -h shows disk at 100%.
What's happening: OpenClaw is verbose by default. Logs accumulate. Memory (the SQLite-based conversation history) can grow significantly if you're running active workflows. On a small VPS, this eventually becomes a problem.
The fix:
Set up log rotation so logs don't grow unbounded:
sudo nano /etc/logrotate.d/openclaw
/home/openclaw/.openclaw/logs/*.log {
    daily
    rotate 7
    compress
    missingok
    notifempty
    # Truncate in place so a running OpenClaw keeps writing to the live file
    copytruncate
}
In OpenClaw config, tune memory retention:
memory:
  retention: 90d   # Don't keep everything forever
  maxTokens: 50000
Set a disk usage alert so you catch it before things blow up:
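# Add to crontab. Mail goes out only when usage is above 80%;
# df -P keeps each filesystem on one line so the awk column is reliable.
0 8 * * * out=$(df -P / | awk 'NR==2 && $5+0>80 {print "Disk at "$5}'); [ -n "$out" ] && echo "$out" | mail -s "Disk warning" you@email.com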
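Before tuning anything, it helps to know whether logs or the conversation history is the bigger consumer. A minimal sketch, assuming your data lives under /home/openclaw/.openclaw as in the logrotate path above (the function name is made up for this example):

```shell
#!/bin/bash
# List the N largest files under a directory, sizes in KiB, biggest first.
# $1 = directory to scan, $2 = how many entries to show
largest_files() {
  find "$1" -type f -exec du -k {} + | sort -rn | head -n "$2"
}
```

Running largest_files /home/openclaw/.openclaw 10 shows at a glance whether the space is going to .log files or to the SQLite database, which tells you which of the fixes above matters most on your server.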
Failure Pattern 4: The Agent Runs But Gives Wrong/Empty Responses
Symptom: The bot responds, but the answers are wrong, truncated, or it says things like "I'm unable to help with that" when it should be fine.
What's happening: Usually one of three things — your API key hit a rate limit or ran out of credits, a skill is broken and causing the agent to fall back to a degraded state, or your context window is overflowing.
Diagnosing and fixing:
Check API credit status directly with your provider (Anthropic console or OpenAI dashboard). Set up spending alerts so you know before you hit $0.
Test your provider connection:
openclaw test provider
If a skill is causing issues, disable it and test:
openclaw skills list
openclaw skills disable SKILL_NAME
For context overflow, trim your memory settings or start a fresh session: send /new to your bot.
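If you want a check that doesn't depend on OpenClaw at all, you can hit the provider's API directly with curl and read the HTTP status. The mapping below is a rough sketch, not an exhaustive list, and the function name is invented for this example. Listing models costs nothing, so a 200 here confirms the key and the network path but won't reveal exhausted credits; those typically show up only on an actual completion request.

```shell
#!/bin/bash
# Map an HTTP status code from an LLM provider to a likely cause.
# curl reports 000 when it got no HTTP response at all.
explain_status() {
  case "$1" in
    200)     echo "key accepted; provider reachable" ;;
    401|403) echo "key rejected or lacks permission" ;;
    429)     echo "rate limited or quota exhausted" ;;
    5??)     echo "provider-side error; retry later" ;;
    000)     echo "no HTTP response; check network or DNS" ;;
    *)       echo "unexpected status $1" ;;
  esac
}
```

For OpenAI: explain_status "$(curl -s -o /dev/null -w '%{http_code}' -H "Authorization: Bearer $OPENAI_API_KEY" https://api.openai.com/v1/models)". For Anthropic, swap in -H "x-api-key: $ANTHROPIC_API_KEY" -H "anthropic-version: 2023-06-01" and https://api.anthropic.com/v1/models.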
Failure Pattern 5: WhatsApp Sessions Keep Expiring
Symptom: WhatsApp integration works initially, then stops responding. Re-scanning the QR code fixes it temporarily.
What's happening: WhatsApp Web sessions expire, especially when the server is idle or there are long gaps between messages. Unlike Telegram, WhatsApp's web bridge was designed for human-operated browsers, not persistent server connections.
What actually fixes it: This one is harder to fully solve without switching to the WhatsApp Business API (which requires Meta approval). Short-term mitigations:
Keep the session alive with periodic pings:
channels:
  whatsapp:
    keepAlive:
      enabled: true
      intervalSeconds: 300
Set up session auto-reconnect in your monitoring (restart OpenClaw when WhatsApp connectivity drops). This doesn't prevent the expiry, but it minimizes downtime.
For operators who need reliable messaging, Telegram is more robust than WhatsApp for VPS deployments. If you're hitting this problem repeatedly, the practical fix is switching channels.
Failure Pattern 6: Cron Jobs Not Firing
Symptom: You've set up scheduled tasks (morning briefings, reminders, reports) and they just don't run.
What's happening: Prior to the 2026.2.12 release, there were six known bugs in OpenClaw's cron scheduler, including jobs that were skipped when the schedule advanced and jobs that fired twice. If you're on an older version, you are probably hitting one of them.
The fix: Update:
npm install -g openclaw@latest
sudo systemctl restart openclaw
openclaw --version # Confirm you're on 2026.2.12 or later
After updating, test a cron job manually:
openclaw cron test "YOUR_CRON_EXPRESSION"
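Eyeballing date-style version numbers is error-prone (2026.2.9 is older than 2026.2.12, though a plain string comparison says otherwise). A small sketch of a proper check, assuming openclaw --version prints a bare version string and that your sort supports -V (GNU coreutils and BusyBox both do); the function name is hypothetical:

```shell
#!/bin/bash
# Return 0 if version $1 is greater than or equal to version $2.
# sort -V orders the two versions; if $2 sorts first (or they are equal),
# then $1 is at least as new.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}
```

For example: version_ge "$(openclaw --version)" 2026.2.12 || echo "Still on a release with the cron bugs". If your build prints extra text around the version, strip it before comparing.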
The One Underlying Cause
Almost every persistent reliability problem traces back to one thing: OpenClaw was set up for demo, not production.
Running it from a terminal session instead of as a system service. Not hardening the configuration. Installing it as root. Skipping log rotation and disk monitoring. Not restricting Telegram access. Using a VPS plan too small for the workload.
These are all fixable. But they're also all avoidable if the setup is done right the first time.