webchat-voice-proxy

Web & Frontend Development
v0.2.2
Benign

Voice input and microphone button for OpenClaw WebChat Control UI.

616 downloads616 installsby @neldar

Setup & Installation

Install command

clawhub install neldar/webchat-voice-proxy

If the CLI is not installed:

Install command

npx clawhub@latest install neldar/webchat-voice-proxy

Or install with OpenClaw CLI:

Install command

openclaw skills install neldar/webchat-voice-proxy

or paste the repo link into your assistant's chat

Install command

https://github.com/openclaw/skills/tree/main/skills/neldar/webchat-voice-proxy

What This Skill Does

Deprecated combined installer that set up HTTPS/WSS proxy and voice mic UI for OpenClaw WebChat in a single step. Replaced by webchat-https-proxy and webchat-voice-gui. Use webchat-voice-full-stack for equivalent one-step deployment.

Voice input and microphone button for OpenClaw WebChat Control UI.

When to Use It

  • Adding push-to-talk voice input to OpenClaw WebChat
  • Running local speech-to-text without sending audio to external services
  • Accessing WebChat over HTTPS from a LAN device
  • Keeping voice transcription proxy running after reboots via systemd
  • Switching mic recording between push-to-talk and toggle mode
View original SKILL.md file
# WebChat Voice Proxy

Set up a reboot-safe voice stack for OpenClaw WebChat (including the current polished mic/stop/hourglass UI states):
- HTTPS Control UI on port 8443
- `/transcribe` proxy to local faster-whisper service
- WebSocket passthrough to gateway (`ws://127.0.0.1:18789`)
- Voice button script injection into Control UI
- Real-time VU meter: button shadow/scale reacts to voice level
- **Push-to-Talk**: hold mic button to record, release to send (default mode)
- **Toggle mode**: click to start, click to stop (switch via double-click on mic button)
- **Keyboard shortcuts**: `Ctrl+Space` Push-to-Talk, `Ctrl+Shift+M` start/stop continuous recording
- **Localized UI**: auto-detects browser language (English, German, Chinese built-in), customizable

## Prerequisites (required)

This skill requires a **local faster-whisper HTTP service**.
Expected default:
- URL: `http://127.0.0.1:18790/transcribe`
- systemd user service: `openclaw-transcribe.service`

Verify before deployment:
```bash
systemctl --user is-active openclaw-transcribe.service
curl -s -o /dev/null -w '%{http_code}\n' http://127.0.0.1:18790/transcribe -X POST -H 'Content-Type: application/octet-stream' --data-binary 'x'
```

If this dependency is missing, set up faster-whisper first (model load + HTTP endpoint), then run this skill.

Related skills:
- `faster-whisper-local-service` (backend prerequisite)
- `webchat-voice-full-stack` (meta-installer that deploys both backend + proxy)

## Workflow

1. Ensure transcription service exists and is running (`openclaw-transcribe.service`).
2. Deploy `voice-input.js` to Control UI assets and inject script tag into `index.html`.
3. Configure gateway allowed origin for external HTTPS UI.
4. Run HTTPS+WSS proxy as persistent user systemd service (`openclaw-voice-https.service`).
5. Verify pairing/token/origin errors and resolve in order.

## Security Notes

- **Localhost by default**: The HTTPS proxy binds to `127.0.0.1` only. It is **not** reachable from other devices on your network unless you explicitly set `VOICE_HOST` to a LAN IP.
- **LAN exposure**: Setting `VOICE_HOST=<LAN-IP>` exposes the proxy (and by extension the gateway WebSocket and transcription endpoint) to all devices on that network. Only do this on trusted networks.
- **Persistence**: This skill installs a **user systemd service** (`openclaw-voice-https.service`) that starts automatically on boot, and a **gateway hook** that re-injects the UI script after updates. Use `uninstall.sh` to fully revert.
- **Self-signed TLS**: The auto-generated certificate is not trusted by browsers. You will see a certificate warning on first access.

## Deploy

Run (localhost only — default, most secure):
```bash
bash scripts/deploy.sh
```

Or expose on LAN (required to access from other devices):
```bash
VOICE_HOST=10.0.0.42 VOICE_HTTPS_PORT=8443 VOICE_LANG=de bash scripts/deploy.sh
```

When run interactively without `VOICE_LANG`, the script will ask you to choose a UI language (`auto`, `en`, `de`, `zh`). Set `VOICE_LANG=auto` to skip the prompt.

This script is idempotent.

## Quick verify

Run:
```bash
bash scripts/status.sh
```

Expected:
- both services active
- injection present
- `https:200`

## Common fixes

- `404 /chat?...` → SPA fallback missing in HTTPS proxy.
- `origin not allowed` → ensure deploy used correct `VOICE_HOST` and added matching HTTPS origin to `gateway.controlUi.allowedOrigins`.
- `token missing` → open URL with `?token=...` once.
- `pairing required` → approve pending device via `openclaw devices approve <requestId> --token <gateway-token>`.
- Mic breaks after reboot → cert paths must be persistent (not `/tmp`).
- No transcription result → check local faster-whisper endpoint first.

See `references/troubleshooting.md` for exact commands.

## What this skill modifies

Before installing, be aware of all system changes `deploy.sh` makes:

| What | Path | Action |
|---|---|---|
| Control UI HTML | `<npm-global>/openclaw/dist/control-ui/index.html` | Adds `<script>` tag for voice-input.js |
| Control UI asset | `<npm-global>/openclaw/dist/control-ui/assets/voice-input.js` | Copies mic button JS |
| Gateway config | `~/.openclaw/openclaw.json` | Adds HTTPS origin to `gateway.controlUi.allowedOrigins` |
| Systemd service | `~/.config/systemd/user/openclaw-voice-https.service` | Creates + enables persistent HTTPS proxy |
| Gateway hook | `~/.openclaw/hooks/voice-input-inject/` | Installs startup hook that re-injects JS after updates |
| Workspace files | `~/.openclaw/workspace/voice-input/` | Copies voice-input.js, https-server.py |
| TLS certs | `~/.openclaw/workspace/voice-input/certs/` | Auto-generated self-signed cert on first run |

The injected JS (`voice-input.js`) runs inside the Control UI and interacts with the chat input. Review the source before deploying.

## Mic Button Controls

| Action | Effect |
|---|---|
| **Hold** (PTT mode) | Record while held, transcribe on release |
| **Click** (Toggle mode) | Start recording / stop and transcribe |
| **Double-click** | Switch between PTT and Toggle mode |
| **Right-click** | Toggle beep sound on/off |
| **Ctrl+Space** (hold) | Push-to-Talk via keyboard (works even with text field focused) |
| **Ctrl+Shift+M** | Start/stop recording (transcribes on stop) |
| **Ctrl+Shift+B** | Start/stop live transcription [beta] — text appears in real-time, auto-sends after 2s review, stops on 5s silence or "Stop Hugo" keyword |

The current mode and available actions are shown in the button tooltip on hover.

## Language / i18n

The UI automatically detects the browser language and shows tooltips, toasts, and placeholder text in the matching language.

**Built-in languages:** English (`en`), German (`de`), Chinese (`zh`)

### Override language

Set a language override in the browser console:

```js
localStorage.setItem('oc-voice-lang', 'de');  // force German
localStorage.setItem('oc-voice-lang', 'zh');  // force Chinese
localStorage.removeItem('oc-voice-lang');      // back to auto-detect
```

Then reload the page.

### Add a custom language

Edit `voice-input.js` and add a new entry to the `I18N` object. Use `assets/i18n.json` as a template — it contains all translation keys with the built-in translations.

Example for adding French:

```js
const I18N = {
  // ... existing entries ...
  fr: {
    tooltip_ptt: "Maintenir pour parler",
    tooltip_toggle: "Cliquer pour démarrer/arrêter",
    tooltip_next_toggle: "Mode clic",
    tooltip_next_ptt: "Push-to-Talk",
    tooltip_beep_off: "Désactiver le bip",
    tooltip_beep_on: "Activer le bip",
    tooltip_dblclick: "Double-clic",
    tooltip_rightclick: "Clic droit",
    toast_ptt: "Push-to-Talk",
    toast_toggle: "Mode clic",
    toast_beep_on: "Bip activé",
    toast_beep_off: "Bip désactivé",
    placeholder_suffix: " — Voix : (Ctrl+Espace Push-To-Talk, Ctrl+Shift+M enregistrement continu)"
  }
};
```

After editing, redeploy with `bash scripts/deploy.sh` to copy the updated JS to the Control UI.

## CORS Policy

The `/transcribe` proxy endpoint uses a configurable `Access-Control-Allow-Origin` header.
Set `VOICE_ALLOWED_ORIGIN` env var to restrict. Default: `https://<VOICE_HOST>:<VOICE_HTTPS_PORT>`.

## Uninstall

To fully revert all changes:

```bash
bash scripts/uninstall.sh
```

This will:
1. Stop and remove `openclaw-voice-https.service`
2. Remove the gateway startup hook
3. Remove `voice-input.js` from Control UI and undo the index.html injection
4. Remove the HTTPS origin from gateway config
5. Restart the gateway
6. Remove TLS certificates
7. Remove workspace runtime files (`voice-input.js`, `https-server.py`, `i18n.json`)

The faster-whisper backend is **not** touched by uninstall — remove it separately via `faster-whisper-local-service` if needed.

Example Workflow

Here's how your AI assistant might use this skill in practice.

INPUT

User asks: Adding push-to-talk voice input to OpenClaw WebChat

AGENT
  1. 1Adding push-to-talk voice input to OpenClaw WebChat
  2. 2Running local speech-to-text without sending audio to external services
  3. 3Accessing WebChat over HTTPS from a LAN device
  4. 4Keeping voice transcription proxy running after reboots via systemd
  5. 5Switching mic recording between push-to-talk and toggle mode
OUTPUT
Voice input and microphone button for OpenClaw WebChat Control UI.

Share this skill

Security Audits

VirusTotalBenign
OpenClawBenign
View full report

These signals reflect official OpenClaw status values. A Suspicious status means the skill should be used with extra caution.

Details

LanguageMarkdown
Last updatedMar 6, 2026