speechall-cli

Category: Coding Agents & IDEs · v0.1.1 · Benign

Install and use the speechall CLI tool for speech-to-text transcription.

1.1K downloads · 1.1K installs · by @atacan

Setup & Installation

Install with clawhub:

clawhub install atacan/speechall-cli

If the clawhub CLI is not installed, use npx:

npx clawhub@latest install atacan/speechall-cli

Or install with the OpenClaw CLI:

openclaw skills install atacan/speechall-cli

Or paste the repo link into your assistant's chat:

https://github.com/openclaw/skills/tree/main/skills/atacan/speechall-cli

What This Skill Does

CLI tool for transcribing audio and video files to text via the Speechall API. Routes requests through multiple speech-to-text providers from a single interface. Supports speaker diarization, subtitle formats (SRT, VTT), and custom vocabulary.

Provides access to speech-to-text models from OpenAI, Deepgram, AssemblyAI, Google, and others through a single CLI without separate SDKs or account integrations for each provider.

When to Use It

  • Transcribing a recorded interview to a text file
  • Generating SRT subtitles for a recorded presentation
  • Identifying speakers in a multi-person meeting recording
  • Processing domain-specific audio with custom terminology
  • Listing available STT models to compare provider options
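
The use cases above map directly onto CLI invocations. A sketch, assuming hypothetical file names; the flags used here are the ones documented in the SKILL.md below:

```bash
# Transcribe an interview to a text file
speechall interview.mp3 > transcript.txt

# Generate SRT subtitles for a presentation
speechall presentation.mp4 --output-format srt > presentation.srt

# Identify speakers in a meeting recording
speechall meeting.wav --diarization --speakers-expected 3

# Boost recognition of domain-specific terms
speechall medical.wav --custom-vocabulary "myocardial"

# List models to compare provider options
speechall models --language en --diarization
```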
Original SKILL.md file:
# speechall-cli

CLI for speech-to-text transcription via the Speechall API. Supports multiple providers (OpenAI, Deepgram, AssemblyAI, Google, Gemini, Groq, ElevenLabs, Cloudflare, and more).

## Installation

### Homebrew (macOS and Linux)

```bash
brew install Speechall/tap/speechall
```

**Without Homebrew**: Download the binary for your platform from https://github.com/Speechall/speechall-cli/releases and place it on your `PATH`.

### Verify

```bash
speechall --version
```

## Authentication

An API key is required. Provide it via environment variable (preferred) or flag:

```bash
export SPEECHALL_API_KEY="your-key-here"
# or
speechall --api-key "your-key-here" audio.wav
```

Create an API key at https://speechall.com/console/api-keys

## Commands

### transcribe (default)

Transcribe an audio or video file. This is the default subcommand — `speechall audio.wav` is equivalent to `speechall transcribe audio.wav`.

```bash
speechall <file> [options]
```

**Options:**

| Flag | Description | Default |
|---|---|---|
| `--model <provider.model>` | STT model identifier | `openai.gpt-4o-mini-transcribe` |
| `--language <code>` | Language code (e.g. `en`, `tr`, `de`) | API default (auto-detect) |
| `--output-format <format>` | Output format (`text`, `json`, `verbose_json`, `srt`, `vtt`) | API default |
| `--diarization` | Enable speaker diarization | off |
| `--speakers-expected <n>` | Expected number of speakers (use with `--diarization`) | — |
| `--no-punctuation` | Disable automatic punctuation | — |
| `--temperature <0.0-1.0>` | Model temperature | — |
| `--initial-prompt <text>` | Text prompt to guide model style | — |
| `--custom-vocabulary <term>` | Terms to boost recognition (repeatable) | — |
| `--ruleset-id <uuid>` | Replacement ruleset UUID | — |
| `--api-key <key>` | API key (overrides `SPEECHALL_API_KEY` env var) | — |

**Examples:**

```bash
# Basic transcription
speechall interview.mp3

# Specific model and language
speechall call.wav --model deepgram.nova-2 --language en

# Speaker diarization with SRT output
speechall meeting.wav --diarization --speakers-expected 3 --output-format srt

# Custom vocabulary for domain-specific terms
speechall medical.wav --custom-vocabulary "myocardial" --custom-vocabulary "infarction"

# Transcribe a video file (macOS extracts audio automatically)
speechall presentation.mp4
```

### models

List available speech-to-text models. Outputs JSON to stdout. Filters combine with AND logic.

```bash
speechall models [options]
```

**Filter flags:**

| Flag | Description |
|---|---|
| `--provider <name>` | Filter by provider (e.g. `openai`, `deepgram`) |
| `--language <code>` | Filter by supported language (`tr` matches `tr`, `tr-TR`, `tr-CY`) |
| `--diarization` | Only models supporting speaker diarization |
| `--srt` | Only models supporting SRT output |
| `--vtt` | Only models supporting VTT output |
| `--punctuation` | Only models supporting automatic punctuation |
| `--streamable` | Only models supporting real-time streaming |
| `--vocabulary` | Only models supporting custom vocabulary |

**Examples:**

```bash
# List all available models
speechall models

# Models from a specific provider
speechall models --provider deepgram

# Models that support Turkish and diarization
speechall models --language tr --diarization

# Pipe to jq for specific fields
speechall models --provider openai | jq '.[].identifier'
```

## Tips

- On macOS, video files (`.mp4`, `.mov`, etc.) are automatically converted to audio before upload.
- On Linux, pass audio files directly (`.wav`, `.mp3`, `.m4a`, `.flac`, etc.).
- Output goes to stdout. Redirect to save: `speechall audio.wav > transcript.txt`
- Errors go to stderr, so piping stdout is safe.
- Run `speechall --help`, `speechall transcribe --help`, or `speechall models --help` to see all valid enum values for model identifiers, language codes, and output formats.
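
The tips above can be combined into a small pipeline — pick a model with `jq`, then transcribe with it, redirecting stdout while errors still reach the terminal. A sketch, assuming a hypothetical input file; the `identifier` field follows the JSON shape used in the `models` examples above:

```bash
# Pick the first Deepgram model that supports diarization
model=$(speechall models --provider deepgram --diarization | jq -r '.[0].identifier')

# Transcribe with that model; stdout goes to the file, stderr stays visible
speechall meeting.wav --model "$model" --diarization > transcript.txt
```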



Security Audits

VirusTotal: Benign
OpenClaw: Benign

These signals reflect official OpenClaw status values. A Suspicious status means the skill should be used with extra caution.

Details

Language: Markdown
Last updated: Mar 1, 2026