zvukogram
Text-to-Speech via Zvukogram API with SSML support.
Setup & Installation
Install command
clawhub install erview/zvukogramIf the CLI is not installed:
Install command
npx clawhub@latest install erview/zvukogramOr install with OpenClaw CLI:
Install command
openclaw skills install erview/zvukogramor paste the repo link into your assistant's chat
Install command
https://github.com/openclaw/skills/tree/main/skills/erview/zvukogramWhat This Skill Does
Generates speech from text using the Zvukogram API. Supports SSML markup for stress marks, pauses, and speed control. Can merge multiple audio fragments and handle English word transcription for Russian TTS.
SSML support with per-word stress control and English transcription aliases gives more accurate Russian-language output than basic TTS APIs.
When to Use It
- Converting written articles to audio files
- Adding voice notifications to automated pipelines
- Recording podcast narration with multiple voices
- Voicing news scripts with proper pronunciation control
- Generating audio from long-form text documents
View original SKILL.md file
# Zvukogram TTS
Speech generation via Zvukogram API with SSML markup support.
## Requirements
To use this skill, you need:
- **Zvukogram API token** — get it at https://zvukogram.com/
- **Zvukogram account email**
### Setup
Create file `~/.config/zvukogram/config.json`:
```bash
mkdir -p ~/.config/zvukogram
```
```json
{
"token": "your_api_token_here",
"email": "your_email@example.com"
}
```
Or use environment variables:
```bash
export ZVUKOGRAM_TOKEN=your_api_token_here
export ZVUKOGRAM_EMAIL=your_email@example.com
```
## Quick Start
```bash
# Simple TTS
python3 scripts/tts.py --text "Hello, world!" --voice Алена --output hello.mp3
# With +20% speed
python3 scripts/tts.py --text "Fast text" --voice Алена --speed 1.2 --output fast.mp3
# Check balance
python3 scripts/balance.py
```
## Features
- **TTS generation** — text to speech
- **SSML support** — stress marks, pauses, speed
- **Audio merging** — combine fragments via ffmpeg
- **Transcription** — proper pronunciation of English words
## SSML Markup
### Stress Marks
Use `+` before stressed vowel:
```
З+амок — stress on "a"
зам+ок — stress on "o"
```
### Aliases (Transcription)
```xml
<sub alias="Оупен Эй Ай">OpenAI</sub>
<sub alias="Самсунг">Samsung</sub>
<sub alias="Ал+ьтман">Альтман</sub>
```
### Speed
```xml
<prosody rate="1.2">20% faster</prosody>
<prosody rate="fast">Fast text</prosody>
```
### Pauses
```xml
<break time="500ms"/>
```
## Available Voices
- **Алена** — female, neutral (recommended)
- **Андрей** — male, neutral (recommended)
- **Александра** — female, soft
- **Антон** — male, business
Full list: see [references/VOICES.md](references/VOICES.md)
## Examples
See [references/EXAMPLES.md](references/EXAMPLES.md) for:
- Dialogs and podcasts
- News voiceover
- Voice notifications
- Long texts
## Transcription
See [references/TRANSCRIPTION.md](references/TRANSCRIPTION.md) for proper pronunciation:
- OpenAI → Оупен Эй Ай
- GPT → Джи Пи Ти
- Samsung → Самсунг
- Altman → Ал+ьтман
## SSML Reference
- Full, agent-readable reference (recommended): [references/SSML.md](references/SSML.md)
- Quick lookup: [references/SSML_CHEATSHEET.md](references/SSML_CHEATSHEET.md)
- Official Zvukogram SSML docs: https://zvukogram.com/node/ssml/
## Troubleshooting
See [references/TROUBLESHOOTING.md](references/TROUBLESHOOTING.md) for:
- API errors
- Audio issues
- Diagnostics
## API Limitations
- Max 1000 characters per request (`/text`)
- Up to 1M characters via `/longtext`
- Do not rely on `<voice>` / `<speak>` wrappers for API usage. For multi-voice, generate and merge fragments (one request per voice).
## Links
- API docs: https://zvukogram.com/node/api/
- Voice rating: https://zvukogram.com/rating/
- Support: https://t.me/zvukogram
Example Workflow
Here's how your AI assistant might use this skill in practice.
User asks: Converting written articles to audio files
- 1Converting written articles to audio files
- 2Adding voice notifications to automated pipelines
- 3Recording podcast narration with multiple voices
- 4Voicing news scripts with proper pronunciation control
- 5Generating audio from long-form text documents
Text-to-Speech via Zvukogram API with SSML support.
Security Audits
These signals reflect official OpenClaw status values. A Suspicious status means the skill should be used with extra caution.
Similar Skills
VIEW ALLahmed
Terminal Spotify playback/search via spogo (preferred)
attio-crm
Manage Attio CRM records (companies, people, deals, tasks, notes).
stackunderflow
A knowledge-retrieval protocol allowing the agent to access a verified community knowledge base.
moonbanking
Full access to Moon Banking API endpoints for data about every bank on Earth, including stories, votes, scores.