OpenClaw Voice turns UTF-8 text into cloned speech with Qwen TTS, runs fully on your own GPU, and sends the final MP3 through a configured Discord bot. It is built for automation workflows that need local synthesis, predictable configuration, and no hosted inference API.
Generate speech locally on your own machine instead of calling a paid hosted API.
### Model Fields

| Field | Description |
| --- | --- |
| `device_map` | CUDA target such as `cuda:0` |
| `dtype` | `float16`, `bfloat16`, or `float32` |
| `model_name` | The 1.7B or 0.6B Qwen TTS model |
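These fields might appear in `config.yaml` roughly as sketched below. The flat key layout is an assumption, not the project's documented schema; only the 0.6B model identifier appears elsewhere in this README.

```yaml
# Illustrative model section (key layout is an assumption)
model_name: Qwen/Qwen3-TTS-12Hz-0.6B-Base  # lower-VRAM variant named later in this README
device_map: cuda:0
dtype: bfloat16
```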
Compress the final waveform to MP3 in memory and deliver it through a configured Discord bot.
### Bot Fields

| Field | Description |
| --- | --- |
| `name` | Case-insensitive bot selector matched against `--bot-name` |
| `provider` | `discord` |
| `token` | Discord bot token |
| `user_id` | Target DM recipient |
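A bot entry in `config.yaml` could look like the sketch below. The field names come from the table above, but the `bots:` list wrapper and the placeholder values are assumptions.

```yaml
# Illustrative bot entry; only the field names are documented above
bots:
  - name: narrator                 # matched case-insensitively against --bot-name
    provider: discord
    token: "YOUR_DISCORD_BOT_TOKEN"
    user_id: "123456789012345678"  # DM recipient (placeholder)
```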
Split text on paragraphs, recursively break oversized chunks, and stitch the generated waveforms into one result.
### Chunk Controls

| Field | Description |
| --- | --- |
| `max_chunk_chars` | Upper bound on chunk length before recursive splitting |
| `inter_chunk_silence_ms` | Silence inserted between generated chunks |
| Output | One concatenated waveform before MP3 encoding |
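The paragraph-first, recursive strategy can be sketched in Python. Function names and the 400-character limit here are illustrative, not the project's actual API:

```python
# Hypothetical sketch of paragraph-first chunking with recursive splitting
# of oversized chunks; not OpenClaw Voice's real implementation.
MAX_CHUNK_CHARS = 400  # stands in for max_chunk_chars from the config

def split_text(text: str, limit: int = MAX_CHUNK_CHARS) -> list[str]:
    """Split on blank-line paragraph breaks, then shrink oversized chunks."""
    chunks: list[str] = []
    for para in text.split("\n\n"):
        para = para.strip()
        if para:
            chunks.extend(_split_chunk(para, limit))
    return chunks

def _split_chunk(chunk: str, limit: int) -> list[str]:
    """Recursively halve a chunk at a word boundary until it fits the limit."""
    if len(chunk) <= limit:
        return [chunk]
    mid = len(chunk) // 2
    cut = chunk.rfind(" ", 0, mid)  # prefer a word boundary near the midpoint
    if cut == -1:
        cut = mid  # no space found: hard split
    return _split_chunk(chunk[:cut].strip(), limit) + _split_chunk(chunk[cut:].strip(), limit)
```

Each chunk would then be synthesized separately, with `inter_chunk_silence_ms` of silence inserted between the resulting waveforms before concatenation.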
Drive the entire runtime from a single YAML file that resolves relative paths from its own directory.
### Config Fields

| Field | Description |
| --- | --- |
| `language` | Label passed to the model for generation |
| `ref_audio_path` | Reference WAV stored next to `config.yaml` |
| `ref_text_path` | Matching transcript stored next to `config.yaml` |
| `--config` | CLI flag to override the default runtime config file |
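Resolving relative paths against the config file's own directory can be sketched as follows; `resolve_config_paths` is a hypothetical helper, not the project's API:

```python
from pathlib import Path

def resolve_config_paths(cfg: dict, config_path: str) -> dict:
    """Make ref_audio_path / ref_text_path absolute, relative to config.yaml's directory."""
    base = Path(config_path).expanduser().resolve().parent
    out = dict(cfg)
    for key in ("ref_audio_path", "ref_text_path"):
        value = out.get(key)
        if value and not Path(value).is_absolute():
            out[key] = str(base / value)  # anchor at the config file's directory, not the CWD
    return out
```

Anchoring at the config file rather than the working directory means the reference WAV and transcript travel with `config.yaml`, as the Inputs table below assumes.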
Use the bundled Spanish reference voice or provide your own WAV and matching transcript to anchor generation to a target speaker profile.
### Inputs

| Input | Description |
| --- | --- |
| Spanish | Bundled `spanish_male.wav` plus `spanish_male.txt` |
| Other languages | User-generated WAV plus matching transcript |
| Location | Copy both files into the config directory |
Ship the service with an OpenClaw skill so agents know how to invoke the command and what config it depends on.
### Repository Hooks

| Hook | Location |
| --- | --- |
| Skill | `skill/openclaw-voice/SKILL.md` |
| CLI | `openclaw-voice` |
## Requirements
OpenClaw Voice currently targets local CUDA execution. You need Python 3.11+, a CUDA-capable GPU with a working PyTorch CUDA runtime, and Discord bot credentials for delivery. A Spanish reference voice is bundled with the repository; for other languages, create your own WAV and matching transcript.

If VRAM is tight, select the 0.6B model in `config.yaml`:

```yaml
# Lower VRAM option
model_name: Qwen/Qwen3-TTS-12Hz-0.6B-Base
device_map: cuda:0
```
## Installation

```bash
pipx install git+https://github.com/arrase/openclaw-voice.git
```
## Configuration

Copy the template config and the bundled Spanish reference files into the runtime directory. For any other language, replace those two files with your own WAV and matching transcript, then update the config.

```bash
mkdir -p ~/.openclaw-voice
cp config/config.yaml ~/.openclaw-voice/config.yaml
cp assets/reference/spanish_male.wav ~/.openclaw-voice/spanish_male.wav
cp assets/reference/spanish_male.txt ~/.openclaw-voice/spanish_male.txt
```
## Basic Usage

Generate speech from a UTF-8 text file and deliver the MP3 through the selected bot:

```bash
openclaw-voice --input-text input.txt --bot-name narrator

# Longer example from the repository
openclaw-voice --input-text examples/long_text_es.txt --bot-name narrator
```
## Alternate Entry Points

Use the Python module entry point, or override the runtime configuration path when needed:

```bash
python -m openclaw_voice --input-text input.txt --bot-name narrator
openclaw-voice --input-text input.txt --bot-name narrator --config /path/to/config.yaml
```