Local Voice Cloning, Ready for Delivery

OpenClaw Voice turns UTF-8 text into cloned speech with Qwen TTS, runs fully on your own GPU, and sends the final MP3 through a configured Discord bot. It is built for automation workflows that need local synthesis, predictable configuration, and no hosted inference API.

Key Features

Local Qwen TTS

Generate speech locally on your own machine instead of calling a paid hosted API.

Requirements

device_map: CUDA target such as cuda:0
dtype: float16, bfloat16, or float32
model_name: 1.7B or 0.6B Qwen TTS model

model_name: Qwen/Qwen3-TTS-12Hz-1.7B-Base
device_map: cuda:0
dtype: bfloat16
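These three fields can be checked before the model is loaded. A minimal sketch; the helper name and the exact accepted values are assumptions based on the fields listed above, not the project's actual API:

```python
VALID_DTYPES = {"float16", "bfloat16", "float32"}

def validate_model_config(cfg: dict) -> None:
    """Fail fast on an invalid model section (hypothetical helper)."""
    if not cfg.get("model_name", "").startswith("Qwen/"):
        raise ValueError(f"unexpected model_name: {cfg.get('model_name')!r}")
    if not cfg.get("device_map", "").startswith("cuda:"):
        raise ValueError("device_map must be a CUDA target such as cuda:0")
    if cfg.get("dtype") not in VALID_DTYPES:
        raise ValueError(f"dtype must be one of {sorted(VALID_DTYPES)}")

validate_model_config({
    "model_name": "Qwen/Qwen3-TTS-12Hz-1.7B-Base",
    "device_map": "cuda:0",
    "dtype": "bfloat16",
})  # passes silently
```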

Discord Delivery

Compress the final waveform to MP3 in memory and deliver it through a configured Discord bot.

Bot Fields

name: Case-insensitive bot selector for --bot-name
provider: discord
token: Discord bot token
user_id: Target DM recipient

tts:
  - name: narrator
    provider: discord
    token: YOUR_DISCORD_BOT_TOKEN
    user_id: 123456789012345678
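Resolving --bot-name against the tts list is a case-insensitive lookup. A sketch under the config shape above; the function name is an assumption:

```python
def select_bot(tts_entries: list[dict], bot_name: str) -> dict:
    """Return the first entry whose name matches bot_name, ignoring case."""
    wanted = bot_name.lower()
    for entry in tts_entries:
        if entry.get("name", "").lower() == wanted:
            return entry
    raise KeyError(f"no bot named {bot_name!r} in config")

bots = [{"name": "narrator", "provider": "discord",
         "token": "YOUR_DISCORD_BOT_TOKEN", "user_id": 123456789012345678}]
print(select_bot(bots, "NARRATOR")["provider"])  # discord
```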

Chunked Long-Form Audio

Split text on paragraphs, recursively break oversized chunks, and stitch the generated waveforms into one result.

Chunk Controls

max_chunk_chars: Upper bound before recursive splitting
inter_chunk_silence_ms: Silence inserted between generated chunks
Output: One concatenated waveform before MP3 encoding

inter_chunk_silence_ms: 150
max_chunk_chars: 1400
# Paragraph-aware splitting first,
# recursive splitting only when needed.
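The paragraph-first, recursive-fallback split and the silence stitching can be sketched as follows. This is an illustration of the technique, not the project's actual implementation; function names, the whitespace-bisection heuristic, and the default sample rate are assumptions:

```python
def split_text(text: str, max_chars: int = 1400) -> list[str]:
    """Split on blank-line paragraphs first; bisect oversized chunks recursively."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    for p in paragraphs:
        chunks.extend(_split_chunk(p, max_chars))
    return [c for c in chunks if c]

def _split_chunk(chunk: str, max_chars: int) -> list[str]:
    if len(chunk) <= max_chars:
        return [chunk]
    # Cut at the last space in the first half, else hard-split at the midpoint.
    cut = chunk.rfind(" ", 1, len(chunk) // 2)
    if cut == -1:
        cut = len(chunk) // 2
    return (_split_chunk(chunk[:cut].strip(), max_chars)
            + _split_chunk(chunk[cut:].strip(), max_chars))

def stitch(waveforms: list[list[float]],
           sample_rate: int = 24000, silence_ms: int = 150) -> list[float]:
    """Concatenate per-chunk waveforms with a silence gap between them."""
    gap = [0.0] * (sample_rate * silence_ms // 1000)
    out: list[float] = []
    for i, wave in enumerate(waveforms):
        if i:
            out.extend(gap)
        out.extend(wave)
    return out

print(split_text("First paragraph.\n\nSecond paragraph."))
# ['First paragraph.', 'Second paragraph.']
```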

YAML Configuration

Drive the entire runtime from a single YAML file that resolves relative paths from its own directory.

Config Fields

language: Label passed to the model for generation
ref_audio_path: Reference WAV stored next to config.yaml
ref_text_path: Matching transcript stored next to config.yaml
--config: Override the default runtime config file

language: Spanish
ref_audio_path: spanish_male.wav
ref_text_path: spanish_male.txt
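Resolving ref_audio_path and ref_text_path against the config file's own directory can be sketched with pathlib; the helper name is an assumption:

```python
from pathlib import Path

def resolve_ref_paths(config_path: str, cfg: dict) -> dict:
    """Anchor relative reference paths at the directory containing config.yaml."""
    base = Path(config_path).expanduser().resolve().parent
    out = dict(cfg)
    for key in ("ref_audio_path", "ref_text_path"):
        p = Path(cfg[key])
        out[key] = str(p if p.is_absolute() else base / p)
    return out
```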

Reference Voice Cloning

Use the bundled Spanish reference voice or provide your own WAV and matching transcript to anchor generation to a target speaker profile.

Inputs

Spanish: Bundled spanish_male.wav plus spanish_male.txt
Other languages: User-generated WAV plus matching transcript
Location: Copy both files into the config directory

~/.openclaw-voice/
  config.yaml
  spanish_male.wav
  spanish_male.txt
# For other languages, place your own
# WAV and transcript in the same directory.

OpenClaw Skill Ready

Ship the service with an OpenClaw skill so agents know how to invoke the command and what config it depends on.

Repository Hooks

Skill: skill/openclaw-voice/SKILL.md
CLI: openclaw-voice
The agent can select a bot, load ~/.openclaw-voice/config.yaml, and turn a text file into a Discord DM.

Getting Started

Requirements

OpenClaw Voice currently targets local CUDA execution. You need Python 3.11+, a CUDA-capable GPU with a working PyTorch CUDA runtime, and Discord bot credentials for delivery. A Spanish reference voice is bundled with the repository; for other languages, create your own WAV and matching transcript.

# Lower VRAM option
model_name: Qwen/Qwen3-TTS-12Hz-0.6B-Base
device_map: cuda:0

Installation

pipx install git+https://github.com/arrase/openclaw-voice.git

Configuration

Copy the template config and the bundled Spanish reference files into the runtime directory. For any other language, replace those two files with your own WAV and matching transcription, then update the config.

mkdir -p ~/.openclaw-voice
cp config/config.yaml ~/.openclaw-voice/config.yaml
cp assets/reference/spanish_male.wav ~/.openclaw-voice/spanish_male.wav
cp assets/reference/spanish_male.txt ~/.openclaw-voice/spanish_male.txt

Basic Usage

Generate speech from a UTF-8 text file and deliver the MP3 through the selected bot.

openclaw-voice --input-text input.txt --bot-name narrator
openclaw-voice --input-text examples/long_text_es.txt --bot-name narrator

Alternate Entry Points

Use the Python module entry point or override the runtime configuration path when needed.

python -m openclaw_voice --input-text input.txt --bot-name narrator
openclaw-voice --input-text input.txt --bot-name narrator --config /path/to/config.yaml
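The flags shown above map onto a small argparse surface. A sketch of how they might be declared; the default config path is taken from this document, but the required/optional split is an assumption, not the project's documented behavior:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="openclaw-voice")
    parser.add_argument("--input-text", required=True,
                        help="UTF-8 text file to synthesize")
    parser.add_argument("--bot-name", required=True,
                        help="case-insensitive bot selector from the tts list")
    parser.add_argument("--config", default="~/.openclaw-voice/config.yaml",
                        help="override the default runtime config file")
    return parser

args = build_parser().parse_args(
    ["--input-text", "input.txt", "--bot-name", "narrator"])
print(args.bot_name)  # narrator
```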