OpenAI-compatible TTS API wrapping VibeVoice-Realtime-0.5B for Open WebUI.
Note: If both this wrapper and Open WebUI run in containers, use `host.docker.internal:8880` instead of `localhost`.
👆🏻 📹 YouTube video demonstration of the "Mike" voice used in Open WebUI. 📹 👆🏻
- ✅ OpenAI API Compatible - `/v1/audio/speech`, `/v1/audio/voices`, `/v1/audio/models`; works as a drop-in replacement for many OpenAI API-compatible endpoints.
- ⚡ Real-time Performance - ~0.5x RTF (Real-Time Factor) on an RTX 3060.
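RTF here is wall-clock generation time divided by audio duration, so values below 1.0 mean audio is produced faster than it plays back. A quick sanity check with hypothetical numbers:

```python
# RTF (Real-Time Factor) = wall-clock generation time / audio duration.
# Values below 1.0 mean the model generates audio faster than real time.
def rtf(generation_seconds: float, audio_seconds: float) -> float:
    return generation_seconds / audio_seconds

# Hypothetical example: 5 s of audio generated in 2.5 s -> RTF 0.5
print(rtf(2.5, 5.0))  # 0.5
```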
- 🚀 GPU Accelerated - Requires only ~2GB of VRAM; uses CUDA with Flash Attention (Docker) or SDPA
- 🔊 7 Voices - With OpenAI voice name aliases (alloy, nova, etc.)
- 🎵 Multiple Formats - MP3, WAV, OPUS, FLAC, AAC, PCM
- 📦 Self-contained - Models download to `./models/` on first run
- Python 3.13 (via uv) / Docker with NVIDIA GPU support
- NVIDIA GPU with CUDA 13.x
- ffmpeg
Best performance with Flash Attention + APEX pre-installed.
- CUDA 13.0.2 runtime
- Python 3.13 via uv
- Prebuilt wheels: flash-attn (downloaded during build), apex (bundled)
git clone https://github.com/marhensa/vibevoice-realtime-openai-api.git
cd vibevoice-realtime-openai-api
# Using docker-compose (recommended)
docker compose up -d --build
# Or manual build/run
docker build -t vibevoice-realtime-openai-api .
docker run --gpus all -p 8880:8880 \
-v ./models:/home/ubuntu/app/models \
-e CFG_SCALE=1.25 \
vibevoice-realtime-openai-api
⚠️ Please be patient and check your network monitor: on first run the server downloads models 📦 (~2GB) and voice presets 🎤 (~22MB) from the Hugging Face and Microsoft VibeVoice repositories into `./models/`. It's not stuck, it's just downloading.
Requires Python 3.13 and an NVIDIA GPU with CUDA 13.x drivers.
winget install --id Gyan.FFmpeg
git clone https://github.com/marhensa/vibevoice-realtime-openai-api.git
cd vibevoice-realtime-openai-api
# Install uv
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
# Create venv
uv venv .venv --python 3.13 --seed
.venv\Scripts\activate
# Install dependencies
uv pip install -r requirements.txt
# Run (optional: set CFG_SCALE for expressiveness, 0.0-3.0)
$env:CFG_SCALE="1.25"; python vibevoice_realtime_openai_api.py --port 8880

sudo apt install ffmpeg
git clone https://github.com/marhensa/vibevoice-realtime-openai-api.git
cd vibevoice-realtime-openai-api
# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh
# Create venv
uv venv .venv --python 3.13 --seed
source .venv/bin/activate
# Install dependencies
uv pip install -r requirements.txt
# Download and install prebuilt Flash Attention
curl -L -o ./prebuilt-wheels/flash_attn-2.8.3+cu130torch2.9-cp313-cp313-linux_x86_64.whl \
"https://github.com/mjun0812/flash-attention-prebuild-wheels/releases/download/v0.5.2/flash_attn-2.8.3%2Bcu130torch2.9-cp313-cp313-linux_x86_64.whl"
uv pip install ./prebuilt-wheels/flash_attn-*.whl
# Install prebuilt APEX
uv pip install ./prebuilt-wheels/apex-*.whl
# Run (optional: set CFG_SCALE for expressiveness, 0.0-3.0)
CFG_SCALE=1.25 OPTIMIZE_FOR_SPEED=1 python vibevoice_realtime_openai_api.py --port 8880

First run downloads models (~2GB) and voice presets (~22MB) to `./models/`.
| Setting | Value |
|---|---|
| TTS Engine | OpenAI |
| API Base URL | http://localhost:8880/v1 |
| API Key | sk-unused |
| TTS Model | tts-1-hd |
| TTS Voice | Carter, Emma, alloy, nova, etc. |
| Response splitting | Paragraph (recommended for low-end GPU) |
Note: If both this wrapper and Open WebUI run in containers, use `host.docker.internal:8880` instead of `localhost`.
| OpenAI Name | VibeVoice Name | Gender |
|---|---|---|
| alloy | Carter | Male |
| echo | Davis | Male |
| fable | Emma | Female |
| onyx | Frank | Male |
| nova | Grace | Female |
| shimmer | Mike | Male |
| - | Samuel | Male |
You can use either OpenAI names or VibeVoice names in the API.
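The alias table above can be expressed as a small lookup. The mapping mirrors the table; `resolve_voice` is a hypothetical helper for illustration, not part of the server's API:

```python
# OpenAI voice name -> VibeVoice voice name (from the table above).
OPENAI_ALIASES = {
    "alloy": "Carter",
    "echo": "Davis",
    "fable": "Emma",
    "onyx": "Frank",
    "nova": "Grace",
    "shimmer": "Mike",
}

def resolve_voice(name: str) -> str:
    """Accept either an OpenAI alias or a native VibeVoice name."""
    return OPENAI_ALIASES.get(name.lower(), name)

print(resolve_voice("nova"))    # Grace
print(resolve_voice("Samuel"))  # Samuel (no alias; passed through)
```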
If there are any updated voices, you can download them from here.
You can add custom / additional voices by placing `.pt` files in `./models/voices/`. The server scans this directory on startup.
Note: The Realtime 0.5B model does not provide public voice cloning tools. For custom voice creation, contact Microsoft. Microsoft plans to expand available speakers in future updates.
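The startup scan described above can be sketched as follows; `scan_voices` is a hypothetical stand-in, shown only to illustrate how `.pt` presets are discovered by filename:

```python
from pathlib import Path

def scan_voices(voices_dir: str) -> list[str]:
    """Return a sorted list of voice names, one per .pt preset file."""
    # The file stem (name without the .pt extension) becomes the voice name.
    return sorted(p.stem for p in Path(voices_dir).glob("*.pt"))

# A directory containing Emma.pt and MyCustomVoice.pt would yield
# ["Emma", "MyCustomVoice"]
```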
# Health check
curl http://localhost:8880/health
# List voices
curl http://localhost:8880/v1/audio/voices

# Generate speech (PowerShell)
Invoke-RestMethod -Uri "http://localhost:8880/v1/audio/speech" `
-Method Post -ContentType "application/json" `
-Body '{"input": "Welcome to VibeVoice! This is real-time text to speech, powered by Microsoft research.", "voice": "Emma"}' `
-OutFile "speech.mp3"

# Generate speech (bash/Linux)
curl -X POST http://localhost:8880/v1/audio/speech \
-H "Content-Type: application/json" \
-d '{"input": "Welcome to VibeVoice! This is real-time text to speech, powered by Microsoft research.", "voice": "Emma"}' \
--output speech.mp3

| Variable | Default | Description |
|---|---|---|
| `MODELS_DIR` | `./models` | Path to models directory |
| `VIBEVOICE_DEVICE` | `cuda` | Device: `cuda` (NVIDIA GPUs), `cpu`, or `mps` (Apple Silicon GPUs) |
| `CFG_SCALE` | `1.25` | CFG guidance scale (0.0-3.0; higher = more expressive) |
| `OPTIMIZE_FOR_SPEED` | `1` (Docker) | Set to `1` to suppress APEX warnings |
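A minimal sketch of how these variables might be read with their documented defaults; this is illustrative only, and `load_config` is not the server's actual code:

```python
import os

def load_config(env=os.environ) -> dict:
    """Read wrapper settings, falling back to the documented defaults."""
    cfg_scale = float(env.get("CFG_SCALE", "1.25"))
    # Clamp to the documented 0.0-3.0 range.
    cfg_scale = min(max(cfg_scale, 0.0), 3.0)
    return {
        "models_dir": env.get("MODELS_DIR", "./models"),
        "device": env.get("VIBEVOICE_DEVICE", "cuda"),
        "cfg_scale": cfg_scale,
        "optimize_for_speed": env.get("OPTIMIZE_FOR_SPEED", "0") == "1",
    }

print(load_config({}))  # all defaults
```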
- VibeVoice (code + model): MIT License (Microsoft)
- Qwen2.5-0.5B (base LLM): Apache 2.0 (Alibaba)
- This wrapper: MIT License

