Generate voiceovers in your own voice, locally, using Chatterbox TTS by Resemble AI. No subscriptions, no cloud, no usage limits. Everything runs on your machine.
Chatterbox is an open-source neural TTS model. You give it a reference clip of your voice and a script, and it synthesizes speech in your voice. It doesn't generate text like an LLM; it's a generative audio model that captures your vocal identity (accent, cadence, intonation) and applies it to new text.
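Under the hood, the scripts in this repo presumably drive Resemble AI's `chatterbox-tts` Python package. A minimal sketch of a single cloning call, following that package's documented API (names per its README, not copied from this repo's code):

```python
# Sketch of one voice-cloning call via the chatterbox-tts package.
# Names follow the package README; not copied from voiceover.py.
import torchaudio
from chatterbox.tts import ChatterboxTTS

model = ChatterboxTTS.from_pretrained(device="cpu")  # or "cuda" with an NVIDIA GPU
wav = model.generate(
    "Welcome to this video",
    audio_prompt_path="sample-voice.wav",  # reference clip of your voice
    exaggeration=0.5,                      # expressiveness (see parameter table below)
    cfg_weight=0.5,                        # adherence to the reference voice
)
torchaudio.save("voiceover.wav", wav, model.sr)
```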
uv is a fast Python package manager. It handles dependencies automatically when you run the scripts — no separate pip install step needed.
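This likely works via PEP 723 inline script metadata: a short header at the top of each script declares its dependencies, and `uv run` resolves and installs them in an isolated environment on first run. An illustrative header (not copied from the repo) looks like:

```python
# /// script
# requires-python = "==3.11.*"
# dependencies = ["chatterbox-tts", "torchaudio"]
# ///
```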
Windows (PowerShell):
```powershell
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
```

macOS:

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

Or via Homebrew:

```bash
brew install uv
```

ffmpeg is required only if your reference audio is in M4A or MP3 format. Skip it if you already have a WAV file or plan to use WAV only.
Windows:

```powershell
winget install ffmpeg
```

Or via Chocolatey:

```powershell
choco install ffmpeg
```

macOS:

```bash
brew install ffmpeg
```

On macOS, make the scripts executable:

```bash
chmod +x voiceover.py
chmod +x voiceover_server.py
```

Windows does not use file permissions this way, so skip this step there.
The first time you run either script, Chatterbox's model weights (~3GB total) will be downloaded automatically from HuggingFace. This is a one-time download — subsequent runs load from disk instantly.
- Windows: `%USERPROFILE%\.cache\huggingface\hub\`
- macOS: `~/.cache/huggingface/hub/`
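If you'd rather fetch the weights ahead of time (for example, before going offline), you can pre-populate the same cache with `huggingface_hub`. The repo id below is an assumption; verify it against the Chatterbox model card:

```python
# Pre-download the model weights into the HuggingFace cache.
# Repo id assumed, not confirmed by this repo; check the model card.
from huggingface_hub import snapshot_download

snapshot_download("ResembleAI/chatterbox")
```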
- WAV format preferred. M4A and MP3 also work (ffmpeg handles conversion automatically).
- Aim for at least 30–60 seconds of clean, natural speech.
- Record in a quiet space — no background noise, no music.
- Reading Harvard sentences aloud makes an excellent reference clip.
- Record your voice and save it as `sample-voice.wav` in this same directory.
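A quick way to confirm your clip meets the 30–60 second guideline, using `torchaudio` (which these scripts almost certainly pull in anyway):

```python
# Sanity-check the reference clip: duration and sample rate.
import torchaudio

info = torchaudio.info("sample-voice.wav")
seconds = info.num_frames / info.sample_rate
print(f"{seconds:.1f} s at {info.sample_rate} Hz")  # aim for 30-60 s of clean speech
```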
The one-shot script (`voiceover.py`) is good for one-off generations. It loads the model fresh each run.
Windows:

```powershell
# Basic usage
uv run voiceover.py --ref sample-voice.wav --text "Welcome to this video"

# From a text file
uv run voiceover.py --ref sample-voice.wav --file script.txt

# Custom output filename
uv run voiceover.py --ref sample-voice.wav --text "Your script" --out intro.wav

# Tweak voice parameters
uv run voiceover.py --ref sample-voice.wav --text "Your script" --exaggeration 0.4 --cfg 0.7
```

macOS:

```bash
# Basic usage
./voiceover.py --ref sample-voice.wav --text "Welcome to this video"

# From a text file
./voiceover.py --ref sample-voice.wav --file script.txt

# Custom output filename
./voiceover.py --ref sample-voice.wav --text "Your script" --out intro.wav

# Tweak voice parameters
./voiceover.py --ref sample-voice.wav --text "Your script" --exaggeration 0.4 --cfg 0.7
```

The server script (`voiceover_server.py`) loads the model once and keeps it warm. Use this when generating multiple voiceovers in one sitting; each generation is much faster since the model stays in memory.
Terminal 1 — start the server:
Windows:

```powershell
uv run voiceover_server.py --ref sample-voice.wav
```

macOS:

```bash
./voiceover_server.py --ref sample-voice.wav
```

Terminal 2 — generate as many times as you want:
```bash
# Basic
curl -X POST http://localhost:8765 -d "Welcome to this video"

# Custom output filename
curl -X POST "http://localhost:8765?out=intro.wav" -d "Your full script here..."

# Tweak parameters per request
curl -X POST "http://localhost:8765?exaggeration=0.4&cfg=0.7" -d "Your script"
```

curl is available natively on both macOS and Windows 10/11. (In Windows PowerShell, `curl` is an alias for `Invoke-WebRequest`, so call `curl.exe` explicitly.)
Output files are saved in your current directory, auto-named `voiceover_001.wav`, `voiceover_002.wav`, and so on, unless you specify `?out=filename.wav`.
Long scripts are automatically split into sentence-sized chunks, generated separately, and stitched into one seamless output file.
Stop the server: Ctrl+C in Terminal 1.
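For the curious, the warm-server pattern plus the chunk-and-stitch behavior described above boils down to something like the sketch below. This is hypothetical; the real voiceover_server.py will differ in its handler details, chunking logic, and output naming:

```python
# Minimal sketch of the warm-model server: load the model once, reuse it
# per request, split long scripts into sentences, stitch the audio together.
import re
import torch
import torchaudio
from http.server import BaseHTTPRequestHandler, HTTPServer
from chatterbox.tts import ChatterboxTTS

MODEL = ChatterboxTTS.from_pretrained(device="cpu")  # loaded once, stays warm
REF = "sample-voice.wav"

def synthesize(text: str) -> torch.Tensor:
    chunks = re.split(r"(?<=[.!?])\s+", text.strip())   # sentence-sized pieces
    waves = [MODEL.generate(c, audio_prompt_path=REF) for c in chunks if c]
    return torch.cat(waves, dim=-1)                     # stitch into one take

class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        script = self.rfile.read(length).decode("utf-8")
        torchaudio.save("voiceover_001.wav", synthesize(script), MODEL.sr)
        self.send_response(200)
        self.end_headers()

HTTPServer(("127.0.0.1", 8765), Handler).serve_forever()
```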
| Parameter | Default | Description |
|---|---|---|
| `--exaggeration` | 0.5 | Expressiveness. Lower = calmer, higher = more animated. Try 0.3–0.7. |
| `--cfg` | 0.5 | How closely the output follows your reference voice. Higher = more like you. Try 0.6–0.7 if it sounds too generic. |
Both parameters can also be passed per request to the server via query string: `?exaggeration=0.4&cfg=0.7`
These commands work on both macOS and Windows (ffmpeg must be installed):
```bash
# WAV → MP3
ffmpeg -i voiceover.wav voiceover.mp3

# WAV → MP4 (black-screen video, useful for LinkedIn etc.)
ffmpeg -i voiceover.wav -f lavfi -i color=c=black:s=1280x720:r=24 -shortest -c:v libx264 -c:a aac voiceover.mp4
```

- Device (Windows): If you have an NVIDIA GPU, PyTorch may use CUDA automatically for faster generation. CPU fallback works fine otherwise.
- Device (macOS): Chatterbox runs on CPU on Apple Silicon. MPS (Apple's GPU backend) is not used due to a PyTorch conv1d limitation at this model's output size. CPU on an M-series Mac is fast enough for this use case. (A sketch of the device-selection logic follows these notes.)
- Watermark: Generated audio includes an imperceptible neural watermark from Resemble AI (Perth watermarker). It does not affect audio quality and is not detected or flagged by YouTube or other platforms.
- Model cache: Safe to delete to free disk space; it will re-download on the next run. Windows: `%USERPROFILE%\.cache\huggingface\hub\`. macOS: `~/.cache/huggingface/hub/`.
- Python version: Pinned to 3.11 in the script header. `uv` handles this automatically.
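On the device notes above, the selection logic likely amounts to the helper below (hypothetical, not taken from the scripts): prefer CUDA when an NVIDIA GPU is present, otherwise fall back to CPU, deliberately skipping MPS.

```python
# Hypothetical device selection consistent with the notes above.
import torch

def pick_device() -> str:
    if torch.cuda.is_available():
        return "cuda"  # NVIDIA GPU path on Windows/Linux
    return "cpu"       # MPS skipped due to the conv1d limitation
```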