Chatterbox TTS API supports multilingual text-to-speech generation across 22 languages using the enhanced chatterbox-tts v0.1.4 multilingual model. This feature enables high-quality voice cloning and speech synthesis in multiple languages while maintaining full OpenAI API compatibility.
- 🌍 22 Languages Supported - Generate speech in Arabic, English, French, German, Italian, Japanese, Spanish, and more
- 🎭 Language-Aware Voice Cloning - Upload voices with specific language assignments
- 🔄 Automatic Language Detection - Speech generation automatically uses the voice's assigned language
- 🧠 Smart Fallbacks - Graceful handling of missing languages with English fallback
- 📚 Voice Library Integration - Language metadata stored with each voice
- ⚙️ Configurable - Enable/disable multilingual mode via environment variables
- 🔗 OpenAI Compatible - No breaking changes to existing API endpoints
- 📱 Frontend Support - Language selection UI with flags and native names
The multilingual model supports the following 22 languages:

| Code | Language | Native Name | Flag |
|---|---|---|---|
| ar | Arabic | العربية | 🇸🇦 |
| da | Danish | Dansk | 🇩🇰 |
| de | German | Deutsch | 🇩🇪 |
| el | Greek | Ελληνικά | 🇬🇷 |
| en | English | English | 🇺🇸 |
| es | Spanish | Español | 🇪🇸 |
| fi | Finnish | Suomi | 🇫🇮 |
| fr | French | Français | 🇫🇷 |
| he | Hebrew | עברית | 🇮🇱 |
| hi | Hindi | हिन्दी | 🇮🇳 |
| it | Italian | Italiano | 🇮🇹 |
| ja | Japanese | 日本語 | 🇯🇵 |
| ko | Korean | 한국어 | 🇰🇷 |
| ms | Malay | Bahasa Melayu | 🇲🇾 |
| nl | Dutch | Nederlands | 🇳🇱 |
| no | Norwegian | Norsk | 🇳🇴 |
| pl | Polish | Polski | 🇵🇱 |
| pt | Portuguese | Português | 🇵🇹 |
| ru | Russian | Русский | 🇷🇺 |
| sv | Swedish | Svenska | 🇸🇪 |
| sw | Swahili | Kiswahili | 🇹🇿 |
| tr | Turkish | Türkçe | 🇹🇷 |
Note: Chinese (`zh`) support is available in the model but currently disabled. Contact support if you need Chinese language support.
Multilingual support is controlled by the `USE_MULTILINGUAL_MODEL` environment variable:

```bash
# Enable multilingual support (default)
USE_MULTILINGUAL_MODEL=true

# Disable multilingual support (English only)
USE_MULTILINGUAL_MODEL=false
```

Default Behavior:
- Multilingual mode is enabled by default (`true`)
- When disabled, only English is supported
- Existing installations automatically get multilingual support
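The sketch below shows how a deployment script or test harness might read the flag while keeping the documented default (`true` when unset). The helper name `multilingual_enabled` and the accepted spellings (`true`, `1`, `yes`, `on`) are illustrative assumptions. The server's own parsing rules may differ.

```python
import os


def multilingual_enabled(env=None):
    """Return True if multilingual mode should be on (hypothetical helper).

    Assumption: an unset variable means enabled, matching the documented
    default; the set of accepted "true" spellings is illustrative only.
    """
    env = os.environ if env is None else env
    raw = env.get("USE_MULTILINGUAL_MODEL")
    if raw is None:
        return True  # documented default: multilingual mode is enabled
    return raw.strip().lower() in {"true", "1", "yes", "on"}


print(multilingual_enabled({}))                                   # True
print(multilingual_enabled({"USE_MULTILINGUAL_MODEL": "false"}))  # False
```

Passing an explicit mapping instead of reading `os.environ` directly makes the helper easy to test in isolation.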
Add to your `.env` file:

```bash
# Multilingual TTS Configuration
# Enable 22-language support (default: true)
USE_MULTILINGUAL_MODEL=true
```

Retrieve the list of languages supported by your current configuration:

```bash
curl http://localhost:4123/languages
```

Response (Multilingual Mode):
```json
{
  "languages": [
    { "code": "ar", "name": "Arabic" },
    { "code": "da", "name": "Danish" },
    { "code": "de", "name": "German" }
    // ... all 22 languages
  ],
  "count": 22,
  "model_type": "multilingual"
}
```

Response (Standard Mode):

```json
{
  "languages": [{ "code": "en", "name": "English" }],
  "count": 1,
  "model_type": "standard"
}
```

Upload a voice sample and assign a specific language:
```bash
curl -X POST http://localhost:4123/voices \
  -F "voice_name=french_speaker" \
  -F "language=fr" \
  -F "voice_file=@french_voice.wav"
```

Parameters:
- `voice_name`: Unique identifier for the voice
- `language`: ISO 639-1 language code (e.g., `fr`, `de`, `ja`)
- `voice_file`: Audio file in a supported format

Language Validation:
- Language codes are validated against the supported languages
- Invalid codes return a clear error message
- Defaults to `"en"` if not specified
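To catch bad codes before uploading, a client can apply the same validation rules locally. The sketch below uses the 22 codes from the language table, leaves out the disabled `zh`, and treats an empty or missing value as `"en"`. The helper name `validate_language` is hypothetical. The error text copies the format of the troubleshooting example but is not guaranteed to match the server's message exactly.

```python
# Codes from the language table above; "zh" is omitted because it is disabled.
SUPPORTED_LANGUAGES = {
    "ar": "Arabic", "da": "Danish", "de": "German", "el": "Greek",
    "en": "English", "es": "Spanish", "fi": "Finnish", "fr": "French",
    "he": "Hebrew", "hi": "Hindi", "it": "Italian", "ja": "Japanese",
    "ko": "Korean", "ms": "Malay", "nl": "Dutch", "no": "Norwegian",
    "pl": "Polish", "pt": "Portuguese", "ru": "Russian", "sv": "Swedish",
    "sw": "Swahili", "tr": "Turkish",
}


def validate_language(code=None):
    """Return a usable language code or raise ValueError (hypothetical helper)."""
    if not code:
        return "en"  # documented default when no language is specified
    if code not in SUPPORTED_LANGUAGES:
        supported = ", ".join(sorted(SUPPORTED_LANGUAGES))
        raise ValueError(f"Unsupported language code: {code}. Supported: {supported}")
    return code


print(validate_language("fr"))  # fr
print(validate_language())      # en
```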
Once a voice is uploaded with a language, speech generation automatically uses that language:

```bash
# Generate French speech using the French voice
curl -X POST http://localhost:4123/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Bonjour, comment allez-vous?",
    "voice": "french_speaker"
  }' \
  --output french_speech.wav
```

Key Points:
- No language parameter needed in speech requests (OpenAI compatibility)
- Language is automatically determined from voice metadata
- Text can be in any language - the model handles cross-lingual synthesis
- All standard TTS parameters work with multilingual voices
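The following standard-library sketch builds (but does not send) an OpenAI-style speech request. It shows that the JSON body carries only `input`, `voice` and optional tuning fields, and no `language` field. The helper name `build_speech_request` and the default base URL `http://localhost:4123` are assumptions taken from the examples in this document.

```python
import json
import urllib.request


def build_speech_request(text, voice, base_url="http://localhost:4123", **params):
    """Build a POST request for /v1/audio/speech without sending it (hypothetical helper).

    No "language" field is added: the server derives it from the voice's
    metadata. Extra keyword arguments (e.g. exaggeration=0.8) become
    additional JSON fields.
    """
    payload = {"input": text, "voice": voice, **params}
    return urllib.request.Request(
        f"{base_url}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_speech_request("Bonjour, comment allez-vous?", "french_speaker")
# To send it against a running server:
# with urllib.request.urlopen(req) as resp, open("french_speech.wav", "wb") as f:
#     f.write(resp.read())
```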
List voices to see their language information:

```bash
curl http://localhost:4123/voices
```

Response:

```json
{
  "voices": [
    {
      "name": "french_speaker",
      "file_path": "/voices/french_speaker.wav",
      "aliases": [],
      "metadata": {
        "language": "fr",
        "created_at": "2024-01-15T10:30:00Z",
        "file_size": 2048576,
        "duration": 12.5
      }
    }
  ],
  "count": 1
}
```

```python
import requests

# Upload a German voice
with open("german_speaker.wav", "rb") as voice_file:
    response = requests.post(
        "http://localhost:4123/voices",
        data={"voice_name": "german_narrator", "language": "de"},
        files={"voice_file": ("german_speaker.wav", voice_file, "audio/wav")},
    )
print(f"Upload status: {response.status_code}")
response.raise_for_status()

# Generate German speech (the language comes from the voice's metadata)
response = requests.post(
    "http://localhost:4123/v1/audio/speech",
    json={
        "input": "Guten Tag! Wie geht es Ihnen heute?",
        "voice": "german_narrator",
        "exaggeration": 0.8,
    },
)
response.raise_for_status()  # don't write an error body into the .wav file

with open("german_output.wav", "wb") as f:
    f.write(response.content)
```

```python
import requests

# Upload several voices, each with its own language
voices = [
    {"file": "spanish_voice.wav", "name": "spanish_speaker", "lang": "es"},
    {"file": "italian_voice.wav", "name": "italian_speaker", "lang": "it"},
    {"file": "japanese_voice.wav", "name": "japanese_speaker", "lang": "ja"},
]

for voice in voices:
    with open(voice["file"], "rb") as f:
        response = requests.post(
            "http://localhost:4123/voices",
            data={"voice_name": voice["name"], "language": voice["lang"]},
            files={"voice_file": f},
        )
    print(f"Uploaded {voice['name']}: {response.status_code}")
```

```python
import requests

# Generate speech in several languages (assumes these voices already exist)
texts = [
    {"text": "Hello, how are you today?", "voice": "english_speaker"},
    {"text": "Hola, ¿cómo estás hoy?", "voice": "spanish_speaker"},
    {"text": "Ciao, come stai oggi?", "voice": "italian_speaker"},
    {"text": "こんにちは、今日はいかがですか?", "voice": "japanese_speaker"},
]

for i, item in enumerate(texts, start=1):
    response = requests.post(
        "http://localhost:4123/v1/audio/speech",
        json={"input": item["text"], "voice": item["voice"]},
    )
    response.raise_for_status()
    with open(f"multilingual_output_{i}.wav", "wb") as f:
        f.write(response.content)
```

```bash
# Stream Japanese speech
curl -X POST http://localhost:4123/v1/audio/speech/stream \
  -H "Content-Type: application/json" \
  -d '{
    "input": "こんにちは。私の名前は田中です。よろしくお願いします。",
    "voice": "japanese_speaker",
    "chunk_strategy": "sentence"
  }' \
  --output japanese_stream.wav
```

```bash
# Upload a voice with a descriptive name and an explicit language
curl -X POST http://localhost:4123/voices \
  -F "voice_name=professional_german" \
  -F "language=de" \
  -F "voice_file=@professional_voice.wav"
```

The web UI includes multilingual support:
- Dropdown with native language names and flag emojis
- Automatic validation against supported languages
- Default selection to English
- Language badges next to each voice
- Flag emojis for visual identification
- Sorting and filtering by language
- Language selection integrated into voice upload modal
- Real-time validation and feedback
- Intuitive language picker with search
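A flag emoji is two Unicode regional indicator symbols, one for each letter of an ISO 3166-1 country code, starting at U+1F1E6 for "A". The sketch below shows one way to produce the flags listed in the language table. It is not necessarily how the web UI does it. The small language-to-country mapping copies the Flag column, because a language code is not a country code (for example, `en` is shown with 🇺🇸).

```python
def flag_emoji(country_code):
    """Convert a two-letter ISO 3166-1 country code into a flag emoji."""
    code = country_code.upper()
    if len(code) != 2 or not code.isascii() or not code.isalpha():
        raise ValueError(f"expected a two-letter country code, got {country_code!r}")
    # Each letter A-Z maps to REGIONAL INDICATOR SYMBOL LETTER A-Z (U+1F1E6..U+1F1FF)
    return "".join(chr(0x1F1E6 + ord(ch) - ord("A")) for ch in code)


# Partial language -> country mapping taken from the Flag column of the table
LANGUAGE_FLAG_COUNTRY = {"ar": "SA", "en": "US", "fr": "FR", "ja": "JP", "sw": "TZ"}

for lang, country in LANGUAGE_FLAG_COUNTRY.items():
    print(lang, flag_emoji(country))
```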
The multilingual implementation consists of several key components:
- Model Loading: Automatic detection and loading of multilingual vs standard TTS model
- Language Detection: Voice metadata stores language information
- Speech Generation: Automatic language parameter injection based on voice metadata
- API Compatibility: Maintains OpenAI API format without breaking changes
```python
# Automatic model selection based on configuration
if Config.USE_MULTILINGUAL_MODEL:
    model = ChatterboxMultilingualTTS(...)
    supported_languages = SUPPORTED_LANGUAGES
else:
    model = ChatterboxTTS(...)
    supported_languages = {"en": "English"}
```

```python
def resolve_voice_path_and_language(voice_name_or_path):
    """Resolve a voice path and extract its language metadata."""
    if voice_name_or_path in voice_library:
        voice_info = voice_library.get_voice_info(voice_name_or_path)
        return voice_info.path, voice_info.language
    # Not a library voice: treat it as a file path and default to English
    return voice_name_or_path, "en"
```

- Existing voices: Automatically assigned English (`"en"`)
- Existing API calls: Continue to work without modification
- Configuration: Multilingual mode can be disabled for compatibility
- Graceful degradation: Falls back to English for unsupported languages
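The self-contained sketch below puts the fallback rules listed above in one place: unknown voice names are treated as file paths, voices without a stored language count as English, and an unsupported language or disabled multilingual mode falls back to `"en"`. `VoiceEntry`, `resolve_generation_language` and the in-memory `library` dict are hypothetical stand-ins for the real voice library. This is an illustration of the described behavior, not the server's actual code.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple


@dataclass
class VoiceEntry:
    """Hypothetical in-memory record for a stored voice."""
    path: str
    language: Optional[str] = None  # None for voices saved before multilingual support


def resolve_generation_language(
    voice: str,
    library: Dict[str, VoiceEntry],
    supported: Set[str],
    multilingual: bool = True,
) -> Tuple[str, str]:
    """Return (voice_path, language) for a speech request, falling back to English."""
    entry = library.get(voice)
    if entry is None:
        # Not a library voice: treat it as a file path, default to English
        return voice, "en"
    language = entry.language or "en"  # voices without metadata count as English
    if not multilingual or language not in supported:
        language = "en"  # graceful degradation for disabled mode or unknown codes
    return entry.path, language


library = {
    "french_speaker": VoiceEntry("/voices/french_speaker.wav", "fr"),
    "legacy_voice": VoiceEntry("/voices/legacy_voice.wav"),
}
print(resolve_generation_language("french_speaker", library, {"en", "fr"}))
# ('/voices/french_speaker.wav', 'fr')
```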
- The multilingual model requires slightly more memory than the standard model
- Switching languages doesn't require reloading the model
- The voice library scales efficiently with multiple languages
- Multilingual generation performance is comparable to the standard model
- Language-specific optimizations are built into the model
- Streaming maintains low latency across all languages
- Voice files are stored with language metadata in JSON format
- No additional storage overhead for multilingual support
- Efficient indexing by language for large voice libraries
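On the client side, the shape of the `/voices` response is enough to build a simple language index, for example to fill a per-language voice picker. The sketch below is an illustration based on that response shape, not the server's internal index. It files voices without a `language` under `"en"`, following the default for pre-existing voices. The function name is hypothetical.

```python
from collections import defaultdict


def index_voices_by_language(voices_response):
    """Map language code -> sorted voice names from a /voices-style payload."""
    index = defaultdict(list)
    for voice in voices_response.get("voices", []):
        # Voices without language metadata default to English
        language = voice.get("metadata", {}).get("language") or "en"
        index[language].append(voice["name"])
    return {lang: sorted(names) for lang, names in index.items()}


sample = {
    "voices": [
        {"name": "french_speaker", "metadata": {"language": "fr"}},
        {"name": "french_narrator", "metadata": {"language": "fr"}},
        {"name": "old_voice", "metadata": {}},
    ],
    "count": 3,
}
print(index_voices_by_language(sample))
# {'fr': ['french_narrator', 'french_speaker'], 'en': ['old_voice']}
```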
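Languages endpoint returns only English

Multilingual mode is probably disabled. Check the configuration, set `USE_MULTILINGUAL_MODEL=true` if needed, and restart the API:

```bash
# Check multilingual configuration
curl -s http://localhost:4123/config | grep USE_MULTILINGUAL_MODEL
```

Voice upload fails with language validation error

The `language` value is not one of the supported codes. Use a code listed by `/languages`. Example error:

```json
{
  "error": {
    "message": "Unsupported language code: xx. Supported: ar, da, de, ...",
    "type": "language_validation_error"
  }
}
```

Speech generation ignores voice language
- Ensure the voice was uploaded with the correct `language` parameter
- Check the voice metadata: `curl http://localhost:4123/voices`
- Verify that multilingual mode is enabled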
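The checklist above can be automated against the JSON returned by `/voices` and `/languages`. The sketch below takes those two payloads as plain dicts, using the response shapes shown earlier in this document, and returns a list of likely problems for one voice. The function name and the problem messages are hypothetical.

```python
def check_voice_language(voice_name, voices_response, languages_response):
    """Return human-readable problems for one voice (empty list means OK)."""
    problems = []
    if languages_response.get("model_type") != "multilingual":
        problems.append("multilingual mode appears to be disabled")
    voice = next(
        (v for v in voices_response.get("voices", []) if v.get("name") == voice_name),
        None,
    )
    if voice is None:
        problems.append(f"voice {voice_name!r} not found")
        return problems
    language = voice.get("metadata", {}).get("language")
    supported = {item["code"] for item in languages_response.get("languages", [])}
    if language is None:
        problems.append("voice has no language metadata (treated as English)")
    elif language not in supported:
        problems.append(f"voice language {language!r} is not listed by /languages")
    return problems
```

Against a standard-mode server, a French voice reports two problems: the disabled mode, and `fr` missing from the language list.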
When debugging multilingual issues, inspect the current setup:

```bash
# Check current configuration
curl http://localhost:4123/config

# Verify supported languages
curl http://localhost:4123/languages

# Check voice metadata
curl http://localhost:4123/voices
```

1. Update dependencies (already done in v0.1.4):
   ```bash
   uv sync  # or: pip install -r requirements.txt
   ```
2. Enable multilingual mode:
   ```bash
   echo "USE_MULTILINGUAL_MODEL=true" >> .env
   ```
3. Restart the API:
   ```bash
   uv run main.py  # or: python main.py
   ```
4. Upload new voices with languages:
   ```bash
   curl -X POST http://localhost:4123/voices \
     -F "voice_name=multilingual_voice" \
     -F "language=fr" \
     -F "voice_file=@voice.wav"
   ```
- Existing voices continue to work unchanged
- All existing voices default to English (`"en"`)
- Optionally re-upload voices with the correct language assignments
- No data loss or corruption
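Before re-uploading, it can help to list which existing voices actually need a different language. The sketch below compares the `/voices` payload with a mapping you supply from voice name to intended language code. It treats missing language metadata as `"en"`, following the migration note above. `plan_reuploads` and the example voice names are hypothetical.

```python
def plan_reuploads(voices_response, intended_languages):
    """Return (name, current, intended) for voices whose stored language differs."""
    plan = []
    for voice in voices_response.get("voices", []):
        name = voice["name"]
        # Voices created before multilingual support default to English
        current = voice.get("metadata", {}).get("language") or "en"
        intended = intended_languages.get(name)
        if intended is not None and intended != current:
            plan.append((name, current, intended))
    return plan


existing = {
    "voices": [
        {"name": "narrator_marie", "metadata": {}},
        {"name": "narrator_john", "metadata": {"language": "en"}},
    ],
    "count": 2,
}
print(plan_reuploads(existing, {"narrator_marie": "fr", "narrator_john": "en"}))
# [('narrator_marie', 'en', 'fr')]
```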
1. Language-Specific Recordings:
   - Use native speakers for each language
   - Record in the target language for best results
   - Avoid mixing languages within a single voice sample
2. Audio Quality:
   - 10-30 seconds of clear speech
   - Consistent speaking pace and tone
   - Minimal background noise
   - High-quality audio format (WAV preferred)
3. Voice Naming:
   - Include the language in voice names: `french_narrator`, `spanish_casual`
   - Use descriptive names for different styles: `german_formal`, `italian_cheerful`
   - Consider voice characteristics: `japanese_female_young`, `arabic_male_deep`
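The 10-30 second guideline is easy to check before uploading. The sketch below uses Python's standard `wave` module, so it handles uncompressed WAV files only. It computes the duration as frames divided by frame rate. The helper name and the default bounds are assumptions based on the guideline above. `wave.open` accepts either a path or a binary file-like object.

```python
import wave


def check_sample_duration(path_or_file, min_seconds=10.0, max_seconds=30.0):
    """Return (duration_seconds, within_guideline) for a WAV voice sample."""
    with wave.open(path_or_file, "rb") as wav:
        duration = wav.getnframes() / float(wav.getframerate())
    return duration, min_seconds <= duration <= max_seconds


# Example (requires a local file):
# duration, ok = check_sample_duration("french_voice.wav")
# print(f"{duration:.1f}s", "OK" if ok else "outside the 10-30 s guideline")
```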
1. Development:
   - Test with multiple languages during development
   - Validate the language assignment of uploaded voices
   - Use streaming for a better user experience with longer texts
2. Production:
   - Monitor the memory usage of the multilingual model
   - Implement proper error handling for unsupported languages
   - Consider caching frequently used voice/language combinations
3. Content Management:
   - Organize voices by language and use case
   - Document voice characteristics and appropriate use cases
   - Maintain consistent quality standards across languages
| Endpoint | Method | Description |
|---|---|---|
| `/languages` | GET | Get supported languages |
| `/voices` | POST | Upload voice with language |
| `/voices` | GET | List voices with language metadata |
| `/v1/audio/speech` | POST | Generate speech (language auto-detected) |
| `/v1/audio/speech/stream` | POST | Stream speech generation |
Supported language item (`SupportedLanguageItem`):

```json
{
  "code": "fr",
  "name": "French"
}
```

Languages response:

```json
{
  "languages": [SupportedLanguageItem],
  "count": 22,
  "model_type": "multilingual"
}
```

Voice entry with language metadata:

```json
{
  "name": "french_speaker",
  "file_path": "/voices/french_speaker.wav",
  "aliases": [],
  "metadata": {
    "language": "fr",
    "created_at": "2024-01-15T10:30:00Z",
    "file_size": 2048576,
    "duration": 12.5
  }
}
```

For more examples and integration patterns, see:
- 📖 Documentation: Main README | API Documentation
- 💬 Discord: Join the community
Built with chatterbox-tts v0.1.4 • Supports 22 languages • OpenAI API Compatible