LEMAS‑Edit is a multilingual speech editing system that supports 10 languages:
- Chinese
- English
- Spanish
- Russian
- French
- German
- Italian
- Portuguese
- Indonesian
- Vietnamese
It bundles:
- the multilingual flow-matching backend (lemas_tts)
- the decoder-only edit backend (lemas_edit)
- pretrained checkpoints, vocabs and demo data (pretrained_models/)
- an end‑to‑end Gradio web UI (gradio_mix.py)
Compared to the original LEMAS‑TTS repo, this project focuses on speech editing instead of pure TTS, and integrates both backends into a single interface.
- Autoregressive codec speech editing backend
  - Supports 7 languages (zh / en / de / fr / pt / es / it)
  - Integrated with WhisperX + MMS alignment for “edit by text + span”
  - Optional denoising with UVR5 and DeepFilterNet
- Multilingual speech editing (flow-matching backend)
  - Based on the LEMAS‑TTS models (multilingual_grl, multilingual_prosody)
  - Supports the same languages as LEMAS‑TTS (zh / en / es / ru / fr / de / it / pt / id / vi)
- One Gradio UI for both backends
  - Edit Model selector: multilingual_grl, multilingual_prosody, autoregressive
  - Shared transcription, alignment, denoise and visualization components
  - All required models are expected under pretrained_models/
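Because the two backends cover different language sets, it can help to check which Edit Model choices apply to a given language before editing. The sketch below only restates the language codes and model names listed above as a lookup table; the names `SUPPORTED_LANGS` and `models_for_language` are illustrative and are not part of the LEMAS‑Edit code.

```python
# Language codes per backend, copied from the feature list above.
# The table and helper are hypothetical, not part of the LEMAS-Edit API.
FLOW_MATCHING_LANGS = {"zh", "en", "es", "ru", "fr", "de", "it", "pt", "id", "vi"}
AUTOREGRESSIVE_LANGS = {"zh", "en", "de", "fr", "pt", "es", "it"}

SUPPORTED_LANGS = {
    "multilingual_grl": FLOW_MATCHING_LANGS,
    "multilingual_prosody": FLOW_MATCHING_LANGS,
    "autoregressive": AUTOREGRESSIVE_LANGS,
}


def models_for_language(lang: str) -> list:
    """Return the Edit Model choices that list `lang` as supported, sorted by name."""
    return sorted(name for name, langs in SUPPORTED_LANGS.items() if lang in langs)


if __name__ == "__main__":
    # Russian is listed only for the flow-matching models.
    print(models_for_language("ru"))
    # English is listed for all three choices.
    print(models_for_language("en"))
```

Under these assumptions, Russian, Indonesian and Vietnamese are available only through the two flow-matching models.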
git clone https://github.com/LEMAS-Project/LEMAS-Edit.git
cd ./LEMAS-Edit
conda create -n lemas-edit python=3.10
conda activate lemas-edit
You can install system dependencies via apt or conda:
sudo apt-get update
sudo apt-get install -y ffmpeg
or
conda install -c conda-forge ffmpeg
pip install -r requirements.txt
Install PyTorch + Torchaudio according to your device (CUDA / ROCm / CPU / MPS) following the official PyTorch instructions.
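After the installation steps, a quick preflight check can confirm that the pieces are visible from the active environment. This is a minimal sketch using only the Python standard library; the function name `check_environment` is hypothetical, and it only tests that ffmpeg is on PATH and that the torch and torchaudio packages can be located, not that they match your device.

```python
import importlib.util
import shutil


def check_environment() -> dict:
    """Report whether ffmpeg and the PyTorch packages are visible from this interpreter."""
    return {
        # shutil.which returns None when the executable is not on PATH.
        "ffmpeg": shutil.which("ffmpeg") is not None,
        # find_spec returns None when a top-level package cannot be located.
        "torch": importlib.util.find_spec("torch") is not None,
        "torchaudio": importlib.util.find_spec("torchaudio") is not None,
    }


if __name__ == "__main__":
    for name, found in check_environment().items():
        print(f"{name}: {'ok' if found else 'MISSING'}")
```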
Download the pretrained models for both backends from https://huggingface.co/LEMAS-Project/LEMAS-Edit
and place the pretrained_models/ directory alongside the lemas_edit/ folder.
Once pretrained_models/ is in place, both lemas_tts and lemas_edit
will automatically find the checkpoints and vocabs.
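To confirm the layout before launching anything, the expected directories can be checked with a few lines of Python. This is a sketch that assumes lemas_tts/, lemas_edit/ and pretrained_models/ sit side by side in one directory, as described above; the helper `missing_dirs` is hypothetical, and it does not inspect the contents of pretrained_models/.

```python
from pathlib import Path

# Directory names taken from this README; the helper itself is hypothetical.
EXPECTED_DIRS = ("lemas_tts", "lemas_edit", "pretrained_models")


def missing_dirs(repo_root) -> list:
    """Return the expected directory names that are absent under `repo_root`."""
    root = Path(repo_root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    # Assumes the script is run from the LEMAS-Edit checkout.
    missing = missing_dirs(Path.cwd())
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("Layout looks complete.")
```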
All commands below assume:
cd ./LEMAS-Edit
export PYTHONPATH="$PWD:${PYTHONPATH}"
To launch the full editing UI locally:
python gradio_mix.py
You can customize host/port and sharing:
python gradio_mix.py --host 0.0.0.0 --port 7861 --share
The lemas_tts.scripts entrypoints are kept for convenience and behave as in
the original LEMAS‑TTS repo:
- TTS from text:
  - Python: lemas_tts.scripts.tts_multilingual
  - Shell: lemas_tts/scripts/tts_multilingual.sh
- Speech editing:
  - Python: lemas_tts.scripts.speech_edit_multilingual
  - Shell: lemas_tts/scripts/speech_edit_multilingual.sh
See those scripts for detailed CLI options (model choice, ckpt paths, speed / NFE / CFG / Sway, etc.).
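For scripted launches of the Gradio UI, the PYTHONPATH setup and the gradio_mix.py command shown earlier in this section can be assembled from Python. The sketch below uses only the `--host`, `--port` and `--share` flags that appear in this README; the helpers `build_ui_command`, `build_env` and `launch_ui` are hypothetical names, and no other options of gradio_mix.py are assumed.

```python
import os
import subprocess
import sys


def build_ui_command(host=None, port=None, share=False) -> list:
    """Build the gradio_mix.py command using only the flags shown in this README."""
    cmd = [sys.executable, "gradio_mix.py"]
    if host is not None:
        cmd += ["--host", host]
    if port is not None:
        cmd += ["--port", str(port)]
    if share:
        cmd.append("--share")
    return cmd


def build_env(repo_root: str) -> dict:
    """Copy the current environment and prepend repo_root to PYTHONPATH."""
    env = dict(os.environ)
    existing = env.get("PYTHONPATH", "")
    env["PYTHONPATH"] = repo_root + (os.pathsep + existing if existing else "")
    return env


def launch_ui(repo_root: str, **kwargs) -> None:
    """Run the UI from repo_root; blocks until the server process exits."""
    subprocess.run(build_ui_command(**kwargs), cwd=repo_root,
                   env=build_env(repo_root), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch is safe to execute anywhere.
    print(" ".join(build_ui_command(host="0.0.0.0", port=7861, share=True)))
```

Calling `launch_ui(".", host="0.0.0.0", port=7861, share=True)` from the LEMAS‑Edit checkout would then correspond to the shell commands above.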
A direct CLI for the autoregressive codec backend is provided as a starting point:
- Python entry: lemas_edit.scripts.inference_lemas_editing
- Shell helper: lemas_edit/scripts/inference_lemas_editing.sh
This script is a port of the original VoiceCraft/inference_lemas_editing.py
and is currently being adapted to the lemas_edit namespace. Its interface may
change; please refer to the script source for up‑to‑date arguments and usage.
We provide a simple setup for subjective listening tests (MUSHRA and ABX preference tests) under ./eval.
To install the extra dependencies for evaluation, run:
pip install git+https://github.com/descriptinc/audiotools
pip install joypy pandas
To start the ABX preference test, install the extra dependencies and launch the tools:
cd ./eval/abx
python abx.py # launch Gradio ABX preference test UI
python plot.py # aggregate results and plot preference distributions
To start the MUSHRA listening test, install the extra dependencies and launch the tools:
cd ./eval/mushra
python mushra.py # launch Gradio MUSHRA listening test UI
This project builds on, and reuses code from, several open‑source projects:
- VoiceCraft – Autoregressive speech editing model.
- F5‑TTS – Flow Matching based TTS.
- Vocos – Fourier-based neural vocoder.
- Seamless-Expressive – Prosody encoder.
- UVR5 – Separate an audio file into various stems, using multiple models.
- DeepFilterNet – Noise suppression using deep filtering.
- audiotools – Audio tools for subjective evaluation.
If you use LEMAS‑Edit in your work, please also consider citing and acknowledging these upstream projects.
This repository is released under the CC‑BY‑NC‑4.0 license.
See https://creativecommons.org/licenses/by-nc/4.0/ for more details.