Version 1 of this project implements five key objective metrics for evaluating the outputs of TTS systems.
Each metric compares a synthesized audio sample against a reference (real) recording, assessing aspects such as pitch accuracy, spectral similarity, and statistical features, with an accompanying visualization.
- DTW (Dynamic Time Warping)
- MCD (Mel Cepstral Distortion)
- MSD (Mel Spectral Distortion)
- F0 Frame Error
- Stat Moments (Kurtosis, Mean, STD)
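To make the comparison concrete, here is a minimal sketch of Mel Cepstral Distortion, one of the metrics listed above. It assumes MFCC matrices have already been extracted (e.g. with librosa) and aligned to equal length (e.g. via DTW); the function name, the exclusion of the 0th (energy) coefficient, and the scaling constant follow common convention and are assumptions, not necessarily what this project's `main.py` does.

```python
import numpy as np

def mel_cepstral_distortion(mfcc_ref, mfcc_syn):
    """MCD in dB between two MFCC sequences of shape (frames, coeffs).

    Both inputs must have the same number of frames (align with DTW
    first if they do not). The 0th coefficient, which mostly carries
    overall energy, is excluded by convention.
    """
    diff = mfcc_ref[:, 1:] - mfcc_syn[:, 1:]
    # Standard MCD scaling: 10 / ln(10) * sqrt(2)
    const = 10.0 / np.log(10.0) * np.sqrt(2.0)
    # Per-frame Euclidean distance, averaged over all frames
    return const * np.mean(np.sqrt(np.sum(diff ** 2, axis=1)))
```

A lower MCD means the synthesized spectrum is closer to the reference; identical inputs give 0 dB.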
- Create a new conda environment
- Take reference audio and generate synthesized versions of it with any open-source TTS (I used LJSpeech as the reference set and Kokoro TTS for synthesis)
- Create a folder, and inside it create two subfolders named `synthesis` and `reference`
- Put your audio files in their respective folders
- Zip the folder and place the zip in the project root
- Run `main.py` and provide the path to the zip file and the directory to extract it to
- Uncomment the metric functions one at a time, running each and saving its visualization
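The extraction and file-pairing steps above can be sketched as follows. This is a minimal illustration assuming the folder layout described (top-level folder with `reference` and `synthesis` subfolders whose files share names); the function `extract_and_pair` is a hypothetical helper, not a function from this project's `main.py`.

```python
import zipfile
from pathlib import Path

def extract_and_pair(zip_path, extract_dir):
    """Extract the dataset zip and pair reference/synthesis files by name.

    Assumes the zip contains 'reference' and 'synthesis' folders with
    matching .wav file names; returns (reference, synthesis) path pairs.
    """
    extract_dir = Path(extract_dir)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(extract_dir)
    refs = {p.name: p for p in extract_dir.rglob("reference/*.wav")}
    syns = {p.name: p for p in extract_dir.rglob("synthesis/*.wav")}
    # Only evaluate file names present in both folders
    return [(refs[n], syns[n]) for n in sorted(refs.keys() & syns.keys())]
```

Each returned pair can then be fed to the metric functions one at a time.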
- More metrics
- ASR-based phoneme matching for multiple languages