Zonos is an open-weight text-to-speech model trained on more than 200k hours of varied multilingual speech. It was released 6 months ago by the Zyphra team under the Apache 2.0 license. It might be a good idea to add it to the list of audio generation models.
https://github.com/Zyphra/Zonos