Question regarding post training #20
Open
First of all, thank you for publishing the code and pretrained models.
I have some questions about the training phase.
Here's my situation:
- I'm training the model for Korean on around 2k hours of data, with a corresponding streaming ASR (fast-u2++)
- At epoch 18:
Training is going fine. However, the output speech in offline mode is heavily affected by vocoder artifacts: a lot of screeching and unnatural ending sounds (possibly due to the over-smoothed spectrogram you mentioned in the paper).
The results also vary with the reference speech: they are OK for some samples but rather poor for others.
For streaming VC, the quality is rather poor.
Is there anything I should take into consideration?
My thoughts on this:
- The streaming ASR is bad (but on my test set it has a rather good CER)
- Not enough training data (2k hours is small compared to your datasets' 10k, but I think it's decent for a test drive, and training is quite stable judging from the log)
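For clarity on the first point, here is a minimal sketch of the CER definition I have in mind: character-level edit distance divided by the number of reference characters. The function names and the `strip_spaces` option are hypothetical and only for illustration; this is not taken from fast-u2++ or from your repository, and other toolkits may normalize text differently.

```python
from typing import Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting i reference characters to reach an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cer(refs: Sequence[str], hyps: Sequence[str], strip_spaces: bool = True) -> float:
    """Corpus-level CER: total edits divided by total reference characters.

    strip_spaces is an assumption: Korean spacing is often inconsistent,
    so spaces are removed before scoring by default.
    """
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    total_edits = 0
    total_chars = 0
    for ref, hyp in zip(refs, hyps):
        if strip_spaces:
            ref = ref.replace(" ", "")
            hyp = hyp.replace(" ", "")
        total_edits += edit_distance(ref, hyp)
        total_chars += len(ref)
    if total_chars == 0:
        raise ValueError("reference text is empty")
    return total_edits / total_chars
```

A low value here only says the transcripts are accurate on the test set. It does not by itself tell me whether the intermediate features from the streaming ASR are good enough for VC.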
Also, when do you think I should start the post-training phase?
Below is the training log.
Thanks a lot in advance.