Question regarding post training

First of all, thank you for publishing the code and pretrained models.
I have some questions about the training phase.

Here's my situation:
- I'm training the model for Korean on around 2k hours of data, with a corresponding streaming ASR (fast-u2++)
- At epoch 18:
Training is going fine. However, the output speech in offline mode is heavily affected by vocoder artifacts: a lot of screeching and unnatural ending sounds (possibly due to the over-smoothed spectrogram you mentioned in the paper).
Also, the results vary with the reference speech: they are OK with some samples but rather poor with others.
For streaming VC, the quality is rather poor.

Is there anything I should take into consideration?
My thoughts on this:
- The streaming ASR is bad (but it has a rather good CER on my test set)
- Not enough training data (2k hours is small compared to the 10k in your datasets, but I think it's decent for a test drive, and training is quite stable judging from the log)
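
For context on the CER figure: CER is normally the character-level edit distance between reference and hypothesis, divided by the reference length. Below is a minimal sketch of that definition. The function names are illustrative, it is not the evaluation script used here and not part of this repository, and stripping spaces before scoring is an assumption (some toolkits keep them).

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting i reference characters to reach an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate; whitespace is removed first (an assumption)."""
    ref = ref.replace(" ", "")
    hyp = hyp.replace(" ", "")
    if not ref:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # One substituted syllable block out of five
    print(cer("안녕하세요", "안녕하새요"))
```

Note that a good CER on a clean test set does not guarantee that the ASR features are stable on the noisier or more varied reference speech used for conversion.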

Also, when do you think I should start the post-training phase?
The training log is below.

<img width="3262" height="947" alt="Image" src="https://github.com/user-attachments/assets/49e45501-9740-497c-a9cd-5cf5c1e79929" />

Thanks a lot in advance.
