Question regarding post training #20

@nghng

First of all, thank you for publishing the code and pretrained models.
I have some questions about the training phase.

Here's my situation:

  • I'm training the model for Korean on around 2k hours of data, with a corresponding streaming ASR model (fast-u2++).
  • At epoch 18:
    The training is going fine; however, I find that the output speech in offline mode is heavily affected by vocoder artifacts: a lot of screeching and unnatural-sounding endings (possibly due to the over-smoothed spectrogram you mentioned in the paper).
    The results also vary with different reference speech: they can be OK for some samples but rather poor for others.
    For streaming VC, the quality is rather poor.

Is there anything I should consider?
My thoughts on this:

  • The streaming ASR is bad (although it achieves a rather good CER on my test set).
  • Not enough training data (2k hours is small compared to your 10k-hour dataset, but I think it's decent for a test drive, and training looks quite stable judging from the log).

Also, when do you think I should start the post-training phase?
Below is the training log.

[Image: training log]

Thanks a lot in advance.
