Hi RVC maintainers and community,
I am currently researching RVC for an academic project on Singing Voice Conversion / AI Song Cover.
I understand from the official README that the pretrained base model was trained on nearly 50 hours of high-quality audio from the open-source VCTK dataset. I also understand that, when training with F0, RVC uses pretrained checkpoints such as f0G40k.pth and f0D40k.pth, where f0G40k.pth is the pretrained Generator and f0D40k.pth is the pretrained Discriminator.
I would like to ask whether there is any official information or reproducible recipe for how these checkpoints were originally created.
Specifically:
- Which exact subset of VCTK was used?
- Was mic1 or mic2 used?
- Which speakers were included or excluded?
- What preprocessing pipeline was used before pretraining?
- Which HuBERT/ContentVec feature setting was used?
- Which F0 extraction method was used?
- What were the training hyperparameters, such as batch size, learning rate, number of epochs/steps, GPU setup, and checkpoint selection criteria?
- Is there any script or command to reproduce the original f0G40k.pth and f0D40k.pth checkpoints?
- Has anyone successfully pretrained a new RVC base model from scratch, especially on a singing voice dataset instead of VCTK?
My goal is to understand the scientific and engineering background of the pretrained RVC base model for academic documentation. Any official notes, reproduction attempts, scripts, or community experience would be very helpful.
Thank you very much.