Use ContentVec as pre-trained model. #31

Hey guys!

I am trying to fine-tune a pre-trained ContentVec checkpoint. I'm not sure this is even possible, but I've gotten as far as an error about mismatched labels. How can I generate the labels for train.km and valid.km? And what do these labels represent?

command = [
    "python", "-u", "./fairseq/fairseq_cli/hydra_train.py",
    "--config-dir", config_dir,
    "--config-name", 'contentvec',
    f"hydra.run.dir={expdir}",
    f"task.data={metadata_dir}",
    f"task.label_dir={label_dir}",
    'task.labels=["km"]',
    f"task.spk2info={spk}",
    "task.crop=true",
    "dataset.train_subset=train",
    "dataset.valid_subset=valid",
    "dataset.num_workers=10",
    "dataset.max_tokens=500000",
    "checkpoint.keep_best_checkpoints=10",
    f"checkpoint.restore_file={pretrained_model_checkpoint}",  # Restore from the pre-trained model
    "checkpoint.reset_optimizer=true",  # Reset optimizer (optional, but recommended for fine-tuning)
    "criterion.loss_weights=[10,1e-5]",
    "model.label_rate=50",
    "model.encoder_layers_1=3",
    "model.logit_temp_ctr=0.1",
    "model.ctr_layers=[-6]",
    'model.extractor_mode="default"',
    "optimization.update_freq=[1]",
    "optimization.max_update=100000",
    "lr_scheduler.warmup_updates=8000",
 
]

Error:
AssertionError: number of labels does not match (5567 != 1)

As far as I can tell, the error says that I have only one file in the validation dataset, but valid.km contains 5567 rows.
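A quick way to confirm this mismatch before launching training is to count the entries on both sides. The sketch below assumes the usual fairseq HuBERT-style layout, where the first line of each .tsv manifest is the audio root directory, every following line is one audio file, and each .km file holds one row of labels per audio file. The file names and the helper names (`count_manifest_entries`, `count_label_rows`, `check_split`) are illustrative and not part of ContentVec or fairseq.

```python
from pathlib import Path


def count_manifest_entries(tsv_path):
    """Count audio entries in a manifest, skipping the first line.

    Assumption: line 1 of the .tsv is the root directory, not an audio file.
    """
    lines = Path(tsv_path).read_text().splitlines()
    return sum(1 for line in lines[1:] if line.strip())


def count_label_rows(km_path):
    """Count non-empty rows in a label file (assumed one row per audio file)."""
    lines = Path(km_path).read_text().splitlines()
    return sum(1 for line in lines if line.strip())


def check_split(metadata_dir, label_dir, split, label_ext="km"):
    """Return (n_audio, n_labels, counts_match) for one split, e.g. 'valid'."""
    n_audio = count_manifest_entries(Path(metadata_dir) / f"{split}.tsv")
    n_labels = count_label_rows(Path(label_dir) / f"{split}.{label_ext}")
    return n_audio, n_labels, n_audio == n_labels
```

Calling `check_split(metadata_dir, label_dir, "valid")` with the same directories passed to `task.data` and `task.label_dir` should show whether valid.tsv really lists a single file while valid.km has 5567 rows. If so, the manifest and the label file were probably generated from different file lists.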


Is my approach to fine-tuning ContentVec correct, or is there another way to do it?

