Use ContentVec as pre-trained model. #31

@Abedalhkeem-z

Hey guys!

I am trying to fine-tune a pre-trained ContentVec model. I'm not sure whether this is even possible, but I've reached a point where I get an error about mismatched labels. How can I generate the labels for train.km and valid.km? And what are these labels?

command = [
"python", "-u", "./fairseq/fairseq_cli/hydra_train.py",
"--config-dir", config_dir,
"--config-name", 'contentvec',
f"hydra.run.dir={expdir}",
f"task.data={metadata_dir}",
f"task.label_dir={label_dir}",
'task.labels=["km"]',
f"task.spk2info={spk}",
"task.crop=true",
"dataset.train_subset=train",
"dataset.valid_subset=valid",
"dataset.num_workers=10",
"dataset.max_tokens=500000",
"checkpoint.keep_best_checkpoints=10",
f"checkpoint.restore_file={pretrained_model_checkpoint}", # Restore from the pre-trained model
"checkpoint.reset_optimizer=true", # Reset optimizer (optional, but recommended for fine-tuning)
"criterion.loss_weights=[10,1e-5]",
"model.label_rate=50",
"model.encoder_layers_1=3",
"model.logit_temp_ctr=0.1",
"model.ctr_layers=[-6]",
'model.extractor_mode="default"',
"optimization.update_freq=[1]",
"optimization.max_update=100000",
"lr_scheduler.warmup_updates=8000",

]

Error:
AssertionError: number of labels does not match (5567 != 1)

The error says that I have only one file in the validation dataset, but valid.km contains 5567 rows.
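
To confirm the mismatch independently of the training run, the entry counts can be compared directly. This is only a minimal sketch, assuming the fairseq HuBERT-style layout: `{split}.tsv` holds the audio root directory on its first line followed by one line per audio file, and `{split}.km` holds one row of labels per audio file. The helper names (`count_manifest_entries`, `count_label_rows`, `check_split`) are hypothetical and not part of fairseq or ContentVec.

```python
from pathlib import Path


def count_manifest_entries(tsv_path):
    """Count audio entries in a .tsv manifest (assumed: first line is the root dir)."""
    lines = Path(tsv_path).read_text().splitlines()
    # Skip the first line (root directory) and ignore blank lines.
    return len([ln for ln in lines[1:] if ln.strip()])


def count_label_rows(km_path):
    """Count non-empty rows in a label file (assumed: one row per audio file)."""
    lines = Path(km_path).read_text().splitlines()
    return len([ln for ln in lines if ln.strip()])


def check_split(data_dir, label_dir, split, label_ext="km"):
    """Return (n_audio, n_labels, counts_match) for one split, e.g. 'valid'."""
    n_audio = count_manifest_entries(Path(data_dir) / f"{split}.tsv")
    n_labels = count_label_rows(Path(label_dir) / f"{split}.{label_ext}")
    return n_audio, n_labels, n_audio == n_labels
```

Calling `check_split(metadata_dir, label_dir, "valid")` with the same directories passed to `task.data` and `task.label_dir` should show whether valid.tsv and valid.km disagree, which is what the assertion appears to be reporting.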

Is my approach to fine-tuning ContentVec correct, or is there another way to do it?
