Description
Hey guys!
I am working on fine-tuning a pretrained ContentVec model. I'm not sure whether this is even possible, but I've reached the point where training fails with an error about mismatched label counts. How can I generate the label files train.km and valid.km? And what exactly are these labels?
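Below is a minimal sketch of what these labels usually are. It assumes ContentVec follows fairseq's HuBERT data setup (the `examples/hubert/simple_kmeans` recipe). Under that setup, each line of a `.km` file holds the k-means cluster IDs for one utterance, separated by spaces. The lines appear in the same order as the rows of the matching `.tsv` manifest, at roughly `model.label_rate` labels per second (the command below sets `model.label_rate=50`). The helper names are hypothetical. The cluster IDs themselves would come from running k-means on features extracted from your audio, which is not shown here.

```python
from typing import List, Sequence


def format_km_lines(utterance_labels: Sequence[Sequence[int]]) -> List[str]:
    """Turn per-utterance cluster-ID sequences into .km lines.

    Assumption (HuBERT-style labels): one line per utterance, in manifest
    order, with one integer cluster ID per frame separated by spaces.
    """
    return [" ".join(str(c) for c in labels) for labels in utterance_labels]


def approx_num_labels(num_samples: int, sample_rate: int = 16000, label_rate: int = 50) -> int:
    """Rough label count for one utterance: duration in seconds * label_rate."""
    return round(num_samples / sample_rate * label_rate)


def write_km(path: str, utterance_labels: Sequence[Sequence[int]]) -> None:
    """Write a hypothetical .km file, one utterance per line."""
    with open(path, "w") as f:
        for line in format_km_lines(utterance_labels):
            f.write(line + "\n")


if __name__ == "__main__":
    # Two made-up utterances with made-up cluster IDs.
    print(format_km_lines([[3, 3, 17, 42], [8, 8, 8]]))  # ['3 3 17 42', '8 8 8']
    print(approx_num_labels(32000))  # 2 s of 16 kHz audio at 50 labels/s -> 100
```

The count is only approximate. The exact number of frames depends on the feature extractor's convolution strides, so a label file built this way should be checked against the model's real frame count.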
```python
# config_dir, expdir, metadata_dir, label_dir, spk and
# pretrained_model_checkpoint are set elsewhere (paths not shown).
command = [
    "python", "-u", "./fairseq/fairseq_cli/hydra_train.py",
    "--config-dir", config_dir,
    "--config-name", 'contentvec',
    f"hydra.run.dir={expdir}",
    f"task.data={metadata_dir}",
    f"task.label_dir={label_dir}",
    'task.labels=["km"]',
    f"task.spk2info={spk}",
    "task.crop=true",
    "dataset.train_subset=train",
    "dataset.valid_subset=valid",
    "dataset.num_workers=10",
    "dataset.max_tokens=500000",
    "checkpoint.keep_best_checkpoints=10",
    f"checkpoint.restore_file={pretrained_model_checkpoint}",  # Restore from the pre-trained model
    "checkpoint.reset_optimizer=true",  # Reset optimizer (optional, but recommended for fine-tuning)
    "criterion.loss_weights=[10,1e-5]",
    "model.label_rate=50",
    "model.encoder_layers_1=3",
    "model.logit_temp_ctr=0.1",
    "model.ctr_layers=[-6]",
    'model.extractor_mode="default"',
    "optimization.update_freq=[1]",
    "optimization.max_update=100000",
    "lr_scheduler.warmup_updates=8000",
]
```
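The following sketch lists the files that the `task.*` and `dataset.*` overrides above appear to point at. It assumes ContentVec's task keeps fairseq's HuBERT naming convention: the audio manifest is read from `{task.data}/{split}.tsv`, and each entry in `task.labels` is read from `{task.label_dir}/{split}.{label}`. The function name is hypothetical. `"meta"` and `"lab"` stand in for `metadata_dir` and `label_dir`.

```python
from typing import Dict, List, Tuple


def expected_data_files(
    data_dir: str, label_dir: str, labels: List[str], splits: List[str]
) -> Dict[str, Tuple[str, List[str]]]:
    """Return (manifest_path, label_paths) for each split.

    Assumes the fairseq HuBERT convention:
    {task.data}/{split}.tsv and {task.label_dir}/{split}.{label}.
    """
    files: Dict[str, Tuple[str, List[str]]] = {}
    for split in splits:
        manifest = f"{data_dir}/{split}.tsv"
        label_paths = [f"{label_dir}/{split}.{lab}" for lab in labels]
        files[split] = (manifest, label_paths)
    return files


if __name__ == "__main__":
    # Placeholders for metadata_dir and label_dir from the command.
    layout = expected_data_files("meta", "lab", ["km"], ["train", "valid"])
    for split, (manifest, label_paths) in layout.items():
        print(split, manifest, label_paths)
    # train meta/train.tsv ['lab/train.km']
    # valid meta/valid.tsv ['lab/valid.km']
```

Under this assumption, `valid.km` has to line up row for row with `valid.tsv`, and `train.km` with `train.tsv`.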
Error:
AssertionError: number of labels does not match (5567 != 1)
The error seems to say that my validation set contains only one file, yet valid.km contains 5567 rows.
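In fairseq's HuBERT dataset, this assertion compares two counts:

- the number of lines in the label file (first number, 5567)
- the number of entries in the manifest (second number, 1)

Manifest entries are every line after the first, root-directory line, counted before any min/max length filtering. The sketch below reproduces that comparison so the two files can be checked before launching training. It assumes ContentVec's fork behaves the same way and that the manifest uses the HuBERT layout: the audio root on line 1, then `<relative_path>\t<num_samples>` on each following line. The function names are hypothetical.

```python
from typing import List, Tuple


def check_label_count(manifest_lines: List[str], label_lines: List[str]) -> Tuple[bool, int, int]:
    """Compare label lines with manifest entries, as the failing assertion does.

    Assumes line 0 of the manifest is the audio root directory and every
    later line is one "<relative_path>\t<num_samples>" entry.
    """
    n_audio = max(len(manifest_lines) - 1, 0)  # skip the root-directory line
    n_labels = len(label_lines)  # one label line per utterance
    return n_labels == n_audio, n_labels, n_audio


def check_split_files(manifest_path: str, label_path: str) -> None:
    """Raise early if a .km file and its .tsv manifest disagree in length."""
    with open(manifest_path) as f:
        manifest_lines = f.read().splitlines()
    with open(label_path) as f:
        label_lines = f.read().splitlines()
    ok, n_labels, n_audio = check_label_count(manifest_lines, label_lines)
    if not ok:
        raise ValueError(
            f"{label_path} has {n_labels} lines but {manifest_path} lists {n_audio} files"
        )


if __name__ == "__main__":
    # Made-up in-memory example mirroring the reported numbers.
    manifest = ["/path/to/audio", "utt0001.wav\t32000"]  # root line + 1 entry
    labels = ["3 3 17 42"] * 5567  # 5567 utterance lines
    print(check_label_count(manifest, labels))  # (False, 5567, 1)
```

If the check behaves like this, the error points to `valid.tsv` listing a single file while `valid.km` was produced for 5567 utterances. In other words, the manifest and the label file were probably built from different sets of audio.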
Is my approach to fine-tuning ContentVec correct, or is there another way to do it?