Hello,
I am currently working with the German speech recognition data provided by this project and came across the following line in the README:
"In addition, we also filter out samples that are considered 'noisy', that is, samples having very high WER (word error rate) or CER (character error rate) w.r.t. a previously trained German model."
Unfortunately, I do not have access to a pre-trained German model to calculate the WER or CER for my dataset. This makes it challenging for me to filter out the noisy samples effectively.
Could you please provide a list of these 'noisy' samples, or the criteria used to identify them (for example, the WER and CER thresholds)?
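For reference, WER/CER-based filtering of the kind the README describes might look roughly like the sketch below. It assumes a transcript from some German model is already available for each sample. The function names and the thresholds `max_wer` and `max_cer` are hypothetical placeholders, not values taken from this project.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        return 0.0 if not hyp_words else 1.0
    return edit_distance(ref_words, hyp_words) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: char-level edit distance / reference length."""
    if not reference:
        return 0.0 if not hypothesis else 1.0
    return edit_distance(reference, hypothesis) / len(reference)


def is_noisy(reference: str, hypothesis: str,
             max_wer: float = 0.5, max_cer: float = 0.3) -> bool:
    """Flag a sample as noisy if either error rate exceeds its threshold.

    The default thresholds are assumptions for illustration only.
    """
    return wer(reference, hypothesis) > max_wer or cer(reference, hypothesis) > max_cer
```

With the dataset transcript as `reference` and the model output as `hypothesis`, samples for which `is_noisy` returns `True` would be dropped. The actual thresholds and model used by the project are exactly what this issue is asking about.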