Request for 'Bad List' of Noisy German Samples

Hello,
I am currently working with the German speech recognition data provided by this project and came across the following line in the README:
"In addition, we also filter out samples that are considered 'noisy', that is, samples having very high WER (word error rate) or CER (character error rate) w.r.t. a previously trained German model."
Unfortunately, I do not have access to a pre-trained German model, so I cannot compute the WER or CER for the samples and reproduce this filtering myself.
Could you please provide the list of these 'noisy' samples, or the criteria used to identify them (for example, the WER and CER thresholds)?
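
For reference, this is how I understand the filtering described in the README. It is only a minimal sketch of my reading of it, not the project's actual code: the function names and the `max_wer` / `max_cer` thresholds are placeholders I made up, and `hypothesis` stands for the transcript a trained German model would produce, which is the part I am missing.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (lists of words or strings)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        return 0.0 if not hyp_words else 1.0
    return edit_distance(ref_words, hyp_words) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance / reference length."""
    if not reference:
        return 0.0 if not hypothesis else 1.0
    return edit_distance(reference, hypothesis) / len(reference)


def is_noisy(reference, hypothesis, max_wer=0.5, max_cer=0.3):
    """Flag a sample whose WER or CER exceeds a threshold.

    The default thresholds are placeholders, not the values used by the project.
    """
    return wer(reference, hypothesis) > max_wer or cer(reference, hypothesis) > max_cer
```

With the actual thresholds, or simply the list of sample IDs that were flagged, I could apply the same filter without needing the model.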

Request for 'Bad List' of Noisy German Samples #193

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Request for 'Bad List' of Noisy German Samples #193

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions