Request for 'Bad List' of Noisy German Samples #193

@abnera1

Hello,
I am currently working with the German speech recognition data provided by this project and came across the following line in the README:
"In addition, we also filter out samples that are considered 'noisy', that is, samples having very high WER (word error rate) or CER (character error rate) w.r.t. a previously trained German model."
Unfortunately, I do not have access to a pre-trained German model, so I cannot compute the WER or CER for my dataset and cannot filter out the noisy samples myself.
Could you please provide a list of these 'noisy' samples or the criteria used to identify them?
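
For context, the filtering described in the README could look roughly like the sketch below. It is only an illustration of WER/CER-based filtering, not the project's actual procedure: the function names, the `(sample_id, reference, hypothesis)` tuple layout, and the thresholds `max_wer=0.5` and `max_cer=0.3` are all assumptions, and the hypotheses would still have to come from some German ASR model.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (lists of words or strings)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref_units, hyp_units):
    """Edit distance normalised by the reference length."""
    if not ref_units:
        return 0.0 if not hyp_units else 1.0
    return edit_distance(ref_units, hyp_units) / len(ref_units)


def wer(reference, hypothesis):
    """Word error rate of a hypothesis against a reference transcript."""
    return error_rate(reference.split(), hypothesis.split())


def cer(reference, hypothesis):
    """Character error rate of a hypothesis against a reference transcript."""
    return error_rate(list(reference), list(hypothesis))


def find_noisy(samples, max_wer=0.5, max_cer=0.3):
    """Return IDs of samples whose WER or CER exceeds the (assumed) thresholds.

    `samples` is an iterable of (sample_id, reference, hypothesis) tuples,
    where `hypothesis` is the output of some German ASR model.
    """
    return [sid for sid, ref, hyp in samples
            if wer(ref, hyp) > max_wer or cer(ref, hyp) > max_cer]


# Hypothetical data: "b" is flagged because the model output does not match.
samples = [
    ("a", "guten morgen", "guten morgen"),
    ("b", "guten morgen", "hallo welt"),
]
noisy_ids = find_noisy(samples)
```

A published "bad list" would amount to the output of something like `find_noisy`, which is why either the list itself or the thresholds and model used would answer this request.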
