This project has moved to the corticph organization. The version on this page is outdated.
You can find the new project HERE.
Text-to-text alignment algorithm for speech recognition error analysis. ErrorAlign helps you dig deeper into your speech recognition projects by accurately aligning each word in a reference transcript with the model-generated transcript. Unlike traditional methods, such as Levenshtein-based alignment, it is not restricted to simple one-to-one alignment, but can map a single reference word to multiple words or subwords in the model output. This enables quick and reliable identification of error patterns in rare words, names, or domain-specific terms that matter most for your application.
Contents: Installation | Quickstart | Work-in-Progress | Citation and Research
Installation:
pip install error-align
Quickstart:
from error_align import error_align
ref = "Some things are worth noting!"
hyp = "Something worth nothing period?"
alignments = error_align(ref, hyp)
Resulting alignments:
Alignment(SUBSTITUTE: "Some" -> "Some"-),
Alignment(SUBSTITUTE: "things" -> -"thing"),
Alignment(DELETE: "are"),
Alignment(MATCH: "worth" == "worth"),
Alignment(SUBSTITUTE: "noting" -> "nothing"),
Alignment(INSERT: "period")
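Each Alignment pairs an operation type with the reference and hypothesis words involved, so you can scan the results to see how the terms you care about are being misrecognized. The sketch below is only illustrative: the attribute names (op, ref, hyp) and the string form of the operation labels are assumptions, so check the package's API for the exact fields.

from collections import Counter
from error_align import error_align

# Terms whose recognition errors we want to track (example set).
DOMAIN_TERMS = {"noting", "worth"}

ref = "Some things are worth noting!"
hyp = "Something worth nothing period?"

error_counts = Counter()
for a in error_align(ref, hyp):
    # Attribute names are assumed for illustration: "op" for the operation,
    # "ref"/"hyp" for the aligned words (either may be empty for INSERT/DELETE).
    op = str(a.op)
    ref_word = (a.ref or "").lower().strip("!?.,")
    if op != "MATCH" and ref_word in DOMAIN_TERMS:
        error_counts[(ref_word, op)] += 1

print(error_counts)  # e.g. a single SUBSTITUTE recorded for "noting"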
Work-in-Progress:
- Optimization for longform text.
- Efficient word-level first-pass.
- C++ version with Python bindings.
Citation and Research:
@article{borgholt2021alignment,
title={A Text-To-Text Alignment Algorithm for Better Evaluation of Modern Speech Recognition Systems},
author={Borgholt, Lasse and Havtorn, Jakob and Igel, Christian and Maal{\o}e, Lars and Tan, Zheng-Hua},
journal={arXiv preprint arXiv:2509.24478},
year={2025}
}
To reproduce results from the paper:
- Install with the extra evaluation dependencies (only supported on Python 3.12):
pip install error-align[evaluation]
- Clone this repository:
git clone https://github.com/borgholt/error-align.git
- Navigate to the evaluation directory:
cd error-align/evaluation
- Transcribe a dataset for evaluation. For example:
python transcribe_dataset.py --model_name whisper --dataset_name commonvoice --language_code fr
- Run the evaluation script on the output file (see the sketch below for a quick way to inspect this file). For example:
python evaluate_dataset.py --transcript_file transcribed_data/whisper_commonvoice_test_fr.parquet
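The evaluation step reads the Parquet file written by the transcription step. If you want a quick look at that file before evaluating, something along these lines works; it assumes pandas with a Parquet engine (e.g. pyarrow) is installed, and it makes no assumption about the column names, which are not documented here:

import pandas as pd

# Path taken from the example above; adjust to your own output file.
df = pd.read_parquet("transcribed_data/whisper_commonvoice_test_fr.parquet")

print(df.columns.tolist())  # discover which columns hold reference and hypothesis text
print(df.head())

Once you know which columns hold the reference and the model transcript, you can feed each row to error_align to get per-utterance alignments.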
Notes:
- To reproduce results on the primock57 dataset, first run: python prepare_primock57.py
- Use the --help flag to see all available options for transcribe_dataset.py and evaluate_dataset.py.
- All results reported in the paper are based on the test sets.
Collaborators: