Merged
41 changes: 24 additions & 17 deletions .github/assets/logo.svg
38 changes: 38 additions & 0 deletions .github/workflows/build_wheels.yml
@@ -0,0 +1,38 @@
name: Build wheels

on:
  workflow_dispatch:
  push:
    tags:
      - "v*"

jobs:
  build_wheels:
    name: Build wheels on ${{ matrix.os }}
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        os: [ubuntu-latest, macos-latest, windows-latest]

    env:
      CIBW_ARCHS_LINUX: "x86_64"

    steps:
      - uses: actions/checkout@v4

      - name: Install Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.12"

      - name: Install cibuildwheel
        run: pip install cibuildwheel==2.22.0

      - name: Build wheels
        run: cibuildwheel --output-dir wheelhouse

      - name: Upload wheels as artifacts
        uses: actions/upload-artifact@v4
        with:
          name: wheels-${{ matrix.os }}
          path: wheelhouse/*.whl
5 changes: 3 additions & 2 deletions .github/workflows/tests_and_coverage.yml
@@ -26,10 +26,11 @@ jobs:
        run: |
          python -m pip install --upgrade pip
          pip install poetry
          poetry install --with test
          poetry install --with test,dev
          poetry run pip install .

      - name: Run tests with coverage
        run: poetry run pytest
        run: poetry run pytest --cov=error_align --cov-report=xml

      - name: Upload coverage to Codecov
        if: matrix.python-version == '3.12'
2 changes: 2 additions & 0 deletions .pre-commit-config.yaml
@@ -4,6 +4,7 @@ repos:
    rev: v6.0.0
    hooks:
      - id: check-added-large-files # Check for large files.
        exclude: ^src/error_align/baselines/power/cmudict.rep.json$
      - id: check-ast # Check Python file syntax using the ast module.
      - id: check-json # Check JSON file syntax.
      - id: check-merge-conflict # Check for merge conflict strings.
@@ -18,6 +19,7 @@ repos:
    hooks:
      - id: ruff-format
      - id: ruff-check
        args: ["--fix"]
  - repo: https://github.com/asottile/yesqa
    rev: v1.5.0
    hooks:
20 changes: 20 additions & 0 deletions CMakeLists.txt
@@ -0,0 +1,20 @@
cmake_minimum_required(VERSION 3.18)
project(error_align LANGUAGES CXX)

set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED ON)

find_package(pybind11 CONFIG REQUIRED)

# Build the module
pybind11_add_module(_cpp_edit_distance cpp/edit_distance.cpp)
pybind11_add_module(_cpp_beam_search cpp/beam_search.cpp)

# Install it under the Python package directory (src/error_align)
install(
  TARGETS
    _cpp_edit_distance
    _cpp_beam_search
  LIBRARY DESTINATION error_align
  RUNTIME DESTINATION error_align
)
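
The install rule above places both compiled modules inside the `error_align` package directory. As a minimal sketch of how a caller might check whether they are present at runtime, the snippet below probes for each module and reports its status. The dotted module paths are inferred from the target names and install destination and are an assumption, not something this diff confirms; `load_extension` is a hypothetical helper, not part of the package.

```python
import importlib
from types import ModuleType
from typing import Optional


def load_extension(qualified_name: str) -> Optional[ModuleType]:
    """Return the module if it can be imported, otherwise None."""
    try:
        return importlib.import_module(qualified_name)
    except ImportError:
        # Covers a missing package as well as a missing or unbuilt extension.
        return None


# Assumed module paths, derived from the CMake targets and install destination.
EXTENSIONS = ("error_align._cpp_edit_distance", "error_align._cpp_beam_search")

if __name__ == "__main__":
    for name in EXTENSIONS:
        status = "available" if load_extension(name) is not None else "missing"
        print(f"{name}: {status}")
```

A probe like this can be useful in CI after `pip install .` to confirm that the wheel actually shipped the binaries, since a silently missing extension would otherwise only surface as an import error at call time.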
15 changes: 4 additions & 11 deletions README.md
@@ -1,5 +1,5 @@
<p align="center">
<img src="https://raw.githubusercontent.com/corticph/error-align/refs/heads/main/.github/assets/logo.svg" alt="ErrorAlign Logo" width="85%"/>
<img src=".github/assets/logo.svg" alt="ErrorAlign Logo" width="100%"/>
</p>

<p align="center">
@@ -14,9 +14,11 @@

**Text-to-text alignment algorithm for speech recognition error analysis.** ErrorAlign helps you dig deeper into your speech recognition projects by accurately aligning each word in a reference transcript with the model-generated transcript. Unlike traditional methods, such as Levenshtein-based alignment, it is not restricted to simple one-to-one alignment, but can map a single reference word to multiple words or subwords in the model output. This enables quick and reliable identification of error patterns in rare words, names, or domain-specific terms that matter most for your application.
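
For contrast, the traditional one-to-one, Levenshtein-based alignment mentioned above can be sketched as follows. This is a plain word-level edit-distance alignment with a backtrace, not ErrorAlign's algorithm; the function name `levenshtein_align` and the tuple output format are illustrative assumptions.

```python
from typing import List, Optional, Tuple

Op = Tuple[str, Optional[str], Optional[str]]


def levenshtein_align(ref: List[str], hyp: List[str]) -> List[Op]:
    """Align two word lists one-to-one using unit-cost edit distance."""
    n, m = len(ref), len(hyp)
    # dist[i][j] holds the edit distance between ref[:i] and hyp[:j].
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j - 1] + cost,  # match or substitute
                dist[i - 1][j] + 1,  # delete a reference word
                dist[i][j - 1] + 1,  # insert a hypothesis word
            )

    # Walk back from the bottom-right corner to recover the operations.
    ops: List[Op] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            if dist[i][j] == dist[i - 1][j - 1] + cost:
                kind = "MATCH" if cost == 0 else "SUBSTITUTE"
                ops.append((kind, ref[i - 1], hyp[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            ops.append(("DELETE", ref[i - 1], None))
            i -= 1
        else:
            ops.append(("INSERT", None, hyp[j - 1]))
            j -= 1
    ops.reverse()
    return ops
```

Calling `levenshtein_align(["nighttime"], ["night", "time"])` produces one substitution and one insertion, because each reference word can pair with at most one hypothesis word. That is the restriction the paragraph above describes ErrorAlign as lifting.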

→ **Update [2025-12-10]:** As of version `0.1.0b5`, `error-align` includes a word-level pass to efficiently identify unambiguous matches, along with C++ extensions to accelerate beam search and backtrace construction. The combined speedup is ~15× over the pure-Python implementation ⚡

__Contents__ | [Installation](#installation) | [Quickstart](#quickstart) | [Work-in-Progress](#wip) | [Citation and Research](#citation) |
[//]: <> (https://raw.githubusercontent.com/corticph/error-align/refs/heads/main/.github/assets/logo_gpt.svg)

__Contents__ | [Installation](#installation) | [Quickstart](#quickstart) | [Citation and Research](#citation) |



@@ -51,15 +53,6 @@ Alignment(SUBSTITUTE: "noting" -> "nothing"),
Alignment(INSERT: "period")
```

<a name="wip">

## Work-in-Progress

- Optimization for longform text.
- Efficient word-level first-pass.
- C++ version with Python bindings.



<a name="citation">

11 changes: 10 additions & 1 deletion codecov.yml
@@ -1,2 +1,11 @@
ignore:
  - "src/error_align/baselines/*"
  - "src/error_align/baselines/*"

coverage:
  status:
    project:
      default:
        informational: true
    patch:
      default:
        informational: true