5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -4,6 +4,11 @@ All notable changes to this project will be documented in this file. The format

## [Unreleased]

### Added

- **Word-level Media Overlay sync** (M6.5). When `--transcribe` runs and the cleanup path is active, dpub now extracts per-token timestamps from whisper.cpp, coalesces BPE pieces back into whole words via a leading-space rule (with punctuation attachment and degenerate-timing clamping), wraps each word in a `<span id="w-NNN-MMM-KKK">` inside the cleaned `<p id="tx-NNN-MMM">`, and emits one SMIL `<par>` per word — wrapped in nested `<seq epub:textref="...#tx-...">` per paragraph. The result is karaoke-style highlight-along-with-audio in compatible reading systems (Thorium, Readium). Default-on; pass `--no-word-sync` to fall back to per-paragraph sync. Workspace EPUBCheck assertions extended to gate the new overlay shape; reference book stays 0/0/0.
- `dpub-whisper` exposes a public `Word { start_seconds, end_seconds, text }` struct and `Segment.words: Vec<Word>` populated by the new BPE coalescer (`crates/dpub-whisper/src/words.rs`). Eight unit tests cover the BPE coalescing rules.
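
The coalescing rules named in this entry (leading-space word boundaries, punctuation attachment, degenerate-timing clamping) could look roughly like the sketch below. It is an illustration, not the code in `crates/dpub-whisper/src/words.rs`: the `Token` type, the `coalesce` and `is_punctuation` names, the `f64` field types, and the exact punctuation and clamping policy are all assumptions.

```rust
/// Hypothetical per-token input: one BPE piece plus its timestamps.
#[derive(Debug, Clone, PartialEq)]
pub struct Token {
    pub text: String,
    pub start_seconds: f64,
    pub end_seconds: f64,
}

/// Mirrors the field names from the changelog entry; the f64 types are assumed.
#[derive(Debug, Clone, PartialEq)]
pub struct Word {
    pub start_seconds: f64,
    pub end_seconds: f64,
    pub text: String,
}

/// True when the piece consists only of ASCII punctuation (assumed policy).
fn is_punctuation(piece: &str) -> bool {
    !piece.is_empty() && piece.chars().all(|c| c.is_ascii_punctuation())
}

/// Merge BPE pieces into whole words.
pub fn coalesce(tokens: &[Token]) -> Vec<Word> {
    let mut words: Vec<Word> = Vec::new();
    for tok in tokens {
        let piece = tok.text.trim_start();
        if piece.is_empty() {
            continue; // whitespace-only pieces carry no text
        }
        // Leading-space rule: a piece that begins with a space opens a new
        // word, unless it is pure punctuation, which attaches to the
        // preceding word instead.
        let starts_word = tok.text.starts_with(' ') && !is_punctuation(piece);
        if !starts_word {
            if let Some(last) = words.last_mut() {
                last.text.push_str(piece);
                last.end_seconds = last.end_seconds.max(tok.end_seconds);
                continue;
            }
        }
        // Either a word boundary or the very first piece in the segment.
        words.push(Word {
            start_seconds: tok.start_seconds,
            end_seconds: tok.end_seconds,
            text: piece.to_string(),
        });
    }
    // Degenerate-timing clamp: a word must never end before it starts.
    for w in &mut words {
        if w.end_seconds < w.start_seconds {
            w.end_seconds = w.start_seconds;
        }
    }
    words
}
```

Under these assumptions, the pieces `" Hel"`, `"lo"`, `","` collapse into the single word `Hello,`, whose span runs from the first piece's start to the latest end among its pieces.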

## [0.5.0] - 2026-05-06

First tagged release. Feature-complete for the v1 candidate: DAISY 2.02 → EPUB 3 conversion with Media Overlays, EPUBCheck-clean output, ACE accessibility validation, MP3 → Opus audio recompression, local Whisper transcription with prose-shaped paragraph cleanup, automatic and explicit cover lookup, parallel batch conversion, JSON output for CI/pipeline use. No API stability commitment yet — that comes with 1.0.
1 change: 1 addition & 0 deletions README.md
@@ -113,6 +113,7 @@ Walks the input directory for every `ncc.html`, converts each book in parallel v
| **M4** | Built-in validation (EPUBCheck + ACE) — `dpub validate`, `dpub a11y`. ✅ |
| **M5** | Audio recompression (MP3 → Opus) — `dpub convert --audio opus --bitrate <kbps>`. ✅ |
| **M6** | Whisper transcription for audio-only books — `dpub convert --transcribe <lang> --whisper-model <path>`. ✅ (segments are merged into prose-shaped paragraphs by default; pass `--no-text-cleanup` for raw output) |
| **M6.5** | Word-level Media Overlay sync — karaoke-style highlight-along-with-audio in reading systems that honour Media Overlays. Default-on with `--transcribe`; pass `--no-word-sync` to fall back to per-paragraph sync. ✅ |
| **Tier 1 polish** | Whisper model caching, cover lookup (`--cover` and `--auto-cover`), parallel batch mode, JSON output for validators. ✅ |
| **M7** | WASM build for browser-based conversion (planned scope: `info` + `validate` only — Whisper / ffmpeg are too heavy for a browser tab). |
| **M8** | 1.0 release: signed binaries for macOS / Linux / Windows. |
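
For readers unfamiliar with the overlay shape behind the M6.5 row, the sketch below shows one way to emit a per-paragraph `<seq>` containing one `<par>` per word, as EPUB 3 Media Overlays allow. It is not dpub's emitter: `paragraph_seq`, `escape_attr`, the 0-based three-digit word index, the `f64` timestamps, and the assumption that word ids reuse the paragraph's `NNN-MMM` suffix are all hypothetical.

```rust
/// Field names follow the PR's `Word` struct; the f64 types are assumed.
pub struct Word {
    pub start_seconds: f64,
    pub end_seconds: f64,
    pub text: String,
}

/// Minimal XML attribute escaping for file names and ids.
fn escape_attr(s: &str) -> String {
    s.replace('&', "&amp;")
        .replace('<', "&lt;")
        .replace('"', "&quot;")
}

/// Build one SMIL `<seq>` for a paragraph with one `<par>` per word.
/// `para_suffix` is the `NNN-MMM` part shared by `tx-NNN-MMM` and
/// `w-NNN-MMM-KKK` (an assumption about how the ids relate).
pub fn paragraph_seq(xhtml: &str, audio: &str, para_suffix: &str, words: &[Word]) -> String {
    let xhtml = escape_attr(xhtml);
    let audio = escape_attr(audio);
    let suffix = escape_attr(para_suffix);
    let mut out = format!("<seq epub:textref=\"{xhtml}#tx-{suffix}\">\n");
    for (k, w) in words.iter().enumerate() {
        // Each word points at its <span> and at its audio clip range.
        out.push_str(&format!(
            "  <par><text src=\"{xhtml}#w-{suffix}-{k:03}\"/>\
             <audio src=\"{audio}\" clipBegin=\"{:.3}s\" clipEnd=\"{:.3}s\"/></par>\n",
            w.start_seconds, w.end_seconds
        ));
    }
    out.push_str("</seq>\n");
    out
}
```

Calling `paragraph_seq("ch.xhtml", "a.opus", "001-002", &words)` yields a fragment that would still need to sit inside a SMIL `<body>` with the `epub` namespace declared. With `--no-word-sync`, the equivalent output would presumably be a single `<par>` pointing at `#tx-001-002`.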
13 changes: 13 additions & 0 deletions crates/dpub-cli/src/main.rs
@@ -52,6 +52,14 @@ enum Command {
/// output; not recommended for distribution.
#[arg(long)]
no_text_cleanup: bool,
/// Skip per-word Media Overlay sync. Word-level sync (the
/// default for transcribed books) drives karaoke-style
/// highlight-along-with-audio in compatible reading systems
/// (Thorium, Readium). Pass this flag to fall back to
/// per-paragraph sync — produces a smaller SMIL at the cost
/// of a coarser reading experience.
#[arg(long)]
no_word_sync: bool,
/// Path to a JPEG or PNG image to embed as the EPUB cover.
#[arg(long, value_name = "PATH", conflicts_with = "auto_cover")]
cover: Option<PathBuf>,
@@ -150,6 +158,7 @@ fn main() -> Result<()> {
transcribe,
whisper_model,
no_text_cleanup,
no_word_sync,
cover,
auto_cover,
rights,
@@ -163,6 +172,7 @@ fn main() -> Result<()> {
transcribe,
whisper_model,
no_text_cleanup,
no_word_sync,
cover,
auto_cover,
rights,
@@ -190,6 +200,7 @@ fn cmd_convert(
transcribe: Option<String>,
whisper_model: Option<PathBuf>,
no_text_cleanup: bool,
no_word_sync: bool,
cover: Option<PathBuf>,
auto_cover: bool,
rights: Option<String>,
@@ -255,6 +266,7 @@ fn cmd_convert(
cover,
auto_cover,
rights,
no_word_sync,
};
let start = std::time::Instant::now();
dpub_convert::convert_to_file(&book, output, &opts)
@@ -545,6 +557,7 @@ fn cmd_batch(
cover: None,
auto_cover: false,
rights: None,
no_word_sync: false,
};
let start = std::time::Instant::now();
let entries: Vec<BatchEntry> = books