On-disk Whisper transcription cache #34
Merged
Repeat runs of `dpub convert --transcribe` against the same audio + model + language combination skip Whisper entirely. The cache lives in `~/.cache/dpub/transcripts/` (Unix) / `%LOCALAPPDATA%\dpub\transcripts\` (Windows), with one JSON file per (audio, model, language) tuple keyed by SHA-256 of the inputs. Modifying any input invalidates the entry naturally, so no manual cache management is needed.

Failures are non-fatal: corrupt cache files, IO errors, and disk-full conditions all log a warning and fall back to a fresh transcription. Set `DPUB_NO_TRANSCRIPT_CACHE=1` to bypass the cache entirely (debugging).

End-to-end measured on the 4h22m cavia book:

- cold run: 722 s (Whisper on 109 audio files)
- warm run: 21 s (109/109 cache hits)
- 34× speedup

Most of the warm-run time is Opus re-encoding + ZIP write; the cache lookup is dominated by audio file hashing (~ms per MB).

Implementation:

- `Segment` and `Word` in dpub-whisper now derive `serde::Deserialize` alongside the existing `Serialize`. This is a prerequisite for round-tripping cached transcripts.
- New `transcript_cache` module in dpub-convert (~280 lines, 8 unit tests). `CachedTranscriber` wraps `dpub_whisper::Transcriber`, hashes the model once at construction, hashes audio per call, and stores a JSON envelope with diagnostic metadata + the segment payload.
- `inject_transcripts` swaps in `CachedTranscriber`; the existing in-memory `HashMap<basename, Vec<Segment>>` cache stays so we don't re-hash audio across sections that share a file.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Measured speedup (cavia book, 4h 22m audio, 109 sections)
EPUBCheck stays clean (0/0/0).
Implementation
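The non-fatal failure handling and the `DPUB_NO_TRANSCRIPT_CACHE` bypass described in the summary can be sketched roughly as below. This is a std-only illustration, not the code in `transcript_cache`: the names `cache_enabled`, `cache_dir`, and `get_or_compute` are hypothetical, the payload is a plain `String` rather than the real JSON envelope, and the key is assumed to be a precomputed hex digest (SHA-256 is not in the Rust standard library).

```rust
use std::fs;
use std::io::ErrorKind;
use std::path::{Path, PathBuf};

/// Hypothetical helper: the cache is on unless the bypass variable is "1".
/// A caller would pass `std::env::var("DPUB_NO_TRANSCRIPT_CACHE").ok().as_deref()`.
fn cache_enabled(flag: Option<&str>) -> bool {
    flag != Some("1")
}

/// Hypothetical helper: builds the cache directory from the platform's base
/// directory, mirroring the two locations named in the PR description.
fn cache_dir(home: Option<&str>, local_app_data: Option<&str>, windows: bool) -> Option<PathBuf> {
    if windows {
        local_app_data.map(|d| Path::new(d).join("dpub").join("transcripts"))
    } else {
        home.map(|h| Path::new(h).join(".cache").join("dpub").join("transcripts"))
    }
}

/// Read-through lookup: return the cached payload if it is readable,
/// otherwise run `compute` and try to store the result. Every cache
/// failure only produces a warning; the fresh result is always returned.
fn get_or_compute<F>(dir: &Path, key: &str, enabled: bool, compute: F) -> String
where
    F: FnOnce() -> String,
{
    let path = dir.join(format!("{}.json", key));
    if enabled {
        match fs::read_to_string(&path) {
            Ok(s) if !s.trim().is_empty() => return s, // cache hit
            Ok(_) => eprintln!("warning: empty cache entry {}", path.display()),
            Err(e) if e.kind() == ErrorKind::NotFound => {} // plain miss, no warning
            Err(e) => eprintln!("warning: cannot read {}: {}", path.display(), e),
        }
    }
    let fresh = compute();
    if enabled {
        // A failed write (read-only directory, full disk) must not fail the run.
        if let Err(e) = fs::create_dir_all(dir).and_then(|_| fs::write(&path, &fresh)) {
            eprintln!("warning: cannot write {}: {}", path.display(), e);
        }
    }
    fresh
}
```

In this sketch a caller would resolve the directory once, then call `get_or_compute` per audio file with a closure that runs Whisper. Because a changed input changes the key, stale entries are simply never looked up again.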
Test plan
🤖 Generated with Claude Code