feat(audio): live speech-to-text streaming transcriber #77
Open
VineeTagarwaL-code wants to merge 24 commits into main
Design for a streaming transcriber (jigsaw.audio.speech_to_text_live) that accepts a WritableStream of PCM16 audio, internally chunks with overlap, and stitches SSE transcripts into delta/turn events. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Removed from config, types, stitcher responsibility, file-tree comment, and stitcher test list. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
TDD plan in 10 tasks: fetchJSSStream extension, Chunker, Stitcher, SSE transport, LiveSTT types, Transcriber class, API wiring, cleanup (drop node-record-lpcm16 from deps), opt-in integration test, README. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When it's ready to merge we will delete the docs; until then they will come in handy while reviewing or doing more work on it.
Summary
Adds jigsaw.audio.speech_to_text_live(config?), a live streaming transcriber that accepts a WritableStream of mono PCM16 audio and emits open / delta / turn / warning / error / close events. Internally it handles WAV framing, chunk overlap, token-level stitching, and SSE parsing against /v1/ai/transcribe?stream=true.
Public API
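The event contract this PR describes (open first, a single close last, even after a fatal error) can be modelled roughly as below. This is a hypothetical stand-in, not the SDK's Transcriber: the class name and the on / start / stop / fail methods are assumptions made for illustration, and only the state names and event names are taken from this description.

```typescript
// Hypothetical stand-in, NOT the SDK's Transcriber. It only models the
// lifecycle states and event names listed in this PR description.
type State = "idle" | "open" | "closing" | "closed" | "errored";
type EventName = "open" | "delta" | "turn" | "warning" | "error" | "close";
type Listener = (payload?: unknown) => void;

class LifecycleSketch {
  private state: State = "idle";
  private closeEmitted = false;
  private listeners = new Map<EventName, Listener[]>();

  on(event: EventName, fn: Listener): this {
    const list = this.listeners.get(event) ?? [];
    list.push(fn);
    this.listeners.set(event, list);
    return this;
  }

  private emit(event: EventName, payload?: unknown): void {
    for (const fn of this.listeners.get(event) ?? []) fn(payload);
  }

  // Single exit point, so "close" cannot fire twice.
  private emitClose(): void {
    if (this.closeEmitted) return;
    this.closeEmitted = true;
    this.state = "closed";
    this.emit("close");
  }

  start(): void {
    if (this.state !== "idle") return;
    this.state = "open";
    this.emit("open");
  }

  stop(): void {
    if (this.state !== "open") return;
    this.state = "closing";
    // A real implementation would flush pending audio before closing.
    this.emitClose();
  }

  fail(err: Error): void {
    if (this.state === "closed" || this.state === "errored") return;
    this.state = "errored";
    this.emit("error", err);
    this.emitClose(); // errored -> closed branch
  }

  get current(): State {
    return this.state;
  }
}
```

Routing both the normal and the error path through one guarded helper is what makes the exactly-once guarantee easy to reason about: later stop() or fail() calls become no-ops.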
Full working example at examples/live-mic.js.
Design
- Chunker buffers PCM16 bytes with configurable overlap (default 5 s chunks, 2 s overlap), produces WAV chunks, and drops frames on buffer overflow (a warning is emitted).
- Stitcher detects token-level overlap between adjacent chunks with fuzzy matching (single-character substitution or insertion/deletion on tokens ≥ 4 chars) and strips duplicates, preserving punctuation that's genuinely new.
- SSE transport parses transcript.delta / transcript.done / transcript.final events via a new RequestClient.fetchJSSStream method.
- Transcriber composes the above: state machine (idle → open → closing → closed, with an errored → closed branch), serial chunk processing (one in flight at a time, required for ordered stitching), a per-chunk 30 s AbortController timeout, and 3 consecutive chunk failures → fatal.
Key decisions
- language is not exposed on LiveSTTConfig; it is always sent as en internally.
- channels is not exposed; users with stereo sources must downmix before piping (see the downmixToMono helper in tests/audio-live.test.ts). Exposing channels was a correctness hazard if mismatched with the actual audio.
- No bundled audio sources (node-record-lpcm16, file, etc.). Keeps the SDK bundle lean and platform-agnostic.
- Serial chunk processing: stitching depends on prevTranscript state ordering; parallel requests would produce out-of-order stitches.
- The close event fires exactly once, guaranteed via the emitClose() helper, including after fatal errors.
Config
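As a rough illustration of how the stated defaults (5 s chunks, 2 s overlap) translate into byte-level chunking and WAV framing, here is a minimal sketch. The config field names, the class and function names, and the 16 kHz sample rate in the usage note are assumptions; only the two defaults and the mono PCM16 format come from this description, and frame dropping on overflow is left out.

```typescript
// Hypothetical config shape; only the 5 s / 2 s defaults come from the PR text.
interface ChunkingSketchConfig {
  sampleRate: number;     // assumption: e.g. 16000 Hz, not stated in the PR
  chunkSeconds: number;   // 5 by default per the Design notes
  overlapSeconds: number; // 2 by default per the Design notes
}

const BYTES_PER_SAMPLE = 2; // PCM16, mono

class OverlapChunkerSketch {
  private buffer = new Uint8Array(0);
  private readonly chunkBytes: number;
  private readonly stepBytes: number;

  constructor(cfg: ChunkingSketchConfig) {
    this.chunkBytes = cfg.chunkSeconds * cfg.sampleRate * BYTES_PER_SAMPLE;
    const overlapBytes = cfg.overlapSeconds * cfg.sampleRate * BYTES_PER_SAMPLE;
    if (overlapBytes >= this.chunkBytes) {
      throw new RangeError("overlap must be shorter than the chunk");
    }
    this.stepBytes = this.chunkBytes - overlapBytes;
  }

  // Append PCM bytes and return every full chunk that is now available.
  push(pcm: Uint8Array): Uint8Array[] {
    const merged = new Uint8Array(this.buffer.length + pcm.length);
    merged.set(this.buffer, 0);
    merged.set(pcm, this.buffer.length);
    this.buffer = merged;

    const chunks: Uint8Array[] = [];
    while (this.buffer.length >= this.chunkBytes) {
      chunks.push(this.buffer.slice(0, this.chunkBytes));
      // Advance by chunk minus overlap so the tail is re-sent in the next chunk.
      this.buffer = this.buffer.slice(this.stepBytes);
    }
    return chunks;
  }
}

// Prefix raw mono PCM16 with the standard 44-byte RIFF/WAVE header.
function wrapPcm16MonoAsWav(pcm: Uint8Array, sampleRate: number): Uint8Array {
  const header = new DataView(new ArrayBuffer(44));
  const ascii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) header.setUint8(offset + i, text.charCodeAt(i));
  };
  ascii(0, "RIFF");
  header.setUint32(4, 36 + pcm.length, true);                // RIFF chunk size
  ascii(8, "WAVE");
  ascii(12, "fmt ");
  header.setUint32(16, 16, true);                            // fmt chunk size
  header.setUint16(20, 1, true);                             // format 1 = PCM
  header.setUint16(22, 1, true);                             // channels = 1
  header.setUint32(24, sampleRate, true);
  header.setUint32(28, sampleRate * BYTES_PER_SAMPLE, true); // byte rate
  header.setUint16(32, BYTES_PER_SAMPLE, true);              // block align
  header.setUint16(34, 16, true);                            // bits per sample
  ascii(36, "data");
  header.setUint32(40, pcm.length, true);

  const wav = new Uint8Array(44 + pcm.length);
  wav.set(new Uint8Array(header.buffer), 0);
  wav.set(pcm, 44);
  return wav;
}
```

With an assumed 16 kHz input, a 5 s chunk is 160,000 bytes and each new chunk starts 96,000 bytes (3 s) after the previous one, so the last 2 s of audio are sent twice and the stitcher has shared context to align on.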
Test plan
- Unit tests in tests/live/ (chunker, stitcher, sse, transcriber, request-stream). All green.
- yarn test:all glob extended to include tests/live/*.ts so they run in CI.
- Build (yarn build).
- Opt-in integration test tests/audio-live.test.ts (gated on JIGSAWSTACK_API_KEY, runnable via yarn test:audio:live), passed against staging.
Docs
- docs/superpowers/specs/2026-04-21-live-stt-design.md
- docs/superpowers/plans/2026-04-21-live-stt.md
- examples/live-mic.js reference
Backward compatibility
Purely additive. Existing speech_to_text, SpeechToTextParams, SpeechToTextResponse untouched.
🤖 Generated with Claude Code
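For reviewers, the token-level overlap removal described under Design can be illustrated with a small sketch. It is not the PR's Stitcher: the function names and the maxOverlapTokens cap are hypothetical, tokens are split on whitespace, and unlike the real implementation it makes no attempt to preserve punctuation that is genuinely new. It only shows the core idea of fuzzy-matching the tail of the previous transcript against the head of the next one, allowing one edit on tokens of at least 4 characters.

```typescript
// Lowercase and drop punctuation so "morning," and "Morning" compare equal.
const normalize = (token: string): string =>
  token.toLowerCase().replace(/[^a-z0-9']/g, "");

// True when a and b differ by at most one substitution, insertion, or deletion.
function withinOneEdit(a: string, b: string): boolean {
  if (Math.abs(a.length - b.length) > 1) return false;
  let i = 0;
  let j = 0;
  let edits = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) { i++; j++; continue; }
    if (++edits > 1) return false;
    if (a.length > b.length) i++;        // extra character in a
    else if (a.length < b.length) j++;   // extra character in b
    else { i++; j++; }                   // substitution
  }
  // Any unconsumed trailing character counts as one more edit.
  return edits + (a.length - i) + (b.length - j) <= 1;
}

function tokensMatch(a: string, b: string): boolean {
  const x = normalize(a);
  const y = normalize(b);
  if (x === y) return true;
  // Fuzzy matching only for tokens of at least 4 characters, per the Design notes.
  return x.length >= 4 && y.length >= 4 && withinOneEdit(x, y);
}

// Returns the part of `next` that is not already covered by the tail of `prev`.
function stitch(prev: string, next: string, maxOverlapTokens = 20): string {
  const p = prev.split(/\s+/).filter(Boolean);
  const n = next.split(/\s+/).filter(Boolean);
  const limit = Math.min(p.length, n.length, maxOverlapTokens);
  // Try the longest candidate overlap first, then shrink.
  for (let k = limit; k > 0; k--) {
    let ok = true;
    for (let i = 0; i < k && ok; i++) {
      ok = tokensMatch(p[p.length - k + i], n[i]);
    }
    if (ok) return n.slice(k).join(" ");
  }
  return n.join(" ");
}
```

For example, stitching "we should meet on Tuesday morning," with "on Tuesday mornings, if that works" drops the three overlapping tokens and yields "if that works". Because each call depends on the previous transcript, this also shows why chunks have to be processed in order.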