feat(audio): live speech-to-text streaming transcriber by VineeTagarwaL-code · Pull Request #77 · JigsawStack/jigsawstack-js

VineeTagarwaL-code · 2026-04-21T02:15:18Z

Summary

Adds jigsaw.audio.speech_to_text_live(config?), a live streaming transcriber that exposes a WritableStream for mono PCM16 audio and emits open / delta / turn / warning / error / close events. Internally it handles WAV framing, chunk overlap, token-level stitching, and SSE parsing against /v1/ai/transcribe?stream=true.

Public API

import { Readable } from "stream";
import recorder from "node-record-lpcm16";
import { JigsawStack } from "jigsawstack";

const jigsaw = JigsawStack({ apiKey: process.env.JIGSAWSTACK_API_KEY });
const transcriber = jigsaw.audio.speech_to_text_live({ sampleRate: 16000 });

transcriber.on("delta", ({ text }) => process.stdout.write(`\r… ${text}`));
transcriber.on("turn",  ({ text }) => console.log(`\n${text}`));
transcriber.on("error", (err) => console.error(err));

await transcriber.connect();

const rec = recorder.record({ sampleRate: 16000, channels: 1, audioType: "raw" });
Readable.toWeb(rec.stream()).pipeTo(transcriber.stream()).catch((err) => console.error(err));

process.on("SIGINT", async () => { rec.stop(); await transcriber.close(); process.exit(); });

Full working example at examples/live-mic.js.

Design

Chunker buffers PCM16 bytes into overlapping windows (default 5s chunks with 2s overlap), wraps each window in a WAV header, and drops frames on buffer overflow (past maxBufferSeconds), emitting a warning.
Stitcher detects token-level overlap between adjacent chunk transcripts using fuzzy matching (a single-character substitution, insertion, or deletion is tolerated on tokens of 4 or more characters) and strips the duplicated tokens while preserving punctuation that is genuinely new.
SSE transport posts each WAV chunk and parses transcript.delta / transcript.done / transcript.final events via a new RequestClient.fetchJSSStream method.
Transcriber composes the above with a state machine (idle → open → closing → closed, plus an errored → closed branch), serial chunk processing (one request in flight at a time, as ordered stitching requires), a 30s AbortController timeout per chunk, and a fatal error after 3 consecutive chunk failures.
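The overlap stitching step can be illustrated with a minimal TypeScript sketch. It is not the PR's Stitcher: the names normalize, withinOneEdit, tokensMatch, and stitch, and the maxOverlapTokens cap, are hypothetical. Tokens are split on whitespace and compared case-insensitively with leading and trailing punctuation ignored (ASCII only, since streaming is English-only). Two tokens match if they are equal or, when both have at least 4 characters, differ by a single substitution, insertion, or deletion. Unlike the PR's Stitcher, this sketch makes no attempt to preserve new punctuation attached to an overlapping token.

```typescript
// Hypothetical sketch of token-overlap stitching; not the PR's Stitcher implementation.

// Lowercase and trim leading/trailing non-alphanumerics (ASCII only; streaming is English-only).
function normalize(token: string): string {
  return token.toLowerCase().replace(/^[^a-z0-9]+|[^a-z0-9]+$/g, "");
}

// True if a and b differ by at most one substitution, insertion, or deletion.
function withinOneEdit(a: string, b: string): boolean {
  if (Math.abs(a.length - b.length) > 1) return false;
  let i = 0;
  let j = 0;
  let edits = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) {
      i++;
      j++;
      continue;
    }
    if (++edits > 1) return false;
    if (a.length > b.length) i++; // skip one character of the longer string
    else if (a.length < b.length) j++;
    else {
      i++; // substitution: advance both
      j++;
    }
  }
  // A leftover trailing character in either string counts as one more edit.
  return edits + (a.length - i) + (b.length - j) <= 1;
}

// Exact match after normalization, or a one-edit fuzzy match when both tokens have 4+ chars.
function tokensMatch(a: string, b: string): boolean {
  const x = normalize(a);
  const y = normalize(b);
  if (x === y) return true;
  return x.length >= 4 && y.length >= 4 && withinOneEdit(x, y);
}

// Return the part of `next` that is new relative to `prev`: find the longest run of
// trailing prev tokens that matches the leading next tokens, and drop it from next.
function stitch(prev: string, next: string, maxOverlapTokens = 20): string {
  const p = prev.split(/\s+/).filter(Boolean);
  const n = next.split(/\s+/).filter(Boolean);
  for (let k = Math.min(maxOverlapTokens, p.length, n.length); k > 0; k--) {
    let matches = true;
    for (let t = 0; t < k && matches; t++) {
      matches = tokensMatch(p[p.length - k + t], n[t]);
    }
    if (matches) return n.slice(k).join(" ");
  }
  return n.join(" ");
}
```

Trying the longest candidate overlap first keeps a short coincidental match (for example a single repeated word) from ending the search early. A production stitcher would probably also want a minimum overlap length, since one shared word at a chunk boundary can be a genuine repetition.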
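On the SSE transport side, the parsing half can be sketched as an incremental parser over decoded text (for example text produced by a TextDecoder called with { stream: true }). This is a simplified subset of the Server-Sent Events parsing rules, not the PR's implementation: SSEEvent, parseFrame, and SSEParser are hypothetical names, and the id: and retry: fields are ignored. The transcript.delta / transcript.done / transcript.final names would arrive in the event field; the JSON payload shape is not described in the PR, so any field names read from it are assumptions.

```typescript
// Hypothetical sketch of an incremental SSE parser; not the PR's transport code.

interface SSEEvent {
  event: string; // "message" when the frame has no event: line
  data: string; // multiple data: lines joined with "\n"
}

// Parse one blank-line-terminated frame; returns null if it carries no data.
function parseFrame(frame: string): SSEEvent | null {
  let event = "message";
  const data: string[] = [];
  for (const line of frame.split("\n")) {
    if (line === "" || line.startsWith(":")) continue; // ":" lines are comments / keep-alives
    const colon = line.indexOf(":");
    const field = colon === -1 ? line : line.slice(0, colon);
    let value = colon === -1 ? "" : line.slice(colon + 1);
    if (value.startsWith(" ")) value = value.slice(1); // one leading space is not part of the value
    if (field === "event") event = value;
    else if (field === "data") data.push(value);
  }
  return data.length > 0 ? { event, data: data.join("\n") } : null;
}

// Network chunks can split a frame anywhere, so unparsed text is buffered between pushes.
class SSEParser {
  private buffer = "";

  push(chunk: string): SSEEvent[] {
    let text = this.buffer + chunk;
    // Hold back a trailing "\r": its "\n" may arrive in the next chunk.
    const held = text.endsWith("\r") ? "\r" : "";
    if (held) text = text.slice(0, -1);
    text = text.replace(/\r\n?/g, "\n"); // normalize CRLF and lone CR to LF

    const events: SSEEvent[] = [];
    let end = text.indexOf("\n\n");
    while (end !== -1) {
      const ev = parseFrame(text.slice(0, end));
      if (ev) events.push(ev);
      text = text.slice(end + 2);
      end = text.indexOf("\n\n");
    }
    this.buffer = text + held;
    return events;
  }
}
```

Each returned event could then be dispatched on its event name, for example calling JSON.parse(ev.data) for a transcript.delta frame, assuming the server sends JSON payloads.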

Key decisions

Streaming is English-only per the JigsawStack docs, so language is not exposed on LiveSTTConfig and is always sent as en internally.
Mono audio is required. The channels option is not exposed; users with stereo sources must downmix before piping (see the downmixToMono helper in tests/audio-live.test.ts). Exposing channels was a correctness hazard whenever it didn't match the actual audio.
No mic library in SDK deps: users bring their own audio source (browser MediaStream, Node node-record-lpcm16, file, etc.), which keeps the SDK bundle lean and platform-agnostic.
Serial chunk requests, not parallel: overlap stitching relies on prevTranscript being updated in chunk order, and parallel requests could complete out of order and produce wrong stitches.
The close event fires exactly once, guaranteed by the emitClose() helper, including after fatal errors.
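For stereo sources, a downmix along these lines works on interleaved little-endian PCM16. It is an illustrative sketch, not the downmixToMono helper from tests/audio-live.test.ts, whose signature is not shown here. The name downmixStereoPcm16 is hypothetical, and the function assumes whole stereo frames (4 bytes each), ignoring any trailing partial frame.

```typescript
// Hypothetical helper (not the PR's downmixToMono): interleaved stereo PCM16 LE -> mono PCM16 LE.
function downmixStereoPcm16(stereo: Uint8Array): Uint8Array {
  const frames = Math.floor(stereo.byteLength / 4); // 2 channels × 2 bytes per sample
  const src = new DataView(stereo.buffer, stereo.byteOffset, stereo.byteLength);
  const mono = new Uint8Array(frames * 2);
  const dst = new DataView(mono.buffer);
  for (let f = 0; f < frames; f++) {
    const left = src.getInt16(f * 4, true);
    const right = src.getInt16(f * 4 + 2, true);
    // The average of two int16 values always fits back into int16.
    dst.setInt16(f * 2, Math.round((left + right) / 2), true);
  }
  return mono;
}
```

In a pipeline it could be applied per chunk through a TransformStream placed before transcriber.stream(), as long as chunk boundaries fall on 4-byte frame boundaries; otherwise the leftover bytes have to be carried into the next chunk.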
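The serial-request and close-exactly-once decisions, together with the per-chunk timeout and failure limit described under Design, can be sketched as a small promise-chained queue. Everything here is hypothetical: SerialChunkRunner, RunnerHooks, and RunnerState stand in for the PR's Transcriber internals. The runner starts in the open state (the idle → open transition on connect() is omitted), and send is assumed to forward the AbortSignal to its HTTP request so the 30 s timeout actually cancels it.

```typescript
// Hypothetical sketch of the Transcriber's serial processing; not the PR's implementation.
type RunnerState = "open" | "closing" | "errored" | "closed";

interface RunnerHooks {
  send: (wav: Uint8Array, signal: AbortSignal) => Promise<string>;
  onTranscript: (text: string) => void;
  onError: (err: unknown) => void;
  onClose: () => void;
}

const CHUNK_TIMEOUT_MS = 30000;
const MAX_CONSECUTIVE_FAILURES = 3;

class SerialChunkRunner {
  state: RunnerState = "open";
  private tail: Promise<void> = Promise.resolve();
  private consecutiveFailures = 0;
  private closeEmitted = false;
  private readonly hooks: RunnerHooks;

  constructor(hooks: RunnerHooks) {
    this.hooks = hooks;
  }

  // Chain each chunk onto the previous one: at most one request is in flight,
  // so transcripts reach the stitcher in order.
  enqueue(wav: Uint8Array): Promise<void> {
    this.tail = this.tail.then(() => this.process(wav));
    return this.tail;
  }

  private async process(wav: Uint8Array): Promise<void> {
    if (this.state !== "open" && this.state !== "closing") return; // after a fatal error or close
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), CHUNK_TIMEOUT_MS);
    try {
      const text = await this.hooks.send(wav, controller.signal);
      this.consecutiveFailures = 0;
      this.hooks.onTranscript(text);
    } catch (err) {
      this.hooks.onError(err);
      if (++this.consecutiveFailures >= MAX_CONSECUTIVE_FAILURES) {
        this.state = "errored"; // errored → closed branch
        this.emitClose();
      }
    } finally {
      clearTimeout(timer);
    }
  }

  // Let queued chunks finish, then close. Safe to call repeatedly.
  async close(): Promise<void> {
    if (this.state === "open") this.state = "closing";
    await this.tail;
    this.emitClose();
  }

  // The single place that fires close, so it happens exactly once.
  private emitClose(): void {
    if (this.closeEmitted) return;
    this.closeEmitted = true;
    this.state = "closed";
    this.hooks.onClose();
  }
}
```

Chaining on a single tail promise is what limits the runner to one request in flight, and routing every close path through emitClose() is what lets a fatal error followed by a user close() still produce a single close event.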

Config

interface LiveSTTConfig {
  translate?: boolean;       // default false
  vad?: boolean;             // default true
  vadThreshold?: number;     // default 0.4
  sampleRate?: number;       // default 16000
  chunkSeconds?: number;     // default 5
  overlapSeconds?: number;   // default 2
  maxBufferSeconds?: number; // default 30
}
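With these defaults and mono PCM16 input, the chunk arithmetic works out to 16000 samples/s × 2 bytes = 32000 bytes/s, so a 5 s chunk holds 160000 bytes and the 2 s overlap is 64000 bytes. Assuming each chunk repeats the previous chunk's last 2 s, every new chunk carries 96000 bytes (3 s) of fresh audio, and maxBufferSeconds = 30 corresponds to 960000 bytes. The sketch below turns that arithmetic into a minimal chunker that wraps each window in a 44-byte canonical WAV header. OverlapChunker, ChunkerOptions, and wavHeader are hypothetical names, and the sketch omits the maxBufferSeconds overflow guard (frame dropping with a warning) and any flush of a trailing partial chunk on close.

```typescript
// Hypothetical sketch of overlap chunking + WAV framing; not the PR's Chunker.

// 44-byte canonical RIFF/WAVE header for mono, 16-bit, little-endian PCM.
function wavHeader(dataBytes: number, sampleRate: number): Uint8Array {
  const header = new Uint8Array(44);
  const v = new DataView(header.buffer);
  const ascii = (offset: number, s: string): void => {
    for (let i = 0; i < s.length; i++) v.setUint8(offset + i, s.charCodeAt(i));
  };
  ascii(0, "RIFF");
  v.setUint32(4, 36 + dataBytes, true); // size of everything after this field
  ascii(8, "WAVE");
  ascii(12, "fmt ");
  v.setUint32(16, 16, true); // fmt chunk size for PCM
  v.setUint16(20, 1, true); // audio format 1 = integer PCM
  v.setUint16(22, 1, true); // 1 channel (mono)
  v.setUint32(24, sampleRate, true);
  v.setUint32(28, sampleRate * 2, true); // byte rate = sampleRate × block align
  v.setUint16(32, 2, true); // block align = channels × 2 bytes
  v.setUint16(34, 16, true); // bits per sample
  ascii(36, "data");
  v.setUint32(40, dataBytes, true);
  return header;
}

interface ChunkerOptions {
  sampleRate: number;
  chunkSeconds: number;
  overlapSeconds: number;
}

// Emits one WAV chunk once chunkSeconds of audio have arrived, then one per
// (chunkSeconds - overlapSeconds) of new audio; each chunk repeats the previous overlap.
class OverlapChunker {
  private buffer: Uint8Array = new Uint8Array(0);
  private readonly sampleRate: number;
  private readonly chunkBytes: number;
  private readonly strideBytes: number;

  constructor(opts: ChunkerOptions) {
    // Whole samples × 2 bytes keeps every boundary sample-aligned.
    const toBytes = (seconds: number): number => Math.floor(seconds * opts.sampleRate) * 2;
    this.sampleRate = opts.sampleRate;
    this.chunkBytes = toBytes(opts.chunkSeconds);
    this.strideBytes = this.chunkBytes - toBytes(opts.overlapSeconds);
    if (this.strideBytes <= 0) throw new Error("overlapSeconds must be smaller than chunkSeconds");
  }

  // Append raw PCM16 bytes and return any WAV chunks that are now complete.
  push(pcm: Uint8Array): Uint8Array[] {
    const merged = new Uint8Array(this.buffer.length + pcm.length);
    merged.set(this.buffer);
    merged.set(pcm, this.buffer.length);
    this.buffer = merged;

    const chunks: Uint8Array[] = [];
    while (this.buffer.length >= this.chunkBytes) {
      const wav = new Uint8Array(44 + this.chunkBytes);
      wav.set(wavHeader(this.chunkBytes, this.sampleRate));
      wav.set(this.buffer.subarray(0, this.chunkBytes), 44);
      chunks.push(wav);
      // Advance by the stride; the trailing overlap bytes stay to start the next chunk.
      this.buffer = this.buffer.slice(this.strideBytes);
    }
    return chunks;
  }
}
```

With the defaults, feeding eight 1 s buffers yields two chunks, emitted after 5 s and 8 s of input, the second covering 3 s to 8 s.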

Test plan

36 unit tests across tests/live/ (chunker, stitcher, sse, transcriber, request-stream). All green.
yarn test:all glob extended to include tests/live/*.ts so they run in CI.
Build clean (yarn build).
Biome formatter applied.
Opt-in live integration test at tests/audio-live.test.ts (gated on JIGSAWSTACK_API_KEY, runnable via yarn test:audio:live) — passed against staging.
Manually verified with a real mic against the live API (see the demo walkthrough in the feature branch's dev history).

Docs

Design spec: docs/superpowers/specs/2026-04-21-live-stt-design.md
Implementation plan: docs/superpowers/plans/2026-04-21-live-stt.md
README section added
examples/live-mic.js reference

Backward compatibility

Purely additive. Existing speech_to_text, SpeechToTextParams, SpeechToTextResponse untouched.

🤖 Generated with Claude Code

Design for a streaming transcriber (jigsaw.audio.speech_to_text_live) that accepts a WritableStream of PCM16 audio, internally chunks with overlap, and stitches SSE transcripts into delta/turn events. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

TDD plan in 10 tasks: fetchJSSStream extension, Chunker, Stitcher, SSE transport, LiveSTT types, Transcriber class, API wiring, cleanup (drop node-record-lpcm16 from deps), opt-in integration test, README. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

VineeTagarwaL-code · 2026-04-21T02:16:38Z

When it's ready to merge, we will delete the docs; they'll come in handy while reviewing or iterating on this.

VineeTagarwaL-code and others added 24 commits April 21, 2026 06:04

docs: clarify hallucinationPhrases config in live stt design

311635c

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

docs: drop hallucinationPhrases from live stt design

cd1b6d1

Removed from config, types, stitcher responsibility, file-tree comment, and stitcher test list. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

docs: make live-stt plan task 8 robust to starting branch state

bf2ff10

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

feat(request): add fetchJSSStream for raw Response access

3630bf1

feat(audio/live): add PCM16 Chunker with overlap + WAV framing

11e5abf

fix(audio/live): adjust pendingChunkBytes on overflow drops

a22b1fa

feat(audio/live): add Stitcher with token overlap + fuzzy match

752b3f8

style(audio/live): sort stitcher test imports for biome

60aa05c

style(tests/live): sort imports for biome in chunker + request-stream tests

d38da93

feat(audio/live): add SSE transcript parser

83088b6

feat(audio): add LiveSTT types

471da93

feat(audio/live): add Transcriber with state machine + events

38a5951

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fix(audio/live): restore maxBufferSeconds guard + adjust overflow test config

7978536

style(audio/live): apply biome formatter

8796bbf

feat(audio): expose speech_to_text_live on audio namespace

dd2d466

chore: add examples/live-mic.js reference

76807ff

test(audio): add opt-in live STT integration test

c62d9d7

docs: document speech_to_text_live in README

083c130

fix: run tests/live in test:all + biome sort for live-mic example

f4892ee

fix(audio/live): strip trailing punctuation shared with previous commit

89427e3

feat(audio/live): expose vad toggle + document streaming is English-only

e78bcd9

feat(audio/live): drop language + channels config (hardcode en + mono)

3c84df7

VineeTagarwaL-code wants to merge 24 commits into main from feat/live-stt
