feat(rime): add WebSocket streaming TTS support #5663

Open
mcullan wants to merge 5 commits into livekit:main from rimelabs:feat/rime-tts-websocket

@mcullan commented May 6, 2026

Summary

Adds opt-in WebSocket streaming to the Rime TTS plugin via a new use_websocket=True constructor argument. The existing HTTP synthesize path is unchanged and remains the default. When enabled, the plugin sets streaming=True and aligned_transcript=True during construction, opens a long-lived pooled WebSocket to Rime's /ws3 endpoint, and emits word-level timestamps via push_timed_transcript.

New constructor arguments

  • use_websocket: bool = False — opt into the streaming path. Off by default so existing consumers see no behavior change.
  • ws_base_url: str = "wss://users-ws.rime.ai" — overridable for self-hosted deployments, parallel to the existing base_url.
  • segment: NotGivenOr[str] = NOT_GIVEN — passed to Rime as a connect-time query param. Defaults to "bySentence" (server-side sentence buffering, mirrors StreamAdapter semantics). Pass "immediate" if the consumer is already feeding sentence-tokenized text and wants to skip server-side buffering.
  • tokenizer: NotGivenOr[tokenize.SentenceTokenizer] = NOT_GIVEN — overridable client-side sentence tokenizer. Defaults to tokenize.blingfire.SentenceTokenizer(). Mirrors the hook Cartesia exposes.

Implementation

The streaming class is similar to the implementation in the Cartesia plugin: single-context JSON-envelope WebSocket, base64 PCM audio frames, weakref.WeakSet[SynthesizeStream] for cleanup, utils.ConnectionPool[aiohttp.ClientWebSocketResponse] with max_session_duration=300 and mark_refreshed_on_get=True. Word timestamps are pushed as TimedString.
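
To make the envelope format concrete, here is a minimal sketch of decoding one incoming text frame into PCM bytes or word timings. The JSON field names (`type`, `data`, `word_timestamps`) and the `TimedWord` type are assumptions for illustration; the plugin's actual receive loop and the `TimedString` type are not reproduced here.

```python
import base64
import json
from dataclasses import dataclass


@dataclass
class TimedWord:
    """Hypothetical stand-in for a word with start/end times in seconds."""
    text: str
    start_time: float
    end_time: float


def parse_message(raw: str) -> tuple[bytes, list[TimedWord]]:
    """Decode one JSON text frame into (pcm_bytes, timed_words).

    The field names below are assumed for this sketch, not taken from the PR.
    """
    msg = json.loads(raw)
    kind = msg.get("type")
    if kind == "chunk":
        # Audio arrives as base64-encoded PCM inside the JSON envelope.
        return base64.b64decode(msg["data"]), []
    if kind == "timestamps":
        ts = msg["word_timestamps"]
        words = [
            TimedWord(w, s, e)
            for w, s, e in zip(ts["words"], ts["start"], ts["end"])
        ]
        return b"", words
    # Unknown message types carry neither audio nor timings.
    return b"", []
```

In a receive task, the audio bytes would be forwarded to the audio emitter and the timed words to push_timed_transcript.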

Connection lifecycle:

  • _connect_ws opens the pooled WebSocket using the URL built from current options. Connect-time errors propagate to the outer _run exception block, which classifies aiohttp.ClientResponseError (covering WSServerHandshakeError) as APIStatusError with the HTTP status code preserved.
  • _close_ws follows the graceful-shutdown pattern in the Deepgram plugin: send the eos operation, wait up to one second for the server's ack, and suppress-and-log any send or recv errors during teardown so they don't mask the original cause that evicted the connection from the pool.
  • update_options invalidates the pool when the WebSocket URL changes, computed via a before/after _ws_url() diff. This automatically handles model swaps, speaker swaps, and any per-model option that participates in the URL.
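
The teardown behavior described for _close_ws can be sketched roughly as follows. This is an illustration, not the plugin's code: the function name `close_ws_gracefully` and the `{"operation": "eos"}` payload shape are assumptions, and `ws` is any object exposing the `send_str`, `receive`, and `close` coroutines that aiohttp's client WebSocket provides.

```python
import asyncio
import json
import logging

logger = logging.getLogger("rime-ws-sketch")


async def close_ws_gracefully(ws, ack_timeout: float = 1.0) -> None:
    """Send eos, wait briefly for an ack, and never raise during teardown."""
    try:
        # Assumed payload shape for the eos operation.
        await ws.send_str(json.dumps({"operation": "eos"}))
        # Bounded wait: a silent server must not stall pool cleanup.
        await asyncio.wait_for(ws.receive(), timeout=ack_timeout)
    except Exception as exc:
        # Log and swallow send/recv errors (including the ack timeout) so they
        # do not mask whatever originally evicted the connection.
        logger.warning("ignoring error during WebSocket teardown: %s", exc)
    finally:
        try:
            await ws.close()
        except Exception as exc:
            logger.warning("ignoring error while closing WebSocket: %s", exc)
```

Catching `Exception` rather than `BaseException` leaves task cancellation free to propagate, which is usually what a pool's close callback wants.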

A small _model_params(opts) helper consolidates the per-model option walking shared between the WebSocket query string and the HTTP JSON body.
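
A minimal sketch of how a shared parameter helper and the before/after URL diff could fit together. Everything here is a simplified stand-in: the `Opts` fields, the `modelId`/`speaker`/`text` keys, the placeholder speaker name, and `PoolStub` are hypothetical, and the real helper walks more per-model options than shown.

```python
from dataclasses import dataclass, replace
from urllib.parse import urlencode


@dataclass(frozen=True)
class Opts:
    """Hypothetical, trimmed-down options object."""
    model: str = "arcana"
    speaker: str = "example_speaker"  # placeholder value, not a real voice
    segment: str = "bySentence"
    ws_base_url: str = "wss://users-ws.rime.ai"


def model_params(opts: Opts) -> dict:
    # Single source of truth for options shared by both transports.
    return {"modelId": opts.model, "speaker": opts.speaker}


def ws_url(opts: Opts) -> str:
    # WebSocket path: shared params become connect-time query parameters.
    query = {**model_params(opts), "segment": opts.segment}
    return f"{opts.ws_base_url}/ws3?{urlencode(query)}"


def http_body(opts: Opts, text: str) -> dict:
    # HTTP path: the same params go into the JSON body.
    return {**model_params(opts), "text": text}


class PoolStub:
    """Stand-in for a connection pool; only counts invalidations."""

    def __init__(self) -> None:
        self.invalidations = 0

    def invalidate(self) -> None:
        self.invalidations += 1


def update_options(opts: Opts, pool: PoolStub, **changes) -> Opts:
    before = ws_url(opts)
    new_opts = replace(opts, **changes)
    # Any change that alters the URL requires a fresh handshake.
    if ws_url(new_opts) != before:
        pool.invalidate()
    return new_opts
```

Diffing the URL rather than individual fields means a newly added URL-level option is covered without touching update_options.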

Routes through /ws3, which accepts every model the plugin supports (mistv2, mistv3, arcana). The older /ws2 endpoint is not wired in.

Validation

  • update_options mid-session: model swap drops the existing pooled connection and reconnects with the new URL. Verified by observing two distinct _connect_ws calls and matching audio output.
  • Error propagation: invalid API key surfaces as APIStatusError(status_code=401) with the server message preserved, rather than a generic APIConnectionError.
  • Empty-input fast-fail: tts.stream() followed by end_input() with no push_text() raises APIError immediately at the protocol layer rather than hanging on the receive timeout.
  • Pool reuse: streams created within the max_session_duration window share the same WebSocket — no new handshake.
  • HTTP path unchanged: with use_websocket=False (default), synthesize() behavior is identical to before; _run payload assembly continues to use the same _model_params helper plus HTTP-only fields (samplingRate, reduceLatency for mistv2).

Adds opt-in WS streaming to the Rime TTS plugin via use_websocket=True.
Pattern mirrors the Cartesia plugin: single-context JSON+base64 WS,
ConnectionPool with mark_refreshed_on_get=True, blingfire sentence
tokenizer, weakref.WeakSet for stream cleanup.

- New SynthesizeStream class with input/send/recv task split
- _connect_ws / _close_ws (eos shutdown, mirrors Deepgram)
- _model_params helper consolidates the arcana/mist option-walking
  shared between the WS query string and the HTTP body
- update_options invalidates the pool when the WS URL changes,
  computed via before/after _ws_url() diff
- Capabilities flips streaming and aligned_transcript on with the flag
- Routes to /ws3 only (mistv2 stays HTTP-only)
@mcullan mcullan force-pushed the feat/rime-tts-websocket branch from 012c6ec to 50518df Compare May 6, 2026 23:22