feat(rime): add WebSocket streaming TTS support by mcullan · Pull Request #5663 · livekit/agents

mcullan · 2026-05-06T23:18:28Z

Summary

Adds opt-in WebSocket streaming to the Rime TTS plugin via a new use_websocket=True constructor argument. The existing HTTP synthesize path is unchanged and remains the default. When enabled, the plugin sets streaming=True and aligned_transcript=True during construction, opens a long-lived pooled WebSocket to Rime's /ws3 endpoint, and emits word-level timestamps via push_timed_transcript.

New constructor arguments

use_websocket: bool = False — opt into the streaming path. Off by default so existing consumers see no behavior change.
ws_base_url: str = "wss://users-ws.rime.ai" — overridable for self-hosted deployments, parallel to the existing base_url.
segment: NotGivenOr[str] = NOT_GIVEN — passed to Rime as a connect-time query param. Defaults to "bySentence" (server-side sentence buffering, mirrors StreamAdapter semantics). Pass "immediate" if the consumer is already feeding sentence-tokenized text and wants to skip server-side buffering.
tokenizer: NotGivenOr[tokenize.SentenceTokenizer] = NOT_GIVEN — overridable client-side sentence tokenizer. Defaults to tokenize.blingfire.SentenceTokenizer(). Mirrors the hook Cartesia exposes.
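The defaulting rules above can be illustrated with a small stand-alone sketch. It does not import the plugin: `_NotGiven`, `NOT_GIVEN`, `RimeWsOptions`, and `resolve_options` are hypothetical stand-ins that only mirror the documented defaults (`use_websocket=False`, the default `ws_base_url`, and `segment` falling back to `"bySentence"` when omitted).

```python
from dataclasses import dataclass
from typing import Union


class _NotGiven:
    """Hypothetical stand-in for the library's NOT_GIVEN sentinel."""

    def __repr__(self) -> str:
        return "NOT_GIVEN"


NOT_GIVEN = _NotGiven()


@dataclass(frozen=True)
class RimeWsOptions:
    """Illustrative container; not the plugin's real options class."""

    use_websocket: bool
    ws_base_url: str
    segment: str


def resolve_options(
    use_websocket: bool = False,
    ws_base_url: str = "wss://users-ws.rime.ai",
    segment: Union[str, _NotGiven] = NOT_GIVEN,
) -> RimeWsOptions:
    # An omitted segment falls back to server-side sentence buffering.
    resolved = "bySentence" if isinstance(segment, _NotGiven) else segment
    return RimeWsOptions(use_websocket, ws_base_url, resolved)
```

A caller that already feeds sentence-tokenized text would pass `segment="immediate"`; everyone else gets `"bySentence"` without naming it.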

Implementation

The streaming class is similar to the implementation in the Cartesia plugin: single-context JSON-envelope WebSocket, base64 PCM audio frames, weakref.WeakSet[SynthesizeStream] for cleanup, utils.ConnectionPool[aiohttp.ClientWebSocketResponse] with max_session_duration=300 and mark_refreshed_on_get=True. Word timestamps are pushed as TimedString.
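As a rough illustration of the JSON-envelope handling described above, the sketch below decodes one text frame into either raw PCM bytes or a list of timed words. The message schema is an assumption made for this example (the `type`, `data`, and `word_timestamps` field names are not taken from this PR), and `TimedWord` is a hypothetical stand-in for TimedString.

```python
import base64
import json
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class TimedWord:
    """Hypothetical stand-in for a word with start/end times in seconds."""

    text: str
    start: float
    end: float


def decode_envelope(raw: str) -> Optional[Union[bytes, List[TimedWord]]]:
    """Decode one JSON text frame (assumed schema, for illustration only)."""
    msg = json.loads(raw)
    kind = msg.get("type")
    if kind == "chunk":
        # Audio arrives base64-encoded inside the JSON envelope.
        return base64.b64decode(msg["data"])
    if kind == "timestamps":
        ts = msg["word_timestamps"]
        return [
            TimedWord(word, start, end)
            for word, start, end in zip(ts["words"], ts["start"], ts["end"])
        ]
    # Unknown message types are ignored in this sketch.
    return None
```

In a receive loop, bytes results would be pushed to the audio emitter and timed words forwarded as the aligned transcript.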

Connection lifecycle:

_connect_ws opens the pooled WebSocket using the URL built from current options. Connect-time errors propagate to the outer _run exception block, which classifies aiohttp.ClientResponseError (covering WSServerHandshakeError) as APIStatusError with the HTTP status code preserved.
_close_ws follows the graceful-shutdown pattern in the Deepgram plugin: it sends the eos operation, waits up to one second for the server's ack, and suppresses and logs any send or recv errors during teardown so they don't mask the original cause that evicted the connection from the pool.
update_options compares _ws_url() before and after applying the new options and invalidates the pool when the WebSocket URL differs. This automatically handles model swaps, speaker swaps, and any per-model option that participates in the URL.
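The graceful-shutdown step in the lifecycle above can be sketched with the standard library alone. `FakeWebSocket` and `close_ws` are hypothetical names for this example, and the `{"operation": "eos"}` payload shape is an assumption; only the behavior (send eos, wait a bounded time for the ack, log and swallow teardown errors, always close) follows the description.

```python
import asyncio
import json
import logging

logger = logging.getLogger("rime_ws_sketch")


class FakeWebSocket:
    """Hypothetical stand-in for a WebSocket connection, used only here."""

    def __init__(self, ack_delay=None, fail_send=False):
        self.sent = []
        self.closed = False
        self._ack_delay = ack_delay  # None means the server never acks
        self._fail_send = fail_send

    async def send_str(self, data):
        if self._fail_send:
            raise ConnectionResetError("peer went away")
        self.sent.append(data)

    async def receive(self):
        if self._ack_delay is None:
            await asyncio.Event().wait()  # blocks until cancelled
        await asyncio.sleep(self._ack_delay)
        return "ack"

    async def close(self):
        self.closed = True


async def close_ws(ws, ack_timeout=1.0):
    """Send eos, wait briefly for the ack, and never raise during teardown."""
    try:
        await ws.send_str(json.dumps({"operation": "eos"}))  # assumed payload
        await asyncio.wait_for(ws.receive(), timeout=ack_timeout)
    except Exception as exc:  # includes the timeout raised by wait_for
        logger.debug("ignoring error during WebSocket teardown: %r", exc)
    finally:
        await ws.close()
```

Swallowing errors here is deliberate: by the time a connection is being closed, the interesting failure has already been reported elsewhere, and a second exception would only obscure it.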

A small _model_params(opts) helper consolidates the per-model option walking shared between the WebSocket query string and the HTTP JSON body.
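A minimal sketch of how one helper can feed both transports, assuming a plain dict of options. The option and parameter names used here (`speaker`, `modelId`, `temperature`, `top_p`, `speedAlpha`, `pauseBetweenBrackets`) are illustrative and not confirmed by this PR, and `model_params`, `ws_url`, and `http_body` are hypothetical functions, not the plugin's code. The lowercase boolean encoding reflects the "encode WS query bools as lowercase strings" fix in this PR's commit list.

```python
from urllib.parse import urlencode


def model_params(opts: dict) -> dict:
    """Collect the per-model options that are set, skipping unset ones."""
    if opts["model"] == "arcana":
        keys = ("temperature", "top_p")  # illustrative names
    else:  # mist family
        keys = ("speedAlpha", "pauseBetweenBrackets")  # illustrative names
    return {k: opts[k] for k in keys if opts.get(k) is not None}


def _query_value(value) -> str:
    # str(True) would give "True"; query strings carry lowercase booleans.
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def ws_url(base_url: str, opts: dict) -> str:
    """Build the connect-time URL; every option becomes a query parameter."""
    params = {"speaker": opts["speaker"], "modelId": opts["model"]}
    params.update(model_params(opts))
    query = urlencode({k: _query_value(v) for k, v in params.items()})
    return f"{base_url}/ws3?{query}"


def http_body(opts: dict, text: str) -> dict:
    """Build the HTTP JSON body; values keep their native JSON types."""
    body = {"text": text, "speaker": opts["speaker"], "modelId": opts["model"]}
    body.update(model_params(opts))
    return body
```

Keeping the option walk in one place means a new per-model option only has to be added once to reach both the query string and the JSON body.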

The streaming path routes through /ws3, which accepts every model the plugin supports (mistv2, mistv3, arcana). The older /ws2 endpoint is not wired in.

Validating

update_options mid-session: model swap drops the existing pooled connection and reconnects with the new URL. Verified by observing two distinct _connect_ws calls and matching audio output.
Error propagation: invalid API key surfaces as APIStatusError(status_code=401) with the server message preserved, rather than a generic APIConnectionError.
Empty-input fast-fail: tts.stream() followed by end_input() with no push_text() raises APIError immediately at the protocol layer rather than hanging on the receive timeout.
Pool reuse: streams created within the max_session_duration window share the same WebSocket — no new handshake.
HTTP path unchanged: with use_websocket=False (default), synthesize() behavior is identical to before; _run payload assembly continues to use the same _model_params helper plus HTTP-only fields (samplingRate, reduceLatency for mistv2).
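The pool-reuse and mid-session swap checks above can be reproduced in miniature. This is a sketch under stated assumptions: `FakePool` and `SketchTTS` are hypothetical stand-ins (the "connection" is just the URL string it was opened with), and the query parameter names are illustrative. It shows only the before/after URL comparison that decides whether the pooled connection is dropped.

```python
from urllib.parse import urlencode


class FakePool:
    """Hypothetical stand-in for a connection pool; counts handshakes."""

    def __init__(self, connect):
        self._connect = connect
        self._conn = None
        self.connect_calls = 0

    def get(self):
        # Reuse the pooled connection if there is one, else connect.
        if self._conn is None:
            self._conn = self._connect()
            self.connect_calls += 1
        return self._conn

    def invalidate(self):
        self._conn = None


class SketchTTS:
    """Illustrative only; not the plugin's TTS class."""

    def __init__(self, model, speaker):
        self._opts = {"modelId": model, "speaker": speaker}
        # The fake "connection" is the URL it would have been opened with.
        self.pool = FakePool(lambda: self._ws_url())

    def _ws_url(self):
        return "wss://users-ws.rime.ai/ws3?" + urlencode(self._opts)

    def update_options(self, **changes):
        before = self._ws_url()
        self._opts.update(changes)
        # Only a change that alters the URL forces a reconnect.
        if self._ws_url() != before:
            self.pool.invalidate()
```

Calling `update_options` with a value that leaves the URL unchanged keeps the existing connection, while a model swap produces a second connect call, matching the two distinct _connect_ws calls observed in validation.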

Adds opt-in WS streaming to the Rime TTS plugin via use_websocket=True. Pattern mirrors the Cartesia plugin: single-context JSON+base64 WS, ConnectionPool with mark_refreshed_on_get=True, blingfire sentence tokenizer, weakref.WeakSet for stream cleanup.

- New SynthesizeStream class with input/send/recv task split
- _connect_ws / _close_ws (eos shutdown, mirrors Deepgram)
- _model_params helper consolidates the arcana/mist option-walking shared between the WS query string and the HTTP body
- update_options invalidates the pool when the WS URL changes, computed via before/after _ws_url() diff
- Capabilities flips streaming and aligned_transcript on with the flag
- Routes to /ws3 only (mistv2 stays HTTP-only)

mcullan force-pushed the feat/rime-tts-websocket branch from 012c6ec to 50518df on May 6, 2026 23:22

fix(rime): synchronize WS recv with first send to avoid spurious timeout

1a914f6

fix(rime): encode WS query bools as lowercase strings

04a9c02

mcullan added 2 commits May 7, 2026 10:36

style(rime): ruff format

f06a0c0

fix(rime): exit cleanly on empty WS input instead of raising

b5b5e2b

mcullan wants to merge 5 commits into livekit:main from rimelabs:feat/rime-tts-websocket
