update inference models to match the latest #4597
base: main
Conversation
📝 Walkthrough

Updates to model/type declarations across LLM, STT, and TTS inference modules: LLM models rebalanced (OpenAI/Google/DeepSeek/Qwen/LLM union), STT adds Elevenlabs and modifies Deepgram/AssemblyAI, and TTS adds Deepgram plus Cartesia/Inworld/Rime model adjustments and option TypedDicts.

Changes
Sequence Diagram(s)

Skipped — changes are type/enum/typing updates rather than new multi-component control flows.

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@livekit-agents/livekit/agents/inference/llm.py`:
- Around line 64-69: The DeepSeekModels literal includes an unsupported
identifier; update the DeepSeekModels union to remove or replace
"deepseek-ai/deepseek-v3.2" — change it to the supported
"deepseek-ai/deepseek-v3.1" (or remove that entry entirely) so LLMModels (which
unions OpenAIModels, GoogleModels, KimiModels, DeepSeekModels) only references
valid DeepSeek model IDs; edit the DeepSeekModels definition accordingly.
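A minimal sketch of what the corrected alias could look like, assuming the fix keeps a single DeepSeek entry; the only identifier shown is the supported one named in the comment above (the rest of the file's unions are not reproduced here):

```python
from typing import Literal

# Hypothetical corrected definition: "deepseek-ai/deepseek-v3.2" replaced
# with the supported "deepseek-ai/deepseek-v3.1" identifier.
DeepSeekModels = Literal["deepseek-ai/deepseek-v3.1"]
```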
In `@livekit-agents/livekit/agents/inference/stt.py`:
- Around line 29-43: Update the Deepgram model identifier in the DeepgramModels
Literal to match Deepgram's API by replacing "deepgram/flux-general" with
"deepgram/flux-general-en"; specifically edit the DeepgramModels definition in
livekit/agents/inference/stt.py so it aligns with the plugin's V2Models and
official docs (refer to the DeepgramModels symbol to locate the change).
In `@livekit-agents/livekit/agents/inference/tts.py`:
- Around line 108-112: The CartesiaOptions TypedDict currently mismatches the
Cartesia plugin: change the speed field from Literal["slow","normal","fast"] to
a numeric type (float) because the plugin checks `isinstance(self._opts.speed,
float)` for sonic-3 models, and change emotion from `str` to a sequence type
(e.g., `Sequence[str]` or `List[str]`) because the plugin accesses
`opts.emotion[0]`; keep volume as `float`. Update the CartesiaOptions definition
accordingly so it aligns with the plugin's expectations.
🧹 Nitpick comments (1)

livekit-agents/livekit/agents/inference/tts.py (1)

114-115: DeepgramOptions should enumerate supported parameters instead of remaining empty.
Deepgram TTS API supports multiple query parameters including `encoding`, `container`, `sample_rate`, `bit_rate`, `tag`, `callback`, `callback_method`, and `mip_opt_out`. The empty TypedDict currently blocks type-safe access to these options.

♻️ Optional: widen DeepgramOptions until keys are enumerated

```diff
-class DeepgramOptions(TypedDict, total=False):
-    pass
+DeepgramOptions = dict[str, Any]
```

Also applies to: 173-190, 280-282
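If the keys are eventually enumerated, a possible sketch is below. The key names come from the parameter list in the comment above; the value types are assumptions, not taken from Deepgram's documentation:

```python
from typing import TypedDict

# Hypothetical enumerated DeepgramOptions; value types are guesses.
class DeepgramOptions(TypedDict, total=False):
    encoding: str
    container: str
    sample_rate: int
    bit_rate: int
    tag: str
    callback: str
    callback_method: str
    mip_opt_out: bool
```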
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
uv.lock is excluded by `!**/*.lock`
📒 Files selected for processing (3)
- livekit-agents/livekit/agents/inference/llm.py
- livekit-agents/livekit/agents/inference/stt.py
- livekit-agents/livekit/agents/inference/tts.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Format code with ruff
Run ruff linter and auto-fix issues
Run mypy type checker in strict mode
Maintain line length of 100 characters maximum
Ensure Python 3.9+ compatibility
Use Google-style docstrings
Files:
- livekit-agents/livekit/agents/inference/stt.py
- livekit-agents/livekit/agents/inference/tts.py
- livekit-agents/livekit/agents/inference/llm.py
🧠 Learnings (4)
📓 Common learnings
Learnt from: CR
Repo: livekit/agents PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-16T07:44:56.353Z
Learning: Implement Model Interface Pattern for STT, TTS, LLM, and Realtime models with provider-agnostic interfaces, fallback adapters for resilience, and stream adapters for different streaming patterns
📚 Learning: 2026-01-16T07:44:56.353Z
Learnt from: CR
Repo: livekit/agents PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-16T07:44:56.353Z
Learning: Implement Model Interface Pattern for STT, TTS, LLM, and Realtime models with provider-agnostic interfaces, fallback adapters for resilience, and stream adapters for different streaming patterns
Applied to files:
livekit-agents/livekit/agents/inference/stt.py
📚 Learning: 2026-01-22T03:28:16.289Z
Learnt from: longcw
Repo: livekit/agents PR: 4563
File: livekit-agents/livekit/agents/beta/tools/end_call.py:65-65
Timestamp: 2026-01-22T03:28:16.289Z
Learning: In code paths that check capabilities or behavior of the LLM processing the current interaction, prefer using the activity's LLM obtained via ctx.session.current_agent._get_activity_or_raise().llm instead of ctx.session.llm. The session-level LLM may be a fallback and not reflect the actual agent handling the interaction. Use the activity LLM to determine capabilities and to make capability checks or feature toggles relevant to the current processing agent.
Applied to files:
- livekit-agents/livekit/agents/inference/stt.py
- livekit-agents/livekit/agents/inference/tts.py
- livekit-agents/livekit/agents/inference/llm.py
📚 Learning: 2026-01-18T01:09:01.847Z
Learnt from: davidzhao
Repo: livekit/agents PR: 4548
File: livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/responses/llm.py:77-81
Timestamp: 2026-01-18T01:09:01.847Z
Learning: In the OpenAI responses LLM (`livekit-plugins-openai/livekit/plugins/openai/responses/llm.py`), reasoning effort defaults are intentionally set lower than OpenAI's API defaults for voice interactions: "none" for gpt-5.1/gpt-5.2 and "minimal" for other reasoning-capable models like gpt-5, to avoid enabling reasoning by default in voice contexts.
Applied to files:
livekit-agents/livekit/agents/inference/llm.py
🧬 Code graph analysis (2)
livekit-agents/livekit/agents/inference/stt.py (2)
- livekit-plugins/livekit-plugins-assemblyai/livekit/plugins/assemblyai/stt.py (1): model (113-114)
- livekit-agents/livekit/agents/stt/stt.py (1): model (115-124)

livekit-agents/livekit/agents/inference/tts.py (1)
- livekit-agents/livekit/agents/inference/stt.py (2): DeepgramOptions (51-62), model (343-344)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: livekit-plugins-inworld
- GitHub Check: livekit-plugins-openai
- GitHub Check: livekit-plugins-deepgram
- GitHub Check: unit-tests
- GitHub Check: type-check (3.9)
- GitHub Check: type-check (3.13)
🔇 Additional comments (4)
livekit-agents/livekit/agents/inference/stt.py (2)

73-80: Elevenlabs option keys/units verified as correct.
All field names and units in `ElevenlabsOptions` match the official Elevenlabs scribe_v2_realtime API documentation: `commit_strategy` (enum), `include_timestamps` (bool), `vad_silence_threshold_secs` (double, in seconds), `vad_threshold` (double), `min_speech_duration_ms` (int, in milliseconds), and `min_silence_duration_ms` (int, in milliseconds). These will be correctly recognized by the Elevenlabs provider.
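The verified fields above can be sketched as a TypedDict; the field names and units follow the comment, but the `commit_strategy` type is left as `str` since the concrete enum values are not stated here:

```python
from typing import TypedDict

# Sketch of ElevenlabsOptions per the verified field list above.
# commit_strategy is an enum in the API; str is used here as a placeholder.
class ElevenlabsOptions(TypedDict, total=False):
    commit_strategy: str
    include_timestamps: bool
    vad_silence_threshold_secs: float
    vad_threshold: float
    min_speech_duration_ms: int
    min_silence_duration_ms: int
```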
127-133: Gateway indeed uses `settings.extra` (the SDK's `extra_kwargs`) for Elevenlabs options; verify the type definitions match.
The LiveKit gateway does pass Elevenlabs STT options through `settings.extra` (exposed as `extra_kwargs` in the SDK) as a passthrough to the ElevenLabs API. Valid options include `language_code`, `audio_format`, `commit_strategy`, `include_timestamps`, `vad_silence_threshold_secs`, and similar session config parameters for scribe_v2_realtime. However, manual code inspection is needed to verify that the union definitions (lines 127-133), overloads (lines 208-223), and `extra_kwargs` typing (lines 253-255) correctly reflect this passthrough pattern and match the expected payload structure.

livekit-agents/livekit/agents/inference/tts.py (1)

21-55: All TTS model identifiers are supported by their respective providers and will not cause invalid-model errors. Verification confirms: Cartesia (sonic-3, sonic-2, sonic-turbo, sonic), Deepgram (aura-2, aura), Rime (arcana, mistv2), Inworld (inworld-tts-1.5-max, inworld-tts-1.5-mini, inworld-tts-1-max, inworld-tts-1), and ElevenLabs (eleven_flash_v2, eleven_flash_v2_5, eleven_turbo_v2, eleven_turbo_v2_5, eleven_multilingual_v2) are all officially supported. The code also complies with the 100-character line length requirement.

livekit-agents/livekit/agents/inference/llm.py (1)

36-60: No issues found. All model identifiers in the `OpenAIModels` and `GoogleModels` literals are confirmed against the LiveKit Inference Gateway contract and match the official supported models exactly. The code is correct.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
```diff
 DeepgramModels = Literal[
     "deepgram/flux-general",
     "deepgram/nova-3",
     "deepgram/nova-3-general",
     "deepgram/nova-3-medical",
     "deepgram/nova-2",
     "deepgram/nova-2-general",
     "deepgram/nova-2-medical",
     "deepgram/nova-2-conversationalai",
     "deepgram/nova-2-phonecall",
 ]
 CartesiaModels = Literal["cartesia/ink-whisper",]
-AssemblyAIModels = Literal["assemblyai/universal-streaming",]
+AssemblyAIModels = Literal[
+    "assemblyai/universal-streaming",
+    "assemblyai/universal-streaming-multilingual",
+]
+ElevenlabsModels = Literal["elevenlabs/scribe_v2_realtime",]
```
🧩 Analysis chain

🏁 Script executed:

```shell
# Explore the STT file context
cd /tmp/repo 2>/dev/null || cd .
find . -name "stt.py" -type f | head -5
```

Repository: livekit/agents
Length of output: 362

🏁 Script executed:

```shell
# Get more context around the reviewed lines
git ls-files | grep -E "(stt|inference)" | head -20
```

Repository: livekit/agents
Length of output: 1163

🏁 Script executed:

```shell
# Check if there are any validation utilities for these models
rg -i "deepgram|assemblyai|elevenlabs" --type py -l | head -15
```

Repository: livekit/agents
Length of output: 1012
🌐 Web query: Deepgram flux-general STT model documentation

💡 Result:
Deepgram's "Flux general" STT model is documented under the model option `flux-general-en` (English) and is used via the /v2/listen realtime endpoint (WebSocket). Key docs:
- Models & languages overview (Flux section, model option list): `flux-general-en` and what Flux is for (voice agents with model-native turn detection). [1]
- `model` parameter reference (lists Flux + example URL pattern): shows how to set `model=flux-general-en`. [2]
- API reference for Flux ("Turn-based Audio (Flux)", /v2/listen): handshake, required query params (including `model=flux-general-en`) and streaming details. [3]
- Self-hosted Flux guide (if you're deploying on-prem): requirements, enabling `[flux]`, and enabling `listen_v2`. [4]
🌐 Web query: AssemblyAI universal-streaming-multilingual model supported

💡 Result:
Yes, AssemblyAI supports a Universal-Streaming multilingual speech-to-text model.
- Model / parameter: set `speech_model` to `universal-streaming-multilingual` in your Streaming WebSocket connection (docs also show this as an allowed `speech_model` option for streaming sessions). [1], [2]
- Currently supported languages (streaming): English, Spanish, French, German, Italian, Portuguese (multilingual streaming is described as beta in the streaming docs). [1], [2], [3]
- Language detection: the multilingual streaming model supports automatic language detection and returns detected language + confidence with final turns/utterances. [1]

Sources: AssemblyAI multilingual streaming docs [1], Streaming Speech-to-Text docs (connection parameters / allowed speech_model values) [2], AssemblyAI product/FAQ pages and release posts confirming the same language set [3].
Citations:
- 1: https://www.assemblyai.com/docs/speech-to-text/universal-streaming/multilingual-transcription?utm_source=openai
- 2: https://www.assemblyai.com/docs/universal-streaming?utm_source=openai
- 3: https://www.assemblyai.com/products/streaming-speech-to-text?utm_source=openai
🌐 Web query:
ElevenLabs scribe_v2_realtime STT model documentation
💡 Result:
ElevenLabs’ scribe_v2_realtime is the Scribe v2 Realtime streaming STT model used with the Realtime Speech-to-Text WebSocket API (model id = scribe_v2_realtime). It supports low-latency partial transcripts, word timestamps (optional), VAD-based or manual “commit” control, and PCM/μ-law audio formats. [1][2]
Key official docs pages (most relevant first):
Realtime Speech to Text (WebSocket API reference):
https://elevenlabs.io/docs/api-reference/speech-to-text/v-1-speech-to-text-realtime
Realtime Speech to Text streaming guide (cookbook):
https://elevenlabs.io/docs/developers/guides/cookbooks/speech-to-text/streaming
Client-side streaming guide:
https://elevenlabs.io/docs/developer-guides/cookbooks/speech-to-text/client-side-streaming
Server-side streaming guide:
https://elevenlabs.io/docs/developer-guides/cookbooks/speech-to-text/server-side-streaming
STT capability overview (models/features):
https://elevenlabs.io/docs/capabilities/speech-to-text
Models overview (includes Scribe v2 Realtime summary):
https://elevenlabs.io/docs/overview/models
WebSocket endpoint: wss://api.elevenlabs.io/v1/speech-to-text/realtime (with required model_id=scribe_v2_realtime). Authentication is via xi-api-key header (server) or a short-lived token query param (client). [1]
References:
[1] ElevenLabs API Reference — Realtime Speech to Text (WebSocket): v1/speech-to-text/realtime (elevenlabs.io)
[2] ElevenLabs Docs — Models (Scribe v2 Realtime): overview/models (elevenlabs.io)
Citations:
- 1: https://elevenlabs.io/docs/api-reference/speech-to-text/v-1-speech-to-text-realtime?utm_source=openai
- 2: https://elevenlabs.io/docs/overview/models?utm_source=openai
🏁 Script executed:

```shell
# Read the full stt.py file that contains the model definitions
wc -l livekit-agents/livekit/agents/inference/stt.py
```

Repository: livekit/agents
Length of output: 108

🏁 Script executed:

```shell
# Read the file to understand context
cat -n livekit-agents/livekit/agents/inference/stt.py | head -100
```

Repository: livekit/agents
Length of output: 3539

🏁 Script executed:

```shell
# Check the Deepgram plugin implementation to see how models are used
cat -n livekit-plugins/livekit-plugins-deepgram/livekit/plugins/deepgram/models.py
```

Repository: livekit/agents
Length of output: 1913

🏁 Script executed:

```shell
# Check how Deepgram models are validated/used in the plugin
rg -A 5 -B 5 "flux-general|nova-2|nova-3" livekit-plugins/livekit-plugins-deepgram/ --type py
```

Repository: livekit/agents
Length of output: 13222
Fix Deepgram "flux-general" identifier—should be "flux-general-en".
Deepgram's Flux model is documented as flux-general-en (with language suffix). The inference gateway definition on line 30 uses "deepgram/flux-general", which will fail. Align with the Deepgram plugin's own V2Models = Literal["flux-general-en"] definition and official Deepgram API docs. AssemblyAI (universal-streaming-multilingual) and ElevenLabs (scribe_v2_realtime) identifiers are correct.
🤖 Prompt for AI Agents
In `@livekit-agents/livekit/agents/inference/stt.py` around lines 29 - 43, Update
the Deepgram model identifier in the DeepgramModels Literal to match Deepgram's
API by replacing "deepgram/flux-general" with "deepgram/flux-general-en";
specifically edit the DeepgramModels definition in
livekit/agents/inference/stt.py so it aligns with the plugin's V2Models and
official docs (refer to the DeepgramModels symbol to locate the change).
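Applying that prompt, the resulting literal could look like the sketch below; only the flux entry changes, gaining the "-en" language suffix, while the nova entries are carried over unchanged from the diff above:

```python
from typing import Literal

# Hypothetical post-fix DeepgramModels: "deepgram/flux-general" renamed to
# "deepgram/flux-general-en" to match Deepgram's documented model option.
DeepgramModels = Literal[
    "deepgram/flux-general-en",
    "deepgram/nova-3",
    "deepgram/nova-3-general",
    "deepgram/nova-3-medical",
    "deepgram/nova-2",
    "deepgram/nova-2-general",
    "deepgram/nova-2-medical",
    "deepgram/nova-2-conversationalai",
    "deepgram/nova-2-phonecall",
]
```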
```diff
 class CartesiaOptions(TypedDict, total=False):
     duration: float  # max duration of audio in seconds
-    speed: Literal["slow", "normal", "fast"]  # default: not specified
+    emotion: str
+    speed: Literal["slow", "normal", "fast"]
+    volume: float
```
🧩 Analysis chain

🏁 Script executed:

```shell
# First, let's examine the file context to understand the CartesiaOptions usage
cat -n livekit-agents/livekit/agents/inference/tts.py | head -130 | tail -40
```

Repository: livekit/agents
Length of output: 172

🏁 Script executed:

```shell
# Also check if there are any other references to CartesiaOptions in the codebase
rg "CartesiaOptions" --type py -B 2 -A 5
```

Repository: livekit/agents
Length of output: 4238
🌐 Web query:
Cartesia TTS API documentation options parameters
💡 Result:
Below are the main Cartesia TTS API request options/parameters (from the official docs).
Common (all TTS endpoints)
- Header: `Cartesia-Version` (required). Supported versions shown in the API reference include 2024-06-10, 2024-11-13, 2025-04-16. [1][2][3]
- Auth for `/tts/bytes` and `/tts/sse`: `Authorization: Bearer <access_token>` (per examples). [2][3]
- WebSocket: API key can be provided via header (`X-API-Key`) or via query param `api_key` (useful in browsers). [4]

POST /tts/bytes (stream audio bytes)
Body fields:
- `model_id` (required) [2]
- `transcript` (required) [2]
- `voice` (required): `{ "mode": "id", "id": "<voice_id>" }` [2]
- `output_format` (required) [2]
- `language` (optional enum) [2]
- `generation_config` (optional; only affects sonic-3): includes `volume`, `speed`, `emotion` [2]
- `save` (optional bool; default false): when true, response includes a `Cartesia-File-ID` header [2]
- `pronunciation_dict_id` (optional; supported by sonic-3 and newer) [2]
- `speed` (deprecated enum: slow | normal | fast; prefer `generation_config.speed` for sonic-3) [2]

POST /tts/sse (Server-Sent Events streaming)
Includes the same core fields as /tts/bytes (model_id, transcript, voice, output_format, language, generation_config, pronunciation_dict_id) plus streaming/timestamps controls: [3]
- `add_timestamps` (bool; word timestamps) [3]
- `add_phoneme_timestamps` (bool) [3]
- `use_normalized_timestamps` (bool) [3]
- `context_id` (optional string) [3]
- `speed` (deprecated; same as above) [3]

GET wss://api.cartesia.ai/tts/websocket (bidirectional streaming)
Handshake query params (useful when headers aren't available):
- `cartesia_version` (alternative to `Cartesia-Version`) [4]
- `api_key` (alternative to `X-API-Key`) [4]

Generation request message example fields include: model_id, transcript, voice, language, context_id, output_format, add_timestamps, continue (for continuation on the same context_id). [4]

output_format options (containers/encodings/sample rates)
The docs show these common options for output_format: [5]
- `container`: raw | wav | mp3
- `sample_rate`: 8000, 16000, 22050, 24000, 44100, 48000
- `encoding` (required for raw and wav): pcm_f32le, pcm_s16le, pcm_mulaw, pcm_alaw
- `bit_rate` (required for mp3)

Sources: [1] API conventions/versioning, [2] /tts/bytes, [3] /tts/sse, [4] /tts/websocket, [5] output format enums (shown on Cartesia API reference pages).
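To make the sonic-3 vs. deprecated top-level `speed` distinction concrete, here is a hypothetical /tts/bytes request body following the parameter summary above. The voice id, transcript, and emotion value are placeholders, not values from the Cartesia docs:

```python
# Hypothetical /tts/bytes body. For sonic-3, numeric speed/volume and a
# list-style emotion go under generation_config, not the deprecated
# top-level `speed` enum.
payload = {
    "model_id": "sonic-3",
    "transcript": "Hello from the review thread.",
    "voice": {"mode": "id", "id": "YOUR_VOICE_ID"},  # placeholder id
    "output_format": {
        "container": "raw",
        "encoding": "pcm_s16le",
        "sample_rate": 24000,
    },
    "generation_config": {"speed": 1.0, "volume": 1.0, "emotion": ["neutral"]},
}
```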
🏁 Script executed:
# Search for how CartesiaOptions is used/sent to the Cartesia API
rg "generation_config|CartesiaOptions" --type py -B 3 -A 3Repository: livekit/agents
Length of output: 10488
🏁 Script executed:
# Look for Cartesia API client calls or serialization of CartesiaOptions
rg "cartesia|emotion|tts" livekit-agents/livekit/agents/inference/tts.py --type py -i -A 2Repository: livekit/agents
Length of output: 6387
Speed and emotion types don't match Cartesia plugin implementation.
The CartesiaOptions TypedDict defines speed as Literal["slow", "normal", "fast"], but the Cartesia plugin expects floats for sonic-3 models (if not isinstance(self._opts.speed, float)). Additionally, emotion is typed as str but accessed as opts.emotion[0] in the plugin, suggesting indexing that doesn't align with a plain string type. Volume as float is correct. Update the TypedDict to match actual plugin expectations.
🤖 Prompt for AI Agents
In `@livekit-agents/livekit/agents/inference/tts.py` around lines 108 - 112, The
CartesiaOptions TypedDict currently mismatches the Cartesia plugin: change the
speed field from Literal["slow","normal","fast"] to a numeric type (float)
because the plugin checks `isinstance(self._opts.speed, float)` for sonic-3
models, and change emotion from `str` to a sequence type (e.g., `Sequence[str]`
or `List[str]`) because the plugin accesses `opts.emotion[0]`; keep volume as
`float`. Update the CartesiaOptions definition accordingly so it aligns with the
plugin's expectations.
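A sketch of the TypedDict the prompt above asks for; whether `Sequence[str]` or `list[str]` is preferred for `emotion` is a judgment call left to the implementer:

```python
from typing import Sequence, TypedDict

# Sketch of the corrected CartesiaOptions per the review comment:
# numeric speed for sonic-3, sequence-typed emotion for opts.emotion[0].
class CartesiaOptions(TypedDict, total=False):
    duration: float  # max duration of audio in seconds
    speed: float  # sonic-3 expects a numeric speed
    emotion: Sequence[str]  # plugin indexes opts.emotion[0]
    volume: float
```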
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@livekit-agents/livekit/agents/inference/tts.py`:
- Around line 41-45: RimeModels Literal no longer includes "rime/mist", causing
mypy failures where tests expect that value; restore backward compatibility by
adding "rime/mist" back into the RimeModels Literal in
livekit/agents/inference/tts.py (the RimeModels symbol) so tests referencing
"rime/mist" pass, or alternatively update all test occurrences in
tests/test_inference_tts_fallback.py to use "rime/mistv2" if you intend to
remove the old alias—prefer restoring "rime/mist" in RimeModels for minimal
change.
♻️ Duplicate comments (2)

livekit-agents/livekit/agents/inference/tts.py (1)

108-112: CartesiaOptions types may not align with Cartesia plugin expectations.
This was previously flagged. The Cartesia plugin may expect `speed` as a `float` for sonic-3 models (via `generation_config.speed`) rather than the deprecated enum, and `emotion` may need to be a sequence type based on plugin usage patterns. Verify alignment with how the LiveKit gateway processes these options.

livekit-agents/livekit/agents/inference/stt.py (1)

29-37: Deepgram model identifier "flux-general" should be "flux-general-en".
This issue was previously flagged. Deepgram's Flux model is documented as `flux-general-en` (with language suffix). The identifier "deepgram/flux-general" will likely fail API calls. Align with Deepgram's official API documentation and the plugin's `V2Models = Literal["flux-general-en"]` definition.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- livekit-agents/livekit/agents/inference/stt.py
- livekit-agents/livekit/agents/inference/tts.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Format code with ruff
Run ruff linter and auto-fix issues
Run mypy type checker in strict mode
Maintain line length of 100 characters maximum
Ensure Python 3.9+ compatibility
Use Google-style docstrings
Files:
- livekit-agents/livekit/agents/inference/stt.py
- livekit-agents/livekit/agents/inference/tts.py
🧠 Learnings (3)
📓 Common learnings
Learnt from: CR
Repo: livekit/agents PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-16T07:44:56.353Z
Learning: Implement Model Interface Pattern for STT, TTS, LLM, and Realtime models with provider-agnostic interfaces, fallback adapters for resilience, and stream adapters for different streaming patterns
📚 Learning: 2026-01-16T07:44:56.353Z
Learnt from: CR
Repo: livekit/agents PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-16T07:44:56.353Z
Learning: Implement Model Interface Pattern for STT, TTS, LLM, and Realtime models with provider-agnostic interfaces, fallback adapters for resilience, and stream adapters for different streaming patterns
Applied to files:
livekit-agents/livekit/agents/inference/stt.py
📚 Learning: 2026-01-22T03:28:16.289Z
Learnt from: longcw
Repo: livekit/agents PR: 4563
File: livekit-agents/livekit/agents/beta/tools/end_call.py:65-65
Timestamp: 2026-01-22T03:28:16.289Z
Learning: In code paths that check capabilities or behavior of the LLM processing the current interaction, prefer using the activity's LLM obtained via ctx.session.current_agent._get_activity_or_raise().llm instead of ctx.session.llm. The session-level LLM may be a fallback and not reflect the actual agent handling the interaction. Use the activity LLM to determine capabilities and to make capability checks or feature toggles relevant to the current processing agent.
Applied to files:
- livekit-agents/livekit/agents/inference/stt.py
- livekit-agents/livekit/agents/inference/tts.py
🧬 Code graph analysis (1)
livekit-agents/livekit/agents/inference/tts.py (1)
livekit-agents/livekit/agents/types.py (1)
APIConnectOptions (54-88)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: type-check (3.13)
- GitHub Check: livekit-plugins-deepgram
- GitHub Check: unit-tests
🔇 Additional comments (7)

livekit-agents/livekit/agents/inference/stt.py (3)

73-80: LGTM! The `ElevenlabsOptions` TypedDict fields align well with ElevenLabs Scribe v2 Realtime API parameters for VAD configuration and timestamp control.

208-224: LGTM! The new `ElevenlabsModels` overload follows the established pattern for provider-specific type hints, correctly associating `ElevenlabsOptions` with the model type.

253-259: LGTM! The `extra_kwargs` union properly includes `ElevenlabsOptions`, maintaining type consistency with the overloads.

livekit-agents/livekit/agents/inference/tts.py (4)

114-115: LGTM! The empty `DeepgramOptions` TypedDict is acceptable as a placeholder. It can be extended as provider-specific options are identified.

173-191: LGTM! The new `DeepgramModels` overload follows the established pattern for provider-specific type hints.

46-52: LGTM! The expanded `InworldModels` with new TTS variants follows the consistent naming pattern and properly extends provider support.

280-287: LGTM! The `extra_kwargs` union properly includes `DeepgramOptions`, maintaining type consistency with the overloads.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
```diff
 RimeModels = Literal[
     "rime",
-    "rime/mist",
+    "rime/mistv2",
     "rime/arcana",
-    "rime/mistv2",
 ]
```
🧩 Analysis chain
🏁 Script executed:
# First, verify the actual file content at the specified lines
head -n 50 livekit-agents/livekit/agents/inference/tts.py | tail -n 15Repository: livekit/agents
Length of output: 172
🏁 Script executed:
# Search for usages of "rime/mist" (without the v) in the codebase
rg "rime/mist[^v]" --type py -nRepository: livekit/agents
Length of output: 566
🏁 Script executed:
# Also search for any references to RimeModels to understand usage patterns
rg "RimeModels" --type py -B 2 -A 2Repository: livekit/agents
Length of output: 1078
🏁 Script executed:
# Check if there are any deprecation notices or comments about the change
rg -i "rime.*mist|mistv2|deprecat" livekit-agents/livekit/agents/inference/tts.py -n -C 3Repository: livekit/agents
Length of output: 192
Type mismatch: Tests reference "rime/mist" which is no longer in RimeModels.
The removal of "rime/mist" from the RimeModels type definition breaks the test suite. Multiple test cases in tests/test_inference_tts_fallback.py (lines 53, 121, 127, 151, 157) reference "rime/mist", but this value is not present in the RimeModels Literal. This will cause mypy strict mode to fail on these tests.
Either:
- Update test references to use "rime/mistv2" instead of "rime/mist", or
- Keep "rime/mist" in RimeModels for backward compatibility.
🤖 Prompt for AI Agents
In `@livekit-agents/livekit/agents/inference/tts.py` around lines 41 - 45,
RimeModels Literal no longer includes "rime/mist", causing mypy failures where
tests expect that value; restore backward compatibility by adding "rime/mist"
back into the RimeModels Literal in livekit/agents/inference/tts.py (the
RimeModels symbol) so tests referencing "rime/mist" pass, or alternatively
update all test occurrences in tests/test_inference_tts_fallback.py to use
"rime/mistv2" if you intend to remove the old alias—prefer restoring "rime/mist"
in RimeModels for minimal change.
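The minimal-change option from the prompt above could look like this sketch, which restores "rime/mist" alongside the newer identifiers (ordering within the Literal is a guess):

```python
from typing import Literal

# Hypothetical backward-compatible RimeModels with "rime/mist" restored.
RimeModels = Literal[
    "rime",
    "rime/mist",
    "rime/mistv2",
    "rime/arcana",
]
```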
chenghao-mou
left a comment
LGTM. Probably should update the tests with rime/mist mentions following the code rabbit's comment.
When testing openai/gpt-5-nano, I got a crazy TTFT:

```json
{"model_name": "openai/gpt-5-nano", "model_provider": "livekit", "ttft": 41.44, "prompt_tokens": 427, "prompt_cached_tokens": 0, "completion_tokens": 1907, "tokens_per_second": 38.54, "room": "console"}
```
```python
GoogleModels = Literal[
    "google/gemini-3-pro",
    "google/gemini-3-flash",
```
Should we clarify they are preview versions?
Summary by CodeRabbit
New Features
Removed
✏️ Tip: You can customize this high-level summary in your review settings.