Feat/personaplex plugin #4660

Open
milanperovic wants to merge 36 commits into livekit:main from milanperovic:feat/personaplex-plugin

Conversation

@milanperovic milanperovic commented Jan 30, 2026

Summary
Add new livekit-plugins-personaplex plugin for NVIDIA PersonaPlex full-duplex conversational AI
Implements RealtimeModel / RealtimeSession as a WebSocket client connecting to a separately-deployed PersonaPlex server
Binary WebSocket protocol with Opus audio encoding/decoding via sphn
Silence-based generation boundary detection, exponential backoff reconnection, and metrics emission
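
The silence-based boundary detection in the last item can be pictured roughly as follows. This is a minimal sketch, not the plugin's code: the `SilenceBoundary` class, its method names, and the 1.5 s default are hypothetical. The PR only states that a configurable silence threshold ends a generation.

```python
from __future__ import annotations

import time
from typing import Callable


class SilenceBoundary:
    """Tracks audio arrival times and reports when a generation has gone quiet.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """

    def __init__(
        self,
        threshold_s: float = 1.5,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._threshold_s = threshold_s
        self._clock = clock
        self._last_audio_at: float | None = None  # None: no active generation

    def on_audio(self) -> bool:
        """Record an incoming audio frame; return True if it opens a new generation."""
        started = self._last_audio_at is None
        self._last_audio_at = self._clock()
        return started

    def should_finalize(self) -> bool:
        """Return True once, when the active generation has been silent long enough."""
        if self._last_audio_at is None:
            return False
        if self._clock() - self._last_audio_at >= self._threshold_s:
            self._last_audio_at = None  # close the generation exactly once
            return True
        return False


# Drive the helper with a fake clock so the behaviour is deterministic.
now = [0.0]
boundary = SilenceBoundary(threshold_s=1.5, clock=lambda: now[0])
started = boundary.on_audio()       # first frame opens a generation
now[0] = 1.0
early = boundary.should_finalize()  # only 1.0 s of silence so far
now[0] = 2.6
late = boundary.should_finalize()   # 2.6 s of silence exceeds the threshold
```

Injecting the clock keeps the timing logic testable without real sleeps; a real session would presumably poll or schedule a timer instead.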

Summary by CodeRabbit

  • New Features

    • Added PersonaPlex plugin for LiveKit agents for real‑time AI conversations with full‑duplex audio I/O, session lifecycle management and generation events
    • Provides 18 selectable voice options and configurable silence threshold, seed and prompt settings
  • Documentation

    • Added comprehensive README with installation, usage examples, CLI entrypoint, prewarm guidance and environment/configuration details
  • Tests

    • Added test suite covering initialization, options, audio conversions, constants and realtime behaviors

milanperovic and others added 3 commits January 30, 2026 11:54
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Guard audio/text channel sends with ChanClosed suppression
- Wrap recv_task message handlers in try/except to prevent single
  malformed frame from breaking the receive loop
- Replace deprecated asyncio.get_event_loop() with get_running_loop()
- Add response_id to GenerationCreatedEvent emissions
- Add exponential backoff (1s -> 30s max) on reconnect instead of
  fixed 1s delay
- Add 32 unit tests covering init, URL building, audio conversion,
  retry backoff, options, and data structures

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Remove unused imports in tests (ruff F401)
- Fix import sorting (ruff I001)
- Apply ruff format to realtime_model.py
- Add __pdoc__ dict to realtime/__init__.py for docs generation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

CLAassistant commented Jan 30, 2026

CLA assistant check
All committers have signed the CLA.

Contributor

coderabbitai bot commented Jan 30, 2026

📝 Walkthrough

Walkthrough

Adds a new PersonaPlex LiveKit plugin: packaging and README, plugin bootstrap (registration, logger, version), voice type definitions, a full RealtimeModel/RealtimeSession implementation (WebSocket + Opus audio I/O, generation lifecycle, retry/backoff, metrics), and tests.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Documentation & Manifest | livekit-plugins/livekit-plugins-personaplex/README.md, livekit-plugins/livekit-plugins-personaplex/pyproject.toml | New README with install/usage/prewarm/CLI and package manifest (Hatch build), dependencies, and version config. |
| Plugin Bootstrap | livekit-plugins/livekit-plugins-personaplex/livekit/plugins/personaplex/__init__.py, .../personaplex/version.py, .../personaplex/log.py | Registers PersonaplexPlugin, exposes public symbols, provides module logger and package version. |
| Type Definitions | livekit-plugins/livekit-plugins-personaplex/livekit/plugins/personaplex/models.py | Adds PersonaplexVoice Literal enumerating supported voice IDs. |
| Realtime Implementation | livekit-plugins/livekit-plugins-personaplex/livekit/plugins/personaplex/realtime/__init__.py, .../realtime/realtime_model.py | Implements RealtimeModel and RealtimeSession: WebSocket lifecycle, retry/backoff, PCM↔Opus encoding/decoding, resampling (24kHz mono), generation lifecycle, token filtering, silence finalization, metrics/events. |
| Tests | livekit-plugins/livekit-plugins-personaplex/tests/test_realtime_model.py | Adds comprehensive tests for model/session init, URL/options handling, token filtering, audio conversions, generation state, and backoff behavior. |

Sequence Diagram

sequenceDiagram
    participant Agent as LiveKit Agent
    participant Session as RealtimeSession
    participant Encoder as Opus Encoder
    participant WS as WebSocket
    participant Server as PersonaPlex Server
    participant Decoder as Opus Decoder

    Agent->>Session: push_audio(AudioFrame)
    Session->>Session: resample to 24kHz mono
    Session->>Encoder: encode(PCM)
    Encoder-->>Session: Opus bytes
    Session->>WS: send(MSG_AUDIO, Opus bytes)
    WS->>Server: WebSocket message

    Server->>WS: send(MSG_TEXT / MSG_AUDIO)
    WS->>Session: receive(MSG_TEXT / MSG_AUDIO)

    alt MSG_TEXT
        Session->>Session: filter special tokens
        Session->>Agent: emit(text token)
    else MSG_AUDIO
        Session->>Decoder: decode(Opus bytes)
        Decoder-->>Session: PCM samples
        Session->>Agent: emit(AudioFrame `@24kHz`)
    end

    Note over Session: silence timeout triggers finalization
    Session->>Session: _finalize_generation()
    Session->>Agent: emit(GenerationCreatedEvent + metrics)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • davidzhao
  • tinalenguyen

Poem

🐰 I hopped a WebSocket, bytes in tow,
Opus hums beneath my snow.
Voices threaded, tokens pruned,
Silence sealed the final tune.
Hooray — new hops across the flow!

🚥 Pre-merge checks | ✅ 1 | ❌ 2
❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 10.67% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Title check | ❓ Inconclusive | The title is vague and generic, using the word 'Feat' without clearly describing the primary change from the developer's perspective. | Replace with a more specific title like 'Add PersonaPlex plugin for NVIDIA conversational AI integration' to clearly communicate the main change. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

📜 Recent review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cdb7e55 and 546d03b.

📒 Files selected for processing (1)
  • livekit-plugins/livekit-plugins-personaplex/tests/test_realtime_model.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Format code with ruff
Run ruff linter and auto-fix issues
Run mypy type checker in strict mode
Maintain line length of 100 characters maximum
Ensure Python 3.9+ compatibility
Use Google-style docstrings

Files:

  • livekit-plugins/livekit-plugins-personaplex/tests/test_realtime_model.py
🧠 Learnings (2)
📓 Common learnings
Learnt from: CR
Repo: livekit/agents PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-16T07:44:56.353Z
Learning: Follow the Plugin System pattern where plugins in livekit-plugins/ are separate packages registered via the Plugin base class
📚 Learning: 2026-01-30T12:53:12.738Z
Learnt from: milanperovic
Repo: livekit/agents PR: 4660
File: livekit-plugins/livekit-plugins-personaplex/livekit/plugins/personaplex/__init__.py:19-21
Timestamp: 2026-01-30T12:53:12.738Z
Learning: In plugin __init__.py files under the livekit-plugins or similar plugin directories, place internal imports (for example, from .log import logger) after the __all__ definition. These imports are used for plugin registration and are not part of the public API. This pattern is used across plugins (e.g., openai, deepgram, ultravox) and helps avoid E402 violations while keeping the public API surface clean.

Applied to files:

  • livekit-plugins/livekit-plugins-personaplex/tests/test_realtime_model.py
🧬 Code graph analysis (1)
livekit-plugins/livekit-plugins-personaplex/tests/test_realtime_model.py (1)
livekit-plugins/livekit-plugins-personaplex/livekit/plugins/personaplex/realtime/realtime_model.py (5)
  • RealtimeModel (74-164)
  • _PersonaplexOptions (49-55)
  • _ResponseGeneration (59-71)
  • model (144-145)
  • provider (148-149)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: unit-tests
  • GitHub Check: type-check (3.13)
  • GitHub Check: type-check (3.9)
🔇 Additional comments (8)
livekit-plugins/livekit-plugins-personaplex/tests/test_realtime_model.py (8)

1-21: LGTM!

The imports are well-organized, with from __future__ import annotations ensuring Python 3.9+ compatibility. Importing internal symbols (_PersonaplexOptions, _ResponseGeneration, _SPECIAL_TOKENS) for testing implementation details is appropriate for unit tests.


25-112: LGTM!

Comprehensive test coverage for RealtimeModel initialization:

  • URL handling with all scheme variants (ws/wss/http/https) and SSL detection
  • Environment variable fallback and explicit override behavior
  • All constructor parameters and default values
  • Model properties (model, provider, _label) and capabilities flags

The tests effectively validate the URL resolution logic documented in the RealtimeModel.__init__ docstring.
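
As a rough illustration of the scheme handling described in this comment, here is a minimal sketch. The `resolve_ws_url` function is hypothetical, and treating a bare `host:port` as plain `ws://` is an assumption; the PR does not show the actual resolution code.

```python
# Hypothetical mapping from accepted input schemes to WebSocket schemes.
_SCHEME_MAP = {"ws": "ws", "wss": "wss", "http": "ws", "https": "wss"}


def resolve_ws_url(url: str) -> "tuple[str, bool]":
    """Return (websocket_url, use_ssl) for a ws/wss/http/https URL.

    Assumption: a bare "host:port" with no scheme is treated as plain ws://.
    """
    if "://" not in url:
        url = "ws://" + url
    scheme, rest = url.split("://", 1)
    try:
        ws_scheme = _SCHEME_MAP[scheme.lower()]
    except KeyError:
        raise ValueError(f"unsupported URL scheme: {scheme!r}") from None
    # TLS is implied by the secure variants (wss/https) and preserved in the output.
    return f"{ws_scheme}://{rest}", ws_scheme == "wss"


secure = resolve_ws_url("https://personaplex.example:8998")
plain = resolve_ws_url("localhost:8998")
```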


117-152: LGTM!

The _make_opts helper method is a clean pattern for creating test fixtures with sensible defaults, reducing boilerplate across test methods.


157-167: LGTM!

Clear tests verifying the special token set membership for filtering logic.


172-205: LGTM!

Well-designed audio tests:

  • Constants verification ensures compatibility with the Opus codec requirements
  • The roundtrip test mirrors the actual PCM conversion logic, validating data integrity
  • Clipping test covers edge cases with out-of-range float values

210-235: LGTM!

Good practices in this test:

  • The try/finally block ensures proper channel cleanup even if assertions fail (as noted in commit message)
  • Inline imports isolate test dependencies, which can be useful when running partial test suites

240-254: LGTM!

The exponential backoff sequence test effectively validates the retry logic by verifying both the doubling behavior and the maximum delay cap.
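
For reference, a doubling backoff with a cap, as described in the commit message (1s -> 30s max), can be sketched like this. The `backoff_delay` name and signature are hypothetical, not the plugin's API.

```python
def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Delay before reconnect attempt number `attempt` (0-based): doubles, then caps."""
    return min(cap, base * (2 ** attempt))


delays = [backoff_delay(n) for n in range(7)]
# -> [1.0, 2.0, 4.0, 8.0, 16.0, 30.0, 30.0]
```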


259-284: LGTM!

Tests comprehensively cover all _PersonaplexOptions dataclass fields, including the use_ssl field and seed=None edge case. The test_none_seed correctly relies on the dataclass default value for use_ssl.

coderabbitai[bot]

This comment was marked as resolved.

milanperovic and others added 2 commits January 30, 2026 13:52
- Preserve wss:// / https:// scheme for TLS deployments
- Use logger.warning for unsupported dynamic instructions
- Remove redundant length check on opus_bytes

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
coderabbitai[bot]

This comment was marked as resolved.

milanperovic and others added 4 commits January 30, 2026 15:47
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Rename TestBuildWsUrl to TestSessionOptions with proper typed helper
- Rename TestHandleTextToken to TestSpecialTokens to reflect actual scope
- Wrap ResponseGeneration assertions in try/finally for channel cleanup

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Added by upstream in AGT-2474 (livekit#4622). PersonaPlex is full-duplex
and does not support explicit user turn commits.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Contributor

@devin-ai-integration devin-ai-integration bot left a comment

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 6 additional flags.

- Use ws:// scheme in all example URLs (README, docstrings)
- Sync uv.lock after upstream merge

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
neo-lyzr commented Feb 7, 2026

Thanks for the PR!

devin-ai-integration[bot]

This comment was marked as resolved.

devin-ai-integration[bot]

This comment was marked as resolved.

devin-ai-integration[bot]

This comment was marked as resolved.

@aheadrox

I'm unable to connect to the PersonaPlex server through the LiveKit agent. The server runs correctly standalone (port 8998), but it throws the following error when the LiveKit agent connects:

AttributeError: 'NoneType' object has no attribute 'shape'
Task exception was never retrieved
future: <Task finished name='Task-60' coro=<ServerState.handle_chat.<locals>.opus_loop() done, defined at /app/moshi/moshi/server.py:204> exception=AttributeError("'NoneType' object has no attribute 'shape'")>
Traceback (most recent call last):
  File "/app/moshi/moshi/server.py", line 212, in opus_loop
    if pcm.shape[-1] == 0:
       ^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'shape'

@milanperovic
Author

This is a known issue in the moshi server (NVIDIA PersonaPlex fork), not in the LiveKit plugin. In moshi/server.py:212, opus_loop() calls pcm.shape without checking if pcm is None. When the Opus decoder doesn't have enough data to produce a full frame, read_pcm() returns None and it crashes.

To fix this, patch moshi/server.py around line 212:

before

if pcm.shape[-1] == 0:

after

if pcm is None or pcm.shape[-1] == 0:

On the LiveKit plugin side, we've already pushed defensive fixes (input validation, sphn 0.2+ API compatibility) in the feat/personaplex-plugin branch.
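
The guard in the patch above can be checked in isolation. This is only a sketch: `is_empty_pcm` is a hypothetical helper, and `SimpleNamespace` objects stand in for the arrays the decoder would return.

```python
from types import SimpleNamespace


def is_empty_pcm(pcm) -> bool:
    """True when there is nothing to process: no frame yet, or a zero-length frame."""
    # Check for None first so that .shape is never read on a missing frame.
    return pcm is None or pcm.shape[-1] == 0


# Stand-ins for decoder output; only the .shape attribute matters here.
no_frame_yet = None
zero_length = SimpleNamespace(shape=(1, 0))
full_frame = SimpleNamespace(shape=(1, 1920))

results = [is_empty_pcm(p) for p in (no_frame_yet, zero_length, full_frame)]
```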

devin-ai-integration[bot]

This comment was marked as resolved.

devin-ai-integration[bot]

This comment was marked as resolved.

@tinalenguyen
Member

Hi @milanperovic, thank you so much for the PR!! I'll take a look and test this out, could you move the files to livekit-plugins-nvidia under an experimental folder?

devin-ai-integration[bot]

This comment was marked as resolved.

devin-ai-integration[bot]

This comment was marked as resolved.

@milanperovic
Author

Hi @tinalenguyen, done! I've moved everything into livekit-plugins-nvidia under experimental/personaplex/. The import path is now livekit.plugins.nvidia.experimental.personaplex. I also made sphn an optional dependency (pip install livekit-plugins-nvidia[personaplex]) so it doesn't affect existing nvidia plugin users. Let me know if you'd like any changes!

@tinalenguyen
Member

Could we restructure to livekit.plugins.nvidia.experimental.realtime actually? This would match our AWS plugin

I wasn't able to fully test the plugin after encountering some problems with my server, though I think it might be due to the SSL certificate handling, I'll report back on that. Here are a few notes in the meantime:

  • I ran into this error:
agents\livekit-plugins\livekit-plugins-nvidia\livekit\plugins\nvidia\__init__.py", line 25, in __getattr__
    from . import experimental
  File "<frozen importlib._bootstrap>", line 1412, in _handle_fromlist
RecursionError: maximum recursion depth exceeded

Changing that line to from .experimental import realtime worked (assuming the suggested restructure above was made)

  • Could we move test_personaplex_realtime_model.py to this folder?

Let me know your thoughts!
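
The RecursionError reported above is consistent with a module-level `__getattr__` (PEP 562) that runs `from . import experimental` for the name `experimental` itself: the import machinery likely probes the package with `hasattr()`, which re-enters the same hook. The sketch below builds a throwaway package to show the `from .experimental import realtime` form working. The package name `lazy_demo_pkg` and its layout are hypothetical, and the plugin's real `__init__.py` is not shown in this thread.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Build a throwaway package on disk; every name here is hypothetical.
root = Path(tempfile.mkdtemp())
pkg = root / "lazy_demo_pkg"
(pkg / "experimental").mkdir(parents=True)
(pkg / "experimental" / "__init__.py").write_text("")
(pkg / "experimental" / "realtime.py").write_text("MARKER = 'realtime loaded'\n")
(pkg / "__init__.py").write_text(
    textwrap.dedent(
        '''
        def __getattr__(name):
            if name == "realtime":
                # Imports lazy_demo_pkg.experimental by its dotted name, so the
                # import system does not call back into this hook.
                from .experimental import realtime
                return realtime
            raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
        '''
    )
)

sys.path.insert(0, str(root))
demo_pkg = importlib.import_module("lazy_demo_pkg")

# Attribute access triggers the lazy import exactly when it is first needed.
realtime = demo_pkg.realtime
```

Raising `AttributeError` for unknown names matters: it keeps `hasattr()` probes on the package terminating instead of recursing.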

devin-ai-integration[bot]

This comment was marked as resolved.

Contributor

@devin-ai-integration devin-ai-integration bot left a comment

Devin Review found 1 new potential issue.

View 28 additional findings in Devin Review.

Comment on lines +449 to +453
try:
    await ws_conn.send_bytes(msg)
except Exception as e:
    logger.error(f"Error sending message: {e}", exc_info=True)
    break

🔴 _send_task swallows WebSocket send exceptions, bypassing error recovery in _main_task

In _send_task (line 449-453), exceptions from ws_conn.send_bytes(msg) are caught, logged, and the loop breaks normally. Because the task completes without raising, task.result() at realtime_model.py:390 does not re-raise, and the outer except Exception as e block (line 404) is never entered. This means on a send failure: (1) no error event is emitted to listeners, (2) no retry delay is applied — the loop immediately reconnects, potentially causing a tight reconnect-fail loop, (3) _msg_ch is not reset (it is only reset in _mark_restart_needed at line 328 and the error handler at line 415), so stale Opus packets encoded with the now-discarded _opus_writer will be sent on the new connection with a fresh server-side decoder, causing audio corruption. By contrast, _recv_task correctly raises APIConnectionError on failure (realtime_model.py:496-500), which propagates to _main_task's error recovery block.

Suggested change
- try:
-     await ws_conn.send_bytes(msg)
- except Exception as e:
-     logger.error(f"Error sending message: {e}", exc_info=True)
-     break
+ try:
+     await ws_conn.send_bytes(msg)
+ except Exception:
+     raise
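
The difference between the two variants can be reproduced with plain asyncio. This is a simplified sketch of the pattern, not the plugin's `_main_task`: `supervise`, `send_swallowing`, `send_propagating`, and `failing_send` are hypothetical stand-ins.

```python
import asyncio


async def failing_send(_msg: bytes) -> None:
    """Stand-in for ws_conn.send_bytes() on a dead connection."""
    raise ConnectionError("socket closed")


async def send_swallowing(send) -> None:
    try:
        await send(b"frame")
    except Exception:
        return  # logged and dropped: the task finishes "successfully"


async def send_propagating(send) -> None:
    await send(b"frame")  # any failure escapes the task


async def supervise(worker) -> str:
    """Mimic a supervisor that inspects task.result() to decide on recovery."""
    task = asyncio.create_task(worker(failing_send))
    await asyncio.wait([task])
    try:
        task.result()  # re-raises only if the task itself raised
    except ConnectionError:
        return "recovery path entered"
    return "no error seen"


outcomes = (
    asyncio.run(supervise(send_swallowing)),
    asyncio.run(supervise(send_propagating)),
)
```

Only the propagating variant lets the supervisor notice the failure, which is the point of the suggested change.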

@milanp-sh

Hi @tinalenguyen! I've addressed all your feedback:

  • Restructured to livekit.plugins.nvidia.experimental.realtime, which matches the AWS plugin pattern exactly
  • Fixed the RecursionError by switching to from .experimental import realtime in __getattr__, same as AWS
  • Moved tests to the root tests/ directory
  • Also fixed a few things Devin flagged: a race condition in _start_new_generation future resolution, moved [codecs] to the [personaplex] optional extra so STT/TTS users don't get numpy, and propagated send exceptions to the error recovery path

CI is passing now. Let me know how the SSL testing goes when you get a chance!

@tinalenguyen
Member

Hi @milanp-sh, thanks again for iterating! We made a PR with some edits on top of yours, it'd be great to combine the changes here so it works all around for everyone.

I get spammed these logs when testing your PR:

WARNI… livekit.….realtime Skipping invalid audio frame in _encode_and_send: pcm length has to match an allowed frame size [120, 240, 480, 960, 1920, 2880], got 2400

I believe some edits from Shayne's PR will resolve it. Could you take a look at the differences made? In the linked PR, I don't see those logs, but I don't receive any responses from the model either. With your PR, I see those logs, but the model responds after I resolved the audioframe issue (though sometimes I see the agent transcripts twice).

Let me know your thoughts!

- Use 1920 samples (80ms) instead of SAMPLE_RATE//10 (2400) for valid
  Opus frame sizes, fixing audio frame warning spam
- Wait for server handshake before sending audio in _send_task to
  prevent dropped frames during system prompt processing
- Reset _msg_ch on reconnect to discard stale Opus packets
- Handle WS close code 1000 gracefully instead of treating as error
devin-ai-integration[bot]

This comment was marked as resolved.

milanperovic and others added 3 commits March 17, 2026 15:13
…oByteStream

The AudioByteStream in _main_task was created with SAMPLE_RATE // 10 = 2400
samples, which is not a valid Opus frame size. This caused warning spam:
"Skipping invalid audio frame... got 2400". Changed to 1920 (80ms) to match
the __init__ value and valid Opus frame sizes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@milanperovic
Author

Hi @tinalenguyen ! I looked at both the warning spam and Shayne's PR. (@ShayneP)

The "Skipping invalid audio frame... got 2400" warnings are caused by a frame size bug — _main_task recreates AudioByteStream with SAMPLE_RATE // 10 (2400 samples), which isn't a valid Opus frame size. The fix is changing it to 1920 (80ms). Devin flagged the same thing.
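
The arithmetic behind that fix can be checked with a short sketch. The allowed sizes are copied from the warning text and the 24 kHz rate follows from `SAMPLE_RATE // 10 == 2400`; the helper names are hypothetical.

```python
SAMPLE_RATE = 24_000  # Hz, implied by SAMPLE_RATE // 10 == 2400
# Frame sizes accepted by the encoder, as listed in the warning message.
ALLOWED_FRAME_SIZES = (120, 240, 480, 960, 1920, 2880)


def frame_duration_ms(samples: int, sample_rate: int = SAMPLE_RATE) -> float:
    """Duration of a frame with `samples` samples per channel."""
    return samples * 1000 / sample_rate


def largest_allowed_frame(max_samples: int) -> int:
    """Largest accepted frame size that does not exceed `max_samples`."""
    candidates = [size for size in ALLOWED_FRAME_SIZES if size <= max_samples]
    if not candidates:
        raise ValueError(f"no allowed frame size fits in {max_samples} samples")
    return max(candidates)


old_size = SAMPLE_RATE // 10                # 2400 samples, i.e. 100 ms
old_is_valid = old_size in ALLOWED_FRAME_SIZES
new_size = largest_allowed_frame(old_size)  # falls back to 1920
new_duration = frame_duration_ms(new_size)  # 80.0 ms
```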

I've also integrated Shayne's connection reliability fixes (handshake wait, graceful WS close, channel cleanup on reconnect) which should address the duplicate transcripts. I kept our Opus encoding API since it uses the newer sphn interface — that might be why Shayne's version wasn't getting model responses.

@tinalenguyen
Member

@milanperovic I just tested it out and I was able to get it working with audio 🎉

During my testing, I noticed that the generation cycle would not update accordingly. After the model initiates the conversation first, another generation is immediately started and the conversation would continue under that one generation. Let me know if you are able to repro this!

Also, it seems that generate_reply wouldn't work or have any effect for PersonaPlex, right? If so, I don't think we have to implement it; historically we've just thrown an error (example from a first iteration of an S2S plugin)

6 participants