Opt in voice mode support with /multimodal endpoint #445

Open
pranavjoshi001 wants to merge 7 commits into microsoft:master from pranavjoshi001:post-voice-traffic-via-websocket

@pranavjoshi001 (Contributor) commented Nov 27, 2025

Description

This change adds voice mode support to DirectLineJS with a client opt-in mechanism via the enableVoiceMode option. When enabled, it allows audio streaming through WebSocket connections using the /stream/multimodal endpoint.

Background

  • DirectLineJS currently sends all outgoing activities through HTTP POST rather than WebSocket because of limitations on the ABS (Azure Bot Service) side.
  • ABS does not process incoming WebSocket traffic; it only supports server-to-client push.
  • Voice traffic is not supported over HTTP POST, so it must be sent through WebSocket.
  • This PR introduces an opt-in voice mode that connects to the /stream/multimodal endpoint and, when enabled, routes all traffic through WebSocket.
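
As a rough illustration of the routing rule described above, the sketch below maps the voice mode flag to a stream path and an outgoing transport. The helper names (`Transport`, `streamPath`, `outgoingTransport`) are hypothetical and are not taken from the PR's code; only the two endpoint paths and the WebSocket versus HTTP POST split come from this description.

```typescript
// Hypothetical helpers for illustration only; not the PR's actual implementation.
type Transport = "websocket" | "http-post";

// Voice mode connects to /stream/multimodal; standard mode keeps /stream.
function streamPath(voiceMode: boolean): string {
  return voiceMode ? "/stream/multimodal" : "/stream";
}

// Voice mode sends every outgoing activity over WebSocket;
// standard mode keeps the existing HTTP POST flow.
function outgoingTransport(voiceMode: boolean): Transport {
  return voiceMode ? "websocket" : "http-post";
}
```

Keeping both decisions keyed on a single boolean mirrors the description: once voice mode is on, text and voice activities share the same WebSocket path.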

Changes in this PR

  • Added enableVoiceMode option:

    • true → Enables voice mode
    • false → Disables voice mode
    • undefined → Auto-detects: voice mode is enabled only when running in an iframe with microphone permission
  • Enhanced stream URL:

    • Voice mode → /stream/multimodal
    • Standard mode → /stream
  • Modified activity routing:

    • Voice mode → Sends all activities (text + voice) via WebSocket
    • Standard mode → Uses HTTP POST
  • Added server capabilities handling:

    • Parses agent.capabilities event with modalities object to detect audio support
  • Added new public methods:

    • getIsVoiceModeEnabled()
    • getVoiceConfiguration()
    • addEventListener()
    • removeEventListener()
  • Added test coverage:

    • Explicit enableVoiceMode: true/false
    • Auto-detect in iframe
    • WebSocket vs HTTP routing verification
    • Reconnect behavior
    • 403 retry handling
    • agent.capabilities event handling
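
The option handling and capability detection listed above might look roughly like the following sketch. Everything here is an assumption made for illustration: the `Environment` shape, the function names `resolveVoiceMode` and `supportsAudio`, and in particular the payload layout of the agent.capabilities event (assumed to carry `value.modalities.audio`). The description only states that the event has a modalities object used to detect audio support, and it does not say how the iframe microphone permission is detected, so that check is passed in as plain booleans.

```typescript
// Hypothetical environment snapshot; how the PR detects these values is not stated.
interface Environment {
  inIframe: boolean;
  microphoneAllowed: boolean;
}

// Resolve the tri-state enableVoiceMode option:
// true/false are explicit; undefined falls back to auto-detection.
function resolveVoiceMode(option: boolean | undefined, env: Environment): boolean {
  if (typeof option === "boolean") {
    return option;
  }
  return env.inIframe && env.microphoneAllowed;
}

// Minimal activity shape used by this sketch only.
interface ActivityLike {
  type: string;
  name?: string;
  value?: unknown;
}

// Assumed payload: { type: "event", name: "agent.capabilities", value: { modalities: { audio: ... } } }.
function supportsAudio(activity: ActivityLike): boolean {
  if (activity.type !== "event" || activity.name !== "agent.capabilities") {
    return false;
  }
  const value = activity.value;
  if (typeof value !== "object" || value === null) {
    return false;
  }
  const modalities = (value as { modalities?: unknown }).modalities;
  if (typeof modalities !== "object" || modalities === null) {
    return false;
  }
  // Any truthy "audio" entry is treated as audio support in this sketch.
  return Boolean((modalities as Record<string, unknown>).audio);
}
```

Under these assumptions, leaving enableVoiceMode undefined outside an iframe resolves to standard mode, which matches the backward-compatible default described in this PR.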

Backward Compatibility

  • No breaking changes:

    • Default behavior (enableVoiceMode: undefined in non-iframe context) maintains existing HTTP POST flow
  • Opt-in only:

    • Voice mode must be explicitly enabled or auto-detected in iframe with microphone permission

@compulim (Collaborator) left a comment


Commented.

@pranavjoshi001 changed the title from "Post voice traffic only to WebSocket and not via http" to "Opt in voice mode support with /multimodal endpoint" on Apr 7, 2026