Core speech to speech implementation #5654

pranavjoshi001 · 2025-12-12T13:44:12Z

Changelog Entry

Added Speech-to-Speech (S2S) support for real-time voice conversations, in PR #5654, by @pranavjoshi001

Description

This PR introduces Speech-to-Speech (S2S) functionality in Web Chat, enabling real-time voice conversations with bots. The implementation includes audio recording via AudioWorklet, audio playback with buffer queueing, and speech state management. This foundation supports upcoming MMRT (Multi-Modal Real-Time), ABS (Azure Bot Service), and CCV2 integration changes.

Activity structure - microsoft/Agents#377

Design

The Speech-to-Speech feature is built on three main components:

Voice State Management (voiceActivity reducer) - Manages:
- voiceState: Current speech state (idle, listening, user_speaking, processing, bot_speaking)
- voiceHandlers: Registered audio handler functions (supports multiple handlers)

SpeechToSpeech Provider (SpeechToSpeechComposer.tsx) - A React component that manages:
- VoiceHandlerBridge - Registers audio playback functions (queueAudio, stopAllAudio) with Redux
- VoiceRecorderBridge - Bridges Redux voice state with microphone recording and sends audio chunks via postVoiceActivity

Exposed control hooks:
- useVoiceStart.ts - Hook to start an S2S interaction
- useVoiceStop.ts - Hook to stop an S2S interaction (stops both the microphone and audio playback)
- useVoiceState.ts - Hook that returns the current state of the voice interaction

Speech State Flow

idle → listening → user_speaking → processing → bot_speaking → listening
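
The flow above can be read as a small state machine. The following is a minimal sketch, not the reducer from this PR: the five state names come from the description, while the transition table, the `canTransition` helper, the barge-in edge (bot_speaking → user_speaking), and the edges back to idle when recording stops are assumptions made for illustration.

```typescript
// State names are taken from the PR description.
type VoiceState = 'idle' | 'listening' | 'user_speaking' | 'processing' | 'bot_speaking';

// Hypothetical transition table. The main path follows the documented flow;
// the extra edges (stop -> idle, barge-in) are assumptions.
const NEXT_STATES: Record<VoiceState, readonly VoiceState[]> = {
  idle: ['listening'],
  listening: ['user_speaking', 'idle'],
  user_speaking: ['processing', 'idle'],
  processing: ['bot_speaking', 'idle'],
  // bot_speaking -> user_speaking models the user interrupting the bot.
  bot_speaking: ['listening', 'user_speaking', 'idle']
};

// Returns true when moving from `from` to `to` is listed in the table.
function canTransition(from: VoiceState, to: VoiceState): boolean {
  return NEXT_STATES[from].includes(to);
}
```

A guard like this could be used in tests to check that observed state changes stay within the expected flow.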

Voice Activity Flow (Fire-and-Forget Pattern)

Outgoing (User → Bot):

User speech captured via AudioWorklet → postVoiceActivity action → postVoiceActivitySaga → DirectLine (no Redux storage)

Incoming (Bot → User):

DirectLine activity$ → observeActivitySaga → calls voiceHandlers.queueAudio() directly (no Redux storage)
Only transcript activities go through the standard activity pipeline for rendering
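
The incoming split described above can be sketched as follows. The activity shapes here are hypothetical (the real structure is defined in microsoft/Agents#377), and `routeIncoming` and `renderQueue` are illustrative names, not functions from this PR. Only `queueAudio` and `stopAllAudio` are named in the description.

```typescript
// Hypothetical activity shapes, used only to illustrate the routing decision.
type IncomingActivity =
  | { kind: 'voice-chunk'; audioBase64: string }
  | { kind: 'voice-transcript'; text: string }
  | { kind: 'other' };

// Handler functions named in the PR description.
interface VoiceHandler {
  queueAudio(audioBase64: string): void;
  stopAllAudio(): void;
}

// Audio chunks go straight to every registered handler and are never stored.
// Everything else is appended to `renderQueue`, which stands in for the
// standard activity pipeline.
function routeIncoming(
  activity: IncomingActivity,
  handlers: ReadonlyMap<string, VoiceHandler>,
  renderQueue: IncomingActivity[]
): void {
  if (activity.kind === 'voice-chunk') {
    const { audioBase64 } = activity;

    handlers.forEach(handler => handler.queueAudio(audioBase64));

    return;
  }

  renderQueue.push(activity);
}
```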

Performance Optimization

Voice activities use a fire-and-forget pattern to optimize performance:

No Storage: Voice chunks (stream.chunk) are NOT stored in Redux - they flow directly to/from audio handlers
Function References: Redux stores handler functions (queueAudio, stopAllAudio), not data
Separate Saga: postVoiceActivitySaga sends without waiting for echo-back or dispatching PENDING/FULFILLED actions
Reduced Overhead: Prevents clogging the main activities array with high-frequency voice events
Selective Processing: Only voice transcript activities (which need rendering) go through the standard activity pipeline
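
As a rough illustration of the fire-and-forget idea on the outgoing side, the sketch below starts a send and returns immediately, without recording any pending or fulfilled status. It is plain TypeScript rather than a saga, and `VoiceTransport`, `postVoiceChunk`, and `onError` are hypothetical names. In this PR the sending is done by postVoiceActivitySaga over DirectLine.

```typescript
// Hypothetical transport standing in for the DirectLine connection.
interface VoiceTransport {
  send(chunkBase64: string): Promise<void>;
}

// Fire-and-forget: start the send and return without awaiting it.
// Nothing is stored and no PENDING/FULFILLED bookkeeping takes place;
// a failure is only passed to the optional error callback.
function postVoiceChunk(
  transport: VoiceTransport,
  chunkBase64: string,
  onError: (error: unknown) => void = () => {}
): void {
  void transport.send(chunkBase64).catch(onError);
}
```

The trade-off is that a dropped chunk is not retried, which seems acceptable for high-frequency audio where a late chunk has little value.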

Specific Changes

New Files Added:

Core Utilities (packages/core)

isVoiceActivity.ts - Type guard for voice/DTMF activities
isVoiceTranscriptActivity.ts - Type guard for transcript activities
getVoiceActivityRole.ts - Extract role (user/bot) from voice activity
getVoiceActivityText.ts - Extract transcription text from voice activity

Actions (packages/core/src/actions)

setVoiceState.ts - Set voice state action
startVoiceRecording.ts - Start recording action (transitions to listening)
stopVoiceRecording.ts - Stop recording action (transitions to idle)
registerVoiceHandler.ts - Register audio handler with unique ID
unregisterVoiceHandler.ts - Unregister audio handler by ID
postVoiceActivity.ts - Fire-and-forget voice activity posting

Reducer (packages/core/src/reducers)

voiceActivity.ts - Manages voiceState and voiceHandlers (Map<string, VoiceHandler>)
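
A minimal sketch of what a reducer with this shape might look like is shown below. It is an assumption-based illustration, not the code in voiceActivity.ts: the action type strings and payload field names are hypothetical placeholders, and only the state fields (voiceState, voiceHandlers as a Map keyed by ID) and the listening/idle transitions come from the description.

```typescript
type VoiceState = 'idle' | 'listening' | 'user_speaking' | 'processing' | 'bot_speaking';

interface VoiceHandler {
  queueAudio(audioBase64: string): void;
  stopAllAudio(): void;
}

interface VoiceActivityState {
  voiceState: VoiceState;
  voiceHandlers: ReadonlyMap<string, VoiceHandler>;
}

// Hypothetical action shapes; the real action types may differ.
type VoiceAction =
  | { type: 'SET_VOICE_STATE'; voiceState: VoiceState }
  | { type: 'START_VOICE_RECORDING' }
  | { type: 'STOP_VOICE_RECORDING' }
  | { type: 'REGISTER_VOICE_HANDLER'; id: string; handler: VoiceHandler }
  | { type: 'UNREGISTER_VOICE_HANDLER'; id: string };

const initialState: VoiceActivityState = { voiceState: 'idle', voiceHandlers: new Map() };

function voiceActivity(
  state: VoiceActivityState = initialState,
  action: VoiceAction
): VoiceActivityState {
  switch (action.type) {
    case 'SET_VOICE_STATE':
      return { ...state, voiceState: action.voiceState };
    case 'START_VOICE_RECORDING':
      return { ...state, voiceState: 'listening' };
    case 'STOP_VOICE_RECORDING':
      return { ...state, voiceState: 'idle' };
    case 'REGISTER_VOICE_HANDLER': {
      // Copy the map so the previous state object is never mutated.
      const voiceHandlers = new Map(state.voiceHandlers);
      voiceHandlers.set(action.id, action.handler);
      return { ...state, voiceHandlers };
    }
    case 'UNREGISTER_VOICE_HANDLER': {
      const voiceHandlers = new Map(state.voiceHandlers);
      voiceHandlers.delete(action.id);
      return { ...state, voiceHandlers };
    }
    default:
      return state;
  }
}
```

Note that the map holds function references only, so the state stays small regardless of how much audio flows through the handlers.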

Sagas (packages/core/src/sagas)

postVoiceActivitySaga.ts - Handles outgoing voice activities (fire-and-forget)
Updated observeActivitySaga.ts - Routes incoming voice activities to handlers

Provider & Hooks (packages/api)

SpeechToSpeechComposer.tsx - Main S2S provider (integrated into Composer)
VoiceHandlerBridge.tsx - Registers audio player with Redux
VoiceRecorderBridge.tsx - Bridges recording state with microphone
useRecorder.ts - AudioWorklet-based recording (CSP compliant)
useAudioPlayer.ts - Audio playback with buffer queueing
useVoiceHandlers.ts - Hook to get registered voice handlers
useRegisterVoiceHandler.ts - Hook to register a voice handler (returns unregister function)
useSetVoiceState.ts - Hook to set voice state
usePostVoiceActivity.ts - Hook to post voice activities
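
useAudioPlayer is described as playing audio with buffer queueing. One common way to queue streamed chunks for gapless playback is to remember when the previously scheduled chunk ends and start the next chunk at that time. The sketch below shows only that bookkeeping, as an assumption about the general approach rather than the hook's actual code. `PlaybackScheduler` and its methods are hypothetical names. In a browser, the returned time would typically be passed to `AudioBufferSourceNode.start()`.

```typescript
// Hypothetical helper that computes start times for queued audio chunks.
class PlaybackScheduler {
  // Time (in seconds) at which the last scheduled chunk finishes.
  private nextStartTime = 0;

  // Returns when a chunk of the given duration should start.
  // `now` is the current audio clock time, e.g. AudioContext.currentTime.
  schedule(now: number, durationSeconds: number): number {
    // If the queue has drained, start immediately; otherwise start right
    // after the previous chunk so playback has no gaps or overlaps.
    const startAt = Math.max(now, this.nextStartTime);

    this.nextStartTime = startAt + durationSeconds;

    return startAt;
  }

  // Forgets pending chunks, e.g. after all audio is stopped on barge-in.
  reset(): void {
    this.nextStartTime = 0;
  }
}
```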

Test Coverage Added:

Unit tests for useRecorder and useAudioPlayer hooks
E2E HTML tests covering:
- Happy path conversation flow
- Barge-in/interruption handling
- CSP compliance for AudioWorklet
- Audio chunk timing and intervals

I have added tests and executed them locally
I have updated CHANGELOG.md
I have updated documentation

Review Checklist

This section is for contributors to review your work.

Accessibility reviewed (tab order, content readability, alt text, color contrast)
Browser and platform compatibilities reviewed
CSS styles reviewed (minimal rules, no z-index)
Documents reviewed (docs, samples, live demo)
Internationalization reviewed (strings, unit formatting)
package.json and package-lock.json reviewed
Security reviewed (no data URIs, check for nonce leak)
Tests reviewed (coverage, legitimacy)

pranavjoshi001 added 2 commits December 12, 2025 13:26

initial no-op s2s core implementation

d132592

minor

a982457

pranavjoshi001 changed the title ~~Feature/core s2s composer~~ Core speech to speech composer implementation (no-op code) Dec 12, 2025

Merge branch 'main' into feature/core-s2s-composer

0978e7d

pranavjoshi001 marked this pull request as ready for review December 17, 2025 05:44

pranavjoshi001 requested review from a-b-r-o-w-n, beyackle2, compulim, cwhitten, srinaath and tdurnford as code owners December 17, 2025 05:44

pranavjoshi001 and others added 6 commits December 17, 2025 11:14

Merge branch 'main' into feature/core-s2s-composer

08c7a76

Merge branch 'main' into feature/core-s2s-composer

27a1cb4

Merge branch 'feature/core-s2s-composer' of https://github.com/pranavjoshi001/BotFramework-WebChat into feature/core-s2s-composer

6437ee1

refactor to align close to activity structure

9ddc63c

refactor composer to not use direct state inside effect

0838e44

Merge branch 'main' into feature/core-s2s-composer

4036a03

OEvgeny reviewed Jan 13, 2026

packages/api/src/providers/SpeechToSpeech/private/useRecorder.ts (resolved)

compulim reviewed Jan 13, 2026

packages/api/src/providers/SpeechToSpeech/private/useRecorder.ts (outdated, resolved)

compulim reviewed Jan 13, 2026

packages/api/src/providers/SpeechToSpeech/private/useRecorder.ts (resolved)

pranavjoshi001 and others added 2 commits January 14, 2026 11:19

Merge branch 'main' into feature/core-s2s-composer

9be0bcb

more implementation chunk

a3b2c8b

pranavjoshi001 changed the title ~~Core speech to speech composer implementation (no-op code)~~ Core speech to speech implementation Jan 14, 2026

pranavjoshi001 and others added 6 commits January 15, 2026 13:24

minor refactor

e31a8f7

Mic Implementation and animation in fluent theme

cf9d2f5

test case added

af1dd65

Merge branch 'main' into feature/core-s2s-composer

ce9f6c5

screenshot added

8fac1b3

Merge branch 'feature/core-s2s-composer' of https://github.com/pranavjoshi001/BotFramework-WebChat into feature/core-s2s-composer

e01130a

pranavjoshi001 requested a review from compulim January 16, 2026 10:54

test case updated

1a90b20