Processing audio streaming from the backend (mastra only) #104
base: main
Greptile Summary
This PR implements voice streaming functionality for Cedar OS, specifically targeting Mastra backend providers. The implementation adds the capability to process audio responses in real-time as they arrive from the backend, rather than waiting for complete responses.
The core changes introduce a new voiceStreamLLM method to the agent connection architecture, following the established streaming pattern used for text responses. The implementation adds new types (VoiceStreamEvent, VoiceStreamHandler) to handle various voice-specific events including transcription updates, audio chunks, and structured objects. The voice slice is enhanced with a new stream boolean configuration setting that determines whether to use streaming or traditional voice processing.
Key architectural additions include:
- Provider abstraction: The ProviderImplementation interface gains an optional voiceStreamLLM method, maintaining backward compatibility
- Event handling: Comprehensive event processing for different voice stream event types (transcription, audio, chunk, object, done, error)
- Dual-path implementation: The agent connection slice intelligently detects provider capabilities and falls back to non-streaming voice processing when streaming isn't supported
- Mastra integration: Specific implementation for Mastra providers with proper URL construction, header management, and event transformation
The implementation maintains consistency with the existing Cedar OS streaming architecture while extending it to support voice use cases. Helper functions were extracted in the Mastra provider to promote code reuse between streaming and non-streaming voice methods. The voice slice introduces a handled flag mechanism to prevent duplicate processing when responses contain multiple data types.
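To make the dual-path idea concrete, here is a minimal sketch, not the PR's actual code. The event field names, `VoiceResponseSketch`, `ProviderSketch`, `pickVoicePath`, and `toStreamEvents` are all hypothetical; only the event type names and the optional `voiceStreamLLM` method come from the summary above. It shows one way to detect streaming support and to replay a non-streaming response through the same event handler shape.

```typescript
// Hypothetical shapes for illustration; the PR's real types may differ.
type VoiceStreamEvent =
  | { type: 'transcription'; transcription: string }
  | { type: 'audio'; audioData: string; audioFormat?: string }
  | { type: 'chunk'; content: string }
  | { type: 'object'; object: unknown }
  | { type: 'done' }
  | { type: 'error'; error: Error };

type VoiceStreamHandler = (event: VoiceStreamEvent) => void;

// Assumed shape of a non-streaming voice response.
interface VoiceResponseSketch {
  transcription?: string;
  audioData?: string;
  audioFormat?: string;
}

// Assumed provider surface: streaming is optional, as in the summary.
interface ProviderSketch {
  voiceLLM(input: string): Promise<VoiceResponseSketch>;
  voiceStreamLLM?(input: string, handler: VoiceStreamHandler): Promise<void>;
}

// Choose the streaming path only when the provider implements it.
function pickVoicePath(provider: ProviderSketch): 'stream' | 'fallback' {
  return typeof provider.voiceStreamLLM === 'function' ? 'stream' : 'fallback';
}

// Convert a complete response into the event sequence a stream would emit,
// so callers can use one handler for both paths.
function toStreamEvents(response: VoiceResponseSketch): VoiceStreamEvent[] {
  const events: VoiceStreamEvent[] = [];
  if (response.transcription) {
    events.push({ type: 'transcription', transcription: response.transcription });
  }
  if (response.audioData) {
    events.push({
      type: 'audio',
      audioData: response.audioData,
      audioFormat: response.audioFormat,
    });
  }
  events.push({ type: 'done' });
  return events;
}
```

Keeping the fallback in terms of the same event union means the voice slice only needs one code path for consuming results, whichever provider is configured.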
Confidence score: 4/5
- This PR introduces complex streaming logic but follows established patterns and includes comprehensive error handling
- Score reflects well-structured implementation with proper fallback mechanisms and backward compatibility
- Pay close attention to the event handling logic in voiceStreamLLM and the dual-path processing in the voice slice
4 files reviewed, 3 comments
    if (response.audioData || response.audioUrl) {
      wrappedHandler({
        type: 'audio',
        audioData: response.audioData || response.audioUrl || '',
        audioFormat: response.audioFormat,
      });
logic: Audio data fallback logic uses audioData || audioUrl || '' which could result in empty string for audio data if both are undefined
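Note that in the snippet as shown, the enclosing `if` already requires one of the two fields to be truthy, so the `''` branch looks unreachable there. Resolving the source once makes that explicit and removes the need for an empty-string default. The following is a sketch only; `VoiceResponseLike`, `AudioEvent`, and `toAudioEvent` are hypothetical names, not part of the PR.

```typescript
// Hypothetical response shape, limited to the fields visible in the diff.
interface VoiceResponseLike {
  audioData?: string;
  audioUrl?: string;
  audioFormat?: string;
}

type AudioEvent = { type: 'audio'; audioData: string; audioFormat?: string };

// Resolve the audio source once; return null instead of emitting an
// event that carries an empty string.
function toAudioEvent(response: VoiceResponseLike): AudioEvent | null {
  const audioData = response.audioData || response.audioUrl;
  if (!audioData) {
    return null;
  }
  return { type: 'audio', audioData, audioFormat: response.audioFormat };
}
```

A caller would then emit only when the result is non-null, for example `const event = toAudioEvent(response); if (event) wrappedHandler(event);`.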
    try {
      const headers = createVoiceHeaders(config);
      const baseUrl = resolveVoiceEndpoint(params.voiceSettings, config);
      const streamUrl = `${baseUrl}/stream`;
logic: appending '/stream' to baseUrl could create malformed URLs if baseUrl already ends with '/stream' or has query parameters
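One possible way to address this, sketched under the assumption that `baseUrl` may be either absolute or relative and may carry a query string or fragment. `appendStreamSegment` is a hypothetical helper, not something from the PR; it uses plain string handling so relative endpoints work too.

```typescript
// Append a path segment without duplicating it and without pushing it
// behind an existing query string or fragment.
function appendStreamSegment(baseUrl: string, segment = 'stream'): string {
  // Split off "?query" / "#fragment" so the segment lands in the path.
  const cut = baseUrl.search(/[?#]/);
  const path = cut === -1 ? baseUrl : baseUrl.slice(0, cut);
  const suffix = cut === -1 ? '' : baseUrl.slice(cut);

  // Drop trailing slashes to avoid "voice//stream".
  const trimmed = path.replace(/\/+$/, '');

  // Leave the path alone if it already ends with the segment.
  const withSegment = trimmed.endsWith(`/${segment}`)
    ? trimmed
    : `${trimmed}/${segment}`;

  return withSegment + suffix;
}
```

With this, `const streamUrl = appendStreamSegment(baseUrl);` would replace the template-string concatenation in the snippet above.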
    // Voice processing completed successfully (streaming or non-streaming)
    get().setIsProcessing(false);
logic: Processing state is cleared after streaming completion, but error handling at line 296 also clears it. Consider moving the success case inside a try block to ensure consistent state management.
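A `try`/`finally` wrapper is one way to get a single place where the flag is cleared for both outcomes. This is a sketch with a hypothetical helper name, `withProcessing`; the real slice may need to keep its separate error handling, for example to set an error message, and this only covers the processing flag.

```typescript
// Run a task with the processing flag set, and always clear it afterwards,
// whether the task resolves or rejects.
async function withProcessing<T>(
  setIsProcessing: (value: boolean) => void,
  task: () => Promise<T>,
): Promise<T> {
  setIsProcessing(true);
  try {
    return await task();
  } finally {
    // Single exit point for both the success and the error path.
    setIsProcessing(false);
  }
}
```

In the voice slice this might be called as `await withProcessing(get().setIsProcessing, () => processVoice())`, where `processVoice` stands in for the existing streaming or non-streaming call.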