@devin-ai-integration

Update Speech-to-Speech docs with OpenAI Realtime API schema

Summary

Rewrites the Speech-to-Speech documentation to reflect the new API, which mirrors the OpenAI Realtime API schema. The new docs use the endpoint wss://speech-to-speech.assemblyai.com/v1/realtime and include code examples for four integration patterns: raw WebSocket (Python and JavaScript), the OpenAI Python client, LiveKit, and Pipecat.

Key changes:

  • Added prominent beta warning at the top
  • Updated to new API endpoint and OpenAI-compatible schema
  • Added WebSocket code examples for Python and JavaScript
  • Added OpenAI Python client example
  • Added LiveKit and Pipecat integration examples with full agent code
  • Added tool calling documentation with examples
  • Added subagent routing documentation
  • Added three complete sample agents (debt collection, interviewer, lead qualification) with tool calling
  • Added WebSocket events reference (client and server events)
  • Added roadmap and known issues placeholder sections
  • Removed unhelpful ASCII diagram

Review & Testing Checklist for Human

  • Verify API endpoint and schema accuracy: Confirm wss://speech-to-speech.assemblyai.com/v1/realtime is correct and the event types (session.update, response.audio.delta, conversation.item.input_audio_transcription.completed, etc.) match the actual API
  • Verify available voices: Check that sage, coral, verse, alloy are the correct voice options for this API
  • Test LiveKit integration example: Verify the import paths (from livekit.plugins.openai.realtime import AudioTranscription) and configuration work correctly
  • Review sample agent code: The debt collection, interviewer, and lead qualification examples are comprehensive but untested; verify that they work against the actual API
  • Run fern docs dev locally: Preview the rendered documentation to ensure formatting looks correct

Recommended test plan: Run one of the Python WebSocket examples against the actual API to verify the connection flow, event types, and audio handling work as documented.
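
As a starting point for that test run, here is a minimal offline sketch of two helpers a reviewer might use: one builds a session.update message, the other reports which expected server event types were not observed in a captured message log. The payload shape (`session.voice`, `session.instructions`) is an assumption based on the OpenAI Realtime schema this PR says the API mirrors; the helper names are hypothetical, and the endpoint, voices, and event types are the unverified values listed in the checklist above.

```python
import json

# Endpoint as stated in this PR; still to be verified by the reviewer.
REALTIME_URL = "wss://speech-to-speech.assemblyai.com/v1/realtime"

# Values taken from the review checklist; none are confirmed yet.
CANDIDATE_VOICES = {"sage", "coral", "verse", "alloy"}
EXPECTED_SERVER_EVENTS = {
    "response.audio.delta",
    "conversation.item.input_audio_transcription.completed",
}


def build_session_update(voice: str, instructions: str) -> str:
    """Serialize a session.update client event (assumed OpenAI-style shape)."""
    if voice not in CANDIDATE_VOICES:
        raise ValueError(f"voice {voice!r} is not in the documented list")
    event = {
        "type": "session.update",
        "session": {"voice": voice, "instructions": instructions},
    }
    return json.dumps(event)


def missing_event_types(raw_messages: list[str]) -> set[str]:
    """Return the expected server event types absent from a captured log."""
    seen = {json.loads(message).get("type") for message in raw_messages}
    return EXPECTED_SERVER_EVENTS - seen
```

Sending the output of `build_session_update` over a WebSocket connection to `REALTIME_URL`, collecting the raw text frames received during one conversation turn, and passing them to `missing_event_types` would give a quick signal on whether the documented event names match what the server emits.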

Notes

- Add beta warning prominently at the top
- Update to use new API endpoint (wss://speech-to-speech.assemblyai.com/v1/realtime)
- Add WebSocket code examples for Python and JavaScript
- Add OpenAI Python client example
- Add LiveKit integration with full agent example
- Add Pipecat integration with full pipeline example
- Add tool calling documentation and examples
- Add subagent routing documentation with multi-agent example
- Add complete sample agents:
  - Debt collection agent with FDCPA compliance
  - Interview agent with scoring and notes
  - Lead qualification agent with BANT methodology
- Add WebSocket events reference (client and server events)
- Add roadmap and known issues sections
- Remove unhelpful ASCII diagram
- Reorganize content for better readability

Co-Authored-By: Dan Ince <dince@assemblyai.com>