Support OpenAI Realtime Whisper STT #1429
🦋 Changeset detected. Latest commit: 2ef35f0. The changes in this PR will be included in the next version bump. This PR includes changesets to release 31 packages.
…gents-js into brian/oai-rt-translate
const REALTIME_SAMPLE_RATE = 24000;
const REALTIME_NUM_CHANNELS = 1;
const DEFAULT_REALTIME_MODEL = 'gpt-realtime-whisper';
should we match python's default of gpt-4o-mini-transcribe or standardize across both?
Maybe let's update the default on both sides, since 'gpt-realtime-whisper' just came out and should be better than the previous model?
One thing to note is that 'gpt-realtime-whisper' does not support server-side VAD the way the previous model does, so I had to add a VAD option, just like we did for the mistral STT. Happy to make the same PR on the python side once we've merged this.
oh i didn't know it was new, yeah we should make that the default on both sides if it's better, sounds good!
tinalenguyen left a comment
tested it and it lgtm! one note is that in the mistral stt plugin, if a vad is not passed then one will be created by default, i think we could do that too. wdyt?
Sounds good! I'll merge this and create a follow-up PR, just to keep things separate and cleaner.
Summary
This PR adds realtime OpenAI STT support for gpt-realtime-whisper in openai.STT. openai.STT now defaults to gpt-realtime-whisper with useRealtime: true. Because this model does not support OpenAI server-side turn_detection, callers must provide a VAD instance so the plugin can commit the audio buffer at end-of-speech.
Usage
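As a rough illustration of why the VAD is needed, here is a minimal sketch of a VAD-driven commit. It is not the plugin's code: `PipelineItem`, `toClientEvents`, and the field names are hypothetical. It assumes audio reaches the socket as base64 chunks in `input_audio_buffer.append` events and that a local VAD end-of-speech signal triggers an `input_audio_buffer.commit`.

```typescript
// Hypothetical pipeline items: either a chunk of base64-encoded PCM audio
// or an end-of-speech signal produced by a local VAD.
type PipelineItem =
  | { kind: 'audio'; audioBase64: string }
  | { kind: 'end_of_speech' };

// The two OpenAI Realtime client events this sketch emits.
type ClientEvent =
  | { type: 'input_audio_buffer.append'; audio: string }
  | { type: 'input_audio_buffer.commit' };

// Map audio chunks and VAD signals to the events a client would send.
function toClientEvents(items: PipelineItem[]): ClientEvent[] {
  const events: ClientEvent[] = [];
  let hasUncommittedAudio = false;

  for (const item of items) {
    if (item.kind === 'audio') {
      events.push({ type: 'input_audio_buffer.append', audio: item.audioBase64 });
      hasUncommittedAudio = true;
    } else if (hasUncommittedAudio) {
      // End of speech: commit what was appended since the last commit.
      events.push({ type: 'input_audio_buffer.commit' });
      hasUncommittedAudio = false;
    }
    // An end-of-speech signal with nothing appended is skipped,
    // so the sketch never commits an empty buffer.
  }
  return events;
}
```

Without server-side turn detection nothing else would close an utterance, so in this sketch the commit is driven entirely by the VAD signal.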
If turnDetection is provided with gpt-realtime-whisper, the plugin warns and ignores it by normalizing to null.
Notes
- gpt-realtime-whisper uses the OpenAI Realtime transcription WebSocket path.
- useRealtime: false still uses the previous batch whisper-1 path.
- turnDetection handling.
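
The warn-and-ignore behavior for turnDetection described above could look roughly like the sketch below. The function name, the `TurnDetectionOptions` shape, and the injectable `warn` callback are assumptions made for illustration; only the model name and the "warn, then normalize to null" behavior come from this PR.

```typescript
// Hypothetical option shape; the plugin's real type may differ.
interface TurnDetectionOptions {
  type: string;
}

// Model named in this PR as lacking server-side turn detection.
const NO_SERVER_VAD_MODEL = 'gpt-realtime-whisper';

// For gpt-realtime-whisper, warn if turnDetection was supplied and return null.
// For any other model, return the caller's value unchanged.
function normalizeTurnDetection(
  model: string,
  turnDetection: TurnDetectionOptions | null | undefined,
  warn: (message: string) => void = (message) => console.warn(message),
): TurnDetectionOptions | null | undefined {
  if (model !== NO_SERVER_VAD_MODEL) {
    return turnDetection;
  }
  if (turnDetection != null) {
    warn(`turnDetection is not supported by ${model} and will be ignored`);
  }
  return null;
}
```

Taking `warn` as a parameter keeps the sketch testable without capturing console output.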