
Support OpenAI Realtime Whisper STT #1429

Merged
toubatbrian merged 6 commits into main from brian/oai-rt-translate
May 12, 2026
@toubatbrian toubatbrian commented May 8, 2026

Summary

This PR adds realtime OpenAI STT support for gpt-realtime-whisper in openai.STT.

openai.STT now defaults to gpt-realtime-whisper with useRealtime: true. Because this model does not support OpenAI server-side turn_detection, callers must provide a VAD instance so the plugin can commit the audio buffer at end-of-speech.
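The end-of-speech commit described above can be sketched roughly as follows. This is a minimal illustration, not the plugin's actual code: `VadEvent` and `BufferCommitter` are hypothetical names, and the only parts taken from the OpenAI Realtime API are the `input_audio_buffer.append` and `input_audio_buffer.commit` client event names.

```typescript
// Hypothetical VAD event shape (assumption); the real events in the plugin may differ.
type VadEvent = { type: 'start_of_speech' } | { type: 'end_of_speech' };

// Client messages named after the OpenAI Realtime API input_audio_buffer events.
type ClientMessage =
  | { type: 'input_audio_buffer.append'; audio: string }
  | { type: 'input_audio_buffer.commit' };

// Hypothetical helper: forwards audio and commits the buffer when the VAD reports end-of-speech.
class BufferCommitter {
  private readonly send: (msg: ClientMessage) => void;
  private pendingAudio = false;

  constructor(send: (msg: ClientMessage) => void) {
    this.send = send;
  }

  // Append a base64-encoded PCM chunk to the server-side input buffer.
  pushAudio(base64Pcm: string): void {
    this.send({ type: 'input_audio_buffer.append', audio: base64Pcm });
    this.pendingAudio = true;
  }

  // Without server-side turn detection, the client decides when a turn ends.
  onVadEvent(ev: VadEvent): void {
    if (ev.type !== 'end_of_speech') return;
    // Only commit when audio was appended since the last commit.
    if (this.pendingAudio) {
      this.send({ type: 'input_audio_buffer.commit' });
      this.pendingAudio = false;
    }
  }
}
```

In this sketch the `send` callback stands in for whatever serializes messages onto the WebSocket, which keeps the commit logic testable without a network connection.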

Usage

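import { voice } from '@livekit/agents';
import * as openai from '@livekit/agents-plugin-openai';
import * as silero from '@livekit/agents-plugin-silero';

// Load one VAD and share it between the session and the STT plugin.
const vad = await silero.VAD.load();

const session = new voice.AgentSession({
  vad,
  stt: new openai.STT({
    model: 'gpt-realtime-whisper',
    vad, // required: the plugin commits the audio buffer at end-of-speech
  }),
  // llm, tts, ...
});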

If turnDetection is provided with gpt-realtime-whisper, the plugin logs a warning and ignores it, normalizing the value to null.
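A rough sketch of that normalization, under stated assumptions: `normalizeTurnDetection`, the `TurnDetection` type, and the injected `warn` callback are hypothetical and are not taken from the plugin source. Only the behavior (warn, then treat the option as null for gpt-realtime-whisper) comes from the description above.

```typescript
// Hypothetical option shape (assumption); the plugin's real type is likely richer.
type TurnDetection = { type: string } | null;

// Models that do not accept server-side turn detection, per this PR's description.
const MODELS_WITHOUT_SERVER_TURN_DETECTION = new Set<string>(['gpt-realtime-whisper']);

// Hypothetical helper: returns the turnDetection value that should be sent to the server.
function normalizeTurnDetection(
  model: string,
  turnDetection: TurnDetection | undefined,
  warn: (message: string) => void,
): TurnDetection | undefined {
  if (!MODELS_WITHOUT_SERVER_TURN_DETECTION.has(model)) {
    // Other models keep whatever the caller passed.
    return turnDetection;
  }
  if (turnDetection != null) {
    warn(`turnDetection is not supported by ${model}; ignoring it`);
  }
  return null;
}
```

Passing the logger in as a parameter is a choice made for this sketch so the warning can be asserted in a test; the plugin may well use its own logger directly.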

Notes

  • gpt-realtime-whisper uses the OpenAI Realtime transcription WebSocket path.
  • WebSocket server events are parsed with Zod for type safety.
  • useRealtime: false still uses the previous batch whisper-1 path.
  • Tests cover realtime defaults, batch opt-out, required VAD behavior, and model-specific turnDetection handling.
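The defaults and opt-out listed above could be resolved along these lines. This is only a sketch: `resolveSttOptions` and both interfaces are hypothetical names, and whether the plugin throws at construction time when the VAD is missing is an assumption made here for illustration.

```typescript
// Hypothetical option shapes (assumption), modelled on the options named in this PR.
interface SttOptions {
  model?: string;
  useRealtime?: boolean;
  vad?: unknown;
}

interface ResolvedSttOptions {
  model: string;
  useRealtime: boolean;
  vad: unknown;
}

const DEFAULT_REALTIME_MODEL = 'gpt-realtime-whisper';
const DEFAULT_BATCH_MODEL = 'whisper-1';

// Hypothetical helper: applies the defaults described in the notes above.
function resolveSttOptions(opts: SttOptions = {}): ResolvedSttOptions {
  // Realtime is the default; useRealtime: false selects the batch path.
  const useRealtime = opts.useRealtime ?? true;
  const model = opts.model ?? (useRealtime ? DEFAULT_REALTIME_MODEL : DEFAULT_BATCH_MODEL);

  // Assumption: a missing VAD is reported eagerly rather than at stream time.
  if (useRealtime && model === DEFAULT_REALTIME_MODEL && opts.vad === undefined) {
    throw new Error(`${model} requires a VAD instance to commit audio at end-of-speech`);
  }
  return { model, useRealtime, vad: opts.vad };
}
```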

changeset-bot Bot commented May 8, 2026

🦋 Changeset detected

Latest commit: 2ef35f0

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 31 packages

| Name | Type |
| --- | --- |
| @livekit/agents | Patch |
| @livekit/agents-plugin-openai | Patch |
| @livekit/agents-plugin-anam | Patch |
| @livekit/agents-plugin-assemblyai | Patch |
| @livekit/agents-plugin-baseten | Patch |
| @livekit/agents-plugin-bey | Patch |
| @livekit/agents-plugin-cartesia | Patch |
| @livekit/agents-plugin-cerebras | Patch |
| @livekit/agents-plugin-deepgram | Patch |
| @livekit/agents-plugin-elevenlabs | Patch |
| @livekit/agents-plugin-fishaudio | Patch |
| @livekit/agents-plugin-google | Patch |
| @livekit/agents-plugin-hedra | Patch |
| @livekit/agents-plugin-hume | Patch |
| @livekit/agents-plugin-inworld | Patch |
| @livekit/agents-plugin-lemonslice | Patch |
| @livekit/agents-plugin-liveavatar | Patch |
| @livekit/agents-plugin-livekit | Patch |
| @livekit/agents-plugin-minimax | Patch |
| @livekit/agents-plugin-mistral | Patch |
| @livekit/agents-plugin-mistralai | Patch |
| @livekit/agents-plugin-neuphonic | Patch |
| @livekit/agents-plugin-phonic | Patch |
| @livekit/agents-plugin-resemble | Patch |
| @livekit/agents-plugin-rime | Patch |
| @livekit/agents-plugin-runway | Patch |
| @livekit/agents-plugin-sarvam | Patch |
| @livekit/agents-plugin-silero | Patch |
| @livekit/agents-plugins-test | Patch |
| @livekit/agents-plugin-trugen | Patch |
| @livekit/agents-plugin-xai | Patch |


Comment thread plugins/openai/src/stt.ts

const REALTIME_SAMPLE_RATE = 24000;
const REALTIME_NUM_CHANNELS = 1;
const DEFAULT_REALTIME_MODEL = 'gpt-realtime-whisper';
Member

should we match python's default of gpt-4o-mini-transcribe or standardize across both?

Contributor Author

Maybe let's update the default on both sides, since 'gpt-realtime-whisper' just came out and should be better than the previous model?

Contributor Author

One thing to note is that 'gpt-realtime-whisper' does not support server-side VAD like the previous model does, so I had to add a VAD option, the same way we did for the Mistral STT. Happy to make the same PR on the Python side once we've merged this.

Member

oh i didn't know it was new, yeah we should make that the default on both sides if it's better, sounds good!

@toubatbrian toubatbrian requested a review from tinalenguyen May 11, 2026 18:45
Member

@tinalenguyen tinalenguyen left a comment

tested it and it lgtm! one note is that in the mistral stt plugin, if a vad is not passed then one will be created by default, i think we could do that too. wdyt?

@toubatbrian
Contributor Author

> tested it and it lgtm! one note is that in the mistral stt plugin, if a vad is not passed then one will be created by default, i think we could do that too. wdyt?

Sounds good! I'll merge this and create a follow-up PR, just to keep things separate and cleaner.

@toubatbrian toubatbrian merged commit d9c3d8b into main May 12, 2026
9 checks passed
@toubatbrian toubatbrian deleted the brian/oai-rt-translate branch May 12, 2026 01:50
@github-actions github-actions Bot mentioned this pull request May 11, 2026
2 participants