fix(realtime): normalize non-string audio formats in length calc#3242

Closed
adityasingh2400 wants to merge 1 commit into openai:main from adityasingh2400:fix/realtime-audio-length-format-normalization

@adityasingh2400
Contributor

Summary

calculate_audio_length_ms (in src/agents/realtime/_util.py) only inspected the format argument when it was a str. The RealtimeAudioFormat type alias also accepts:

  • the OpenAIRealtimeAudioFormats pydantic models (AudioPCM, AudioPCMU, AudioPCMA)
  • Mapping[str, Any] such as {"type": "audio/pcmu"}

In both cases the function fell through to the PCM16 branch and computed the duration at 24 kHz with a 2-byte sample width. G.711 audio is 8 kHz with 1 byte per sample, so for g711 buffers this reports one sixth of the true length (a 6x underestimate). The wrong elapsed_ms and audio_length_ms values then flow into ModelAudioTracker / RealtimePlaybackTracker and break interrupt truncation timing whenever a caller passes the format as anything other than the canonical pcm16 / g711_ulaw / g711_alaw strings.
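The size of the error comes from byte-rate arithmetic alone. The sketch below assumes standard G.711 framing (8 kHz, 1 byte per sample) alongside the PCM16 parameters given above (24 kHz, 2 bytes per sample). It is illustrative only and does not reproduce the SDK's code.

```python
# Byte-rate arithmetic for the two code paths described above.
# Assumptions: PCM16 = 24 kHz, 2 bytes/sample (as stated in the PR);
# G.711 (u-law / A-law) = 8 kHz, 1 byte/sample (standard G.711 framing).

PCM16_BYTES_PER_MS = 24_000 * 2 / 1000  # 48 bytes per millisecond
G711_BYTES_PER_MS = 8_000 * 1 / 1000    # 8 bytes per millisecond


def duration_ms(num_bytes: int, bytes_per_ms: float) -> float:
    """Duration in milliseconds of a raw audio buffer at a fixed byte rate."""
    return num_bytes / bytes_per_ms


one_second_of_g711 = 8_000  # bytes

true_ms = duration_ms(one_second_of_g711, G711_BYTES_PER_MS)    # correct g711 path
pcm16_ms = duration_ms(one_second_of_g711, PCM16_BYTES_PER_MS)  # fall-through path

print(true_ms)                                   # 1000.0
print(round(pcm16_ms, 2))                        # 166.67
print(PCM16_BYTES_PER_MS / G711_BYTES_PER_MS)    # 6.0
```

For the same byte count, the PCM16 fall-through reports a duration six times shorter than the real g711 playback time.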

This PR:

  • Adds _extract_format_type to read the format type from strings, mappings, and pydantic models uniformly.
  • Recognizes the wire-format aliases audio/pcmu / audio/pcma (returned by _normalize_audio_format for some payloads) as g711.
  • Preserves the historical behavior: empty bytes return 0.0, unknown formats fall back to PCM16, and case-insensitive matching is retained.
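The following is a hypothetical, self-contained sketch of the normalization the list above describes. It is not the PR's actual code. The names extract_format_type, audio_length_ms, and FakeAudioFormat are illustrative. The sketch assumes that pydantic format models such as AudioPCMU expose the format on a `type` attribute, and that anything beginning with "g711" or equal to "audio/pcmu" / "audio/pcma" should use the 8 kHz, 1-byte path.

```python
from __future__ import annotations

from collections.abc import Mapping
from dataclasses import dataclass
from typing import Any

# Hypothetical sketch, not the PR's code. Byte rates assumed:
# PCM16 at 24 kHz x 2 bytes (48 bytes/ms), G.711 at 8 kHz x 1 byte (8 bytes/ms).
_G711_ALIASES = {"audio/pcmu", "audio/pcma"}


def extract_format_type(fmt: Any) -> str | None:
    """Return a lower-cased format type from a str, Mapping, or model object."""
    if fmt is None:
        return None
    if isinstance(fmt, str):
        value: Any = fmt
    elif isinstance(fmt, Mapping):
        value = fmt.get("type")
    else:
        # Assumption: pydantic format models expose the type on a `type` field.
        value = getattr(fmt, "type", None)
    return value.lower() if isinstance(value, str) else None


def audio_length_ms(fmt: Any, audio_bytes: bytes) -> float:
    """Estimate playback length in milliseconds for a raw audio buffer."""
    if not audio_bytes:
        return 0.0
    fmt_type = extract_format_type(fmt)
    if fmt_type is not None and (fmt_type.startswith("g711") or fmt_type in _G711_ALIASES):
        return len(audio_bytes) / 8  # 8 kHz, 1 byte per sample -> 8 bytes per ms
    # Unknown or missing formats fall back to PCM16 at 24 kHz, 2 bytes per sample.
    return len(audio_bytes) / 48


@dataclass
class FakeAudioFormat:
    """Stand-in for a pydantic model such as AudioPCMU; only `type` is modeled."""
    type: str


g711_second = b"\x00" * 8000
print(audio_length_ms(FakeAudioFormat(type="audio/pcmu"), g711_second))  # 1000.0
print(audio_length_ms({"type": "audio/pcma"}, g711_second))              # 1000.0
print(audio_length_ms("G711_ULAW", g711_second))                         # 1000.0
print(audio_length_ms("something-else", b"\x00" * 4800))                 # 100.0
print(audio_length_ms("g711_ulaw", b""))                                 # 0.0
```

Putting the extraction in a separate helper keeps a single length formula per codec. Every accepted input shape then reduces to one lower-cased string before the branch decision.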

The fix is a 22-line surgical change in _util.py plus a focused test module covering pydantic models, mappings, the audio/pcmu alias, uppercase strings, unknown-format fallback, and empty bytes.

Test plan

  • pytest tests/realtime/test_audio_length_format_normalization.py (new, 7 cases, all pass)
  • pytest tests/realtime full suite still green (239 passed)
  • Existing test_calculate_audio_length_ms_pure_function (in test_openai_realtime.py) still passes — string-input behavior unchanged

`calculate_audio_length_ms` only inspected the format when it was a `str`,
so g711 audio supplied via the `OpenAIRealtimeAudioFormats` pydantic models
(`AudioPCMU`, `AudioPCMA`) or a `Mapping` (e.g. `{"type": "audio/pcmu"}`)
fell through to the PCM16 path and reported a duration computed at 24 kHz
with a 2-byte sample width. For g711 streams (8 kHz, 1 byte per sample) that
understates the length roughly 6x, breaking interrupt timing and playback accounting whenever a
caller wires the format through directly rather than via the canonical
`pcm16` / `g711_*` strings.

Extract the type from strings, mappings, and pydantic models, and also
recognize the wire-format aliases `audio/pcmu` / `audio/pcma` so a raw API
string is treated as g711.
github-actions (bot) added the bug and feature:realtime labels on May 8, 2026
@adityasingh2400
Contributor Author

Closing as duplicate of #3196 (same wave, my own oversight). #3196 adds the equivalent _normalize_format_to_str helper for str/Mapping/pydantic-typed formats and matches both g711_* prefix and audio/pcm{u,a} aliases. Sorry for the noise — please review #3196 instead.
