
fix: OGG/Opus audio truncation — final page lost in write_chunk finalize #448

Open
will-assistant wants to merge 1 commit into remsky:master from will-assistant:fix/opus-truncation

@will-assistant

Summary

Ordering fix: in the write_chunk finalize block, container.close() must be called before output_buffer.getvalue(). The current order drops the final OGG page, which holds ~1-2 seconds of audio.

The Bug

When using response_format: "opus" on /v1/audio/speech, output audio is consistently truncated. The last 1-2 seconds are silently dropped. All other formats (MP3, WAV, FLAC, PCM) work correctly.

Related issue: #447

Root Cause

In api/src/services/streaming_audio_writer.py, the finalize block does:

# ❌ BEFORE (broken)
data = self.output_buffer.getvalue()  # reads buffer BEFORE final page is written
self.close()                           # closes container, writing final OGG page to buffer (too late)
return data                            # returns incomplete audio

For OGG/Opus, the container writes the final audio page to the output buffer during close(). Because the buffer is read first, that last page never reaches the returned data. MP3/WAV/FLAC aren't affected because their container close only writes metadata trailers, not audio frames.
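The ordering effect can be reproduced without any audio library. The sketch below is a toy model, not PyAV's API: a hypothetical ToyPagedMuxer writes only complete "pages" to an io.BytesIO and holds the remainder until close(), mimicking the buffering behaviour described above. The names ToyPagedMuxer, finalize_broken and finalize_fixed are illustrative only.

```python
import io


class ToyPagedMuxer:
    """Hypothetical stand-in for an Ogg muxer (NOT PyAV or libavformat).

    Data reaches the output buffer one full page at a time; leftover bytes
    stay in an internal buffer until close() flushes them.
    """

    def __init__(self, out: io.BytesIO, page_size: int) -> None:
        self.out = out
        self.page_size = page_size
        self.pending = bytearray()

    def write(self, samples: bytes) -> None:
        self.pending += samples
        # Emit only complete pages; the remainder waits for more input.
        while len(self.pending) >= self.page_size:
            self.out.write(bytes(self.pending[: self.page_size]))
            del self.pending[: self.page_size]

    def close(self) -> None:
        # The final (partial) page reaches the buffer only here.
        if self.pending:
            self.out.write(bytes(self.pending))
            self.pending.clear()


def finalize_broken(mux: ToyPagedMuxer, out: io.BytesIO) -> bytes:
    data = out.getvalue()  # snapshot taken before the final page exists
    mux.close()            # final page written too late
    return data


def finalize_fixed(mux: ToyPagedMuxer, out: io.BytesIO) -> bytes:
    mux.close()            # final page written first
    data = out.getvalue()  # snapshot now includes it
    out.close()
    return data


def run(finalize) -> bytes:
    out = io.BytesIO()
    mux = ToyPagedMuxer(out, page_size=4)
    mux.write(b"abcdefghij")  # 10 bytes: two full pages + 2 pending
    return finalize(mux, out)


print(run(finalize_broken))  # b'abcdefgh'
print(run(finalize_fixed))   # b'abcdefghij'
```

The broken ordering returns only the pages that happened to be flushed before finalize; everything still held by the muxer is silently lost, which matches the symptom of correct-but-short Opus files.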

Fix

# ✅ AFTER (fixed)
self.container.close()                 # writes final OGG page to buffer
data = self.output_buffer.getvalue()   # now includes all audio data
self.output_buffer.close()             # safe: getvalue() already returned a copy of the bytes
return data

Test Results

Same text, same voice, same speed — only response_format differs:

Before fix

| Text | MP3 duration | Opus duration | Lost |
|---|---|---|---|
| Short | 3.408 s | 2.000 s | 1.4 s |
| Medium | 5.016 s | 3.000 s | 2.0 s |
| Long | 10.224 s | 9.000 s | 1.2 s |

Note the round-number Opus durations: OGG pages are emitted at roughly 1 s granule boundaries, and the final partial page was being dropped.

After fix

| Text | MP3 duration | Opus duration | Delta |
|---|---|---|---|
| Short | 3.408 s | 3.347 s | 0.06 s ✅ |
| Medium | 5.016 s | 4.959 s | 0.06 s ✅ |
| Long | 10.224 s | 10.163 s | 0.06 s ✅ |

Durations now match within ~60ms (normal codec framing overhead).
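The PR does not say how these durations were measured (ffprobe would be the usual tool). As a hedged, stdlib-only sketch, assuming a single-stream Ogg/Opus file whose granule positions start at 0: RFC 7845 defines Opus granule positions in 48 kHz samples, so the playable duration is the last page's granule position minus the pre-skip stored in the OpusHead packet, divided by 48000. The helper names iter_ogg_pages and opus_duration_seconds are hypothetical, and the demo stream at the bottom is synthetic.

```python
import struct

OPUS_GRANULE_RATE = 48_000  # Opus granule positions count 48 kHz samples


def iter_ogg_pages(data: bytes):
    """Yield (header_type, granule_position, payload) for each Ogg page.

    CRCs are not checked: this is a measurement sketch, not a validator.
    """
    pos = 0
    while pos + 27 <= len(data):
        if data[pos:pos + 4] != b"OggS":
            raise ValueError(f"missing Ogg capture pattern at offset {pos}")
        header_type = data[pos + 5]
        (granule,) = struct.unpack_from("<q", data, pos + 6)
        n_segments = data[pos + 26]
        body_start = pos + 27 + n_segments
        body_len = sum(data[pos + 27:body_start])  # segment table = lacing values
        yield header_type, granule, data[body_start:body_start + body_len]
        pos = body_start + body_len


def opus_duration_seconds(data: bytes) -> float:
    """(last granule position - pre-skip) / 48 kHz, single logical stream."""
    pre_skip = last_granule = None
    for _, granule, payload in iter_ogg_pages(data):
        if payload.startswith(b"OpusHead"):
            (pre_skip,) = struct.unpack_from("<H", payload, 10)
        elif granule >= 0:  # -1 means no packet completes on this page
            last_granule = granule
    if pre_skip is None or last_granule is None:
        raise ValueError("not an Ogg/Opus stream")
    return (last_granule - pre_skip) / OPUS_GRANULE_RATE


def _page(header_type: int, granule: int, seq: int, payload: bytes) -> bytes:
    # Synthetic single-segment page for this demo (payload < 255 bytes, CRC = 0).
    header = b"OggS" + bytes([0, header_type]) + struct.pack("<qIII", granule, 1, seq, 0)
    return header + bytes([1, len(payload)]) + payload


head = b"OpusHead" + bytes([1, 1]) + struct.pack("<HIhB", 312, 48_000, 0, 0)
tags = b"OpusTags" + struct.pack("<II", 0, 0)
full = (_page(0x02, 0, 0, head) + _page(0x00, 0, 1, tags)
        + _page(0x00, 48_312, 2, b"\x00" * 8) + _page(0x04, 120_312, 3, b"\x00" * 8))
truncated = full[: -(27 + 1 + 8)]  # drop the final (EOS) page

print(opus_duration_seconds(full))       # 2.5
print(opus_duration_seconds(truncated))  # 1.0
```

Reading the duration from the container this way reflects exactly what a player would see, so a dropped final page shows up directly as a shorter result.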

Changed Files

  • api/src/services/streaming_audio_writer.py — 10 lines changed in write_chunk() finalize block

Testing

  • Tested on GPU Docker build (CUDA 12.9.1, PyTorch)
  • Verified with voice blending (am_puck(1)+am_liam(1)+am_onyx(0.5) at 1.2x speed)
  • Confirmed MP3/WAV/FLAC output unchanged
  • Sent fixed opus output as Discord voice messages — plays completely, no cutoff
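The manual checks above could be backed by a cheap automated invariant. A properly terminated Ogg stream marks its last page with the end-of-stream flag (bit 0x04 of header_type, per RFC 3533), so a buffer snapshot taken before the container is closed would be expected to fail this check. The sketch below assumes single-stream Ogg output; the helper last_page_is_eos is hypothetical, and because the PR does not show write_chunk()'s signature, wiring it to the real writer's finalize output is left out.

```python
import struct

EOS_FLAG = 0x04  # Ogg header_type bit marking the last page of a logical stream


def last_page_is_eos(data: bytes) -> bool:
    """Walk the Ogg pages in `data` and report whether the final one has EOS set."""
    pos = 0
    last_header_type = None
    while pos + 27 <= len(data):
        if data[pos:pos + 4] != b"OggS":
            return False  # not on a clean page boundary: treat as truncated/corrupt
        last_header_type = data[pos + 5]
        n_segments = data[pos + 26]
        body_start = pos + 27 + n_segments
        pos = body_start + sum(data[pos + 27:body_start])
    # A complete stream ends exactly on a page boundary with EOS set.
    return (
        pos == len(data)
        and last_header_type is not None
        and bool(last_header_type & EOS_FLAG)
    )


def _page(header_type: int, payload: bytes) -> bytes:
    # Synthetic single-segment page (granule 0, CRC left as 0) used only for this demo.
    return (b"OggS" + bytes([0, header_type]) + struct.pack("<qIII", 0, 1, 0, 0)
            + bytes([1, len(payload)]) + payload)


complete = _page(0x02, b"head") + _page(0x00, b"audio") + _page(0x04, b"tail")
cut_short = complete[: -len(_page(0x04, b"tail"))]

print(last_page_is_eos(complete))   # True
print(last_page_is_eos(cut_short))  # False
```

Unlike a duration comparison against MP3, this check needs no reference output and no tolerance, which makes it a reasonable regression assertion for the Opus path.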

The finalize block in write_chunk() called output_buffer.getvalue() before
container.close(). For OGG/Opus, the final page of audio data is only written
to the buffer during close(), causing ~1-2 seconds of audio to be lost.

Swap the order: close container first, then read buffer.

Fixes: remsky#447