
Bug Report: Gemini Live API Native Audio Premature turnComplete Causes Mid-Sentence Audio Truncation #2117

@SunflowersLwtech

Description


When using the Gemini Live API with native audio output, the model frequently stops speaking mid-sentence.
The server sends a turnComplete message while the model is still generating audio, with no interrupted flag set.

This is not caused by client-side echo or VAD. It appears to be a server-side issue where the model prematurely terminates its own turn.
This has been reported across multiple repos and confirmed by ~40 developers over the past ~8 months (see Related Issues below). Despite multiple fix attempts by Google engineers, the problem persists or regresses.

Environment

  • Model: gemini-2.5-flash-native-audio-preview-12-2025
  • SDK: google-genai 1.64.0 via google-adk 1.25.1
  • API: Google AI Developer API (not Vertex AI)
  • Platform: FastAPI WebSocket server (Python 3.12) + iOS client (SwiftUI)
  • OS: macOS (server), iOS 18 (client)

Steps to Reproduce

  1. Establish a Live API session with native audio output (Python).
    • response_modalities=["AUDIO"]
    • speech_config with a prebuilt voice (e.g. Aoede)
    • realtime_input_config.automatic_activity_detection.disabled=True
  2. Send a user message that requires a multi-sentence response (e.g. "Describe what you see in detail").
  3. Observe the server stream.
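The session setup in step 1 can be sketched as the plain-dict config form that google-genai's Live client accepts (a minimal sketch; field names mirror the documented LiveConnectConfig, but verify them against your SDK version — the client/connect call in the trailing comment is not run here):

```python
# Hedged sketch of the Live API connect config from step 1, expressed as
# plain dicts. Verify field names against your google-genai version.
live_config = {
    "response_modalities": ["AUDIO"],  # native audio output only
    "speech_config": {
        "voice_config": {
            # Prebuilt voice, as in step 1.
            "prebuilt_voice_config": {"voice_name": "Aoede"}
        }
    },
    "realtime_input_config": {
        # Disable server-side VAD: the client sends manual activity signals.
        "automatic_activity_detection": {"disabled": True}
    },
}

# Usage (requires google-genai; shown for context, not executed here):
# async with client.aio.live.connect(
#     model="gemini-2.5-flash-native-audio-preview-12-2025",
#     config=live_config,
# ) as session:
#     ...  # send the step-2 prompt, then observe the server stream
```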

Expected Behavior

The model completes its entire response before sending turnComplete.

Actual Behavior

  • The model begins generating audio normally.
  • After 1–3 sentences (sometimes mid-word), a turnComplete message arrives without interrupted: true.
  • The remaining audio is never delivered.
  • This happens intermittently (sometimes the model completes, sometimes it truncates).
  • Frequency increases over the course of a session.
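The failure signature above can be checked mechanically in a captured server stream: a turnComplete with no preceding interrupted: true in the same turn. A minimal sketch, assuming messages are logged as dicts mirroring the serverContent JSON payloads (the logging shape is an assumption, not part of the SDK):

```python
def find_premature_turn_complete(messages):
    """Return indices of turnComplete messages whose turn was never interrupted.

    `messages` is a list of dicts mirroring the serverContent payloads as
    captured from the wire (an assumption about your logging format).
    """
    premature = []
    interrupted = False
    for i, msg in enumerate(messages):
        content = msg.get("serverContent", {})
        if content.get("interrupted"):
            interrupted = True  # a legitimate barge-in ends this turn
        if content.get("turnComplete"):
            if not interrupted:
                premature.append(i)  # turn ended with no interruption signal
            interrupted = False  # reset for the next turn
    return premature
```

Running this over a session transcript makes the frequency trend in the last bullet measurable rather than anecdotal.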

Key Evidence: This Is Server-Side, Not Echo / VAD

We implemented five independent client-side mitigations, and the problem persists:

  1. Hardware AEC (iOS Voice Processing IO)
  2. Client-side echo gating (send silence frames during model speech)
  3. SileroVAD confirmation (no speech being sent during model output)
  4. NOINTERRUPTION mode (model should not be interruptible)
  5. Disabled automatic activity detection (manual activity signals)

Despite all five layers of protection, the model still truncates its own output. turnComplete arrives with no interrupted flag, confirming the server decided to end the turn on its own.
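Mitigation 2 above (echo gating) can be sketched as follows: while the model is speaking, the client substitutes silence for microphone frames, so none of the model's own audio can be echoed back to the server. Assuming raw 16-bit little-endian PCM input frames (the format the Live API consumes), silence is simply all-zero samples of the same length:

```python
def gate_mic_frame(frame: bytes, model_speaking: bool) -> bytes:
    """Echo gate: replace the mic frame with silence while the model speaks.

    Frames are raw 16-bit little-endian PCM; an all-zero buffer of the same
    byte length is a valid silent frame, so stream timing is preserved.
    """
    if model_speaking:
        return b"\x00" * len(frame)
    return frame
```

Because the gate guarantees only silence reaches the server during model speech, any turnComplete that still arrives mid-utterance cannot be explained by echo.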

Google's own documentation acknowledges "self-interruption"

From the Gemini Live API Get Started page:

Note: Use headphones... To prevent the model from interrupting itself, use headphones.

However:

  • Our iOS app already has hardware-level AEC.
  • We already implement echo gating.
  • The problem persists because the root cause is server-side premature turn termination, not echo.
  • Requiring headphones is not an acceptable solution for an accessibility app serving visually impaired users.

Aggravating Factors (observed across community + our tests)

Our application (SightLine, an AI assistant for visually impaired users) hits all five factors simultaneously (tool usage, accumulating context, Chinese support, compression), so the bug becomes production-blocking.

Impact

SightLine relies on native audio for real-time voice interaction. Truncation makes the product unusable for target users:

  • Safety-critical info gets cut off (navigation directions, obstacle warnings).

This is not cosmetic. For accessibility applications, reliable audio output is a hard requirement.

No Alternative Models Available (as of now)

There is currently no Gemini Live audio model without this bug.

Related Issues / Community Reports

Core & related issues reported across repos:

Forum reports also mention stuttering, delays, extremely short audio playback, etc.

Developer sentiment (from googleapis/js-genai#707 and others) indicates teams are switching to OpenAI Realtime due to this unresolved issue.

Request

  1. Please acknowledge this as a server-side model issue, not a client-side echo problem.
  2. Prioritize a fix (P2 for ~8 months is too long for a production-blocking bug).
  3. Provide a timeline or an interim workaround beyond "use headphones".
  4. Consider accessibility use cases: visually impaired users cannot be told to "just wear headphones".

Contest note:
I'm currently participating in the Gemini Live Agent Challenge (Devpost): https://geminiliveagentchallenge.devpost.com/?linkId=54514909

Contact:
LiuWei
sunflowers0607@outlook.com
https://github.com/SunflowersLwtech

Metadata


Labels

  • priority: p2 — Moderately-important priority. Fix may not be included in next release.
  • type: bug — Error or flaw in code with unintended results or allowing sub-optimal usage patterns.
