
Bug Report: Gemini Live API Native Audio Premature turnComplete Causes Mid-Sentence Audio Truncation #2117

@SunflowersLwtech

Description


When using the Gemini Live API with native audio output, the model frequently stops speaking mid-sentence.
The server sends a turnComplete message while the model is still generating audio, with no interrupted flag set.

This is not caused by client-side echo or VAD. It appears to be a server-side issue where the model prematurely terminates its own turn.
This has been reported across multiple repos and confirmed by ~40 developers over the past ~8 months (see Related Issues below). Despite multiple fix attempts by Google engineers, the problem persists or regresses.

Environment

  • Model: gemini-2.5-flash-native-audio-preview-12-2025
  • SDK: google-genai 1.64.0 via google-adk 1.25.1
  • API: Google AI Developer API (not Vertex AI)
  • Platform: FastAPI WebSocket server (Python 3.12) + iOS client (SwiftUI)
  • OS: macOS (server), iOS 18 (client)

Steps to Reproduce

  1. Establish a Live API session with native audio output (Python).
    • response_modalities=["AUDIO"]
    • speech_config with a prebuilt voice (e.g. Aoede)
    • realtime_input_config.automatic_activity_detection.disabled=True
  2. Send a user message that requires a multi-sentence response (e.g. "Describe what you see in detail").
  3. Observe the server stream.
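The session setup in step 1 can be sketched as the plain-dict config form that google-genai's Live client accepts (a minimal sketch; field names mirror the documented LiveConnectConfig, but verify them against your SDK version — the client/connect call in the trailing comment is not run here):

```python
# Hedged sketch of the Live API connect config from step 1, expressed as
# plain dicts. Verify field names against your google-genai version.
live_config = {
    "response_modalities": ["AUDIO"],  # native audio output only
    "speech_config": {
        "voice_config": {
            # Prebuilt voice, as in step 1.
            "prebuilt_voice_config": {"voice_name": "Aoede"}
        }
    },
    "realtime_input_config": {
        # Disable server-side VAD: the client sends manual activity signals.
        "automatic_activity_detection": {"disabled": True}
    },
}

# Usage (requires google-genai; shown for context, not executed here):
# async with client.aio.live.connect(
#     model="gemini-2.5-flash-native-audio-preview-12-2025",
#     config=live_config,
# ) as session:
#     ...  # send the step-2 prompt, then observe the server stream
```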

Expected Behavior

The model completes its entire response before sending turnComplete.

Actual Behavior

  • The model begins generating audio normally.
  • After 1–3 sentences (sometimes mid-word), a turnComplete message arrives without interrupted: true.
  • The remaining audio is never delivered.
  • This happens intermittently (sometimes the model completes, sometimes it truncates).
  • Frequency increases over the course of a session.
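The failure signature above can be checked mechanically in a captured server stream: a turnComplete with no preceding interrupted: true in the same turn. A minimal sketch, assuming messages are logged as dicts mirroring the serverContent JSON payloads (the logging shape is an assumption, not part of the SDK):

```python
def find_premature_turn_complete(messages):
    """Return indices of turnComplete messages whose turn was never interrupted.

    `messages` is a list of dicts mirroring the serverContent payloads as
    captured from the wire (an assumption about your logging format).
    """
    premature = []
    interrupted = False
    for i, msg in enumerate(messages):
        content = msg.get("serverContent", {})
        if content.get("interrupted"):
            interrupted = True  # a legitimate barge-in ends this turn
        if content.get("turnComplete"):
            if not interrupted:
                premature.append(i)  # turn ended with no interruption signal
            interrupted = False  # reset for the next turn
    return premature
```

Running this over a session transcript makes the frequency trend in the last bullet measurable rather than anecdotal.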

Key Evidence: This Is Server-Side, Not Echo / VAD

We implemented five independent client-side mitigations, and the problem persists:

  1. Hardware AEC (iOS Voice Processing IO)
  2. Client-side echo gating (send silence frames during model speech)
  3. SileroVAD confirmation (no speech being sent during model output)
  4. NOINTERRUPTION mode (model should not be interruptible)
  5. Disabled automatic activity detection (manual activity signals)

Despite all five layers of protection, the model still truncates its own output. turnComplete arrives with no interrupted flag, confirming the server decided to end the turn on its own.
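Mitigation 2 above (echo gating) can be sketched as follows: while the model is speaking, the client substitutes silence for microphone frames, so none of the model's own audio can be echoed back to the server. Assuming raw 16-bit little-endian PCM input frames (the format the Live API consumes), silence is simply all-zero samples of the same length:

```python
def gate_mic_frame(frame: bytes, model_speaking: bool) -> bytes:
    """Echo gate: replace the mic frame with silence while the model speaks.

    Frames are raw 16-bit little-endian PCM; an all-zero buffer of the same
    byte length is a valid silent frame, so stream timing is preserved.
    """
    if model_speaking:
        return b"\x00" * len(frame)
    return frame
```

Because the gate guarantees only silence reaches the server during model speech, any turnComplete that still arrives mid-utterance cannot be explained by echo.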

Google's own documentation acknowledges "self-interruption"

From the Gemini Live API Get Started page:

Note: Use headphones... To prevent the model from interrupting itself, use headphones.

However:

  • Our iOS app already has hardware-level AEC.
  • We already implement echo gating.
  • The problem persists because the root cause is server-side premature turn termination, not echo.
  • Requiring headphones is not an acceptable solution for an accessibility app serving visually impaired users.

Aggravating Factors (observed across community + our tests)

Our application (SightLine, an AI assistant for visually impaired users) hits all five factors simultaneously (tool usage, accumulating context, Chinese support, compression), so the bug becomes production-blocking.

Impact

SightLine relies on native audio for real-time voice interaction. Truncation makes the product unusable for target users:

  • Safety-critical info gets cut off (navigation directions, obstacle warnings).

This is not cosmetic. For accessibility applications, reliable audio output is a hard requirement.

No Alternative Models Available (as of now)

There is currently no Gemini Live audio model without this bug.

Related Issues / Community Reports

Core & related issues reported across repos:

Forum reports also mention stuttering, delays, extremely short audio playback, etc.

Developer sentiment (from googleapis/js-genai#707 and others) indicates teams are switching to OpenAI Realtime due to this unresolved issue.

Request

  1. Please acknowledge this as a server-side model issue, not a client-side echo problem.
  2. Prioritize a fix (P2 for ~8 months is too long for a production-blocking bug).
  3. Provide a timeline or an interim workaround beyond "use headphones".
  4. Consider accessibility use cases: visually impaired users cannot be told to "just wear headphones".

Contest note:
I'm currently participating in the Gemini Live Agent Challenge (Devpost): https://geminiliveagentchallenge.devpost.com/?linkId=54514909

Contact:
LiuWei
sunflowers0607@outlook.com
https://github.com/SunflowersLwtech

Metadata


Labels

  • priority: p2 — Moderately-important priority. Fix may not be included in next release.
  • type: bug — Error or flaw in code with unintended results or allowing sub-optimal usage patterns.
