
Conversation

@ac-machache
Contributor

Architectural Improvements for Live-Running (BIDI) in ADK

1. SequentialAgent Transition Refactor #2261

  • Removed reliance on the task_completed tool injection in sequential_agent.py.
  • Modified gemini_llm_connection.py to listen for the Gemini API’s generation_complete signal.
  • Yielded a dedicated LlmResponse with generation_complete=True and partial=True.
  • Updated _run_live_impl to break the current sub-agent loop on generation_complete events, enabling reliable sequential agent transitions based on deterministic signals.
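
A minimal sketch of the transition logic, assuming a `generation_complete` field on the yielded events (illustrative only; the actual method signatures in sequential_agent.py differ):

```python
# Illustrative sketch, not the actual sequential_agent.py code.
from typing import AsyncGenerator


class SequentialAgentSketch:
    """Runs sub-agents one after another in live (bidi) mode."""

    def __init__(self, sub_agents):
        self.sub_agents = sub_agents

    async def _run_live_impl(self, ctx) -> AsyncGenerator:
        for sub_agent in self.sub_agents:
            async for event in sub_agent.run_live(ctx):
                yield event
                # Hand over to the next sub-agent as soon as the model signals
                # that generation for this turn is complete, instead of waiting
                # for a task_completed tool call.
                if getattr(event, "generation_complete", False):
                    break
```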

2. Capturing User Text Messages as Events #2175, #2045

  • Re-architected run_live in base_llm_flow.py to use an asyncio.Queue as a unified event bus for both user and model events.
  • _send_to_model detects user text messages, creates proper Event objects (author='user'), and enqueues them.
  • Concurrent send_handler and receive_handler tasks feed events into the queue.
  • Main run_live loop consumes the queue, yielding user text messages alongside other events for proper session recording.
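
A rough sketch of the queue-based event bus, assuming placeholder names like `llm_connection` and dict-shaped events rather than the real ADK types:

```python
import asyncio


async def run_live_sketch(llm_connection, user_inputs):
    """Single queue feeding both user and model events to one consumer."""
    queue: asyncio.Queue = asyncio.Queue()

    async def send_handler():
        async for message in user_inputs:
            if getattr(message, "text", None):
                # Record user text as a proper event (author='user') so it
                # ends up in the session history, not just on the wire.
                await queue.put({"author": "user", "text": message.text})
            await llm_connection.send(message)

    async def receive_handler():
        async for response in llm_connection.receive():
            await queue.put({"author": "model", "response": response})

    tasks = [asyncio.create_task(send_handler()),
             asyncio.create_task(receive_handler())]
    try:
        while True:
            # User and model events are yielded in the order they arrive.
            yield await queue.get()
    finally:
        for task in tasks:
            task.cancel()
```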

3. Preventing Session Clutter from Partial Events #2162

  • In gemini_llm_connection.py, set partial=True for LlmResponse instances with audio inline_data to avoid saving transient audio chunks.
  • Implemented text accumulation for user speech (Streaming Conversation History Fragmentation issue #2273):
    • Partial transcriptions yield real-time feedback events.
    • Model turn start signals end of user speech; then a consolidated full transcription event (partial=False) is yielded.
  • Result: cleaner session history, saving only complete, meaningful events.
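
A small sketch of the accumulation idea, with dict-shaped events standing in for the real LlmResponse/Event objects:

```python
class TranscriptionAccumulator:
    """Buffers partial user-speech transcriptions until the model turn starts."""

    def __init__(self):
        self._chunks: list[str] = []

    def on_partial(self, text: str) -> dict:
        # Yielded immediately for real-time feedback; partial=True keeps it
        # out of the persisted session history.
        self._chunks.append(text)
        return {"author": "user", "text": text, "partial": True}

    def flush(self) -> dict | None:
        # Called when the model turn starts, i.e. the user has stopped talking.
        full_text = "".join(self._chunks)
        self._chunks.clear()
        if not full_text:
            return None
        # The single consolidated event that actually gets saved.
        return {"author": "user", "text": full_text, "partial": False}
```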

@ac-machache ac-machache force-pushed the fix/user-input-fragmantaion branch from ffe480f to e25e725 on August 1, 2025 at 14:22
@ac-machache ac-machache changed the title feat[live] : Refactor live SequentialAgent worklow and live event handling for reliability and cleaner session history feat : Refactor live SequentialAgent worklow and live event handling for reliability and cleaner session history Aug 1, 2025
@ac-machache ac-machache changed the title feat : Refactor live SequentialAgent worklow and live event handling for reliability and cleaner session history feat: Refactor live SequentialAgent worklow and live event handling for reliability and cleaner session history Aug 1, 2025
@ac-machache ac-machache force-pushed the fix/user-input-fragmantaion branch from e25e725 to 923f57d on August 1, 2025 at 14:24
@ac-machache
Contributor Author

@hangfei,

Can you check here as well? I’ve added a number of fixes for event handling in live streaming, along with improvements that make the sequential agents in live streaming more deterministic.

If everything looks good to you, I can proceed with implementing the goaway message and the setup_complete event as requested in #2103, and also address #2161.

@polong-lin
Collaborator

@hangfei Could you please take a look? This could potentially resolve multiple Live API issues that users have raised.

@gianluca-henze-parloa

Hi,
Just checking in: has a decision been made on when this might be merged? My conversation history is getting quite cluttered, which I believe could be affecting agent performance. A timeline would be much appreciated. Thanks!

@hangfei
Collaborator

hangfei commented Aug 19, 2025

Thanks for the contribution. We need to get #1867 in first, and then we can merge this change.

@gianluca-henze-parloa

> Thanks for the contribution. We need to get #1867 in first, and then we can merge this change.

Any updates here?

@hangfei
Collaborator

hangfei commented Oct 28, 2025

@ac-machache Could you resolve the conflicts?

@ac-machache
Contributor Author

@hangfei
Hello Hangfei,

I’ve updated the code and resolved the previous conflicts, but there are still a few issues to address:

  1. Thinking capabilities

    • The live model models/gemini-2.5-flash-native-audio-preview-09-2025 has thinking capabilities enabled.
    • I wasn’t able to disable them — setting thinking_budget to 0 or changing its value doesn’t have any effect.
    • The include_thoughts flag can’t be set to false, and the thought_signature is always None.
    • For now, when the model produces thoughts, we only see thought=true in the session service (see the config sketch below).
  2. Transcription behavior

    • All Gemini models return both audio and text, even when the input is plain text.
    • Their text outputs are treated as transcriptions, not as text completions.
    • According to the Google documentation, there should be a "finished" signal for transcription to indicate the end, but it’s always None.
    • For now, we rely on control events to determine turns in conversations.
  3. Session service

    • Input/output transcriptions are accumulated before being flushed and saved to the session service.
    • No more audio blobs are being saved to the session service.
    • However, the database session service doesn’t yet have input_transcription and output_transcription fields, so you’ll see null events.
  4. Sequential agent behavior

    • Sequential agents now rely on the generation_complete signal, and that part works fine.
    • If the first agent makes a tool call, the second one doesn’t fire.
    • If the second agent is the one performing the tool call, everything works as expected.
  5. Remaining issues

    • Missing thought_signature.
    • Sequential agent chaining (as described above).

Let me know if you want me to focus on fixing a specific part next.
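
For reference, a sketch of the kind of thinking configuration that was attempted for item 1 (how it gets wired into the live connect config is an assumption on my side):

```python
from google.genai import types

# Neither setting had an observable effect on
# models/gemini-2.5-flash-native-audio-preview-09-2025 in my tests.
thinking_config = types.ThinkingConfig(
    include_thoughts=False,
    thinking_budget=0,
)
```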

@hangfei
Collaborator

hangfei commented Nov 11, 2025

2 and 3 should be fixed by other PRs. Please take another look and resolve merge conflicts.

For 1, the purpose of task_completed is to ensure that the first agent knows when its task is completed. In bi-di streaming, since the stream is continuous, it's not clear when a task is done; that's why we need this function. The user needs to trigger it by saying something like "the task is completed." Without it, we would have another model where we transfer to the next task when the turn is complete.
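
For context, a simplified sketch of the existing task_completed mechanism (the real injection happens inside the ADK's sequential agent live implementation, so treat this as illustrative):

```python
def task_completed() -> str:
    """Signals that the current sub-agent's task is finished.

    In live/bidi streaming the model calls this tool, typically after the user
    says something like "the task is completed", which tells the
    SequentialAgent to move on to the next sub-agent.
    """
    return "Task completion signaled."
```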

For these two design options, I'd like to learn more about your use cases to see which makes more sense. We could also consider supporting both if both cases are reasonable.

@hangfei
Collaborator

hangfei commented Nov 11, 2025

  1. Do all other models have thinking enabled by default? If not, let's test other models first.

  2. Transcription should be fixed; when I tested it last time, I did see the finished signal. Which model are you using? Maybe it differs from one model to another.

  3. Yes, it's not supported yet. I am working on it.

Let's focus on fixing the sequential agent issues first.

@hangfei
Collaborator

hangfei commented Nov 11, 2025

Let's focus on solving one problem per PR so it's easier and faster to merge. Thanks!

@hangfei
Collaborator

hangfei commented Nov 20, 2025

Closing due to inactivity. Let's propose a new PR with a single focus.

@hangfei hangfei closed this Nov 20, 2025