
Conversation

@ac-machache
Contributor

Architectural Improvements for Live-Running (BIDI) in ADK

1. SequentialAgent Transition Refactor #2261

  • Removed reliance on the task_completed tool injection in sequential_agent.py.
  • Modified gemini_llm_connection.py to listen for the Gemini API’s generation_complete signal.
  • Yielded a dedicated LlmResponse with generation_complete=True and partial=True.
  • Updated _run_live_impl to break the current sub-agent loop on generation_complete events, enabling reliable sequential agent transitions based on deterministic signals.
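
A minimal sketch of the transition logic, assuming a `generation_complete` field on the yielded events (illustrative only; the actual method signatures in sequential_agent.py differ):

```python
# Illustrative sketch, not the actual sequential_agent.py code.
from typing import AsyncGenerator


class SequentialAgentSketch:
    """Runs sub-agents one after another in live (bidi) mode."""

    def __init__(self, sub_agents):
        self.sub_agents = sub_agents

    async def _run_live_impl(self, ctx) -> AsyncGenerator:
        for sub_agent in self.sub_agents:
            async for event in sub_agent.run_live(ctx):
                yield event
                # Hand over to the next sub-agent as soon as the model signals
                # that generation for this turn is complete, instead of waiting
                # for a task_completed tool call.
                if getattr(event, "generation_complete", False):
                    break
```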

2. Capturing User Text Messages as Events #2175, #2045

  • Re-architected run_live in base_llm_flow.py to use an asyncio.Queue as a unified event bus for both user and model events.
  • _send_to_model detects user text messages, creates proper Event objects (author='user'), and enqueues them.
  • Concurrent send_handler and receive_handler tasks feed events into the queue.
  • Main run_live loop consumes the queue, yielding user text messages alongside other events for proper session recording.
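
A rough sketch of the queue-based event bus, assuming placeholder names like `llm_connection` and dict-shaped events rather than the real ADK types:

```python
import asyncio


async def run_live_sketch(llm_connection, user_inputs):
    """Single queue feeding both user and model events to one consumer."""
    queue: asyncio.Queue = asyncio.Queue()

    async def send_handler():
        async for message in user_inputs:
            if getattr(message, "text", None):
                # Record user text as a proper event (author='user') so it
                # ends up in the session history, not just on the wire.
                await queue.put({"author": "user", "text": message.text})
            await llm_connection.send(message)

    async def receive_handler():
        async for response in llm_connection.receive():
            await queue.put({"author": "model", "response": response})

    tasks = [asyncio.create_task(send_handler()),
             asyncio.create_task(receive_handler())]
    try:
        while True:
            # User and model events are yielded in the order they arrive.
            yield await queue.get()
    finally:
        for task in tasks:
            task.cancel()
```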

3. Preventing Session Clutter from Partial Events #2162

  • In gemini_llm_connection.py, set partial=True for LlmResponse instances with audio inline_data to avoid saving transient audio chunks.
  • Implemented text accumulation for user speech (Streaming Conversation History Fragmentation issue #2273):
    • Partial transcriptions yield real-time feedback events.
    • Model turn start signals end of user speech; then a consolidated full transcription event (partial=False) is yielded.
  • Result: cleaner session history, saving only complete, meaningful events.
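
A small sketch of the accumulation idea, with dict-shaped events standing in for the real LlmResponse/Event objects:

```python
class TranscriptionAccumulator:
    """Buffers partial user-speech transcriptions until the model turn starts."""

    def __init__(self):
        self._chunks: list[str] = []

    def on_partial(self, text: str) -> dict:
        # Yielded immediately for real-time feedback; partial=True keeps it
        # out of the persisted session history.
        self._chunks.append(text)
        return {"author": "user", "text": text, "partial": True}

    def flush(self) -> dict | None:
        # Called when the model turn starts, i.e. the user has stopped talking.
        full_text = "".join(self._chunks)
        self._chunks.clear()
        if not full_text:
            return None
        # The single consolidated event that actually gets saved.
        return {"author": "user", "text": full_text, "partial": False}
```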

@ac-machache ac-machache force-pushed the fix/user-input-fragmantaion branch from ffe480f to e25e725 on August 1, 2025 at 14:22
@ac-machache ac-machache changed the title feat[live] : Refactor live SequentialAgent worklow and live event handling for reliability and cleaner session history feat : Refactor live SequentialAgent worklow and live event handling for reliability and cleaner session history Aug 1, 2025
@ac-machache ac-machache changed the title feat : Refactor live SequentialAgent worklow and live event handling for reliability and cleaner session history feat: Refactor live SequentialAgent worklow and live event handling for reliability and cleaner session history Aug 1, 2025
@ac-machache ac-machache force-pushed the fix/user-input-fragmantaion branch from e25e725 to 923f57d on August 1, 2025 at 14:24
@ac-machache
Contributor Author

@hangfei,

Can you check here as well? I’ve added a number of fixes for event handling in live streaming, along with improvements that make the sequential agents in live streaming more deterministic.

If everything looks good to you, I can proceed with implementing the goaway message and the setup_complete event as requested in #2103, and also address #2161.

@polong-lin
Collaborator

@hangfei Could you please take a look? This could potentially resolve multiple Live API issues that users have raised.

@gianluca-henze-parloa

Hi,
Just checking in: has a decision been made on when this might be merged? My conversation history is getting quite cluttered, which I believe could be affecting agent performance. A timeline would be much appreciated. Thanks!

@hangfei
Collaborator

hangfei commented Aug 19, 2025

Thanks for the contribution. We need to get #1867 in first, and then we can merge this change.

@gianluca-henze-parloa

> Thanks for the contribution. We need to get #1867 in first, and then we can merge this change.

Any updates here?

@hangfei
Collaborator

hangfei commented Oct 28, 2025

@ac-machache Could you resolve the conflicts?

@ac-machache
Contributor Author

@hangfei
Hello Hangfei,

I’ve updated the code and resolved the previous conflicts, but there are still a few issues to address:

  1. Thinking capabilities

    • The live model models/gemini-2.5-flash-native-audio-preview-09-2025 has thinking capabilities enabled.
    • I wasn’t able to disable them — setting thinking_budget to 0 or changing its value doesn’t have any effect.
    • The include_thoughts flag can’t be set to false, and the thought_signature is always None.
    • For now, when the model produces thoughts, we only see thought=true in the session service (see the config sketch below).
  2. Transcription behavior

    • All Gemini models return both audio and text, even when the input is plain text.
    • Their text outputs are treated as transcriptions, not as text completions.
    • According to the Google documentation, there should be a "finished" signal for transcription to indicate the end, but it’s always None.
    • For now, we rely on control events to determine turns in conversations.
  3. Session service

    • Input/output transcriptions are accumulated before being flushed and saved to the session service.
    • No more audio blobs are being saved to the session service.
    • However, the database session service doesn’t yet have input_transcription and output_transcription fields, so you’ll see null events.
  4. Sequential agent behavior

    • Sequential agents now rely on the generation_complete signal, and that part works fine.
    • If the first agent makes a tool call, the second one doesn’t fire.
    • If the second agent is the one performing the tool call, everything works as expected.
  5. Remaining issues

    • Missing thought_signature.
    • Sequential agent chaining (as described above).

Let me know if you want me to focus on fixing a specific part next.
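
For reference, a sketch of the kind of thinking configuration that was attempted for item 1 (how it gets wired into the live connect config is an assumption on my side):

```python
from google.genai import types

# Neither setting had an observable effect on
# models/gemini-2.5-flash-native-audio-preview-09-2025 in my tests.
thinking_config = types.ThinkingConfig(
    include_thoughts=False,
    thinking_budget=0,
)
```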

@hangfei
Collaborator

hangfei commented Nov 11, 2025

2 and 3 should be fixed by other PRs. Please take another look and resolve merge conflicts.

For 1, the purpose of task_completed is to ensure that the first agent knows when its task is completed. In bi-di streaming, since the stream is continuous, it's not clear when a task is done; that's why we need this function. The user needs to trigger it by saying something like "the task is completed." Without it, we would have another model where we transfer to the next task when the turn is complete.
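
For context, a simplified sketch of the existing task_completed mechanism (the real injection happens inside the ADK's sequential agent live implementation, so treat this as illustrative):

```python
def task_completed() -> str:
    """Signals that the current sub-agent's task is finished.

    In live/bidi streaming the model calls this tool, typically after the user
    says something like "the task is completed", which tells the
    SequentialAgent to move on to the next sub-agent.
    """
    return "Task completion signaled."
```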

For these two design options, I'd like to learn more about your use cases to see which makes more sense. We could also consider supporting both if both cases are reasonable.

@hangfei
Collaborator

hangfei commented Nov 11, 2025

  1. Do all other models have thinking enabled by default? If not, let's test other models first.

  2. Transcription should be fixed; when I tested it last time, I did see the finished signal. Which model are you using? Maybe it differs from one model to another.

  3. Yes, it's not supported yet. I am working on it.

Let's focus on fixing the sequential agent issues first.

@hangfei
Collaborator

hangfei commented Nov 11, 2025

Let's focus on solving one problem per PR so it's easier and faster to merge. Thanks!

@hangfei
Collaborator

hangfei commented Nov 20, 2025

Closing due to inactivity. Let's propose a new PR with a single focus.

@hangfei hangfei closed this Nov 20, 2025