feat: Enable configurable streaming support for OpenAI models #1161
Conversation
EItanya left a comment:
Thanks so much for the contribution, and this change definitely makes sense. The one thing I'd like to note is that I actually originally tried to make `stream: true` the default, but ran into issues. Have you done e2e testing with `stream: true`?
This adds a `Stream` field to the Agent API (v1alpha1/v1alpha2), defaulting to `false`, and propagates it through the ADK and translator layers to allow enabling streaming for OpenAI-compatible model services. Fixes kagent-dev#1099. Signed-off-by: Azeez Syed <syedazeez337@gmail.com>
(force-pushed from 9557ee3 to 5ecca4a)
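For context, a minimal sketch of what making the flag configurable looks like at the request layer; the function and parameter names here are hypothetical, not the actual kagent code:

```python
# Hypothetical sketch: forward an agent-level `stream` setting to an
# OpenAI-compatible API instead of hardcoding "stream": false.
from openai import AsyncOpenAI


async def generate(client: AsyncOpenAI, model: str, messages: list, stream: bool = False):
    """Yield response text, chunk-by-chunk when `stream` is enabled."""
    if stream:
        # Streaming: the API returns an async iterator of chunks.
        response = await client.chat.completions.create(
            model=model, messages=messages, stream=True
        )
        async for chunk in response:
            delta = chunk.choices[0].delta.content
            if delta:
                yield delta
    else:
        # Non-streaming: a single complete response (the old default).
        response = await client.chat.completions.create(
            model=model, messages=messages, stream=False
        )
        yield response.choices[0].message.content or ""
```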
Hi @EItanya, I have run the e2e tests locally and they pass. Let me know if it is working now. Thank you.
This does not seem to work. I've made some additional modifications in `_openai.py` for streaming handling, as well as to the task store, converters, etc., and the streaming behaviour is now functional.
I'll just mention the biggest change here:
- When streaming for A2A using `TaskStatusUpdateEvent`s, events containing ADK "partial" events (chunks from LLM streaming) are being saved to the task store, so on reload the frontend will in fact load all of the LLM chunks (e.g. the messages will look like `["hello my", "name is"]`). The `final` and `status` fields on `A2AEvent` only distinguish whether a task is in progress, not whether an event is an LLM streaming chunk, which is indicated by the `partial` field on ADK events. So it is required to handle this explicitly and avoid persisting these events to the task store even though we still want them to be streamed.
- Function calls in streaming were previously not handled properly!
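To make the task-store issue concrete, here is a minimal sketch (with hypothetical types and store/queue APIs, not the actual kagent/ADK interfaces) of streaming partial events while persisting only complete ones:

```python
# Hypothetical sketch: forward every event to the live queue, but skip
# persistence for partial LLM chunks so a reload doesn't replay fragments
# like ["hello my", "name is"].
from dataclasses import dataclass


@dataclass
class AdkEvent:
    content: str
    partial: bool  # True only for intermediate LLM streaming chunks


async def handle_event(event: AdkEvent, event_queue, task_store, task_id: str):
    # The frontend still receives partial chunks for live rendering.
    await event_queue.put(event)
    if event.partial:
        # Do not persist: `final`/`status` on A2AEvent can't tell us this,
        # only the ADK `partial` field can.
        return
    await task_store.save(task_id, event)
```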
Screen.Recording.2026-01-09.at.7.32.42.PM.mov
@syedazeez337 lmk if you're still interested in working on this; otherwise I will finish this up. I've pushed my code to the `jetc/fix-streaming` branch. I couldn't find any example of this working with ADK and A2A, nor does the ADK A2A extension implement this, I believe, so do let me know if you have a better implementation idea.
We would also need e2e testing with `stream: true`, but that will probably depend on the MockLLM package supporting streaming (which has just been added).
EDIT: This comment has been edited since I was later able to fix one of the issues above.
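As a rough illustration (a hypothetical helper, not the MockLLM API), an e2e test for `stream: true` would need the mock backend to emit chunked completions, along these lines:

```python
# Hypothetical sketch of a mock streaming completion for e2e tests:
# emit a canned response as word-sized chunks, mimicking stream=True.
import asyncio


async def mock_stream(text: str):
    for word in text.split():
        await asyncio.sleep(0)  # yield control, like a real network stream
        yield word + " "


async def main():
    chunks = [chunk async for chunk in mock_stream("hello my name is kagent")]
    assert "".join(chunks).strip() == "hello my name is kagent"


asyncio.run(main())
```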
Hi @supreme-gg-gg, I have reviewed your changes in `jetc/fix-streaming`; great work on the streaming handler and partial event filtering. You have covered almost all of the issues. I would like to help wherever collaboration is needed; since you are one of the maintainers as well, let me know in what capacity I can be helpful here. Thank you.
How about I open a PR from my branch, since this branch is somewhat out of date as well, and add you as co-author? There is some work that needs to be finished, like updating the types in Go, e2e tests with streaming (which requires the new MockLLM, though), and perhaps making streaming `false` by default (the original behaviour).
Hi @supreme-gg-gg, let's do that.
This sounds like a great plan. @syedazeez337, would you mind closing this PR, as it is now superseded by #1202?
Original PR: #1161, close #1099

https://github.com/user-attachments/assets/95847d85-5062-4755-93b8-0bb07fae9602

- Extends Kagent's ADK OpenAI provider to properly handle streaming with the `LlmResponse` partial field, aggregating results, and function calling. I tested that this works with other model providers as well, like Gemini.
- Stream partial events from ADK to the event queue for frontend updates on LLM response chunks, but also propagate `Event.partial` at the A2A `Message` level so that the task store will filter them out and avoid persisting partial chunks (only save `TaskStatusUpdateEvent`s that contain complete chunks).
- An **alternative** would be to mark the partial field in the metadata at the `Task` level in `_get_context_metadata` and reconstruct the event on the client side, but currently we do it at the `Message` level since it makes more sense. There is a discussion from ADK Python on this regarding their A2A-ADK integration.
- Update the UI to include a toggle to select whether to use model streaming. Note that model streaming is different from A2A streaming: the former streams "chunks" of text, the latter streams tasks.
- Added an e2e test that uses `stream: true`, supported by the new MockLLM.

Signed-off-by: Jet Chiang <pokyuen.jetchiang-ext@solo.io>
Co-authored-by: Azeez Syed <syedazeez337@gmail.com>
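A sketch of the Message-level approach described above, with assumed type and field names rather than the real A2A/ADK classes:

```python
# Hypothetical sketch: carry ADK's `partial` flag on the converted A2A
# Message so the task store can decide what to persist.
from dataclasses import dataclass, field


@dataclass
class A2AMessage:
    text: str
    metadata: dict = field(default_factory=dict)


def adk_event_to_message(event) -> A2AMessage:
    message = A2AMessage(text=event.content)
    # Message-level marking (the approach chosen here), instead of the
    # alternative of Task-level metadata reconstructed on the client side.
    message.metadata["partial"] = bool(getattr(event, "partial", False))
    return message


def should_persist(message: A2AMessage) -> bool:
    # Only complete messages reach the task store.
    return not message.metadata.get("partial", False)
```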
Description:
This PR addresses the issue where requests to OpenAI-compatible model services were hardcoded with `"stream": false`.
I have introduced a new `stream` configuration field to the Agent API. This gives developers explicit control over whether streaming responses are enabled or disabled for their agents.
Key Changes:
- Added a `stream` field to the Agent API (v1alpha1/v1alpha2), defaulting to `false`.
- Propagated the new field through the ADK and translator layers so streaming can be enabled for OpenAI-compatible model services.
Testing:
Everything has been linted and verified against the contribution guidelines. Let me know if you have any questions.
Fixes #1099