RFC: [JS] Agent Primitive #4212
Summary of Changes
Hello @pavelgj, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request introduces the
Highlights
Code Review
This pull request introduces an RFC for a new defineAgent primitive. The document is well-structured and clearly outlines the motivation, design, and usage of the new API. The proposed agent primitive is a powerful abstraction for building stateful, multi-turn agents with flexible state management. My feedback focuses on improving the clarity and completeness of the RFC by suggesting additions to the documentation and examples, particularly around API migration, stream handling, and the usage of custom state and state updates.
Introduces `defineAgent` (renamed from `defineSessionFlow`), a high-level abstraction built on top of Bidi Actions designed to simplify the creation of stateful, multi-turn agents. It unifies state management, allowing both client-side state handling and server-side persistence via pluggable stores.
`defineAgent` would replace the current Chat API, as there is significant overlap and the Agent primitive is more flexible.
The RFC states that defineAgent will replace the current Chat API. This is a significant change that will impact existing users. It would be beneficial to add a section discussing the migration strategy from the Chat API to defineAgent. This could cover:
- Key differences in usage patterns.
- A suggested path for refactoring existing chat implementations.
- Whether there will be a deprecation period for the old API.
Providing this information will help users plan for the transition.
docs/rfc-agent-primitive.md
Outdated
The Agent primitive uses standardized schemas to ensure compatibility across tools and UI.
- **InitSchema**: `sessionId`, `messages`, `state`, `artifacts`.
The InitSchema and OutputSchema include a generic state property, which is a great feature for managing custom session data. However, the provided examples only manage the messages array.
To better showcase the capabilities of defineAgent, consider adding a small example that demonstrates managing custom state. For instance, an agent that keeps track of a turn counter or stores user preferences in the state object. This would provide a more complete picture of the state management capabilities.
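A minimal sketch of the kind of example this comment asks for. It does not use the actual `defineAgent` API, whose signature is not shown in this thread. The `AgentInit`, `AgentState`, and `Message` types and the `runTurn` function are hypothetical stand-ins that only mirror the field names listed for InitSchema (`sessionId`, `messages`, `state`, `artifacts`). The model call is replaced by a placeholder echo reply.

```typescript
// Hypothetical local types mirroring the InitSchema field names from the RFC.
// They are illustrative only and are not the real Genkit definitions.
interface Message {
  role: "user" | "model";
  text: string;
}

// Custom session state: a turn counter plus user preferences.
interface AgentState {
  turnCount: number;
  preferences: Record<string, string>;
}

interface AgentInit {
  sessionId: string;
  messages: Message[];
  state: AgentState;
  artifacts: unknown[];
}

// Stand-in for one agent turn (assumption: a real agent would call a model
// here). Returns a new session snapshot without mutating the input.
function runTurn(init: AgentInit, userText: string): AgentInit {
  const reply: Message = { role: "model", text: `echo: ${userText}` };
  return {
    ...init,
    messages: [...init.messages, { role: "user", text: userText }, reply],
    state: { ...init.state, turnCount: init.state.turnCount + 1 },
  };
}

const session0: AgentInit = {
  sessionId: "demo-session",
  messages: [],
  state: { turnCount: 0, preferences: { language: "en" } },
  artifacts: [],
};

// Two turns: the counter in custom state advances alongside the messages.
const session1 = runTurn(session0, "hello");
const session2 = runTurn(session1, "again");
```

Because `runTurn` returns a fresh snapshot each time, the same shape could in principle be held by the client or written to a server-side store, which is the flexibility the RFC text describes.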
docs/rfc-agent-primitive.md
Outdated
The Agent primitive uses standardized schemas to ensure compatibility across tools and UI.
- **InitSchema**: `sessionId`, `messages`, `state`, `artifacts`.
- **StreamSchema**: `chunk` (generation), `stateUpdate` (patches), `sessionId`.
The StreamSchema includes a stateUpdate field for sending state patches, which is an excellent optimization for streaming. The current examples only demonstrate streaming generation chunks.
It would be very helpful to include an example that uses sendChunk to send a stateUpdate. This could show how to, for example, update a tool's status on the client-side incrementally without waiting for the full model response. This would clarify the intended use of stateUpdate and make the RFC more comprehensive.
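One way such an example might look, as a sketch under stated assumptions. The thread does not define the patch format for `stateUpdate`, so this assumes a shallow-merge patch (`Partial<ToolState>`). The `StreamChunk`, `ToolState`, `SendChunk`, `runSlowTool`, and `applyChunk` names are hypothetical and only mirror the StreamSchema field names (`chunk`, `stateUpdate`, `sessionId`). The real `sendChunk` signature may differ.

```typescript
// Hypothetical custom state tracked for a long-running tool.
interface ToolState {
  toolStatus: "idle" | "running" | "done";
  progress: number;
}

// Hypothetical stream element mirroring the StreamSchema field names.
// Assumption: stateUpdate is a shallow-merge patch over the state object.
interface StreamChunk {
  sessionId: string;
  chunk?: { text: string };
  stateUpdate?: Partial<ToolState>;
}

type SendChunk = (c: StreamChunk) => void;

// Server side: emit state patches as the tool progresses, interleaved with
// a generation chunk, instead of waiting for the full response.
function runSlowTool(sessionId: string, sendChunk: SendChunk): void {
  sendChunk({ sessionId, stateUpdate: { toolStatus: "running", progress: 0 } });
  sendChunk({ sessionId, stateUpdate: { progress: 50 } });
  sendChunk({ sessionId, chunk: { text: "Tool finished." } });
  sendChunk({ sessionId, stateUpdate: { toolStatus: "done", progress: 100 } });
}

// Client side: fold each patch into the local copy of the state.
function applyChunk(state: ToolState, c: StreamChunk): ToolState {
  return c.stateUpdate ? { ...state, ...c.stateUpdate } : state;
}

const received: StreamChunk[] = [];
runSlowTool("demo-session", (c) => {
  received.push(c);
});

const initialState: ToolState = { toolStatus: "idle", progress: 0 };
const finalState = received.reduce<ToolState>(applyChunk, initialState);
```

Chunks without a `stateUpdate` leave the state untouched, so the client can use one reducer for the whole stream and render the tool status after every element.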
I added the same comment in the Go RFC:
Overall, while this design introduces useful primitives for bidirectional flows and persistence, it does not yet align with the agentic patterns seen in modern frameworks like LangChain or LlamaIndex. The current approach feels closer to a single, persistent agent than to a true agent system where agents can call, delegate to, or coordinate with other agents.
A key limitation is the lack of clear multi-agent orchestration. Modern agent frameworks treat agents as composable units that can collaborate, delegate tasks, or be orchestrated in workflows.
Additionally, the API surface feels too low-level for end users. Concepts like bidirectional flows and actions require developers to manage execution mechanics explicitly. Many of these concerns should be handled by the framework itself, exposing a cleaner and more minimal interface that is easier to read, reason about, and scale over time. This contrasts with existing patterns in Genkit JS and ADK, where agent definitions are more declarative and focused on intent rather than wiring.
For reference, Genkit JS chat agents provide a much simpler mental model: https://genkit.dev/docs/tutorials/chat-with-pdf/#6-implement-the-ui, and ADK’s Go agent definitions show a clearer agent-first abstraction: https://google.github.io/adk-docs/get-started/go/#define-the-agent-code.
This RFC is one step in our overall vision, and yes, we need to find a way to communicate the overall vision as well.
I mostly agree with you: this is not an agent framework, and it does not include multi-agent orchestration. If you look at other "agent frameworks", they usually take opinionated stances on how agent orchestration must be done. They have hardcoded prompts and magical tools that get injected into the model call to facilitate "handoffs" or subagent calls. This is very inconsistent with how we've been doing things in Genkit. We've been avoiding hardcoded prompts and any unexpected or surprising alterations to LLM calls. The Chat API is no exception: it does not inject any hardcoded prompts (it injects prompts that the developer provides).
This RFC is all about the agent abstraction. One agent. That agent might have sub-agents, but this RFC does not concern itself with that. It's a simple but extremely flexible abstraction that can support agents of any complexity, while also enabling Dev UI tooling for those agents.
As the next step, once we have a flexible way to represent one agent (or multiple independent agents), we can design an agent orchestration layer on top of the agent abstraction. The current thinking is that the orchestration layer is, by nature, too opinionated, so it will not be in the core Genkit framework. It will most likely be a plugin, or even multiple plugins, in separate package(s). It will allow users and the community to build custom orchestration layers while still taking advantage of Genkit tooling that already understands agent building blocks.
Introduces `defineAgent`, a high-level abstraction built on top of Bidi Actions designed to simplify the creation of stateful, multi-turn agents. It unifies state management, allowing both client-side state handling and server-side persistence via pluggable stores.
Related RFCs: