refactor(a2a): use tool calling for delegation instead of structured output#5751

Draft
greysonlalonde wants to merge 3 commits into main from gl/refactor/a2a-tool-based-delegation

Conversation

@greysonlalonde (Contributor)

Why

Closes #3897.

A2A delegation currently relies on an AgentResponse model whose a2a_ids field is constrained to Literal[endpoint_url, ...] to pick a remote agent. The prompt shows the LLM each agent's card (skill IDs, names, URLs), but the only valid value for a2a_ids is the well-known endpoint URL, which is never explicitly labeled as the identifier. This produces predictable failures:

  1. Original report (gpt-4.1): the LLM picks skills[0].id (e.g. "Research") instead of the endpoint URL → pydantic ValidationError: literal_error.
  2. Reopened thread (Gemini flash-lite): small models that don't reliably honor Literal/enum constraints in JSON Schema emit out-of-set values → same error.
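For reference, a minimal sketch of the failure mode. The field names follow the PR description; the exact model shape and the endpoint URL are assumptions:

```python
from typing import Literal

from pydantic import BaseModel, ValidationError

# Assumed example: the only valid identifier was the well-known endpoint URL.
ENDPOINT = "https://research.example.com/.well-known/agent-card.json"


class AgentResponse(BaseModel):
    # Mirrors the removed Literal-constrained model; fields are assumptions.
    a2a_ids: list[Literal[ENDPOINT]]  # type: ignore[valid-type]
    message: str
    is_a2a: bool


# Succeeds only when the LLM echoes the endpoint URL verbatim:
AgentResponse(a2a_ids=[ENDPOINT], message="delegate research", is_a2a=True)

# The reported bug: the LLM returns skills[0].id instead of the URL.
try:
    AgentResponse(a2a_ids=["Research"], message="delegate research", is_a2a=True)
except ValidationError as exc:
    print(exc.errors()[0]["type"])  # pydantic v2 reports "literal_error"
```

Nothing in the schema tells the model that the URL, not the skill id, is the identifier; the constraint only rejects the wrong guess after the fact.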

A fuzzy-match fallback would paper over the symptom; the structural fix is to make the identifier set itself unambiguous and provider-enforced.

What

Each remote A2A agent is now exposed to the local LLM as a BaseTool (delegate_to_<sanitized_card_name>); the local agent's tool-call loop drives multi-turn delegation. AgentResponse(a2a_ids, message, is_a2a) and the explicit per-turn re-prompting loop are gone.

  • New crewai/a2a/tools.py: A2ADelegationTool + A2ADelegationState (per-task shared state with per-endpoint history, IDs, turn counts).
  • crewai/a2a/wrapper.py collapsed from 1772 → ~530 LOC. Deleted _delegate_to_a2a / _adelegate_to_a2a, _prepare_delegation_context, _parse_agent_response, _handle_agent_response_and_continue, _handle_max_turns_exceeded, _emit_delegation_failed, _process_response_result, _init_delegation_state, _get_turn_context, _handle_task_completion, DelegationContext, DelegationState. Each of the four entry points (sync/async × execute_task/kickoff) now augments the prompt with agent cards, builds A2A tools, merges them into the call's tools list (or temporarily extends self.tools for kickoff), and calls original_fn.
  • Templates trimmed: dropped PREVIOUS_A2A_CONVERSATION_TEMPLATE, CONVERSATION_TURN_INFO_TEMPLATE, REMOTE_AGENT_*_NOTICE. AVAILABLE_AGENTS_TEMPLATE now describes the tool-call protocol.
  • response_model.py: create_agent_response_model / get_a2a_agents_and_response_model replaced with a single extract_a2a_client_configs().
  • types.py: AgentResponseProtocol removed.
  • agent/core.py + lite_agent.py updated to drop the AgentResponseProtocol branch and the agent_response_model arg.
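As an illustrative sketch of the new shape: the class names follow the PR description (A2ADelegationTool, A2ADelegationState, the delegate_to_<sanitized_card_name> convention), but the fields, the sanitization scheme, and the transport stub are assumptions, not the PR diff:

```python
import re
from dataclasses import dataclass, field


def _sanitize(card_name: str) -> str:
    """Turn an agent-card name into a tool-name suffix (assumed scheme)."""
    return re.sub(r"\W+", "_", card_name.strip()).strip("_").lower()


@dataclass
class A2ADelegationState:
    """Per-task shared state: per-endpoint history and turn counts."""

    history: dict[str, list[str]] = field(default_factory=dict)
    turns: dict[str, int] = field(default_factory=dict)


@dataclass
class A2ADelegationTool:
    """One tool per remote agent; the local LLM selects it by name."""

    endpoint_url: str
    card_name: str
    state: A2ADelegationState

    @property
    def name(self) -> str:
        return f"delegate_to_{_sanitize(self.card_name)}"

    def run(self, message: str) -> str:
        self.state.turns[self.endpoint_url] = self.state.turns.get(self.endpoint_url, 0) + 1
        self.state.history.setdefault(self.endpoint_url, []).append(message)
        # A real implementation would send `message` over A2A and return the reply.
        return f"(stub) {self.card_name} received: {message}"
```

The point of the design is that agent selection collapses into tool selection, which the existing tool-call loop already handles.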

The original failure is now structurally impossible: provider-side tool-call validation (OpenAI / Anthropic / Gemini) enforces the tool name; there's no competing identifier set for the model to confuse.
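To see why the name is provider-enforced, consider an OpenAI-style request body: the model can only emit tool calls whose name matches an entry in tools, so an out-of-set identifier like "Research" is rejected at the API layer rather than at Pydantic parse time. The tool name and parameter schema below are illustrative, not taken from the diff:

```python
# Illustrative OpenAI-style tool declaration; "delegate_to_research_agent"
# is a hypothetical sanitized card name.
tools = [
    {
        "type": "function",
        "function": {
            "name": "delegate_to_research_agent",
            "description": "Send a sub-task to the remote Research agent over A2A.",
            "parameters": {
                "type": "object",
                "properties": {
                    "message": {
                        "type": "string",
                        "description": "Task for the remote agent.",
                    },
                },
                "required": ["message"],
            },
        },
    }
]

# The only identifiers the model can emit are the declared tool names.
valid_names = {t["function"]["name"] for t in tools}
print(valid_names)  # {'delegate_to_research_agent'}
```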

A2AConfig.max_turns wires through to BaseTool.max_usage_count, so the existing per-agent turn limit is preserved without an explicit Python-side loop.
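A minimal sketch of how a per-tool usage cap can stand in for the old explicit loop. The max_usage_count attribute name follows the PR text; where and how the check fires is an assumption:

```python
class MaxUsageExceeded(RuntimeError):
    """Raised once a tool's per-agent turn budget is spent (assumed behavior)."""


class CappedDelegationTool:
    """Refuses to run after max_usage_count invocations."""

    def __init__(self, max_usage_count: int) -> None:
        self.max_usage_count = max_usage_count
        self.usage_count = 0

    def run(self, message: str) -> str:
        if self.usage_count >= self.max_usage_count:
            raise MaxUsageExceeded("per-agent turn limit reached")
        self.usage_count += 1
        return f"turn {self.usage_count}: {message}"
```

With A2AConfig.max_turns mapped onto such a counter, exceeding the limit surfaces as an ordinary tool failure the agent loop can react to, instead of a hand-written Python turn loop.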

Notes

  • Diff stat: +730 / −1709 across existing files; +400 in the new tools.py. Net ≈ −600 LOC.
  • Stacked on chore(deps): bump mem0ai to >=2.0.0 #5750 (mem0ai bump for pip-audit); will retarget to main once that lands.
  • All a2a tests pass (20 passed, 7 skipped). mypy clean across 473 files. ruff clean.

refactor(a2a): use tool calling for delegation instead of structured output

Each remote A2A agent is now exposed to the local LLM as a BaseTool
(delegate_to_<card_name>); the local agent's tool-call loop drives
multi-turn delegation. The Literal-constrained AgentResponse model and
the explicit per-turn re-prompting loop are gone.

Closes #3897. The original failure mode — Pydantic literal_error when
skill.id != endpoint URL, and Gemini flash-lite hallucinating
out-of-enum values — is structurally impossible: provider-side tool-call
validation enforces the tool name, and there's no competing identifier.
Base automatically changed from gl/chore/bump-mem0ai to main May 8, 2026 16:17
coderabbitai Bot commented May 8, 2026

Review skipped: draft detected.


try:
    from a2a.types import Message, Role
    from a2a.types import TaskState  # noqa: F401
except ImportError:  # a2a-sdk is an optional dependency; fallback shape assumed
    Message = Role = TaskState = None  # type: ignore[assignment, misc]


Development

Successfully merging this pull request may close these issues.

[BUG] A2A - pydantic error, when AgentCard-skill-id <> endpoint url. (1.3.0)
