Python: Emit tool call events in GitHubCopilotAgent streaming by jsturtevant · Pull Request #4711 · microsoft/agent-framework

jsturtevant · 2026-03-15T20:07:53Z

Motivation and Context

_stream_updates now yields FunctionCallContent for TOOL_EXECUTION_START and FunctionResultContent for TOOL_EXECUTION_COMPLETE events from the Copilot SDK session. This enables DevUI and other consumers to display tool calls during streaming agent execution. Previously only ASSISTANT_MESSAGE_DELTA, SESSION_IDLE, and SESSION_ERROR were handled — tool execution events were silently dropped.
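For readers unfamiliar with the streaming path, the mapping described above can be sketched roughly as follows. This is a simplified, synchronous illustration, not the code in `_agent.py`: `EventType`, `Event`, `Update`, and the dictionary keys are hypothetical stand-ins for the SDK's `SessionEventType` and the framework's content types.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Iterable, Iterator, Optional


class EventType(Enum):
    # Hypothetical stand-in for the SDK's SessionEventType.
    ASSISTANT_MESSAGE_DELTA = "assistant_message_delta"
    TOOL_EXECUTION_START = "tool_execution_start"
    TOOL_EXECUTION_COMPLETE = "tool_execution_complete"
    SESSION_IDLE = "session_idle"


@dataclass
class Event:
    type: EventType
    data: Any = None


@dataclass
class Update:
    # Hypothetical stand-in for one streamed content item.
    kind: str
    call_id: Optional[str] = None
    payload: Any = None


def stream_updates(events: Iterable[Event]) -> Iterator[Update]:
    """Translate session events into streamed updates, including tool events."""
    for event in events:
        if event.type is EventType.ASSISTANT_MESSAGE_DELTA:
            yield Update("text", payload=event.data)
        elif event.type is EventType.TOOL_EXECUTION_START:
            # Previously dropped: now surfaced as a function call.
            yield Update("function_call", event.data["tool_call_id"], event.data["tool_name"])
        elif event.type is EventType.TOOL_EXECUTION_COMPLETE:
            # Previously dropped: now surfaced as a function result.
            yield Update("function_result", event.data["tool_call_id"], event.data["result"])
        elif event.type is EventType.SESSION_IDLE:
            return  # the session is done; stop streaming
```

The real method is asynchronous and reads events from the Copilot session; the sketch only shows which event types produce which kind of update.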

Description

Closes Python: [Bug]: GitHubCopilotAgent streaming silently drops tool execution events #4734

Contribution Checklist

The code builds clean without any errors or warnings
The PR follows the Contribution Guidelines
All unit tests pass, and I have added new tests where possible
Is this a breaking change? If yes, add "[BREAKING]" prefix to the title of the PR.

_stream_updates now yields FunctionCallContent for TOOL_EXECUTION_START and FunctionResultContent for TOOL_EXECUTION_COMPLETE events from the Copilot SDK session. This enables DevUI and other consumers to display tool calls during streaming agent execution. Previously only ASSISTANT_MESSAGE_DELTA, SESSION_IDLE, and SESSION_ERROR were handled; tool execution events were silently dropped.

Signed-off-by: James Sturtevant <jsturtevant@gmail.com>

markwallace-microsoft · 2026-03-15T20:10:11Z

Python Test Coverage Report

File	Stmts	Miss	Cover	Missing
TOTAL	24006	2640	89%

report-only-changed-files is enabled. No files were changed during this commit :)

Python Unit Test Overview

Tests	Skipped	Failures	Errors	Time
5245	20 💤	0 ❌	0 🔥	1m 23s ⏱️

Copilot

Pull request overview

This PR extends the GitHub Copilot Python agent’s streaming event handling to surface tool execution start/complete events as AgentResponseUpdate chunks, enabling downstream consumers to observe tool calls/results during a streamed run.

Changes:

Convert SessionEventType.TOOL_EXECUTION_START into Content.from_function_call(...) streaming updates.
Convert SessionEventType.TOOL_EXECUTION_COMPLETE into Content.from_function_result(...) streaming updates.
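On the consumer side, a UI could pair each result with its originating call by call ID. The sketch below assumes updates arrive as plain dictionaries with hypothetical `type`, `call_id`, `name`, `arguments`, and `result` keys; the framework's actual content objects will differ.

```python
from typing import Any, Dict, Iterable, List


def render_tool_activity(updates: Iterable[Dict[str, Any]]) -> List[str]:
    """Return display lines that pair each function result with its call."""
    names_by_call_id: Dict[str, str] = {}
    lines: List[str] = []
    for update in updates:
        kind = update.get("type")
        if kind == "function_call":
            # Remember the tool name so the later result can be labeled.
            names_by_call_id[update["call_id"]] = update["name"]
            lines.append(f"calling {update['name']}({update.get('arguments', '')})")
        elif kind == "function_result":
            name = names_by_call_id.get(update["call_id"], "<unknown tool>")
            lines.append(f"{name} returned {update['result']!r}")
        # Other update kinds (for example text deltas) are ignored here.
    return lines
```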

python/packages/github_copilot/agent_framework_github_copilot/_agent.py

moonbox3 · 2026-03-16T04:57:20Z

@jsturtevant please link an issue to this PR.

jsturtevant · 2026-03-17T01:24:28Z

> @jsturtevant please link an issue to this PR.

#4734

moonbox3

Automated Code Review

Reviewers: 4 | Confidence: 83%

✓ Correctness

The new event handlers for TOOL_EXECUTION_START and TOOL_EXECUTION_COMPLETE are structurally sound and follow the existing pattern for other event types. The fallback logic using getattr with defaults is reasonable. One potential correctness concern: if result_obj exists but its text_result_for_llm attribute is explicitly None, the result passed to Content.from_function_result will be None rather than the empty string the code appears to intend. The tests are thorough and cover the main paths including missing fields and failure cases. Assuming ToolResult and the new SessionEventType values are imported/defined elsewhere (not shown in the diff), the tests should work. The import of ToolResult should be verified in the test file.
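The `None` concern can be reproduced in isolation. The snippet uses `SimpleNamespace` objects as stand-ins for the result object, and `text_or_empty` is a hypothetical helper showing one way to normalize the value; it is not taken from the diff.

```python
from types import SimpleNamespace

missing = SimpleNamespace()  # attribute absent
explicit_none = SimpleNamespace(text_result_for_llm=None)  # attribute present but None

# The getattr default is used only when the attribute does not exist.
when_missing = getattr(missing, "text_result_for_llm", "")      # ""
when_none = getattr(explicit_none, "text_result_for_llm", "")   # None, not ""


def text_or_empty(obj: object, name: str) -> str:
    """Return the attribute as a string, treating both missing and None as empty."""
    value = getattr(obj, name, None)
    return "" if value is None else str(value)
```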

✓ Security Reliability

The diff adds handling for TOOL_EXECUTION_START and TOOL_EXECUTION_COMPLETE session events, mapping them to function_call and function_result content updates. The implementation uses defensive getattr calls with sensible defaults, follows existing patterns in the codebase, and includes thorough test coverage for happy paths and edge cases. No security or reliability issues were identified.

✗ Test Coverage

The new tests cover the happy paths for TOOL_EXECUTION_START and TOOL_EXECUTION_COMPLETE, missing fields on START, None result on COMPLETE, failure result type, and a full interleaved sequence. Coverage is generally solid. However, there is likely a missing import for ToolResult in the test file (the diff does not show it being added to imports), and there is no test for the TOOL_EXECUTION_COMPLETE handler when event.data itself has missing fields (the missing-fields test only covers the START event type). Additionally, there is no test verifying that a success result with an error field does NOT propagate the error as an exception, which is an important edge case of the exception = error if result_type == "failure" else None logic.
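A test for the success-with-error edge case could look roughly like this. `exception_for` is a hypothetical helper that mirrors the quoted guard expression so the sketch can run on its own; a real test would drive the agent's streaming method instead.

```python
from typing import Optional


def exception_for(result_type: str, error: Optional[str]) -> Optional[str]:
    # Mirrors the guard quoted in the review:
    #   exception = error if result_type == "failure" else None
    return error if result_type == "failure" else None


def test_success_with_error_field_is_not_propagated() -> None:
    # A stale error on a successful result must not surface as an exception.
    assert exception_for("success", "stale error") is None


def test_failure_propagates_error() -> None:
    assert exception_for("failure", "boom") == "boom"
```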

✗ Design Approach

The TOOL_EXECUTION_COMPLETE handler is built on incorrect assumptions about the SDK's type system, making it silently broken in production. The event.data.result field is typed as copilot.generated.session_events.Result (a dataclass with content: str, contents, and detailed_content fields), not copilot.types.ToolResult. The diff accesses .text_result_for_llm, .result_type, and .error on this object — none of which exist on session_events.Result — so result_text will always be "" and failure detection will never fire. Failure information (success: Optional[bool], error: Union[ErrorClass, str, None]) lives directly on event.data, not inside event.data.result. The tests mask this by mocking tool_event_data.result with a copilot.types.ToolResult TypedDict and using getattr on it; since TypedDicts are plain dicts and don't expose keys as attributes, getattr(result_obj, "text_result_for_llm", "") also always returns "" — meaning the assertion assert content.result == "Sunny, 72°F" should actually fail, indicating the tests themselves are not passing against the real code path.
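The point about `getattr` on a TypedDict can be checked with standard-library types alone. `ToolResultDict` and `ResultRecord` are hypothetical stand-ins for the two SDK types named above, not the SDK definitions themselves.

```python
from dataclasses import dataclass
from typing import TypedDict


class ToolResultDict(TypedDict, total=False):
    # Stand-in for a TypedDict-based result: at runtime this is a plain dict.
    text_result_for_llm: str


@dataclass
class ResultRecord:
    # Stand-in for a dataclass-based result: fields are real attributes.
    content: str


as_dict = ToolResultDict(text_result_for_llm="Sunny, 72°F")
as_record = ResultRecord(content="Sunny, 72°F")

# Dict keys are not attributes, so getattr falls back to the default.
from_getattr = getattr(as_dict, "text_result_for_llm", "")
# Key lookup is what actually retrieves the value from a TypedDict.
from_key = as_dict.get("text_result_for_llm", "")
# Attribute access works as expected on the dataclass.
from_record = getattr(as_record, "content", "")
```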

Flagged Issues

event.data.result is copilot.generated.session_events.Result (fields: content, contents, detailed_content), NOT copilot.types.ToolResult. The handler accesses .text_result_for_llm, .result_type, and .error — none of which exist on Result — so result_text will always be "" and failure detection will never fire. Use result_obj.content for the result text, and read failure state from event.data.success (Optional[bool]) and event.data.error (Union[ErrorClass, str, None]) directly.
The tests mock tool_event_data.result with ToolResult(...) (a TypedDict, i.e. a plain dict) and the implementation calls getattr(result_obj, "text_result_for_llm", "") on it. getattr on a dict does not expose keys as attributes, so this always returns "". The assertion assert content.result == "Sunny, 72°F" should fail, meaning the tests are not actually validating the implementation. Additionally, ToolResult is used (lines 514, 636, 688) but no import is shown in the diff — though the correct fix is to use session_events.Result instead.
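A corrected field access along the lines suggested above might look like the following sketch. `Result` and `ToolCompleteData` are simplified stand-ins that carry only the fields named in this review, and treating `success=None` as "not failed" is an assumption rather than documented SDK behavior.

```python
from dataclasses import dataclass
from typing import Any, Optional, Tuple


@dataclass
class Result:
    # Stand-in carrying only the `content` field named in the review.
    content: str = ""


@dataclass
class ToolCompleteData:
    # Stand-in for event.data on a TOOL_EXECUTION_COMPLETE event.
    tool_call_id: str = ""
    result: Optional[Result] = None
    success: Optional[bool] = None
    error: Any = None


def extract_result(data: object) -> Tuple[str, str, Any]:
    """Return (call_id, result_text, error) read from the event data."""
    call_id = getattr(data, "tool_call_id", None) or ""
    result_obj = getattr(data, "result", None)
    # Result text lives on result.content; missing or None becomes "".
    result_text = getattr(result_obj, "content", None) or ""
    # Failure state lives on the event data itself, not on the result object.
    failed = getattr(data, "success", None) is False
    error = getattr(data, "error", None) if failed else None
    return call_id, result_text, error
```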

Suggestions

Add a test for TOOL_EXECUTION_COMPLETE with missing fields on event.data (analogous to the existing missing-fields test which only covers START), to exercise the getattr fallback paths for tool_call_id and result in the COMPLETE handler.
Add a test for TOOL_EXECUTION_COMPLETE where the result is successful but an error field is present, to validate that the exception guard correctly returns None.
When correcting the field access, consider that ToolResultType includes "rejected" and "denied" in addition to "failure" — all are non-success states and should likely surface the error. With the corrected event.data.success approach, ensure all failure conditions are handled.
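One way to keep every non-success state covered is to centralize the check. The helper below is a hypothetical sketch: the three result-type strings come from the suggestion above, while the precedence given to an explicit `success` flag is an assumption.

```python
from typing import Optional

# Non-success result types named in the review.
NON_SUCCESS_RESULT_TYPES = frozenset({"failure", "rejected", "denied"})


def is_failure(success: Optional[bool] = None, result_type: Optional[str] = None) -> bool:
    """Decide whether a tool execution should surface its error."""
    if success is not None:
        # An explicit success flag on the event data takes precedence.
        return not success
    # Otherwise fall back to the result type, if one is available.
    return result_type in NON_SUCCESS_RESULT_TYPES
```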

Automated review by moonbox3's agents

python/packages/github_copilot/agent_framework_github_copilot/_agent.py

python/packages/github_copilot/tests/test_github_copilot_agent.py

- Read result text from session_events.Result.content (not ToolResult.text_result_for_llm)
- Read failure state from event.data.success/error (not result_obj.result_type/error)
- Handle ErrorClass.message and plain string errors
- Update tests to use session_events.Result and ErrorClass
- Add tests for string errors, success-with-error, and COMPLETE missing fields

Signed-off-by: James Sturtevant <jsturtevant@gmail.com>
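Handling both error shapes can be reduced to a small normalization step. This is an illustrative sketch, not the committed code: `ErrorInfo` is a hypothetical stand-in for the SDK's `ErrorClass`, and only its `message` attribute is assumed.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class ErrorInfo:
    # Hypothetical stand-in for the SDK's ErrorClass; only `message` is assumed.
    message: str


def error_message(error: Union[ErrorInfo, str, None]) -> Optional[str]:
    """Normalize an error that may be an object, a plain string, or None."""
    if error is None:
        return None
    if isinstance(error, str):
        return error
    # Prefer the structured message; fall back to the string form of the object.
    return getattr(error, "message", None) or str(error)
```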

jsturtevant · 2026-03-19T18:00:22Z

@moonbox3 I've addressed the comments

Copilot AI review requested due to automatic review settings March 15, 2026 20:07

markwallace-microsoft added the python label Mar 15, 2026

github-actions bot changed the title ~~Emit tool call events in GitHubCopilotAgent streaming~~ Python: Emit tool call events in GitHubCopilotAgent streaming Mar 15, 2026

Copilot started reviewing on behalf of jsturtevant March 15, 2026 20:08

Copilot AI reviewed Mar 15, 2026

jsturtevant added 2 commits March 15, 2026 13:24

Add some tests

2ed004f

Signed-off-by: James Sturtevant <jsturtevant@gmail.com>

Respond to feedback

e386143

Signed-off-by: James Sturtevant <jsturtevant@gmail.com>

moonbox3 reviewed Mar 17, 2026

moonbox3 approved these changes Mar 20, 2026

giles17 approved these changes Mar 20, 2026

moonbox3 added this pull request to the merge queue Mar 20, 2026

Merged via the queue into microsoft:main with commit b4c4f50 Mar 20, 2026
31 checks passed

moonbox3 merged 4 commits into microsoft:main from jsturtevant:copilot-tools

5 participants
