Python: avoid duplicate agent response telemetry (#4685)
Conversation
Pull request overview
This PR updates Python observability to prevent duplicate response telemetry when an Agent run produces both an outer agent span and an inner chat completion span, ensuring response ownership (response id + token usage) stays on the chat span.
Changes:
- Suppress `gen_ai.response.id` and `gen_ai.usage.*` attributes on `invoke_agent` spans.
- Extend `_get_response_attributes` with a `capture_response_id` switch (in addition to `capture_usage`).
- Add regression tests covering nested agent/chat telemetry for streaming and non-streaming paths.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| python/packages/core/agent_framework/observability.py | Stops agent spans from capturing response id and token usage; adds capture_response_id option to response attribute extraction. |
| python/packages/core/tests/core/test_observability.py | Updates agent-span assertions and adds regression tests to ensure response telemetry is only attached to chat spans. |
Is this the correct thing to do? What if customers do want the response IDs in the agent spans?
The span names are different for agent and LLM: invoke_agent and chat. Is that not enough to dedup the data? @sphenry
The issue here is that 1) you can have two response IDs in a single agent span when you have function calling, which means only one (the last one) is set at the agent level, so that value is already wrong; and 2) they apparently use the response ID to do token counting, and having it set in two places counts double. You are right that they should be able to filter to only count at one level.
@TaoChenOSU Is there a scenario where they would want it in both?
Could you explain the first scenario further? That does sound like a bug. For 2), why couldn't they just use data from the agent spans and not worry about the chat spans at all?
I don't have a particular scenario, but I think we should record as much data as we can at each layer, because customers rely on the traces to monitor applications. It's generally bad if we selectively drop things. We should only care about recording the data, i.e. creating the spans and giving them the expected attributes. The application should take care of sending the data to a monitoring backend. Then the consumer of the data can decide how they want to use or parse it.
Force-pushed 55cc6e8 to 95dbff7
Force-pushed 95dbff7 to 2f506a1
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Force-pushed 2f506a1 to c12300c
The `invoke_agent` span now carries the aggregated input/output token counts from all inner chat completion spans that occur during an agent run. Previously, when inner `ChatTelemetryLayer` spans captured usage, the outer `AgentTelemetryLayer` skipped setting usage entirely to avoid duplication. Now a new `INNER_ACCUMULATED_USAGE` context variable tracks cumulative usage across all inner completions, and the agent span always reports the total. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Motivation and Context
Nested agent runs currently record the same response ID and token usage on both the outer
`invoke_agent` span and the inner chat completion span. That duplicates telemetry for a single response and makes span-level metrics noisier than they should be. Fixes #4675.

Description
- Stops `AgentTelemetryLayer` from attaching `response_id` and token usage to agent spans
- Keeps `response_id` ownership on the inner chat completion span

Contribution Checklist