Python: avoid duplicate agent response telemetry (#4685)
Conversation
Pull request overview
This PR updates Python observability to prevent duplicate response telemetry when an Agent run produces both an outer agent span and an inner chat completion span, ensuring response ownership (response id + token usage) stays on the chat span.
Changes:
- Suppress `gen_ai.response.id` and `gen_ai.usage.*` attributes on `invoke_agent` spans.
- Extend `_get_response_attributes` with a `capture_response_id` switch (in addition to `capture_usage`).
- Add regression tests covering nested agent/chat telemetry for streaming and non-streaming paths.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| python/packages/core/agent_framework/observability.py | Stops agent spans from capturing response id and token usage; adds capture_response_id option to response attribute extraction. |
| python/packages/core/tests/core/test_observability.py | Updates agent-span assertions and adds regression tests to ensure response telemetry is only attached to chat spans. |
Is this the correct thing to do? What if customers do want the response IDs in the agent spans?
The span names are different for agent and LLM: invoke_agent and chat. Is that not enough to dedup the data? @sphenry
The issue here is that 1) you can have two response IDs in a single agent span when you have function calling, which means only one (the last one) is set at the agent level, so that value is already wrong; and 2) they apparently use the response ID to do token counting, and having it set in two places counts double. You are right that they should be able to filter to only count at one level.
@TaoChenOSU Is there a scenario where they would want it in both?
Could you explain the first scenario further? That does sound like a bug. For 2), why couldn't they just use data from the agent spans and not worry about the chat spans at all?
I don't have a particular scenario, but I think we should record as much data as we can at each layer, because customers rely on the traces to monitor applications. It's generally bad if we selectively drop things. We should only care about recording the data, i.e. creating the spans and giving them the expected attributes. The application should take care of sending the data to a monitoring backend. Then the consumer of the data can decide how they want to use or parse it.
Force-pushed 55cc6e8 to 95dbff7
Force-pushed 95dbff7 to 2f506a1
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Force-pushed 2f506a1 to c12300c
The `invoke_agent` span now carries the aggregated input/output token counts from all inner chat completion spans that occur during an agent run. Previously, when inner `ChatTelemetryLayer` spans captured usage, the outer `AgentTelemetryLayer` skipped setting usage entirely to avoid duplication. Now a new `INNER_ACCUMULATED_USAGE` context variable tracks cumulative usage across all inner completions, and the agent span always reports the total. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Motivation and Context
Nested agent runs currently record the same response ID and token usage on both the outer
`invoke_agent` span and the inner chat completion span. That duplicates telemetry for a single response and makes span-level metrics noisier than they should be. Fixes #4675.

Description
- Stops `AgentTelemetryLayer` from attaching `response_id` and token usage to agent spans
- Keeps `response_id` ownership on the inner chat completion span

Contribution Checklist