fix(openai): align reasoning and cache token attributes with OTel GenAI spec #4130
Conversation

CLA check: Dvir Rezenman does not appear to be a GitHub user. A GitHub account is required to sign the CLA; if you already have a GitHub account, please add the email address used for this commit to your account.
📝 Walkthrough

This PR enhances token tracking across OpenTelemetry instrumentation by renaming a semantic convention constant for clarity, adding cache-read and reasoning-token extraction to the OpenAI Agents instrumentor, and aligning attribute references across OpenAI and Traceloop SDK code.

Changes: Token Tracking & Semantic Convention Alignment
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed

❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.

⚠️ Review ran into problems. 🔥 Problems: Git failed to clone the repository.
Actionable comments posted: 5
🧹 Nitpick comments (1)
packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py (1)
Lines 554-557: ⚡ Quick win: select the target span by attribute instead of positional index.

Using `response_spans[0]` is brittle when multiple `openai.responses` spans are emitted; pick the span that actually has the asserted attribute.

More deterministic span selection:

```diff
- response_span = response_spans[0]
+ response_span = next(
+     (s for s in response_spans if GenAIAttributes.GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS in s.attributes),
+     None,
+ )
+ assert response_span is not None
```

```diff
- response_span = response_spans[0]
+ response_span = next(
+     (s for s in response_spans if SpanAttributes.GEN_AI_USAGE_REASONING_OUTPUT_TOKENS in s.attributes),
+     None,
+ )
+ assert response_span is not None
```

Also applies to: 574-577
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py` around lines 554 - 557, The test currently picks a span by position (response_spans[0]) which is brittle; replace that with a deterministic selection that finds the span by the attribute you assert later (use response_spans and the specific attribute key used in the assertions). For example, replace uses of response_spans[0] in the test with code that selects the span via next(s for s in response_spans if '<ATTRIBUTE_KEY>' in s.attributes or s.attributes.get('<ATTRIBUTE_KEY>') == '<EXPECTED_VALUE>'), where <ATTRIBUTE_KEY> is the attribute name asserted later in the test; do this for both occurrences (around the response_spans usage at the two ranges noted) so the test picks the span by attribute, not index.
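To make the suggestion concrete, here is a minimal runnable sketch of attribute-based span selection. `FakeSpan`, the span names, and the token counts are hypothetical stand-ins for the exported `ReadableSpan` objects in the real test; only the `next()`-with-default pattern itself is the point.

```python
# Hypothetical sketch: select the span carrying the asserted attribute
# rather than relying on emission order. FakeSpan stands in for an
# exported ReadableSpan; the attribute key is the spec-aligned name.
ATTR = "gen_ai.usage.cache_read.input_tokens"


class FakeSpan:
    def __init__(self, name, attributes):
        self.name = name
        self.attributes = attributes


response_spans = [
    FakeSpan("openai.response", {"gen_ai.usage.input_tokens": 12}),
    FakeSpan("openai.response", {ATTR: 64, "gen_ai.usage.input_tokens": 80}),
]

# next() with a None default avoids an IndexError and makes a missing
# attribute fail on the explicit assertion instead.
response_span = next(
    (s for s in response_spans if ATTR in s.attributes),
    None,
)
assert response_span is not None
print(response_span.attributes[ATTR])  # → 64
```

The same pattern applies to the reasoning-token assertion; only the attribute key changes.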
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 21e8e29b-0fa9-4aff-8aaf-b153db91fba1
⛔ Files ignored due to path filters (28), all excluded by `!**/*.lock`:

- `packages/opentelemetry-instrumentation-agno/uv.lock`
- `packages/opentelemetry-instrumentation-anthropic/uv.lock`
- `packages/opentelemetry-instrumentation-bedrock/uv.lock`
- `packages/opentelemetry-instrumentation-chromadb/uv.lock`
- `packages/opentelemetry-instrumentation-cohere/uv.lock`
- `packages/opentelemetry-instrumentation-crewai/uv.lock`
- `packages/opentelemetry-instrumentation-groq/uv.lock`
- `packages/opentelemetry-instrumentation-haystack/uv.lock`
- `packages/opentelemetry-instrumentation-lancedb/uv.lock`
- `packages/opentelemetry-instrumentation-marqo/uv.lock`
- `packages/opentelemetry-instrumentation-mcp/uv.lock`
- `packages/opentelemetry-instrumentation-milvus/uv.lock`
- `packages/opentelemetry-instrumentation-mistralai/uv.lock`
- `packages/opentelemetry-instrumentation-ollama/uv.lock`
- `packages/opentelemetry-instrumentation-openai-agents/uv.lock`
- `packages/opentelemetry-instrumentation-openai/uv.lock`
- `packages/opentelemetry-instrumentation-pinecone/uv.lock`
- `packages/opentelemetry-instrumentation-qdrant/uv.lock`
- `packages/opentelemetry-instrumentation-replicate/uv.lock`
- `packages/opentelemetry-instrumentation-sagemaker/uv.lock`
- `packages/opentelemetry-instrumentation-together/uv.lock`
- `packages/opentelemetry-instrumentation-transformers/uv.lock`
- `packages/opentelemetry-instrumentation-vertexai/uv.lock`
- `packages/opentelemetry-instrumentation-voyageai/uv.lock`
- `packages/opentelemetry-instrumentation-watsonx/uv.lock`
- `packages/opentelemetry-instrumentation-writer/uv.lock`
- `packages/sample-app/uv.lock`
- `packages/traceloop-sdk/uv.lock`
📒 Files selected for processing (13)
- `packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py`
- `packages/opentelemetry-instrumentation-openai-agents/pyproject.toml`
- `packages/opentelemetry-instrumentation-openai-agents/tests/cassettes/test_openai_agents/test_cache_read_input_tokens_recorded.yaml`
- `packages/opentelemetry-instrumentation-openai-agents/tests/cassettes/test_openai_agents/test_reasoning_output_tokens_recorded.yaml`
- `packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py`
- `packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py`
- `packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py`
- `packages/opentelemetry-instrumentation-openai/tests/traces/test_azure.py`
- `packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py`
- `packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py`
- `packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/_testing.py`
- `packages/traceloop-sdk/tests/test_manual.py`
- `packages/traceloop-sdk/traceloop/sdk/tracing/manual.py`
```yaml
set-cookie:
  - __cf_bm=GqvMTTwxqvHlYO_zMGr15vhngiHBKowc4W9cytloB1U-1778513729.5012145-1.0.1.1-TuWmsWma1J3OWFenDizlhbf_5qCYt89CjxNLLDDm.zLzVxhm2ep.2mEgG6rExy5xHOM0Quwrvd09cIn.yYTv.TO2jCa5bit4sQJ6ZIeHevWLiejQyIXZYqEUyXQItPzY;
    HttpOnly; SameSite=None; Secure; Path=/; Domain=api.openai.com; Expires=Mon,
    11 May 2026 16:05:32 GMT
```
Remove cookie material from test cassette.
This cassette stores request Cookie and response set-cookie values. Please redact/filter these headers and re-record to avoid leaking sensitive session artifacts and to improve replay stability.
Also applies to: 165-168, 190-191
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@packages/opentelemetry-instrumentation-openai-agents/tests/cassettes/test_openai_agents/test_cache_read_input_tokens_recorded.yaml`
around lines 90 - 93, Remove all Cookie and set-cookie header values from the
cassette test_openai_agents/test_cache_read_input_tokens_recorded.yaml (and the
other occurrences noted) by redacting those header lines and re-recording the
cassette; ensure the test harness that generates cassettes filters or normalizes
"Cookie" request headers and "set-cookie" response headers before saving (update
the cassette recording/filtering logic to strip or replace these headers with a
stable placeholder) so future recordings do not include session cookies and
re-run the tests to commit the sanitized cassette.
```yaml
Cookie:
  - __cf_bm=GqvMTTwxqvHlYO_zMGr15vhngiHBKowc4W9cytloB1U-1778513729.5012145-1.0.1.1-TuWmsWma1J3OWFenDizlhbf_5qCYt89CjxNLLDDm.zLzVxhm2ep.2mEgG6rExy5xHOM0Quwrvd09cIn.yYTv.TO2jCa5bit4sQJ6ZIeHevWLiejQyIXZYqEUyXQItPzY
```
Scrub sensitive/volatile headers from cassette recordings.
Cookie and request/infra identifiers should not be committed in cassettes; they are sensitive/ephemeral and make fixtures less stable. Please filter these headers during VCR recording and regenerate this cassette.
Also applies to: 60-64, 104-105
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@packages/opentelemetry-instrumentation-openai-agents/tests/cassettes/test_openai_agents/test_reasoning_output_tokens_recorded.yaml`
around lines 16 - 17, The cassette
test_openai_agents/test_reasoning_output_tokens_recorded.yaml contains
sensitive/volatile headers (e.g., Cookie and request/infra identifiers) that
must be filtered from VCR recordings; update the test recording configuration
used for this suite to strip or replace Cookie and any request/infra ID headers
(e.g., CF identifiers, X-Request-Id) during capture, re-record the cassette, and
commit the regenerated YAML so the cassette no longer contains those sensitive
values.
```python
@pytest.mark.vcr
def test_cache_read_input_tokens_recorded(exporter, test_agent):
```
🧩 Analysis chain
🏁 Script executed:
```shell
#!/bin/bash
set -euo pipefail
# Locate the new cassettes referenced by these tests
fd -a 'test_(cache_read_input_tokens_recorded|reasoning_output_tokens_recorded)\.yaml' \
  packages/opentelemetry-instrumentation-openai-agents/tests/cassettes
# Scan cassettes for common secret/PII indicators (review any hits; placeholders/masked values are acceptable)
rg -n --iglob '*.yaml' -e 'Authorization|Bearer\s+[A-Za-z0-9\-_\.]+|api[_-]?key|sk-[A-Za-z0-9]|@|email' \
  packages/opentelemetry-instrumentation-openai-agents/tests/cassettes
```

Repository: traceloop/openllmetry
Length of output: 365
🏁 Script executed:
```shell
#!/bin/bash
# Check the cassette files for any sensitive content
cd packages/opentelemetry-instrumentation-openai-agents/tests/cassettes/test_openai_agents
# Show file sizes and examine key sections of the cassettes
for cassette in test_cache_read_input_tokens_recorded.yaml test_reasoning_output_tokens_recorded.yaml; do
  if [ -f "$cassette" ]; then
    echo "=== $cassette ==="
    wc -l "$cassette"
    echo "--- First 50 lines ---"
    head -50 "$cassette"
    echo "--- Looking for potential secrets (detailed check) ---"
    rg -n 'sk_|Bearer|Authorization|api.?key|token|secret|password' "$cassette" || echo "No matches found"
  fi
done
```

Repository: traceloop/openllmetry
Length of output: 6564
🏁 Script executed:
```shell
#!/bin/bash
# Check the test file to see if VCR filters are configured
cd packages/opentelemetry-instrumentation-openai-agents/tests
# Search for VCR decorator and filter configuration around the test methods
rg -B5 -A10 'def test_(cache_read_input_tokens_recorded|reasoning_output_tokens_recorded)' test_openai_agents.py
```

Repository: traceloop/openllmetry
Length of output: 1274
🏁 Script executed:
```shell
#!/bin/bash
# Look for VCR configuration in conftest files
fd conftest.py packages/opentelemetry-instrumentation-openai-agents/tests | head -5
# Check the conftest for VCR filter setup
if [ -f packages/opentelemetry-instrumentation-openai-agents/tests/conftest.py ]; then
  echo "=== conftest.py content ==="
  cat -n packages/opentelemetry-instrumentation-openai-agents/tests/conftest.py
fi
```

Repository: traceloop/openllmetry
Length of output: 11337
🏁 Script executed:
```shell
#!/bin/bash
# Also check for pytest.ini or setup.cfg with VCR config
find packages/opentelemetry-instrumentation-openai-agents/tests -maxdepth 1 \
  \( -name "pytest.ini" -o -name "setup.cfg" -o -name "pyproject.toml" \) \
  -exec echo "=== {} ===" \; -exec cat {} \;
```

Repository: traceloop/openllmetry
Length of output: 47
VCR cassettes need improved filtering configuration and re-recording.
The conftest.py VCR config (lines 284–286) only filters ["authorization", "api-key"] headers, but the cassettes contain unfiltered CloudFlare bot management cookies (e.g., __cf_bm=... in test_reasoning_output_tokens_recorded.yaml). Add "cookie" to the filter_headers list and re-record the cassettes to ensure all session/tracking data is properly scrubbed.
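A hedged sketch of the suggested conftest.py change. In the real test suite this would be the `vcr_config` pytest fixture consumed by pytest-recording; it is shown here as a plain function so the sketch is runnable standalone, and the exact existing filter list is an assumption based on the review findings.

```python
# Sketch of the suggested conftest.py change: extend the VCR header
# filters so session cookies never reach committed cassettes. In
# conftest.py this function would carry an @pytest.fixture decorator.
def vcr_config():
    return {
        "filter_headers": [
            "authorization",
            "api-key",
            "cookie",      # scrub CloudFlare __cf_bm request cookies
            "set-cookie",  # scrub session cookies from responses
        ],
    }


config = vcr_config()
print(config["filter_headers"])
```

After adding the new entries, the affected cassettes still need re-recording, since filtering only applies at capture time.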
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py`
around lines 544 - 545, The VCR configuration in conftest.py only filters
["authorization", "api-key"] and misses CloudFlare cookies, which are present in
cassettes like test_reasoning_output_tokens_recorded.yaml; update the VCR
config's filter_headers list to include "cookie" (i.e., modify the
filter_headers variable/setting in the VCR setup in conftest.py to add "cookie")
and then re-run the test suite to re-record the affected cassettes so
session/tracking cookies are scrubbed from the YAML files.
```python
    """Cache read tokens should be > 0 on the second call — OpenAI writes to cache
    on the first call and serves from it on the second."""
```
Cache-hit assertion is weaker than the test’s stated intent.
Line 559 uses >= 0, but the docstring (Lines 546-547) explicitly says the second call should be a cache read (> 0). This can let a non-cache-hit pass.
Proposed tightening:

```diff
- assert response_span.attributes[GenAIAttributes.GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS] >= 0
+ assert response_span.attributes[GenAIAttributes.GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS] > 0
```

Also applies to: 559-559
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py`
around lines 546 - 547, The test's cache-hit assertion currently uses the
expression ">= 0" which is too weak; locate the assertion that checks for a
cache hit (the line asserting ">= 0") in tests/test_openai_agents.py and change
it to use "> 0" so the second call is enforced as a cache read per the test
docstring; update any variable names involved (e.g., cache_hit_count or similar)
accordingly and run the test to confirm it now fails when there is no cache hit.
```diff
 GEN_AI_CONTENT_COMPLETION_CHUNK = "gen_ai.content.completion.chunk"
 GEN_AI_REQUEST_REASONING_EFFORT = "gen_ai.request.reasoning_effort"
-GEN_AI_USAGE_REASONING_TOKENS = "gen_ai.usage.reasoning_tokens"
+GEN_AI_USAGE_REASONING_OUTPUT_TOKENS = "gen_ai.usage.reasoning.output_tokens"
```
Preserve backward compatibility for the renamed reasoning constant.
Dropping SpanAttributes.GEN_AI_USAGE_REASONING_TOKENS can break consumers importing that symbol. Keep a deprecated alias pointing to the new key to avoid runtime import breaks.
Proposed compatibility patch:

```diff
 class SpanAttributes:
 @@
     GEN_AI_USAGE_REASONING_OUTPUT_TOKENS = "gen_ai.usage.reasoning.output_tokens"
+    # Backward-compatible alias (deprecated): remove in next major version
+    GEN_AI_USAGE_REASONING_TOKENS = GEN_AI_USAGE_REASONING_OUTPUT_TOKENS
```

📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```python
GEN_AI_USAGE_REASONING_OUTPUT_TOKENS = "gen_ai.usage.reasoning.output_tokens"
# Backward-compatible alias (deprecated): remove in next major version
GEN_AI_USAGE_REASONING_TOKENS = GEN_AI_USAGE_REASONING_OUTPUT_TOKENS
```
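A runnable sketch of the alias approach suggested above. The class-level alias covers the plain backward-compatibility case; the optional DeprecationWarning on access would need a module-level `__getattr__` (PEP 562) or a metaclass, which is omitted here.

```python
class SpanAttributes:
    GEN_AI_USAGE_REASONING_OUTPUT_TOKENS = "gen_ai.usage.reasoning.output_tokens"
    # Backward-compatible alias (deprecated): remove in the next major version.
    GEN_AI_USAGE_REASONING_TOKENS = GEN_AI_USAGE_REASONING_OUTPUT_TOKENS


# Old references keep working and resolve to the spec-aligned string.
print(SpanAttributes.GEN_AI_USAGE_REASONING_TOKENS)
# → gen_ai.usage.reasoning.output_tokens
```

Since both names are bound to the same string object, code comparing attribute keys by value behaves identically whichever constant a consumer imports.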
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py`
at line 82, Add a deprecated alias for the renamed constant so existing imports
still work: in opentelemetry.semconv_ai.__init__.py (next to
GEN_AI_USAGE_REASONING_OUTPUT_TOKENS) define the old symbol
SpanAttributes.GEN_AI_USAGE_REASONING_TOKENS (or simply
GEN_AI_USAGE_REASONING_TOKENS if constants are module-level) and assign it the
same string value as GEN_AI_USAGE_REASONING_OUTPUT_TOKENS, include a brief
comment noting it is deprecated and optionally emit a DeprecationWarning when
accessed to guide users to the new constant name.
Reviewed the change — direction is right (aligning attribute names with the OTel GenAI spec). A few things worth addressing:

1. Two sources of truth for the cache constants: the same constants exist in both places, resolving to the same string today. The PR mixes both; I'd lean toward standardizing on the upstream OTel-native […]

2. […]
Fixes #4060
Problem:
`gen_ai.usage.reasoning_tokens` was the wrong attribute name — the OTel GenAI spec defines it as `gen_ai.usage.reasoning.output_tokens`. Similarly, the `LLM_USAGE_CACHE_READ_INPUT_TOKENS` constant resolved to `gen_ai.usage.cache_read_input_tokens` (underscore) instead of the spec-correct `gen_ai.usage.cache_read.input_tokens` (dot). The openai-agents instrumentor also wasn't recording cache read or reasoning tokens at all. The `opentelemetry-semantic-conventions` floor was too old (0.60b1) to provide the official upstream cache token constants.
Fix:
Bumped `opentelemetry-semantic-conventions` to >=0.62b1 so cache token constants come from the upstream OTel package. Added `GEN_AI_USAGE_REASONING_OUTPUT_TOKENS = "gen_ai.usage.reasoning.output_tokens"` to semconv_ai and removed the old incorrect constant. Updated the openai and openai-agents instrumentors to use the corrected constants. Added two new tests with recorded cassettes for cache read and reasoning token recording in openai-agents.
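The renames described above can be summarized in a small sketch. The mapping restates the description; `migrate_attributes` is a hypothetical helper for illustration, not part of the PR.

```python
# Spec-aligned attribute keys after this change (old -> new).
RENAMES = {
    "gen_ai.usage.reasoning_tokens": "gen_ai.usage.reasoning.output_tokens",
    "gen_ai.usage.cache_read_input_tokens": "gen_ai.usage.cache_read.input_tokens",
}


def migrate_attributes(attrs):
    """Rewrite deprecated keys on a span-attribute dict to the spec names."""
    return {RENAMES.get(k, k): v for k, v in attrs.items()}


old = {"gen_ai.usage.reasoning_tokens": 7, "gen_ai.usage.input_tokens": 100}
print(migrate_attributes(old))
# → {'gen_ai.usage.reasoning.output_tokens': 7, 'gen_ai.usage.input_tokens': 100}
```

Keys not listed in `RENAMES` pass through unchanged, so the helper is safe to run over spans that already use the new names.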