
fix(openai): align reasoning and cache token attributes with OTel GenAI spec#4130

Open
dvirski wants to merge 2 commits into main from dr/feat(semconv)-align-reasoning-and-cache-token-attributes-with-OTel-GenAI-spec

Conversation

@dvirski

@dvirski dvirski commented May 12, 2026

Fixes #4060

Problem:
gen_ai.usage.reasoning_tokens was the wrong attribute name — the OTel GenAI spec defines it as gen_ai.usage.reasoning.output_tokens. Similarly, the LLM_USAGE_CACHE_READ_INPUT_TOKENS constant resolved to gen_ai.usage.cache_read_input_tokens (underscore) instead of
the spec-correct gen_ai.usage.cache_read.input_tokens (dot). The openai-agents instrumentor also wasn't recording cache read or reasoning tokens at all. The opentelemetry-semantic-conventions floor was too old (0.60b1) to provide the official upstream cache token constants.

Fix:
Bumped opentelemetry-semantic-conventions to >=0.62b1 so cache token constants come from the upstream OTel package. Added GEN_AI_USAGE_REASONING_OUTPUT_TOKENS = "gen_ai.usage.reasoning.output_tokens" to semconv_ai and removed the old incorrect constant. Updated the
openai and openai-agents instrumentors to use the corrected constants. Added two new tests with recorded cassettes for cache read and reasoning token recording in openai-agents.
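
The extraction described above can be sketched roughly as follows. This is an illustrative sketch, not the PR's actual `_extract_response_attributes` code: the attribute strings are the spec-aligned ones from the PR, the usage object shape mirrors the OpenAI Responses API's `input_tokens_details`/`output_tokens_details`, and the helper name is hypothetical.

```python
# Illustrative sketch: pull cache-read and reasoning token counts out of a
# Responses-style usage object and emit them under the spec-aligned keys.
from types import SimpleNamespace

# Spec-correct attribute strings (dot-separated sub-namespaces)
GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS = "gen_ai.usage.cache_read.input_tokens"
GEN_AI_USAGE_REASONING_OUTPUT_TOKENS = "gen_ai.usage.reasoning.output_tokens"


def extract_usage_attributes(usage):
    """Return span attributes for cache-read and reasoning tokens, if present."""
    attributes = {}
    input_details = getattr(usage, "input_tokens_details", None)
    cached = getattr(input_details, "cached_tokens", None)
    if cached is not None:
        attributes[GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS] = cached
    output_details = getattr(usage, "output_tokens_details", None)
    reasoning = getattr(output_details, "reasoning_tokens", None)
    if reasoning is not None:
        attributes[GEN_AI_USAGE_REASONING_OUTPUT_TOKENS] = reasoning
    return attributes


usage = SimpleNamespace(
    input_tokens_details=SimpleNamespace(cached_tokens=1024),
    output_tokens_details=SimpleNamespace(reasoning_tokens=256),
)
print(extract_usage_attributes(usage))
# → {'gen_ai.usage.cache_read.input_tokens': 1024, 'gen_ai.usage.reasoning.output_tokens': 256}
```

Using `getattr` with a `None` default keeps the instrumentor resilient when a provider omits the details objects entirely.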

Summary by CodeRabbit

  • New Features

    • Added tracking for cached input tokens in OpenAI responses.
    • Added tracking for reasoning output tokens (o3-mini model support).
  • Tests

    • Added test coverage for cache token usage recording.
    • Added test coverage for reasoning token output recording.
  • Chores

    • Updated semantic conventions for reasoning token attribute naming.
    • Updated dependency requirements for semantic conventions compatibility.


@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


Dvir Rezenman does not appear to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

@coderabbitai

coderabbitai Bot commented May 12, 2026

📝 Walkthrough

Walkthrough

This PR enhances token tracking across OpenTelemetry instrumentation by renaming a semantic convention constant for clarity, adding cache-read and reasoning-token extraction to the OpenAI Agents instrumentor, and aligning attribute references across OpenAI and Traceloop SDK code.

Changes

Token Tracking & Semantic Convention Alignment

• Semantic convention constant rename for reasoning tokens
  Files: packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py, packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/_testing.py
  Renamed GEN_AI_USAGE_REASONING_TOKENS to GEN_AI_USAGE_REASONING_OUTPUT_TOKENS and updated the attribute string from gen_ai.usage.reasoning_tokens to gen_ai.usage.reasoning.output_tokens for semantic clarity.
• OpenAI instrumentation migrated to new reasoning constant
  Files: packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py, packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py, packages/opentelemetry-instrumentation-openai/tests/traces/test_azure.py, packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py
  Updated span attribute keys in both chat and response wrappers to use the renamed constant, and updated test assertions accordingly.
• OpenAI Agents cache and reasoning token extraction
  Files: packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py, packages/opentelemetry-instrumentation-openai-agents/pyproject.toml, packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py, packages/opentelemetry-instrumentation-openai-agents/tests/cassettes/test_openai_agents/*
  Extended _extract_response_attributes to extract input_tokens_details.cached_tokens and output_tokens_details.reasoning_tokens from response usage and emit corresponding span attributes. Updated the dependency constraint and added two VCR-based test cases with cassettes validating cache-read and reasoning token recording.
• Traceloop SDK cache token attribute alignment
  Files: packages/traceloop-sdk/traceloop/sdk/tracing/manual.py, packages/traceloop-sdk/tests/test_manual.py
  Updated LLMSpan.report_usage to reference GenAIAttributes cache token constants instead of the SpanAttributes.LLM_USAGE_CACHE_* keys, with corresponding test updates.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

📦 Cache and reasoning tokens, now in sight,
Semantic conventions aligned just right,
From Agents to SDK, the whole stack glows,
Token tracking flows where OpenAI goes. 🐰✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 9.09%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled.
Title check | ✅ Passed | The title directly summarizes the main change: aligning reasoning and cache token attributes with the OTel GenAI spec across OpenAI and OpenAI Agents instrumentations.
Linked Issues check | ✅ Passed | The PR fully addresses issue #4060 by implementing cache read token recording in openai-agents, updating attribute naming to spec-compliant constants in semconv_ai, and syncing related OpenAI instrumentor code.
Out of Scope Changes check | ✅ Passed | All changes are directly scoped to fixing attribute naming and adding cache/reasoning token recording as specified in issue #4060, with no unrelated modifications.


Warning

Review ran into problems

🔥 Problems

Git: Failed to clone repository. Please run the @coderabbitai full review command to re-trigger a full review. If the issue persists, set path_filters to include or exclude specific files.




@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 5

🧹 Nitpick comments (1)
packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py (1)

554-557: ⚡ Quick win

Select the target span by attribute instead of positional index.

Using response_spans[0] is brittle when multiple openai.response spans are emitted; pick the span that actually has the asserted attribute.

More deterministic span selection
-    response_span = response_spans[0]
+    response_span = next(
+        (s for s in response_spans if GenAIAttributes.GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS in s.attributes),
+        None,
+    )
+    assert response_span is not None
-    response_span = response_spans[0]
+    response_span = next(
+        (s for s in response_spans if SpanAttributes.GEN_AI_USAGE_REASONING_OUTPUT_TOKENS in s.attributes),
+        None,
+    )
+    assert response_span is not None

Also applies to: 574-577

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py`
around lines 554 - 557, The test currently picks a span by position
(response_spans[0]) which is brittle; replace that with a deterministic
selection that finds the span by the attribute you assert later (use
response_spans and the specific attribute key used in the assertions). For
example, replace uses of response_spans[0] in the test with code that selects
the span via next(s for s in response_spans if '<ATTRIBUTE_KEY>' in s.attributes
or s.attributes.get('<ATTRIBUTE_KEY>') == '<EXPECTED_VALUE>'), where
<ATTRIBUTE_KEY> is the attribute name asserted later in the test; do this for
both occurrences (around the response_spans usage at the two ranges noted) so
the test picks the span by attribute, not index.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@packages/opentelemetry-instrumentation-openai-agents/tests/cassettes/test_openai_agents/test_cache_read_input_tokens_recorded.yaml`:
- Around line 90-93: Remove all Cookie and set-cookie header values from the
cassette test_openai_agents/test_cache_read_input_tokens_recorded.yaml (and the
other occurrences noted) by redacting those header lines and re-recording the
cassette; ensure the test harness that generates cassettes filters or normalizes
"Cookie" request headers and "set-cookie" response headers before saving (update
the cassette recording/filtering logic to strip or replace these headers with a
stable placeholder) so future recordings do not include session cookies and
re-run the tests to commit the sanitized cassette.

In
`@packages/opentelemetry-instrumentation-openai-agents/tests/cassettes/test_openai_agents/test_reasoning_output_tokens_recorded.yaml`:
- Around line 16-17: The cassette
test_openai_agents/test_reasoning_output_tokens_recorded.yaml contains
sensitive/volatile headers (e.g., Cookie and request/infra identifiers) that
must be filtered from VCR recordings; update the test recording configuration
used for this suite to strip or replace Cookie and any request/infra ID headers
(e.g., CF identifiers, X-Request-Id) during capture, re-record the cassette, and
commit the regenerated YAML so the cassette no longer contains those sensitive
values.

In
`@packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py`:
- Around line 546-547: The test's cache-hit assertion currently uses the
expression ">= 0" which is too weak; locate the assertion that checks for a
cache hit (the line asserting ">= 0") in tests/test_openai_agents.py and change
it to use "> 0" so the second call is enforced as a cache read per the test
docstring; update any variable names involved (e.g., cache_hit_count or similar)
accordingly and run the test to confirm it now fails when there is no cache hit.
- Around line 544-545: The VCR configuration in conftest.py only filters
["authorization", "api-key"] and misses CloudFlare cookies, which are present in
cassettes like test_reasoning_output_tokens_recorded.yaml; update the VCR
config's filter_headers list to include "cookie" (i.e., modify the
filter_headers variable/setting in the VCR setup in conftest.py to add "cookie")
and then re-run the test suite to re-record the affected cassettes so
session/tracking cookies are scrubbed from the YAML files.

In
`@packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py`:
- Line 82: Add a deprecated alias for the renamed constant so existing imports
still work: in opentelemetry.semconv_ai.__init__.py (next to
GEN_AI_USAGE_REASONING_OUTPUT_TOKENS) define the old symbol
SpanAttributes.GEN_AI_USAGE_REASONING_TOKENS (or simply
GEN_AI_USAGE_REASONING_TOKENS if constants are module-level) and assign it the
same string value as GEN_AI_USAGE_REASONING_OUTPUT_TOKENS, include a brief
comment noting it is deprecated and optionally emit a DeprecationWarning when
accessed to guide users to the new constant name.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 21e8e29b-0fa9-4aff-8aaf-b153db91fba1

📥 Commits

Reviewing files that changed from the base of the PR and between 6d3e696 and 5306120.

⛔ Files ignored due to path filters (28)
  • packages/opentelemetry-instrumentation-agno/uv.lock is excluded by !**/*.lock
  • packages/opentelemetry-instrumentation-anthropic/uv.lock is excluded by !**/*.lock
  • packages/opentelemetry-instrumentation-bedrock/uv.lock is excluded by !**/*.lock
  • packages/opentelemetry-instrumentation-chromadb/uv.lock is excluded by !**/*.lock
  • packages/opentelemetry-instrumentation-cohere/uv.lock is excluded by !**/*.lock
  • packages/opentelemetry-instrumentation-crewai/uv.lock is excluded by !**/*.lock
  • packages/opentelemetry-instrumentation-groq/uv.lock is excluded by !**/*.lock
  • packages/opentelemetry-instrumentation-haystack/uv.lock is excluded by !**/*.lock
  • packages/opentelemetry-instrumentation-lancedb/uv.lock is excluded by !**/*.lock
  • packages/opentelemetry-instrumentation-marqo/uv.lock is excluded by !**/*.lock
  • packages/opentelemetry-instrumentation-mcp/uv.lock is excluded by !**/*.lock
  • packages/opentelemetry-instrumentation-milvus/uv.lock is excluded by !**/*.lock
  • packages/opentelemetry-instrumentation-mistralai/uv.lock is excluded by !**/*.lock
  • packages/opentelemetry-instrumentation-ollama/uv.lock is excluded by !**/*.lock
  • packages/opentelemetry-instrumentation-openai-agents/uv.lock is excluded by !**/*.lock
  • packages/opentelemetry-instrumentation-openai/uv.lock is excluded by !**/*.lock
  • packages/opentelemetry-instrumentation-pinecone/uv.lock is excluded by !**/*.lock
  • packages/opentelemetry-instrumentation-qdrant/uv.lock is excluded by !**/*.lock
  • packages/opentelemetry-instrumentation-replicate/uv.lock is excluded by !**/*.lock
  • packages/opentelemetry-instrumentation-sagemaker/uv.lock is excluded by !**/*.lock
  • packages/opentelemetry-instrumentation-together/uv.lock is excluded by !**/*.lock
  • packages/opentelemetry-instrumentation-transformers/uv.lock is excluded by !**/*.lock
  • packages/opentelemetry-instrumentation-vertexai/uv.lock is excluded by !**/*.lock
  • packages/opentelemetry-instrumentation-voyageai/uv.lock is excluded by !**/*.lock
  • packages/opentelemetry-instrumentation-watsonx/uv.lock is excluded by !**/*.lock
  • packages/opentelemetry-instrumentation-writer/uv.lock is excluded by !**/*.lock
  • packages/sample-app/uv.lock is excluded by !**/*.lock
  • packages/traceloop-sdk/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (13)
  • packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py
  • packages/opentelemetry-instrumentation-openai-agents/pyproject.toml
  • packages/opentelemetry-instrumentation-openai-agents/tests/cassettes/test_openai_agents/test_cache_read_input_tokens_recorded.yaml
  • packages/opentelemetry-instrumentation-openai-agents/tests/cassettes/test_openai_agents/test_reasoning_output_tokens_recorded.yaml
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_azure.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py
  • packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py
  • packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/_testing.py
  • packages/traceloop-sdk/tests/test_manual.py
  • packages/traceloop-sdk/traceloop/sdk/tracing/manual.py

Comment on lines +90 to +93
set-cookie:
- __cf_bm=GqvMTTwxqvHlYO_zMGr15vhngiHBKowc4W9cytloB1U-1778513729.5012145-1.0.1.1-TuWmsWma1J3OWFenDizlhbf_5qCYt89CjxNLLDDm.zLzVxhm2ep.2mEgG6rExy5xHOM0Quwrvd09cIn.yYTv.TO2jCa5bit4sQJ6ZIeHevWLiejQyIXZYqEUyXQItPzY;
HttpOnly; SameSite=None; Secure; Path=/; Domain=api.openai.com; Expires=Mon,
11 May 2026 16:05:32 GMT


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Remove cookie material from test cassette.

This cassette stores request Cookie and response set-cookie values. Please redact/filter these headers and re-record to avoid leaking sensitive session artifacts and to improve replay stability.

Also applies to: 165-168, 190-191


Comment on lines +16 to +17
Cookie:
- __cf_bm=GqvMTTwxqvHlYO_zMGr15vhngiHBKowc4W9cytloB1U-1778513729.5012145-1.0.1.1-TuWmsWma1J3OWFenDizlhbf_5qCYt89CjxNLLDDm.zLzVxhm2ep.2mEgG6rExy5xHOM0Quwrvd09cIn.yYTv.TO2jCa5bit4sQJ6ZIeHevWLiejQyIXZYqEUyXQItPzY


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Scrub sensitive/volatile headers from cassette recordings.

Cookie and request/infra identifiers should not be committed in cassettes; they are sensitive/ephemeral and make fixtures less stable. Please filter these headers during VCR recording and regenerate this cassette.

Also applies to: 60-64, 104-105


Comment on lines +544 to +545
@pytest.mark.vcr
def test_cache_read_input_tokens_recorded(exporter, test_agent):


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Locate the new cassettes referenced by these tests
fd -a 'test_(cache_read_input_tokens_recorded|reasoning_output_tokens_recorded)\.yaml' \
  packages/opentelemetry-instrumentation-openai-agents/tests/cassettes

# Scan cassettes for common secret/PII indicators (review any hits; placeholders/masked values are acceptable)
rg -n --iglob '*.yaml' -e 'Authorization|Bearer\s+[A-Za-z0-9\-_\.]+|api[_-]?key|sk-[A-Za-z0-9]|@|email' \
  packages/opentelemetry-instrumentation-openai-agents/tests/cassettes

Repository: traceloop/openllmetry

Length of output: 365


🏁 Script executed:

#!/bin/bash
# Check the cassette files for any sensitive content
cd packages/opentelemetry-instrumentation-openai-agents/tests/cassettes/test_openai_agents

# Show file sizes and examine key sections of the cassettes
for cassette in test_cache_read_input_tokens_recorded.yaml test_reasoning_output_tokens_recorded.yaml; do
  if [ -f "$cassette" ]; then
    echo "=== $cassette ==="
    wc -l "$cassette"
    echo "--- First 50 lines ---"
    head -50 "$cassette"
    echo "--- Looking for potential secrets (detailed check) ---"
    rg -n 'sk_|Bearer|Authorization|api.?key|token|secret|password' "$cassette" || echo "No matches found"
  fi
done

Repository: traceloop/openllmetry

Length of output: 6564


🏁 Script executed:

#!/bin/bash
# Check the test file to see if VCR filters are configured
cd packages/opentelemetry-instrumentation-openai-agents/tests

# Search for VCR decorator and filter configuration around the test methods
rg -B5 -A10 'def test_(cache_read_input_tokens_recorded|reasoning_output_tokens_recorded)' test_openai_agents.py

Repository: traceloop/openllmetry

Length of output: 1274


🏁 Script executed:

#!/bin/bash
# Look for VCR configuration in conftest files
fd conftest.py packages/opentelemetry-instrumentation-openai-agents/tests | head -5

# Check the conftest for VCR filter setup
if [ -f packages/opentelemetry-instrumentation-openai-agents/tests/conftest.py ]; then
  echo "=== conftest.py content ==="
  cat -n packages/opentelemetry-instrumentation-openai-agents/tests/conftest.py
fi

Repository: traceloop/openllmetry

Length of output: 11337


🏁 Script executed:

#!/bin/bash
# Also check for pytest.ini or setup.cfg with VCR config
find packages/opentelemetry-instrumentation-openai-agents/tests -maxdepth 1 \( -name "pytest.ini" -o -name "setup.cfg" -o -name "pyproject.toml" \) -exec echo "=== {} ===" \; -exec cat {} \;

Repository: traceloop/openllmetry

Length of output: 47


VCR cassettes need improved filtering configuration and re-recording.

The conftest.py VCR config (lines 284–286) only filters ["authorization", "api-key"] headers, but the cassettes contain unfiltered CloudFlare bot management cookies (e.g., __cf_bm=... in test_reasoning_output_tokens_recorded.yaml). Add "cookie" to the filter_headers list and re-record the cassettes to ensure all session/tracking data is properly scrubbed.
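
What adding "cookie" to filter_headers accomplishes can be shown with a small, dependency-free sketch of the scrubbing VCR performs before writing a cassette. This is illustrative only — it is not the project's conftest.py, and the header names and interaction shape are assumptions:

```python
# Hypothetical scrubber mirroring VCR's filter_headers behavior: drop
# sensitive headers (case-insensitively) before an interaction is persisted.
FILTERED_HEADERS = {"authorization", "api-key", "cookie", "set-cookie"}


def scrub_headers(headers):
    """Return a copy of headers with sensitive keys removed."""
    return {k: v for k, v in headers.items() if k.lower() not in FILTERED_HEADERS}


recorded = {
    "Content-Type": "application/json",
    "Cookie": "__cf_bm=secret-session-token",
    "Authorization": "Bearer sk-...",
}
print(scrub_headers(recorded))
# → {'Content-Type': 'application/json'}
```

After widening the filter list in the real VCR config, the affected cassettes still need to be re-recorded, since filtering only applies at capture time.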


Comment on lines +546 to +547
"""Cache read tokens should be > 0 on the second call — OpenAI writes to cache
on the first call and serves from it on the second."""


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Cache-hit assertion is weaker than the test’s stated intent.

Line 559 uses >= 0, but the docstring (Lines 546-547) explicitly says the second call should be a cache read (> 0). This can let a non-cache-hit pass.

Proposed tightening
-    assert response_span.attributes[GenAIAttributes.GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS] >= 0
+    assert response_span.attributes[GenAIAttributes.GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS] > 0

Also applies to: 559-559


GEN_AI_CONTENT_COMPLETION_CHUNK = "gen_ai.content.completion.chunk"
GEN_AI_REQUEST_REASONING_EFFORT = "gen_ai.request.reasoning_effort"
GEN_AI_USAGE_REASONING_TOKENS = "gen_ai.usage.reasoning_tokens"
GEN_AI_USAGE_REASONING_OUTPUT_TOKENS = "gen_ai.usage.reasoning.output_tokens"


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Preserve backward compatibility for the renamed reasoning constant.

Dropping SpanAttributes.GEN_AI_USAGE_REASONING_TOKENS can break consumers importing that symbol. Keep a deprecated alias pointing to the new key to avoid runtime import breaks.

Proposed compatibility patch
 class SpanAttributes:
@@
     GEN_AI_USAGE_REASONING_OUTPUT_TOKENS = "gen_ai.usage.reasoning.output_tokens"
+    # Backward-compatible alias (deprecated): remove in next major version
+    GEN_AI_USAGE_REASONING_TOKENS = GEN_AI_USAGE_REASONING_OUTPUT_TOKENS
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-GEN_AI_USAGE_REASONING_OUTPUT_TOKENS = "gen_ai.usage.reasoning.output_tokens"
+GEN_AI_USAGE_REASONING_OUTPUT_TOKENS = "gen_ai.usage.reasoning.output_tokens"
+# Backward-compatible alias (deprecated): remove in next major version
+GEN_AI_USAGE_REASONING_TOKENS = GEN_AI_USAGE_REASONING_OUTPUT_TOKENS
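
The "emit a DeprecationWarning when accessed" variant can be done with a metaclass `__getattr__`, which only fires when normal class-attribute lookup fails — i.e., exactly for the removed old name. A sketch, not the repo's actual code (the class and constant names mirror the PR, the metaclass is hypothetical):

```python
# Sketch: serve the old constant name via the metaclass, warning on access.
import warnings


class _SpanAttributesMeta(type):
    _DEPRECATED = {
        "GEN_AI_USAGE_REASONING_TOKENS": "GEN_AI_USAGE_REASONING_OUTPUT_TOKENS",
    }

    def __getattr__(cls, name):
        # Only reached when normal lookup fails, i.e. for removed old names.
        new_name = cls._DEPRECATED.get(name)
        if new_name is not None:
            warnings.warn(
                f"SpanAttributes.{name} is deprecated; use SpanAttributes.{new_name}",
                DeprecationWarning,
                stacklevel=2,
            )
            return getattr(cls, new_name)
        raise AttributeError(name)


class SpanAttributes(metaclass=_SpanAttributesMeta):
    GEN_AI_USAGE_REASONING_OUTPUT_TOKENS = "gen_ai.usage.reasoning.output_tokens"


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    value = SpanAttributes.GEN_AI_USAGE_REASONING_TOKENS
print(value)        # → gen_ai.usage.reasoning.output_tokens
print(len(caught))  # → 1
```

A plain `GEN_AI_USAGE_REASONING_TOKENS = GEN_AI_USAGE_REASONING_OUTPUT_TOKENS` alias is simpler if the warning isn't needed.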

@doronkopit5
Member

Reviewed the change — direction is right (aligning attribute names with the OTel GenAI spec). A few things worth addressing:

1. Two sources of truth for the cache constants

The same constants exist in both places, resolving to the same string today:

  • opentelemetry.semconv._incubating.attributes.gen_ai_attributes.GEN_AI_USAGE_CACHE_*_INPUT_TOKENS (upstream OTel)
  • opentelemetry.semconv_ai.SpanAttributes.GEN_AI_USAGE_CACHE_*_INPUT_TOKENS (this repo)

The PR mixes both: _hooks.py and manual.py use GenAIAttributes (upstream), while pre-existing code in chat_wrappers.py, responses_wrappers.py, anthropic, langchain uses SpanAttributes from semconv_ai. Both work, but it's nicer to have one source of truth.

I'd lean toward standardizing on the upstream OTel-native GenAIAttributes here — that's the spec source — but happy to leave the call to you. Whatever you pick, please apply it consistently across the touched files so callers don't have to know which path leads where.

2. traceloop-sdk needs an explicit upstream semconv floor

manual.py now uses GenAIAttributes.GEN_AI_USAGE_CACHE_*_INPUT_TOKENS, but packages/traceloop-sdk/pyproject.toml doesn't pin opentelemetry-semantic-conventions. With <0.62b1 installed transitively, that attribute either doesn't exist (→ AttributeError) or has the old underscore string (silently wrong telemetry).

Please add opentelemetry-semantic-conventions>=0.62b1 to packages/traceloop-sdk/pyproject.toml (same bump you did for openai-agents).
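
Until that floor lands, a defensive pattern (illustrative only — not what the SDK does) is to resolve the attribute key with a spec-string fallback, so an older opentelemetry-semantic-conventions neither raises AttributeError nor silently emits the wrong key:

```python
# Hypothetical guard: resolve the cache-read attribute key whether or not the
# installed semconv version exports the constant. GenAIAttributes here is a
# stand-in simulating an older release without the constant; in real code it
# would be opentelemetry.semconv._incubating.attributes.gen_ai_attributes.
class GenAIAttributes:
    pass  # older release: constant absent


CACHE_READ_KEY = getattr(
    GenAIAttributes,
    "GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS",
    "gen_ai.usage.cache_read.input_tokens",  # spec-correct fallback string
)
print(CACHE_READ_KEY)  # → gen_ai.usage.cache_read.input_tokens
```

Pinning the floor in pyproject.toml is still the right fix; the fallback only papers over unpinned transitive installs.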

3. Removing GEN_AI_USAGE_REASONING_TOKENS is a breaking change with no alias

Deleting the class attribute from SpanAttributes will break any downstream code importing SpanAttributes.GEN_AI_USAGE_REASONING_TOKENS with AttributeError. The repo already has prior art for graceful renames (see the "Legacy LLM_* aliases" block in semconv_ai/__init__.py). Suggest keeping a deprecation alias:

GEN_AI_USAGE_REASONING_OUTPUT_TOKENS = "gen_ai.usage.reasoning.output_tokens"
# Deprecated alias; use GEN_AI_USAGE_REASONING_OUTPUT_TOKENS
GEN_AI_USAGE_REASONING_TOKENS = GEN_AI_USAGE_REASONING_OUTPUT_TOKENS

Also note that the string value changes too (gen_ai.usage.reasoning_tokens → gen_ai.usage.reasoning.output_tokens), which is a wire-level break for dashboards. Worth a line in MIGRATION.md under the "Changed string values" section.

4. New test assertions don't actually verify the feature

assert response_span.attributes[GenAIAttributes.GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS] >= 0
assert response_span.attributes[SpanAttributes.GEN_AI_USAGE_REASONING_OUTPUT_TOKENS] >= 0

Both >= 0 assertions are trivially true once the attribute is recorded. The cache test is intentionally structured (first call writes cache → clear() → second call reads) precisely to exercise caching — so it should assert > 0 to prove the second call actually got a cache hit. Same for o3-mini reasoning tokens, which per the docstring always reports > 0. Tightening to > 0 makes these tests actually catch regressions.



Development

Successfully merging this pull request may close these issues.

🐛 Bug Report: openai-agents instrumentation doesn't record cached input tokens

3 participants