
feat(openai): otel new messages attributes#3804

Open
galzilber wants to merge 7 commits into main from gz/otel-new-messages-attributes

Conversation

@galzilber
Contributor

@galzilber galzilber commented Mar 15, 2026

  • I have added tests that cover my changes.
  • If adding a new instrumentation or changing an existing one, I've added screenshots from some observability platform showing the change.
  • PR name follows conventional commits format: feat(instrumentation): ... or fix(instrumentation): ....
  • (If applicable) I have updated the documentation accordingly.

Summary by CodeRabbit

Release Notes

  • New Features
    • Added configuration option to enable structured message attributes for enhanced AI interaction tracking
    • Support for detailed input and output messages with comprehensive message parts including text, tool calls, file references, reasoning, and more
    • New telemetry attributes available for structured message-based data collection
    • Full backward compatibility maintained with legacy attribute behavior

@coderabbitai

coderabbitai bot commented Mar 15, 2026

📝 Walkthrough


This pull request adds support for structured message attributes in OpenAI instrumentation. A new configuration option use_messages_attributes enables recording chat and completion messages as gen_ai.input.messages and gen_ai.output.messages span attributes, with comprehensive TypedDict schemas defining message parts and structures. Legacy attribute behavior is preserved for backward compatibility.

Changes

  • Configuration & Initialization (shared/config.py, __init__.py): Added the use_messages_attributes config parameter and wired it through the OpenAIInstrumentor constructor.
  • Message Attribute Handlers (shared/chat_wrappers.py, shared/completion_wrappers.py): Introduced config-driven routing to the new _set_input_messages() and _set_output_messages() functions, which transform messages into structured formats, while preserving legacy behavior via _legacy_set_prompts() and _legacy_set_completions().
  • Event Models (shared/event_models.py): Renamed the _FunctionToolCall.function_name field to name to align with OpenAI API conventions.
  • Response Processing (v1/responses_wrappers.py): Refactored process_content_block() to use the dynamic block type from the input instead of hardcoded type values for text, image, and file blocks.
  • Semantic Conventions (../opentelemetry-semantic-conventions-ai/semconv_ai/__init__.py, ../opentelemetry-semantic-conventions-ai/semconv_ai/schemas.py): Added the GEN_AI_INPUT_MESSAGES and GEN_AI_OUTPUT_MESSAGES span attributes, updated reasoning-related attribute names, and introduced comprehensive TypedDict schemas for message parts (TextPart, ToolCallRequestPart, ToolCallResponsePart, BlobPart, FilePart, UriPart, ReasoningPart, GenericPart) and composite message structures.
  • Testing (tests/conftest.py, tests/traces/test_chat.py, tests/traces/test_completions.py, tests/traces/cassettes/test_chat/test_chat_with_messages_attributes.yaml, tests/traces/cassettes/test_completions/test_completion_with_messages_attributes.yaml): Added the instrument_with_messages_attributes fixture, test functions validating structured message input/output shapes, and HTTP interaction cassettes for deterministic test replay.
  • Dependencies (pyproject.toml): Updated opentelemetry-semantic-conventions to >=0.60b1 and added a local editable source for opentelemetry-semantic-conventions-ai.
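The transformation the new handlers perform is not shown in this summary, but it can be sketched roughly as follows. This is an illustrative stand-in, not the actual code in shared/chat_wrappers.py: the function name, part shapes, and attribute key mirror the summary above, and the real implementation may differ in detail.

```python
import json

# Attribute key named in the PR summary; value taken from the semantic conventions.
GEN_AI_INPUT_MESSAGES = "gen_ai.input.messages"

def set_input_messages(span_attributes: dict, messages: list) -> None:
    """Sketch: convert OpenAI-style chat messages into structured message parts."""
    structured = []
    for msg in messages:
        content = msg.get("content")
        if isinstance(content, str):
            # Plain string content becomes a single text part.
            parts = [{"type": "text", "content": content}]
        elif isinstance(content, list):
            # Multi-part content is assumed to already be part-shaped.
            parts = list(content)
        else:
            parts = []
        structured.append({"role": msg.get("role"), "parts": parts})
    # Span attribute values must be primitives, so serialize to JSON.
    span_attributes[GEN_AI_INPUT_MESSAGES] = json.dumps(structured)

attrs = {}
set_input_messages(attrs, [{"role": "user", "content": "Hello"}])
```

The single JSON attribute replaces the legacy per-message gen_ai.prompt.{i}.* attributes when the flag is enabled.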

Sequence Diagram

sequenceDiagram
    participant Client
    participant Config
    participant Instrumentation
    participant OpenAI API
    participant Span

    Client->>Instrumentation: Initialize with use_messages_attributes=True
    Instrumentation->>Config: Set use_messages_attributes
    Client->>OpenAI API: Call chat/completion
    OpenAI API-->>Instrumentation: Return response
    Instrumentation->>Config: Check use_messages_attributes
    
    alt use_messages_attributes enabled
        Instrumentation->>Instrumentation: _set_input_messages()
        Instrumentation->>Span: Set gen_ai.input.messages (structured)
        Instrumentation->>Instrumentation: _set_output_messages()
        Instrumentation->>Span: Set gen_ai.output.messages (structured)
    else use_messages_attributes disabled
        Instrumentation->>Instrumentation: _legacy_set_prompts()
        Instrumentation->>Span: Set individual prompt attributes
        Instrumentation->>Instrumentation: _legacy_set_completions()
        Instrumentation->>Span: Set individual completion attributes
    end
    
    Span-->>Client: Trace with attributes
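The alt branch in the diagram boils down to a single config check. A minimal sketch, with all names as illustrative stand-ins for the real wrapper internals (the span is modeled as a plain dict here):

```python
import json

class Config:
    # Mirrors the new use_messages_attributes option; defaults to legacy behavior.
    use_messages_attributes = False

def emit_attributes(span: dict, prompts: list, completions: list, config: Config) -> None:
    """Sketch of the config-driven routing between structured and legacy attributes."""
    if config.use_messages_attributes:
        # Structured path: one JSON attribute per direction.
        span["gen_ai.input.messages"] = json.dumps(prompts)      # _set_input_messages()
        span["gen_ai.output.messages"] = json.dumps(completions) # _set_output_messages()
    else:
        # Legacy path: one attribute per prompt/completion.
        for i, p in enumerate(prompts):                          # _legacy_set_prompts()
            span[f"gen_ai.prompt.{i}.content"] = p
        for i, c in enumerate(completions):                      # _legacy_set_completions()
            span[f"gen_ai.completion.{i}.content"] = c
```

With the flag off, existing consumers see the same attributes as before; with it on, they get the two structured attributes instead.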

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 Hop! Hop! Messages structured bright,
From config flags to attributes so right,
TypedDicts dance with parts galore,
Tool calls and text forevermore,
Legacy paths still holding tight! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 0.00%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title clearly summarizes the main change, introducing new message attributes for the OpenAI OpenTelemetry instrumentation, which aligns with the extensive changes across chat/completion wrappers, config, and schema files.


@galzilber galzilber changed the title Gz/otel new messages attributes feat(openai): otel new messages attributes Mar 15, 2026

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py (1)

1534-1553: ⚠️ Potential issue | 🔴 Critical

Fix semantic conventions inconsistency: attribute defined with wrong prefix.

The test fails because it expects "gen_ai.request.reasoning_effort" but the instrumentation sets "llm.request.reasoning_effort". The semantic conventions file (line 107) defines LLM_REQUEST_REASONING_EFFORT = "llm.request.reasoning_effort", which is inconsistent with the naming pattern used elsewhere:

  • Request attributes: gen_ai.request.reasoning_summary, gen_ai.request.model, gen_ai.request.temperature
  • Response attributes: gen_ai.response.reasoning_effort

The definition on line 107 should be LLM_REQUEST_REASONING_EFFORT = "gen_ai.request.reasoning_effort" to match the consistent pattern. This single fix will resolve the KeyError in the test without modifying test expectations.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py`
around lines 1534 - 1553, Update the semantic convention constant
LLM_REQUEST_REASONING_EFFORT so its value uses the gen_ai.request prefix instead
of llm.request; specifically change the constant (currently defined as
"llm.request.reasoning_effort") to "gen_ai.request.reasoning_effort" in the
semantic conventions module so the instrumentation (which reads
LLM_REQUEST_REASONING_EFFORT) matches the other request attributes and the test
expecting "gen_ai.request.reasoning_effort" passes.
🧹 Nitpick comments (3)
packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py (1)

83-88: Remove or uncomment the dead code.

The commented-out assertion for LLM_OPENAI_RESPONSE_SYSTEM_FINGERPRINT should either be removed if it's not applicable to this test, or uncommented if it should be validated.

Suggested fix
-    # assert (
-    #     open_ai_span.attributes.get(
-    #         SpanAttributes.LLM_OPENAI_RESPONSE_SYSTEM_FINGERPRINT
-    #     )
-    #     == "fp_2b778c6b35"
-    # )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py`
around lines 83 - 88, The commented assertion for
SpanAttributes.LLM_OPENAI_RESPONSE_SYSTEM_FINGERPRINT in the test using
open_ai_span is dead code—either remove these commented lines entirely or
restore the assertion by uncommenting and ensuring the expected fingerprint
value ("fp_2b778c6b35") is correct; update the test to assert
open_ai_span.attributes.get(SpanAttributes.LLM_OPENAI_RESPONSE_SYSTEM_FINGERPRINT)
== "fp_2b778c6b35" if the fingerprint should be validated, otherwise delete the
commented block to keep the test clean.
packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_completions/test_completion_with_messages_attributes.yaml (1)

79-90: Consider scrubbing sensitive identifiers from cassette.

The cassette contains potentially sensitive information that should be filtered:

  • openai-organization: Contains account-specific identifier
  • openai-project: Contains project identifier
  • set-cookie: Contains session token

As per coding guidelines: "Ensure VCR cassettes never contain secrets or PII; scrub them using VCR filters."

Example VCR filter configuration
@pytest.fixture(scope="module")
def vcr_config():
    return {
        "filter_headers": [
            ("openai-organization", "REDACTED"),
            ("openai-project", "REDACTED"),
            ("set-cookie", "REDACTED"),
        ],
    }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_completions/test_completion_with_messages_attributes.yaml`
around lines 79 - 90, The cassette contains sensitive headers
(openai-organization, openai-project, set-cookie); add a pytest VCR filter
fixture (vcr_config) that filters these headers to "REDACTED" and then re-record
or regenerate the cassette to remove the real values; look for the test fixture
named vcr_config or add one in the tests suite used by the traces cassettes and
ensure it includes filter_headers for ("openai-organization","REDACTED"),
("openai-project","REDACTED"), and ("set-cookie","REDACTED") so the cassette
file test_completions/test_completion_with_messages_attributes.yaml no longer
contains secrets.
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/completion_wrappers.py (1)

198-213: Defaulting finish_reason to "stop" may mask actual API values.

Line 207 defaults finish_reason to "stop" when choice.get("finish_reason") is falsy. However, the API may return "length", "content_filter", or other values. Using or "stop" could mask None scenarios but also changes the semantic meaning when the actual value should be preserved.

Consider using a more explicit sentinel or preserving the original value:

♻️ Alternative approach
-            "finish_reason": choice.get("finish_reason") or "stop",
+            "finish_reason": choice.get("finish_reason", "unknown"),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/completion_wrappers.py`
around lines 198 - 213, The code in _set_output_messages currently forces
finish_reason to "stop" via choice.get("finish_reason") or "stop", which can
mask real API values; change this to preserve the actual value or explicitly use
a sentinel (e.g., None or "unknown") when the key is absent or null. Update the
logic in _set_output_messages to read finish_reason using
choice.get("finish_reason", None) (or an explicit existence check) and include
that value in the messages payload so real values like "length" or
"content_filter" are retained instead of being overwritten with "stop". Ensure
SpanAttributes.GEN_AI_OUTPUT_MESSAGES still receives JSON-serializable data
(None will become null) and adjust any downstream consumers if they relied on
the old "stop" default.
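The difference between the two behaviors can be seen with plain dicts. This is an illustration of the review point only, not code from the PR:

```python
def finish_reason_masked(choice: dict):
    """Current behavior: falsy values (None, missing key) are coerced to "stop"."""
    return choice.get("finish_reason") or "stop"

def finish_reason_preserved(choice: dict):
    """Suggested behavior: absence/None is preserved (serializes to JSON null)."""
    return choice.get("finish_reason")
```

Real values such as "length" pass through either way; the two differ only when the API omits or nulls the field, which is exactly the case the review argues should stay visible.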
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py`:
- Around line 552-574: In _set_output_messages, the code incorrectly reads
finish_reason from message; change it to read the finish reason from the choice
object (use choice.get("finish_reason") with the existing fallback to "stop") so
the assistant message entries use the real finish_reason from the choice; update
the construction of the messages.append(...) entry in _set_output_messages to
reference choice.get("finish_reason") instead of message.get("finish_reason").
- Around line 561-568: The loop treats each tool_call as an object but
_parse_tool_calls returns dict/TypedDicts; change attribute access to dict
access: read name via tool_call.get("function", {}).get("name"), id via
tool_call.get("id"), and arguments via tool_call.get("function",
{}).get("arguments") before appending to parts so missing keys are handled
safely; update the for-loop that builds parts (references: _parse_tool_calls,
tool_calls, parts) accordingly.
- Around line 480-487: In _set_input_messages, tool_call items returned from
_parse_tool_calls are dicts, but the code uses attribute access
(tool_call.function.get("name") and tool_call.id); change these to dict-style
lookups and guard against missing keys — e.g., extract function =
tool_call.get("function", {}) and use function.get("name") and
function.get("arguments"), and use tool_call.get("id") for the id when building
the parts entry so we don't raise attribute errors.

In
`@packages/opentelemetry-instrumentation-openai/tests/traces/test_completions.py`:
- Around line 49-84: The test test_completion_with_messages_attributes is
asserting the wrong finish_reason for the recorded cassette; inspect the
output_messages extracted from open_ai_span.attributes in that test and update
the assertion on output_messages[0]["finish_reason"] (currently asserting
"stop") to assert "length" to match the cassette/API response.

In
`@packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/schemas.py`:
- Around line 49-51: The classes like FilePart(FilePartBase, dict) mix TypedDict
with dict inheritance which violates PEP 589; remove TypedDict from the runtime
inheritance chain and split type definitions from runtime implementations: keep
the TypedDict variants (e.g., FilePartBase/FilePart) only as typing constructs
and create separate runtime dict subclasses or use Mapping/Protocol for behavior
(e.g., introduce FilePartRuntime or use dict subclass for runtime behavior and
leave FilePart as TypedDict), then update all similar patterns in this module
(the classes at the FilePart declaration and the other occurrences around the
blocks at lines referenced: the classes following the same pattern at the
positions after FilePart) so static typing and runtime behavior are separated
consistently. Ensure imports/annotations reference the TypedDict names for
typing and the new runtime classes (or Mapping/Protocol) where actual dict-like
instances are created or manipulated.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 61945727-297b-446a-9462-8a7937765f3d

📥 Commits

Reviewing files that changed from the base of the PR and between 93786d9 and 34b0f9c.

⛔ Files ignored due to path filters (1)
  • packages/opentelemetry-instrumentation-openai/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (14)
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/__init__.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/completion_wrappers.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/config.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/event_models.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py
  • packages/opentelemetry-instrumentation-openai/pyproject.toml
  • packages/opentelemetry-instrumentation-openai/tests/conftest.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_chat/test_chat_with_messages_attributes.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_completions/test_completion_with_messages_attributes.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_completions.py
  • packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py
  • packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/schemas.py

Comment on lines +480 to +487
    tool_calls = _parse_tool_calls(msg.get("tool_calls")) or []
    for tool_call in tool_calls:
        parts.append({
            "type": "tool_call_request",
            "name": tool_call.function.get("name"),
            "id": tool_call.id,
            "arguments": tool_call.function.get("arguments"),
        })

⚠️ Potential issue | 🟠 Major

Same dict access issue in _set_input_messages.

Lines 484-486 access tool_call.function.get("name") and tool_call.id using attribute access on what should be dict-like objects from _parse_tool_calls.

🐛 Proposed fix using dict access
             for tool_call in tool_calls:
                 parts.append({
                     "type": "tool_call_request",
-                    "name": tool_call.function.get("name"),
-                    "id": tool_call.id,
-                    "arguments": tool_call.function.get("arguments"),
+                    "name": tool_call.get("function", {}).get("name"),
+                    "id": tool_call.get("id"),
+                    "arguments": tool_call.get("function", {}).get("arguments"),
                 })
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
Before:

    tool_calls = _parse_tool_calls(msg.get("tool_calls")) or []
    for tool_call in tool_calls:
        parts.append({
            "type": "tool_call_request",
            "name": tool_call.function.get("name"),
            "id": tool_call.id,
            "arguments": tool_call.function.get("arguments"),
        })

After:

    tool_calls = _parse_tool_calls(msg.get("tool_calls")) or []
    for tool_call in tool_calls:
        parts.append({
            "type": "tool_call_request",
            "name": tool_call.get("function", {}).get("name"),
            "id": tool_call.get("id"),
            "arguments": tool_call.get("function", {}).get("arguments"),
        })
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py`
around lines 480 - 487, In _set_input_messages, tool_call items returned from
_parse_tool_calls are dicts, but the code uses attribute access
(tool_call.function.get("name") and tool_call.id); change these to dict-style
lookups and guard against missing keys — e.g., extract function =
tool_call.get("function", {}) and use function.get("name") and
function.get("arguments"), and use tool_call.get("id") for the id when building
the parts entry so we don't raise attribute errors.
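The underlying issue is that TypedDict values are plain dicts at runtime, so attribute access raises AttributeError while .get() access degrades gracefully. A quick self-contained demonstration (ToolCall here is a stand-in for the real type in event_models.py, with hypothetical field values):

```python
from typing import TypedDict

class _Function(TypedDict):
    name: str
    arguments: str

class ToolCall(TypedDict):
    id: str
    function: _Function

tc: ToolCall = {"id": "call_1", "function": {"name": "get_weather", "arguments": "{}"}}

# At runtime a TypedDict value is just a dict, so attribute access fails:
try:
    tc.function  # type: ignore[attr-defined]
    raised = False
except AttributeError:
    raised = True

# Dict-style access with .get() is the safe pattern the review suggests;
# chaining .get("function", {}) also tolerates a missing "function" key.
name = tc.get("function", {}).get("name")
```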

Comment on lines +561 to +568
    tool_calls = _parse_tool_calls(message.get("tool_calls")) or []
    for tool_call in tool_calls:
        parts.append({
            "type": "tool_call_request",
            "name": tool_call.function.get("name"),
            "id": tool_call.id,
            "arguments": tool_call.function.get("arguments"),
        })

⚠️ Potential issue | 🟠 Major

Accessing tool_call as object but _parse_tool_calls returns dicts.

Lines 565-567 access tool_call.function.get("name") and tool_call.id using attribute access, but _parse_tool_calls returns either dicts or ToolCall TypedDicts. TypedDicts are accessed like dicts, not objects.

🐛 Proposed fix using dict access
         for tool_call in tool_calls:
             parts.append({
                 "type": "tool_call_request",
-                "name": tool_call.function.get("name"),
-                "id": tool_call.id,
-                "arguments": tool_call.function.get("arguments"),
+                "name": tool_call.get("function", {}).get("name"),
+                "id": tool_call.get("id"),
+                "arguments": tool_call.get("function", {}).get("arguments"),
             })
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py`
around lines 561 - 568, The loop treats each tool_call as an object but
_parse_tool_calls returns dict/TypedDicts; change attribute access to dict
access: read name via tool_call.get("function", {}).get("name"), id via
tool_call.get("id"), and arguments via tool_call.get("function",
{}).get("arguments") before appending to parts so missing keys are handled
safely; update the for-loop that builds parts (references: _parse_tool_calls,
tool_calls, parts) accordingly.

Comment on lines +49 to +51
    class FilePart(FilePartBase, dict):
        # Allows for additional properties.
        pass

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

Python TypedDict multiple inheritance with dict class

💡 Result:

You can’t create a TypedDict that also inherits from dict (or a dict subclass). TypedDict “inheritance” is only for combining other TypedDicts (to reuse/extend key schemas), and a TypedDict value is just a plain dict at runtime—it does not define a new runtime type you can meaningfully mix into a dict class hierarchy. In fact, the TypedDict class-based form is specified as using TypedDict as the sole base class (aside from inheriting from other TypedDicts). [2], [1]

What to do instead (common patterns):

  • If a function just needs “read-only dict-like access”, accept Mapping[str, object] (all TypedDicts are compatible with that). [1], [3]
  • If you truly need a real dict subclass (custom methods/behavior), define a normal class X(dict[...]) for runtime behavior, and keep the schema typing separate (e.g., via a TypedDict annotation, a Protocol, or conversions/casts at the boundary). TypedDicts are generally not subtypes of dict[...] because dict allows unsafe mutations like arbitrary key insertion / clear(). [1], [3]

Sources
[1] Typing spec: “Typed dictionaries (TypedDict)” (multiple inheritance; subtyping vs dict/Mapping)
[2] PEP 589: TypedDict definition rules (TypedDict as sole base; not a runtime dict subtype/class)
[3] Mypy docs: TypedDict (runtime is normal dict; not compatible with dict[...], but compatible with Mapping[str, object])


🏁 Script executed:

cat -n packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/schemas.py | head -100

Repository: traceloop/openllmetry

Length of output: 4226


Remove TypedDict from the inheritance chain and separate schema definition from runtime behavior.

This file violates PEP 589 by inheriting from both TypedDict and dict. Classes like class FilePart(FilePartBase, dict) cannot combine TypedDict (a static typing construct) with dict inheritance. TypedDict must be the sole base class (aside from other TypedDicts).

Instead, use one of these patterns:

  • Define a regular dict subclass for runtime and use TypedDict only for type hints/annotations
  • Accept Mapping[str, object] for read-only dict-like access
  • Use a Protocol for structural typing if custom methods are needed

Apply this throughout the file where the pattern appears (lines 49, 62, 75, 92, etc.).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/schemas.py`
around lines 49 - 51, The classes like FilePart(FilePartBase, dict) mix
TypedDict with dict inheritance which violates PEP 589; remove TypedDict from
the runtime inheritance chain and split type definitions from runtime
implementations: keep the TypedDict variants (e.g., FilePartBase/FilePart) only
as typing constructs and create separate runtime dict subclasses or use
Mapping/Protocol for behavior (e.g., introduce FilePartRuntime or use dict
subclass for runtime behavior and leave FilePart as TypedDict), then update all
similar patterns in this module (the classes at the FilePart declaration and the
other occurrences around the blocks at lines referenced: the classes following
the same pattern at the positions after FilePart) so static typing and runtime
behavior are separated consistently. Ensure imports/annotations reference the
TypedDict names for typing and the new runtime classes (or Mapping/Protocol)
where actual dict-like instances are created or manipulated.
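The suggested separation can be sketched as follows. Field names (file_id, mime_type) are hypothetical, not taken from schemas.py; the point is the pattern: TypedDicts inherit only from other TypedDicts (per PEP 589), optional extras go in a total=False TypedDict rather than a dict subclass, and read-only consumers accept Mapping, which every TypedDict satisfies.

```python
from typing import Mapping, TypedDict

# Typing-only schema: a TypedDict may inherit only from other TypedDicts.
class FilePartBase(TypedDict):
    type: str
    file_id: str  # hypothetical field name for illustration

class FilePart(FilePartBase, total=False):
    # Optional extra keys live in a non-total TypedDict instead of `dict` inheritance.
    mime_type: str

def describe(part: Mapping[str, object]) -> str:
    # Read-only consumers can accept Mapping[str, object] for any TypedDict value.
    return f"{part.get('type')}:{part.get('file_id')}"

p: FilePart = {"type": "file", "file_id": "abc"}
```

At runtime p is an ordinary dict, so no dict subclassing is needed to get dict behavior; the TypedDict exists purely for static checking.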


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (1)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py (1)

456-456: Consider renaming id to avoid shadowing the built-in.

Using id as a variable name shadows Python's built-in id() function. While it won't cause issues here, it's a minor code smell.

♻️ Suggested rename
         if role == "tool":
-            id = msg.get("tool_call_id")
+            tool_call_id = msg.get("tool_call_id")
             content = msg.get("content")
             attr_messages.append({
                 "role": role,
-                "parts": [{"type": "tool_call_response", "id": id, "response": content}],
+                "parts": [{"type": "tool_call_response", "id": tool_call_id, "response": content}],
             })
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py`
at line 456, The local variable named id shadows Python's built-in id(); rename
it (for example to tool_call_id or tool_id) where it's assigned from
msg.get("tool_call_id") and update every subsequent use in the same scope (e.g.,
in the function/method in chat_wrappers.py that contains the line
msg.get("tool_call_id")) to the new name to avoid the shadowing; ensure any
references, f-strings, returns or dict assignments in that function are updated
accordingly and run tests/linting to confirm no remaining references to the old
name.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 50fec224-7e3f-4d9f-8a9b-9941305a88e2

📥 Commits

Reviewing files that changed from the base of the PR and between 34b0f9c and 709a572.

📒 Files selected for processing (4)
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_azure.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_completions.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_completions.py

Comment on lines +477 to +491
        elif role == "assistant":
            content = msg.get("content")
            parts = [{"content": content, "type": "text"}] if isinstance(content, str) else content
            tool_calls = _parse_tool_calls(msg.get("tool_calls")) or []
            for tool_call in tool_calls:
                parts.append({
                    "type": "tool_call_request",
                    "name": tool_call.function.get("name"),
                    "id": tool_call.id,
                    "arguments": tool_call.function.get("arguments"),
                })
            attr_messages.append({
                "role": "assistant",
                "parts": parts,
            })
⚠️ Potential issue | 🟠 Major

Bug: parts can be None when content is not a string, causing AttributeError on append.

When content is None (not a string), line 479 sets parts = content which is None. If tool_calls exist, line 482 calls parts.append() which will raise AttributeError: 'NoneType' object has no attribute 'append'.

🐛 Proposed fix to initialize parts safely
         elif role == "assistant":
             content = msg.get("content")
-            parts = [{"content": content, "type": "text"}] if isinstance(content, str) else content
+            parts = [{"content": content, "type": "text"}] if isinstance(content, str) else (content or [])
             tool_calls = _parse_tool_calls(msg.get("tool_calls")) or []
             for tool_call in tool_calls:
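The failure mode and the `or []` guard can be reproduced in isolation. This is a standalone sketch, not the instrumentation's actual code — the helper names and the minimal tool-call shape are illustrative:

```python
# Standalone sketch of the reviewed bug; function names are illustrative.
def build_parts_unsafe(content, tool_calls):
    # Mirrors the original line: parts is None when content is None
    parts = [{"content": content, "type": "text"}] if isinstance(content, str) else content
    for tc in tool_calls:
        # AttributeError here when parts is None
        parts.append({"type": "tool_call_request", "id": tc["id"]})
    return parts


def build_parts_safe(content, tool_calls):
    # `or []` guarantees a mutable list even when content is None
    parts = [{"content": content, "type": "text"}] if isinstance(content, str) else (content or [])
    for tc in tool_calls:
        parts.append({"type": "tool_call_request", "id": tc["id"]})
    return parts
```

Calling `build_parts_unsafe(None, [...])` raises `AttributeError: 'NoneType' object has no attribute 'append'`, while the safe variant returns a list containing only the tool-call parts.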
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py`
around lines 477 - 491, The code sets parts = content when content is not a
string which can be None and later calls parts.append; change the initialization
in the assistant branch so parts is always a list: if content is a string build
[{"content": content, "type": "text"}], else if content is already a list keep
it, otherwise initialize to an empty list; then proceed to extend with
tool_calls as done using tool_call and append into parts before adding the entry
to attr_messages. This ensures parts, msg, tool_call, and attr_messages logic in
the role == "assistant" branch never attempts to call append on None.

Comment on lines +556 to +574
    messages = []
    for choice in choices:
        message = choice.get("message")
        content = message.get("content")
        parts = [{"content": content, "type": "text"}] if isinstance(content, str) else content
        tool_calls = _parse_tool_calls(message.get("tool_calls")) or []
        for tool_call in tool_calls:
            parts.append({
                "type": "tool_call_request",
                "name": tool_call.function.get("name"),
                "id": tool_call.id,
                "arguments": tool_call.function.get("arguments"),
            })
        messages.append({
            "role": "assistant",
            "parts": parts,
            "finish_reason": choice.get("finish_reason") or "stop",
        })
    _set_span_attribute(span, SpanAttributes.GEN_AI_OUTPUT_MESSAGES, json.dumps(messages))
⚠️ Potential issue | 🟠 Major

Bug: parts can be None when content is not a string, causing AttributeError on append.

Same issue as in _set_input_messages. When content is None, line 560 sets parts = content which is None. If tool_calls exist, line 563 calls parts.append() which will fail.

🐛 Proposed fix to initialize parts safely
     for choice in choices:
         message = choice.get("message")
         content = message.get("content")
-        parts = [{"content": content, "type": "text"}] if isinstance(content, str) else content
+        parts = [{"content": content, "type": "text"}] if isinstance(content, str) else (content or [])
         tool_calls = _parse_tool_calls(message.get("tool_calls")) or []
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py`
around lines 556 - 574, The code sets parts = content when content is not a
string, which can be None and causes AttributeError on parts.append; change the
parts initialization in the choices loop to ensure parts is always a list: if
content is a string create the text dict as before, if content is a list use a
shallow copy (e.g., list(content)) to ensure mutability, otherwise set parts =
[] so subsequent calls to parts.append (when using _parse_tool_calls or adding
tool_call dicts) won't fail; keep the rest of the loop and the
_set_span_attribute call unchanged.

Comment on lines +64 to +116
@pytest.mark.vcr
def test_chat_with_messages_attributes(
    instrument_with_messages_attributes, span_exporter, log_exporter, openai_client
):
    openai_client.chat.completions.create(
        model="gpt-5-nano",
        messages=[{"role": "user", "content": "Tell me a joke about opentelemetry"}],
    )

    spans = span_exporter.get_finished_spans()

    assert [span.name for span in spans] == [
        "openai.chat",
    ]
    open_ai_span = spans[0]
    assert (
        open_ai_span.attributes.get(SpanAttributes.LLM_OPENAI_API_BASE)
        == "https://api.openai.com/v1/"
    )
    # assert (
    #     open_ai_span.attributes.get(
    #         SpanAttributes.LLM_OPENAI_RESPONSE_SYSTEM_FINGERPRINT
    #     )
    #     == "fp_2b778c6b35"
    # )
    assert open_ai_span.attributes.get(
        SpanAttributes.LLM_IS_STREAMING) is False
    assert (
        open_ai_span.attributes.get("gen_ai.response.id")
        == "chatcmpl-D9X7ymcBweNWeadgkGyG69kVYY16t"
    )

    logs = log_exporter.get_finished_logs()
    assert (
        len(logs) == 0
    ), "Assert that it doesn't emit logs when use_legacy_attributes is True"
    # Validate input messages shape
    input_messages = json.loads(open_ai_span.attributes.get("gen_ai.input.messages"))
    assert len(input_messages) == 1
    assert input_messages[0]["role"] == "user"
    assert len(input_messages[0]["parts"]) == 1
    assert input_messages[0]["parts"][0]["type"] == "text"
    assert input_messages[0]["parts"][0]["content"] == "Tell me a joke about opentelemetry"

    # Validate output messages shape
    output_messages = json.loads(open_ai_span.attributes.get("gen_ai.output.messages"))
    assert len(output_messages) == 1
    assert output_messages[0]["role"] == "assistant"
    assert output_messages[0]["finish_reason"] == "stop"
    assert len(output_messages[0]["parts"]) >= 1
    assert output_messages[0]["parts"][0]["type"] == "text"
    assert "content" in output_messages[0]["parts"][0]

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify VCR cassette exists and check for potential secrets
fd "test_chat_with_messages_attributes" --type f --extension yaml
fd "test_chat_with_messages_attributes" --type f --extension yaml --exec grep -l "api-key\|authorization\|Bearer" {} \;

Repository: traceloop/openllmetry

Length of output: 183


🏁 Script executed:

# Check the cassette content for potential secrets
head -100 "packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_chat/test_chat_with_messages_attributes.yaml"

Repository: traceloop/openllmetry

Length of output: 3189


🏁 Script executed:

# Check for VCR configuration and filters in conftest or test file
fd "conftest.py" packages/opentelemetry-instrumentation-openai/tests --exec grep -n "vcr\|filter" {} \;

Repository: traceloop/openllmetry

Length of output: 295


🏁 Script executed:

# Check the test file itself for VCR setup or filters
grep -n "vcr\|filter_headers\|before_record\|VCR" "packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py" | head -30

Repository: traceloop/openllmetry

Length of output: 706


Add VCR filters for additional sensitive headers in the cassette.

The cassette file exists and API key authentication is properly filtered, but the cassette contains potentially sensitive OpenAI account identifiers and session cookies that should be scrubbed:

  • openai-organization and openai-project headers contain account-specific IDs
  • set-cookie headers contain session tokens

Configure additional VCR filters to redact these headers per the coding guidelines that require cassettes to be scrubbed of all secrets and PII.
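With pytest-recording (which exposes vcrpy options through a `vcr_config` fixture), the redaction could look like the sketch below. The exact header list is an assumption drawn from this review, not an exhaustive inventory of what the cassettes contain:

```python
# Sketch of a pytest-recording style VCR config; the header names to redact
# are assumptions based on this review, not an exhaustive list.
SENSITIVE_HEADERS = [
    ("authorization", "REDACTED"),
    ("api-key", "REDACTED"),
    ("openai-organization", "REDACTED"),
    ("openai-project", "REDACTED"),
    ("set-cookie", "REDACTED"),
]


def vcr_config():
    # With pytest-recording this function would be decorated as a fixture:
    # @pytest.fixture(scope="module")
    return {"filter_headers": SENSITIVE_HEADERS}
```

vcrpy's `filter_headers` accepts `(name, replacement)` tuples, so each listed header is replaced with the placeholder in newly recorded cassettes; existing cassettes would still need to be re-recorded or sanitized by hand.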

Separately, consider either removing the commented-out assertions at lines 83-88 or adding an explanation for why they're disabled.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py`
around lines 64 - 116, The test cassette contains sensitive headers; update the
VCR cassette/filter config used by tests (the setup that applies to
test_chat_with_messages_attributes / instrument_with_messages_attributes) to
redact the openai-organization and openai-project headers and to scrub any
set-cookie values (add them to vcr.filter_headers or equivalent redaction list),
then re-record or sanitize the cassette so those headers are replaced with
placeholders; also either delete the commented-out fingerprint assertions
referencing SpanAttributes.LLM_OPENAI_RESPONSE_SYSTEM_FINGERPRINT or add a short
comment explaining why they are disabled (e.g., non-deterministic fingerprint)
next to the commented block.

@coderabbitai coderabbitai bot left a comment
Actionable comments posted: 1

🧹 Nitpick comments (1)
packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py (1)

101-109: Use semantic-convention constants and assert attribute presence before parsing.

Using SpanAttributes constants avoids key drift, and explicit non-None checks produce clearer failures than json.loads(None).

Suggested patch
-    input_messages = json.loads(open_ai_span.attributes.get("gen_ai.input.messages"))
+    input_messages_raw = open_ai_span.attributes.get(SpanAttributes.GEN_AI_INPUT_MESSAGES)
+    assert input_messages_raw is not None
+    input_messages = json.loads(input_messages_raw)
@@
-    output_messages = json.loads(open_ai_span.attributes.get("gen_ai.output.messages"))
+    output_messages_raw = open_ai_span.attributes.get(SpanAttributes.GEN_AI_OUTPUT_MESSAGES)
+    assert output_messages_raw is not None
+    output_messages = json.loads(output_messages_raw)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py`
around lines 101 - 109, Replace literal attribute keys with the
semantic-convention constants (e.g. SpanAttributes.GEN_AI_INPUT_MESSAGES and
SpanAttributes.GEN_AI_OUTPUT_MESSAGES) and assert the attributes exist and are
not None on open_ai_span.attributes before calling json.loads; specifically,
check that open_ai_span.attributes.get(SpanAttributes.GEN_AI_INPUT_MESSAGES) is
not None and then parse it, and do the same for
SpanAttributes.GEN_AI_OUTPUT_MESSAGES, keeping the subsequent role/parts/content
assertions unchanged so failures are clear and robust.
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1f4103a0-1e64-4035-a1e9-ed0e802ac253

📥 Commits

Reviewing files that changed from the base of the PR and between 709a572 and 8810586.

⛔ Files ignored due to path filters (1)
  • packages/opentelemetry-instrumentation-openai/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (2)
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py
  • packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py

Comment on lines +96 to +99
    logs = log_exporter.get_finished_logs()
    assert (
        len(logs) == 0
    ), "Assert that it doesn't emit logs when use_legacy_attributes is True"
⚠️ Potential issue | 🟡 Minor

Fix stale assertion message text.

The failure message references use_legacy_attributes, but this test runs with instrument_with_messages_attributes.

Suggested patch
-    ), "Assert that it doesn't emit logs when use_legacy_attributes is True"
+    ), "Assert that it doesn't emit logs when use_messages_attributes is enabled"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
    logs = log_exporter.get_finished_logs()
    assert (
        len(logs) == 0
    ), "Assert that it doesn't emit logs when use_legacy_attributes is True"
    logs = log_exporter.get_finished_logs()
    assert (
        len(logs) == 0
    ), "Assert that it doesn't emit logs when use_messages_attributes is enabled"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py`
around lines 96 - 99, The assertion failure message is stale: replace the
message text that currently mentions use_legacy_attributes with one that
references instrument_with_messages_attributes; update the assertion around
log_exporter.get_finished_logs() / the len(logs) check in the test_chat.py test
so the description accurately states that no logs should be emitted when
instrument_with_messages_attributes is True (referencing the assertion around
logs = log_exporter.get_finished_logs()).

