
feat(anthropic): conform instrumentation to OTel GenAI semantic conventions #3808

Open
max-deygin-traceloop wants to merge 9 commits into max/tlp-1925-python-sdk-otel-semantic-convention from max/tlp-1926-anthropic-instrumentation

Conversation


@max-deygin-traceloop max-deygin-traceloop commented Mar 16, 2026

What

Refactors the Anthropic instrumentation package to emit span attributes that comply with the
OpenTelemetry GenAI semantic conventions spec.

Changes

New attributes (replacing the legacy flat gen_ai.prompt.* / gen_ai.completion.* keys):

  • gen_ai.input.messages — full message array as JSON
  • gen_ai.output.messages — assistant response as JSON
  • gen_ai.system_instructions — system prompt as standalone attribute
  • gen_ai.tool.definitions — tool definitions as JSON
  • gen_ai.provider.name — new required attribute ("anthropic")
  • gen_ai.operation.name — new required attribute ("chat" / "text_completion")
  • gen_ai.response.finish_reasons — now emitted as an array regardless of TRACELOOP_TRACE_CONTENT
    setting
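
To make the new shape concrete, here is a minimal, hypothetical sketch of how these attributes could be assembled from Anthropic Messages API kwargs. The attribute names match the list above; the helper itself (`build_genai_attributes`) is illustrative only and is not the PR's actual code.

```python
import json

def build_genai_attributes(kwargs, response_content, finish_reason):
    """Illustrative only: map request kwargs + a response to flat span attributes."""
    attrs = {
        "gen_ai.provider.name": "anthropic",
        "gen_ai.operation.name": "chat",
        # Full message array and assistant response, serialized as JSON strings
        "gen_ai.input.messages": json.dumps(kwargs.get("messages", [])),
        "gen_ai.output.messages": json.dumps(
            [{"role": "assistant", "content": response_content}]
        ),
        # Always an array, independent of the TRACELOOP_TRACE_CONTENT setting
        "gen_ai.response.finish_reasons": [finish_reason],
    }
    if "system" in kwargs:
        # System prompt becomes a standalone attribute
        attrs["gen_ai.system_instructions"] = json.dumps(kwargs["system"])
    if "tools" in kwargs:
        attrs["gen_ai.tool.definitions"] = json.dumps(kwargs["tools"])
    return attrs

attrs = build_genai_attributes(
    {"messages": [{"role": "user", "content": "Hi"}], "system": "Be brief."},
    "Hello!",
    "end_turn",
)
print(attrs["gen_ai.response.finish_reasons"])  # ['end_turn']
```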

Bug fixes:

  • gen_ai.system value corrected from "Anthropic" to "anthropic" (spec enum value)
  • gen_ai.request.max_tokens now correctly reads max_tokens (Messages API) with fallback to
    max_tokens_to_sample (legacy Completions API)
  • Tool-use blocks in input messages no longer duplicated across both content and tool_calls
  • Streaming tool_calls.arguments correctly handles both string (streaming delta accumulation) and dict
    inputs without double-encoding
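
Two of the fixes above reduce to small, order-sensitive branches. The following sketch (hypothetical helper names, not the PR's code) shows the intended precedence and the guard against double-encoding:

```python
import json

def resolve_max_tokens(kwargs):
    # Messages API `max_tokens` takes precedence; the legacy Completions API
    # `max_tokens_to_sample` is only a fallback.
    return kwargs.get("max_tokens") or kwargs.get("max_tokens_to_sample")

def serialize_tool_arguments(arguments):
    # Streaming accumulates tool_call arguments as a JSON string from deltas,
    # while non-streaming responses provide a dict. Only the dict needs
    # json.dumps; encoding a string again would double-encode it.
    if isinstance(arguments, str):
        return arguments
    return json.dumps(arguments)

print(resolve_max_tokens({"max_tokens": 1024, "max_tokens_to_sample": 256}))  # 1024
print(serialize_tool_arguments({"city": "SF"}))    # {"city": "SF"}
print(serialize_tool_arguments('{"city": "SF"}'))  # {"city": "SF"}
```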

Checklist

  • I have added tests that cover my changes.

  • If adding a new instrumentation or changing an existing one, I've added screenshots from some
    observability platform showing the change.

  • PR name follows conventional commits format: feat(instrumentation): ... or fix(instrumentation): ....

  • (If applicable) I have updated the documentation accordingly.

  • Validated locally against a running agent using ConsoleSpanExporter.

  • Confirmed all of the following appear correctly on anthropic.chat spans:

gen_ai.system: "anthropic", gen_ai.provider.name: "anthropic", gen_ai.operation.name: "chat",
gen_ai.input.messages, gen_ai.output.messages, gen_ai.response.finish_reasons

Summary by CodeRabbit

  • New Features

    • Emit structured JSON input/output messages, system instructions, tool definitions, and response finish reasons as span attributes.
    • Add provider and operation-name attributes; normalize streaming output and serialize tool_call arguments.
    • Better token fallback and consolidated multi-turn/tool-use handling.
  • Tests

    • Updated and expanded tests to validate structured messaging, tool extraction/handling, streaming behavior, and GenAI semantic conventions.
  • Chores

    • Package version bumped to 0.54.0


coderabbitai bot commented Mar 16, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1d8a49ac-0697-4095-af27-4896505e88f6

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough

Walkthrough

Switches Anthropic instrumentation from legacy flat gen_ai.prompt/completion attributes to structured GenAI semantic attributes (gen_ai.input.messages, gen_ai.output.messages, gen_ai.system_instructions, gen_ai.tool.definitions, gen_ai.response.finish_reasons), uses GenAi enum values for provider/system/operation attributes, and updates tests and version.

Changes

Cohort / File(s) Summary
Instrumentation entrypoint
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py
Import GenAi enums; set GEN_AI_SYSTEM/provider to GenAiSystemValues.ANTHROPIC.value; add GEN_AI_PROVIDER_NAME and GEN_AI_OPERATION_NAME attributes (CHAT or TEXT_COMPLETION) for both sync and async wrappers.
Span attribute handling
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py
Large refactor to emit GEN_AI_INPUT_MESSAGES/GEN_AI_OUTPUT_MESSAGES JSON arrays; add GEN_AI_SYSTEM_INSTRUCTIONS, GEN_AI_TOOL_DEFINITIONS, GEN_AI_RESPONSE_FINISH_REASONS; normalize roles/content, consolidate streaming and non-streaming paths, extract/serialize tool_calls (JSON-stringify arguments), handle base64 image content, and choose max_tokens precedence.
Tests — updated assertions to new model
packages/opentelemetry-instrumentation-anthropic/tests/test_bedrock_with_raw_response.py, .../test_completion.py, .../test_messages.py, .../test_prompt_caching.py, .../test_structured_outputs.py, .../test_thinking.py
Replace legacy per-field prompt/completion checks with JSON parsing of GEN_AI_INPUT_MESSAGES / GEN_AI_OUTPUT_MESSAGES; adapt assertions for roles, content, tools, finish reasons; add json imports where needed.
New tests — semantic conventions
packages/opentelemetry-instrumentation-anthropic/tests/test_semconv_span_attrs.py
Add comprehensive GenAI semantic-conventions tests validating gen_ai.* attribute names/values, GenAISystem enum mappings, tool definitions/calls extraction, finish reasons, streaming behavior, max_tokens precedence, and span identity attributes for sync/async paths.
Version bump / packaging
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/version.py, packages/opentelemetry-instrumentation-anthropic/pyproject.toml
Package version bumped from 0.53.0 to 0.54.0.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Poem

🐇✨ I stitched messages into tidy JSON threads,
System hints and tool calls tucked into beds.
From prompts to outputs the rabbit hops with glee,
Spans now speak GenAI—structured, clean, and free.
sniff-hop

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): docstring coverage is 27.48%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (2 passed)

  • Description Check (✅ Passed): check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check (✅ Passed): the title clearly and specifically describes the main change, conforming Anthropic instrumentation to OpenTelemetry GenAI semantic conventions, which aligns with the substantial refactoring across multiple files to emit compliant span attributes.



CLAassistant commented Mar 16, 2026

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ max-deygin-traceloop
❌ max-deygin-servicenow

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (2)
packages/opentelemetry-instrumentation-anthropic/tests/test_thinking.py (1)

52-56: Consider handling missing role gracefully.

Using next(m for m in output_messages if m.get("role") == "thinking") will raise StopIteration if no thinking message exists. While this is likely intentional to fail the test, adding a default or explicit assertion message would improve test diagnostics:

thinking_msg = next((m for m in output_messages if m.get("role") == "thinking"), None)
assert thinking_msg is not None, "Expected thinking message not found in output"

This is a minor improvement for test clarity.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/opentelemetry-instrumentation-anthropic/tests/test_thinking.py`
around lines 52 - 56, Replace the use of next(...) without a default when
locating the thinking message to avoid StopIteration; in the test reading
output_messages from
anthropic_span.attributes[GenAIAttributes.GEN_AI_OUTPUT_MESSAGES], use next((m
for m in output_messages if m.get("role") == "thinking"), None) to assign
thinking_msg and then assert thinking_msg is not None with a clear message
(e.g., "Expected thinking message not found in output") before checking
thinking_msg["content"]; do the same defensive pattern or clear assertions for
assistant_msg if desired.
packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py (1)

92-97: Consider extracting repeated span-message parsing/assertion helpers.

The same parse/assert pattern appears in many tests, which increases maintenance cost for future semconv shape updates.

Refactor sketch
+def assert_io_messages(span, expected_input_role, expected_input_content, expected_output_role, expected_output_content):
+    input_messages = json.loads(span.attributes[GenAIAttributes.GEN_AI_INPUT_MESSAGES])
+    output_messages = json.loads(span.attributes[GenAIAttributes.GEN_AI_OUTPUT_MESSAGES])
+    assert input_messages[0]["role"] == expected_input_role
+    assert input_messages[0]["content"] == expected_input_content
+    assert output_messages[-1]["role"] == expected_output_role
+    assert output_messages[-1]["content"] == expected_output_content
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py`
around lines 92 - 97, Extract a small test helper to centralize parsing and
assertions for span message attributes used across tests: create a utility
(e.g., parse_span_messages or assert_span_message) that accepts a span and an
attribute key (GenAIAttributes.GEN_AI_INPUT_MESSAGES or
GenAIAttributes.GEN_AI_OUTPUT_MESSAGES) and returns the parsed JSON list or
asserts the expected content/role against a provided expected message; then
replace the repeated json.loads and assertions in tests referencing
anthropic_span and response with calls to this helper to reduce duplication and
make future semconv shape updates easier to change in one place.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 55b633d3-e07f-4128-9a6c-7ee9773fb151

📥 Commits

Reviewing files that changed from the base of the PR and between 93786d9 and fb5e0e9.

⛔ Files ignored due to path filters (1)
  • packages/opentelemetry-instrumentation-anthropic/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (9)
  • packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py
  • packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py
  • packages/opentelemetry-instrumentation-anthropic/tests/test_bedrock_with_raw_response.py
  • packages/opentelemetry-instrumentation-anthropic/tests/test_completion.py
  • packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py
  • packages/opentelemetry-instrumentation-anthropic/tests/test_prompt_caching.py
  • packages/opentelemetry-instrumentation-anthropic/tests/test_semconv_span_attrs.py
  • packages/opentelemetry-instrumentation-anthropic/tests/test_structured_outputs.py
  • packages/opentelemetry-instrumentation-anthropic/tests/test_thinking.py

Comment on lines 78 to 80

     set_span_attribute(
-        span, GenAIAttributes.GEN_AI_REQUEST_MAX_TOKENS, kwargs.get("max_tokens_to_sample")
+        span, GenAIAttributes.GEN_AI_REQUEST_MAX_TOKENS, kwargs.get("max_tokens_to_sample") or kwargs.get("max_tokens")
     )

⚠️ Potential issue | 🟡 Minor

max_tokens precedence is inverted from PR description.

The PR description states: "gen_ai.request.max_tokens now reads max_tokens (Messages API) with fallback to max_tokens_to_sample (legacy Completions API)". However, the code uses max_tokens_to_sample or max_tokens, which prioritizes the legacy parameter.

For the Messages API (which is the primary/modern API), max_tokens should take precedence:

🔧 Proposed fix
     set_span_attribute(
-        span, GenAIAttributes.GEN_AI_REQUEST_MAX_TOKENS, kwargs.get("max_tokens_to_sample") or kwargs.get("max_tokens")
+        span, GenAIAttributes.GEN_AI_REQUEST_MAX_TOKENS, kwargs.get("max_tokens") or kwargs.get("max_tokens_to_sample")
     )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py`
around lines 78 - 80, The GEN_AI_REQUEST_MAX_TOKENS attribute currently prefers
the legacy parameter by calling kwargs.get("max_tokens_to_sample") or
kwargs.get("max_tokens"); change this to prefer the Messages API by using
kwargs.get("max_tokens") or kwargs.get("max_tokens_to_sample") when calling
set_span_attribute (referencing set_span_attribute and
GenAIAttributes.GEN_AI_REQUEST_MAX_TOKENS and the kwargs keys) so max_tokens
takes precedence with max_tokens_to_sample as the fallback.

Comment on lines +1567 to +1573
input_messages = json.loads(anthropic_span.attributes[GenAIAttributes.GEN_AI_INPUT_MESSAGES])
assert input_messages[0]["content"] == "What is the weather and current time in San Francisco?"
assert input_messages[0]["role"] == "user"
msg1_content = json.loads(input_messages[1]["content"])
assert msg1_content[0]["text"] == "I'll help you get the weather and current time in San Francisco."
assert input_messages[1]["role"] == "assistant"
assert json.loads(input_messages[2]["content"]) == [

⚠️ Potential issue | 🟡 Minor

Add an explicit regression assertion for non-duplicated tool-use in input history.

This block validates assistant history content but does not explicitly assert that tool-use is not duplicated between content and tool_calls in input messages (one of the PR bugfix goals). Add a direct guard here.

Suggested assertion add-on
     msg1_content = json.loads(input_messages[1]["content"])
     assert msg1_content[0]["text"] == "I'll help you get the weather and current time in San Francisco."
+    tool_use_blocks = [b for b in msg1_content if b.get("type") == "tool_use"]
+    assert len(tool_use_blocks) == 1
+    assert "tool_calls" not in input_messages[1]
     assert input_messages[1]["role"] == "assistant"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py`
around lines 1567 - 1573, The test validates assistant history but misses an
explicit regression guard to ensure tool-use isn't duplicated between message
content and tool_calls; update the test around anthropic_span and
GenAIAttributes.GEN_AI_INPUT_MESSAGES to parse input_messages and the associated
tool_calls (from
anthropic_span.attributes[GenAIAttributes.GEN_AI_INPUT_MESSAGES] or nearby
attribute used in the test) and add an assertion that no assistant message's
content string(s) (e.g., msg1_content or input_messages[n]["content"]) appear
again in the corresponding tool_calls entries—i.e., explicitly assert that for
assistant-role messages (input_messages[i]["role"] == "assistant") the textual
content is not duplicated in the parsed tool_calls list.

@coderabbitai coderabbitai bot left a comment

♻️ Duplicate comments (1)
packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py (1)

1567-1579: ⚠️ Potential issue | 🟡 Minor

Add regression assertion for tool_calls on assistant message with history.

The test validates the assistant message's content is correctly extracted as text, but per the PR's bug fix (tool-use blocks no longer duplicated), the test should also verify that tool_calls exists on input_messages[1] with the extracted tool_use data.

Suggested assertion add-on
     assert input_messages[1]["content"] == "I'll help you get the weather and current time in San Francisco."
     assert input_messages[1]["role"] == "assistant"
+    # Verify tool_use block is captured in tool_calls (not duplicated in content)
+    assert "tool_calls" in input_messages[1]
+    assert len(input_messages[1]["tool_calls"]) == 1
+    assert input_messages[1]["tool_calls"][0]["id"] == "call_1"
+    assert input_messages[1]["tool_calls"][0]["name"] == "get_weather"
     assert json.loads(input_messages[2]["content"]) == [
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py`
around lines 1567 - 1579, Add an assertion that the assistant message in
input_messages (the second element, input_messages[1]) contains a tool_calls
field with the extracted tool use data from the tool-use block; locate where
input_messages is derived from
anthropic_span.attributes[GenAIAttributes.GEN_AI_INPUT_MESSAGES] and add a check
that input_messages[1]["tool_calls"] (or its JSON-decoded equivalent if stored
as a string) matches the expected tool_use object (e.g.,
type/tool_use_id/content matching the previous tool result).
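
The non-duplication invariant this comment asks the test to guard can be sketched as a single split pass over the assistant's content blocks. This is an illustrative reconstruction under assumed block shapes, not the instrumentation's actual code:

```python
import json

def split_assistant_content(blocks):
    # Each tool_use block goes to tool_calls; everything else stays in
    # content, so no block is ever represented twice.
    content, tool_calls = [], []
    for block in blocks:
        if block.get("type") == "tool_use":
            tool_calls.append({
                "id": block["id"],
                "name": block["name"],
                "arguments": json.dumps(block.get("input", {})),
            })
        else:
            content.append(block)
    return content, tool_calls

blocks = [
    {"type": "text", "text": "I'll check the weather."},
    {"type": "tool_use", "id": "call_1", "name": "get_weather", "input": {"city": "SF"}},
]
content, tool_calls = split_assistant_content(blocks)
```

A regression test would then assert both that `tool_calls` holds the extracted call and that no `tool_use` block remains in `content`.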
🧹 Nitpick comments (1)
packages/opentelemetry-instrumentation-anthropic/tests/test_semconv_span_attrs.py (1)

439-495: Consider extracting repeated span capture pattern (optional refactor).

The span identity tests (test_gen_ai_system_value_is_lowercase_anthropic, test_gen_ai_provider_name_is_set, test_gen_ai_operation_name_*, test_awrap_*) share identical mocking boilerplate. A helper or fixture could reduce duplication.

Example helper extraction
def capture_span_attributes(to_wrap, kwargs, use_awrap=False):
    """Helper to capture span attributes from _wrap/_awrap calls."""
    from unittest.mock import patch, MagicMock, AsyncMock
    from opentelemetry.instrumentation.anthropic import _wrap, _awrap
    
    tracer = MagicMock()
    captured = {}
    
    def fake_start_span(name, kind, attributes):
        captured["attributes"] = attributes
        span = MagicMock()
        span.is_recording.return_value = False
        return span
    
    tracer.start_span.side_effect = fake_start_span
    wrapped_fn = AsyncMock(return_value=None) if use_awrap else MagicMock(return_value=None)
    
    with patch("opentelemetry.context.get_value", return_value=False):
        fn = (_awrap if use_awrap else _wrap)(tracer, None, None, None, None, None, to_wrap)
        if use_awrap:
            import asyncio
            asyncio.run(fn(wrapped_fn, MagicMock(), [], kwargs))
        else:
            fn(wrapped_fn, MagicMock(), [], kwargs)
    
    return captured["attributes"]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-anthropic/tests/test_semconv_span_attrs.py`
around lines 439 - 495, Several tests repeat the same tracer/mock/start_span
boilerplate; extract that into a single helper (e.g. capture_span_attributes)
and call it from test_gen_ai_system_value_is_lowercase_anthropic,
test_gen_ai_provider_name_is_set and the related test_gen_ai_operation_name_*
and test_awrap_* tests to remove duplication; the helper should create the
MagicMock tracer, install the fake_start_span side_effect to capture attributes,
patch opentelemetry.context.get_value, invoke _wrap or _awrap with the provided
to_wrap and kwargs, and return the captured attributes so each test just asserts
on the returned dict.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b50caf51-6c87-4c24-be9d-cac9bcb0ec37

📥 Commits

Reviewing files that changed from the base of the PR and between fb5e0e9 and e820899.

📒 Files selected for processing (2)
  • packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py
  • packages/opentelemetry-instrumentation-anthropic/tests/test_semconv_span_attrs.py

max-deygin-traceloop force-pushed the max/tlp-1926-anthropic-instrumentation branch from 47d522d to 547b275 on March 16, 2026 14:26
max-deygin-traceloop changed the base branch from main to max/tlp-1925-python-sdk-otel-semantic-convention on March 16, 2026 14:27
max-deygin-traceloop force-pushed the max/tlp-1926-anthropic-instrumentation branch from 547b275 to 324da4c on March 17, 2026 13:44
max-deygin-traceloop force-pushed the max/tlp-1925-python-sdk-otel-semantic-convention branch from e430fd6 to 0141f4c on March 17, 2026 13:51
max-deygin-traceloop force-pushed the max/tlp-1926-anthropic-instrumentation branch 2 times, most recently from 8fe73f1 to 0e5f2d7 on March 18, 2026 10:40
max-deygin-traceloop force-pushed the max/tlp-1925-python-sdk-otel-semantic-convention branch from 740ee81 to 6401889 on March 18, 2026 11:06
max-deygin-traceloop force-pushed the max/tlp-1926-anthropic-instrumentation branch 4 times, most recently from 7d1116a to cb0feb9 on March 18, 2026 14:54
-    GenAIAttributes.GEN_AI_SYSTEM: "Anthropic",
-    SpanAttributes.LLM_REQUEST_TYPE: LLMRequestTypeValues.COMPLETION.value,
+    GenAIAttributes.GEN_AI_SYSTEM: GenAiSystemValues.ANTHROPIC.value,
+    GenAIAttributes.GEN_AI_PROVIDER_NAME: GenAiSystemValues.ANTHROPIC.value,
Contributor

@max-deygin-traceloop and @OzBenSimhonTraceloop , here a reference for both sets (system and provider name)

max-deygin-servicenow and others added 9 commits March 19, 2026 16:31
…→ GEN_AI_

- Import GEN_AI_REQUEST_FREQUENCY/PRESENCE_PENALTY from upstream gen_ai_attributes
- Use SpanAttributes.GEN_AI_USAGE_TOTAL_TOKENS/IS_STREAMING/RESPONSE_FINISH_REASON/
  RESPONSE_STOP_REASON (renamed from LLM_*)
- Remove duplicate LLM_REQUEST_TYPE dict entry (operation_name var already handles it)
- Update test_messages.py and test_bedrock_with_raw_response.py

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…pan_attrs.py

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- llm.usage.total_tokens → gen_ai.usage.total_tokens
- gen_ai.usage.cache_creation_input_tokens → gen_ai.usage.cache_creation.input_tokens
- gen_ai.usage.cache_read_input_tokens → gen_ai.usage.cache_read.input_tokens

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
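
For consumers whose dashboards or queries still reference the legacy keys renamed in the commits above, a one-line remapping is enough. The mapping mirrors the commit message; the helper itself is a hypothetical convenience, not part of the PR:

```python
LEGACY_TO_SEMCONV = {
    "llm.usage.total_tokens": "gen_ai.usage.total_tokens",
    "gen_ai.usage.cache_creation_input_tokens": "gen_ai.usage.cache_creation.input_tokens",
    "gen_ai.usage.cache_read_input_tokens": "gen_ai.usage.cache_read.input_tokens",
}

def migrate_attributes(attributes):
    # Rename any legacy keys; attributes already using the new names pass through.
    return {LEGACY_TO_SEMCONV.get(k, k): v for k, v in attributes.items()}

migrated = migrate_attributes({"llm.usage.total_tokens": 42, "gen_ai.system": "anthropic"})
```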
max-deygin-traceloop force-pushed the max/tlp-1926-anthropic-instrumentation branch from cb0feb9 to 53ba7f1 on March 19, 2026 14:32
4 participants