feat(anthropic): conform instrumentation to OTel GenAI semantic conventions#3808
Conversation
📝 Walkthrough

Switches Anthropic instrumentation from legacy flat `gen_ai.prompt`/`gen_ai.completion` attributes to structured GenAI semantic attributes (`gen_ai.input.messages`, `gen_ai.output.messages`, `gen_ai.system_instructions`, `gen_ai.tool.definitions`, `gen_ai.response.finish_reasons`), uses GenAi enum values for provider/system/operation attributes, and updates tests and version.

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (warning)
Actionable comments posted: 2
🧹 Nitpick comments (2)
packages/opentelemetry-instrumentation-anthropic/tests/test_thinking.py (1)
52-56: Consider handling missing role gracefully.

Using `next(m for m in output_messages if m.get("role") == "thinking")` will raise `StopIteration` if no thinking message exists. While this is likely intentional to fail the test, adding a default or an explicit assertion message would improve test diagnostics:

```python
thinking_msg = next((m for m in output_messages if m.get("role") == "thinking"), None)
assert thinking_msg is not None, "Expected thinking message not found in output"
```

This is a minor improvement for test clarity.
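The difference between the two failure modes can be seen in a tiny standalone sketch (the message list below is a made-up stand-in for the parsed `gen_ai.output.messages` attribute, not real span data):

```python
# Stand-in data: a parsed output_messages list with no "thinking" entry.
output_messages = [{"role": "assistant", "content": "Hello"}]

# Without a default, next() raises StopIteration when nothing matches,
# which pytest reports as an opaque error rather than a failed assertion.
try:
    next(m for m in output_messages if m.get("role") == "thinking")
    found = True
except StopIteration:
    found = False

# With a default, the lookup degrades to None and the subsequent assert
# can carry a descriptive failure message instead.
thinking_msg = next((m for m in output_messages if m.get("role") == "thinking"), None)

print(found, thinking_msg)  # False None
```

Either way the test fails when the message is missing; the default-based form just fails with a readable message.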
🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@packages/opentelemetry-instrumentation-anthropic/tests/test_thinking.py` around lines 52-56: replace the use of `next(...)` without a default when locating the thinking message to avoid `StopIteration`; in the test reading `output_messages` from `anthropic_span.attributes[GenAIAttributes.GEN_AI_OUTPUT_MESSAGES]`, use `next((m for m in output_messages if m.get("role") == "thinking"), None)` to assign `thinking_msg` and then assert `thinking_msg is not None` with a clear message (e.g., "Expected thinking message not found in output") before checking `thinking_msg["content"]`; apply the same defensive pattern or clear assertions for `assistant_msg` if desired.

packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py (1)
92-97: Consider extracting repeated span-message parsing/assertion helpers.

The same parse/assert pattern appears in many tests, which increases maintenance cost for future semconv shape updates.

Refactor sketch:

```python
def assert_io_messages(span, expected_input_role, expected_input_content,
                       expected_output_role, expected_output_content):
    input_messages = json.loads(span.attributes[GenAIAttributes.GEN_AI_INPUT_MESSAGES])
    output_messages = json.loads(span.attributes[GenAIAttributes.GEN_AI_OUTPUT_MESSAGES])
    assert input_messages[0]["role"] == expected_input_role
    assert input_messages[0]["content"] == expected_input_content
    assert output_messages[-1]["role"] == expected_output_role
    assert output_messages[-1]["content"] == expected_output_content
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py` around lines 92 - 97, Extract a small test helper to centralize parsing and assertions for span message attributes used across tests: create a utility (e.g., parse_span_messages or assert_span_message) that accepts a span and an attribute key (GenAIAttributes.GEN_AI_INPUT_MESSAGES or GenAIAttributes.GEN_AI_OUTPUT_MESSAGES) and returns the parsed JSON list or asserts the expected content/role against a provided expected message; then replace the repeated json.loads and assertions in tests referencing anthropic_span and response with calls to this helper to reduce duplication and make future semconv shape updates easier to change in one place.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 55b633d3-e07f-4128-9a6c-7ee9773fb151
⛔ Files ignored due to path filters (1)
`packages/opentelemetry-instrumentation-anthropic/uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (9)
- packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py
- packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py
- packages/opentelemetry-instrumentation-anthropic/tests/test_bedrock_with_raw_response.py
- packages/opentelemetry-instrumentation-anthropic/tests/test_completion.py
- packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py
- packages/opentelemetry-instrumentation-anthropic/tests/test_prompt_caching.py
- packages/opentelemetry-instrumentation-anthropic/tests/test_semconv_span_attrs.py
- packages/opentelemetry-instrumentation-anthropic/tests/test_structured_outputs.py
- packages/opentelemetry-instrumentation-anthropic/tests/test_thinking.py
```diff
 set_span_attribute(
-    span, GenAIAttributes.GEN_AI_REQUEST_MAX_TOKENS, kwargs.get("max_tokens_to_sample")
+    span, GenAIAttributes.GEN_AI_REQUEST_MAX_TOKENS, kwargs.get("max_tokens_to_sample") or kwargs.get("max_tokens")
 )
```
max_tokens precedence is inverted from PR description.
The PR description states: "gen_ai.request.max_tokens now reads max_tokens (Messages API) with fallback to max_tokens_to_sample (legacy Completions API)". However, the code uses max_tokens_to_sample or max_tokens, which prioritizes the legacy parameter.
For the Messages API (which is the primary/modern API), max_tokens should take precedence:
🔧 Proposed fix

```diff
 set_span_attribute(
-    span, GenAIAttributes.GEN_AI_REQUEST_MAX_TOKENS, kwargs.get("max_tokens_to_sample") or kwargs.get("max_tokens")
+    span, GenAIAttributes.GEN_AI_REQUEST_MAX_TOKENS, kwargs.get("max_tokens") or kwargs.get("max_tokens_to_sample")
 )
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py`
around lines 78 - 80, The GEN_AI_REQUEST_MAX_TOKENS attribute currently prefers
the legacy parameter by calling kwargs.get("max_tokens_to_sample") or
kwargs.get("max_tokens"); change this to prefer the Messages API by using
kwargs.get("max_tokens") or kwargs.get("max_tokens_to_sample") when calling
set_span_attribute (referencing set_span_attribute and
GenAIAttributes.GEN_AI_REQUEST_MAX_TOKENS and the kwargs keys) so max_tokens
takes precedence with max_tokens_to_sample as the fallback.
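The precedence question above comes down to Python's short-circuiting `or`. A minimal sketch, assuming the fix is applied (`resolve_max_tokens` is a hypothetical helper, not part of the package):

```python
def resolve_max_tokens(kwargs):
    # Messages API key first, legacy Completions API key as fallback.
    # Caveat: `or` also skips falsy values, so an explicit max_tokens=0
    # would fall through to the legacy key; production code may prefer
    # an explicit `is not None` check.
    return kwargs.get("max_tokens") or kwargs.get("max_tokens_to_sample")

both = resolve_max_tokens({"max_tokens": 1024, "max_tokens_to_sample": 256})
legacy_only = resolve_max_tokens({"max_tokens_to_sample": 256})
neither = resolve_max_tokens({})
print(both, legacy_only, neither)  # 1024 256 None
```

With the operands in this order, a caller using the modern Messages API always wins, and legacy Completions API callers still get their value recorded.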
```python
input_messages = json.loads(anthropic_span.attributes[GenAIAttributes.GEN_AI_INPUT_MESSAGES])
assert input_messages[0]["content"] == "What is the weather and current time in San Francisco?"
assert input_messages[0]["role"] == "user"
msg1_content = json.loads(input_messages[1]["content"])
assert msg1_content[0]["text"] == "I'll help you get the weather and current time in San Francisco."
assert input_messages[1]["role"] == "assistant"
assert json.loads(input_messages[2]["content"]) == [
```
Add an explicit regression assertion for non-duplicated tool-use in input history.
This block validates assistant history content but does not explicitly assert that tool-use is not duplicated between content and tool_calls in input messages (one of the PR bugfix goals). Add a direct guard here.
Suggested assertion add-on:

```diff
 msg1_content = json.loads(input_messages[1]["content"])
 assert msg1_content[0]["text"] == "I'll help you get the weather and current time in San Francisco."
+tool_use_blocks = [b for b in msg1_content if b.get("type") == "tool_use"]
+assert len(tool_use_blocks) == 1
+assert "tool_calls" not in input_messages[1]
 assert input_messages[1]["role"] == "assistant"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py`
around lines 1567 - 1573, The test validates assistant history but misses an
explicit regression guard to ensure tool-use isn't duplicated between message
content and tool_calls; update the test around anthropic_span and
GenAIAttributes.GEN_AI_INPUT_MESSAGES to parse input_messages and the associated
tool_calls (from
anthropic_span.attributes[GenAIAttributes.GEN_AI_INPUT_MESSAGES] or nearby
attribute used in the test) and add an assertion that no assistant message's
content string(s) (e.g., msg1_content or input_messages[n]["content"]) appear
again in the corresponding tool_calls entries—i.e., explicitly assert that for
assistant-role messages (input_messages[i]["role"] == "assistant") the textual
content is not duplicated in the parsed tool_calls list.
♻️ Duplicate comments (1)
packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py (1)
1567-1579: ⚠️ Potential issue | 🟡 Minor

Add regression assertion for tool_calls on assistant message with history.

The test validates the assistant message's `content` is correctly extracted as text, but per the PR's bug fix (tool-use blocks no longer duplicated), the test should also verify that `tool_calls` exists on `input_messages[1]` with the extracted tool_use data.

Suggested assertion add-on:

```diff
 assert input_messages[1]["content"] == "I'll help you get the weather and current time in San Francisco."
 assert input_messages[1]["role"] == "assistant"
+# Verify tool_use block is captured in tool_calls (not duplicated in content)
+assert "tool_calls" in input_messages[1]
+assert len(input_messages[1]["tool_calls"]) == 1
+assert input_messages[1]["tool_calls"][0]["id"] == "call_1"
+assert input_messages[1]["tool_calls"][0]["name"] == "get_weather"
 assert json.loads(input_messages[2]["content"]) == [
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py` around lines 1567 - 1579, Add an assertion that the assistant message in input_messages (the second element, input_messages[1]) contains a tool_calls field with the extracted tool use data from the tool-use block; locate where input_messages is derived from anthropic_span.attributes[GenAIAttributes.GEN_AI_INPUT_MESSAGES] and add a check that input_messages[1]["tool_calls"] (or its JSON-decoded equivalent if stored as a string) matches the expected tool_use object (e.g., type/tool_use_id/content matching the previous tool result).
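To make the non-duplication guard concrete, here is a small self-contained sketch of the shape being asserted (the message dict and tool-call fields are illustrative stand-ins, not the package's exact output):

```python
import json

# Illustrative assistant history message after the fix: the tool-use block
# lives in tool_calls, and content holds only the extracted text.
assistant_msg = {
    "role": "assistant",
    "content": "I'll help you get the weather and current time in San Francisco.",
    "tool_calls": [
        {
            "id": "call_1",
            "name": "get_weather",
            "arguments": json.dumps({"city": "San Francisco"}),
        }
    ],
}

# Regression guard: exactly one tool call, and the textual content is not
# repeated inside the serialized tool_calls entries.
assert len(assistant_msg["tool_calls"]) == 1
assert assistant_msg["content"] not in json.dumps(assistant_msg["tool_calls"])
```

Asserting on the serialized `tool_calls` catches the duplication bug regardless of which nested field the text might leak into.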
🧹 Nitpick comments (1)
packages/opentelemetry-instrumentation-anthropic/tests/test_semconv_span_attrs.py (1)
439-495: Consider extracting repeated span capture pattern (optional refactor).

The span identity tests (`test_gen_ai_system_value_is_lowercase_anthropic`, `test_gen_ai_provider_name_is_set`, `test_gen_ai_operation_name_*`, `test_awrap_*`) share identical mocking boilerplate. A helper or fixture could reduce duplication.

Example helper extraction:

```python
def capture_span_attributes(to_wrap, kwargs, use_awrap=False):
    """Helper to capture span attributes from _wrap/_awrap calls."""
    from unittest.mock import patch, MagicMock, AsyncMock
    from opentelemetry.instrumentation.anthropic import _wrap, _awrap

    tracer = MagicMock()
    captured = {}

    def fake_start_span(name, kind, attributes):
        captured["attributes"] = attributes
        span = MagicMock()
        span.is_recording.return_value = False
        return span

    tracer.start_span.side_effect = fake_start_span
    wrapped_fn = AsyncMock(return_value=None) if use_awrap else MagicMock(return_value=None)
    with patch("opentelemetry.context.get_value", return_value=False):
        fn = (_awrap if use_awrap else _wrap)(tracer, None, None, None, None, None, to_wrap)
        if use_awrap:
            import asyncio
            asyncio.run(fn(wrapped_fn, MagicMock(), [], kwargs))
        else:
            fn(wrapped_fn, MagicMock(), [], kwargs)
    return captured["attributes"]
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/opentelemetry-instrumentation-anthropic/tests/test_semconv_span_attrs.py` around lines 439 - 495, Several tests repeat the same tracer/mock/start_span boilerplate; extract that into a single helper (e.g. capture_span_attributes) and call it from test_gen_ai_system_value_is_lowercase_anthropic, test_gen_ai_provider_name_is_set and the related test_gen_ai_operation_name_* and test_awrap_* tests to remove duplication; the helper should create the MagicMock tracer, install the fake_start_span side_effect to capture attributes, patch opentelemetry.context.get_value, invoke _wrap or _awrap with the provided to_wrap and kwargs, and return the captured attributes so each test just asserts on the returned dict.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: b50caf51-6c87-4c24-be9d-cac9bcb0ec37
📒 Files selected for processing (2)
- packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py
- packages/opentelemetry-instrumentation-anthropic/tests/test_semconv_span_attrs.py
Force-push history:
- 47d522d → 547b275
- 547b275 → 324da4c
- e430fd6 → 0141f4c
- 8fe73f1 → 0e5f2d7
- 740ee81 → 6401889
- 7d1116a → cb0feb9
```diff
-    GenAIAttributes.GEN_AI_SYSTEM: "Anthropic",
-    SpanAttributes.LLM_REQUEST_TYPE: LLMRequestTypeValues.COMPLETION.value,
+    GenAIAttributes.GEN_AI_SYSTEM: GenAiSystemValues.ANTHROPIC.value,
+    GenAIAttributes.GEN_AI_PROVIDER_NAME: GenAiSystemValues.ANTHROPIC.value,
```
@max-deygin-traceloop and @OzBenSimhonTraceloop, here is a reference for both sets (system and provider name).
Commit messages:
- …→ GEN_AI_
  - Import GEN_AI_REQUEST_FREQUENCY/PRESENCE_PENALTY from upstream gen_ai_attributes
  - Use SpanAttributes.GEN_AI_USAGE_TOTAL_TOKENS/IS_STREAMING/RESPONSE_FINISH_REASON/RESPONSE_STOP_REASON (renamed from LLM_*)
  - Remove duplicate LLM_REQUEST_TYPE dict entry (operation_name var already handles it)
  - Update test_messages.py and test_bedrock_with_raw_response.py
- …pan_attrs.py
- Token usage attribute renames:
  - llm.usage.total_tokens → gen_ai.usage.total_tokens
  - gen_ai.usage.cache_creation_input_tokens → gen_ai.usage.cache_creation.input_tokens
  - gen_ai.usage.cache_read_input_tokens → gen_ai.usage.cache_read.input_tokens

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Force-pushed: cb0feb9 → 53ba7f1
What

Refactors the Anthropic instrumentation package to emit span attributes that comply with the OpenTelemetry GenAI semantic conventions spec.

Changes

New attributes (replaces legacy flat gen_ai.prompt.* / gen_ai.completion.* keys):
- … setting

Bug fixes:
- gen_ai.request.max_tokens now reads max_tokens (Messages API) with fallback to max_tokens_to_sample (legacy Completions API)
- … inputs without double-encoding
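For readers unfamiliar with the structured convention, a rough sketch of how the new attributes are produced and consumed; the message shapes below are illustrative only, and the exact schema is defined by the OTel GenAI semantic conventions spec:

```python
import json

# Illustrative message lists; the convention serializes them as JSON strings
# into single span attributes instead of flat indexed gen_ai.prompt.N.* keys.
input_messages = [{"role": "user", "content": "What is the weather in San Francisco?"}]
output_messages = [{"role": "assistant", "content": "It is sunny."}]

span_attributes = {
    "gen_ai.input.messages": json.dumps(input_messages),
    "gen_ai.output.messages": json.dumps(output_messages),
    "gen_ai.response.finish_reasons": ["end_turn"],
}

# Consumers (like the tests in this PR) decode them with json.loads:
decoded = json.loads(span_attributes["gen_ai.input.messages"])
assert decoded[0]["role"] == "user"
```

Serializing once at the attribute boundary is also what makes the double-encoding fix observable: a consumer should need exactly one `json.loads` to recover the message list.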
Checklist
I have added tests that cover my changes.
If adding a new instrumentation or changing an existing one, I've added screenshots from some observability platform showing the change.
PR name follows conventional commits format: feat(instrumentation): ... or fix(instrumentation): ....
(If applicable) I have updated the documentation accordingly.
Validated locally against a running agent using ConsoleSpanExporter.
Confirmed all appear correctly on anthropic.chat spans.

Summary by CodeRabbit
- New Features
- Tests
- Chores