feat(integrations): add support for the litellm responses/aresponses APIs #6205

Open

constantinius wants to merge 2 commits into master from constantinius/feat/integrations/litellm-responses-conversation-id

Conversation

@constantinius
Contributor

Description

Adds support for responses and aresponses, including their differences in output tracking. Also checks the conversation ID if it is passed in extra_args.

Contributes to https://linear.app/getsentry/issue/TET-2287/see-if-we-can-auto-extract-conversationid-from-openai-python

constantinius requested a review from a team as a code owner May 5, 2026 12:20
@linear-code

linear-code Bot commented May 5, 2026

@github-actions
Contributor

github-actions Bot commented May 5, 2026

Codecov Results 📊

2187 passed | ⏭️ 154 skipped | Total: 2341 | Pass Rate: 93.42% | Execution Time: 4m 55s

All tests are passing successfully.

❌ Patch coverage is 0.00%. Project has 12726 uncovered lines.

Files with missing lines (2)
| File       | Patch % | Lines          |
| ---------- | ------- | -------------- |
| openai.py  | 4.13%   | ⚠️ 673 Missing |
| litellm.py | 0.00%   | ⚠️ 199 Missing |

Generated by Codecov Action


@cursor
cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 9f4b78a. Configure here.

Comment thread sentry_sdk/integrations/litellm.py Outdated
Comment thread sentry_sdk/integrations/litellm.py Outdated
constantinius force-pushed the constantinius/feat/integrations/litellm-responses-conversation-id branch from 9f4b78a to bb31cad on May 5, 2026 12:30
Comment thread sentry_sdk/integrations/litellm.py
Comment on lines +2219 to +2222
```python
_input_callback(kwargs)
_success_callback(
    kwargs, MockResponsesResponse(), datetime.now(), datetime.now()
)
```
Contributor

With SDK tests we aim to verify that we generate some telemetry based on the user's interaction with the library. We want to assert the presence of telemetry if the patched library is used as the user would use the library.

Currently, we assert that telemetry is generated if _input_callback and _success_callback are each invoked exactly once.

This is not always the case, and the assumption has resulted in unhandled SDK exceptions that were fixed in the commit below:

96ebbf6

Comment on lines +2140 to +2170
```python
class MockResponsesUsage:
    def __init__(self, input_tokens=12, output_tokens=24, total_tokens=36):
        self.input_tokens = input_tokens
        self.output_tokens = output_tokens
        self.total_tokens = total_tokens


class MockResponsesContentItem:
    def __init__(self, text):
        self.type = "output_text"
        self.text = text


class MockResponsesOutputMessage:
    def __init__(self, text):
        self.type = "message"
        self.role = "assistant"
        self.content = [MockResponsesContentItem(text)]


class MockResponsesResponse:
    def __init__(
        self,
        model="gpt-4.1-nano",
        output=None,
        usage=None,
    ):
        self.id = "resp-test"
        self.model = model
        self.output = output or [MockResponsesOutputMessage("the model response")]
        self.usage = usage or MockResponsesUsage()
```
Contributor

Related to https://github.com/getsentry/sentry-python/pull/6205/changes#r3201008608, we should aim to avoid custom types in our test suites.

As soon as we introduce custom types, our tests are no longer coupled to the concrete types used in the library, and they no longer verify the SDK contract (namely, that telemetry is generated when the library is used the way a user would actually use it).

We can't hit real LLM APIs in the tests but we can do the next best thing: couple the sample response to the types in the library and patch at the lowest possible level.

This is done in most of the tests in this test file, and there are helpers in the repo to accomplish writing effective tests (such as get_model_response()).
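As a rough illustration of that direction (not the PR's actual code): build the sample response from litellm's own type instead of a hand-rolled mock class. The import path and the use of Pydantic's model_construct() here are assumptions; the repo helpers mentioned above would be the preferred route.

```python
# Sketch only: the import path is an assumption based on litellm's layout, and
# model_construct() is used purely to sidestep required fields in this example.
from litellm.types.llms.openai import ResponsesAPIResponse

sample_response = ResponsesAPIResponse.model_construct(
    id="resp-test",
    model="gpt-4.1-nano",
    output=[],  # would likewise be built from litellm's output/content types
    usage=None,
)
```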

Comment on lines +336 to 345
```diff
 if hasattr(response, "usage"):
     usage = response.usage
     record_token_usage(
         span,
-        input_tokens=getattr(usage, "prompt_tokens", None),
-        output_tokens=getattr(usage, "completion_tokens", None),
-        total_tokens=getattr(usage, "total_tokens", None),
+        input_tokens=_read_usage_field(usage, "prompt_tokens", "input_tokens"),
+        output_tokens=_read_usage_field(
+            usage, "completion_tokens", "output_tokens"
+        ),
+        total_tokens=_read_usage_field(usage, "total_tokens"),
     )
```
Contributor

We already probe above to determine which API is used.

As a result, one of the two reads (prompt_tokens or input_tokens) is always dead code once you know which API you are handling, which adds cognitive overhead for the reader.
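For illustration, a minimal sketch of what branching on the already-known API could look like; is_responses_api is a hypothetical flag standing in for the result of the probe above, and the record_token_usage import path reflects the SDK as I understand it:

```python
from sentry_sdk.ai.monitoring import record_token_usage

if is_responses_api:  # hypothetical flag set by the earlier probe
    input_tokens = getattr(usage, "input_tokens", None)
    output_tokens = getattr(usage, "output_tokens", None)
else:
    input_tokens = getattr(usage, "prompt_tokens", None)
    output_tokens = getattr(usage, "completion_tokens", None)

record_token_usage(
    span,
    input_tokens=input_tokens,
    output_tokens=output_tokens,
    total_tokens=getattr(usage, "total_tokens", None),
)
```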

```python
            set_data_normalized(
                span, SPANDATA.GEN_AI_RESPONSE_TEXT, response_messages
            )
    elif hasattr(response, "output"):
```
Contributor

You are adding code here which runs for all possible types of object that have an output field.

As a result, the branch can easily be triggered by accident as litellm evolves. There are multiple approaches to narrowing down whether you have a response in the Chat Completions API schema or one in the Responses API schema. For example, you can check

isinstance(response, (ResponsesAPIResponse, BaseResponsesAPIStreamingIterator))

based on the signature of the library function

https://github.com/BerriAI/litellm/blob/a67b7a7e87f11bed01f9e073125a7f8f180105a2/litellm/responses/main.py#L449.
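A small sketch of the narrower check; the import paths are assumptions inferred from the linked litellm source:

```python
from litellm.types.llms.openai import ResponsesAPIResponse  # assumed path
from litellm.responses.streaming_iterator import (  # assumed path
    BaseResponsesAPIStreamingIterator,
)

if isinstance(response, (ResponsesAPIResponse, BaseResponsesAPIStreamingIterator)):
    # Responses API schema: text lives under response.output[*].content[*].text
    ...
else:
    # Chat Completions schema: text lives under response.choices[*].message.content
    ...
```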

```python
    normalized = normalize_message_roles(input_messages)  # type: ignore[arg-type]
    messages_data = truncate_and_annotate_messages(normalized, span, scope)
    if messages_data is not None:
        set_data_normalized(
```
Contributor

Based on the marshaling above you know that messages_data is a list. You should just use span.set_data() when you know the type of an attribute (again, removing cognitive overhead by avoiding dead code).
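For example (sketch; the SPANDATA constant is assumed, since the excerpt above is truncated):

```python
if messages_data is not None:
    # messages_data is known to be a list at this point, so the plain setter is enough.
    span.set_data(SPANDATA.GEN_AI_REQUEST_MESSAGES, messages_data)
```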

Comment on lines +46 to +48
The usage object can be either a typed Pydantic model (attribute access) or
a plain dict (litellm hands us a dict for the assembled async-streaming
response), so we try both shapes.
Contributor

Why don't we just read from the dictionary in the asynchronous streaming scenario and otherwise access the attribute on the Pydantic model? 😄

These responses have types, so an isinstance check can tell you which branch you are in.

In the end we're developing against a library with a finite number of return types, and we should just check which case we are handling instead of probing around. Probing around is less robust, since new return types accidentally trigger hasattr() checks.
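A minimal sketch of the explicit split; the dict field names are assumptions, since the comment only states that the assembled async-streaming response arrives as a dict:

```python
if isinstance(usage, dict):
    # Assumed key names for the assembled async-streaming response.
    input_tokens = usage.get("prompt_tokens")
    output_tokens = usage.get("completion_tokens")
    total_tokens = usage.get("total_tokens")
else:
    # Typed Pydantic usage object: plain attribute access.
    input_tokens = usage.prompt_tokens
    output_tokens = usage.completion_tokens
    total_tokens = usage.total_tokens
```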

```python
                for content_item in getattr(output, "content", []) or []:
                    text = getattr(content_item, "text", None)
                    if text is not None:
                        output_text.append(text)
```
Contributor

This has reached a lot of indentation for Python code. Usually you can keep code readable by adding early returns or breaking up into functions where appropriate.
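One possible shape (illustrative names, not the PR's actual code): pull the inner loop into a helper and use early continues to keep the nesting shallow.

```python
def _collect_output_text(output):
    # Flatten the nested loop into a single level with early continues.
    texts = []
    for item in getattr(output, "content", None) or []:
        text = getattr(item, "text", None)
        if text is None:
            continue
        texts.append(text)
    return texts
```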
