Investigate increased event frequency after upgrade #8

Closed
mfittko wants to merge 2 commits into sofatutor-tweaks from cursor/investigate-increased-event-frequency-after-upgrade-gpt-5-codex-e384

Conversation

@mfittko mfittko commented Nov 28, 2025

Title

Fix: Prevent TypeError in daily spend sorting with None values

Relevant issues

N/A

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have added testing in the tests/litellm/ directory (adding at least 1 test is a hard requirement - see details)
  • I have added a screenshot of my new test passing locally
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible, it only solves 1 specific problem

Type

🐛 Bug Fix

Changes

The daily spend batch processor was crashing with a TypeError when sorting transactions. This occurred because optional fields like model or custom_llm_provider could be None (a scenario that became more frequent after a recent upgrade), and Python's sort cannot compare None with strings.

This change normalizes the sort key for daily spend transactions by coercing potentially None string fields to empty strings ("") before sorting. This ensures robust sorting even when optional fields are absent. Additionally, mcp_namespaced_tool_name has been included in the sort key for more deterministic ordering.



Co-authored-by: github <github@mfittko.com>

cursor Bot commented Nov 28, 2025

Cursor Agent can help with this pull request. Just @cursor in comments and I'll start working on changes in this branch.

@mfittko mfittko marked this pull request as ready for review November 28, 2025 14:42
@mfittko mfittko self-assigned this Nov 28, 2025
@github-actions

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@github-actions github-actions Bot added the Stale label Mar 19, 2026
@github-actions github-actions Bot closed this Mar 26, 2026
mfittko pushed a commit that referenced this pull request Mar 26, 2026
…onse IDs

Addresses 4 critical OpenTelemetry span issues in LiteLLM:

Issue #3: Remove redundant attributes from raw_gen_ai_request spans
- Removed self.set_attributes() call that was duplicating all parent span
  attributes (gen_ai.*, metadata.*) onto the raw span
- Raw span now only contains provider-specific llm.{provider}.* attributes
- Reduces storage and eliminates search confusion from duplicate data

Issue #4: Prevent attribute duplication on litellm_proxy_request parent span
- When litellm_request child span exists, removed redundant
  set_attributes() call on the parent proxy span
- Child span already carries all attributes; parent duplication doubles
  storage and complicates search

Issue #5: Fix orphaned guardrail traces
- Guardrail spans were created with context=None when no parent proxy span
  existed, resulting in orphaned root spans (separate trace_id)
- Added _resolve_guardrail_context() helper to ensure guardrails always
  have a valid parent (litellm_request or proxy span)
- Applied fix to both _handle_success and _handle_failure paths
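
A dependency-free sketch of the parent selection described under Issue #5. It is not the `_resolve_guardrail_context()` helper from the commit: `FakeSpan` and `resolve_guardrail_parent` are hypothetical stand-ins, and preferring the litellm_request span over the proxy span is an assumption, since the commit message does not state the order.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeSpan:
    """Stand-in for a tracing span; not an OpenTelemetry type."""

    name: str
    trace_id: str


def resolve_guardrail_parent(
    litellm_request_span: Optional[FakeSpan],
    proxy_span: Optional[FakeSpan],
) -> Optional[FakeSpan]:
    """Pick a parent for a guardrail span.

    Assumed order: litellm_request span first, then the proxy span.
    Returning None here corresponds to the orphaned-root case.
    """
    if litellm_request_span is not None:
        return litellm_request_span
    return proxy_span


request_span = FakeSpan(name="litellm_request", trace_id="trace-1")
parent = resolve_guardrail_parent(request_span, None)
```

With a parent resolved this way, the guardrail span would share the parent's trace ID instead of starting a separate trace.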

Issue #8: Add gen_ai.response.id for embeddings and image generation
- EmbeddingResponse and ImageResponse types don't have provider response IDs
- Added fallback to standard_logging_payload["id"] (litellm call ID) for
  correlation across LiteLLM UI, Phoenix traces, and provider logs
- Completions still use provider ID (e.g. "chatcmpl-xxx") when available
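
The fallback described under Issue #8 might look roughly like the sketch below. The function name `resolve_response_id` is hypothetical, and reading the provider ID from an `id` attribute on the response object is an assumption; only the `standard_logging_payload["id"]` fallback comes from the commit message.

```python
from types import SimpleNamespace
from typing import Any, Mapping, Optional


def resolve_response_id(
    response: Any, standard_logging_payload: Mapping[str, Any]
) -> Optional[str]:
    """Return the provider response ID if present, else the logged call ID."""
    provider_id = getattr(response, "id", None)  # assumed attribute name
    if provider_id:
        return provider_id
    # Embedding and image responses carry no provider ID, so fall back.
    return standard_logging_payload.get("id")


payload = {"id": "litellm-call-123"}
completion = SimpleNamespace(id="chatcmpl-xxx")
embedding = SimpleNamespace(data=[])  # no id attribute

completion_id = resolve_response_id(completion, payload)
embedding_id = resolve_response_id(embedding, payload)
```

Here `completion_id` keeps the provider value, while `embedding_id` falls back to the call ID from the payload.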

Tests added:
- TestRawSpanAttributeIsolation: Verify raw span has no gen_ai/metadata attrs
- TestNoParentSpanDuplication: Verify parent span doesn't get duplicated attrs
- TestGuardrailSpanParenting: Verify guardrails are children (not orphaned)
- TestResponseIdFallback: Verify response ID set for all call types

All existing OTEL tests pass (73 passed, 14 pre-existing protocol failures).

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>