fix(telemetry): surface FallbackAdapter active model/provider on parent spans (#1341)

Open
mrniket wants to merge 2 commits into livekit:main from lottiehq-oss:claude/competent-beaver-bc8c43

Conversation

@mrniket
Contributor

@mrniket mrniket commented Apr 29, 2026

Summary

llm.FallbackAdapter / tts.FallbackAdapter / stt.FallbackAdapter currently leave their wrapper labels (FallbackAdapter, inference.STT, livekit) on the parent spans (llm_node, tts_node, user_turn), so when a fallback fires the trace can't tell you which provider actually handled the turn — gen_ai.request.model / gen_ai.provider.name are stuck on the wrapper.

Separately, llm_node and the inner llm_request spans both carried gen_ai.usage.* and were both shaped like generation spans, so any tracing backend that infers cost from observation type ended up counting the same call 2-3 times across wrapper + provider layers.

This PR fixes both.

Trace propagation

  • llm/fallback_adapter: capture the caller span on construction. On first successful chunk, write gen_ai.request.model / gen_ai.provider.name back onto the caller span, the inner llm_request span, and the run span. Wrap each attempt in an llm_fallback_attempt span with attempt index + model/provider.
  • tts/fallback_adapter: same caller-span propagation pattern.
  • stt/fallback_adapter:
    • Track _activeStt (set when a child stream produces events or recognize() succeeds) and expose it via label / model / provider getters so external callers reading the wrapper see the active child.
    • Wrap each attempt in stt_fallback_recognize_attempt / stt_fallback_stream_attempt spans.
  • voice/agent_activity + voice/audio_recognition: thread the STT instance into AudioRecognition so user_turn re-reads model/provider on each STT event (FallbackAdapter only knows its active child after the first event lands). Idempotent — skips setAttribute if the value hasn't changed.
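The caller-span write-back pattern above can be sketched as follows. This is a minimal, self-contained illustration: `Span` here is a local stand-in for the OpenTelemetry `Span` interface, and `FallbackStreamSketch` / `makeSpan` are hypothetical names, not the actual livekit/agents classes.

```typescript
// Minimal stand-ins for OpenTelemetry types so the sketch is self-contained.
type Attrs = Record<string, string | number>;
interface Span {
  attributes: Attrs;
  setAttribute(key: string, value: string | number): void;
}

const makeSpan = (): Span => {
  const attributes: Attrs = {};
  return {
    attributes,
    setAttribute(key, value) {
      attributes[key] = value;
    },
  };
};

// The fallback stream captures the parent (caller) span at construction
// time; on the first successful chunk from a child provider, it stamps the
// real provider's identity onto that span, replacing the wrapper labels.
class FallbackStreamSketch {
  constructor(private callerSpan: Span) {}

  onFirstChunk(model: string, provider: string) {
    this.callerSpan.setAttribute('gen_ai.request.model', model);
    this.callerSpan.setAttribute('gen_ai.provider.name', provider);
  }
}

const llmNode = makeSpan(); // stands in for the llm_node span
const stream = new FallbackStreamSketch(llmNode);
stream.onFirstChunk('gpt-4o-mini', 'openai');
// llmNode.attributes now carries the active provider's model + name
// instead of the FallbackAdapter wrapper labels.
```

The same write is made idempotent in the real change: `setAttribute` is skipped when the value hasn't changed, so repeated STT events don't churn the `user_turn` span.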

Cost attribution

  • voice/generation: capture final ChatChunk.usage and stamp exact prompt/completion tokens on the llm_node span, classified as generation. This becomes the single billable layer per LLM turn — backends that estimate tokens from prompt text no longer diverge from the provider's own billing.
  • llm/llm + tts/tts: classify llm_request / tts_request spans as span (not generation) so wrapper + inner-provider layers aren't counted as separate cost centres. LiveKit Cloud is unaffected — gen_ai.usage.* is still emitted on the inner spans for backends that read it directly. Made _llmRequestSpan / _ttsRequestSpan protected so FallbackAdapter subclasses can write through.
  • telemetry/trace_types: add a new observation-type attribute (matches the existing naming convention in this file) plus ATTR_FALLBACK_ATTEMPT_INDEX.
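A rough sketch of the single-billable-layer idea: exact usage from the final `ChatChunk` is stamped once on `llm_node` (typed `generation`), while inner `llm_request` spans keep `gen_ai.usage.*` but are typed as plain spans. The attribute key `observation.type` and the function names below are illustrative assumptions, not the exact constants in `trace_types`.

```typescript
// Self-contained sketch; Attrs stands in for a span's attribute bag.
interface Usage {
  promptTokens: number;
  completionTokens: number;
}
type Attrs = Record<string, string | number>;

// llm_node: the one generation-typed span a cost-inferring backend bills.
function stampLlmNode(attrs: Attrs, usage: Usage) {
  attrs['observation.type'] = 'generation'; // single billable layer
  attrs['gen_ai.usage.input_tokens'] = usage.promptTokens;
  attrs['gen_ai.usage.output_tokens'] = usage.completionTokens;
}

// llm_request: usage still emitted for backends that read it directly,
// but typed as a plain span so it isn't counted as a second cost centre.
function stampLlmRequest(attrs: Attrs, usage: Usage) {
  attrs['observation.type'] = 'span';
  attrs['gen_ai.usage.input_tokens'] = usage.promptTokens;
  attrs['gen_ai.usage.output_tokens'] = usage.completionTokens;
}

const node: Attrs = {};
const inner: Attrs = {};
const usage = { promptTokens: 1200, completionTokens: 85 };
stampLlmNode(node, usage);
stampLlmRequest(inner, usage);

// A backend that bills generation-typed spans now sees exactly one layer.
const billable = [node, inner].filter(
  (a) => a['observation.type'] === 'generation',
).length;
console.log(billable); // 1
```

Under the old shape both layers would have matched the `generation` filter, which is where the ~3x overcount came from.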

Verified

End-to-end against a real call:

| Metric | Before | After |
| --- | --- | --- |
| `llm_node` model | `FallbackAdapter` | active provider model |
| `user_turn` model | `inference.STT` | active STT (e.g. `assemblyai/u3-rt-pro`) |
| Billable LLM layers | 3 (`llm_node` + 2x `llm_request`) | 1 (`llm_node` only) |
| Per-turn cost vs provider math | ~3x over | exact |

Test plan

  • pnpm build:agents clean
  • pnpm test — all 29 fallback-adapter tests pass (9 LLM + STT + TTS)
  • pnpm format:check clean
  • Verified live in production for several days as a vendored patch on @livekit/agents@1.2.7 before this PR
  • Reviewer eyes on the protected _llmRequestSpan / _ttsRequestSpan exposure — open to a different shape if you'd rather keep them private

@changeset-bot

changeset-bot Bot commented Apr 29, 2026

⚠️ No Changeset found

Latest commit: 94e9807

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


@mrniket force-pushed the claude/competent-beaver-bc8c43 branch from 9abe920 to 3d3cbce on April 30, 2026 13:14
mrniket added 2 commits April 30, 2026 14:15
…nt spans

When `llm.FallbackAdapter`, `tts.FallbackAdapter`, or `stt.FallbackAdapter`
wraps multiple providers, parent spans (`llm_node`, `tts_node`, `user_turn`)
were stamped with the wrapper labels (`FallbackAdapter`, `inference.STT`,
`livekit`) instead of the provider that actually handled the request. This
made `gen_ai.request.model` / `gen_ai.provider.name` useless for telemetry
consumers when fallbacks were in play.

Changes:

- llm/fallback_adapter: capture caller span on construction; on first
  successful chunk, write `gen_ai.request.model` / `gen_ai.provider.name`
  back onto the caller span, the inner `llm_request` span, and the run
  span. Wrap each attempt in an `llm_fallback_attempt` span carrying the
  attempt index plus model/provider.
- tts/fallback_adapter: same propagation pattern via captured caller span.
- stt/fallback_adapter:
  - track `_activeStt` set when a child stream produces events or
    `recognize()` succeeds; expose it via `label` / `model` / `provider`
    getters so callers reading the wrapper see the active child.
  - wrap each `recognize()` and stream attempt in
    `stt_fallback_recognize_attempt` / `stt_fallback_stream_attempt`
    spans with attempt index + model/provider.
- voice/agent_activity + audio_recognition: thread the `STT` instance
  into AudioRecognition so `user_turn` re-reads the active model/provider
  on each STT event. Skip `setAttribute` when nothing changed.

Cost attribution:

- voice/generation: capture final `ChatChunk.usage` and stamp exact
  prompt/completion tokens on the `llm_node` span, classified as
  `generation`. This becomes the single billable layer for an LLM turn,
  so tracing backends that infer cost from observation type don't fall
  back to a local-tokenizer estimate of the prompt text.
- llm/llm + tts/tts: classify `llm_request` / `tts_request` spans as
  `span` (not `generation`) so wrapper + provider layers aren't double-
  counted as separate cost centres. Made `_llmRequestSpan` /
  `_ttsRequestSpan` `protected` so subclass implementations can write
  through to them.
- LiveKit Cloud is unaffected: `gen_ai.usage.*` is still emitted on the
  inner `llm_request` / `tts_request` spans for backends that read it
  directly.
- telemetry/trace_types: add a new observation-type attribute (matches
  the existing naming convention in this file) plus
  `ATTR_FALLBACK_ATTEMPT_INDEX`.

Verified end-to-end against a real call — `llm_node` model now reads the
active provider model (was `FallbackAdapter`), `user_turn` model reads
the active STT (was `inference.STT`), per-turn cost matches exact
provider math (was ~3x over).
Brings in 31 upstream commits since the branch diverged. Two real conflicts:

- agents/src/llm/llm.ts: upstream added `#providerRequestIds`; this branch
  made `_llmRequestSpan` protected (so FallbackLLMStream can write through).
  Kept both — protected `_llmRequestSpan` plus private `#providerRequestIds`.
- agents/src/voice/audio_recognition.ts: upstream added requestId collection
  in `onSTTEvent`; this branch added `refreshUserTurnSttAttributes()` at the
  same spot for FallbackAdapter live-update. Kept both, refresh first.

Other files (tts.ts, generation.ts, agent_activity.ts, trace_types.ts) auto-
merged cleanly — upstream's `#providerRequestIds` field on tts.ts coexists
with this branch's protected `_ttsRequestSpan` the same way as llm.ts.

# Conflicts:
#	agents/src/llm/llm.ts
#	agents/src/voice/audio_recognition.ts
@mrniket force-pushed the claude/competent-beaver-bc8c43 branch from 3d3cbce to 94e9807 on April 30, 2026 13:16
@mrniket mrniket marked this pull request as ready for review April 30, 2026 13:20

@devin-ai-integration (bot) left a comment


✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 5 additional findings.


