Skip to content

fix(backend): tag BYOK LLM errors by source#7321

Open
apoorvdarshan wants to merge 1 commit into
BasedHardware:mainfrom
apoorvdarshan:fix/byok-llm-error-logging
Open

fix(backend): tag BYOK LLM errors by source#7321
apoorvdarshan wants to merge 1 commit into
BasedHardware:mainfrom
apoorvdarshan:fix/byok-llm-error-logging

Conversation

@apoorvdarshan
Copy link
Copy Markdown

Summary

  • add BYOK uid context propagation for HTTP, WebSocket, and background processing
  • log LLM failures with platform vs BYOK source, provider, feature, model, operation, uid, status, and reason
  • attach LangChain error callbacks to OpenAI, OpenRouter, Gemini, and embedding clients
  • offload async Anthropic-agent error handling to the shared executor so logging work does not block the event loop

Closes #7290

Verification

  • PATH=/tmp/omi-backend-py311/bin:$PATH ./test-preflight.sh
  • PATH=/tmp/omi-backend-py311/bin:$PATH pytest tests/unit/test_byok_llm_logging.py -q

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 15, 2026

Greptile Summary

This PR adds structured BYOK vs. platform source tagging to LLM error logs across HTTP, WebSocket, and background processing paths. A new byok_errors module provides classify_byok_llm_error and handle_llm_error, a _LLMErrorCallback LangChain handler wires them into OpenAI/OpenRouter/Gemini/embedding clients, and a _byok_uid_ctx ContextVar propagates the authenticated uid through async tasks and thread-pool executors.

  • byok.py: adds _byok_uid_ctx ContextVar with get/set helpers; BYOKMiddleware, validate_byok_request, and validate_byok_websocket all reset/set it correctly.
  • byok_errors.py: new module classifies errors (401 → invalid, 403 → permission, 429+quota-keyword → quota) and logs a structured 10-field message; handle_llm_error_async offloads work to storage_executor via submit_with_context to preserve request context.
  • clients.py: introduces _with_llm_callbacks to uniformly attach the usage callback and the new error callback; wraps _OpenAIEmbeddingsProxy embedding methods directly since OpenAIEmbeddings has no LangChain callback mechanism.

Confidence Score: 4/5

Safe to merge; changes are observability-only and do not alter any request path or data handling logic.

The ContextVar plumbing is correct across all three entry points. The main gaps are that Anthropic credit-exhaustion 429s don't contain the word 'quota' so they log reason='unknown', and platform client factories register callbacks without feature context so platform failures always log feature='unknown'. Neither affects correctness.

backend/utils/llm/byok_errors.py (quota keyword coverage) and backend/utils/llm/clients.py (feature context for platform callbacks)

Important Files Changed

Filename Overview
backend/utils/llm/byok_errors.py New module: structured logging for LLM failures with BYOK/platform source tagging; classification logic may miss non-OpenAI quota error messages
backend/utils/llm/clients.py Adds _LLMErrorCallback LangChain handler and _with_llm_callbacks helper; platform-level clients use feature='' losing feature context in error logs
backend/utils/byok.py Adds _byok_uid_ctx ContextVar and get/set helpers; BYOKMiddleware correctly resets uid token on every request
backend/routers/pusher.py Propagates uid to the byok context in _process_conversation_task when BYOK keys are present
backend/routers/sync.py Background pipeline sets byok_uid when BYOK keys are present and always resets to None in the finally block
backend/utils/retrieval/agentic.py Replaces bare logger.error with handle_llm_error_async for Anthropic agent errors
backend/tests/unit/test_byok_llm_logging.py New unit tests cover classification, source tagging, uid recording, and callback wiring; only OpenAI-style error shapes are exercised

Comments Outside Diff (1)

  1. backend/utils/llm/clients.py, line 480-493 (link)

    P2 Platform LLM errors always log feature='unknown'

    _get_or_create_openai_llm, _get_or_create_openrouter_llm, _get_or_create_gemini_llm, and llm_mini all call _with_llm_callbacks with the default feature='', which handle_llm_error maps to 'unknown'. Because client instances are cached before any request arrives, there is no opportunity to inject a per-feature callback at creation time. Every platform LLM failure in production logs feature=unknown, significantly reducing the usefulness of the structured log field for platform-side triage.

Reviews (1): Last reviewed commit: "fix(backend): tag BYOK LLM errors by sou..." | Re-trigger Greptile

Comment on lines +11 to +14
_QUOTA_ERROR_NAMES = frozenset({'RateLimitError'})


def get_llm_error_source(provider: Optional[str]) -> str:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 The _QUOTA_ERROR_NAMES frozenset is named after quota errors but contains RateLimitError, which is a superset of quota failures. More critically, Anthropic's credit-exhaustion 429 response body typically reads "Your credit balance is too low…" — the word "quota" never appears. As a result, an Anthropic BYOK user who runs out of credits will log reason='unknown' instead of reason='quota', defeating the classification for the one provider wired to this path via handle_llm_error_async. Adding a 'credit' / 'balance' keyword check (and optionally renaming the frozenset) would fix the gap.

Suggested change
_QUOTA_ERROR_NAMES = frozenset({'RateLimitError'})
def get_llm_error_source(provider: Optional[str]) -> str:
_RATE_LIMIT_ERROR_NAMES = frozenset({'RateLimitError'})
_QUOTA_KEYWORDS = frozenset({'insufficient_quota', 'quota', 'credit', 'balance'})
def get_llm_error_source(provider: Optional[str]) -> str:

Comment on lines +31 to +33
if status_code == 429 or error_name in _QUOTA_ERROR_NAMES:
if 'insufficient_quota' in error_text or 'quota' in error_text:
return 'quota'
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 If the rename above is applied, the 429-branch should use the updated name and keyword set to catch Anthropic credit errors.

Suggested change
if status_code == 429 or error_name in _QUOTA_ERROR_NAMES:
if 'insufficient_quota' in error_text or 'quota' in error_text:
return 'quota'
if status_code == 429 or error_name in _RATE_LIMIT_ERROR_NAMES:
if any(kw in error_text for kw in _QUOTA_KEYWORDS):
return 'quota'

Comment on lines +39 to +46
_llm_error_callbacks: Dict[Tuple[str, str, str], _LLMErrorCallback] = {}


def _get_llm_error_callback(provider: str, model: str = '', feature: str = '') -> _LLMErrorCallback:
key = (provider, model, feature)
if key not in _llm_error_callbacks:
_llm_error_callbacks[key] = _LLMErrorCallback(provider, model=model, feature=feature)
return _llm_error_callbacks[key]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Unbounded module-level callback cache

_llm_error_callbacks has no maximum size. While the set of (provider, model, feature) tuples is small in practice today, it will silently grow indefinitely as new models or features are introduced, and there is no eviction path. Consider using a bounded structure (e.g. TTLCache) consistent with the caches already used in this file, or at minimum document the expected upper bound.

Comment on lines +66 to +78
async def handle_llm_error_async(
error: Exception,
provider: Optional[str],
feature: Optional[str] = None,
model: Optional[str] = None,
operation: str = 'chat',
) -> None:
"""Run LLM error handling off the event loop while preserving BYOK context."""
future = submit_with_context(storage_executor, handle_llm_error, error, provider, feature, model, operation)
try:
await asyncio.wrap_future(future)
except Exception as e:
logger.error('Async LLM error handler failed provider=%s feature=%s: %s', provider, feature, e)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 storage_executor is the wrong executor for error logging

storage_executor is documented and dimensioned (16 threads) for audio file pre-caching and GCS operations. Submitting logging work there mixes latency classes and could steal threads from ongoing audio uploads during busy periods. postprocess_executor or the default loop.run_in_executor(None, ...) would be more appropriate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

backend: distinguish platform vs BYOK errors in LLM logging

1 participant