fix(backend): tag BYOK LLM errors by source by apoorvdarshan · Pull Request #7321 · BasedHardware/omi

apoorvdarshan · 2026-05-15T18:27:24Z

Summary

add BYOK uid context propagation for HTTP, WebSocket, and background processing
log LLM failures with platform vs BYOK source, provider, feature, model, operation, uid, status, and reason
attach LangChain error callbacks to OpenAI, OpenRouter, Gemini, and embedding clients
offload async Anthropic-agent error handling to the shared executor so logging work does not block the event loop

Verification

PATH=/tmp/omi-backend-py311/bin:$PATH ./test-preflight.sh
PATH=/tmp/omi-backend-py311/bin:$PATH pytest tests/unit/test_byok_llm_logging.py -q

greptile-apps · 2026-05-15T18:35:33Z

Greptile Summary

This PR adds structured BYOK vs. platform source tagging to LLM error logs across HTTP, WebSocket, and background processing paths. A new byok_errors module provides classify_byok_llm_error and handle_llm_error, a _LLMErrorCallback LangChain handler wires them into OpenAI/OpenRouter/Gemini/embedding clients, and a _byok_uid_ctx ContextVar propagates the authenticated uid through async tasks and thread-pool executors.

byok.py: adds _byok_uid_ctx ContextVar with get/set helpers; BYOKMiddleware, validate_byok_request, and validate_byok_websocket all reset/set it correctly.
byok_errors.py: new module classifies errors (401 → invalid, 403 → permission, 429+quota-keyword → quota) and logs a structured 10-field message; handle_llm_error_async offloads work to storage_executor via submit_with_context to preserve request context.
clients.py: introduces _with_llm_callbacks to uniformly attach the usage callback and the new error callback; wraps _OpenAIEmbeddingsProxy embedding methods directly since OpenAIEmbeddings has no LangChain callback mechanism.

Confidence Score: 4/5

Safe to merge; changes are observability-only and do not alter any request path or data handling logic.

The ContextVar plumbing is correct across all three entry points. The main gaps are that Anthropic credit-exhaustion 429s don't contain the word 'quota' so they log reason='unknown', and platform client factories register callbacks without feature context so platform failures always log feature='unknown'. Neither affects correctness.

backend/utils/llm/byok_errors.py (quota keyword coverage) and backend/utils/llm/clients.py (feature context for platform callbacks)

Important Files Changed

Filename	Overview
backend/utils/llm/byok_errors.py	New module: structured logging for LLM failures with BYOK/platform source tagging; classification logic may miss non-OpenAI quota error messages
backend/utils/llm/clients.py	Adds _LLMErrorCallback LangChain handler and _with_llm_callbacks helper; platform-level clients use feature='' losing feature context in error logs
backend/utils/byok.py	Adds _byok_uid_ctx ContextVar and get/set helpers; BYOKMiddleware correctly resets uid token on every request
backend/routers/pusher.py	Propagates uid to the byok context in _process_conversation_task when BYOK keys are present
backend/routers/sync.py	Background pipeline sets byok_uid when BYOK keys are present and always resets to None in the finally block
backend/utils/retrieval/agentic.py	Replaces bare logger.error with handle_llm_error_async for Anthropic agent errors
backend/tests/unit/test_byok_llm_logging.py	New unit tests cover classification, source tagging, uid recording, and callback wiring; only OpenAI-style error shapes are exercised

Comments Outside Diff (1)

backend/utils/llm/clients.py, line 480-493 (link)

Platform LLM errors always log feature='unknown'

_get_or_create_openai_llm, _get_or_create_openrouter_llm, _get_or_create_gemini_llm, and llm_mini all call _with_llm_callbacks with the default feature='', which handle_llm_error maps to 'unknown'. Because client instances are cached before any request arrives, there is no opportunity to inject a per-feature callback at creation time. Every platform LLM failure in production logs feature=unknown, significantly reducing the usefulness of the structured log field for platform-side triage.

_{Reviews (1): Last reviewed commit: "fix(backend): tag BYOK LLM errors by sou..." | Re-trigger Greptile}

greptile-apps · 2026-05-15T18:35:37Z

+_QUOTA_ERROR_NAMES = frozenset({'RateLimitError'})
+
+
+def get_llm_error_source(provider: Optional[str]) -> str:


The _QUOTA_ERROR_NAMES frozenset is named after quota errors but contains RateLimitError, which is a superset of quota failures. More critically, Anthropic's credit-exhaustion 429 response body typically reads "Your credit balance is too low…" — the word "quota" never appears. As a result, an Anthropic BYOK user who runs out of credits will log reason='unknown' instead of reason='quota', defeating the classification for the one provider wired to this path via handle_llm_error_async. Adding a 'credit' / 'balance' keyword check (and optionally renaming the frozenset) would fix the gap.

Suggested change

_QUOTA_ERROR_NAMES = frozenset({'RateLimitError'})

def get_llm_error_source(provider: Optional[str]) -> str:

_RATE_LIMIT_ERROR_NAMES = frozenset({'RateLimitError'})

_QUOTA_KEYWORDS = frozenset({'insufficient_quota', 'quota', 'credit', 'balance'})

def get_llm_error_source(provider: Optional[str]) -> str:

greptile-apps · 2026-05-15T18:35:38Z

+    if status_code == 429 or error_name in _QUOTA_ERROR_NAMES:
+        if 'insufficient_quota' in error_text or 'quota' in error_text:
+            return 'quota'


If the rename above is applied, the 429-branch should use the updated name and keyword set to catch Anthropic credit errors.

Suggested change

if status_code == 429 or error_name in _QUOTA_ERROR_NAMES:

if 'insufficient_quota' in error_text or 'quota' in error_text:

return 'quota'

if status_code == 429 or error_name in _RATE_LIMIT_ERROR_NAMES:

if any(kw in error_text for kw in _QUOTA_KEYWORDS):

return 'quota'

greptile-apps · 2026-05-15T18:35:39Z

+_llm_error_callbacks: Dict[Tuple[str, str, str], _LLMErrorCallback] = {}
+
+
+def _get_llm_error_callback(provider: str, model: str = '', feature: str = '') -> _LLMErrorCallback:
+    key = (provider, model, feature)
+    if key not in _llm_error_callbacks:
+        _llm_error_callbacks[key] = _LLMErrorCallback(provider, model=model, feature=feature)
+    return _llm_error_callbacks[key]


Unbounded module-level callback cache

_llm_error_callbacks has no maximum size. While the set of (provider, model, feature) tuples is small in practice today, it will silently grow indefinitely as new models or features are introduced, and there is no eviction path. Consider using a bounded structure (e.g. TTLCache) consistent with the caches already used in this file, or at minimum document the expected upper bound.

greptile-apps · 2026-05-15T18:35:43Z

+async def handle_llm_error_async(
+    error: Exception,
+    provider: Optional[str],
+    feature: Optional[str] = None,
+    model: Optional[str] = None,
+    operation: str = 'chat',
+) -> None:
+    """Run LLM error handling off the event loop while preserving BYOK context."""
+    future = submit_with_context(storage_executor, handle_llm_error, error, provider, feature, model, operation)
+    try:
+        await asyncio.wrap_future(future)
+    except Exception as e:
+        logger.error('Async LLM error handler failed provider=%s feature=%s: %s', provider, feature, e)


storage_executor is the wrong executor for error logging

storage_executor is documented and dimensioned (16 threads) for audio file pre-caching and GCS operations. Submitting logging work there mixes latency classes and could steal threads from ongoing audio uploads during busy periods. postprocess_executor or the default loop.run_in_executor(None, ...) would be more appropriate.

fix(backend): tag BYOK LLM errors by source

2726349

apoorvdarshan mentioned this pull request May 15, 2026

fix(backend): surface BYOK LLM failures and chat fallback #7320

Closed

greptile-apps Bot reviewed May 15, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(backend): tag BYOK LLM errors by source#7321

fix(backend): tag BYOK LLM errors by source#7321
apoorvdarshan wants to merge 1 commit into
BasedHardware:mainfrom
apoorvdarshan:fix/byok-llm-error-logging

apoorvdarshan commented May 15, 2026

Uh oh!

greptile-apps Bot commented May 15, 2026 •

edited

Loading

Comments Outside Diff (1)

Uh oh!

greptile-apps Bot May 15, 2026

Uh oh!

greptile-apps Bot May 15, 2026

Uh oh!

greptile-apps Bot May 15, 2026

Uh oh!

greptile-apps Bot May 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

		_QUOTA_ERROR_NAMES = frozenset({'RateLimitError'})


		def get_llm_error_source(provider: Optional[str]) -> str:

Conversation

apoorvdarshan commented May 15, 2026

Summary

Verification

Uh oh!

greptile-apps Bot commented May 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Comments Outside Diff (1)

Uh oh!

greptile-apps Bot May 15, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot May 15, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot May 15, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot May 15, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

greptile-apps Bot commented May 15, 2026 •

edited

Loading