Skip to content

fix(backend): surface BYOK LLM failures and chat fallback#7320

Closed
apoorvdarshan wants to merge 1 commit into
BasedHardware:mainfrom
apoorvdarshan:fix/llm-byok-chat-provider
Closed

fix(backend): surface BYOK LLM failures and chat fallback#7320
apoorvdarshan wants to merge 1 commit into
BasedHardware:mainfrom
apoorvdarshan:fix/llm-byok-chat-provider

Conversation

@apoorvdarshan
Copy link
Copy Markdown

Summary

  • add source-aware LLM error logging and rate-limited BYOK push notifications for invalid, denied, or exhausted user keys
  • propagate BYOK uid context into request and background processing so provider failures can be tied to the right user
  • add CHAT_PROVIDER=openai LangGraph fallback for agentic chat and lazy Anthropic client construction when Anthropic env is unavailable
  • clean up VAD hosted calls to use httpx and move account deletion background wipe onto the shared storage executor

Closes #7290
Closes #7291
Closes #7238

Verification

  • PATH=/tmp/omi-backend-py311/bin:$PATH ./test-preflight.sh
  • PATH=/tmp/omi-backend-py311/bin:$PATH ./test.sh

Note: test.sh completed with the existing fair-use integration block skipped because Redis is not available locally.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 15, 2026

Greptile Summary

This PR introduces source-aware BYOK LLM error logging with rate-limited push notifications, propagates BYOK UID context into background and async processing, and adds a CHAT_PROVIDER=openai LangGraph/ReAct fallback path alongside the existing Anthropic agent.

  • byok_errors.py + clients.py: New handle_llm_error + _LLMErrorCallback surface 401/403/429-quota errors to BYOK users via FCM; _AnthropicClientProxy defers client construction until first call so missing ANTHROPIC_API_KEY doesn't crash at import time.
  • agentic.py: Adds a full _execute_agentic_chat_stream_openai path using LangGraph create_react_agent; the existing Anthropic error handler is upgraded to call handle_llm_error; CHAT_PROVIDER env var selects the active backend at startup.
  • byok.py / sync.py / pusher.py: _byok_uid_ctx ContextVar is threaded through HTTP, WebSocket, and background-thread request contexts so provider errors can be tied back to the originating user.

Confidence Score: 3/5

Safe to merge for most users, but the Anthropic chat error handler now makes synchronous Firestore + FCM network calls inside an async function, which stalls the event loop on the first BYOK failure for each user.

The UID context propagation and Redis lock are solid, and the notification deduplication logic is correct. The concern is handle_llm_error being called at line 438 of agentic.py inside an async def — the downstream path does a Firestore read, a Redis write, and an FCM batch send all synchronously, with no executor offload. This blocks the event loop for every BYOK user who hits a broken key, which is the exact scenario this PR aims to handle more gracefully. The new OpenAI stream path also has a gap: its top-level exception handler doesn't call handle_llm_error, and the outer generator's non-cancel exception path doesn't cancel the background LangGraph task.

backend/utils/retrieval/agentic.py warrants the closest look — the Anthropic error handler's sync I/O and the OpenAI stream's missing error surfacing and task-cancel gap are all in this file.

Important Files Changed

Filename Overview
backend/utils/retrieval/agentic.py Adds new OpenAI/LangGraph ReAct agent path and CHAT_PROVIDER switch. The Anthropic error handler now calls handle_llm_error but does so synchronously in an async context; the new OpenAI stream error handler doesn't call handle_llm_error; and the outer generator's non-cancel exception path doesn't cancel its background task.
backend/utils/llm/byok_errors.py New module that classifies BYOK LLM errors and sends rate-limited FCM push notifications. Logic is clean; all I/O is synchronous (Firestore + Redis + FCM), which becomes a problem when called from async callers.
backend/utils/llm/clients.py Adds _LLMErrorCallback and _AnthropicClientProxy with lazy construction. Error callback registration and BYOK proxy pattern look correct; _AnthropicClientProxy._get_default() avoids creating the client at import time when ANTHROPIC_API_KEY is absent.
backend/utils/byok.py Adds _byok_uid_ctx ContextVar and get/set_byok_uid helpers. Both validate_byok_request and validate_byok_websocket now call set_byok_uid; the middleware correctly resets and restores both tokens in its finally block.
backend/database/redis_db.py Adds try_acquire_byok_llm_error_notification_lock with a 24-hour TTL using SET NX. Implementation is correct; no issues found.
backend/routers/sync.py Propagates BYOK UID into background pipeline via set_byok_uid; correctly clears it in the finally block alongside set_byok_keys.
backend/routers/pusher.py Sets BYOK UID in the async background task context when BYOK keys are present; correct since asyncio task context is independent of the request context.
backend/routers/users.py Replaces bare threading.Thread for account deletion wipe with submit_with_context(storage_executor, ...), moving the heavy background job onto the shared storage thread pool.
backend/utils/stt/vad.py Switches the hosted VAD call from requests to httpx; timeout is passed as a float (300.0) which is correct for httpx's API.
backend/tests/unit/test_byok_llm_errors.py New test file covering classification, notification gating, UID propagation, and the lazy Anthropic proxy. Coverage of the happy-path notification flow and platform-error exclusion looks solid.

Reviews (1): Last reviewed commit: "fix(backend): surface BYOK LLM failures ..." | Re-trigger Greptile

Comment on lines 437 to 441
except Exception as e:
logger.error(f"Anthropic API error: {e}")
handle_llm_error(e, 'anthropic', feature='chat_agent', model=ANTHROPIC_AGENT_MODEL)
await callback.put_data(f"\n\nSorry, I encountered an error. Please try again.")
await callback.end()
return
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Blocking sync I/O called directly in async error handler

handle_llm_error is a synchronous function that unconditionally calls _send_byok_llm_error_notification, which makes a synchronous Firestore read (notification_db.get_all_tokens), a Redis write (try_acquire_byok_llm_error_notification_lock), and a synchronous FCM send (messaging.send_each). Calling this directly in an async def — without asyncio.get_event_loop().run_in_executor or similar — will stall the entire event loop on the first BYOK error until all three network calls resolve, which could be several hundred milliseconds and will delay every other in-flight coroutine on the server.

Comment on lines +622 to +625
except Exception as e:
logger.error(f"Error in OpenAI agent stream: {e}")
traceback.print_exc()
await callback.end()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 OpenAI stream-level errors bypass handle_llm_error

The LangChain _LLMErrorCallback fires for errors routed through LangChain's callback system (e.g., model invocation failures), but stream-level or chain-level failures that propagate directly to astream_events as an exception can reach this except block without triggering the callback. Unlike the Anthropic path, this handler does not call handle_llm_error, so those errors are silently swallowed without BYOK push notifications.

Suggested change
except Exception as e:
logger.error(f"Error in OpenAI agent stream: {e}")
traceback.print_exc()
await callback.end()
except Exception as e:
handle_llm_error(e, 'openai', feature='chat_agent')
logger.error(f"Error in OpenAI agent stream: {e}")
traceback.print_exc()
await callback.end()

Comment on lines +740 to +744
except Exception as e:
logger.error(f"Error in execute_agentic_chat_stream openai: {e}")
traceback.print_exc()
if callback_data is not None:
callback_data['error'] = str(e)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Background task not cancelled on outer except Exception

When the outer generator's except Exception as e: block is hit (a non-CancelledError failure in the queue-draining loop), task is never cancelled or awaited. The _run_openai_agent_stream coroutine will keep running — consuming OpenAI tokens and filling callback.queue — until it finishes on its own, which could be many seconds for a long tool chain.

Suggested change
except Exception as e:
logger.error(f"Error in execute_agentic_chat_stream openai: {e}")
traceback.print_exc()
if callback_data is not None:
callback_data['error'] = str(e)
except Exception as e:
task.cancel()
logger.error(f"Error in execute_agentic_chat_stream openai: {e}")
traceback.print_exc()
if callback_data is not None:
callback_data['error'] = str(e)

@apoorvdarshan
Copy link
Copy Markdown
Author

Closing in favor of split issue-specific PRs: #7321 (#7290), #7322 (#7238), and #7323 (#7291).

@github-actions
Copy link
Copy Markdown
Contributor

Hey @apoorvdarshan 👋

Thank you so much for taking the time to contribute to Omi! We truly appreciate you putting in the effort to submit this pull request.

After careful review, we've decided not to merge this particular PR. Please don't take this personally — we genuinely try to merge as many contributions as possible, but sometimes we have to make tough calls based on:

  • Project standards — Ensuring consistency across the codebase
  • User needs — Making sure changes align with what our users need
  • Code best practices — Maintaining code quality and maintainability
  • Project direction — Keeping aligned with our roadmap and vision

Your contribution is still valuable to us, and we'd love to see you contribute again in the future! If you'd like feedback on how to improve this PR or want to discuss alternative approaches, please don't hesitate to reach out.

Thank you for being part of the Omi community! 💜

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

1 participant