fix: prevent auth token death spiral on transient failures#5930
fix: prevent auth token death spiral on transient failures#5930
Conversation
After token refresh in getAuthHeader(), re-read hasAuthToken to fix stale variable bug. Only overwrite cached token when getIdToken() returns non-null, preserving near-expiry-but-still-valid tokens on transient refresh failures. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…Token Previously _clearCachedAuth() was called on ANY exception including network timeouts, wiping valid tokens and causing a permanent death spiral. Now only clears on FirebaseAuthException codes user-not-found, user-disabled, user-token-expired. Transient errors preserve the token. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Update test helpers to match new production behavior and add 10 new tests covering FirebaseAuthException code-specific clearing and getAuthHeader hasAuthToken recomputation with token preservation. Total: 32 tests (was 22). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
All checkpoints passed — ready for merge
PR is ready for human merge approval. by AI for @beastoin |
Greptile SummaryThis PR fixes two root causes of a permanent auth death spiral where transient Firebase token refresh failures could permanently break all authenticated API calls. The fixes are targeted and correct: Key changes:
Remaining concern: Confidence Score: 4/5
Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[API call needed] --> B[getAuthHeader]
B --> C{hasAuthToken && isExpirationDateValid?}
C -- No --> D[getIdToken]
C -- Yes --> K[Return Bearer token]
D --> E{Firebase currentUser?}
E -- No --> F[_clearCachedAuth\nreturn null]
E -- Yes --> G[getIdTokenResult refresh]
G --> H{token != null?}
H -- Yes --> I[Update SharedPrefs\nreturn token]
H -- No --> J[return null]
G -- FirebaseAuthException --> FA{terminal code?\nuser-not-found\nuser-disabled\nuser-token-expired}
FA -- Yes --> FC[_clearCachedAuth\nreturn null]
FA -- No --> FP[Preserve cache\nreturn null]
G -- Generic exception --> GP[Preserve cache\nreturn null - NEW]
D --> DResult{refreshedToken != null?}
I --> DResult
J --> DResult
FC --> DResult
FP --> DResult
GP --> DResult
DResult -- Yes --> Write[Write to SharedPrefs - NEW]
DResult -- No --> Skip[Skip write - NEW]
Write --> Reread[Re-read hasAuthToken - NEW]
Skip --> Reread
Reread --> Check{hasAuthToken?}
Check -- No & isSignedIn --> Throw[throw Exception]
Check -- Yes / not signed in --> K
|
| if (e.code == 'user-not-found' || e.code == 'user-disabled' || e.code == 'user-token-expired') { | ||
| _clearCachedAuth(); | ||
| } |
There was a problem hiding this comment.
invalid-user-token not included in cache-clearing codes
The new FirebaseAuthException handler correctly identifies three terminal auth states that should clear the cache. However, the Firebase Auth SDK also issues invalid-user-token when a user's credential has been tampered with or is otherwise permanently invalid — a condition that definitely warrants clearing cached auth just like user-token-expired.
if (e.code == 'user-not-found' ||
e.code == 'user-disabled' ||
e.code == 'user-token-expired' ||
e.code == 'invalid-user-token') { // <-- add this
_clearCachedAuth();
}Without this, a tampered/permanently invalid token would be preserved in the cache, and subsequent getAuthHeader() calls would keep returning it until the next successful refresh cycle.
Fresh review complete — all checkpoints passedClean review from scratch on PR #5930 (no prior review history carried over):
PR is ready for human merge approval. by AI for @beastoin |
L2 Live Test Evidence — Auth Flow (CP9B)Environment
Test Flow: Auth → API Call → Token RefreshStep 1: Google Sign-In — SUCCESS Step 2: 401 retry path exercised (shared.dart fix proven) The Step 3: Environment blocker This is a systemic environment mismatch — not a bug in this PR. It prevents completing the full chat/logout/login cycle in L2. Unit Tests — All 32 PassTests cover all changed code paths:
What Was Proven
What Remains BlockedThe full L2 flow (auth → chat → logout → login → chat) requires either:
by AI for @beastoin |
L2 Live Test — Auth → Chat → Logout → Login → Chat (Integrated)Setup: Local dev backend (port 10240, Flow Evidence1. Sign In (Google native)App launched → "Sign in with Google" → account picker → Firebase auth succeeded → home screen loaded with conversations from local backend (all API calls 200 OK). 2. Chat (first message — authenticated)Opened "Ask Omi" → typed "Hello from auth test" → sent → backend received and processed (200 OK on Backend log excerpt: 3. Sign OutSettings → scroll to "Sign Out" → confirmation dialog → tapped OK → Firebase auth state listeners confirmed sign-out. Flutter logs: 4. Signed Out StateApp restarted → onboarding screen shown with "Get Started" → no cached auth. 5. Re-Login"Get Started" → "Sign in with Google" → "Continue with Google" → account picker → Firebase re-authenticated → home screen loaded with all conversations from backend (all API calls 200 OK with fresh token). Backend log excerpt (post-relogin): 6. Chat (second message — post-relogin)Opened "Ask Omi" → typed "Auth works after relogin" → sent → backend received and processed (200 OK). Token validated correctly after full sign-out/sign-in cycle. Backend log excerpt: Summary
by AI for @beastoin |
Clarification on "Sorry, I encountered an error" in chat screenshotsThe error shown in the chat UI is not an auth failure. It's caused by the local dev backend missing Proof from backend log: The auth token was valid in both pre- and post-relogin chat attempts. Zero 401s in the entire log. The "error" is a local env limitation, not related to the auth fix being tested. by AI for @beastoin |
L2 Update: Chat Working End-to-End with Anthropic APIAfter adding ANTHROPIC_API_KEY to the local dev backend, chat now works fully end-to-end. Backend EvidenceApp EvidenceScreenshotBot responded intelligently via Anthropic Claude Opus. Zero 401s, no auth errors. Complete L2 flow (auth → chat → logout → login → chat) verified with working Anthropic backend. Note: Previous L2 chat showed "Sorry, I encountered an error" because ANTHROPIC_API_KEY was not set in dev env. PR #5674 migrated chat from OpenAI to Anthropic native tool use, making this key required. by AI for @beastoin |
|
lgtm |








Summary
Fixes #5927
Two bugs caused chat (and all authenticated API calls) to permanently break after any Firebase token refresh failure:
shared.dart):getAuthHeader()capturedhasAuthTokenonce, refreshed the token, but checked the stale variable (stillfalse) → threw "No auth token found" even after successful refresh. Also usedgetIdToken() ?? ''which wiped near-expiry-but-still-valid tokens when refresh returnednull.auth_service.dart):getIdToken()catch block called_clearCachedAuth()on any exception including network timeouts → wiped valid token → every subsequent call failed via Bug 1 → permanent death spiralFix 1: Re-read
hasAuthTokenafter the token refresh. Only overwrite cached token whengetIdToken()returns non-null.Fix 2: Only
_clearCachedAuth()on auth-specificFirebaseAuthExceptioncodes (user-not-found,user-disabled,user-token-expired). Transient errors leave the old token in place.Changed files
app/lib/backend/http/shared.darthasAuthTokenafter refresh; preserve existing token when refresh returns nullapp/lib/services/auth_service.dartapp/test/unit/token_refresh_loop_test.dartTest evidence
Unit tests (32/32 pass)
user-not-found,user-disabled,user-token-expired→ cleartoo-many-requests) → preserve existing tokenL2 live test — local backend + emulator app
Auth flow: sign in (native Google) → chat → sign out → sign in → chat ✅
[AgentChat] sending via /v2/messages→stream complete — 7 chunks totalsend_message Hello world→POST https://api.anthropic.com/v1/messages "HTTP/1.1 200 OK"Deployment steps
Risks
🤖 Generated with Claude Code