Investigate: Rework new_token behavior to allow multiple tabs sharing a single client_token #6252
Description
Background
When a websocket reconnects (e.g. after a brief network blip) before the server has noticed the disconnect (~120s timeout), the server still has the old sid saved in Redis. At that point, the server determines the requested client_token is already associated with an active connection and triggers the new_token flow — generating a fresh token for the reconnecting client. This causes all existing state to be lost and any held thread locks to be released, because the client is now operating under a different token.
This new_token behavior was originally introduced to handle the "Duplicate Tab" scenario: a user right-clicks a tab and selects "Duplicate", which forks SessionStorage from the original tab. The duplicated tab connects with the same client_token, and the server resolves the conflict by assigning a new token so the duplicate gets a fresh session.
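The decision described above can be sketched roughly as follows. This is a minimal illustration, not the actual server code: the in-memory `token_to_sid` dict stands in for the Redis mapping, and `link_token_to_sid` is a hypothetical helper name.

```python
import uuid

# Hypothetical in-memory stand-in for the Redis mapping of client_token -> sid.
token_to_sid: dict[str, str] = {}


def link_token_to_sid(client_token: str, sid: str) -> tuple[str, bool]:
    """Return (token_to_use, new_token_issued) for a connecting websocket."""
    existing_sid = token_to_sid.get(client_token)
    if existing_sid is not None and existing_sid != sid:
        # The token still looks active. This is true for a duplicated tab,
        # but also for a stale sid whose disconnect has not been noticed yet.
        client_token = uuid.uuid4().hex
        token_to_sid[client_token] = sid
        return client_token, True
    token_to_sid[client_token] = sid
    return client_token, False
```

The check cannot tell a duplicated tab from a quick reconnect: in both cases the token is already mapped to a different sid, so both get a fresh token.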
Observed Problem
In practice, the new_token behavior is causing UX issues beyond the duplicate-tab case. When network instability causes a websocket to drop and reconnect quickly, the user's session is silently replaced — state disappears, thread locks are lost, and in-progress operations (like an agent loop) are disrupted. This was recently observed in production where an app appeared to "refresh" mid-operation and lost its thread lock.
Options Considered
Option 1: Ping the existing connection before issuing new_token
When a client_token appears to belong to an existing active connection, ping that connection to check if it's still alive before triggering new_token. If the existing connection doesn't respond, treat it as disconnected and allow the reconnecting client to reclaim the token.
Drawbacks: Introduces latency — how long do we wait for the existing connection to respond before considering it disconnected? Could slow down legitimate reconnections.
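A rough sketch of what the liveness check could look like, assuming the server has some awaitable way to ping a given sid. The `ping` callable and the 2 second timeout are assumptions for illustration; the right timeout is exactly the open question above.

```python
import asyncio
from typing import Awaitable, Callable

# Assumed value for illustration; the issue leaves the right timeout open.
PING_TIMEOUT_SECONDS = 2.0


async def existing_connection_is_alive(
    ping: Callable[[str], Awaitable[None]],
    sid: str,
    timeout: float = PING_TIMEOUT_SECONDS,
) -> bool:
    """Return True if the connection behind `sid` answers within `timeout`."""
    try:
        # `ping` is a hypothetical coroutine that resolves once the client
        # behind `sid` acknowledges the ping.
        await asyncio.wait_for(ping(sid), timeout=timeout)
    except asyncio.TimeoutError:
        # No answer in time: treat the old connection as gone so the
        # reconnecting client can reclaim its token.
        return False
    return True
```

Every reconnect that races a stale sid pays up to `timeout` seconds before it can reclaim its token, which is the latency cost noted in the drawbacks.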
Option 2: Copy state to the new token on new_token
When new_token occurs, copy the existing state to the new token so the duplicated/reconnected tab picks up where the original was.
Drawbacks: Adds complexity around background tasks and around token/sid references saved in other variables, which could fall out of sync with the real token. It also doesn't solve the fundamental problem of split identity.
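To make the split-identity concern concrete, here is a toy sketch that uses a plain dict as a stand-in for the state store. The `states` dict and `copy_state_to_new_token` are hypothetical names, not the real state manager.

```python
import copy

# Hypothetical stand-in for the state store, keyed by client_token.
states: dict[str, dict] = {"old-token": {"count": 3, "messages": ["hi"]}}


def copy_state_to_new_token(old_token: str, new_token: str) -> None:
    """Give `new_token` an independent snapshot of `old_token`'s state."""
    states[new_token] = copy.deepcopy(states[old_token])


copy_state_to_new_token("old-token", "new-token")

# From here on the two tokens diverge: an update made under the new token
# is invisible to anything still holding the old token (for example a
# background task that captured it before the copy).
states["new-token"]["count"] += 1
```

The snapshot is only correct at the moment of the copy. Anything that still references the old token keeps reading and writing state the reconnected tab never sees.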
Option 3 (Preferred): Allow multiple tabs to share the same client_token
Remove the new_token behavior entirely and allow two or more tabs to point at the same underlying client_token. Each tab targets events at the same state, and deltas are broadcast to all connected tabs.
Why this is preferred:
- Better aligns with expected access patterns after the event queue rewrite (moving the event queue to the backend).
- Each tab only needs to manage frontend-specific concerns (console logs, toasts, etc.) while the backend owns the event queue and state.
- Reconnecting after a network blip "just works" — the client reclaims its token without disruption.
- Duplicate tabs genuinely share state, which is arguably more intuitive than silently creating a fresh session.
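One possible shape for the shared-token bookkeeping, sketched under the assumption that the server tracks a set of sids per client_token and has some per-sid `send` function. `SharedTokenRegistry` and its method names are hypothetical.

```python
from collections import defaultdict
from typing import Any, Callable


class SharedTokenRegistry:
    """Tracks every connection (sid) currently attached to a client_token."""

    def __init__(self) -> None:
        self._sids_by_token: dict[str, set[str]] = defaultdict(set)

    def connect(self, client_token: str, sid: str) -> None:
        # A second tab, or a reconnect, simply joins the existing token.
        self._sids_by_token[client_token].add(sid)

    def disconnect(self, client_token: str, sid: str) -> None:
        sids = self._sids_by_token.get(client_token)
        if sids is None:
            return
        sids.discard(sid)
        if not sids:
            # The last tab is gone; drop the empty entry.
            del self._sids_by_token[client_token]

    def broadcast_delta(
        self,
        client_token: str,
        delta: dict[str, Any],
        send: Callable[[str, dict[str, Any]], None],
    ) -> int:
        """Send `delta` to every sid on the token; return how many were sent."""
        sids = sorted(self._sids_by_token.get(client_token, ()))
        for sid in sids:
            send(sid, delta)
        return len(sids)
```

With this shape a reconnecting client never conflicts with its own stale sid. Both sids sit under the same token until the stale one times out and is removed via `disconnect`.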
Open questions for this approach:
- How do chained events behave when multiple tabs are connected? Would a chained event get requeued from both tabs or just one? If just one, which one?
- With the backend event queue, this becomes more viable since tabs only submit events and receive deltas — but we need to verify there are no frontend event loop assumptions that break.
- How do frontend-only events (toasts, console logs, downloads, etc.) get routed — to all tabs or just the originating tab?
- What is the migration/compatibility story for apps that may rely on the current duplicate-tab behavior?
Action Plan
- Audit current new_token code path: Map all the places where new_token is triggered, and the downstream effects (state reset, sid management, Redis cleanup).
- Prototype shared client_token: Allow multiple websocket connections to be associated with the same client_token. Route state deltas to all connections for that token.
- Resolve frontend event loop questions: Determine how chained events, frontend-only events, and the submission queue behave with multiple tabs connected to one token — particularly in the context of the backend event queue work.
- Handle tab-specific routing: Design a mechanism (e.g. per-connection tab_id) to route frontend-only events (toasts, redirects, downloads) to the correct tab.
- Test reconnection scenarios: Validate that network blip reconnections, deliberate duplicate tabs, and multi-device scenarios all behave correctly.
- Remove or gate new_token: Once shared tokens are working, remove the new_token code path or gate it behind a config flag for backward compatibility.
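For the tab-specific routing item, a minimal sketch of how a per-connection tab_id might be used to pick recipients. The `Connection` type, the `FRONTEND_ONLY_EVENTS` set, and `select_targets` are all hypothetical, and which events count as tab-specific is still one of the open questions above.

```python
from dataclasses import dataclass

# Assumed split for illustration; the real list of tab-specific events is
# still an open question.
FRONTEND_ONLY_EVENTS = frozenset({"toast", "redirect", "download", "console_log"})


@dataclass(frozen=True)
class Connection:
    sid: str
    tab_id: str


def select_targets(
    event_name: str,
    origin_tab_id: str,
    connections: list[Connection],
) -> list[str]:
    """Return the sids that should receive `event_name`."""
    if event_name in FRONTEND_ONLY_EVENTS:
        # Tab-specific side effects go only to the tab that triggered them.
        return [c.sid for c in connections if c.tab_id == origin_tab_id]
    # Everything else (e.g. state deltas) goes to every tab on the token.
    return [c.sid for c in connections]
```

For example, with tabs `tab-1` and `tab-2` on the same token, a toast triggered from `tab-2` would reach only that tab, while a state delta would reach both.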
Related Context
- Slack discussion: internal thread on #topic channel, 2026-03-26
- Connected to the backend event queue rewrite effort