Investigate: Rework new_token behavior to allow multiple tabs sharing a single client_token #6252

@masenf

Description

Background

When a websocket reconnects (e.g. after a brief network blip) before the server has noticed the disconnect (~120s timeout), the server still has the old sid saved in Redis. The server therefore concludes that the requested client_token is already associated with an active connection and triggers the new_token flow, generating a fresh token for the reconnecting client. Because the client is now operating under a different token, all existing state is lost and any held thread locks are released.

This new_token behavior was originally introduced to handle the "Duplicate Tab" scenario: a user right-clicks a tab and selects "Duplicate", which forks SessionStorage from the original tab. The duplicated tab connects with the same client_token, and the server resolves the conflict by assigning a new token so the duplicate gets a fresh session.
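
To make the conflict concrete, here is a minimal sketch of the decision described above. It is not the actual implementation: `token_to_sid` is a plain dict standing in for the Redis mapping, and `link_token_to_sid` is a hypothetical name. The point is that a fast reconnect and a duplicated tab look identical to this check.

```python
import uuid

# Hypothetical in-memory stand-in for the token -> sid mapping kept in Redis.
token_to_sid: dict[str, str] = {}


def link_token_to_sid(client_token: str, sid: str) -> tuple[str, bool]:
    """Return (token_to_use, new_token_issued) for a connecting websocket."""
    existing_sid = token_to_sid.get(client_token)
    if existing_sid is not None and existing_sid != sid:
        # The token still looks active under another sid. A duplicated tab and
        # a quick reconnect are indistinguishable here, so both get a new token.
        fresh_token = uuid.uuid4().hex
        token_to_sid[fresh_token] = sid
        return fresh_token, True
    token_to_sid[client_token] = sid
    return client_token, False
```

In this sketch the stale `sid` entry is what forces the reconnecting client onto a fresh token, which matches the state loss described above.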

Observed Problem

In practice, the new_token behavior is causing UX issues beyond the duplicate-tab case. When network instability causes a websocket to drop and reconnect quickly, the user's session is silently replaced — state disappears, thread locks are lost, and in-progress operations (like an agent loop) are disrupted. This was recently observed in production where an app appeared to "refresh" mid-operation and lost its thread lock.

Options Considered

Option 1: Ping the existing connection before issuing new_token

When a client_token appears to belong to an existing active connection, ping that connection to check if it's still alive before triggering new_token. If the existing connection doesn't respond, treat it as disconnected and allow the reconnecting client to reclaim the token.

Drawbacks: Introduces latency. It is unclear how long to wait for the existing connection to respond before considering it disconnected, and any wait could slow down legitimate reconnections.
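
A rough sketch of the liveness check, assuming the server can expose some awaitable `ping` for the existing connection. The `ping` callable, the `existing_connection_alive` name and the 2-second default are all placeholders for illustration, not existing APIs or agreed values.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical default: the wait budget is exactly the open question above.
PING_TIMEOUT_SECONDS = 2.0


async def existing_connection_alive(
    ping: Callable[[], Awaitable[None]],
    timeout: float = PING_TIMEOUT_SECONDS,
) -> bool:
    """Return True if the existing connection answers the ping in time."""
    try:
        await asyncio.wait_for(ping(), timeout=timeout)
    except asyncio.TimeoutError:
        # No answer within the budget: treat the old connection as gone so the
        # reconnecting client can reclaim its token.
        return False
    return True
```

Every reconnect that hits a stale sid would pay up to `timeout` seconds before it can proceed, which is the latency cost noted in the drawbacks.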

Option 2: Copy state to the new token on new_token

When new_token occurs, copy the existing state to the new token so the duplicated/reconnected tab picks up where the original was.

Drawbacks: Adds complexity around background tasks and around token/sid references saved in other variables, which could fall out of sync with the real token. Doesn't solve the fundamental problem of split identity.

Option 3 (Preferred): Allow multiple tabs to share the same client_token

Remove the new_token behavior entirely and allow two or more tabs to point at the same underlying client_token. Each tab targets events at the same state, and deltas are broadcast to all connected tabs.
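
One way to picture this is a registry that maps each client_token to a set of sids rather than a single sid. The sketch below is illustrative only: `TokenConnections`, `broadcast_delta` and the `send` callback are hypothetical names, and a real version would presumably live in Redis rather than in process memory.

```python
from collections import defaultdict
from typing import Any, Callable


class TokenConnections:
    """Tracks every websocket sid currently attached to a client_token."""

    def __init__(self) -> None:
        self._sids_by_token: dict[str, set[str]] = defaultdict(set)

    def connect(self, client_token: str, sid: str) -> None:
        # A second tab or a reconnect simply joins the existing set.
        self._sids_by_token[client_token].add(sid)

    def disconnect(self, client_token: str, sid: str) -> None:
        sids = self._sids_by_token.get(client_token)
        if sids is None:
            return
        sids.discard(sid)
        if not sids:
            del self._sids_by_token[client_token]

    def targets(self, client_token: str) -> set[str]:
        return set(self._sids_by_token.get(client_token, ()))


def broadcast_delta(
    registry: TokenConnections,
    client_token: str,
    delta: dict[str, Any],
    send: Callable[[str, dict[str, Any]], None],
) -> int:
    """Send the same delta to every connection sharing the token."""
    sids = sorted(registry.targets(client_token))
    for sid in sids:
        send(sid, delta)
    return len(sids)
```

With this shape, a reconnect after a network blip just adds a new sid next to the stale one, and the stale sid drops out whenever the server eventually notices the disconnect.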

Why this is preferred:

  • Better aligns with the expected access patterns after the event queue rewrite (moving the event queue to the backend).
  • Each tab only needs to manage frontend-specific concerns (console logs, toasts, etc.) while the backend owns the event queue and state.
  • Reconnecting after a network blip "just works" — the client reclaims its token without disruption.
  • Duplicate tabs genuinely share state, which is arguably more intuitive than silently creating a fresh session.

Open questions for this approach:

  • How do chained events behave when multiple tabs are connected? Would a chained event get requeued from both tabs or just one? If just one, which one?
  • Are there frontend event loop assumptions that break? The backend event queue makes this approach more viable, since tabs only submit events and receive deltas, but this still needs to be verified.
  • How do frontend-only events (toasts, console logs, downloads, etc.) get routed — to all tabs or just the originating tab?
  • What is the migration/compatibility story for apps that may rely on the current duplicate-tab behavior?

Action Plan

  1. Audit current new_token code path: Map all the places where new_token is triggered, and the downstream effects (state reset, sid management, Redis cleanup).
  2. Prototype shared client_token: Allow multiple websocket connections to be associated with the same client_token. Route state deltas to all connections for that token.
  3. Resolve frontend event loop questions: Determine how chained events, frontend-only events, and the submission queue behave with multiple tabs connected to one token — particularly in the context of the backend event queue work.
  4. Handle tab-specific routing: Design a mechanism (e.g. per-connection tab_id) to route frontend-only events (toasts, redirects, downloads) to the correct tab.
  5. Test reconnection scenarios: Validate that network blip reconnections, deliberate duplicate tabs, and multi-device scenarios all behave correctly.
  6. Remove or gate new_token: Once shared tokens are working, remove the new_token code path or gate it behind a config flag for backward compatibility.
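
For step 4, a possible routing policy could look like the sketch below. It assumes each connection carries a `tab_id` and each outgoing event records the tab that triggered it. `OutgoingEvent`, `recipients` and the policy itself (frontend-only events go only to the originating tab, and are dropped if that tab is gone) are assumptions for discussion, not a decided design.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OutgoingEvent:
    name: str
    frontend_only: bool  # e.g. toasts, redirects, downloads
    origin_tab_id: Optional[str] = None


def recipients(event: OutgoingEvent, connected_tab_ids: set[str]) -> set[str]:
    """Pick which tabs sharing one client_token should receive the event."""
    if not event.frontend_only:
        # State deltas and similar events go to every connected tab.
        return set(connected_tab_ids)
    if event.origin_tab_id in connected_tab_ids:
        # Frontend-only events target just the tab that triggered them.
        return {event.origin_tab_id}
    # Originating tab is gone (or unknown). Dropping is one choice; falling
    # back to all tabs is another, and is still an open question.
    return set()
```

For example, a download triggered from one tab would not start in every other tab that shares the token, while a state delta would still reach all of them.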

Related Context

  • Slack discussion: internal thread on #topic channel, 2026-03-26
  • Connected to the backend event queue rewrite effort
