fix(socket): recover from stale disconnected socket to prevent permanent "Connecting..."#2487
Conversation
…ure fresh connections
…kets to ensure fresh connections
|
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
Comment |
M3gA-Mind
left a comment
There was a problem hiding this comment.
Walkthrough
This PR fixes a real, reproducible bug: calling connect(sameToken) after a socket disconnects was silently no-oping because this.socket was still non-null (stale/disconnected), causing the async stale-invocation guard || this.socket to bail early and leave the UI permanently stuck at "Connecting...". The root cause analysis is accurate, the fix is in exactly the right place, and the regression test correctly demonstrates the broken behavior before the fix.
The core logic change is correct. There's one issue worth addressing before merge.
Change summary
| File | Type | Notes |
|---|---|---|
app/src/services/socketService.ts |
Bug fix | Adds else branch in connectAsync to clear stale disconnected socket before async path continues |
app/src/services/__tests__/socketService.test.ts |
Test | Regression test verifying io() called twice on same-token reconnect after disconnect |
Findings
One issue worth addressing before this lands.
| // Drop it so this connect attempt can create a fresh socket; | ||
| // otherwise the async stale-invocation guard below (`|| this.socket`) | ||
| // returns early and leaves connectivity stuck at "connecting". | ||
| this.socket = null; |
There was a problem hiding this comment.
[major] The stale socket is abandoned via null-assignment rather than shut down cleanly. Two problems:
-
Stale reconnect timers: socket.io-client has internal reconnect logic (
reconnectionAttempts: 5is configured). Settingthis.socket = nulldrops your reference but doesn't cancel socket.io's internal reconnect timers. If a queued attempt fires after the new socket is created, the abandoned socket'sconnecthandler (still wired up fromconnectAsync) will dispatchsetStatusForUser({ status: 'connected' })andsetSocketIdForUser({ socketId: oldId })into Redux — overwriting state that belongs to the new socket. -
Orphaned
listenerMapentries:listenerMapis not cleared here. AnysocketService.on(event, cb)listeners registered during the previous connection are still in the map but were never attached to the new socket. Redux-connected components that registered handlers before the disconnect will silently stop receiving events. Meanwhileoff(event, cb)will look up those entries, try to remove them from the new socket (where they don't exist), and remove them fromlistenerMap— so the caller gets no error but the handler is gone with no way to recover.
Suggested fix:
} else {
// Stale disconnected socket, same token. Shut it down before clearing
// to cancel any queued socket.io reconnect timers and avoid stale Redux
// dispatches from the abandoned socket's event handlers.
this.socket.disconnect();
this.socket = null;
this.mcpTransport = null;
this.listenerMap.clear();
this.pendingListeners = [];
}This matches what disconnect() does for the token-changed path and puts listenerMap in a consistent empty state so callers re-register cleanly after reconnection.
| // Same token, previous socket is disconnected=true in our mock. | ||
| // We should still create a fresh socket instead of returning early. | ||
| socketService.connect('mock-jwt-stale-socket'); | ||
| await pollUntil(() => expect(ioMock).toHaveBeenCalledTimes(2)); |
There was a problem hiding this comment.
[minor] The test validates that io() is called twice (new socket created) which is the core regression being guarded. A small addition would increase confidence that the reconnect path actually executes past the stale-socket guard rather than taking some other code path:
// After the second connect(), verify the reconnect path dispatched 'connecting'
await pollUntil(() =>
expect(storeMock.dispatch).toHaveBeenCalledWith(
expect.objectContaining({ payload: expect.objectContaining({ status: 'connecting' }) })
)
);Not a blocker, but would catch a regression where io() gets called for an unrelated reason.
Summary
Problem
Solution
Submission Checklist
.github/workflows/coverage.yml. Run pnpm test:coverage and pnpm test:rust locally; PRs below 80% on changed lines will not merge.docs/TEST-COVERAGE-MATRIX.mdreflect this change (or N/A: behaviour-only change)docs/RELEASE-MANUAL-SMOKE.md)Impact