fix(server): sync disconnect callbacks #176
Open
+177
−9
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Summary
Fixes a bug where the TCP transport removed clients without notifying the higher-level server module, causing
claudecode.server.init'sstate.clientsmirror to accumulate stale entries over time.Key changes:
tcp._disconnect_clientsoon_disconnectfires for EOF, read errors, protocol errors, WebSocket CLOSE frames, and keepalive timeouts.read_startcallbacks.Fixes #170.
Testing
nix develop -c make checknix develop -c make test📋 Implementation Plan
Plan: Investigate + fix Issue #170 (client tables out of sync)
Context / Why
GitHub Issue #170 reports that the WebSocket server keeps two Lua tables of connected clients and they drift out of sync, leaving “phantom” clients in
server/init.luaand causing unbounded growth over time (memory leak / misleading debug state).This should be treated as a bug: the two tables are intended to represent the same set of live connections, but the TCP layer doesn’t reliably notify the higher-level server module on abrupt disconnect paths (EOF/errors/timeouts).
Evidence (what I checked)
lua/claudecode/server/init.luaM.state.clients.on_connect:M.state.clients[client.id] = client(line ~56)on_disconnect:M.state.clients[client.id] = nil(line ~74)lua/claudecode/server/tcp.luaserver.clientsmap.client_tcp:read_start(...), EOF (not data) and read errors call onlyM._remove_client(server, client)and do not callserver.on_disconnect(...)(lines ~125–136).server.on_disconnect(...)and then_remove_client(...)(lines ~139–147)._remove_client(...)withouton_disconnect(lines ~283–297).Why this causes “phantom” clients
server/init.luais only updated throughon_connect/on_disconnect. When the CLI is killed (Ctrl-C), the server usually sees a TCP EOF, not a WebSocket CLOSE frame; the TCP module removes the client fromserver.clientsbut never fireson_disconnect, soM.state.clientsretains a stale entry forever.Plan (fix)
1) Make disconnect notification consistent (core fix)
Goal: whenever a client is removed from
server.clients,server.on_disconnect(client, code, reason)should be invoked exactly once.lua/claudecode/server/tcp.lua, introduce a single helper responsible for cleanup + notification, e.g.:M._disconnect_client(server, client, code, reason)M._remove_client(server, client, code, reason)to optionally notify.server.clients[client.id](only notify/remove if still present). This prevents double-calling if a CLOSE frame is processed and later a TCP EOF arrives.read_starterror branchon_errorbranch) since it currently removes without disconnect notificationstart_ping_timer.Close code/reason conventions (keep simple):
1006with a short reason string (e.g.,"EOF","Client read error: ...","Connection timeout").codeandreason.Defensive checks (per repo preference):
assert(type(server) == "table"),assert(type(server.on_disconnect) == "function"),assert(type(client) == "table" and type(client.id) == "string")in the helper.2) Add regression tests (so it can’t come back)
Because the unit test suite runs with a mocked
vim.loop, we need to simulate the EOF/error callback.tests/mocks/vim.luaTCP stub soread_start(cb)storescbon the handle (e.g.,self._read_cb = cb).tests/unit/tcp_server_spec.lua) that:require("claudecode.server.tcp").create_server(...)with spy callbacks.tcp._handle_new_connection(server).on_connect.client.tcp_handle._read_cb(nil, nil).on_disconnectwas called once andserver.clients[client.id]is nil.Optional (nice-to-have) extra tests:
read_start("some error", nil)to ensure on_disconnect fires on read errors.claudecode.server.client.process_datato force the tcp.lua “protocol error” callback and ensure disconnect notification occurs.3) Clarify design intent (small docs/comments)
The existence of two tables isn’t inherently wrong:
tcp.luaneeds an internal map for transport operations, whileserver/init.luakeeps “server module state”. The bug is that they weren’t kept in sync.tcp.lua: server.clientsis the canonical transport client registry.server/init.lua: M.state.clientsis a mirrored view updated via connect/disconnect callbacks (kept for API/debugging/back-compat).4) Verify
make testandmake check.Generated with
mux• Model:openai:gpt-5.2• Thinking:xhigh