Fix server goroutine leak: idle timeout on bridged streams #372

Merged
0pcom merged 1 commit into skycoin:develop from 0pcom:fix/server-session-goroutine-leak
Apr 7, 2026

0pcom (Collaborator) commented Apr 7, 2026

Summary

  • Add 5-minute idle timeout to bridged streams in ServerSession.bridgeStream
  • Prevents goroutine accumulation from half-dead client connections

Root Cause

CopyReadWriteCloser blocks on io.Copy(Read) when one client disconnects without cleanly closing (network failure without TCP RST). The other side's read blocks forever, leaking the goroutine, yamux stream, and semaphore slot.

Observed in production: 55,697 goroutines stuck in ServerSession.forwardRequest → readObject → yamux.Stream.Read on a single dmsg-server.
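
To make the blocking behavior concrete, here is a minimal sketch of a bidirectional bridge. It is not the actual CopyReadWriteCloser from dmsg; `bridge` and `relayOnce` are hypothetical names, and `net.Pipe` stands in for the two client streams. The sketch shuts down only when one of the two `io.Copy` calls returns, so a peer that neither sends data nor closes keeps the first receive on `errCh` waiting indefinitely.

```go
package main

import (
	"fmt"
	"io"
	"net"
)

// bridge copies data in both directions between a and b. When either
// direction stops, it closes both ends so the other copy unblocks too.
// It returns the result of whichever copy finished first (nil on EOF).
// Hypothetical sketch, not the dmsg implementation.
func bridge(a, b io.ReadWriteCloser) error {
	errCh := make(chan error, 2)
	go func() { _, err := io.Copy(a, b); errCh <- err }()
	go func() { _, err := io.Copy(b, a); errCh <- err }()

	// Blocks until one direction stops. If a peer vanishes without
	// closing, neither Read returns and this receive never completes.
	err := <-errCh
	a.Close()
	b.Close()
	<-errCh // wait for the other direction to exit
	return err
}

// relayOnce sends msg from client A to client B through the bridge,
// then closes client A cleanly and waits for the bridge to shut down.
func relayOnce(msg string) (string, error) {
	clientA, serverA := net.Pipe()
	serverB, clientB := net.Pipe()
	defer clientA.Close()
	defer clientB.Close()

	done := make(chan error, 1)
	go func() { done <- bridge(serverA, serverB) }()

	if _, err := clientA.Write([]byte(msg)); err != nil {
		return "", err
	}
	buf := make([]byte, len(msg))
	if _, err := io.ReadFull(clientB, buf); err != nil {
		return "", err
	}
	clientA.Close() // clean close: the bridge sees EOF and exits
	return string(buf), <-done
}

func main() {
	got, err := relayOnce("ping")
	fmt.Println(got, err)
}
```

In this sketch the clean close is what lets the bridge exit. Without it (or some deadline), the goroutines and both streams stay allocated, which is the leak described above.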

Fix

idleTimeoutConn wrapper resets a deadline on each Read/Write. If no data flows for 5 minutes (both directions idle), the deadline fires, io.Copy returns, CopyReadWriteCloser closes both streams, and the goroutine exits.

Active streams are not affected — the timeout resets on every data transfer.

Test plan

  • Verify goroutine count stabilizes on production dmsg-server after deploy
  • Verify long-running DMSG streams (VPN, proxy) are not interrupted (5-min idle is generous)
  • CI tests pass

Bridged streams (bidirectional copy between two clients through the
server) blocked forever on io.Copy Read when one side disconnected
without cleanly closing the connection. This caused goroutines to
accumulate — observed as 55K+ stuck goroutines in production.

Added idleTimeoutConn wrapper that resets a per-operation deadline
on each Read/Write. If no data flows for 5 minutes, the deadline
fires, io.Copy returns an error, CopyReadWriteCloser closes both
streams, and the goroutine exits.

The timeout resets on each successful read/write, so active streams
are not affected. Only truly idle/dead streams are cleaned up.

0pcom merged commit d583078 into skycoin:develop Apr 7, 2026
3 checks passed
0pcom deleted the fix/server-session-goroutine-leak branch April 7, 2026 22:53

0pcom added a commit to 0pcom/dmsg that referenced this pull request Apr 8, 2026
forwardRequest opens a stream to the destination and reads the
response. If the destination accepts but never responds, readObject
blocks forever — observed as 2.4K+ stuck goroutines per server,
growing rapidly after restart.

The idle timeout fix (PR skycoin#372) only covered the bridge phase
(CopyReadWriteCloser). This adds HandshakeTimeout to the
forwardRequest handshake read, matching the client-side behavior.

0pcom added a commit that referenced this pull request Apr 8, 2026
…374)

forwardRequest opens a stream to the destination and reads the
response. If the destination accepts but never responds, readObject
blocks forever — observed as 2.4K+ stuck goroutines per server,
growing rapidly after restart.

The idle timeout fix (PR #372) only covered the bridge phase
(CopyReadWriteCloser). This adds HandshakeTimeout to the
forwardRequest handshake read, matching the client-side behavior.