Skip to content

Don't leak connections in load-balanced topology#3041

Open
jamis wants to merge 2 commits intomongodb:masterfrom
jamis:88924-connection-pool-exhaustion
Open

Don't leak connections in load-balanced topology#3041
jamis wants to merge 2 commits intomongodb:masterfrom
jamis:88924-connection-pool-exhaustion

Conversation

@jamis
Copy link
Copy Markdown
Contributor

@jamis jamis commented May 5, 2026

In load-balanced topology, the MongoDB Ruby driver permanently leaks connection pool slots when a pinned connection's socket dies before the cursor reaper can clean it up.

The specific code path:

  1. A find with cursor_id != 0 pins its underlying connection to the cursor
  2. When the cursor is GC'd, it queues a KillSpec (containing the pinned connection) for the CursorReaper
  3. The reaper tries to send killCursors over that pinned connection
  4. If the socket died in the meantime, op.execute_with_connection raises SocketError
  5. Because there's no ensure block, the connection.connection_pool.check_in(connection) call on the next line is skipped
  6. The connection stays in @checked_out_connections forever — that pool slot is gone

With max_pool_size: 3 and three pinned connections when the network fails, all three slots leak and every subsequent operation gets ConnectionCheckOutTimeout.

The fix: Wrap the execute_with_connection + check_in pair in ``begin/rescue/ensurein cursor_reaper.rb so check_in always runs. Also callconnection.unpin(:cursor)` (guarded by `respond_to?`) so the pool's pruning logic doesn't skip the dead connection.

Three companion changes are also implemented:

  1. change_stream.rb — same anti-pattern (manual checkout without ensure)
  2. cursor.rb — add Error::ConnectionPerished to the get_more rescue list so @get_more_network_error is set on perished connections
  3. A regression test for load-balanced pool recovery after socket death

Copilot AI review requested due to automatic review settings May 5, 2026 23:09
@jamis jamis requested a review from a team as a code owner May 5, 2026 23:09
@jamis jamis requested a review from comandeo-mongo May 5, 2026 23:09
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aims to fix connection-pool slot leaks in load-balanced mode when pinned cursor/change-stream connections die before cleanup runs. It updates the cursor reaper and related cursor/change-stream handling so dead pinned connections are returned or marked appropriately instead of exhausting the pool.

Changes:

  • Add error handling around pinned-connection cleanup in the cursor reaper and change stream initialization.
  • Treat ConnectionPerished like other getMore network failures so cursors skip killCursors on close.
  • Add regression/spec coverage for cursor reaping, change stream checkout cleanup, and ConnectionPerished behavior.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
spec/mongo/cursor_spec.rb Adds unit coverage for ConnectionPerished during getMore.
spec/integration/cursor_reaping_spec.rb Adds load-balanced integration coverage for cursor reaper cleanup after socket death.
spec/integration/cursor_pinning_spec.rb Adds regression coverage for change stream checkout cleanup on initial-query failure.
lib/mongo/server/connection_pool.rb Adds a pool state helper for tests/debugging.
lib/mongo/cursor.rb Marks ConnectionPerished as a getMore network error.
lib/mongo/collection/view/change_stream.rb Checks out/checks in load-balanced change stream connections more defensively.
lib/mongo/cluster/reapers/cursor_reaper.rb Ensures pinned connections are checked in after killCursors attempts on LB cursors.
lib/mongo/cluster.rb Adds a helper to trigger the periodic executor from tests.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread lib/mongo/cluster/reapers/cursor_reaper.rb Outdated
Also, #unpin will always be defined on connections that reach this point; the guard is unnecessary
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants