
[grid] Close pre-handshake race in WebSocket proxy #17435

Open
shs96c wants to merge 3 commits into SeleniumHQ:trunk from shs96c:grid-websocket-prehandshake-buffer

[grid] Close pre-handshake race in WebSocket proxy#17435
shs96c wants to merge 3 commits into
SeleniumHQ:trunkfrom
shs96c:grid-websocket-prehandshake-buffer

Conversation

@shs96c
Member

@shs96c shs96c commented May 11, 2026

Summary

Frames received from the upstream WebSocket before the client-side handshake completes used to travel through MessageOutboundConverter via a fallback Consumer<Message>. After onUpgradeComplete rewires the pipeline (removing the Message-layer handlers), a frame in flight on the upstream listener thread could land in a pipeline that no longer had its converter and silently drop.

Move the client Channel reference and a pre-handshake buffer into DirectForwardingListener itself:

  • Pre-handshake onText / onBinary calls queue a WebSocketFrame onto a per-listener deque under a lock.
  • FrameProxyConsumer.onUpgradeComplete invokes listener.onUpgrade(channel), which drains the buffer in arrival order and then publishes the channel as volatile.
  • Post-handshake calls do a lock-free volatile read of the channel and write directly — the fast path is unchanged.
  • onClose / onError pre-handshake release buffered frames and latch a closed flag so any in-flight call doesn't leak ref-counted frames.

A focused EmbeddedChannel-based test pins the buffer-then-drain order; the existing medium ProxyWebsocketTest continues to cover the end-to-end Router→Node path.
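The buffer-then-drain handover described above can be sketched in plain Java. This is an illustrative model, not the Selenium source: frames are `String`s and the Netty `Channel` stands in as a `Consumer<String>`, but the locking and volatile-publish structure mirrors the description.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Consumer;

// Illustrative sketch of the pre-handshake buffering pattern.
class PrehandshakeBuffer {
  private final Object lock = new Object();
  private final Deque<String> pending = new ArrayDeque<>();
  // Volatile so the post-handover fast path needs no synchronization.
  private volatile Consumer<String> channel;

  void onFrame(String frame) {
    Consumer<String> ch = channel; // lock-free fast path after handover
    if (ch == null) {
      synchronized (lock) {
        ch = channel; // re-check: handover may have raced us
        if (ch == null) {
          pending.addLast(frame); // pre-handshake: queue in arrival order
          return;
        }
      }
    }
    ch.accept(frame);
  }

  void onUpgrade(Consumer<String> ch) {
    synchronized (lock) {
      for (String f : pending) {
        ch.accept(f); // drain the backlog first, in arrival order
      }
      pending.clear();
      channel = ch; // publish only after the backlog is flushed
    }
  }
}
```

Because `channel` is published under the same lock that guards the drain, a frame that races the upgrade either lands in the buffer before the drain or writes directly after it; no ordering is lost either way.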

Test plan

  • `bazel test --config=rbe //java/test/org/openqa/selenium/grid/router:small-tests`
  • `bazel test --config=rbe //java/test/org/openqa/selenium/grid/router:ProxyWebsocketTest`
  • `bazel test --config=rbe //java/test/org/openqa/selenium/grid/router:TunnelWebsocketTest`

🤖 Generated with Claude Code

Frames received from the upstream WebSocket before the client-side
handshake completes used to traverse MessageOutboundConverter via the
fallback consumer. After onUpgradeComplete strips the converter, an
in-flight frame on the upstream listener thread could land in a
pipeline that no longer had its Message-layer handlers and silently
drop. Move the channel reference and a pre-handshake buffer into the
listener: pre-handshake frames are queued, the buffer drains under a
lock when the channel is handed over, and post-handshake calls take
the lock-free fast path.
@selenium-ci selenium-ci added B-grid Everything grid and server related C-java Java Bindings B-build Includes scripting, bazel and CI integrations labels May 11, 2026
@qodo-code-review
Contributor

Review Summary by Qodo

Buffer pre-handshake WebSocket frames to prevent race condition

🐞 Bug fix


Walkthroughs

Description
• Buffer pre-handshake WebSocket frames in DirectForwardingListener to prevent silent drops
• Drain buffered frames in order when client channel becomes available post-handshake
• Replace fallback Message consumer with lock-protected frame queue for deterministic ordering
• Add unit test validating pre-handshake frame buffering and post-handshake fast path
Diagram
flowchart LR
  A["Pre-handshake frames arrive"] -->|enqueue| B["DirectForwardingListener buffer"]
  C["onUpgradeComplete fires"] -->|drain in order| B
  B -->|write to channel| D["Client receives frames"]
  E["Post-handshake frames"] -->|volatile read| F["Fast path direct write"]
  F --> D


File Changes

1. java/src/org/openqa/selenium/grid/router/ProxyWebsocketsIntoGrid.java 🐞 Bug fix +107/-53

Refactor WebSocket frame handling with pre-handshake buffering

• Moved channel reference and pre-handshake buffer from AtomicReference into
 DirectForwardingListener
• Replaced fallback Message consumer with lock-protected ArrayDeque for frame buffering
• Implemented onUpgrade method to drain buffered frames before publishing volatile channel reference
• Added frame release logic in onClose and onError to prevent ref-counted frame leaks
• Removed TextMessage and BinaryMessage fallback handling in favor of direct WebSocketFrame writes

java/src/org/openqa/selenium/grid/router/ProxyWebsocketsIntoGrid.java


2. java/test/org/openqa/selenium/grid/router/DirectForwardingListenerTest.java 🧪 Tests +106/-0

Add unit tests for pre-handshake frame buffering

• New unit test class with EmbeddedChannel-based tests for DirectForwardingListener
• Test validates frames received before upgrade are drained in arrival order
• Test confirms post-upgrade frames bypass buffer and go directly to channel
• Includes NoopHttpClient stub to avoid network dependencies in unit tests

java/test/org/openqa/selenium/grid/router/DirectForwardingListenerTest.java


3. java/src/org/openqa/selenium/grid/router/BUILD.bazel Dependencies +1/-0

Add netty-buffer dependency

• Added netty-buffer artifact dependency to support Unpooled frame creation

java/src/org/openqa/selenium/grid/router/BUILD.bazel


4. java/test/org/openqa/selenium/grid/router/BUILD.bazel ⚙️ Configuration changes +20/-1

Add small-tests suite for DirectForwardingListenerTest

• Created new SMALL_TESTS list containing DirectForwardingListenerTest
• Added small-tests java_test_suite target for focused unit tests
• Updated medium-tests exclusion to skip SMALL_TESTS

java/test/org/openqa/selenium/grid/router/BUILD.bazel




@qodo-code-review
Contributor

qodo-code-review Bot commented May 11, 2026

Code Review by Qodo

🐞 Bugs (0) 📘 Rule violations (0)



Action required

1. Pre-upgrade close not forwarded ✓ Resolved 🐞 Bug ☼ Reliability
Description
If upstream onClose/onError happens before the client-side upgrade completes, the listener sets a
terminal "closed" state without retaining close details, and onUpgrade later publishes the channel
without closing it—so the client WebSocket can remain open even though upstream is already gone.
This can lead to downstream clients hanging until external timeouts/keepalives rather than receiving
a close handshake promptly.
Code

java/src/org/openqa/selenium/grid/router/ProxyWebsocketsIntoGrid.java[R287-306]

    @Override
    public void onClose(int code, String reason) {
      upstreamClosing.set(true);
-      // After onUpgradeComplete the pipeline no longer contains MessageOutboundConverter, so
-      // writing a CloseMessage object via fallbackDownstream would fail to encode. Write the
-      // Netty frame directly once the client channel reference is available.
-      Channel ch = clientChannelRef.get();
+      Channel ch = clientChannel;
+      if (ch == null) {
+        synchronized (lock) {
+          ch = clientChannel;
+          if (ch == null) {
+            // Pre-handshake close: there is no live channel to write a close frame to,
+            // so just release the buffer and stop accepting more frames.
+            discardPendingLocked();
+            closed = true;
+          }
+        }
+      }
      if (ch != null && ch.isActive()) {
        ch.writeAndFlush(new CloseWebSocketFrame(code, reason));
-      } else {
-        fallbackDownstream.accept(new CloseMessage(code, reason));
-      }
-      try {
-        client.close();
-      } catch (Exception e) {
-        LOG.log(Level.FINE, "Failed to close client on upstream WebSocket close", e);
      }
+      closeClient();
    }
Evidence
The upstream socket is opened before the downstream WebSocket handshake completes, so upstream can
close pre-upgrade. In that pre-upgrade case, DirectForwardingListener discards any pending frames
and latches closed=true, but does not buffer a CloseWebSocketFrame nor close any downstream channel
(it doesn’t have one yet). When the downstream handshake later succeeds, onUpgrade drains pending
and publishes clientChannel but does not check the latched closed flag, so there is no subsequent
callback to trigger sending a close frame (the upstream onClose already happened).

java/src/org/openqa/selenium/netty/server/WebSocketUpgradeHandler.java[148-205]
java/src/org/openqa/selenium/grid/router/ProxyWebsocketsIntoGrid.java[241-249]
java/src/org/openqa/selenium/grid/router/ProxyWebsocketsIntoGrid.java[287-306]
java/src/org/openqa/selenium/grid/router/ProxyWebsocketsIntoGrid.java[308-329]
java/src/org/openqa/selenium/grid/router/ProxyWebsocketsIntoGrid.java[338-344]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
When the upstream WebSocket closes/errors before the downstream WebSocket upgrade completes, `DirectForwardingListener` latches `closed=true` but drops the close details and `onUpgrade(Channel)` does not react to the terminal state. If the downstream upgrade completes later, the client channel may be published and left open without a close handshake.

### Issue Context
- Upstream connection is initiated before the Netty handshake completes, so pre-upgrade upstream terminal events are possible.
- `closeClient()` only closes the upstream `HttpClient`, not the downstream Netty channel.

### Fix approach
- Record a terminal state for pre-upgrade close/error (e.g., store `closeCode`/`closeReason`, or store a `CloseWebSocketFrame` to be written post-upgrade).
- In `onUpgrade(Channel ch)`, if a terminal state was observed (`closed==true`), immediately write a close frame (if applicable) and close the channel (or close without sending if protocol requires), instead of publishing the channel for normal forwarding.
- Add a unit test that calls `listener.onClose(...)` (or `onError(...)`) before `listener.onUpgrade(channel)` and asserts the channel is closed and/or a `CloseWebSocketFrame` is emitted.

### Fix Focus Areas
- java/src/org/openqa/selenium/grid/router/ProxyWebsocketsIntoGrid.java[241-306]
- java/src/org/openqa/selenium/grid/router/ProxyWebsocketsIntoGrid.java[308-345]
- java/test/org/openqa/selenium/grid/router/DirectForwardingListenerTest.java[1-106]
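A minimal model of this fix approach (hypothetical names; `String`s stand in for Netty frames and a `Consumer<String>` for the channel): record the close details when they arrive pre-upgrade, and have `onUpgrade` surface them instead of publishing the channel for normal forwarding.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Illustrative sketch: latch a pre-upgrade close so a late upgrade
// still delivers a close handshake instead of a channel left open.
class CloseAwareListener {
  private final Object lock = new Object();
  private volatile Consumer<String> channel;
  private boolean closed;     // guarded by lock
  private int closeCode;      // guarded by lock
  private String closeReason; // guarded by lock

  void onClose(int code, String reason) {
    Consumer<String> ch = channel;
    if (ch == null) {
      synchronized (lock) {
        if (channel == null) {
          // Pre-upgrade: keep the details so a late upgrade can still
          // send a proper close to the client.
          closed = true;
          closeCode = code;
          closeReason = reason;
          return;
        }
        ch = channel;
      }
    }
    ch.accept("CLOSE:" + code + ":" + reason);
  }

  /** Returns true only if the channel was published for normal forwarding. */
  boolean onUpgrade(Consumer<String> ch) {
    synchronized (lock) {
      if (closed) {
        // Upstream already went away: send the recorded close and do
        // NOT publish the channel for forwarding.
        ch.accept("CLOSE:" + closeCode + ":" + closeReason);
        return false;
      }
      channel = ch;
      return true;
    }
  }
}
```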




Remediation recommended

2. Pending frames leak onClose ✓ Resolved 📘 Rule violation ☼ Reliability
Description
On a pre-handshake upstream onClose, the listener records the close and closes the HttpClient
but intentionally keeps buffered WebSocketFrames for a future onUpgrade drain. If the
client-side upgrade never completes (e.g., client disconnects mid-handshake), those buffered
ref-counted frames are never released, risking a Netty buffer leak.
Code

java/src/org/openqa/selenium/grid/router/ProxyWebsocketsIntoGrid.java[R340-345]

+            // Pre-handshake close: record the close so onUpgrade can surface it to the client.
+            // Pending frames are kept so the client receives any data the upstream queued before
+            // closing.
+            closed = true;
+            closeCode = code;
+            closeReason = reason;
Evidence
The onClose pre-handshake path explicitly keeps pending frames for a later onUpgrade drain, but
also closes the client immediately; if onUpgrade never occurs, no code path releases those
buffered frames. Other terminal paths demonstrate these frames require explicit release
(frame.release()), indicating a potential leak when onUpgrade is not reached.

java/src/org/openqa/selenium/grid/router/ProxyWebsocketsIntoGrid.java[340-345]
java/src/org/openqa/selenium/grid/router/ProxyWebsocketsIntoGrid.java[301-313]
Best Practice: Learned patterns

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`DirectForwardingListener` keeps pre-handshake buffered `WebSocketFrame`s on upstream `onClose` so they can be drained during `onUpgrade`. If the client-side upgrade never happens, those buffered (ref-counted) frames never get released, creating a potential memory/leak-detector issue.

## Issue Context
The code already treats buffered frames as ref-counted (it calls `frame.release()` on other terminal paths like overflow), but the pre-handshake `onClose` path retains the buffer without any guaranteed later release.

## Fix Focus Areas
- java/src/org/openqa/selenium/grid/router/ProxyWebsocketsIntoGrid.java[333-353]
- java/src/org/openqa/selenium/grid/router/ProxyWebsocketsIntoGrid.java[295-313]



3. Unbounded frame buffer ✓ Resolved 🐞 Bug ☼ Reliability
Description
DirectForwardingListener buffers pre-upgrade frames in an unbounded deque with no cap, so if the
client upgrade is delayed/stalled while upstream is producing frames, memory can grow without bound
(including retaining ref-counted frame buffers). This risks router memory pressure or OOM under
pathological conditions.
Code

java/src/org/openqa/selenium/grid/router/ProxyWebsocketsIntoGrid.java[R220-226]

+    private final Object lock = new Object();
+    private final Deque<WebSocketFrame> pending = new ArrayDeque<>();
+    // Volatile so the post-handover fast path needs no synchronization.
+    private volatile Channel clientChannel;
+    // Guarded by lock; once true, further frames received while clientChannel is still null
+    // are released rather than enqueued (we know the upstream has already gone away).
+    private boolean closed;
Evidence
The listener maintains a per-connection ArrayDeque<WebSocketFrame> and enqueues frames whenever
clientChannel is null and closed is false; there is no size/bytes limit. Frames are only removed
by draining on upgrade or discarding on close/error, so a delayed upgrade path can accumulate frames
indefinitely.

java/src/org/openqa/selenium/grid/router/ProxyWebsocketsIntoGrid.java[220-226]
java/src/org/openqa/selenium/grid/router/ProxyWebsocketsIntoGrid.java[271-285]
java/src/org/openqa/selenium/grid/router/ProxyWebsocketsIntoGrid.java[331-336]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`DirectForwardingListener` uses an unbounded `Deque<WebSocketFrame>` to buffer frames arriving before `onUpgrade(Channel)` is called. If the downstream upgrade stalls while upstream continues sending, the deque can grow without limit, retaining ref-counted frames and risking memory exhaustion.

### Issue Context
- Buffering is correct for ordering, but needs a safety valve.

### Fix approach
- Add a hard limit (by frame count and/or total bytes) for `pending`.
- When the limit is exceeded, choose a deterministic policy: drop newest/oldest with `release()`, and/or close the downstream channel once available; also log at WARNING with session context.
- Consider a time-based cutoff (e.g., if upgrade hasn’t completed within N seconds, discard pending and mark closed).

### Fix Focus Areas
- java/src/org/openqa/selenium/grid/router/ProxyWebsocketsIntoGrid.java[220-285]
- java/src/org/openqa/selenium/grid/router/ProxyWebsocketsIntoGrid.java[331-336]
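The capped-buffer safety valve could be modelled as below. The 128-frame cap and all names are illustrative, frames are `String`s, and a real implementation must also `release()` each dropped ref-counted frame.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Consumer;

// Illustrative sketch: bound the pre-upgrade buffer and latch a terminal
// overflow state instead of growing without limit.
class BoundedPrehandshakeBuffer {
  static final int MAX_PENDING = 128; // illustrative cap

  private final Object lock = new Object();
  private final Deque<String> pending = new ArrayDeque<>();
  private volatile Consumer<String> channel;
  private boolean overflowed; // guarded by lock

  /** Returns false once the buffer has overflowed and frames are dropped. */
  boolean onFrame(String frame) {
    Consumer<String> ch = channel; // lock-free fast path after handover
    if (ch == null) {
      synchronized (lock) {
        ch = channel;
        if (ch == null) {
          if (overflowed) {
            return false; // terminal: real code would frame.release()
          }
          if (pending.size() >= MAX_PENDING) {
            pending.clear(); // real code releases each dropped frame
            overflowed = true; // onUpgrade will surface a 1009 close
            return false;
          }
          pending.addLast(frame);
          return true;
        }
      }
    }
    ch.accept(frame);
    return true;
  }

  void onUpgrade(Consumer<String> ch) {
    synchronized (lock) {
      if (overflowed) {
        ch.accept("CLOSE:1009"); // 1009 = "message too big" close code
        return;
      }
      pending.forEach(ch);
      pending.clear();
      channel = ch;
    }
  }
}
```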




Previous review results

Review updated until commit 0cd57dd

Results up to commit 6616a7e: 🐞 Bugs (0) 📘 Rule violations (0) 📎 Requirement gaps (0). Flagged "Pre-upgrade close not forwarded" (action required) and "Unbounded frame buffer" (remediation recommended), both listed above and since resolved.

Results up to commit bf52afc: 🐞 Bugs (0) 📘 Rule violations (0) 📎 Requirement gaps (0). Flagged "Pending frames leak onClose" (remediation recommended), listed above and since resolved.

Address two issues raised on SeleniumHQ#17435:

- Pre-handshake onClose/onError used to discard the close details and
  publish the channel for normal forwarding when the handshake later
  landed, leaving the client open indefinitely. Record the code/reason
  and, on onUpgrade, drain any frames the upstream had queued and then
  write the close frame followed by closing the channel.
- The pending deque had no cap, so a stalled handshake while the
  upstream keeps producing could grow it without bound. Cap at 128
  frames; on overflow drop the buffer, latch a 1009 terminal state,
  and close the upstream so the channel sees a clean close once the
  upgrade fires.
@shs96c
Member Author

shs96c commented May 11, 2026

Pushed bf52afc addressing both qodo-bot findings:

  1. Pre-upgrade close not forwarded: DirectForwardingListener now records the close code + reason when onClose/onError fires pre-handshake. On onUpgrade it drains any frames the upstream had queued, then writes a CloseWebSocketFrame with the recorded details and closes the channel, instead of publishing it for normal forwarding. New test preHandshakeCloseIsSurfacedAndClosesChannelOnUpgrade pins the behaviour.

  2. Unbounded frame buffer — capped pending at 128 frames. On overflow the buffer is released, the listener latches a 1009 ("buffer overflow") terminal state, and the upstream client is closed; the eventual onUpgrade sees the terminal state and surfaces a clean 1009 close to the client. New test overflowOfPreHandshakeBufferArmsCloseAndDrains.

The Ruby Windows test failure is unrelated — `gpg: error retrieving '<key>' via WKD: Try again later` during MSYS2 toolchain setup. No Java tests ran on that job before it failed.

@qodo-code-review
Contributor

qodo-code-review Bot commented May 11, 2026

Persistent review updated to latest commit bf52afc

If pre-handshake onClose kept the buffered ref-counted WebSocketFrames
"for a future onUpgrade drain" and the client-side handshake never
landed (client disconnected mid-handshake, or the upgrade itself
failed and triggered the upstream close), those frames were never
released — a Netty buffer leak.

Make onClose symmetric with onError: drop the buffer on pre-handshake
close, while still recording the code/reason so onUpgrade can surface
a clean close to the client if the handshake does fire later. The
trade-off is small: a client that just received the 101 response
cannot usefully consume frames if we are about to close them
immediately afterward.
@shs96c
Member Author

shs96c commented May 11, 2026

Good catch — pushed 0cd57dd addressing it.

onClose pre-handshake used to keep the buffered ref-counted frames "for a future onUpgrade drain". If the client-side handshake never lands (client disconnected mid-handshake, or the upgrade itself failed and triggered the upstream close), those frames are never released.

Made onClose symmetric with onError — drop the buffer immediately. The code/reason are still recorded so a late onUpgrade can write a proper close to the client, just without the trailing data. The trade-off is small: a client that has just received the 101 cannot usefully consume frames if we are about to close them immediately afterward.

Existing test updated to reflect the new behaviour (only the close frame on the wire after pre-handshake close + upgrade).

@qodo-code-review
Contributor

qodo-code-review Bot commented May 11, 2026

Persistent review updated to latest commit 0cd57dd

@asolntsev asolntsev added this to the 4.44.0 milestone May 11, 2026