Skip to content

Fix connection lifecycle: disconnect propagation, connect dedup, client cleanup#74

Merged
nathanfallet merged 1 commit into
mainfrom
fix/connection-lifecycle
May 23, 2026
Merged

Fix connection lifecycle: disconnect propagation, connect dedup, client cleanup#74
nathanfallet merged 1 commit into
mainfrom
fix/connection-lifecycle

Conversation

@nathanfallet
Copy link
Copy Markdown
Member

Summary

Three connection-ownership issues, sharing the lifecycle root cause. Builds on the WebSocketTransport seam + pendingRequests registry from #73.

ISSUE-3 — disconnect/close hung in-flight commands

When the WebSocket dropped or close() was called, the receive loop just ended; any callCommand parked on its response deferred waited forever (the loop only printStackTrace'd). Now failPendingRequests(...) drains the registry and completes every waiter exceptionally with a new ConnectionClosedException:

  • on normal completion of incoming() (socket closed),
  • on error in the receive loop (now logger.error, not printStackTrace),
  • and in close().

Callers observe a failure instead of hanging.

ISSUE-4 — unsynchronized connect() opened duplicate sessions

Concurrent first commands could each pass the isActive check and open a second session, leaking the extra socket + listener. connect() is now guarded by a double-checked connectMutex; the prepareExpert/prepareHeadless block by a separate prepareMutex (distinct mutex avoids re-entrancy — prepare* issue ONE_SHOT commands that skip that block).

ISSUE-9 — HttpClients never closed

KtorWebSocketTransport.close() now closes its engine-backed HttpClient; DefaultBrowser.stop() closes the HTTP-API client. createBrowser/stop cycles no longer leak threads/connection pools.

Verification (red → green)

New ConnectionLifecycleTest (fake transport, UnconfinedTestDispatcher) — all three fail against the unfixed code and pass after:

  • callCommand fails with ConnectionClosedException on disconnect (RED: hang → timeout).
  • callCommand fails with ConnectionClosedException on close (RED: hang → timeout).
  • two concurrent first commands open the transport exactly once (RED: count == 2).

ISSUE-9 is resource cleanup — not deterministically unit-testable without injecting clients, so verified by inspection plus the real-browser createBrowser/stop suite.

Full :core:jvmTest (real Chrome) + :opentelemetry:jvmTest pass; macOS native tests pass; Linux/MinGW compile.

Test plan

  • ./gradlew :core:jvmTest --tests "*.ConnectionLifecycleTest" — red on unfixed code, green after
  • ./gradlew :core:jvmTest (real Chrome) — all pass
  • ./gradlew :opentelemetry:jvmTest — all pass
  • ./gradlew :core:macosArm64Test + Linux/MinGW compile — pass

…pe connect, close clients

Three connection-ownership issues:

- Disconnect/close hung in-flight commands (ISSUE-3). When the WebSocket dropped
  or close() was called, the receive loop just ended; any callCommand parked on
  its response deferred waited forever. The loop now fails all pending requests
  (new ConnectionClosedException) on both normal completion of incoming() and on
  error, and close() does the same, so callers observe a failure instead of
  hanging. The receive-loop error is now logged (logger.error) rather than
  printStackTrace'd.

- connect() was unsynchronized (ISSUE-4). Concurrent first commands could each
  pass the isActive check and open a duplicate session, leaking the extra socket
  and listener. connect() is now guarded by a double-checked connectMutex, and
  the prepareExpert/prepareHeadless block by a separate prepareMutex (distinct
  mutex avoids re-entrancy: prepare* issue ONE_SHOT commands that skip that block).

- HttpClients were never closed (ISSUE-9). KtorWebSocketTransport.close() now
  closes its engine-backed client, and DefaultBrowser.stop() closes the HTTP-API
  client, so create/stop cycles no longer leak threads/connection pools.

Verified red->green with ConnectionLifecycleTest (fake transport): callCommand
fails with ConnectionClosedException on disconnect and on close, and concurrent
first commands open the transport exactly once. All three fail against the
unfixed code and pass after. ISSUE-9 is resource cleanup, verified by inspection
plus the real-browser create/stop suite. Full :core:jvmTest (real Chrome) +
:opentelemetry:jvmTest pass; macOS native tests pass; Linux/MinGW compile.
@codecov
Copy link
Copy Markdown

codecov Bot commented May 23, 2026

Codecov Report

❌ Patch coverage is 80.00000% with 5 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
...n/dev/kdriver/core/connection/DefaultConnection.kt 77.77% 2 Missing and 2 partials ⚠️
.../kotlin/dev/kdriver/core/browser/DefaultBrowser.kt 66.66% 0 Missing and 1 partial ⚠️

📢 Thoughts on this report? Let us know!

@nathanfallet nathanfallet merged commit d1ee729 into main May 23, 2026
5 checks passed
@nathanfallet nathanfallet deleted the fix/connection-lifecycle branch May 23, 2026 10:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant