Fix target registry: thread-safety, single creation policy, wait-for-target in get() by nathanfallet · Pull Request #75 · cdpdriver/kdriver

nathanfallet · 2026-05-23T11:57:16Z

Summary

The browser's target registry (the targets list) had three issues sharing one root cause: one mutable list populated by two code paths with different rules and no shared lock.

ISSUE-5 — thread-safety

targets was mutated under updateTargetInfoMutex on the event path, but read/written without it elsewhere (updateTargets(), and the non-suspend getters tabs/mainTab/targets) → ConcurrentModificationException/torn reads on a multi-threaded dispatcher. Now a private val mutableTargets is mutated only under the mutex, and an immutable copy is published to @Volatile targetsSnapshot after each mutation; the non-suspend getters read the snapshot. Browser.targets tightened MutableList<Connection> → List<Connection> (no caller mutated it; OpenTelemetryBrowser delegates).

Why a snapshot rather than just marking the list @Volatile: volatile governs reassignment of the reference, not in-place add/remove, and MutableList isn't thread-safe — a reader iterating the live list would still hit ConcurrentModificationException. Readers iterate the immutable snapshot instead.

ISSUE-6 — inconsistent creation / duplicates

updateTargets() created plain Connections (invisible to tabs/mainTab, which filter Tab), while the event path created Tabs and appended without dedup — so a target could appear twice, or the initial page was a non-Tab. A single upsertTarget() now serves both paths: dedupe by targetId, page targets become Tabs.

ISSUE-8 — `get()` race / crash

get() did targets.filterIsInstance<Tab>().first { … }, throwing NoSuchElementException when the Tab wasn't registered yet (the targetCreated event races createTarget() returning; at start-up the initial page registers asynchronously). Replaced with awaitPageTab(), which polls the snapshot with a timeout. This is the mechanism behind the intermittent ElementTest.testGetChildren CI failures.

Verification (red → green)

New TargetRegistryTest (fake connection/tab via stubbed callCommand, no real browser):

updateTargets_registersPageTargetAsTab — RED on unfixed (createConnection → not a Tab → tabs empty), GREEN after.
get_waitsForPageTabToAppear — RED on unfixed (.first {} → NoSuchElementException), GREEN after.
updateTargets_dedupesByTargetId — guard.

Independently reviewed (safe to merge). Full :core:jvmTest (real Chrome) + :opentelemetry:jvmTest pass; macOS native tests pass; Linux/MinGW compile.

Test plan

./gradlew :core:jvmTest --tests "*.TargetRegistryTest" — red on unfixed, green after
./gradlew :core:jvmTest (real Chrome) — all pass
./gradlew :opentelemetry:jvmTest — all pass
./gradlew :core:macosArm64Test + Linux/MinGW compile — pass

…target in get() The browser's target registry (the targets list) had three issues sharing one root cause: one mutable list populated by two code paths with different rules and no shared lock. - Thread-safety (ISSUE-5). targets was mutated under updateTargetInfoMutex on the event path but read/written without it elsewhere (updateTargets, and the non-suspend getters tabs/mainTab/targets), risking ConcurrentModificationException or torn reads on a multi-threaded dispatcher. Now a private mutableTargets is mutated only under the mutex, and an immutable copy is published to a @volatile targetsSnapshot after every mutation; the non-suspend getters read the snapshot. Browser.targets is tightened MutableList<Connection> -> List<Connection> (no caller mutated it; OpenTelemetryBrowser delegates). - Inconsistent creation (ISSUE-6). updateTargets() created plain Connections (invisible to tabs/mainTab, which filter Tab), while the event path created Tabs and appended without dedup, so a target could appear twice or the initial page was a non-Tab. A single upsertTarget() helper now serves both paths: dedupe by targetId, page targets become Tabs. - get() race (ISSUE-8). get() did targets.filterIsInstance<Tab>().first { ... }, which threw NoSuchElementException when the Tab had not been registered yet (the targetCreated event races createTarget() returning, and at start-up the initial page registers asynchronously). Replaced with awaitPageTab(), which polls the snapshot with a timeout. Verified red->green with TargetRegistryTest (fake connection/tab via stubbed callCommand): a page target is registered as a Tab, and get() waits for the page tab to appear instead of throwing. Independently reviewed. Full :core:jvmTest (real Chrome) + :opentelemetry:jvmTest pass; macOS native tests pass; Linux/MinGW compile.

PR-C made the initial page register as a Tab, but only if getTargets() (in start()'s updateTargets()) already returns it. getTargets() can return before the browser has created the page, and no targetCreated event may have fired yet, so browser.mainTab could still be null right after createBrowser — the residual startup race behind the flaky `browser.mainTab ?: error(...)` tests (e.g. testEvaluateWaitPromiseSuccess). start() now ends with a best-effort awaitInitialPageTab(): it polls (re-running updateTargets each tick so it works with or without auto-discovery events) until a page Tab is registered, bounded by the same 10s timeout. It does NOT throw on timeout — a browser may legitimately have no page, in which case mainTab stays null as before. This closes the mainTab startup race the same way the get() fix closed the navigation race. Verified red->green with TargetRegistryTest.awaitInitialPageTab_ makesMainTabReadyOncePageAppears (mainTab null before the await; non-null once the page appears during it). Full :core:jvmTest (real Chrome) + :opentelemetry:jvmTest pass; macOS native tests pass; Linux/MinGW compile.

… timeouts Two unrelated-to-the-fix CI-reliability changes so the build is a trustworthy signal: - Set strategy.fail-fast: false so an intermittent failure on one OS leg no longer cancels the other two — the real per-OS result is visible instead of masked. - Widen the real-browser network-event test timeouts from 3s (and a 1s handler wait) to 10s. These assert on requestWillBeSent/responseReceived/requestPaused arriving after a real navigation; 3s is too tight on loaded CI runners (notably Windows) and caused TimeoutCancellationException flakes in FetchInterceptionTest /RequestExpectationTest. The deterministic fake-transport unit tests keep their short timeouts.

codecov · 2026-05-24T09:10:40Z

Codecov Report

❌ Patch coverage is 92.50000% with 3 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
.../kotlin/dev/kdriver/core/browser/DefaultBrowser.kt	92.50%	0 Missing and 3 partials ⚠️

📢 Thoughts on this report? Let us know!

The on-runner HTML report isn't accessible from the Actions log, so a failing assertion only shows its top frame. --info logs the test failure exception (expected/actual) so CI-only failures can be diagnosed without re-uploading reports. Diagnostic; can be trimmed once the flaky Windows test is understood.

nathanfallet added 3 commits May 23, 2026 13:56

nathanfallet merged commit 5131422 into main May 24, 2026
4 of 5 checks passed

nathanfallet deleted the fix/target-registry branch May 24, 2026 10:12

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix target registry: thread-safety, single creation policy, wait-for-target in get()#75

Fix target registry: thread-safety, single creation policy, wait-for-target in get()#75
nathanfallet merged 4 commits into
mainfrom
fix/target-registry

nathanfallet commented May 23, 2026

Uh oh!

codecov Bot commented May 24, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

nathanfallet commented May 23, 2026

Summary

ISSUE-5 — thread-safety

ISSUE-6 — inconsistent creation / duplicates

ISSUE-8 — get() race / crash

Verification (red → green)

Test plan

Uh oh!

codecov Bot commented May 24, 2026

Codecov Report

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

ISSUE-8 — `get()` race / crash