Fix target registry: thread-safety, single creation policy, wait-for-target in get()#75
Merged
Conversation
…target in get() The browser's target registry (the targets list) had three issues sharing one root cause: one mutable list populated by two code paths with different rules and no shared lock. - Thread-safety (ISSUE-5). targets was mutated under updateTargetInfoMutex on the event path but read/written without it elsewhere (updateTargets, and the non-suspend getters tabs/mainTab/targets), risking ConcurrentModificationException or torn reads on a multi-threaded dispatcher. Now a private mutableTargets is mutated only under the mutex, and an immutable copy is published to a @volatile targetsSnapshot after every mutation; the non-suspend getters read the snapshot. Browser.targets is tightened MutableList<Connection> -> List<Connection> (no caller mutated it; OpenTelemetryBrowser delegates). - Inconsistent creation (ISSUE-6). updateTargets() created plain Connections (invisible to tabs/mainTab, which filter Tab), while the event path created Tabs and appended without dedup, so a target could appear twice or the initial page was a non-Tab. A single upsertTarget() helper now serves both paths: dedupe by targetId, page targets become Tabs. - get() race (ISSUE-8). get() did targets.filterIsInstance<Tab>().first { ... }, which threw NoSuchElementException when the Tab had not been registered yet (the targetCreated event races createTarget() returning, and at start-up the initial page registers asynchronously). Replaced with awaitPageTab(), which polls the snapshot with a timeout. Verified red->green with TargetRegistryTest (fake connection/tab via stubbed callCommand): a page target is registered as a Tab, and get() waits for the page tab to appear instead of throwing. Independently reviewed. Full :core:jvmTest (real Chrome) + :opentelemetry:jvmTest pass; macOS native tests pass; Linux/MinGW compile.
PR-C made the initial page register as a Tab, but only if getTargets() (in start()'s updateTargets()) already returns it. getTargets() can return before the browser has created the page, and no targetCreated event may have fired yet, so browser.mainTab could still be null right after createBrowser — the residual startup race behind the flaky `browser.mainTab ?: error(...)` tests (e.g. testEvaluateWaitPromiseSuccess). start() now ends with a best-effort awaitInitialPageTab(): it polls (re-running updateTargets each tick so it works with or without auto-discovery events) until a page Tab is registered, bounded by the same 10s timeout. It does NOT throw on timeout — a browser may legitimately have no page, in which case mainTab stays null as before. This closes the mainTab startup race the same way the get() fix closed the navigation race. Verified red->green with TargetRegistryTest.awaitInitialPageTab_ makesMainTabReadyOncePageAppears (mainTab null before the await; non-null once the page appears during it). Full :core:jvmTest (real Chrome) + :opentelemetry:jvmTest pass; macOS native tests pass; Linux/MinGW compile.
… timeouts Two unrelated-to-the-fix CI-reliability changes so the build is a trustworthy signal: - Set strategy.fail-fast: false so an intermittent failure on one OS leg no longer cancels the other two — the real per-OS result is visible instead of masked. - Widen the real-browser network-event test timeouts from 3s (and a 1s handler wait) to 10s. These assert on requestWillBeSent/responseReceived/requestPaused arriving after a real navigation; 3s is too tight on loaded CI runners (notably Windows) and caused TimeoutCancellationException flakes in FetchInterceptionTest /RequestExpectationTest. The deterministic fake-transport unit tests keep their short timeouts.
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
The on-runner HTML report isn't accessible from the Actions log, so a failing assertion only shows its top frame. --info logs the test failure exception (expected/actual) so CI-only failures can be diagnosed without re-uploading reports. Diagnostic; can be trimmed once the flaky Windows test is understood.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
The browser's target registry (the
targetslist) had three issues sharing one root cause: one mutable list populated by two code paths with different rules and no shared lock.ISSUE-5 — thread-safety
targetswas mutated underupdateTargetInfoMutexon the event path, but read/written without it elsewhere (updateTargets(), and the non-suspend getterstabs/mainTab/targets) →ConcurrentModificationException/torn reads on a multi-threaded dispatcher. Now aprivate val mutableTargetsis mutated only under the mutex, and an immutable copy is published to@Volatile targetsSnapshotafter each mutation; the non-suspend getters read the snapshot.Browser.targetstightenedMutableList<Connection>→List<Connection>(no caller mutated it;OpenTelemetryBrowserdelegates).ISSUE-6 — inconsistent creation / duplicates
updateTargets()created plainConnections (invisible totabs/mainTab, which filterTab), while the event path createdTabs and appended without dedup — so a target could appear twice, or the initial page was a non-Tab. A singleupsertTarget()now serves both paths: dedupe bytargetId, page targets becomeTabs.ISSUE-8 —
get()race / crashget()didtargets.filterIsInstance<Tab>().first { … }, throwingNoSuchElementExceptionwhen theTabwasn't registered yet (thetargetCreatedevent racescreateTarget()returning; at start-up the initial page registers asynchronously). Replaced withawaitPageTab(), which polls the snapshot with a timeout. This is the mechanism behind the intermittentElementTest.testGetChildrenCI failures.Verification (red → green)
New
TargetRegistryTest(fake connection/tab via stubbedcallCommand, no real browser):updateTargets_registersPageTargetAsTab— RED on unfixed (createConnection→ not aTab→tabsempty), GREEN after.get_waitsForPageTabToAppear— RED on unfixed (.first {}→NoSuchElementException), GREEN after.updateTargets_dedupesByTargetId— guard.Independently reviewed (safe to merge). Full
:core:jvmTest(real Chrome) +:opentelemetry:jvmTestpass; macOS native tests pass; Linux/MinGW compile.Test plan
./gradlew :core:jvmTest --tests "*.TargetRegistryTest"— red on unfixed, green after./gradlew :core:jvmTest(real Chrome) — all pass./gradlew :opentelemetry:jvmTest— all pass./gradlew :core:macosArm64Test+ Linux/MinGW compile — pass