Skip to content

Test concurrent component lifecycles deterministically under synctest#65

Open
danielorbach wants to merge 20 commits into
mainfrom
worktree-upgrade-to-synctest
Open

Test concurrent component lifecycles deterministically under synctest#65
danielorbach wants to merge 20 commits into
mainfrom
worktree-upgrade-to-synctest

Conversation

@danielorbach
Copy link
Copy Markdown
Owner

@danielorbach danielorbach commented May 31, 2026

The lifecycle suite synchronised its concurrent scenarios with sleep-based timeouts and a SyncTimeout constant — brittle, timing-dependent, and slow, since every "this must not happen yet" assertion paid real wall-clock time to prove a negative. This migrates the suite to testing/synctest, whose per-bubble fake clock advances only once every goroutine is durably blocked, giving deterministic control over the startup, shutdown, and cleanup sequencing the framework's concurrency depends on.

The work proceeds conservatively, one top-level test at a time: each test is first wrapped in a synctest.Test bubble with its assertions left exactly as they were, and only then is the now-redundant machinery peeled away. Inside a bubble a stalled goroutine surfaces as a deadlock report instead of a silent hang, so the time.After(SyncTimeout) and time.NewTimer guard rails that once protected those waits become dead branches and are dropped. The t.Parallel() calls existed only to keep the real-clock timeouts off the critical path; under the fake clock those waits are instant, so the parallelism goes too. Channels, timers, and contexts that drive a test's timing are constructed inside the bubble so they belong to its clock. The commit history is grouped per top-level test — bubble commit first, then that test's modernization commits — so each test's evolution reads top to bottom.

Finally the suite moves to package component_test. It had already stopped referencing any unexported symbol, so binding it to the exported API (RunProc, L, the With* options, ErrTerminated) keeps the black-box contract honest and stops future tests from quietly reaching into internals.

Scope and exclusions: the change is confined to lifecycle_test.go, the only suite that relied on sleep/timeout-based concurrency synchronisation; the other root tests and the subpackage tests do not exercise lifecycle concurrency and are left untouched. SyncTimeout is gone entirely — with every wall-clock guard rail removed, its last reference (the timeout argument to l.Stop() in TestL_Stop/Respected) became a plain time.Second literal whose only job is to be comfortably longer than the now-instant stop. Documenting reusable synctest patterns for component authors is deliberately deferred to a follow-up; this PR establishes the pattern in the suite itself. No production code changes.

Verification: go test -race ./... passes and is deterministic across 20 repeated runs, golangci-lint run ./... reports no issues, and every commit compiles its tests independently.

Addresses #10.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request updates the lifecycle concurrency test suite to run deterministically under Go’s testing/synctest, removing brittle real-time sleeps/timeouts and migrating the suite to a black-box component_test package that uses only exported APIs.

Changes:

  • Wrap lifecycle concurrency scenarios in synctest.Test bubbles and drop SyncTimeout/time.After guard rails.
  • Remove t.Parallel() usage that previously existed primarily to hide wall-clock waits.
  • Move lifecycle_test.go to package component_test and switch to component.* exported symbols.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread lifecycle_test.go Outdated
Comment on lines +166 to +167
// yield to increase the change of the cleanup being called in case the test will
// have failed
Wrap the table-driven body in synctest.Test so the lifecycle and its
reaper goroutines execute inside an isolated bubble. This is the first
step of migrating the lifecycle tests to testing/synctest; assertions
are untouched.
The Concurrent subtest fans out 16 child lifecycles and waits on an
unbuffered channel; inside a bubble the wait resolves deterministically
without leaning on the SyncTimeout wall-clock fallback.
Inside a synctest bubble a stalled child can no longer hang the suite:
once every goroutine is durably blocked with no timer to advance, the
bubble reports a deadlock. The SyncTimeout fallback was a wall-clock
safety net for that hang, so the wait collapses to a bare receive.
The SynchronisesBeforeSubComponents subtest deliberately waits for a
timeout to prove the cleanup does not run early. Its context is now
created inside the bubble so the deadline rides the fake clock, which
advances the moment every goroutine is durably blocked instead of
burning 180ms of wall time.
The parallelism existed only to hide the wall-clock cost of the
SynchronisesBeforeSubComponents subtest, which deliberately waited out a
real 180ms deadline. Under the fake clock that wait is instant, so the
suite no longer needs to overlap it with other work.
SynchronisesBeforeSubComponents proved that a cleanup does not run before
its subcomponent completes by racing a canary channel against a 180ms
context deadline, nudged by runtime.Gosched. Inside a synctest bubble both
are vestigial: the deadline is synthetic fake time and the yield is inert
under deterministic scheduling. The idiomatic primitive for "nothing else
has run yet" is synctest.Wait, which returns only once every other bubble
goroutine is durably blocked.

Hold the subcomponent on a release channel, Wait for that quiescent point,
and assert the cleanup has not run; then release it and confirm it does.
The negative is now a fact observed at a known-still moment rather than an
inference drawn from a timeout that never fires, and the canary, the
deadline, and the runtime import all retire.
Several subtests spawn goroutines that call L.Fatal and report back
over a channel, with a SyncTimeout guard against a missed signal. The
bubble makes that rendezvous deterministic and surfaces any stranded
goroutine as a bubble deadlock rather than a silent timeout.
The deferred sends from the Fatal-calling goroutines now rendezvous
deterministically inside the bubble, so the SyncTimeout fallback that
once protected against a missed signal is no longer load-bearing.
The Canceled and DeadlineExceeded subtests build their parent contexts
inside the bubble so the cancellation and the one-millisecond deadline
are bubble-owned; otherwise the fake clock could never trip a deadline
armed against the real clock outside.
The bubble advances its fake clock to the one-millisecond deadline as
soon as the lifecycle blocks on Context().Done(), so waiting for the
cancellation no longer needs a SyncTimeout escape hatch.
The subtest was parallel only to absorb its real millisecond deadline.
The bubble's fake clock reaches that deadline instantly, removing the
reason to overlap it with sibling subtests.
These naming checks carry no timing of their own, but bubbling them
keeps the suite uniform and lets synctest confirm each lifecycle's
reaper goroutines have wound down before the test returns.
These subtests turn on Stop()'s timeout behaviour: some expect it to
expire (lifecycle ignores the signal), others expect it to succeed. The
stopped/ready channels and the post-RunProc receive move inside the
bubble so the Stop() timers and that final rendezvous share the fake
clock, collapsing the staggered 18/36/180ms timeouts to an instant.
Ignored and Concurrent both block on Stop() results that the bubble now
delivers deterministically, draining each buffered send in turn. The
SyncTimeout fallback and the NewTimer watchdog guarded against a Stop()
that never returned, which a bubble deadlock would now surface instead.
Every Stop() subtest leaned on real timers (the 180ms ignored timeout,
the staggered 18/36/180ms concurrent timeouts) and was parallelised to
keep those waits off the critical path. The fake clock collapses them
to an instant, so the parent and its subtests run sequentially now.
The external stopper channel moves inside the bubble so closing it and
the lifecycle's propagation of that signal are observed deterministically
by synctest.Wait-style scheduling rather than a wall-clock race.
Closing the stopper and the lifecycle's propagation of that signal are
ordered deterministically within the bubble, so a bare receive on
Stopping() suffices where a SyncTimeout fallback once stood.
Move the lifecycle suite into package component_test so it consumes
only the exported API (RunProc, L, the With* options, ErrTerminated)
and qualifies them through the imported package. The tests had stopped
touching any unexported symbol, so binding them to the public surface
guards against future reliance on internals and keeps the black-box
contract honest. The change spans the whole file and belongs to no
single test, so it stands apart from the per-test modernization.
Once the bubbles replaced every wall-clock guard rail, the only
surviving reference was the timeout handed to l.Stop() in
TestL_Stop/Respected, where the value is merely 'long enough to
succeed'. Inlining a plain time.Second there lets the constant and its
stale doc comment retire.
@danielorbach danielorbach force-pushed the worktree-upgrade-to-synctest branch from d5c84cf to 0b47063 Compare May 31, 2026 22:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants