fix(pool): UAF in SubmitterRegistry TLS cache (§10 ci-linux SegFault) + deterministic CPU_ANY tests#15
Merged
Merged
Conversation
A solo, push-to-main workflow was not covered: on a push the previous step diffed against origin/main, which equals HEAD after the push -> empty range -> nothing detected. Pick the base by event type instead: - pull_request: diff against the target branch (origin/<base_ref>). - push (e.g. to main): diff against github.event.before -- exactly the commits just pushed (with an all-zero/new-branch fallback to HEAD~1). The doc-sync job is git-only (no build/tests), so running it on every push is fast and does not add to the slow build/test matrix. Docs updated to match (CONTRIBUTING + DOC_AUDIT). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The §10 CPU_ANY stress reliably SegFaults on the 2-vCPU ci-linux runner under gcc-13/libstdc++ -O3 (3/3 retries) but never reproduces locally: 228 Release runs (taskset -c 0,1 + 6-/10-way oversubscription) and 40 ASan/UBSan runs on a 16C/32T host produced 0 crashes and 0 sanitizer errors, and the clang-18/libc++ CI job passes. The trigger is the runner's CPU/scheduler, so the faulting frame can only be captured there. Add a glibc-only execinfo SIGSEGV/SIGABRT/SIGBUS backtrace handler to pool_integration_test (async-signal-safe write + backtrace_symbols_fd, then re-raise to preserve the exit code) and unbuffered stdout so the last "§ Test N" header survives the crash. Build the test with -g1 -rdynamic on Linux so the handler resolves named frames in the CI log. Pure diagnostic: no test logic or intensity change; compiles to a no-op off glibc (e.g. Windows MinGW). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…kip-doc-sync] Root cause of the §10 pool_integration_test SegFault on the ci-linux runner (proven via the backtrace from PR #15 + disassembly): the faulting instruction is `mov 0x12080(%rax),%rsi` inside try_submit — a read of ring->next_producer_seq_ through `slot->ring`, where `slot` is a DANGLING pointer returned by SubmitterRegistry::acquire_current()'s TLS-cache fast path. The fast path validated the cache with `cache.registry == this && cache.slot->owning_thread_id == tid`. Both checks pass spuriously across the ~10 PoolRuntimes the test creates and destroys on the main thread: - the allocator places a new PoolRuntime at the SAME address as a destroyed one, so `registry == this` passes for a *different* registry instance; - the destroyed registry's freed slots_ array still holds the main thread's (stable) tid, so `owning_thread_id == tid` passes too. The stale slot — dangling into freed memory whose `ring` field has since been overwritten — is then returned, and try_send dereferences the garbage ring pointer → SIGSEGV. It only hits the main thread (§10's submitter threads have fresh, empty TLS caches → scan path → valid slots), reproduces under both gcc/libstdc++ and clang/libc++, and is ASan-clean (ASan's quarantine prevents the address reuse that is the precondition). Fix: key the cache on a per-registry monotonic instance id assigned at construction instead of the `this` address. A recreated registry (even at the same address) gets a fresh id, so a cache populated by a destroyed registry can never match a live one — acquire_current never dereferences a stale slot. The owning_thread_id field remains the claim token for the scan path. No public API change (detail:: header), hence [skip-doc-sync]. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… fix §7 flake The §10 CPU_ANY test and §7 metrics test were timing-fragile under CI -j oversubscription (reproduced locally: 13/120 runs failed §7's 2 s metrics deadline; §10's sampled fill%-distribution assertion is inherently stochastic). Replace stochastic assertions with deterministic ones; keep the heavy stress as a gated soak. §10 (now DETERMINISTIC): prove CPU_ANY's candidate set spans BOTH CPU groups by steering tasks with affinity_hint (relative index into the kind's candidate group) and reading back TaskResult.worker_id — no saturation, no sampling. With make_pool(2,2,1): CPU_ANY hint 2/3 routes to the long group (ids 2,3); CPU_SHORT hint 2 wraps into its 2-element set and can never reach long. That contrast is the proof, and it is exact on an idle pool. §7 (deterministic primary check): assert every task completed via wait_result SUCCESS (independent of aggregator scheduling). The metrics_snapshot read stays, but as an eventually-consistent secondary check with a generous deadline (2 s → 15 s) — the aggregator thread can be descheduled under oversubscription after the work is already done; this waits for telemetry to settle, it does not mask a race. Saturation soak moved to pool_cpu_any_stress_test.cpp, built only with -DPHYRIAD_BUILD_STRESS_TESTS=ON (default OFF — not a CI gate, labelled "stress"). It keeps the intensive 8-thread × 3000-task open-loop burst but asserts only the DETERMINISTIC lossless-under-backpressure property (accepted == completed, the D-2 task-loss guard); the fill%-distribution is printed, not asserted. Also: extract the execinfo crash-backtrace handler to test_crash_handler.hpp (shared by both tests); pass io=1 explicitly to make_pool in §10 (io=0 means "derive from topology", not zero IO workers — the old comment's worker model was inaccurate). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Swately
added a commit
that referenced
this pull request
May 21, 2026
The submitter-slot TLS cache is now keyed to the pool instance (not the registry address), so a slot claimed in one PoolRuntime is never reused by a later one even under allocator address reuse — see the §10 UAF fix in #15. Documents this in the threading section and satisfies doc-sync for that header change (the squash-merge of #15 dropped the [skip-doc-sync] token). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Root-causes and fixes the
pool_integration_test§10 SegFault that failedci-linux/linux-gcc-13(3/3 retries on the 2-vCPU runner), then hardens the test.The bug (use-after-free)
Captured the faulting frame on the runner via an
execinfobacktrace handler(it never reproduced on a 16C/32T host; 228 Release + 40 ASan runs clean). The
backtrace + disassembly pinned it to
PoolRuntime::try_submit→SubmitterRegistry::acquire_current()'s TLS-cache fast path: across the ~10PoolRuntimes the test creates/destroys on the main thread, the allocator reuses
the
thisaddress (soregistry == thispasses) and the freedslots_arraystill holds the stable main-thread tid (so
owning_thread_id == tidpasses), soa slot dangling into freed memory is returned and
try_senddereferences itsoverwritten
ringpointer. Main-thread only; both gcc/libstdc++ and clang/libc++;ASan-clean (quarantine prevents the address reuse).
Fix: key the TLS cache on a per-registry monotonic instance id, not the
thisaddress — a recreated registry can never match a stale cache.Test hardening (P0b)
affinity_hint+TaskResult.worker_id(no saturation/sampling).wait_resultSUCCESS) and treats themetrics read as eventually-consistent (2 s → 15 s deadline).
pool_cpu_any_stress_test.cpp, gated behind-DPHYRIAD_BUILD_STRESS_TESTS=ON(asserts deterministic lossless completion).execinfocrash-backtrace handler (shared header) as a permanent aidfor these concurrency-heavy tests.
Test plan
linux-gcc-13/linux-clang-18pass (§10 no longer SegFaults)taskset -c 0,1🤖 Generated with Claude Code