Skip to content

fix(pool): UAF in SubmitterRegistry TLS cache (§10 ci-linux SegFault) + deterministic CPU_ANY tests#15

Merged
Swately merged 4 commits into
mainfrom
diag/pool-int-segfault-backtrace
May 21, 2026
Merged

fix(pool): UAF in SubmitterRegistry TLS cache (§10 ci-linux SegFault) + deterministic CPU_ANY tests#15
Swately merged 4 commits into
mainfrom
diag/pool-int-segfault-backtrace

Conversation

@Swately
Copy link
Copy Markdown
Owner

@Swately Swately commented May 21, 2026

Summary

Root-causes and fixes the pool_integration_test §10 SegFault that failed
ci-linux/linux-gcc-13 (3/3 retries on the 2-vCPU runner), then hardens the test.

The bug (use-after-free)

Captured the faulting frame on the runner via an execinfo backtrace handler
(it never reproduced on a 16C/32T host; 228 Release + 40 ASan runs clean). The
backtrace + disassembly pinned it to PoolRuntime::try_submit
SubmitterRegistry::acquire_current()'s TLS-cache fast path: across the ~10
PoolRuntimes the test creates/destroys on the main thread, the allocator reuses
the this address (so registry == this passes) and the freed slots_ array
still holds the stable main-thread tid (so owning_thread_id == tid passes), so
a slot dangling into freed memory is returned and try_send dereferences its
overwritten ring pointer. Main-thread only; both gcc/libstdc++ and clang/libc++;
ASan-clean (quarantine prevents the address reuse).

Fix: key the TLS cache on a per-registry monotonic instance id, not the
this address — a recreated registry can never match a stale cache.

Test hardening (P0b)

  • §10 is now deterministic: proves CPU_ANY spans both CPU groups via
    affinity_hint + TaskResult.worker_id (no saturation/sampling).
  • §7 asserts deterministic completion (wait_result SUCCESS) and treats the
    metrics read as eventually-consistent (2 s → 15 s deadline).
  • The heavy saturation soak moves to pool_cpu_any_stress_test.cpp, gated behind
    -DPHYRIAD_BUILD_STRESS_TESTS=ON (asserts deterministic lossless completion).
  • Kept the execinfo crash-backtrace handler (shared header) as a permanent aid
    for these concurrency-heavy tests.

Test plan

  • linux-gcc-13 / linux-clang-18 pass (§10 no longer SegFaults)
  • deterministic §10/§7: 24/24, 5/5 stable on WSL gcc-13 + Windows MinGW
  • gated stress: 24000/24000 lossless under taskset -c 0,1
  • lint-hal / lint-docs green; doc-sync bypassed via [skip-doc-sync] (detail:: header)

🤖 Generated with Claude Code

Swately and others added 4 commits May 21, 2026 09:12
A solo, push-to-main workflow was not covered: on a push the previous step
diffed against origin/main, which equals HEAD after the push -> empty range ->
nothing detected. Pick the base by event type instead:

- pull_request: diff against the target branch (origin/<base_ref>).
- push (e.g. to main): diff against github.event.before -- exactly the commits
  just pushed (with an all-zero/new-branch fallback to HEAD~1).

The doc-sync job is git-only (no build/tests), so running it on every push is
fast and does not add to the slow build/test matrix. Docs updated to match
(CONTRIBUTING + DOC_AUDIT).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The §10 CPU_ANY stress reliably SegFaults on the 2-vCPU ci-linux runner under
gcc-13/libstdc++ -O3 (3/3 retries) but never reproduces locally: 228 Release
runs (taskset -c 0,1 + 6-/10-way oversubscription) and 40 ASan/UBSan runs on a
16C/32T host produced 0 crashes and 0 sanitizer errors, and the clang-18/libc++
CI job passes. The trigger is the runner's CPU/scheduler, so the faulting frame
can only be captured there.

Add a glibc-only execinfo SIGSEGV/SIGABRT/SIGBUS backtrace handler to
pool_integration_test (async-signal-safe write + backtrace_symbols_fd, then
re-raise to preserve the exit code) and unbuffered stdout so the last "§ Test N"
header survives the crash. Build the test with -g1 -rdynamic on Linux so the
handler resolves named frames in the CI log. Pure diagnostic: no test logic or
intensity change; compiles to a no-op off glibc (e.g. Windows MinGW).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…kip-doc-sync]

Root cause of the §10 pool_integration_test SegFault on the ci-linux runner
(proven via the backtrace from PR #15 + disassembly): the faulting instruction
is `mov 0x12080(%rax),%rsi` inside try_submit — a read of ring->next_producer_seq_
through `slot->ring`, where `slot` is a DANGLING pointer returned by
SubmitterRegistry::acquire_current()'s TLS-cache fast path.

The fast path validated the cache with `cache.registry == this &&
cache.slot->owning_thread_id == tid`. Both checks pass spuriously across the
~10 PoolRuntimes the test creates and destroys on the main thread:
  - the allocator places a new PoolRuntime at the SAME address as a destroyed
    one, so `registry == this` passes for a *different* registry instance;
  - the destroyed registry's freed slots_ array still holds the main thread's
    (stable) tid, so `owning_thread_id == tid` passes too.
The stale slot — dangling into freed memory whose `ring` field has since been
overwritten — is then returned, and try_send dereferences the garbage ring
pointer → SIGSEGV. It only hits the main thread (§10's submitter threads have
fresh, empty TLS caches → scan path → valid slots), reproduces under both
gcc/libstdc++ and clang/libc++, and is ASan-clean (ASan's quarantine prevents
the address reuse that is the precondition).

Fix: key the cache on a per-registry monotonic instance id assigned at
construction instead of the `this` address. A recreated registry (even at the
same address) gets a fresh id, so a cache populated by a destroyed registry can
never match a live one — acquire_current never dereferences a stale slot. The
owning_thread_id field remains the claim token for the scan path. No public API
change (detail:: header), hence [skip-doc-sync].

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… fix §7 flake

The §10 CPU_ANY test and §7 metrics test were timing-fragile under CI -j
oversubscription (reproduced locally: 13/120 runs failed §7's 2 s metrics
deadline; §10's sampled fill%-distribution assertion is inherently stochastic).
Replace stochastic assertions with deterministic ones; keep the heavy stress as
a gated soak.

§10 (now DETERMINISTIC): prove CPU_ANY's candidate set spans BOTH CPU groups by
steering tasks with affinity_hint (relative index into the kind's candidate
group) and reading back TaskResult.worker_id — no saturation, no sampling. With
make_pool(2,2,1): CPU_ANY hint 2/3 routes to the long group (ids 2,3); CPU_SHORT
hint 2 wraps into its 2-element set and can never reach long. That contrast is
the proof, and it is exact on an idle pool.

§7 (deterministic primary check): assert every task completed via wait_result
SUCCESS (independent of aggregator scheduling). The metrics_snapshot read stays,
but as an eventually-consistent secondary check with a generous deadline (2 s →
15 s) — the aggregator thread can be descheduled under oversubscription after the
work is already done; this waits for telemetry to settle, it does not mask a race.

Saturation soak moved to pool_cpu_any_stress_test.cpp, built only with
-DPHYRIAD_BUILD_STRESS_TESTS=ON (default OFF — not a CI gate, labelled "stress").
It keeps the intensive 8-thread × 3000-task open-loop burst but asserts only the
DETERMINISTIC lossless-under-backpressure property (accepted == completed, the
D-2 task-loss guard); the fill%-distribution is printed, not asserted.

Also: extract the execinfo crash-backtrace handler to test_crash_handler.hpp
(shared by both tests); pass io=1 explicitly to make_pool in §10 (io=0 means
"derive from topology", not zero IO workers — the old comment's worker model was
inaccurate).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Swately Swately changed the title diag(pool): capture §10 segfault backtrace on ci-linux runner fix(pool): UAF in SubmitterRegistry TLS cache (§10 ci-linux SegFault) + deterministic CPU_ANY tests May 21, 2026
@Swately Swately merged commit 115d479 into main May 21, 2026
10 checks passed
@Swately Swately deleted the diag/pool-int-segfault-backtrace branch May 21, 2026 17:18
Swately added a commit that referenced this pull request May 21, 2026
The submitter-slot TLS cache is now keyed to the pool instance (not the registry
address), so a slot claimed in one PoolRuntime is never reused by a later one
even under allocator address reuse — see the §10 UAF fix in #15. Documents this
in the threading section and satisfies doc-sync for that header change (the
squash-merge of #15 dropped the [skip-doc-sync] token).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant