test: de-flake watchdog + pheromone-scaling assertions; reconcile TEST_INVENTORY by Swately · Pull Request #16 · Swately/phyriad

Swately · 2026-05-21T18:58:33Z

Summary

Found via an empirical flake loop (full suite ×10 under ctest -j4 on WSL2/Linux,
matching the CI runner), then fixed deterministically:

orchestration_test: watchdog §4 false negative (the 100 ms detection budget was too
tight under -j, so the monitor thread could be starved); §3 false positive (500 ms node
timeout vs. stretched inter-heartbeat sleeps). Both now use generous 2 s deadlines and
still assert the watchdog's real behaviour.

stigmergy_pheromone_test §10: a hard scaling-shape floor (r4 >= 2.0× r1) is a
performance claim, not a correctness check; it flaked under ctest -j even off CI. The
test now asserts functional throughput only; the scaling shape stays in bench/.
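A minimal sketch of what a functional-throughput assertion can look like in C++. The workload is a hypothetical stand-in: `run_workers` and `check_functional_throughput` are illustrative names, not phyriad's stigmergy API, and the thread counts and 20 ms window are assumptions. Each worker uses a do-while, so it records at least one operation even if the OS first schedules it after the stop flag is raised; that keeps the assertion deterministic under `ctest -j`, which is the property a scaling-ratio check cannot have.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <initializer_list>
#include <thread>
#include <vector>

// Hypothetical stand-in for the pheromone workload (not phyriad's API):
// each of `n_threads` workers counts loop iterations until `stop` is set.
std::vector<std::uint64_t> run_workers(unsigned n_threads,
                                       std::chrono::milliseconds window) {
    std::atomic<bool> stop{false};
    std::vector<std::uint64_t> ops(n_threads, 0);
    std::vector<std::thread> workers;
    workers.reserve(n_threads);
    for (unsigned t = 0; t < n_threads; ++t) {
        workers.emplace_back([&stop, &ops, t] {
            std::uint64_t local = 0;
            // do-while: at least one unit of work, even if this thread is
            // first scheduled after `stop` was already raised.
            do {
                ++local;
            } while (!stop.load(std::memory_order_relaxed));
            ops[t] = local;  // each thread writes only its own slot
        });
    }
    std::this_thread::sleep_for(window);
    stop.store(true, std::memory_order_relaxed);
    for (auto& w : workers) w.join();  // join() makes every ops[t] visible here
    return ops;
}

// Functional assertion only: every thread count makes progress.
// Deliberately no `r4 >= 2.0 * r1` shape check; that perf claim lives in bench/.
void check_functional_throughput() {
    for (unsigned n : {1u, 2u, 4u}) {
        const auto ops = run_workers(n, std::chrono::milliseconds(20));
        assert(ops.size() == n);
        for (std::uint64_t c : ops) assert(c > 0);
    }
}
```

How fast the 4-thread run is relative to the 1-thread run is still worth knowing, but only a pinned benchmark can answer that reliably.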

Verified 0/18 flaky under ctest -j4 after the fixes (before: orchestration_test 1/10,
stigmergy_pheromone_test 1/12); both tests pass on WSL gcc-13 + Windows MinGW.

Also reconciles TEST_INVENTORY.md with the actual code: 4 microbenches had already been
migrated to the v2 harness (warmup + escape(); the 0xDEADBEEF hacks are gone) but were
still tagged Tier C, so they are bumped C→B; the comparison verdicts now defer to
BENCHMARK_FAIRNESS.md (the source of truth, SoT) instead of describing the old, broken state.
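A minimal sketch of the two ingredients the harness migration is described as adding, warmup and an `escape()` barrier, plus a median over repeated batches. Everything here (`sketch::escape`, `sketch::median_ns_per_op`, their parameters) is hypothetical and is not the actual signature of phyriad's `bench::measure_repeated` / `bench::escape()`. The inline-asm barrier is the common GCC/Clang idiom for defeating dead-code elimination and will not compile on MSVC.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

namespace sketch {

// GCC/Clang idiom: an empty asm statement that claims to read `p` and clobber
// memory, so the optimiser must keep the value at *p alive (no DCE).
inline void escape(const void* p) {
    __asm__ __volatile__("" : : "g"(p) : "memory");
}

// Hypothetical harness (not phyriad's bench::measure_repeated): run `body`
// `warmup` times untimed, then time `reps` batches of `iters` calls and
// return the median nanoseconds per call. Requires iters > 0 and reps > 0.
template <class Body>
double median_ns_per_op(Body&& body, std::size_t iters, std::size_t reps,
                        std::size_t warmup) {
    for (std::size_t i = 0; i < warmup; ++i) body();  // warm caches / predictors
    std::vector<double> samples;
    samples.reserve(reps);
    for (std::size_t r = 0; r < reps; ++r) {
        const auto t0 = std::chrono::steady_clock::now();
        for (std::size_t i = 0; i < iters; ++i) body();
        const auto t1 = std::chrono::steady_clock::now();
        samples.push_back(
            std::chrono::duration<double, std::nano>(t1 - t0).count() /
            static_cast<double>(iters));
    }
    std::sort(samples.begin(), samples.end());
    return samples[samples.size() / 2];  // upper median when reps is even
}

}  // namespace sketch
```

For example, `sketch::median_ns_per_op([&] { x = x * 6364136223846793005ULL + 1; sketch::escape(&x); }, 1000, 5, 100)` times a linear-congruential step whose result would otherwise be dead. Taking the median rather than the mean keeps a single descheduled batch from dominating the figure.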

Test plan

WSL gcc-13 + Windows MinGW: both tests pass
full suite 0/18 flaky under ctest -j4
CI (gcc-13 + clang-18, 2-vCPU) green
lint-docs green; no include/ change, so doc-sync is unaffected

🤖 Generated with Claude Code

… under ctest -j

Found by an empirical flake loop (full suite x10 under `ctest -j4` on WSL2/Linux, matching CI): two tests fail rarely under -j oversubscription.

orchestration_test §4 (watchdog timeout): a missed heartbeat must be detected by the monitor thread, which the test waited only 100 ms for. Under -j contention on a 2-vCPU box that thread can be descheduled longer: a false-negative flake. Use a generous 2 s deadline (the loop still exits the instant the fault fires, ~12 ms normally). §3 (heartbeat keeps node alive): bump the node timeout 500 ms -> 2 s so a stretched inter-heartbeat sleep can't trip a false-positive fault. Neither masks anything; both still assert the watchdog's actual behaviour.

stigmergy_pheromone_test §10: asserted a hard scaling-SHAPE floor (r4 >= 2.0x r1) on non-CI machines. That is a PERFORMANCE claim, inherently contention-sensitive, and it flaked under `ctest -j` even on a dev box (it was only skipped when CI=true). The unit test now asserts FUNCTIONAL throughput only (each thread count produces work); the scaling shape is measured properly, with affinity pinning, in bench/, its correct home. Drops the CI-vs-dev branch entirely → deterministic everywhere.

Verified: full suite 0/18 flaky under `ctest -j4` after the fixes (orchestration was 1/10, stigmergy_pheromone 1/12 before); both pass on WSL gcc-13 + Windows MinGW.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
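The orchestration_test §4 fix relies on a deadline loop that returns as soon as the fault fires, so a 2 s budget costs nothing on the passing path and only bounds the failing one. A minimal sketch of that pattern, assuming a hypothetical `wait_until` helper and a stand-in "monitor thread" (neither is phyriad code; the real watchdog API is not shown in this PR):

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical test helper (not from phyriad): poll `pred` until it holds or
// `timeout` expires. Returns true the moment `pred` is true.
template <class Pred>
bool wait_until(Pred pred, std::chrono::milliseconds timeout) {
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (!pred()) {
        if (std::chrono::steady_clock::now() >= deadline) return pred();
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    return true;
}

// Models the §4 shape: a stand-in monitor thread raises a fault after
// `fault_after`; returns whether the fault was observed within `budget`.
bool fault_seen_within(std::chrono::milliseconds fault_after,
                       std::chrono::milliseconds budget) {
    std::atomic<bool> fault{false};
    std::thread monitor([&fault, fault_after] {
        std::this_thread::sleep_for(fault_after);
        fault.store(true);
    });
    const bool seen = wait_until([&fault] { return fault.load(); }, budget);
    monitor.join();
    return seen;
}

void check_watchdog_style_deadline() {
    using std::chrono::milliseconds;
    // Passing path: the fault fires after ~10 ms and wait_until returns then,
    // not after the 2 s budget.
    const bool detected = fault_seen_within(milliseconds(10), milliseconds(2000));
    assert(detected);
    // A fault that does not arrive inside the budget is still reported as a miss.
    const bool early = fault_seen_within(milliseconds(200), milliseconds(20));
    assert(!early);
    (void)detected;
    (void)early;
}
```

§3 is the same reasoning in the other direction: there the timeout bounds how long a heartbeat gap may be, so it has to comfortably exceed the worst stretched sleep under -j.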

…proved) code

The §3.1 matrix still tagged bench_ring_channel / bench_circuit_breaker / bench_frame_arena / bench_hal_primitives as Tier C with "add warmup / replace 0xDEADBEEF DCE hack", but the V1 harness migration already did that (verified in source: all use bench::measure_repeated + bench::escape(); the hacks are gone). Bump them C->B to match the code and BENCHMARK_FAIRNESS.md.

Likewise the §3.2 / §4 comparison rows still described the OLD broken state (14x MPMC import, 178x submit-only, gRPC inproc-transport lie), contradicting BENCHMARK_FAIRNESS.md, which records these as resolved (D-1/D-2/D-3). Defer the ratios to that SoT and mark the remaining open item honestly: independent re-run of the comparison numbers on this machine (Boost/Taskflow/gRPC available on WSL; concurrencpp pending).

Also: bench::escape() promotion to BenchHarness.hpp is done (D2 resolved); refresh the §7 summary and the cross-cutting issues list.

No code change; documentation reconciliation only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…on benches

Rebuilt and re-ran ring_vs_boost_lockfree, pool_vs_taskflow, pool_vs_concurrencpp, and pn1_vs_grpc on the WSL env (g++-13, Release, pinned to CCD0). All four third-party libs are present (concurrencpp included; the earlier "pending" note was wrong).

Each bench completes losslessly, confirming the D-1 livelock, D-2 pool task-loss, and D-3 loopback fixes, and wins in the direction BENCHMARK_FAIRNESS.md records: SPSC both lossless, MPMC ~4×, pool submit→completion ~2.1×/~3.8× (both 200000/200000), concurrencpp large (genuine coroutine overhead), pn1_vs_grpc 3.6× p50 with the not-like-for-like caveat printed first.

Exact magnitudes vary with the machine/V-Cache pinning, so the SoT medians stay authoritative for the ratios; this run confirms direction + lossless completion. Raw capture added under docs/perf-history/. Updates footnote 8, the §4 map rows, and the §7 summary accordingly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
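The acceptance criterion applied in this re-run (lossless completion plus the direction of the win, with magnitudes left to the medians in BENCHMARK_FAIRNESS.md) is small enough to write down as code. A sketch, assuming a hypothetical per-library `BenchSide` summary; the actual benches print their own output, whose format this PR does not show, and the numbers in any usage are illustrative only.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical summary of one side of a comparison bench (not phyriad's
// types): items submitted vs. completed, and the median latency in ns.
struct BenchSide {
    std::uint64_t submitted;
    std::uint64_t completed;
    double median_ns;
};

// Lossless means every submitted item was observed to complete.
constexpr bool lossless(const BenchSide& s) {
    return s.completed == s.submitted;
}

// Speedup of `ours` over `theirs`: > 1.0 means ours has the lower median.
constexpr double speedup(const BenchSide& ours, const BenchSide& theirs) {
    return theirs.median_ns / ours.median_ns;
}

// Both sides lossless and the ratio points the expected way. The magnitude
// is deliberately not checked, since it varies with machine and pinning.
constexpr bool confirms_direction(const BenchSide& ours, const BenchSide& theirs) {
    return lossless(ours) && lossless(theirs) && speedup(ours, theirs) > 1.0;
}
```

Checking losslessness before the ratio matters: a bench that drops tasks (the D-2 failure mode) can post an impressive-looking median while measuring less work.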

COMPARISONS.md was reconciled to BENCHMARK_FAIRNESS.md on 2026-05-21 (honest summary deferred to the SoT; stale section tables carry ⛔ Superseded banners with the old 5.5×/14×/18×/178×/1.41× numbers retired in-place). Update the inventory's "documentation inconsistencies" list to reflect that.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Swately and others added 4 commits May 21, 2026 12:58

Swately merged commit 3dcf10d into main May 21, 2026
10 checks passed

Swately deleted the test/deflake-and-inventory-reconcile branch May 21, 2026 19:11

Swately merged 4 commits into main from test/deflake-and-inventory-reconcile
