Follow-up from #939 (ActiveHead cache-line refactor).
Pending: regression test for aicore_rotate failure-path accounting
PR #939 fixed a pre-existing over-counting bug. In aicore_rotate's two failure branches (empty free queue, full ready queue), the pre-emptive dropped_record_count += BUFFER_SIZE double-counted records that the flush retry path would still deliver. This broke the collected + dropped == total reconcile invariant whenever the run ended before the slot guard had actually overflowed on a further BUFFER_SIZE records, as the pre-emptive count projected.
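To make the failure mode concrete, here is a toy model of the accounting. It is not the real aicore_rotate code. The `Counters` class, the `run` helper, and the toy `BUFFER_SIZE` value are all hypothetical, and the model assumes the run ends right after the failed rotations, so the flush retry path delivers the buffer that was charged as dropped.

```python
from dataclasses import dataclass

BUFFER_SIZE = 4  # toy value for illustration only, not the platform constant


@dataclass
class Counters:
    total: int = 0      # records the producer wrote
    collected: int = 0  # records that reached the host
    dropped: int = 0    # records counted as lost


def run(failed_rotations: int, preemptive_drop: bool) -> Counters:
    """Fill one buffer, fail rotation some number of times, then end the run."""
    c = Counters()
    c.total += BUFFER_SIZE  # one buffer's worth of real records
    if preemptive_drop:
        # Pre-#939 behaviour as described above: each failed rotation
        # charges a full buffer as dropped before anything is lost.
        c.dropped += failed_rotations * BUFFER_SIZE
    # The run ends here; the flush retry path still delivers the buffer.
    c.collected += BUFFER_SIZE
    return c


old = run(failed_rotations=1, preemptive_drop=True)
new = run(failed_rotations=1, preemptive_drop=False)
print(old.collected + old.dropped == old.total)  # False: 4 + 4 != 4
print(new.collected + new.dropped == new.total)  # True:  4 + 0 == 4
```

In this model, the over-count grows by BUFFER_SIZE for every failed rotation, which is why the regression test should assert the invariant instead of inspecting the drop counter alone.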
We need a regression test that exercises both failure paths and asserts the reconcile invariant. Triggers are hard to set up in the existing 5-task vector example:
- Empty free queue at rotation: requires driving enough rotations to exhaust the free pool (PLATFORM_AICORE_BUFFERS_PER_CORE per core). A long stress run with many tasks per core.
- Ready queue full at rotation: requires the host drain thread to be slow / paused.
Approach options:
- Add a stress test that runs N×PLATFORM_AICORE_BUFFERS_PER_CORE tasks per core and asserts the reconcile invariant in the captured JSON.
- Add a sim-only knob to artificially block the host drain for a window.
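Whichever trigger is chosen, the assertion itself could look roughly like the sketch below. The schema of the captured JSON is not described in this issue, so the keys `cores`, `id`, `collected`, `dropped`, and `total` are hypothetical placeholders, as is the helper name `reconcile_violations`.

```python
import json


def reconcile_violations(report: dict) -> list[str]:
    """Return one message per core where collected + dropped != total."""
    problems = []
    for core in report["cores"]:  # hypothetical key names throughout
        collected = core["collected"]
        dropped = core["dropped"]
        total = core["total"]
        if collected + dropped != total:
            problems.append(
                f"core {core['id']}: collected={collected} "
                f"dropped={dropped} total={total}"
            )
    return problems


# Example with an inline report; a real test would load the captured file.
sample = json.loads(
    '{"cores": ['
    '{"id": 0, "collected": 96, "dropped": 32, "total": 128},'
    '{"id": 1, "collected": 128, "dropped": 0, "total": 128}'
    "]}"
)
assert reconcile_violations(sample) == []
```

Returning the list of violations, and not asserting inside the helper, keeps the failure message specific to the offending core when the check runs under pytest.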
Pending: perf measurement on paged_attention_unroll (RESOLVED)
Measured on a2a3 onboard, paged_attention_unroll Case1 with --enable-l2-swimlane 4, 3 iters each via task-submit:
| | pytest body | wall (incl. import) |
| --- | --- | --- |
| Baseline (upstream/main pre-#939) | 15.46–15.67s | 22.85–23.01s |
| B alone (#939) | 15.19–15.34s | 22.99–23.28s |
Within noise (<2%); no measurable regression from packing head + counters into the same cache line. Design choice validated — counters can stay co-located with head.
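As a quick check of the noise claim, the relative change can be worked out from the ranges in the table above. Comparing range midpoints is an assumption made here for illustration; the issue does not say how the <2% figure was derived.

```python
def midpoint(lo: float, hi: float) -> float:
    """Midpoint of a measured (min, max) range in seconds."""
    return (lo + hi) / 2


def rel_delta_pct(baseline: tuple[float, float], candidate: tuple[float, float]) -> float:
    """Relative change of the candidate midpoint versus the baseline midpoint, in percent."""
    b = midpoint(*baseline)
    c = midpoint(*candidate)
    return (c - b) / b * 100


# Ranges copied from the measurement table (seconds).
body = rel_delta_pct((15.46, 15.67), (15.19, 15.34))
wall = rel_delta_pct((22.85, 23.01), (22.99, 23.28))

print(f"pytest body: {body:+.2f}%")       # -1.93%
print(f"wall (incl. import): {wall:+.2f}%")  # +0.89%
```

The two metrics move in opposite directions and both stay under 2% in magnitude, which is consistent with reading the difference as noise.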
Priority
Regression test is non-blocking; the fix in #939 is correct by code review and validated by reconcile math. Add when test-infra can model the trigger.