Perf(runtime): defer fanout wiring to scheduler via wiring queue by poursoul · Pull Request #496 · hw-native-sys/simpler

poursoul · 2026-04-09T11:17:30Z

Move fanout edge construction (fanout_lock acquisition, dep_pool allocation, early_finished check, and ready-queue push) from the orchestrator's submit hot path to a dedicated wiring queue drained by scheduler thread 0. This reduces cross-core L2 cache and memory bus contention between orchestrator and scheduler threads.
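Here is a minimal sketch of the hand-off described above. It assumes the wiring queue is a bounded single-producer/single-consumer ring, with the orchestrator as the only producer and scheduler thread 0 as the only consumer. Under that assumption, a pair of acquire/release indices is enough and the submit path takes no lock. `WiringQueue`, `push` and `pop` are hypothetical names; the PR's actual queue type is not shown here.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <optional>

// Hypothetical bounded SPSC ring: the orchestrator is the sole producer and
// scheduler thread 0 the sole consumer. Capacity must be a power of two.
template <typename T, std::size_t Capacity>
class WiringQueue {
    static_assert(Capacity > 0 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");

public:
    // Producer side (orchestrator). Returns false if the ring is full.
    bool push(const T& value) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (tail - head == Capacity) return false;         // full
        buf_[tail & (Capacity - 1)] = value;
        tail_.store(tail + 1, std::memory_order_release);  // publish the slot
        return true;
    }

    // Consumer side (scheduler thread 0). Returns std::nullopt if empty.
    std::optional<T> pop() {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head == tail) return std::nullopt;             // empty
        T value = buf_[head & (Capacity - 1)];
        head_.store(head + 1, std::memory_order_release);  // hand the slot back
        return value;
    }

private:
    T buf_[Capacity]{};
    // Separate cache lines (64 bytes assumed) so producer and consumer
    // indices do not false-share.
    alignas(64) std::atomic<std::size_t> head_{0};
    alignas(64) std::atomic<std::size_t> tail_{0};
};
```

The `alignas(64)` padding keeps the two indices on separate cache lines. This is one way to avoid the kind of cross-core contention the PR is trying to reduce.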

Key changes:

- Orchestrator submit (STEP 6) now only stores fanin metadata in the payload and increments producers' fanout_count (no lock needed)
- New PTO2SchedulerState::drain_wiring_queue() method handles all fanout wiring asynchronously
- dep_pool ownership moved from PTO2RingSet to RingSchedState, exclusively managed by scheduler thread 0
- Slot state initialization consolidated into pto2_prepare_task()
- Scheduler profiling extended with wiring-phase statistics
- Fix pre-existing MD040/MD060 markdown lint errors in touched docs
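The wiring step can be sketched single-threaded under the following assumptions:

- Each task slot keeps the fanin producer IDs written at submit.
- Each slot has a `finished` flag and a fanout-list head, both guarded by `fanout_lock`.
- Each slot has an atomic `pending` counter that starts at the fanin count + 1. The extra 1 is a guard, so a consumer cannot become ready while its edges are still being wired.
- Edges are bump-allocated from a `dep_pool` array that only the draining thread allocates from.

`SchedulerSketch`, `TaskSlot`, `DepNode`, `release_fanin` and `on_complete` are hypothetical. Only `drain_wiring_queue`, `fanout_lock`, `fanout_count`, `dep_pool` and the early-finished check come from the PR description. dep_pool reclamation is not modeled.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <mutex>
#include <utility>
#include <vector>

constexpr int32_t kNil = -1;

// One fanout edge stored in dep_pool: producer -> consumer.
struct DepNode {
    int32_t consumer;
    int32_t next;  // next edge in the same producer's fanout list, or kNil
};

struct TaskSlot {
    std::mutex fanout_lock;                // guards `finished` and `fanout_head`
    bool finished = false;
    int32_t fanout_head = kNil;            // head of this task's fanout list in dep_pool
    std::atomic<int32_t> fanout_count{0};  // bumped lock-free by the orchestrator at submit
    std::atomic<int32_t> pending{0};       // unresolved fanins, plus 1 guard while wiring
    std::vector<int32_t> fanin;            // producer IDs recorded at submit
};

template <std::size_t kTasks, std::size_t kEdges>
struct SchedulerSketch {
    std::array<TaskSlot, kTasks> slots;
    std::array<DepNode, kEdges> dep_pool{};  // allocated only by the draining thread
    std::size_t dep_pool_top = 0;
    std::deque<int32_t> wiring_queue;        // stand-in for the orchestrator -> scheduler queue
    std::deque<int32_t> ready_queue;         // not thread-safe: single-threaded sketch only

    // Orchestrator side: record fanins and bump fanout_count; no fanout_lock taken.
    void submit(int32_t id, std::vector<int32_t> producers) {
        TaskSlot& t = slots[id];
        t.pending.store(static_cast<int32_t>(producers.size()) + 1, std::memory_order_relaxed);
        for (int32_t p : producers) slots[p].fanout_count.fetch_add(1, std::memory_order_relaxed);
        t.fanin = std::move(producers);
        wiring_queue.push_back(id);
    }

    // Drops one fanin (or the guard); whoever reaches zero makes the task ready.
    void release_fanin(int32_t consumer) {
        if (slots[consumer].pending.fetch_sub(1, std::memory_order_acq_rel) == 1)
            ready_queue.push_back(consumer);
    }

    // Scheduler thread 0: build fanout edges for every queued task.
    void drain_wiring_queue() {
        while (!wiring_queue.empty()) {
            const int32_t c = wiring_queue.front();
            wiring_queue.pop_front();
            for (int32_t p : slots[c].fanin) {
                TaskSlot& prod = slots[p];
                std::lock_guard<std::mutex> guard(prod.fanout_lock);
                if (prod.finished) {
                    release_fanin(c);  // early_finished: no edge needed
                } else {
                    assert(dep_pool_top < kEdges && "dep_pool exhausted");
                    dep_pool[dep_pool_top] = DepNode{c, prod.fanout_head};
                    prod.fanout_head = static_cast<int32_t>(dep_pool_top++);
                }
            }
            release_fanin(c);  // drop the wiring guard
        }
    }

    // Completion path: mark finished under the lock, then release every wired consumer.
    void on_complete(int32_t p) {
        int32_t edge;
        {
            std::lock_guard<std::mutex> guard(slots[p].fanout_lock);
            slots[p].finished = true;
            edge = slots[p].fanout_head;
        }
        for (; edge != kNil; edge = dep_pool[edge].next) release_fanin(dep_pool[edge].consumer);
    }
};
```

This sketch runs on a single thread and uses a plain `std::deque` as the ready queue. A real scheduler would also need a thread-safe ready queue and a reclamation scheme for dep_pool nodes.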

Measured on paged_attention_unroll (Case1, 100 rounds):
entry_cost: 914 -> 739 us (-19%)
sched_cost: 1143 -> 1148 us (no regression)

| Example | Base (us) | HEAD (us) | Delta (us) | Change (%) |
|---|---:|---:|---:|---:|
| alternating_matmul_add | 916.0 | 785.9 | -130.1 | -14.20% |
| (orch) | 915.8 | 785.5 | -130.3 | -14.23% |
| benchmark_bgemm | 728.8 | 733.0 | +4.2 | +0.58% |
| (orch) | 697.2 | 689.7 | -7.5 | -1.08% |
| paged_attention_unroll (Case1) | 1146.3 | 1154.4 | +8.1 | +0.71% |
| (orch) | 934.4 | 733.1 | -201.3 | -21.54% |
| paged_attention_unroll (Case2) | 554.5 | 532.5 | -22.0 | -3.97% |
| (orch) | 412.3 | 306.9 | -105.4 | -25.56% |
| batch_paged_attention | 3165.6 | 2834.9 | -330.7 | -10.45% |
| (orch) | 2529.9 | 1877.1 | -652.8 | -25.80% |
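The Delta and Change columns follow Delta = HEAD − Base and Change = Delta / Base × 100, so negative values mean HEAD is faster. For alternating_matmul_add, (785.9 − 916.0) / 916.0 × 100 ≈ −14.20%. The helpers below are hypothetical and simply reproduce the reported figures to within two-decimal rounding.

```cpp
#include <cassert>
#include <cmath>

// Relative change of HEAD against Base, in percent (negative = HEAD is faster).
double percent_change(double base_us, double head_us) {
    const double delta_us = head_us - base_us;  // the Delta (us) column
    return delta_us / base_us * 100.0;
}

// True if a computed percentage rounds to the reported two-decimal value.
bool matches_reported(double computed, double reported) {
    return std::fabs(computed - reported) <= 0.005 + 1e-9;
}
```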

gemini-code-assist

Code Review

This pull request moves fanout wiring from the orchestrator's submission hot path to a deferred wiring queue managed by the scheduler to reduce memory bus pressure. Feedback identifies a critical race condition where dep_pool_mark is assigned after tasks are pushed to the ready queue, which could cause incorrect memory reclamation. Other suggestions include restoring error observability during dependency pool initialization and optimizing the frequency of reclamation checks by grouping tasks by ring.
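The ordering problem flagged in the review can be illustrated with a small sketch. Assume `dep_pool_mark` records a dep_pool position that is read later, when the task retires, to decide what can be reclaimed. Then the mark must be written before the task is pushed to the ready queue. Once the task is pushed, another scheduler thread may pop, run and retire it, and would otherwise read a stale mark. `Slot`, `ReadyQueue`, `wire_and_publish` and `observe_mark_after_pop` are hypothetical names. The mutex inside the queue supplies the release/acquire ordering that makes the earlier write visible to the popping thread.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <mutex>
#include <optional>
#include <thread>

// Hypothetical per-task slot; only the field relevant to the race is shown.
struct Slot {
    uint64_t dep_pool_mark = 0;  // assumed: dep_pool position consulted at retirement
};

// Hypothetical ready queue. Unlocking in push() and locking in pop() give
// release/acquire ordering, so writes made before push() are visible after pop().
class ReadyQueue {
public:
    void push(int32_t id) {
        std::lock_guard<std::mutex> g(m_);
        q_.push_back(id);
    }
    std::optional<int32_t> pop() {
        std::lock_guard<std::mutex> g(m_);
        if (q_.empty()) return std::nullopt;
        int32_t id = q_.front();
        q_.pop_front();
        return id;
    }

private:
    std::mutex m_;
    std::deque<int32_t> q_;
};

// Correct order: record the mark, then publish. Swapping these two lines
// reintroduces the race: a consumer could pop the task and read the old mark.
void wire_and_publish(Slot& slot, uint64_t pool_top, ReadyQueue& rq, int32_t id) {
    slot.dep_pool_mark = pool_top;
    rq.push(id);
}

// Usage: a second thread waits for the task to appear, then reads its mark.
uint64_t observe_mark_after_pop(uint64_t pool_top) {
    Slot slot;
    ReadyQueue rq;
    uint64_t seen = 0;
    std::thread consumer([&] {
        while (!rq.pop()) {}        // spin until the task is published
        seen = slot.dep_pool_mark;  // ordered after the push by the queue's mutex
    });
    wire_and_publish(slot, pool_top, rq, 7);
    consumer.join();
    return seen;
}
```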

src/a2a3/runtime/tensormap_and_ringbuffer/runtime/pto_scheduler.h

src/a5/runtime/tensormap_and_ringbuffer/runtime/pto_scheduler.h

src/a2a3/runtime/tensormap_and_ringbuffer/runtime/pto_scheduler.cpp

src/a5/runtime/tensormap_and_ringbuffer/runtime/pto_scheduler.cpp

gemini-code-assist bot reviewed Apr 9, 2026

poursoul force-pushed the refactor-fanin branch from b938314 to 45baf73 on April 10, 2026 08:56

poursoul force-pushed the refactor-fanin branch from 45baf73 to c0a95cf on April 10, 2026 08:59

ChaoWao approved these changes Apr 10, 2026

poursoul wants to merge 1 commit into hw-native-sys:main from poursoul:refactor-fanin

2 participants
