Add: dual-slot AICPU dispatch payload and two-phase pipelining scheduler by zhusy54 · Pull Request #477 · hw-native-sys/simpler

zhusy54 · 2026-04-08T07:13:44Z

Summary

- Introduce dual-buffer payload storage so AICPU can pre-load the next task's
  payload while AICore is still executing the current one (true pipelining)
- Add two-phase dispatch loop: issue to a pending slot while the running slot
  is occupied, eliminating idle cycles between consecutive kernel launches
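
The dual-buffer idea in the summary can be illustrated with a minimal sketch. It assumes, as the Key Changes state, that the slot is the low bit of the task id on both the writer and the reader side. `Payload`, `kMaxWorkers`, `g_payload_per_core`, and the helper functions are hypothetical stand-ins, not the PR's actual definitions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical payload layout; the real dispatch payload is defined by the runtime.
struct Payload {
    std::uint64_t kernel_addr;
    std::uint32_t arg_count;
};

// Stand-in for RUNTIME_MAX_WORKER; the real value is not shown in the PR text.
constexpr std::size_t kMaxWorkers = 4;

// Two payload buffers per core, mirroring the [RUNTIME_MAX_WORKER][2] shape.
static std::array<std::array<Payload, 2>, kMaxWorkers> g_payload_per_core{};

// Writer and reader derive the slot from the same bit of the task id,
// so they agree without exchanging any extra state.
constexpr std::uint32_t slot_of(std::uint32_t reg_task_id) { return reg_task_id & 1u; }

// AICPU side (sketch): fill the slot that belongs to the task being issued.
inline void aicpu_write_payload(std::size_t core, std::uint32_t reg_task_id, const Payload& p) {
    assert(core < kMaxWorkers);
    g_payload_per_core[core][slot_of(reg_task_id)] = p;
}

// AICore side (sketch): read the slot that belongs to the task being executed.
inline const Payload& aicore_read_payload(std::size_t core, std::uint32_t reg_task_id) {
    assert(core < kMaxWorkers);
    return g_payload_per_core[core][slot_of(reg_task_id)];
}
```

In this sketch the next payload can be written without touching the one in use only if back-to-back tasks on a core receive ids of opposite parity; how the PR guarantees that is not shown here.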

Key Changes

- s_pto2_payload_per_core: extended from single-buffer to [RUNTIME_MAX_WORKER][2];
  slot selected by reg_task_id & 1u, consistent between AICPU (write) and AICore (read)
- CoreExecState: add parallel running/pending field pairs (slot_state, reg_task_id,
  subslot, dispatch_timestamp); rename executing_* → running_*
- CoreTracker: add pending_occupied_ BitStates with
  get_idle_cluster_offset_states (both slots free) and
  get_pending_only_cluster_offset_states (core running, pending slot free)
  for two-phase dispatch
- Extract decide_slot_transition() pure function to decode register events
  into SlotTransition flags; extract complete_slot_task() helper for the
  completion hot path
- AICore: add pipe_barrier(PIPE_ALL) before kernel execution and select
  exec_payload via payload + (task_id & 1u); simulation no-op fallback added
- Static assert: parity skip range keeps even parity over TASK_ID_MASK
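
The two CoreTracker queries described above reduce to simple mask arithmetic. The sketch below is a simplified illustration that uses one bit per core in a 64-bit word; `SlotTracker` and its members are hypothetical and do not reproduce the PR's BitStates type or its cluster-offset layout.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical one-bit-per-core tracker (at most 64 cores in this sketch).
struct SlotTracker {
    std::uint64_t valid = 0;    // cores that exist
    std::uint64_t running = 0;  // running slot occupied
    std::uint64_t pending = 0;  // pending slot occupied

    // Both slots free: candidates for the first dispatch phase.
    std::uint64_t idle() const { return valid & ~running & ~pending; }

    // Core is running and its pending slot is free: candidates for the second phase.
    std::uint64_t pending_only() const { return valid & running & ~pending; }

    // Record a dispatch: take the running slot if it is free, otherwise the pending slot.
    void dispatch(unsigned core) {
        assert(core < 64);
        const std::uint64_t bit = std::uint64_t{1} << core;
        assert(valid & bit);
        if (!(running & bit)) {
            running |= bit;
        } else {
            assert(!(pending & bit));  // at most one task may wait behind the running one
            pending |= bit;
        }
    }
};
```

With four valid cores, dispatching once to core 1 and twice to core 2 leaves cores 0 and 3 idle and core 1 as the only core with a free pending slot.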

Testing

- Existing simulation tests pass with new dual-slot scheduler
- Two-phase dispatch validated against both a2a3sim and hardware paths

gemini-code-assist

Code Review

This pull request introduces a dual-slot dispatch mechanism to the AICPU executor, allowing for task pipelining by tracking both running and pending tasks per core. It updates the CoreTracker and CoreExecState to manage these slots and implements a two-phase dispatch logic that prioritizes idle cores before filling pending slots. Additionally, the AICore performance collector was optimized to use a caller-maintained write index, reducing cache invalidation overhead. A fix for simulation environments was also included to prevent payload corruption. Review feedback identifies opportunities to optimize the scheduler's hot path by removing redundant bitmask refreshes.
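
The ordering the review describes, idle cores first and pending slots second, can be sketched as follows. This is an illustrative model only: `Assignment` and `two_phase_dispatch` are hypothetical names, the masks use one bit per core, and the ready queue is a plain `std::deque` rather than the runtime's own structure.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical record of one dispatch decision.
struct Assignment {
    unsigned core;
    std::uint32_t task_id;
    bool to_pending_slot;
};

// Phase 0 hands ready tasks to fully idle cores; phase 1 uses whatever is left
// to fill the free pending slot of cores that are already running.
inline std::vector<Assignment> two_phase_dispatch(std::uint64_t idle_mask,
                                                  std::uint64_t pending_free_mask,
                                                  std::deque<std::uint32_t>& ready) {
    // A core is either idle or running-with-free-pending-slot, never both.
    assert((idle_mask & pending_free_mask) == 0);
    std::vector<Assignment> out;
    const std::uint64_t masks[2] = {idle_mask, pending_free_mask};
    for (int phase = 0; phase < 2; ++phase) {
        for (unsigned core = 0; core < 64 && !ready.empty(); ++core) {
            if (masks[phase] & (std::uint64_t{1} << core)) {
                out.push_back(Assignment{core, ready.front(), phase == 1});
                ready.pop_front();
            }
        }
    }
    return out;
}
```

Serving idle cores first means a task never waits behind a running kernel while another core sits empty; pending slots only absorb the remaining backlog.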

src/a2a3/runtime/tensormap_and_ringbuffer/aicpu/aicpu_executor.cpp

… with dual-watermark deferred task release
- Introduce two-slot dispatch payload (slot 0 / slot 1) for AICPU
- Implement two-phase pipelining: dispatch phase and execute phase
- Add dual-watermark mechanism for deferred task release

zhusy54 marked this pull request as draft April 8, 2026 07:13

gemini-code-assist bot reviewed Apr 8, 2026

zhusy54 force-pushed the dual-sched branch from f73ecf4 to baff3b4 Compare April 10, 2026 07:42

zhusy54 marked this pull request as ready for review April 10, 2026 07:55

zhusy54 force-pushed the dual-sched branch from baff3b4 to a55c495 Compare April 10, 2026 07:59

zhusy54 changed the title ~~feat(aicpu): implement dual-slot scheduling with idle-core-first dispatch and deferred release~~ Add: dual-slot AICPU dispatch payload and two-phase pipelining scheduler Apr 10, 2026

zhusy54 force-pushed the dual-sched branch 2 times, most recently from 4abf76b to aa95038 Compare April 10, 2026 11:36

zhusy54 force-pushed the dual-sched branch from aa95038 to fa10be3 Compare April 10, 2026 11:38

zhusy54 wants to merge 1 commit into hw-native-sys:main from zhusy54:dual-sched
