Add: dual-slot AICPU dispatch payload and two-phase pipelining scheduler #477

Open
zhusy54 wants to merge 1 commit into hw-native-sys:main from zhusy54:dual-sched

@zhusy54 zhusy54 commented Apr 8, 2026

Summary

  • Introduce dual-buffer payload storage so AICPU can pre-load the next task's
    payload while AICore is still executing the current one (true pipelining)
  • Add two-phase dispatch loop: issue to a pending slot while the running slot
    is occupied, eliminating idle cycles between consecutive kernel launches
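
The running/pending idea in the summary can be sketched as a small per-core state machine. This is a minimal illustration only, not the PR's code: `CoreSlots`, `dispatch`, and `complete` are hypothetical names, and the real `CoreExecState` tracks more fields per slot.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical per-core state: one task executing, one task staged behind it.
struct CoreSlots {
    std::optional<uint32_t> running;  // task currently executing on the core
    std::optional<uint32_t> pending;  // task whose payload is already staged

    bool idle() const { return !running && !pending; }
};

// Dispatch: take the running slot if the core is idle, otherwise stage the
// task in the pending slot. Returns false when both slots are occupied.
inline bool dispatch(CoreSlots& core, uint32_t task_id) {
    if (!core.running) {
        assert(!core.pending && "pending must be empty when nothing is running");
        core.running = task_id;
        return true;
    }
    if (!core.pending) {
        core.pending = task_id;
        return true;
    }
    return false;
}

// Completion: retire the running task and promote the staged one, so the core
// can start its next task without waiting for another dispatch round trip.
inline std::optional<uint32_t> complete(CoreSlots& core) {
    std::optional<uint32_t> finished = core.running;
    core.running = core.pending;
    core.pending.reset();
    return finished;
}
```

In this sketch a core accepts at most two tasks at a time, and completing the running task immediately promotes the staged one, which is where the idle gap between launches would otherwise appear.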

Key Changes

  • s_pto2_payload_per_core: extended from single-buffer to [RUNTIME_MAX_WORKER][2];
    slot selected by reg_task_id & 1u, consistent between AICPU (write) and AICore (read)
  • CoreExecState: add parallel running/pending field pairs (slot_state, reg_task_id,
    subslot, dispatch_timestamp); rename executing_* → running_*
  • CoreTracker: add pending_occupied_ BitStates with
    get_idle_cluster_offset_states (both slots free) and
    get_pending_only_cluster_offset_states (core running, pending slot free)
    for two-phase dispatch
  • Extract decide_slot_transition() pure function to decode register events
    into SlotTransition flags; extract complete_slot_task() helper for the
    completion hot path
  • AICore: add pipe_barrier(PIPE_ALL) before kernel execution and select
    exec_payload via payload + (task_id & 1u); simulation no-op fallback added
  • Static assert: parity skip range keeps even parity over TASK_ID_MASK
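
The parity-based slot selection described above can be sketched as follows. The names here are stand-ins and assumptions: `kMaxWorkers` replaces `RUNTIME_MAX_WORKER`, the `Payload` layout is invented for illustration, and the sketch does not model the skip range or `TASK_ID_MASK` wraparound that the static assert covers.

```cpp
#include <cassert>
#include <cstdint>

constexpr int kMaxWorkers = 4;          // stand-in for RUNTIME_MAX_WORKER
struct Payload { uint64_t args[4]; };   // hypothetical payload layout

// Two payload buffers per core, indexed by the low bit of the task id.
static Payload g_payload_per_core[kMaxWorkers][2];

// Both sides derive the slot the same way, so no extra handshake is needed.
constexpr uint32_t slot_of(uint32_t reg_task_id) { return reg_task_id & 1u; }

// Writer side (AICPU in the PR): fill the buffer for the task being dispatched.
inline Payload* payload_for_write(int core, uint32_t reg_task_id) {
    assert(core >= 0 && core < kMaxWorkers);
    return &g_payload_per_core[core][slot_of(reg_task_id)];
}

// Reader side (AICore in the PR): payload base plus (task_id & 1u).
inline const Payload* payload_for_read(const Payload* base, uint32_t task_id) {
    return base + (task_id & 1u);
}

// Consecutive ids land in different buffers, so the next payload can be
// written while the current one is still being read.
static_assert(slot_of(6) != slot_of(7), "consecutive ids must alternate slots");
```

The scheme only works if consecutive task ids on a core always differ in parity, which is presumably why the PR guards the id sequence with a static assert.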

Testing

  • Existing simulation tests pass with new dual-slot scheduler
  • Two-phase dispatch validated against both a2a3sim and hardware paths

@zhusy54 zhusy54 marked this pull request as draft April 8, 2026 07:13
@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a dual-slot dispatch mechanism to the AICPU executor, allowing for task pipelining by tracking both running and pending tasks per core. It updates the CoreTracker and CoreExecState to manage these slots and implements a two-phase dispatch logic that prioritizes idle cores before filling pending slots. Additionally, the AICore performance collector was optimized to use a caller-maintained write index, reducing cache invalidation overhead. A fix for simulation environments was also included to prevent payload corruption. Review feedback identifies opportunities to optimize the scheduler's hot path by removing redundant bitmask refreshes.
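
The idle-first ordering mentioned in the review can be illustrated with two occupancy bitmasks. This is a per-core simplification under stated assumptions: `CoreMasks`, `idle_cores`, `pending_only_cores`, and `pick_core` are hypothetical names, whereas the PR's `CoreTracker` works on cluster offset states.

```cpp
#include <cstdint>

// Hypothetical occupancy masks, one bit per core.
struct CoreMasks {
    uint64_t valid = 0;             // cores that exist
    uint64_t running_occupied = 0;  // cores with a task executing
    uint64_t pending_occupied = 0;  // cores with a task staged
};

// Both slots free.
inline uint64_t idle_cores(const CoreMasks& m) {
    return m.valid & ~m.running_occupied & ~m.pending_occupied;
}

// Core is running, pending slot is free.
inline uint64_t pending_only_cores(const CoreMasks& m) {
    return m.valid & m.running_occupied & ~m.pending_occupied;
}

// Index of the lowest set bit, or -1 for an empty mask.
inline int lowest_set_bit(uint64_t mask) {
    for (int i = 0; i < 64; ++i) {
        if (mask & (1ull << i)) return i;
    }
    return -1;
}

// Phase 1: prefer a fully idle core. Phase 2: otherwise stage the task behind
// a running one. Returns the chosen core, or -1 if every slot is taken.
inline int pick_core(CoreMasks& m) {
    if (uint64_t idle = idle_cores(m)) {
        int core = lowest_set_bit(idle);
        m.running_occupied |= 1ull << core;
        return core;
    }
    if (uint64_t free_pending = pending_only_cores(m)) {
        int core = lowest_set_bit(free_pending);
        m.pending_occupied |= 1ull << core;
        return core;
    }
    return -1;
}
```

With two valid cores, successive calls to `pick_core` would fill both running slots first and only then start staging tasks in pending slots, which spreads work before it deepens the queue on any single core.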

@zhusy54 zhusy54 marked this pull request as ready for review April 10, 2026 07:55
@zhusy54 zhusy54 changed the title from "feat(aicpu): implement dual-slot scheduling with idle-core-first dispatch and deferred release" to "Add: dual-slot AICPU dispatch payload and two-phase pipelining scheduler" Apr 10, 2026
@zhusy54 zhusy54 force-pushed the dual-sched branch 2 times, most recently from 4abf76b to aa95038 Compare April 10, 2026 11:36
… with dual-watermark deferred task release

- Introduce two-slot dispatch payload (slot 0 / slot 1) for AICPU
- Implement two-phase pipelining: dispatch phase and execute phase
- Add dual-watermark mechanism for deferred task release