Doc: sync L2 swimlane refs to post-split layout #946
After hw-native-sys#939 (pool unification), hw-native-sys#941 (PhaseHeader merge), and hw-native-sys#942 (split sched/orch phase records), several comments and doc sections still referenced the pre-split a2a3 layout. Audit and update:

a2a3 code/comments:
- platform_config.h: PROF_BUFFERS_PER_THREAD doc references both SchedPhaseBuffer and OrchPhaseBuffer (was: single PhaseBuffer); PROF_READYQUEUE_SIZE comment now says "four kinds"; formula bumped by 2x on the per-thread term to cover both sched and orch pool enqueues (matches host alloc, which iterates both pool arrays).
- l2_swimlane_profiling.h header layout diagram: name the two split phase-thread counts.
- l2_swimlane_collector_aicpu.cpp: cross-launch reset comment now references s_sched_phase_pools / s_orch_phase_pools (was: single s_aicpu_phase_pools) and record_sched_phase / record_orch_phase.
- scheduler_dispatch.cpp / aicpu_executor.cpp: comments reference the split record types.

src/common/ shared comments (now mixed-arch):
- profiler_base.h / buffer_pool_manager.h: qualify L2SwimlaneAicpuPhaseHeader::magic example as "on a5" since the struct no longer exists on a2a3.

docs/dfx/l2-swimlane-profiling.md:
- §5.1: layout block + record list now distinguish a2a3 split shape (SchedPhaseRecord 40B + OrchPhaseRecord 32B, two pool arrays) from a5's still-unified shape (pending port).
- §5.2: a2a3 buffer-kind list updated to all four kinds (was: two); ASCII data-flow diagram redrawn to show split phase records; kBufferKinds = 4 in the L2SwimlaneModule trait description.
- §5.3 (a5): num_phase_threads / core_to_thread[] reference corrected to live in L2SwimlaneAicpuPhaseHeader on a5 (was wrongly attributed to L2SwimlaneDataHeader).
- §5.4: comparison table separates task record (identical) from phase record (diverged); ready-queue and kBufferKinds rows call out the a2a3=4 vs a5=2 split.
- §6: overhead description differentiates a2a3's per-emit SchedPhase + per-submit OrchPhase from a5's unified PhaseRecord (was: "4 phases × 40B per iteration", which described a removed shape).
- §8 FAQ: "phase records empty" entry gates a2a3 on num_{sched,orch}_phase_threads, a5 on PhaseHeader::magic.

No semantic code changes except the READYQUEUE_SIZE formula bump (adds ~8KB to the header; necessary correctness fix given the second phase pool).

Test plan:
- pre-commit clean
- onboard l2_swimlane STs (--enable-l2-swimlane --enable-dep-gen): 2 passed
- onboard paged_attention_unroll level 4: 1 passed
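To make the ready-queue sizing change concrete, here is a minimal sketch of why the per-thread term doubles. It is not the real PROF_READYQUEUE_SIZE formula from platform_config.h, which this description does not show. The helper `ready_queue_slots`, its parameters, and the numbers are all hypothetical; the only thing taken from the description is that the per-thread term is multiplied by 2 so that both the sched and orch pool arrays can enqueue.

```cpp
#include <cstddef>

// Hypothetical helper, NOT the real PROF_READYQUEUE_SIZE macro.
// other_slots:        slots contributed by the non-phase buffer kinds (left opaque here).
// threads * per_thr:  slots that one phase pool array can enqueue.
// phase_pools:        1 before the split, 2 (sched + orch) after it.
constexpr std::size_t ready_queue_slots(std::size_t other_slots,
                                        std::size_t threads,
                                        std::size_t per_thr,
                                        std::size_t phase_pools) {
    return other_slots + phase_pools * threads * per_thr;
}

// Illustrative numbers only; they are assumptions, not values from the repository.
constexpr std::size_t kOther = 64;
constexpr std::size_t kThreads = 4;
constexpr std::size_t kPerThread = 8;

constexpr std::size_t kBefore = ready_queue_slots(kOther, kThreads, kPerThread, 1);
constexpr std::size_t kAfter = ready_queue_slots(kOther, kThreads, kPerThread, 2);

// The split adds exactly one extra per-thread term to the capacity.
static_assert(kAfter - kBefore == kThreads * kPerThread,
              "second phase pool adds one more per-thread term");
```

Sizing the queue for a single phase pool would leave it short by one per-thread term once both pool arrays enqueue, which is why the description calls the bump a correctness fix rather than a tuning change.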
Walkthrough
This PR updates documentation, configuration constants, and code comments to reflect the post-#942 L2 swimlane profiling architecture. The scheduler and orchestrator phases are now split into separate per-thread buffer pools with 4 multiplexed buffer kinds, requiring updated ready-queue capacity calculations and documentation describing the new model.
Code Review
This pull request updates the documentation and code comments across several files to reflect the architectural split of L2 swimlane phase records on the a2a3 platform into separate scheduler (L2SwimlaneAicpuSchedPhaseRecord) and orchestrator (L2SwimlaneAicpuOrchPhaseRecord) streams, while the a5 platform retains its legacy unified shape. Additionally, the ready queue capacity (PLATFORM_PROF_READYQUEUE_SIZE) in platform_config.h is updated to correctly account for the four buffer kinds instead of three by doubling the thread buffer allocation term. There are no review comments provided, so no further feedback is available.
Summary
After #939/#941/#942 merged, several comments and doc sections still referenced the pre-split a2a3 layout. Audit and update so code, comments, and `docs/dfx/l2-swimlane-profiling.md` are consistent with the current shape.

What changed

a2a3 code/comments
- `platform_config.h`: `PROF_BUFFERS_PER_THREAD` now references both `SchedPhaseBuffer` + `OrchPhaseBuffer`; `PROF_READYQUEUE_SIZE` comment says "four kinds"; formula bumped 2× on the per-thread term to cover both pool arrays.
- `l2_swimlane_profiling.h` header layout diagram: names the two split phase-thread counts.
- `l2_swimlane_collector_aicpu.cpp`: cross-launch reset comment now references `s_sched_phase_pools` / `s_orch_phase_pools` and `record_sched_phase` / `record_orch_phase`.
- `scheduler_dispatch.cpp` / `aicpu_executor.cpp`: comments cite split record types.

src/common/ shared comments
- `profiler_base.h` / `buffer_pool_manager.h`: qualify `L2SwimlaneAicpuPhaseHeader::magic` example as "on a5" since the struct no longer exists on a2a3.

docs/dfx/l2-swimlane-profiling.md
- §5.2: `kBufferKinds = 4` in `L2SwimlaneModule` description.
- §5.3 (a5): corrected `num_phase_threads` / `core_to_thread[]` reference to `L2SwimlaneAicpuPhaseHeader` (was wrongly attributed to `L2SwimlaneDataHeader`; that was always wrong for a5).
- §5.4: ready-queue and `kBufferKinds` rows call out a2a3=4 vs a5=2.
- §6: overhead description differentiates a2a3 per-emit `SchedPhase` + per-submit `OrchPhase` from a5 unified `PhaseRecord` (was: "4 phases × 40B per iter", a removed shape).
- §8 FAQ: "phase records empty" entry gates a2a3 on `num_{sched,orch}_phase_threads`, a5 on `PhaseHeader::magic`.

Notes
- `PROF_READYQUEUE_SIZE` formula bumped (+~8KB header): required correctness fix, given that the second phase pool array enqueues into the same per-thread ready queues.

Test plan
- pre-commit clean
- Onboard `l2_swimlane STs` (`--enable-l2-swimlane --enable-dep-gen`): 2 passed
- Onboard `paged_attention_unroll` level 4: 1 passed
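
The FAQ gating mentioned above (a2a3 on the split thread counts, a5 on the header magic) can be sketched roughly as follows. This is an illustration under stated assumptions, not the repository's code: the struct views, the predicate names, the "zero threads means no records" rule, and the magic value are all hypothetical. Only the field names `num_sched_phase_threads`, `num_orch_phase_threads`, and `magic` come from the PR text.

```cpp
#include <cstdint>

// Hypothetical views of the fields named in the FAQ entry; the real headers
// carry more fields and may lay them out differently.
struct A2a3PhaseCounts {
    std::uint32_t num_sched_phase_threads;
    std::uint32_t num_orch_phase_threads;
};

struct A5PhaseHeaderView {
    std::uint32_t magic;
};

// Placeholder value; the real magic constant is not shown in this PR.
constexpr std::uint32_t kAssumedPhaseMagic = 0x50484153u;

// a2a3: each split stream is gated on its own thread count
// (assumption: a count of zero means that stream has no records).
constexpr bool a2a3_has_sched_phase(const A2a3PhaseCounts& h) {
    return h.num_sched_phase_threads > 0;
}
constexpr bool a2a3_has_orch_phase(const A2a3PhaseCounts& h) {
    return h.num_orch_phase_threads > 0;
}

// a5: the still-unified phase stream is gated on the header magic.
constexpr bool a5_has_phase(const A5PhaseHeaderView& h) {
    return h.magic == kAssumedPhaseMagic;
}
```

Keeping the two a2a3 predicates separate mirrors the split itself: under these assumptions, one stream can be empty while the other still holds records.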