
Doc: sync L2 swimlane refs to post-split layout #946

Merged
ChaoWao merged 1 commit into hw-native-sys:main from hw-native-sys-bot:chore/swimlane-doc-consistency
May 31, 2026

@hw-native-sys-bot
Collaborator

Summary

After #939, #941, and #942 merged, several comments and doc sections still referenced the pre-split a2a3 layout. This PR audits and updates them so that code, comments, and docs/dfx/l2-swimlane-profiling.md are consistent with the current shape.

What changed

a2a3 code/comments

  • platform_config.h: PROF_BUFFERS_PER_THREAD now references both SchedPhaseBuffer + OrchPhaseBuffer; PROF_READYQUEUE_SIZE comment says "four kinds"; formula bumped 2× on the per-thread term to cover both pool arrays.
  • l2_swimlane_profiling.h header layout diagram: names the two split phase-thread counts.
  • l2_swimlane_collector_aicpu.cpp: cross-launch reset comment now references s_sched_phase_pools/s_orch_phase_pools and record_sched_phase/record_orch_phase.
  • scheduler_dispatch.cpp / aicpu_executor.cpp: comments cite split record types.

src/common/ shared comments

  • profiler_base.h / buffer_pool_manager.h: qualify L2SwimlaneAicpuPhaseHeader::magic example as "on a5" since the struct no longer exists on a2a3.

docs/dfx/l2-swimlane-profiling.md

  • §5.1 layout + record list now distinguish a2a3 split (40B Sched + 32B Orch, two pool arrays) from a5's still-unified shape (pending port).
  • §5.2 a2a3 buffer kinds = 4 (was 2); ASCII data-flow diagram redrawn; kBufferKinds = 4 in L2SwimlaneModule description.
  • §5.3 (a5): corrected num_phase_threads / core_to_thread[] reference to L2SwimlaneAicpuPhaseHeader (was wrongly attributed to L2SwimlaneDataHeader — that was always wrong for a5).
  • §5.4 comparison table: separates task record (identical) from phase record (diverged); ready-queue / kBufferKinds rows call out a2a3=4 vs a5=2.
  • §6 overhead: differentiates a2a3 per-emit SchedPhase + per-submit OrchPhase from a5 unified PhaseRecord (was: "4 phases × 40B per iter", a removed shape).
  • §8 FAQ: "phase records empty" entry gates a2a3 on num_{sched,orch}_phase_threads, a5 on PhaseHeader::magic.
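
As a rough illustration of the §5.1 split, the sketch below models the two a2a3 record types as fixed-size structs. Only the 40-byte and 32-byte totals come from this PR; the struct names, field names, and field order are hypothetical and are not taken from l2_swimlane_profiling.h.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical layout: only the 40-byte total is stated in the PR.
struct SchedPhaseRecordSketch {
    std::uint64_t start_ts;    // assumed: phase start timestamp
    std::uint64_t end_ts;      // assumed: phase end timestamp
    std::uint64_t task_id;     // assumed: task the phase belongs to
    std::uint64_t aux;         // assumed: extra scheduler-only payload
    std::uint32_t phase_kind;  // assumed: which scheduler phase this is
    std::uint32_t thread_idx;  // assumed: emitting AICPU thread
};
static_assert(sizeof(SchedPhaseRecordSketch) == 40, "sched record sketch must be 40 bytes");

// Hypothetical layout: only the 32-byte total is stated in the PR.
struct OrchPhaseRecordSketch {
    std::uint64_t start_ts;    // assumed: phase start timestamp
    std::uint64_t end_ts;      // assumed: phase end timestamp
    std::uint64_t task_id;     // assumed: submitted task
    std::uint32_t phase_kind;  // assumed: which orchestrator phase this is
    std::uint32_t thread_idx;  // assumed: emitting AICPU thread
};
static_assert(sizeof(OrchPhaseRecordSketch) == 32, "orch record sketch must be 32 bytes");

// Number of whole records of type Record that fit in a buffer of the given size.
template <typename Record>
constexpr std::size_t records_per_buffer(std::size_t buffer_bytes) {
    return buffer_bytes / sizeof(Record);
}
```

Because the two records differ in size, one buffer of a given byte size holds a different number of each. This is one plausible reason to keep them in two separate pool arrays rather than one shared pool.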

Notes

  • Only one semantic code change: the PROF_READYQUEUE_SIZE formula is bumped (adds ~8KB to the header). This is a required correctness fix, because the second phase pool array enqueues into the same per-thread ready queues.
  • Everything else is comments + docs.
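
The sizing argument can be sketched as follows. This is a minimal model, not the actual formula in platform_config.h: the function name, the parameter names, and the split into a "per-core term" plus a "per-thread term" are assumptions made for illustration, and the numbers are placeholders.

```cpp
#include <cstddef>

// Hypothetical capacity model: entries reserved for per-core buffers plus
// entries reserved for per-thread phase buffers. Each phase pool array
// (sched, orch) can enqueue up to threads * buffers_per_thread entries.
constexpr std::size_t ready_queue_capacity(std::size_t per_core_entries,
                                           std::size_t threads,
                                           std::size_t buffers_per_thread,
                                           std::size_t phase_pool_arrays) {
    return per_core_entries + phase_pool_arrays * threads * buffers_per_thread;
}

// Placeholder values, chosen only to make the arithmetic easy to follow.
constexpr std::size_t kPerCoreEntries = 64;
constexpr std::size_t kThreads = 4;
constexpr std::size_t kBuffersPerThread = 8;

// Pre-split shape: one phase pool array.
constexpr std::size_t kBefore =
    ready_queue_capacity(kPerCoreEntries, kThreads, kBuffersPerThread, 1);
// Post-split shape: sched and orch pool arrays share the same ready queue.
constexpr std::size_t kAfter =
    ready_queue_capacity(kPerCoreEntries, kThreads, kBuffersPerThread, 2);

static_assert(kBefore == 96, "64 + 1 * 4 * 8");
static_assert(kAfter == 128, "64 + 2 * 4 * 8");
static_assert(kAfter - kBefore == kThreads * kBuffersPerThread,
              "the bump equals one extra per-thread term");
```

Under this model, a queue sized with the old formula could be asked to hold more entries than it has slots once both pool arrays enqueue, which is why the bump is a correctness fix rather than a tuning change.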

Test plan

  • pre-commit clean
  • onboard l2_swimlane STs (--enable-l2-swimlane --enable-dep-gen): 2 passed
  • onboard paged_attention_unroll level 4: 1 passed

After hw-native-sys#939 (pool unification), hw-native-sys#941 (PhaseHeader merge), and hw-native-sys#942
(split sched/orch phase records), several comments and doc sections
still referenced the pre-split a2a3 layout. Audit and update:

a2a3 code/comments:
- platform_config.h: PROF_BUFFERS_PER_THREAD doc references both
  SchedPhaseBuffer and OrchPhaseBuffer (was: single PhaseBuffer);
  PROF_READYQUEUE_SIZE comment now says "four kinds"; formula bumped
  by 2x on the per-thread term to cover both sched and orch pool
  enqueues (matches host alloc which iterates both pool arrays).
- l2_swimlane_profiling.h header layout diagram: name the two split
  phase-thread counts.
- l2_swimlane_collector_aicpu.cpp: cross-launch reset comment now
  references s_sched_phase_pools / s_orch_phase_pools (was: single
  s_aicpu_phase_pools) and record_sched_phase / record_orch_phase.
- scheduler_dispatch.cpp / aicpu_executor.cpp: comments reference
  the split record types.

src/common/ shared comments (now mixed-arch):
- profiler_base.h / buffer_pool_manager.h: qualify
  L2SwimlaneAicpuPhaseHeader::magic example as "on a5" since the
  struct no longer exists on a2a3.

docs/dfx/l2-swimlane-profiling.md:
- §5.1: layout block + record list now distinguish a2a3 split shape
  (SchedPhaseRecord 40B + OrchPhaseRecord 32B, two pool arrays) from
  a5's still-unified shape (pending port).
- §5.2: a2a3 buffer-kind list updated to all four kinds (was: two);
  ASCII data-flow diagram redrawn to show split phase records;
  kBufferKinds = 4 in the L2SwimlaneModule trait description.
- §5.3 (a5): num_phase_threads / core_to_thread[] reference corrected
  to live in L2SwimlaneAicpuPhaseHeader on a5 (was wrongly attributed
  to L2SwimlaneDataHeader).
- §5.4: comparison table separates task record (identical) from
  phase record (diverged); ready-queue and kBufferKinds rows
  call out the a2a3=4 vs a5=2 split.
- §6: overhead description differentiates a2a3's per-emit
  SchedPhase + per-submit OrchPhase from a5's unified PhaseRecord
  (was: "4 phases × 40B per iteration", which described a removed
  shape).
- §8 FAQ: "phase records empty" entry gates a2a3 on
  num_{sched,orch}_phase_threads, a5 on PhaseHeader::magic.

No semantic code changes except the READYQUEUE_SIZE formula bump
(adds ~8KB to the header; necessary correctness fix given the second
phase pool).

Test plan:
- pre-commit clean
- onboard l2_swimlane STs (--enable-l2-swimlane --enable-dep-gen): 2 passed
- onboard paged_attention_unroll level 4: 1 passed
@coderabbitai

coderabbitai Bot commented May 31, 2026


Caution

Review failed

Pull request was closed or merged during review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: bc4065d9-470b-4637-ac3a-18e01d7e6006

📥 Commits

Reviewing files that changed from the base of the PR and between d6ee27b and df02f47.

📒 Files selected for processing (8)
  • docs/dfx/l2-swimlane-profiling.md
  • src/a2a3/platform/include/common/l2_swimlane_profiling.h
  • src/a2a3/platform/include/common/platform_config.h
  • src/a2a3/platform/src/aicpu/l2_swimlane_collector_aicpu.cpp
  • src/a2a3/runtime/tensormap_and_ringbuffer/aicpu/aicpu_executor.cpp
  • src/a2a3/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_dispatch.cpp
  • src/common/platform/include/host/profiling_common/buffer_pool_manager.h
  • src/common/platform/include/host/profiling_common/profiler_base.h

📝 Walkthrough


This PR updates documentation, configuration constants, and code comments to reflect the post-#942 L2 swimlane profiling architecture. The scheduler and orchestrator phases are now split into separate per-thread buffer pools with 4 multiplexed buffer kinds, requiring updated ready-queue capacity calculations and documentation describing the new model.
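
A minimal sketch of how four buffer kinds might be multiplexed through one ready queue is shown below. It is single-threaded and purely illustrative: the enum, the entry struct, and the queue class are hypothetical, and the names of the two non-phase kinds are placeholders because this PR names only the scheduler and orchestrator phase kinds.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical kind tag. Only the sched/orch phase kinds are named in the PR;
// the first two enumerators are placeholders for the remaining kinds.
enum class BufferKind : std::uint8_t { kPlaceholder0, kPlaceholder1, kSchedPhase, kOrchPhase };
constexpr std::size_t kBufferKinds = 4;

// One ready-queue entry: which kind of buffer is full, and its pool index.
struct ReadyEntry {
    BufferKind kind;
    std::uint32_t buffer_index;
};

// Fixed-capacity FIFO ring shared by all buffer kinds (no synchronization).
template <std::size_t Capacity>
class ReadyQueue {
public:
    // Returns false when the queue is full, i.e. the capacity was undersized.
    bool push(ReadyEntry entry) {
        if (count_ == Capacity) return false;
        entries_[(head_ + count_) % Capacity] = entry;
        ++count_;
        return true;
    }
    // Returns false when the queue is empty.
    bool pop(ReadyEntry& out) {
        if (count_ == 0) return false;
        out = entries_[head_];
        head_ = (head_ + 1) % Capacity;
        --count_;
        return true;
    }

private:
    std::array<ReadyEntry, Capacity> entries_{};
    std::size_t head_ = 0;
    std::size_t count_ = 0;
};

// Drains the queue and counts how many entries of each kind were seen,
// standing in for a consumer that dispatches on the kind tag.
template <std::size_t Capacity>
std::array<std::size_t, kBufferKinds> drain_by_kind(ReadyQueue<Capacity>& queue) {
    std::array<std::size_t, kBufferKinds> counts{};
    ReadyEntry entry{};
    while (queue.pop(entry)) {
        ++counts[static_cast<std::size_t>(entry.kind)];
    }
    return counts;
}
```

In this model a full queue makes `push` fail, which shows why the capacity has to cover every kind that can enqueue into it.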

Changes

L2 Swimlane Phase Split Documentation and Config

| Layer / File(s) | Summary |
| --- | --- |
| L2 swimlane profiling documentation: docs/dfx/l2-swimlane-profiling.md | Comprehensive updates to describe scheduler/orchestrator phase pool split, 4 multiplexed buffer kinds flowing through single ready queue per AICPU thread, updated a5 host-shadow transport field races, a2a3-vs-a5 comparison table reflecting split-vs-unified phase models, conditional overhead section for --enable-l2-swimlane >= 3, and FAQ guidance using new phase gating variables (num_sched_phase_threads / num_orch_phase_threads). |
| Ready-queue capacity sizing and header memory layout: src/a2a3/platform/include/common/platform_config.h, src/a2a3/platform/include/common/l2_swimlane_profiling.h | PLATFORM_PROF_READYQUEUE_SIZE constant updated to account for doubled AICPU phase buffer component (separate scheduler and orchestrator buffers), and memory layout comment clarifies per-phase-type thread metadata naming. |
| Comment updates in platform and runtime collectors: src/a2a3/platform/src/aicpu/l2_swimlane_collector_aicpu.cpp, src/a2a3/runtime/tensormap_and_ringbuffer/aicpu/aicpu_executor.cpp, src/a2a3/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_dispatch.cpp, src/common/platform/include/host/profiling_common/buffer_pool_manager.h, src/common/platform/include/host/profiling_common/profiler_base.h | Scattered comment-only updates clarifying cached phase metadata, per-launch pool pointer lifecycle, phase record stream types (L2SwimlaneAicpuSchedPhaseRecord and L2SwimlaneAicpuOrchPhaseRecord), and shared-memory race-condition field lists in mirroring operations. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

  • hw-native-sys/simpler#942: The main architectural change introducing the scheduler/orchestrator phase split, which this PR documents.
  • hw-native-sys/simpler#941: Related phase metadata refactoring that moved metadata into L2SwimlaneDataHeader, which the documentation reflects.

Poem

🐰 Phases split in two, scheduler and orch so clear,
Four kinds now dance together, multiplexed without fear,
Ready queue resizes, constants ring true,
Comments align with the architecture new.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and concisely summarizes the main change: synchronizing L2 swimlane documentation and code references to reflect the post-split layout introduced in recent PRs. |
| Description check | ✅ Passed | The description is directly related to the changeset, providing detailed context about the updates to comments, code, and documentation to reflect the post-split a2a3 layout. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request updates the documentation and code comments across several files to reflect the architectural split of L2 swimlane phase records on the a2a3 platform into separate scheduler (L2SwimlaneAicpuSchedPhaseRecord) and orchestrator (L2SwimlaneAicpuOrchPhaseRecord) streams, while the a5 platform retains its legacy unified shape. Additionally, the ready queue capacity (PLATFORM_PROF_READYQUEUE_SIZE) in platform_config.h is updated to correctly account for the four buffer kinds instead of three by doubling the thread buffer allocation term. There are no review comments provided, so no further feedback is available.

@ChaoWao ChaoWao merged commit ef33626 into hw-native-sys:main May 31, 2026
15 of 16 checks passed
@ChaoWao ChaoWao deleted the chore/swimlane-doc-consistency branch May 31, 2026 12:11