Doc: sync L2 swimlane refs to post-split layout by hw-native-sys-bot · Pull Request #946 · hw-native-sys/simpler

hw-native-sys-bot · 2026-05-31T12:09:29Z

Summary

After #939/#941/#942 merged, several comments and doc sections still referenced the pre-split a2a3 layout. This PR audits and updates them so that code, comments, and docs/dfx/l2-swimlane-profiling.md are consistent with the current shape.

What changed

a2a3 code/comments

platform_config.h: PROF_BUFFERS_PER_THREAD now references both SchedPhaseBuffer + OrchPhaseBuffer; PROF_READYQUEUE_SIZE comment says "four kinds"; formula bumped 2× on the per-thread term to cover both pool arrays.
l2_swimlane_profiling.h header layout diagram: names the two split phase-thread counts.
l2_swimlane_collector_aicpu.cpp: cross-launch reset comment now references s_sched_phase_pools/s_orch_phase_pools and record_sched_phase/record_orch_phase.
scheduler_dispatch.cpp / aicpu_executor.cpp: comments cite split record types.

src/common/ shared comments

profiler_base.h / buffer_pool_manager.h: qualify L2SwimlaneAicpuPhaseHeader::magic example as "on a5" since the struct no longer exists on a2a3.

docs/dfx/l2-swimlane-profiling.md

§5.1 layout + record list now distinguish a2a3 split (40B Sched + 32B Orch, two pool arrays) from a5's still-unified shape (pending port).
§5.2 a2a3 buffer kinds = 4 (was 2); ASCII data-flow diagram redrawn; kBufferKinds = 4 in L2SwimlaneModule description.
§5.3 (a5): corrected num_phase_threads / core_to_thread[] reference to L2SwimlaneAicpuPhaseHeader (was wrongly attributed to L2SwimlaneDataHeader — that was always wrong for a5).
§5.4 comparison table: separates task record (identical) from phase record (diverged); ready-queue / kBufferKinds rows call out a2a3=4 vs a5=2.
§6 overhead: differentiates a2a3 per-emit SchedPhase + per-submit OrchPhase from a5 unified PhaseRecord (was: "4 phases × 40B per iter", a removed shape).
§8 FAQ: "phase records empty" entry gates a2a3 on num_{sched,orch}_phase_threads, a5 on PhaseHeader::magic.
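
The shapes described for §5.1, §5.2/§5.4, and §8 can be pictured with a minimal sketch. Only the totals (40 B and 32 B), the buffer-kind counts (a2a3 = 4, a5 = 2), and the counter names num_sched_phase_threads / num_orch_phase_threads come from this PR. Every struct name, field, and helper below is a hypothetical placeholder, and treating a zero thread count as "no records expected for that stream" is an assumption, not the documented rule.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical field layouts: only the total sizes are taken from the PR text.
struct SchedPhaseRecordSketch {
    std::uint64_t start_time;
    std::uint64_t end_time;
    std::uint64_t payload[3];  // placeholder for the remaining 24 bytes
};

struct OrchPhaseRecordSketch {
    std::uint64_t start_time;
    std::uint64_t end_time;
    std::uint64_t payload[2];  // placeholder for the remaining 16 bytes
};

static_assert(sizeof(SchedPhaseRecordSketch) == 40, "a2a3 sched phase record: 40 B");
static_assert(sizeof(OrchPhaseRecordSketch) == 32, "a2a3 orch phase record: 32 B");

// Buffer-kind counts as stated in the PR (a2a3 = 4, a5 = 2).
enum class Arch { A2A3, A5 };

constexpr int buffer_kinds(Arch arch) {
    return arch == Arch::A2A3 ? 4 : 2;
}

// Hypothetical view of the two a2a3 counters the FAQ entry gates on.
struct A2a3PhaseCountsSketch {
    std::uint32_t num_sched_phase_threads;
    std::uint32_t num_orch_phase_threads;
};

// Assumption: a zero count means that stream produced no phase records.
constexpr bool sched_phase_expected(const A2a3PhaseCountsSketch& c) {
    return c.num_sched_phase_threads > 0;
}

constexpr bool orch_phase_expected(const A2a3PhaseCountsSketch& c) {
    return c.num_orch_phase_threads > 0;
}
```

When "phase records empty" shows up on a2a3, checking the two counters separately (rather than one combined flag) indicates which of the two streams was never enabled.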

Notes

The only semantic code change is the PROF_READYQUEUE_SIZE formula bump (about 8 KB added to the header). It is a required correctness fix, because the second phase pool array enqueues into the same per-thread ready queues.
Everything else is comments + docs.
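
To illustrate why the per-thread term has to double, here is a minimal sizing sketch. It assumes the capacity is a sum of a per-thread phase term and some other entries, which this PR does not spell out. The struct, the helper names, and all numbers are hypothetical and are not the values in platform_config.h.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sizing inputs; the real constants are not reproduced here.
struct ReadyQueueSizingSketch {
    std::size_t other_entries;       // entries reserved for the non-phase buffer kinds
    std::size_t aicpu_threads;       // number of AICPU threads
    std::size_t buffers_per_thread;  // plays the role of PROF_BUFFERS_PER_THREAD
};

// Each phase pool array can enqueue buffers_per_thread buffers per thread.
constexpr std::size_t per_thread_term(const ReadyQueueSizingSketch& s,
                                      std::size_t phase_pool_arrays) {
    return phase_pool_arrays * s.aicpu_threads * s.buffers_per_thread;
}

constexpr std::size_t ready_queue_entries(const ReadyQueueSizingSketch& s,
                                          std::size_t phase_pool_arrays) {
    return s.other_entries + per_thread_term(s, phase_pool_arrays);
}

constexpr ReadyQueueSizingSketch kExample{64, 4, 8};

// One phase pool array (pre-split) versus two (sched + orch, post-split).
static_assert(ready_queue_entries(kExample, 1) == 96, "64 + 1 * 4 * 8");
static_assert(ready_queue_entries(kExample, 2) == 128, "64 + 2 * 4 * 8");
```

With a queue sized for a single pool array, the second array's buffers would compete for the same slots, which is the undersizing the formula bump addresses.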

Test plan

pre-commit clean
onboard l2_swimlane STs (--enable-l2-swimlane --enable-dep-gen): 2 passed
onboard paged_attention_unroll level 4: 1 passed

After hw-native-sys#939 (pool unification), hw-native-sys#941 (PhaseHeader merge), and hw-native-sys#942 (split sched/orch phase records), several comments and doc sections still referenced the pre-split a2a3 layout. Audit and update:

a2a3 code/comments:
- platform_config.h: PROF_BUFFERS_PER_THREAD doc references both SchedPhaseBuffer and OrchPhaseBuffer (was: single PhaseBuffer); PROF_READYQUEUE_SIZE comment now says "four kinds"; formula bumped by 2x on the per-thread term to cover both sched and orch pool enqueues (matches host alloc which iterates both pool arrays).
- l2_swimlane_profiling.h header layout diagram: name the two split phase-thread counts.
- l2_swimlane_collector_aicpu.cpp: cross-launch reset comment now references s_sched_phase_pools / s_orch_phase_pools (was: single s_aicpu_phase_pools) and record_sched_phase / record_orch_phase.
- scheduler_dispatch.cpp / aicpu_executor.cpp: comments reference the split record types.

src/common/ shared comments (now mixed-arch):
- profiler_base.h / buffer_pool_manager.h: qualify L2SwimlaneAicpuPhaseHeader::magic example as "on a5" since the struct no longer exists on a2a3.

docs/dfx/l2-swimlane-profiling.md:
- §5.1: layout block + record list now distinguish a2a3 split shape (SchedPhaseRecord 40B + OrchPhaseRecord 32B, two pool arrays) from a5's still-unified shape (pending port).
- §5.2: a2a3 buffer-kind list updated to all four kinds (was: two); ASCII data-flow diagram redrawn to show split phase records; kBufferKinds = 4 in the L2SwimlaneModule trait description.
- §5.3 (a5): num_phase_threads / core_to_thread[] reference corrected to live in L2SwimlaneAicpuPhaseHeader on a5 (was wrongly attributed to L2SwimlaneDataHeader).
- §5.4: comparison table separates task record (identical) from phase record (diverged); ready-queue and kBufferKinds rows call out the a2a3=4 vs a5=2 split.
- §6: overhead description differentiates a2a3's per-emit SchedPhase + per-submit OrchPhase from a5's unified PhaseRecord (was: "4 phases × 40B per iteration", which described a removed shape).
- §8 FAQ: "phase records empty" entry gates a2a3 on num_{sched,orch}_phase_threads, a5 on PhaseHeader::magic.

No semantic code changes except the READYQUEUE_SIZE formula bump (adds ~8KB to the header; necessary correctness fix given the second phase pool).

Test plan:
- pre-commit clean
- onboard l2_swimlane STs (--enable-l2-swimlane --enable-dep-gen): 2 passed
- onboard paged_attention_unroll level 4: 1 passed

coderabbitai · 2026-05-31T12:10:08Z

Caution

Review failed

Pull request was closed or merged during review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: bc4065d9-470b-4637-ac3a-18e01d7e6006

📥 Commits

Reviewing files that changed from the base of the PR and between d6ee27b and df02f47.

📒 Files selected for processing (8)

docs/dfx/l2-swimlane-profiling.md
src/a2a3/platform/include/common/l2_swimlane_profiling.h
src/a2a3/platform/include/common/platform_config.h
src/a2a3/platform/src/aicpu/l2_swimlane_collector_aicpu.cpp
src/a2a3/runtime/tensormap_and_ringbuffer/aicpu/aicpu_executor.cpp
src/a2a3/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_dispatch.cpp
src/common/platform/include/host/profiling_common/buffer_pool_manager.h
src/common/platform/include/host/profiling_common/profiler_base.h

📝 Walkthrough

This PR updates documentation, configuration constants, and code comments to reflect the post-#942 L2 swimlane profiling architecture. The scheduler and orchestrator phases are now split into separate per-thread buffer pools with 4 multiplexed buffer kinds, requiring updated ready-queue capacity calculations and documentation describing the new model.

Changes

L2 Swimlane Phase Split Documentation and Config

Layer / File(s)	Summary
L2 swimlane profiling documentation `docs/dfx/l2-swimlane-profiling.md`	Comprehensive updates to describe scheduler/orchestrator phase pool split, 4 multiplexed buffer kinds flowing through single ready queue per AICPU thread, updated a5 host-shadow transport field races, a2a3-vs-a5 comparison table reflecting split-vs-unified phase models, conditional overhead section for `--enable-l2-swimlane >= 3`, and FAQ guidance using new phase gating variables (`num_sched_phase_threads` / `num_orch_phase_threads`).
Ready-queue capacity sizing and header memory layout `src/a2a3/platform/include/common/platform_config.h`, `src/a2a3/platform/include/common/l2_swimlane_profiling.h`	`PLATFORM_PROF_READYQUEUE_SIZE` constant updated to account for doubled AICPU phase buffer component (separate scheduler and orchestrator buffers), and memory layout comment clarifies per-phase-type thread metadata naming.
Comment updates in platform and runtime collectors `src/a2a3/platform/src/aicpu/l2_swimlane_collector_aicpu.cpp`, `src/a2a3/runtime/tensormap_and_ringbuffer/aicpu/aicpu_executor.cpp`, `src/a2a3/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_dispatch.cpp`, `src/common/platform/include/host/profiling_common/buffer_pool_manager.h`, `src/common/platform/include/host/profiling_common/profiler_base.h`	Scattered comment-only updates clarifying cached phase metadata, per-launch pool pointer lifecycle, phase record stream types (`L2SwimlaneAicpuSchedPhaseRecord` and `L2SwimlaneAicpuOrchPhaseRecord`), and shared-memory race-condition field lists in mirroring operations.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

hw-native-sys/simpler#942: The main architectural change introducing the scheduler/orchestrator phase split, which this PR documents.
hw-native-sys/simpler#941: Related phase metadata refactoring that moved metadata into L2SwimlaneDataHeader, which the documentation reflects.

Poem

🐰 Phases split in two, scheduler and orch so clear,
Four kinds now dance together, multiplexed without fear,
Ready queue resizes, constants ring true,
Comments align with the architecture new. ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and concisely summarizes the main change: synchronizing L2 swimlane documentation and code references to reflect the post-split layout introduced in recent PRs.
Description check	✅ Passed	The description is directly related to the changeset, providing detailed context about the updates to comments, code, and documentation to reflect the post-split a2a3 layout.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


gemini-code-assist

Code Review

This pull request updates the documentation and code comments across several files to reflect the architectural split of L2 swimlane phase records on the a2a3 platform into separate scheduler (L2SwimlaneAicpuSchedPhaseRecord) and orchestrator (L2SwimlaneAicpuOrchPhaseRecord) streams, while the a5 platform retains its legacy unified shape. Additionally, the ready queue capacity (PLATFORM_PROF_READYQUEUE_SIZE) in platform_config.h is updated to correctly account for the four buffer kinds instead of three by doubling the thread buffer allocation term. There are no review comments provided, so no further feedback is available.

gemini-code-assist Bot reviewed May 31, 2026

ChaoWao approved these changes May 31, 2026

ChaoWao merged commit ef33626 into hw-native-sys:main May 31, 2026
15 of 16 checks passed

ChaoWao deleted the chore/swimlane-doc-consistency branch May 31, 2026 12:11

ChaoWao merged 1 commit into hw-native-sys:main from hw-native-sys-bot:chore/swimlane-doc-consistency

2 participants
