-
Notifications
You must be signed in to change notification settings - Fork 52
Refactor: src/ layout cleanup (extract aicpu + rename platform/src→shared + consolidate common/) #950
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Refactor: src/ layout cleanup (extract aicpu + rename platform/src→shared + consolidate common/) #950
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -359,7 +359,7 @@ sched overhead per session as price for unbounded session length). | |
| `halHostRegister` maps device memory into host virtual address | ||
| space so the host can read device buffers directly. | ||
| `L2SwimlaneCollector` runs two background threads on top of a | ||
| [`BufferPoolManager<L2SwimlaneModule>`](../src/a2a3/platform/include/host/profiling_common/buffer_pool_manager.h): | ||
| [`BufferPoolManager<L2SwimlaneModule>`](../src/a2a3/platform/include/host/buffer_pool_manager.h): | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Update these links to the consolidated common header path. Line 362, Line 435, Line 563, and Line 565 still reference arch-specific header locations, but this PR consolidates profiling headers under Also applies to: 435-435, 563-563, 565-565 🤖 Prompt for AI Agents |
||
| a mgmt thread that polls SPSC ready queues and recycles full | ||
| buffers **while kernels are still executing**, plus a poll | ||
| thread that drains the L2 hand-off queue into | ||
|
|
@@ -432,7 +432,7 @@ finalize(unregister, free) | |
|
|
||
| [`L2SwimlaneCollector`](../src/a2a3/platform/include/host/l2_swimlane_collector.h) | ||
| on a2a3 inherits from | ||
| [`profiling_common::ProfilerBase<L2SwimlaneCollector, L2SwimlaneModule>`](../src/a2a3/platform/include/host/profiling_common/profiler_base.h): | ||
| [`profiling_common::ProfilerBase<L2SwimlaneCollector, L2SwimlaneModule>`](../src/a2a3/platform/include/host/profiler_base.h): | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. |
||
| the base class owns the mgmt thread, the poll thread, and the | ||
| `BufferPoolManager<L2SwimlaneModule>` they share. `L2SwimlaneCollector` | ||
| supplies the L2-specific pieces — the `L2SwimlaneModule` trait | ||
|
|
@@ -560,9 +560,9 @@ l2_swimlane_collector_.finalize() | |
|
|
||
| [`L2SwimlaneCollector`](../src/a5/platform/include/host/l2_swimlane_collector.h) | ||
| on a5 inherits the same CRTP base | ||
| ([`profiling_common::ProfilerBase`](../src/a5/platform/include/host/profiling_common/profiler_base.h)) | ||
| ([`profiling_common::ProfilerBase`](../src/a5/platform/include/host/profiler_base.h)) | ||
| as a2a3 and parameterizes | ||
| [`BufferPoolManager`](../src/a5/platform/include/host/profiling_common/buffer_pool_manager.h) | ||
| [`BufferPoolManager`](../src/a5/platform/include/host/buffer_pool_manager.h) | ||
|
Comment on lines
+563
to
+565
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. |
||
| with `L2SwimlaneModule` (`kBufferKinds = 2`). The only a5-specific | ||
| glue is the 5-callback `MemoryOps` and the per-tick shm mirror. | ||
|
|
||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -293,7 +293,7 @@ finalize(unregister, free) | |
|
|
||
| [`PmuCollector`](../src/a2a3/platform/include/host/pmu_collector.h) | ||
| inherits from | ||
| [`profiling_common::ProfilerBase<PmuCollector, PmuModule>`](../src/a2a3/platform/include/host/profiling_common/profiler_base.h): | ||
| [`profiling_common::ProfilerBase<PmuCollector, PmuModule>`](../src/a2a3/platform/include/host/profiler_base.h): | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Fix stale PMU framework links after header consolidation. Line 296, Line 464, and Line 466 still point to arch-specific host header paths. These should link to Also applies to: 464-464, 466-466 🤖 Prompt for AI Agents |
||
| the base class owns the mgmt thread, the poll thread, and the | ||
| `BufferPoolManager<PmuModule>` they share. `PmuCollector` only supplies | ||
| the PMU-specific pieces — the `PmuModule` trait that describes the | ||
|
|
@@ -461,9 +461,9 @@ guarantees neighboring register tokens differ by 1 → different slots). | |
|
|
||
| [`PmuCollector`](../src/a5/platform/include/host/pmu_collector.h) on | ||
| a5 inherits the same CRTP base | ||
| ([`profiling_common::ProfilerBase`](../src/a5/platform/include/host/profiling_common/profiler_base.h)) | ||
| ([`profiling_common::ProfilerBase`](../src/a5/platform/include/host/profiler_base.h)) | ||
| as a2a3 and parameterizes | ||
| [`BufferPoolManager`](../src/a5/platform/include/host/profiling_common/buffer_pool_manager.h) | ||
| [`BufferPoolManager`](../src/a5/platform/include/host/buffer_pool_manager.h) | ||
|
Comment on lines
+464
to
+466
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. |
||
| with `PmuModule`. The only a5-specific glue is the 5-callback | ||
| `MemoryOps` and the per-tick shm mirror. | ||
|
|
||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -43,7 +43,7 @@ Scope stats uses a clean platform-provides / runtime-calls pattern: | |
| platform/include/aicpu/scope_stats_collector.h | ||
| Pure-value API declarations. No runtime types cross this boundary. | ||
|
|
||
| platform/src/aicpu/scope_stats_collector.cpp | ||
| platform/shared/aicpu/scope_stats_collector.cpp | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. |
||
| Owns all collector state (depth stack, peak arrays, shared buffer). | ||
| Implements scope lifecycle (begin/end), peak comparison logic, | ||
| capacity registration, and shared buffer record writes. | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -361,7 +361,7 @@ normal execution continues. | |
| `halHostRegister` maps device memory into host virtual address | ||
| space so the host can read device buffers directly. | ||
| `TensorDumpCollector` runs two background threads on top of a | ||
| [`BufferPoolManager<DumpModule>`](../src/a2a3/platform/include/host/profiling_common/buffer_pool_manager.h): | ||
| [`BufferPoolManager<DumpModule>`](../src/a2a3/platform/include/host/buffer_pool_manager.h): | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Retarget TensorDump framework links to common include headers. Line 364, Line 426, Line 531, and Line 533 still use arch-specific host-header paths. Update these links to Also applies to: 426-426, 531-531, 533-533 🤖 Prompt for AI Agents |
||
| a mgmt thread that polls SPSC ready queues and recycles full | ||
| metadata buffers **while kernels are still executing**, plus a | ||
| poll thread that drains the L2 hand-off queue into | ||
|
|
@@ -423,7 +423,7 @@ export_dump_files() | |
|
|
||
| [`TensorDumpCollector`](../src/a2a3/platform/include/host/tensor_dump_collector.h) | ||
| on a2a3 inherits from | ||
| [`profiling_common::ProfilerBase<TensorDumpCollector, DumpModule>`](../src/a2a3/platform/include/host/profiling_common/profiler_base.h): | ||
| [`profiling_common::ProfilerBase<TensorDumpCollector, DumpModule>`](../src/a2a3/platform/include/host/profiler_base.h): | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. |
||
| the base class owns the mgmt thread, the poll thread, and the | ||
| `BufferPoolManager<DumpModule>` they share. `TensorDumpCollector` | ||
| only supplies the dump-specific pieces — the `DumpModule` trait | ||
|
|
@@ -528,9 +528,9 @@ dump_collector_.finalize() | |
|
|
||
| [`TensorDumpCollector`](../src/a5/platform/include/host/tensor_dump_collector.h) | ||
| on a5 inherits the same CRTP base | ||
| ([`profiling_common::ProfilerBase`](../src/a5/platform/include/host/profiling_common/profiler_base.h)) | ||
| ([`profiling_common::ProfilerBase`](../src/a5/platform/include/host/profiler_base.h)) | ||
| as a2a3 and parameterizes | ||
| [`BufferPoolManager`](../src/a5/platform/include/host/profiling_common/buffer_pool_manager.h) | ||
| [`BufferPoolManager`](../src/a5/platform/include/host/buffer_pool_manager.h) | ||
|
Comment on lines
+531
to
+533
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. |
||
| with `DumpModule`. The only a5-specific glue is the 5-callback | ||
| `MemoryOps`, the per-tick shm mirror, and the on-demand arena copy | ||
| inside `on_buffer_collected`. | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -99,7 +99,7 @@ Two implementations link the same ABI symbols: | |
| | Symbol owner | Implementation file | Backend | | ||
| | ------------ | ------------------- | ------- | | ||
| | `libsimpler_log.so` (host) | `src/common/log/unified_log_host.cpp` | `HostLogger` → stderr | | ||
| | AICPU binary (device) | `src/{arch}/platform/src/aicpu/unified_log_device.cpp` | `dev_vlog_*` → backend | | ||
| | AICPU binary (device) | `src/{arch}/platform/shared/aicpu/unified_log_device.cpp` | `dev_vlog_*` → backend | | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. |
||
|
|
||
| The host `.so` is loaded with `RTLD_GLOBAL` so all consumer `.so`s | ||
| (`host_runtime`, `cpu_sim_context`, sim `aicore_kernel`, the binding) resolve | ||
|
|
@@ -184,7 +184,7 @@ are allowed by default. CMake blocks live in: | |
|
|
||
| - `src/{a5,a2a3}/platform/sim/host/CMakeLists.txt` | ||
| - `src/{a5,a2a3}/platform/sim/aicore/CMakeLists.txt` | ||
| - `src/common/sim_context/CMakeLists.txt` | ||
| - `src/common/platform/sim/sim_context/CMakeLists.txt` | ||
|
|
||
| (Onboard host `.so` builds Linux-only and needs no flag.) | ||
|
|
||
|
|
@@ -274,6 +274,6 @@ RTLD_GLOBAL)`s them before handing off to the C++ `_ChipWorker.init`. | |
| | Change the host output format / pattern | `src/common/log/host_log.cpp::HostLogger::emit` | | ||
| | Change the sim AICPU output format | `src/{arch}/platform/sim/aicpu/device_log.cpp::dev_vlog_*` | | ||
| | Change the onboard AICPU CANN dlog tagging | `src/{arch}/platform/onboard/aicpu/device_log.cpp::dev_vlog_*` | | ||
| | Add a new C ABI entry point (e.g. dynamic config push) | `src/common/log/include/common/unified_log.h` + `unified_log_host.cpp` + `src/{arch}/platform/src/aicpu/unified_log_device.cpp` | | ||
| | Add a new C ABI entry point (e.g. dynamic config push) | `src/common/log/include/common/unified_log.h` + `unified_log_host.cpp` + `src/{arch}/platform/shared/aicpu/unified_log_device.cpp` | | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. |
||
| | Hook a new consumer `.so` | declare `target_include_directories(target PRIVATE src/common/log/include)`; for host code also link `simpler_log` (or use undefined symbol resolution at runtime via `RTLD_GLOBAL` load) | | ||
| | Add a new severity / verbosity tier | `python/simpler/_log.py` (Python integer + `addLevelName`) + `host_log.h::LogLevel` (if a new severity) + `_split_threshold` (band mapping) + AICPU `set_log_*` setters | | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -2,9 +2,9 @@ | |
|
|
||
| Shared host-side infrastructure that the PMU, L2Swimlane, and TensorDump | ||
| collectors are built on. Each architecture maintains its own copy of the | ||
| framework headers under `src/<arch>/platform/include/host/profiling_common/` | ||
| ([a2a3](../src/a2a3/platform/include/host/profiling_common/), | ||
| [a5](../src/a5/platform/include/host/profiling_common/)) — the two copies | ||
| framework headers under `src/common/platform/include/host/` | ||
| ([a2a3](../src/common/platform/include/host/), | ||
| [a5](../src/common/platform/include/host/)) — the two copies | ||
|
Comment on lines
+5
to
+7
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Since the profiling framework headers have been consolidated into a single shared directory (src/common/platform/include/host/), they are no longer maintained as separate per-architecture copies. The duplicate links for a2a3 and a5 should be replaced with a single link to common, and the surrounding text (lines 4 and 8-9) should be updated to remove references to 'two copies' and 'each arch is free to evolve independently'. |
||
| are kept byte-for-byte structurally aligned so reviewers can diff them, but | ||
|
Comment on lines
+5
to
8
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Resolve contradiction and stale links in framework header location docs. Line 5–Line 8 says there are per-arch header copies, but both links point to the same common directory. Also, Line 68, Line 79, Line 100, Line 115, and Line 135 still reference Also applies to: 68-68, 79-79, 100-100, 115-115, 135-135 🤖 Prompt for AI Agents |
||
| each arch is free to evolve independently. This page describes the shape; | ||
| §8 covers the a5-specific deviations driven by transport differences. | ||
|
|
@@ -65,7 +65,7 @@ a small per-subsystem trait. | |
| ``` | ||
|
|
||
| `ProfilerBase` is the owner: it holds `BufferPoolManager manager_` as a | ||
| member ([profiler_base.h:414](../src/a2a3/platform/include/host/profiling_common/profiler_base.h#L414)), | ||
| member ([profiler_base.h:414](../src/a2a3/platform/include/host/profiler_base.h#L414)), | ||
| spawns and joins both threads, and dispatches collected buffers to | ||
| `Derived::on_buffer_collected` via CRTP. `BufferPoolManager` owns no | ||
| threads — it is just the shared data structure both threads access. | ||
|
|
@@ -76,7 +76,7 @@ subsystem's shared-memory layout is shaped. | |
|
|
||
| ### 3.1 `BufferPoolManager<Module>` — data layer | ||
|
|
||
| Defined in [`buffer_pool_manager.h`](../src/a2a3/platform/include/host/profiling_common/buffer_pool_manager.h). | ||
| Defined in [`buffer_pool_manager.h`](../src/a2a3/platform/include/host/buffer_pool_manager.h). | ||
| Owns: | ||
|
|
||
| - `ready_queue_` — mgmt → collector hand-off, guarded by mutex+cv. | ||
|
|
@@ -97,7 +97,7 @@ Owns no threads. Every entry point is documented as one of: | |
|
|
||
| ### 3.2 `ProfilerBase<Derived, Module>` — control layer | ||
|
|
||
| Defined in [`profiler_base.h`](../src/a2a3/platform/include/host/profiling_common/profiler_base.h). | ||
| Defined in [`profiler_base.h`](../src/a2a3/platform/include/host/profiler_base.h). | ||
| Provides: | ||
|
|
||
| - The two threads and their lifecycle (`start` / `stop`). | ||
|
|
@@ -112,7 +112,7 @@ Provides: | |
| stash the alloc/reg/free callbacks before threads start; if init aborts | ||
| before stashing, `start(tf)` becomes a no-op. | ||
|
|
||
| `ProfilerAlgorithms<Module>` (in the same header, [profiler_base.h:170](../src/a2a3/platform/include/host/profiling_common/profiler_base.h#L170)) | ||
| `ProfilerAlgorithms<Module>` (in the same header, [profiler_base.h:170](../src/a2a3/platform/include/host/profiler_base.h#L170)) | ||
| is where the unified algorithms live: | ||
|
|
||
| - `try_pop_aicpu_entry` — barrier-correct head/tail advance over the | ||
|
|
@@ -132,7 +132,7 @@ is where the unified algorithms live: | |
| A stateless `struct` per subsystem (`PmuModule`, `L2SwimlaneModule`, | ||
| `DumpModule`) that tells the generic algorithms what the shared-memory | ||
| layout looks like. The contract lives in the docblock at the top of | ||
| [`profiler_base.h`](../src/a2a3/platform/include/host/profiling_common/profiler_base.h); | ||
| [`profiler_base.h`](../src/a2a3/platform/include/host/profiler_base.h); | ||
| the required members are: | ||
|
|
||
| | Member | Purpose | | ||
|
|
@@ -305,7 +305,7 @@ Existing collectors are the canonical examples: | |
| ## 8. a5 specifics — host-shadow transport | ||
|
|
||
| a5's framework headers (under | ||
| [`src/a5/platform/include/host/profiling_common/`](../src/a5/platform/include/host/profiling_common/)) | ||
| [`src/common/platform/include/host/`](../src/common/platform/include/host/)) | ||
| mirror a2a3's class shapes — same `ProfilerBase<Derived, Module>` / | ||
| `BufferPoolManager<Module>` / `ProfilerAlgorithms<Module>` decomposition, | ||
| same Module concept contract, same start/stop lifecycle. The only | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -20,7 +20,7 @@ | |
| * carries fanout to keep AICPU off the per-task GM-store critical path. | ||
| * | ||
| * Streaming buffer design mirrors PMU / L2Swimlane / TensorDump (single source of | ||
| * algorithmic truth in src/a2a3/platform/include/host/profiling_common/profiler_base.h): | ||
| * algorithmic truth in src/a2a3/platform/include/host/profiler_base.h): | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. |
||
| * | ||
| * DepGenFreeQueue — SPSC: Host pushes free DepGenBuffers, AICPU pops them. | ||
| * DepGenBufferState — Per-instance state: free_queue + current buffer ptr. | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The buffer_pool_manager.h header is located in the shared src/common/platform/include/host/ directory, not under src/a2a3/. The link should be updated to point to the correct path.