2 changes: 1 addition & 1 deletion docs/developer-guide.md
@@ -9,7 +9,7 @@ pto-runtime/
│ └── {arch}/ # Architecture-specific code (a2a3, a5)
│ ├── platform/ # Platform-specific implementations
│ │ ├── include/ # Shared headers (host/, aicpu/, aicore/, common/)
│ │ ├── src/ # Shared source (compiled into both backends)
│ │ ├── shared/ # Sources shared between onboard and sim backends (compiled into both)
│ │ ├── onboard/ # Real hardware backend
│ │ │ ├── host/ # Host runtime (.so)
│ │ │ ├── aicpu/ # AICPU kernel (.so)
8 changes: 4 additions & 4 deletions docs/dfx/l2-swimlane-profiling.md
@@ -359,7 +359,7 @@ sched overhead per session as price for unbounded session length).
`halHostRegister` maps device memory into host virtual address
space so the host can read device buffers directly.
`L2SwimlaneCollector` runs two background threads on top of a
[`BufferPoolManager<L2SwimlaneModule>`](../src/a2a3/platform/include/host/profiling_common/buffer_pool_manager.h):
[`BufferPoolManager<L2SwimlaneModule>`](../src/a2a3/platform/include/host/buffer_pool_manager.h):
medium

The buffer_pool_manager.h header is located in the shared src/common/platform/include/host/ directory, not under src/a2a3/. The link should be updated to point to the correct path.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Update these links to the consolidated common header path.

Line 362, Line 435, Line 563, and Line 565 still reference arch-specific header locations, but this PR consolidates profiling headers under src/common/platform/include/host/. Please retarget these links to avoid stale/broken documentation paths.

Also applies to: 435-435, 563-563, 565-565

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/dfx/l2-swimlane-profiling.md` at line 362, Several documentation links
still point to arch-specific profiling headers (e.g., the link referencing
BufferPoolManager<L2SwimlaneModule>) and must be retargeted to the consolidated
common header location; find the markdown occurrences that link to
architecture-specific include headers (the BufferPoolManager<L2SwimlaneModule>
link and the other similar header links) and update their URLs to point to the
consolidated common platform include directory (the centralized host profiling
headers) so the docs reference the shared header location instead of the old
arch-specific paths.

a mgmt thread that polls SPSC ready queues and recycles full
buffers **while kernels are still executing**, plus a poll
thread that drains the L2 hand-off queue into
@@ -432,7 +432,7 @@ finalize(unregister, free)

[`L2SwimlaneCollector`](../src/a2a3/platform/include/host/l2_swimlane_collector.h)
on a2a3 inherits from
[`profiling_common::ProfilerBase<L2SwimlaneCollector, L2SwimlaneModule>`](../src/a2a3/platform/include/host/profiling_common/profiler_base.h):
[`profiling_common::ProfilerBase<L2SwimlaneCollector, L2SwimlaneModule>`](../src/a2a3/platform/include/host/profiler_base.h):
medium

The profiler_base.h header is located in the shared src/common/platform/include/host/ directory, not under src/a2a3/. The link should be updated to point to the correct path.

the base class owns the mgmt thread, the poll thread, and the
`BufferPoolManager<L2SwimlaneModule>` they share. `L2SwimlaneCollector`
supplies the L2-specific pieces — the `L2SwimlaneModule` trait
@@ -560,9 +560,9 @@ l2_swimlane_collector_.finalize()

[`L2SwimlaneCollector`](../src/a5/platform/include/host/l2_swimlane_collector.h)
on a5 inherits the same CRTP base
([`profiling_common::ProfilerBase`](../src/a5/platform/include/host/profiling_common/profiler_base.h))
([`profiling_common::ProfilerBase`](../src/a5/platform/include/host/profiler_base.h))
as a2a3 and parameterizes
[`BufferPoolManager`](../src/a5/platform/include/host/profiling_common/buffer_pool_manager.h)
[`BufferPoolManager`](../src/a5/platform/include/host/buffer_pool_manager.h)
Comment on lines +563 to +565
medium

The profiler_base.h and buffer_pool_manager.h headers are located in the shared src/common/platform/include/host/ directory, not under src/a5/. The links should be updated to point to the correct path.

with `L2SwimlaneModule` (`kBufferKinds = 2`). The only a5-specific
glue is the 5-callback `MemoryOps` and the per-tick shm mirror.
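The stale-link comments on this PR all describe the same mechanical rewrite. As a minimal sketch, assuming (per the review comments) that `profiler_base.h` and `buffer_pool_manager.h` now live only under `src/common/platform/include/host/`, a regex pass over the markdown could retarget them. The helper `retarget_framework_links` and the constant `COMMON_HOST` are hypothetical, not part of the repo. The pattern leaves arch-specific collector headers such as `l2_swimlane_collector.h` alone and carries any `#L<n>` anchor over unchanged, so those line anchors still need a manual check against the moved file.

```python
import re

# Assumed single home of the framework headers after this PR (per the review comments).
COMMON_HOST = "src/common/platform/include/host/"

# A markdown link target that points at an arch-specific copy of
# profiler_base.h or buffer_pool_manager.h, with or without the old
# profiling_common/ subdirectory. Groups: 1 = leading "../" run,
# 2 = header file name, 3 = optional "#L<n>" line anchor.
_STALE_FRAMEWORK_LINK = re.compile(
    r"\]\(((?:\.\./)*)src/(?:a2a3|a5)/platform/include/host/"
    r"(?:profiling_common/)?(profiler_base\.h|buffer_pool_manager\.h)(#L\d+)?\)"
)


def retarget_framework_links(markdown: str) -> str:
    """Point stale framework-header links at the common include directory."""
    return _STALE_FRAMEWORK_LINK.sub(
        lambda m: f"]({m.group(1)}{COMMON_HOST}{m.group(2)}{m.group(3) or ''})",
        markdown,
    )


line = "[`BufferPoolManager<L2SwimlaneModule>`](../src/a2a3/platform/include/host/profiling_common/buffer_pool_manager.h):"
print(retarget_framework_links(line))
# [`BufferPoolManager<L2SwimlaneModule>`](../src/common/platform/include/host/buffer_pool_manager.h):
```

Running it over the affected docs and diffing the output is one way to confirm the change stays limited to the framework-header links.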

6 changes: 3 additions & 3 deletions docs/dfx/pmu-profiling.md
@@ -293,7 +293,7 @@ finalize(unregister, free)

[`PmuCollector`](../src/a2a3/platform/include/host/pmu_collector.h)
inherits from
[`profiling_common::ProfilerBase<PmuCollector, PmuModule>`](../src/a2a3/platform/include/host/profiling_common/profiler_base.h):
[`profiling_common::ProfilerBase<PmuCollector, PmuModule>`](../src/a2a3/platform/include/host/profiler_base.h):
medium

The profiler_base.h header is located in the shared src/common/platform/include/host/ directory, not under src/a2a3/. The link should be updated to point to the correct path.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix stale PMU framework links after header consolidation.

Line 296, Line 464, and Line 466 still point to arch-specific host header paths. These should link to src/common/platform/include/host/... to match the refactor and keep docs navigable.

Also applies to: 464-464, 466-466

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/dfx/pmu-profiling.md` at line 296, Update the stale documentation links
that reference arch-specific host headers so they point to the consolidated
common headers; specifically change the link target for
profiling_common::ProfilerBase<PmuCollector, PmuModule> and the other two links
mentioned (the occurrences around lines referencing the PMU framework) to use
the new path prefix src/common/platform/include/host/... instead of the old
arch-specific host header paths, ensuring each markdown link target is replaced
with the consolidated header location so the docs resolve correctly.

the base class owns the mgmt thread, the poll thread, and the
`BufferPoolManager<PmuModule>` they share. `PmuCollector` only supplies
the PMU-specific pieces — the `PmuModule` trait that describes the
@@ -461,9 +461,9 @@ guarantees neighboring register tokens differ by 1 → different slots).

[`PmuCollector`](../src/a5/platform/include/host/pmu_collector.h) on
a5 inherits the same CRTP base
([`profiling_common::ProfilerBase`](../src/a5/platform/include/host/profiling_common/profiler_base.h))
([`profiling_common::ProfilerBase`](../src/a5/platform/include/host/profiler_base.h))
as a2a3 and parameterizes
[`BufferPoolManager`](../src/a5/platform/include/host/profiling_common/buffer_pool_manager.h)
[`BufferPoolManager`](../src/a5/platform/include/host/buffer_pool_manager.h)
Comment on lines +464 to +466
medium

The profiler_base.h and buffer_pool_manager.h headers are located in the shared src/common/platform/include/host/ directory, not under src/a5/. The links should be updated to point to the correct path.

with `PmuModule`. The only a5-specific glue is the 5-callback
`MemoryOps` and the per-tick shm mirror.

2 changes: 1 addition & 1 deletion docs/dfx/scope-stats.md
@@ -43,7 +43,7 @@ Scope stats uses a clean platform-provides / runtime-calls pattern:
platform/include/aicpu/scope_stats_collector.h
Pure-value API declarations. No runtime types cross this boundary.

platform/src/aicpu/scope_stats_collector.cpp
platform/shared/aicpu/scope_stats_collector.cpp
medium

The device-side scope stats collector file is named scope_stats_collector_aicpu.cpp, not scope_stats_collector.cpp. The path should be updated to reflect the correct file name.

Owns all collector state (depth stack, peak arrays, shared buffer).
Implements scope lifecycle (begin/end), peak comparison logic,
capacity registration, and shared buffer record writes.
8 changes: 4 additions & 4 deletions docs/dfx/tensor-dump.md
@@ -361,7 +361,7 @@ normal execution continues.
`halHostRegister` maps device memory into host virtual address
space so the host can read device buffers directly.
`TensorDumpCollector` runs two background threads on top of a
[`BufferPoolManager<DumpModule>`](../src/a2a3/platform/include/host/profiling_common/buffer_pool_manager.h):
[`BufferPoolManager<DumpModule>`](../src/a2a3/platform/include/host/buffer_pool_manager.h):
medium

The buffer_pool_manager.h header is located in the shared src/common/platform/include/host/ directory, not under src/a2a3/. The link should be updated to point to the correct path.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Retarget TensorDump framework links to common include headers.

Line 364, Line 426, Line 531, and Line 533 still use arch-specific host-header paths. Update these links to src/common/platform/include/host/... so they reflect the consolidated framework location.

Also applies to: 426-426, 531-531, 533-533

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/dfx/tensor-dump.md` at line 364, Update the TensorDump framework include
links that currently point to arch-specific host headers to instead use the
consolidated common headers; specifically replace occurrences like
../src/a2a3/platform/include/host/... with
../src/common/platform/include/host/... for the referenced symbols/links such as
BufferPoolManager<DumpModule> and the other TensorDump host-header links in this
document so all four occurrences use src/common/platform/include/host/...

a mgmt thread that polls SPSC ready queues and recycles full
metadata buffers **while kernels are still executing**, plus a
poll thread that drains the L2 hand-off queue into
@@ -423,7 +423,7 @@ export_dump_files()

[`TensorDumpCollector`](../src/a2a3/platform/include/host/tensor_dump_collector.h)
on a2a3 inherits from
[`profiling_common::ProfilerBase<TensorDumpCollector, DumpModule>`](../src/a2a3/platform/include/host/profiling_common/profiler_base.h):
[`profiling_common::ProfilerBase<TensorDumpCollector, DumpModule>`](../src/a2a3/platform/include/host/profiler_base.h):
medium

The profiler_base.h header is located in the shared src/common/platform/include/host/ directory, not under src/a2a3/. The link should be updated to point to the correct path.

the base class owns the mgmt thread, the poll thread, and the
`BufferPoolManager<DumpModule>` they share. `TensorDumpCollector`
only supplies the dump-specific pieces — the `DumpModule` trait
@@ -528,9 +528,9 @@ dump_collector_.finalize()

[`TensorDumpCollector`](../src/a5/platform/include/host/tensor_dump_collector.h)
on a5 inherits the same CRTP base
([`profiling_common::ProfilerBase`](../src/a5/platform/include/host/profiling_common/profiler_base.h))
([`profiling_common::ProfilerBase`](../src/a5/platform/include/host/profiler_base.h))
as a2a3 and parameterizes
[`BufferPoolManager`](../src/a5/platform/include/host/profiling_common/buffer_pool_manager.h)
[`BufferPoolManager`](../src/a5/platform/include/host/buffer_pool_manager.h)
Comment on lines +531 to +533
medium

The profiler_base.h and buffer_pool_manager.h headers are located in the shared src/common/platform/include/host/ directory, not under src/a5/. The links should be updated to point to the correct path.

with `DumpModule`. The only a5-specific glue is the 5-callback
`MemoryOps`, the per-tick shm mirror, and the on-demand arena copy
inside `on_buffer_collected`.
4 changes: 2 additions & 2 deletions docs/hardware/cache-coherency.md
@@ -81,13 +81,13 @@ Two separate concerns, often conflated:
`rmb()` between the COND check and the slot reads.

Concretely, the L2 swimlane staging-slot read in
`src/{a2a3,a5}/platform/src/aicpu/l2_swimlane_collector_aicpu.cpp` does
`src/{a2a3,a5}/platform/shared/aicpu/l2_swimlane_collector_aicpu.cpp` does
**not** call `cache_invalidate_range` on the slot, but it **does** call
`rmb()` before reading `slot->task_id` and the timing fields. All of
those fields are AICore writes covered by the AICore-side `dcci` in
`l2_swimlane_aicore_record_task`. The same pattern applies to the PMU
staging slot
(`src/{a2a3,a5}/platform/src/aicpu/pmu_collector_aicpu.cpp`).
(`src/{a2a3,a5}/platform/shared/aicpu/pmu_collector_aicpu.cpp`).

### Historical pitfall

6 changes: 3 additions & 3 deletions docs/logging.md
@@ -99,7 +99,7 @@ Two implementations link the same ABI symbols:
| Symbol owner | Implementation file | Backend |
| ------------ | ------------------- | ------- |
| `libsimpler_log.so` (host) | `src/common/log/unified_log_host.cpp` | `HostLogger` → stderr |
| AICPU binary (device) | `src/{arch}/platform/src/aicpu/unified_log_device.cpp` | `dev_vlog_*` → backend |
| AICPU binary (device) | `src/{arch}/platform/shared/aicpu/unified_log_device.cpp` | `dev_vlog_*` → backend |
medium

The unified_log_device.cpp file is located in the shared src/common/platform/shared/aicpu/ directory, not under the per-architecture src/{arch}/ directory. The path should be updated to point to the correct location.


The host `.so` is loaded with `RTLD_GLOBAL` so all consumer `.so`s
(`host_runtime`, `cpu_sim_context`, sim `aicore_kernel`, the binding) resolve
@@ -184,7 +184,7 @@ are allowed by default. CMake blocks live in:

- `src/{a5,a2a3}/platform/sim/host/CMakeLists.txt`
- `src/{a5,a2a3}/platform/sim/aicore/CMakeLists.txt`
- `src/common/sim_context/CMakeLists.txt`
- `src/common/platform/sim/sim_context/CMakeLists.txt`

(Onboard host `.so` builds Linux-only and needs no flag.)

@@ -274,6 +274,6 @@ RTLD_GLOBAL)`s them before handing off to the C++ `_ChipWorker.init`.
| Change the host output format / pattern | `src/common/log/host_log.cpp::HostLogger::emit` |
| Change the sim AICPU output format | `src/{arch}/platform/sim/aicpu/device_log.cpp::dev_vlog_*` |
| Change the onboard AICPU CANN dlog tagging | `src/{arch}/platform/onboard/aicpu/device_log.cpp::dev_vlog_*` |
| Add a new C ABI entry point (e.g. dynamic config push) | `src/common/log/include/common/unified_log.h` + `unified_log_host.cpp` + `src/{arch}/platform/src/aicpu/unified_log_device.cpp` |
| Add a new C ABI entry point (e.g. dynamic config push) | `src/common/log/include/common/unified_log.h` + `unified_log_host.cpp` + `src/{arch}/platform/shared/aicpu/unified_log_device.cpp` |
medium

The unified_log_device.cpp file is located in the shared src/common/platform/shared/aicpu/ directory, not under the per-architecture src/{arch}/ directory. The path should be updated to point to the correct location.

| Hook a new consumer `.so` | declare `target_include_directories(target PRIVATE src/common/log/include)`; for host code also link `simpler_log` (or use undefined symbol resolution at runtime via `RTLD_GLOBAL` load) |
| Add a new severity / verbosity tier | `python/simpler/_log.py` (Python integer + `addLevelName`) + `host_log.h::LogLevel` (if a new severity) + `_split_threshold` (band mapping) + AICPU `set_log_*` setters |
18 changes: 9 additions & 9 deletions docs/profiling-framework.md
@@ -2,9 +2,9 @@

Shared host-side infrastructure that the PMU, L2Swimlane, and TensorDump
collectors are built on. Each architecture maintains its own copy of the
framework headers under `src/<arch>/platform/include/host/profiling_common/`
([a2a3](../src/a2a3/platform/include/host/profiling_common/),
[a5](../src/a5/platform/include/host/profiling_common/)) — the two copies
framework headers under `src/common/platform/include/host/`
([a2a3](../src/common/platform/include/host/),
[a5](../src/common/platform/include/host/)) — the two copies
Comment on lines +5 to +7
medium

Since the profiling framework headers have been consolidated into a single shared directory (src/common/platform/include/host/), they are no longer maintained as separate per-architecture copies. The duplicate links for a2a3 and a5 should be replaced with a single link to common, and the surrounding text (lines 4 and 8-9) should be updated to remove references to 'two copies' and 'each arch is free to evolve independently'.

are kept byte-for-byte structurally aligned so reviewers can diff them, but
Comment on lines +5 to 8
⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Resolve contradiction and stale links in framework header location docs.

Line 5–Line 8 says there are per-arch header copies, but both links point to the same common directory. Also, Line 68, Line 79, Line 100, Line 115, and Line 135 still reference src/a2a3/... paths. Please make this section consistently describe and link to the consolidated src/common/platform/include/host/ location.

Also applies to: 68-68, 79-79, 100-100, 115-115, 135-135

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/profiling-framework.md` around lines 5 - 8, Update the paragraph that
currently claims per-arch header copies and uses two identical links so it
consistently states that headers are consolidated under the common host include
directory and not duplicated per-arch; replace the two identical links that
point to the common location with a single clear reference to the consolidated
host headers and change every stale occurrence of the legacy per-arch token
(e.g. any 'src/a2a3/' or similar per-arch paths) at the noted locations to the
consolidated 'src/common/platform/include/host/' token, and reword the sentence
that mentions "the two copies are kept byte-for-byte" to reflect a single
authoritative copy so all references and links are consistent.

each arch is free to evolve independently. This page describes the shape;
§8 covers the a5-specific deviations driven by transport differences.
@@ -65,7 +65,7 @@ a small per-subsystem trait.
```

`ProfilerBase` is the owner: it holds `BufferPoolManager manager_` as a
member ([profiler_base.h:414](../src/a2a3/platform/include/host/profiling_common/profiler_base.h#L414)),
member ([profiler_base.h:414](../src/a2a3/platform/include/host/profiler_base.h#L414)),
spawns and joins both threads, and dispatches collected buffers to
`Derived::on_buffer_collected` via CRTP. `BufferPoolManager` owns no
threads — it is just the shared data structure both threads access.
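The ownership split described in this hunk (`BufferPoolManager` is a passive data structure; `ProfilerBase` holds it as a member, owns both threads, and dispatches to `Derived::on_buffer_collected`) can be modeled in a few lines. This is a Python analogue under stated assumptions, not the C++ implementation: `BufferPool`, `ProfilerBase`, `DemoCollector`, and the `produce` callback are hypothetical names, a single ready queue stands in for the SPSC queues, and Python has no CRTP, so an overridden method replaces the static dispatch.

```python
import collections
import threading


class BufferPool:
    """Data layer: ready queue guarded by a lock + condition variable. No threads."""

    def __init__(self):
        self._cv = threading.Condition()  # wraps a lock; plays the mutex+cv role
        self._ready = collections.deque()

    def push_ready(self, buf):
        # mgmt thread: hand a full buffer to the collector side
        with self._cv:
            self._ready.append(buf)
            self._cv.notify()

    def pop_ready(self, timeout):
        # poll thread: wait for a buffer; None if nothing arrived in time
        with self._cv:
            if not self._cv.wait_for(lambda: self._ready, timeout):
                return None
            return self._ready.popleft()


class ProfilerBase:
    """Control layer: owns the pool and both threads, dispatches to a hook."""

    def __init__(self, produce):
        self.manager = BufferPool()      # member, like BufferPoolManager manager_
        self._produce = produce          # hypothetical stand-in for device-side polling
        self._stop = threading.Event()
        self._mgmt = threading.Thread(target=self._mgmt_loop)
        self._poll = threading.Thread(target=self._poll_loop)

    def start(self):
        self._mgmt.start()
        self._poll.start()

    def stop(self):
        self._mgmt.join()    # producer side finishes first
        self._stop.set()
        self._poll.join()    # poll side drains what is still queued

    def _mgmt_loop(self):
        for buf in self._produce():
            self.manager.push_ready(buf)

    def _poll_loop(self):
        while True:
            buf = self.manager.pop_ready(timeout=0.01)
            if buf is not None:
                self.on_buffer_collected(buf)
            elif self._stop.is_set():
                return

    def on_buffer_collected(self, buf):
        # Subclass hook; stands in for Derived::on_buffer_collected.
        raise NotImplementedError


class DemoCollector(ProfilerBase):
    def __init__(self, produce):
        super().__init__(produce)
        self.collected = []

    def on_buffer_collected(self, buf):
        self.collected.append(buf)


collector = DemoCollector(lambda: iter(range(5)))
collector.start()
collector.stop()
print(collector.collected)  # [0, 1, 2, 3, 4]
```

Joining the mgmt thread before signalling stop lets the poll thread drain everything already queued; whether the real `stop` orders its joins this way is not shown in this diff.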
@@ -76,7 +76,7 @@ subsystem's shared-memory layout is shaped.

### 3.1 `BufferPoolManager<Module>` — data layer

Defined in [`buffer_pool_manager.h`](../src/a2a3/platform/include/host/profiling_common/buffer_pool_manager.h).
Defined in [`buffer_pool_manager.h`](../src/a2a3/platform/include/host/buffer_pool_manager.h).
Owns:

- `ready_queue_` — mgmt → collector hand-off, guarded by mutex+cv.
@@ -97,7 +97,7 @@ Owns no threads. Every entry point is documented as one of:

### 3.2 `ProfilerBase<Derived, Module>` — control layer

Defined in [`profiler_base.h`](../src/a2a3/platform/include/host/profiling_common/profiler_base.h).
Defined in [`profiler_base.h`](../src/a2a3/platform/include/host/profiler_base.h).
Provides:

- The two threads and their lifecycle (`start` / `stop`).
@@ -112,7 +112,7 @@ Provides:
stash the alloc/reg/free callbacks before threads start; if init aborts
before stashing, `start(tf)` becomes a no-op.

`ProfilerAlgorithms<Module>` (in the same header, [profiler_base.h:170](../src/a2a3/platform/include/host/profiling_common/profiler_base.h#L170))
`ProfilerAlgorithms<Module>` (in the same header, [profiler_base.h:170](../src/a2a3/platform/include/host/profiler_base.h#L170))
is where the unified algorithms live:

- `try_pop_aicpu_entry` — barrier-correct head/tail advance over the
@@ -132,7 +132,7 @@ is where the unified algorithms live:
A stateless `struct` per subsystem (`PmuModule`, `L2SwimlaneModule`,
`DumpModule`) that tells the generic algorithms what the shared-memory
layout looks like. The contract lives in the docblock at the top of
[`profiler_base.h`](../src/a2a3/platform/include/host/profiling_common/profiler_base.h);
[`profiler_base.h`](../src/a2a3/platform/include/host/profiler_base.h);
the required members are:

| Member | Purpose |
@@ -305,7 +305,7 @@ Existing collectors are the canonical examples:
## 8. a5 specifics — host-shadow transport

a5's framework headers (under
[`src/a5/platform/include/host/profiling_common/`](../src/a5/platform/include/host/profiling_common/))
[`src/common/platform/include/host/`](../src/common/platform/include/host/))
mirror a2a3's class shapes — same `ProfilerBase<Derived, Module>` /
`BufferPoolManager<Module>` / `ProfilerAlgorithms<Module>` decomposition,
same Module concept contract, same start/stop lifecycle. The only
2 changes: 1 addition & 1 deletion simpler_setup/runtime_compiler.py
@@ -453,7 +453,7 @@ def compile_sim_context(
if not self.platform.endswith("sim"):
raise ValueError(f"compile_sim_context is only for sim platforms, got {self.platform}")

cmake_source_dir = str(self.project_root / "src" / "common" / "sim_context")
cmake_source_dir = str(self.project_root / "src" / "common" / "platform" / "sim" / "sim_context")
binary_name = "libcpu_sim_context.so"
cmake_args = self.host_target.toolchain.get_cmake_args() + self._sanitizer_cmake_args()

2 changes: 1 addition & 1 deletion simpler_setup/tools/sched_overhead_analysis.py
@@ -272,7 +272,7 @@ def validate_perf_tasks_for_overhead_analysis(tasks):
if missing:
detail = "; ".join(missing)
# These fields are produced by runtime-side JSON export in:
# src/platform/src/host/performance_collector.cpp (dispatch_time_us, finish_time_us)
# src/platform/shared/host/performance_collector.cpp (dispatch_time_us, finish_time_us)
msg = "\n".join(
[
"Perf JSON is incompatible with scheduler overhead deep-dive analysis.",
2 changes: 1 addition & 1 deletion src/a2a3/docs/platform.md
@@ -47,7 +47,7 @@ Platform-agnostic headers live in `src/a2a3/platform/include/`, split by target:
- `aicore/` — AICore platform API
- `common/` — Shared types and utilities (unified_log, tensor, common.h)

Shared source implementations in `src/a2a3/platform/src/`.
Shared source implementations in `src/a2a3/platform/shared/`.

## Cache Coherency on GM

2 changes: 1 addition & 1 deletion src/a2a3/platform/docs/tpush-tpop-sim.md
@@ -183,7 +183,7 @@ Validated with `examples/a5/tensormap_and_ringbuffer/bgemm/`:

| Responsibility | File |
| -------------- | ---- |
| Per-device pipe shared state + TLS | `src/common/sim_context/cpu_sim_context.cpp` |
| Per-device pipe shared state + TLS | `src/common/platform/sim/sim_context/cpu_sim_context.cpp` |
| Per-thread core identity setup | `src/{arch}/platform/sim/aicore/kernel.cpp` |
| Hook injection into kernel SOs | `src/{arch}/platform/sim/host/device_runner.cpp` |
| pto-isa hook registration API | `pto-isa/include/pto/common/cpu_stub.hpp` |
2 changes: 1 addition & 1 deletion src/a2a3/platform/include/aicpu/platform_regs.h
@@ -20,7 +20,7 @@
* and runtime code calls get_platform_regs() and read_reg/write_reg()
* for register communication with AICore.
*
* Implementation: src/platform/src/aicpu/platform_regs.cpp (shared across all platforms)
* Implementation: src/platform/shared/aicpu/platform_regs.cpp (shared across all platforms)
*/

#ifndef PLATFORM_AICPU_PLATFORM_REGS_H_
2 changes: 1 addition & 1 deletion src/a2a3/platform/include/common/dep_gen.h
@@ -20,7 +20,7 @@
* carries fanout to keep AICPU off the per-task GM-store critical path.
*
* Streaming buffer design mirrors PMU / L2Swimlane / TensorDump (single source of
* algorithmic truth in src/a2a3/platform/include/host/profiling_common/profiler_base.h):
* algorithmic truth in src/a2a3/platform/include/host/profiler_base.h):
medium

The profiler_base.h header is located in the shared src/common/platform/include/host/ directory, not under src/a2a3/. The comment path should be updated to reflect the correct location.

 * algorithmic truth in src/common/platform/include/host/profiler_base.h):

*
* DepGenFreeQueue — SPSC: Host pushes free DepGenBuffers, AICPU pops them.
* DepGenBufferState — Per-instance state: free_queue + current buffer ptr.
2 changes: 1 addition & 1 deletion src/a2a3/platform/include/host/dep_gen_collector.h
@@ -57,7 +57,7 @@
#include "common/dep_gen.h"
#include "common/platform_config.h"
#include "common/unified_log.h"
#include "host/profiling_common/profiler_base.h"
#include "host/profiler_base.h"

// ---------------------------------------------------------------------------
// dep_gen Module (drives BufferPoolManager<DepGenModule>)
2 changes: 1 addition & 1 deletion src/a2a3/platform/include/host/l2_swimlane_collector.h
@@ -37,7 +37,7 @@
#include "common/memory_barrier.h"
#include "common/platform_config.h"
#include "common/unified_log.h"
#include "host/profiling_common/profiler_base.h"
#include "host/profiler_base.h"

// ---------------------------------------------------------------------------
// L2 Perf profiling Module (drives BufferPoolManager<L2SwimlaneModule>)
2 changes: 1 addition & 1 deletion src/a2a3/platform/include/host/pmu_collector.h
@@ -62,7 +62,7 @@
#include "common/platform_config.h"
#include "common/pmu_profiling.h"
#include "common/unified_log.h"
#include "host/profiling_common/profiler_base.h"
#include "host/profiler_base.h"

// ---------------------------------------------------------------------------
// PMU profiling Module (drives BufferPoolManager<PmuModule>)
21 changes: 17 additions & 4 deletions src/a2a3/platform/onboard/aicpu/CMakeLists.txt
@@ -18,9 +18,11 @@ project(aicpu_kernel LANGUAGES C CXX)
set(CMAKE_CUSTOM_INCLUDE_DIRS "")
list(APPEND CMAKE_CUSTOM_INCLUDE_DIRS "${CMAKE_CURRENT_SOURCE_DIR}/../../include")
list(APPEND CMAKE_CUSTOM_INCLUDE_DIRS "${CMAKE_CURRENT_SOURCE_DIR}/../../../../common/platform/include")
# Pick up spin_hint.h next to the cross-arch onboard/aicpu sources.
list(APPEND CMAKE_CUSTOM_INCLUDE_DIRS "${CMAKE_CURRENT_SOURCE_DIR}/../../../../common/platform/onboard/aicpu")
list(APPEND CMAKE_CUSTOM_INCLUDE_DIRS "${CMAKE_CURRENT_SOURCE_DIR}/../../../../common/task_interface")
list(APPEND CMAKE_CUSTOM_INCLUDE_DIRS "${CMAKE_CURRENT_SOURCE_DIR}/../../../../common/log/include")
list(APPEND CMAKE_CUSTOM_INCLUDE_DIRS "${CMAKE_CURRENT_SOURCE_DIR}/../../../../common/device_comm")
list(APPEND CMAKE_CUSTOM_INCLUDE_DIRS "${CMAKE_CURRENT_SOURCE_DIR}/../../../../common")
list(APPEND CMAKE_CUSTOM_INCLUDE_DIRS "${CMAKE_CURRENT_SOURCE_DIR}/../../../../common/aicpu_dispatcher")
if(DEFINED CUSTOM_INCLUDE_DIRS)
foreach(INC_DIR ${CUSTOM_INCLUDE_DIRS})
@@ -30,12 +32,23 @@ else()
message(FATAL_ERROR "MUST set CUSTOM_INCLUDE_DIRS to build AICPU kernel")
endif()

# Build complete source list from CUSTOM_SOURCE_DIRS and core sources
# Build complete source list from CUSTOM_SOURCE_DIRS and core sources.
# Sources come from three buckets:
# 1. `${CMAKE_CURRENT_SOURCE_DIR}/*.cpp` — onboard-aicpu sources that are
# still arch-specific (a2a3-only files like kernel.cpp).
# 2. `common/platform/onboard/aicpu/*.cpp` — onboard-aicpu sources shared
# by both arches (cache_ops, device_log, device_time, device_malloc,
# orch_so_file, platform_aicpu_affinity).
# 3. arch-shared (between onboard and sim) sources under `../../shared/aicpu/`
# and `common/platform/shared/aicpu/` (unified log, scope_stats, tensor_dump).
set(AICPU_KERNEL_SOURCES "")
file(GLOB AICPU_SOURCES "${CMAKE_CURRENT_SOURCE_DIR}/*.cpp")
list(APPEND AICPU_KERNEL_SOURCES ${AICPU_SOURCES})
# Add common source list like 1) unified log device implementation; 2) platform register implementation
file(GLOB COMMON_SOURCES CONFIGURE_DEPENDS "${CMAKE_CURRENT_SOURCE_DIR}/../../src/aicpu/*.cpp" "${CMAKE_CURRENT_SOURCE_DIR}/../../../../common/platform/src/aicpu/*.cpp")
file(GLOB COMMON_SOURCES CONFIGURE_DEPENDS
"${CMAKE_CURRENT_SOURCE_DIR}/../../shared/aicpu/*.cpp"
"${CMAKE_CURRENT_SOURCE_DIR}/../../../../common/platform/shared/aicpu/*.cpp"
"${CMAKE_CURRENT_SOURCE_DIR}/../../../../common/platform/onboard/aicpu/*.cpp"
)
list(APPEND AICPU_KERNEL_SOURCES ${COMMON_SOURCES})
if(DEFINED CUSTOM_SOURCE_DIRS)
foreach(SRC_DIR ${CUSTOM_SOURCE_DIRS})
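The new comment block in this `CMakeLists.txt` splits the AICPU kernel sources into three buckets. One way to sanity-check what each glob picks up after the move is to replay the same non-recursive `*.cpp` globs from Python. This is a hypothetical helper (`collect_aicpu_sources`), assuming the layout implied by the relative paths in the file (`${CMAKE_CURRENT_SOURCE_DIR}` is `src/<arch>/platform/onboard/aicpu`); it does not parse CMake and ignores `CUSTOM_SOURCE_DIRS`.

```python
from pathlib import Path


def collect_aicpu_sources(onboard_aicpu_dir):
    """Replay the AICPU_KERNEL_SOURCES globs, grouped by bucket.

    Assumes onboard_aicpu_dir is src/<arch>/platform/onboard/aicpu, matching
    ${CMAKE_CURRENT_SOURCE_DIR} in the CMakeLists.txt.
    """
    here = Path(onboard_aicpu_dir).resolve()
    arch_platform = here.parents[1]                            # ../..  -> src/<arch>/platform
    common_platform = here.parents[3] / "common" / "platform"  # ../../../../common/platform

    buckets = {
        # 1. onboard-aicpu sources that are still arch-specific
        "arch_onboard": [here],
        # 2. onboard-aicpu sources shared by both arches
        "common_onboard": [common_platform / "onboard" / "aicpu"],
        # 3. sources shared between onboard and sim
        "shared": [arch_platform / "shared" / "aicpu",
                   common_platform / "shared" / "aicpu"],
    }
    # Like CMake's file(GLOB), "*.cpp" is non-recursive; missing dirs are skipped.
    return {
        name: sorted(p.name for d in dirs if d.is_dir() for p in d.glob("*.cpp"))
        for name, dirs in buckets.items()
    }
```

Listing by basename makes it easy to notice when the same file name turns up in two buckets after a move.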