[WIP] Add: dep_gen capture+replay support on a5 by indigo1973 · Pull Request #886 · hw-native-sys/simpler

indigo1973 · 2026-05-28T10:02:37Z

Port the dep_gen (SubmitTrace) feature from a2a3 to a5 so the tensormap_and_ringbuffer runtime on a5 can produce deps.json and feed flow events into swimlane_converter.py. Without this, --enable-dep-gen was a no-op on a5 and merged_swimlane_*.json had no dependency arrows.

Reused from a2a3 verbatim (byte-identical):

- Shared-memory ABI: common/dep_gen.h (DepGenRecord 2624 B, overflow chain, SPSC free_queue, per-thread ready_queue)
- AICPU writer: aicpu/dep_gen_collector_aicpu.{h,cpp}
- Runtime replay: runtime/tensormap_and_ringbuffer/host/dep_gen_replay
- Orchestrator capture point + aicpu_executor lifecycle hooks
- 5 platform_config constants + PROFILING_FLAG_DEP_GEN bit

Specialized for a5 (no SVM, see profiling_common diff vs a2a3):

- dep_gen_collector.cpp uses alloc_single_buffer (malloc shadow + profiling_copy_to_device) instead of identity-mapping when register_cb is null, matching a5's PMU/L2Perf/Dump collectors.
- Two-phase set_memory_context: callbacks first, then shm pointers once the region is committed, so start(tf) gates correctly.
- reconcile_counters explicitly calls copy_from_device on the BufferState and current_buf before reading them (the mgmt thread has stopped by then).
- finalize lets BufferPoolManager::clear_mappings() be the single source of truth for host-shadow lifetime; there is no per-collector dedup.
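A minimal sketch of the two-phase gating described in the second bullet. All names here (`MemoryContextSketch`, `set_callbacks`, `set_shm`) are hypothetical, and the real `start(tf)` takes an argument that is omitted; the sketch only shows that `start()` refuses to arm capture until both the callbacks and the committed shm pointers are present.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>

// Hypothetical stand-in for the collector's memory context; not the a5 API.
class MemoryContextSketch {
public:
    using AllocFn = std::function<void*(std::size_t)>;
    using FreeFn = std::function<void(void*)>;

    // Phase 1: install allocation callbacks before any shared region exists.
    void set_callbacks(AllocFn alloc, FreeFn free_fn) {
        alloc_ = std::move(alloc);
        free_ = std::move(free_fn);
    }

    // Phase 2: record shm pointers once the region is committed.
    // Rejected if phase 1 has not happened yet.
    bool set_shm(void* host_shm, void* dev_shm) {
        if (!alloc_ || !free_) return false;
        host_shm_ = host_shm;
        dev_shm_ = dev_shm;
        return true;
    }

    // Gate: capture only starts when both phases are complete.
    bool start() {
        if (!alloc_ || !free_ || host_shm_ == nullptr || dev_shm_ == nullptr) return false;
        running_ = true;
        return true;
    }

    bool running() const { return running_; }

private:
    AllocFn alloc_;
    FreeFn free_;
    void* host_shm_ = nullptr;
    void* dev_shm_ = nullptr;
    bool running_ = false;
};
```

Splitting the setters this way lets the callbacks be installed before the shared region exists, while a premature `start()` fails instead of capturing into a half-built context.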

Sim path: resolve set_platform_dep_gen_base / set_dep_gen_enabled from the AICPU .so via dlsym and forward kernel_args.dep_gen_data_base plus the enable flag at boot, mirroring the existing pmu / dump / l2_perf setters.

Onboard kernel.cpp adds two lines to forward dep_gen_data_base + PROFILING_FLAG_DEP_GEN into the AICPU writer's globals, mirroring the existing PMU / L2 / Dump setters.

c_api: run_prepared's enable_dep_gen parameter is no longer ignored — wired to runner->set_dep_gen_enabled() on both onboard and sim.

Tests:

- tests/st/a5/.../dfx/dep_gen/test_dep_gen.py: 6-edge validation against the vector_example orchestration (byte-identical to a2a3, with the same expected edge set).
- tests/st/a5/.../dfx/dep_gen/test_dep_gen_chain.py: overflow chain regression for >64 explicit deps.
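The overflow chain exercised by test_dep_gen_chain.py can be pictured with a simplified host-side reassembly sketch. Everything in it is an assumption for illustration: `RecSketch`, its `is_overflow` / `chain_ends_here` fields, and the in-memory vector stand in for the fixed 2624 B `DepGenRecord` layout, whose real fields are defined in common/dep_gen.h. The design choice shown is to reject orphan or unterminated chains outright rather than emit a truncated dependency list.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical, simplified record; not the real DepGenRecord layout.
struct RecSketch {
    bool is_overflow;            // assumed analogue of an overflow flag
    bool chain_ends_here;        // assumed chain terminator marker
    uint64_t task_id;
    std::vector<uint64_t> deps;  // inline or overflow dep slots
};

struct SubmitSketch {
    uint64_t task_id;
    std::vector<uint64_t> deps;  // full explicit-dep list after reassembly
};

// Concatenate head + overflow records into complete submits. Returns nullopt on an
// orphan overflow record or an unterminated chain instead of a truncated list.
std::optional<std::vector<SubmitSketch>> reassemble(const std::vector<RecSketch>& recs) {
    std::vector<SubmitSketch> out;
    bool open = false;  // true while a head record still awaits its terminator
    for (const RecSketch& r : recs) {
        if (!r.is_overflow) {
            if (open) return std::nullopt;  // previous chain never terminated
            out.push_back({r.task_id, r.deps});
            open = !r.chain_ends_here;
        } else {
            // Overflow must continue the currently open chain of the same task.
            if (!open || out.back().task_id != r.task_id) return std::nullopt;
            out.back().deps.insert(out.back().deps.end(), r.deps.begin(), r.deps.end());
            open = !r.chain_ends_here;
        }
    }
    if (open) return std::nullopt;  // stream ended mid-chain
    return out;
}
```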

Docs:

- docs/dfx/dep_gen.md: §8 Architecture Touchpoints now lists both platforms; the "Currently a2a3 only" line is removed.
- src/a5/runtime/.../docs/profiling_levels.md: Code Locations now point at src/a5/ (they were stale src/a2a3/ refs left over from the #777 cleanup, "Refactor(a5): align profiling stack with a2a3 (host CRTP + stable ring)"), and a dep_gen entry is added.


coderabbitai · 2026-05-28T10:02:52Z

📝 Walkthrough


This PR introduces a complete dependency-generation (DepGen) capture and replay system for the a5 platform. It enables offline analysis of orchestrator task submission graphs by capturing per-submit metadata (task IDs, tensor references, explicit dependencies) into device-resident buffers, transferring completed buffers to the host, and replaying them to generate a deps.json artifact validated against the tensormap engine.

Changes

DepGen Dependency Capture and Replay

| Layer | File(s) | Summary |
| --- | --- | --- |
| Data contracts and platform configuration | `src/a5/platform/include/common/dep_gen.h`, `src/a5/platform/include/common/kernel_args.h`, `src/a5/platform/include/common/platform_config.h` | Defines shared-memory record layout (`DepGenRecord`, overflow chains, free/ready queues), buffer state, and profiling flags; adds `dep_gen_data_base` to `KernelArgs` and platform-sizing constants. |
| AICPU capture interface and implementation | `src/a5/platform/include/aicpu/dep_gen_collector_aicpu.h`, `src/a5/platform/src/aicpu/dep_gen_collector_aicpu.cpp`, `src/a5/platform/onboard/aicpu/kernel.cpp` | C-ABI capture interface; device-side lifecycle (init, record_submit with overflow handling, flush, finalize); wires base pointer and enable flag in kernel execution. |
| Orchestrator submission capture | `src/a5/runtime/tensormap_and_ringbuffer/runtime/pto_orchestrator.cpp`, `src/a5/runtime/tensormap_and_ringbuffer/aicpu/aicpu_executor.cpp` | Snapshot task identity/tensors/deps before tensormap lookup; orchestrator init/flush/finalize lifecycle; fallback stubs for host builds. |
| Host-side buffer collection and state reconciliation | `src/a5/platform/include/host/dep_gen_collector.h`, `src/a5/platform/src/host/dep_gen_collector.cpp`, `src/a5/platform/onboard/host/CMakeLists.txt`, `src/a5/platform/sim/host/CMakeLists.txt` | `ProfilerBase<DepGenCollector>`-derived class allocating device/host buffer pairs; accumulates in-memory records; reconciles device counters for consistency checking. |
| Replay and deps.json generation | `src/a5/runtime/tensormap_and_ringbuffer/host/dep_gen_replay.h`, `src/a5/runtime/tensormap_and_ringbuffer/host/dep_gen_replay.cpp` | Consumes captured records; builds oracle and annotated tensormap instances in parallel; validates edge producer-id consistency; serializes JSON with tensor metadata and overlap flags. |
| Onboard device runner integration | `src/a5/platform/onboard/host/device_runner.h`, `src/a5/platform/onboard/host/device_runner.cpp`, `src/a5/platform/onboard/host/pto_runtime_c_api.cpp` | Enables dep-gen via C API flag; initializes collector and wires base pointer into `kernel_args`; starts/stops collection; triggers replay after reconciliation. |
| Simulation device runner integration | `src/a5/platform/sim/host/device_runner.h`, `src/a5/platform/sim/host/device_runner.cpp`, `src/a5/platform/sim/host/pto_runtime_c_api.cpp` | Dynamically resolves AICPU dep-gen control functions; conditionally enables profiling flag and collector; reconciles counters and replays with same lifecycle. |
| Test cases and orchestration kernels | `tests/st/a5/tensormap_and_ringbuffer/dfx/dep_gen/kernels/orchestration/chain_barrier_orch.cpp`, `tests/st/a5/tensormap_and_ringbuffer/dfx/dep_gen/test_dep_gen.py`, `tests/st/a5/tensormap_and_ringbuffer/dfx/dep_gen/test_dep_gen_chain.py` | Vector example validating 6-edge baseline; chain_barrier kernel and test validating overflow chains for 64–391 producers; schema sanity checks and `deps_to_graph` smoke tests. |
| Documentation updates | `docs/dfx/dep_gen.md`, `src/a5/runtime/tensormap_and_ringbuffer/docs/profiling_levels.md` | Enablement examples for a2a3\|a5; architecture touchpoints mapping capture/replay/collection components; validation-gate expectations updated. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes


🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 41.49%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description check | ✅ Passed | The description is directly related to the changeset, explaining the purpose (porting dep_gen from a2a3 to a5), what was reused, a5-specific adaptations, integration changes, tests, and documentation updates. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Title check | ✅ Passed | The title '[WIP] Add: dep_gen capture+replay support on a5' accurately summarizes the main objective of the changeset: implementing dep_gen capture and replay functionality on the a5 platform. |


gemini-code-assist

Code Review

This pull request ports the dep_gen (SubmitTrace) capture and offline replay infrastructure to the a5 platform, mirroring the a2a3 implementation. It introduces the shared-memory structures, the AICPU writer, the host collector, and the host-side replay mechanism that performs a differential check to emit deps.json, along with corresponding integration tests. The review feedback highlights critical concurrency and memory management issues that must be addressed: several incorrect or missing memory barriers (wmb() and rmb()) in the AICPU writer could lead to stale reads on weakly-ordered architectures, a memory leak exists where std::malloc'd host shadows are not freed during finalization, and a data race on total_collected_ requires it to be declared as atomic.

gemini-code-assist · 2026-05-28T10:05:41Z

+    } else {
+        host_ptr = std::malloc(size);
+        if (host_ptr == nullptr) {
+            LOG_ERROR("DepGenCollector: host shadow alloc failed for %zu bytes", size);
+            free_cb_(dev_ptr);
+            if (host_ptr_out) *host_ptr_out = nullptr;
+            return nullptr;
+        }
+        std::memset(host_ptr, 0, size);
+        profiling_copy_to_device(dev_ptr, host_ptr, size);
+    }


When register_cb_ is null (which is always the case on the a5 platform), host shadows are allocated via std::malloc in alloc_single_buffer. However, in finalize(), manager_.clear_mappings() is called to clear the mappings, but it does not free the std::malloc'd host pointers because it cannot unconditionally call std::free without crashing registered mappings. This causes a silent and major memory leak of all host shadows and the SHM host region on every run. We should track the std::malloc'd host pointers in a private vector malloced_host_ptrs_ and free them in finalize().

Suggested change

     } else {
         host_ptr = std::malloc(size);
         if (host_ptr == nullptr) {
             LOG_ERROR("DepGenCollector: host shadow alloc failed for %zu bytes", size);
             free_cb_(dev_ptr);
             if (host_ptr_out) *host_ptr_out = nullptr;
             return nullptr;
         }
         std::memset(host_ptr, 0, size);
         profiling_copy_to_device(dev_ptr, host_ptr, size);
    +    malloced_host_ptrs_.push_back(host_ptr);
     }

gemini-code-assist · 2026-05-28T10:05:41Z

+    // Free remaining host shadows (per-state buffers + shm region).
+    manager_.clear_mappings();


To prevent memory leaks of the host shadows allocated via std::malloc when register_cb_ == nullptr, we must explicitly free them in finalize() after clearing the mappings.

Suggested change

     // Free remaining host shadows (per-state buffers + shm region).
     manager_.clear_mappings();
    +for (void* ptr : malloced_host_ptrs_) {
    +    std::free(ptr);
    +}
    +malloced_host_ptrs_.clear();
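If the `malloced_host_ptrs_` approach is adopted, one way to make ownership harder to get wrong is to hold each malloc'd shadow in a `std::unique_ptr` with a `free()` deleter, so `finalize()` and any early-return path release it automatically. This is a sketch under assumptions: `ShadowOwner` and `FreeDeleter` are hypothetical, and only the `register_cb_ == nullptr` path should route through it, because registered mappings are not malloc'd and must never reach `std::free`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <memory>
#include <utility>
#include <vector>

// Deleter matching std::malloc; malloc'd memory must not be released with delete.
struct FreeDeleter {
    void operator()(void* p) const noexcept { std::free(p); }
};

// Hypothetical owner for host shadows allocated when no register callback exists.
class ShadowOwner {
public:
    // Allocates a zeroed shadow and keeps ownership; returns nullptr on failure.
    void* alloc_zeroed(std::size_t size) {
        std::unique_ptr<void, FreeDeleter> p(std::malloc(size));
        if (!p) return nullptr;
        std::memset(p.get(), 0, size);
        void* raw = p.get();
        owned_.push_back(std::move(p));  // if push_back throws, p still frees the block
        return raw;
    }

    // Frees every shadow exactly once; calling it again is a no-op.
    void finalize() { owned_.clear(); }

    std::size_t count() const { return owned_.size(); }

private:
    std::vector<std::unique_ptr<void, FreeDeleter>> owned_;
};
```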

gemini-code-assist · 2026-05-28T10:05:41Z

+    rmb();
+    uint32_t head = s_dep_gen_state->free_queue.head;
+    uint32_t tail = s_dep_gen_state->free_queue.tail;
+
+    if (head != tail) {
+        uint64_t buf_ptr = s_dep_gen_state->free_queue.buffer_ptrs[head % PLATFORM_DEP_GEN_SLOT_COUNT];


The read memory barrier rmb() must be placed after reading tail and before reading buffer_ptrs to prevent speculative loads of stale data on weakly-ordered architectures.

Suggested change

    -rmb();
     uint32_t head = s_dep_gen_state->free_queue.head;
     uint32_t tail = s_dep_gen_state->free_queue.tail;
     if (head != tail) {
    +    rmb();
         uint64_t buf_ptr = s_dep_gen_state->free_queue.buffer_ptrs[head % PLATFORM_DEP_GEN_SLOT_COUNT];

References

On weakly-ordered architectures, ensure that a read memory barrier is explicitly placed between an MMIO read and a subsequent Normal memory read when there is no data or address dependency.
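A portable C++ analogue of the placement Gemini asks for, offered as an illustration rather than a drop-in fix: in an SPSC ring built on `std::atomic`, the consumer's acquire load of `tail` plays the role of an `rmb()` issued after reading `tail` and before reading the slot. `SpscRingSketch` is hypothetical, and the AICPU code works on device shared memory with explicit `rmb()`/`wmb()`, which the C++ memory model does not directly describe.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// Single-producer/single-consumer ring. N must be a power of two so the
// free-running uint32_t indices wrap consistently with the modulo.
template <std::size_t N>
class SpscRingSketch {
    static_assert(N > 0 && (N & (N - 1)) == 0, "N must be a power of two");

public:
    bool push(uint64_t v) {
        uint32_t tail = tail_.load(std::memory_order_relaxed);  // only the producer writes tail
        uint32_t head = head_.load(std::memory_order_acquire);  // slot reuse waits for the consumer
        if (tail - head == N) return false;                     // full
        slots_[tail % N] = v;                                    // payload first...
        tail_.store(tail + 1, std::memory_order_release);        // ...then publish
        return true;
    }

    std::optional<uint64_t> pop() {
        uint32_t head = head_.load(std::memory_order_relaxed);  // only the consumer writes head
        uint32_t tail = tail_.load(std::memory_order_acquire);  // acquire ~ rmb() after reading tail
        if (head == tail) return std::nullopt;                   // empty
        uint64_t v = slots_[head % N];                           // ordered after the tail load
        head_.store(head + 1, std::memory_order_release);        // return the slot to the producer
        return v;
    }

private:
    std::array<uint64_t, N> slots_{};
    std::atomic<uint32_t> head_{0};
    std::atomic<uint32_t> tail_{0};
};
```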

gemini-code-assist · 2026-05-28T10:05:41Z

+    // Running total of records appended. Equal to ``records_.size()`` after
+    // every append; kept separately for the reconcile_counters cross-check
+    // even when records_ may be inspected concurrently.
+    uint64_t total_collected_ = 0;


The member variable total_collected_ is updated on the background management thread under records_mutex_ but read concurrently on other threads via the public getter total_collected() without any synchronization. This is a data race under the C++ memory model and therefore undefined behavior; in practice it can also surface as torn 64-bit reads on 32-bit platforms. Declaring total_collected_ as std::atomic<uint64_t> resolves this safely.

Suggested change

    -uint64_t total_collected_ = 0;
    +std::atomic<uint64_t> total_collected_ = 0;
    +std::vector<void*> malloced_host_ptrs_;

References

When a component is accessed by multiple threads, use std::atomic with release-store and acquire-load semantics to establish a happens-before relationship.
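A sketch of the suggested split, assuming the collector keeps `records_` under `records_mutex_` and exposes a lock-free `total_collected()` getter (`CounterSketch` is a hypothetical reduction). It uses brace initialization `{0}`; the suggested `std::atomic<uint64_t> total_collected_ = 0;` also compiles, but only from C++17 onward, where guaranteed copy elision removes the need for std::atomic's deleted copy constructor.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <vector>

// Hypothetical reduction of the collector: records_ stays under a mutex, while the
// counter read by an unlocked getter is atomic.
class CounterSketch {
public:
    void append(uint64_t rec) {
        std::lock_guard<std::mutex> lock(records_mutex_);
        records_.push_back(rec);
        // Release pairs with the acquire load in total_collected().
        total_collected_.fetch_add(1, std::memory_order_release);
    }

    // Safe to call from any thread without taking records_mutex_.
    uint64_t total_collected() const { return total_collected_.load(std::memory_order_acquire); }

    std::size_t size_locked() const {
        std::lock_guard<std::mutex> lock(records_mutex_);
        return records_.size();
    }

private:
    mutable std::mutex records_mutex_;
    std::vector<uint64_t> records_;
    std::atomic<uint64_t> total_collected_{0};
};
```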

coderabbitai

Actionable comments posted: 8

🧹 Nitpick comments (2)

tests/st/a5/tensormap_and_ringbuffer/dfx/dep_gen/test_dep_gen_chain.py (1)

200-206: ⚡ Quick win

Tighten explicit-edge validation to reduce false positives.

Add a strict explicit-edge count (n + 1) and validate the barrier’s single outgoing explicit edge is not self-looping or targeting a producer.

Proposed fix

         # All N producer→barrier edges must be present. This is the chain
         # round-trip assertion: pre-chain code drops anything past index 63.
         assert len(barrier_preds) == n, f"barrier has {len(barrier_preds)} preds, expected {n}"
+        assert len(explicit_edges) == n + 1, (
+            f"expected exactly {n + 1} explicit edges (N producer->barrier + 1 barrier->consumer), "
+            f"got {len(explicit_edges)}"
+        )
 
         # Consumer must explicit-depend on the barrier — exactly one outgoing
         # explicit edge from the barrier.
         outgoing_explicit_from_barrier = {succ for pred, succ in explicit_edges if pred == barrier_id}
         assert len(outgoing_explicit_from_barrier) == 1, (
             f"barrier {barrier_id} has {len(outgoing_explicit_from_barrier)} outgoing explicit edges, "
             f"expected 1 (the consumer)"
         )
+        consumer_id = next(iter(outgoing_explicit_from_barrier))
+        assert consumer_id != barrier_id and consumer_id not in barrier_preds, (
+            f"barrier {barrier_id} outgoing explicit edge points to invalid consumer candidate {consumer_id}"
+        )

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/st/a5/tensormap_and_ringbuffer/dfx/dep_gen/test_dep_gen_chain.py`
around lines 200 - 206, Tighten the barrier explicit-edge validation: assert that
explicit_edges contains exactly n + 1 edges (N producer->barrier plus 1
barrier->consumer), keep the assertion that outgoing_explicit_from_barrier
(computed from explicit_edges as shown) has size 1, then extract that single
consumer target and ensure it is not a self-loop (target != barrier_id) and not
in barrier_preds; include the expected count and offending targets in the
assertion messages for diagnostics.

src/a5/runtime/tensormap_and_ringbuffer/runtime/pto_orchestrator.cpp (1)

42-50: ⚡ Quick win

Add a compile-time guard for arg-slot-count parity.

This block asserts sizeof(Tensor), but the capture path also assumes MAX_TENSOR_ARGS == CORE_MAX_TENSOR_ARGS. If those ever drift, dep_gen will silently truncate/mis-shape records. A sibling static_assert would catch that at build time.

Suggested guard

 static_assert(sizeof(Tensor) == DEP_GEN_TENSOR_SIZE, "DepGenRecord::tensors slot size out of sync with sizeof(Tensor)");
+static_assert(MAX_TENSOR_ARGS == CORE_MAX_TENSOR_ARGS, "DepGen arg slot count out of sync with shared-memory ABI");
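To make the drift concrete, a self-contained sketch with made-up constants (16 slots of 128 bytes; the real values live in the project headers) shows both guards failing at compile time rather than producing mis-shaped records at run time. Every `*_SKETCH` constant and the `TensorSketch` / `DepGenRecordSketch` types are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical constants standing in for the real macros (values are invented).
constexpr std::size_t MAX_TENSOR_ARGS_SKETCH = 16;       // runtime-side arg slots
constexpr std::size_t CORE_MAX_TENSOR_ARGS_SKETCH = 16;  // shared-memory ABI slots
constexpr std::size_t DEP_GEN_TENSOR_SIZE_SKETCH = 128;  // bytes per captured tensor

struct TensorSketch {
    unsigned char bytes[128];
};

struct DepGenRecordSketch {
    unsigned char tensors[CORE_MAX_TENSOR_ARGS_SKETCH][DEP_GEN_TENSOR_SIZE_SKETCH];
};

// Both guards break the build, not the run, if either side drifts.
static_assert(sizeof(TensorSketch) == DEP_GEN_TENSOR_SIZE_SKETCH,
              "tensor slot size out of sync with sizeof(Tensor)");
static_assert(MAX_TENSOR_ARGS_SKETCH == CORE_MAX_TENSOR_ARGS_SKETCH,
              "tensor arg slot count mismatch: MAX_TENSOR_ARGS != CORE_MAX_TENSOR_ARGS");

// Number of slots the capture path copies. With the guard above, any count up to
// MAX_TENSOR_ARGS fits, so this min() never actually truncates.
constexpr std::size_t captured_slots(std::size_t runtime_count) {
    return runtime_count < CORE_MAX_TENSOR_ARGS_SKETCH ? runtime_count : CORE_MAX_TENSOR_ARGS_SKETCH;
}
```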

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/a5/runtime/tensormap_and_ringbuffer/runtime/pto_orchestrator.cpp` around
lines 42 - 50, Add a compile-time check that ensures the capture path's assumed
arg-slot parity by static_assert-ing MAX_TENSOR_ARGS == CORE_MAX_TENSOR_ARGS so
drift is caught at build time; locate the existing Tensor size static_assert
near DepGenRecord/tensors in pto_orchestrator.cpp and add a sibling
static_assert referencing the macros MAX_TENSOR_ARGS and CORE_MAX_TENSOR_ARGS
with a clear error message like "tensor arg slot count mismatch: MAX_TENSOR_ARGS
!= CORE_MAX_TENSOR_ARGS".

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/a5/platform/onboard/host/device_runner.cpp`:
- Around line 753-761: The code currently only logs when
dep_gen_replay_emit_deps_json fails; instead treat that as a fatal error: after
calling dep_gen_replay_emit_deps_json (inside the enable_dep_gen_ block where
dep_gen_collector_.reconcile_counters() is true), check replay_rc and on
non-zero either return a non-zero error code from this function (or set the
overall run/exit status variable and bail out) so the caller sees failure;
update the branch that calls dep_gen_replay_emit_deps_json (symbols:
enable_dep_gen_, dep_gen_collector_.reconcile_counters(),
dep_gen_collector_.records(), make_deps_json_path(),
dep_gen_replay_emit_deps_json) to propagate the error instead of only LOG_ERROR.

In `@src/a5/platform/sim/host/device_runner.cpp`:
- Around line 709-717: The dep-gen path currently logs an error but does not
stop the simulation when dep_gen_replay_emit_deps_json fails; update the block
in device_runner.cpp (around enable_dep_gen_, dep_gen_collector_.stop(),
dep_gen_collector_.reconcile_counters(), make_deps_json_path,
dep_gen_replay_emit_deps_json) so that if replay_rc != 0 you propagate failure
instead of only logging: e.g., set the function/result state to indicate failure
(return an error code or throw an exception / call the routine that aborts the
run used by this module) so the caller sees the failure and the sim run
terminates when deps.json emission fails.

In `@src/a5/platform/src/aicpu/dep_gen_collector_aicpu.cpp`:
- Around line 72-75: The host-visible fields are being published before the
payload/reset is visible; move the publish points so that the payload and reset
are fully committed before updating host-visible pointers: ensure the per-queue
entry fields (s_dep_gen_header->queues[q][current_tail].instance_index,
.buffer_ptr, .buffer_seq) and buffer reset (current_buf_ptr, buf->count) are
written, then execute the memory barrier (wmb()) and only after that assign the
host-visible tail pointer (s_dep_gen_header->queue_tails[q]) and any other
host-visible pointer updates; apply the same reorder to the other occurrences
involving s_dep_gen_header, queue_tails, current_buf_ptr and buf->count (also at
the other noted spots).

In `@src/a5/platform/src/host/dep_gen_collector.cpp`:
- Line 38: DepGenCollector::init() must roll back any successfully created
buffers/mappings if a later alloc_single_buffer() fails: on error, iterate the
list of already-allocated entries (the same containers used in init()),
unmap/free each buffer, close any fds, and remove/clear those entries so no
partial state remains; ensure initialized_ stays false and any temporary
resources are released (or alternatively set initialized_ = true and call
finalize() only after the partial state has been made consistent) so the
destructor/stop() path won't leak. Reference symbols: DepGenCollector::init(),
alloc_single_buffer(), finalize(), DepGenCollector::~DepGenCollector(),
initialized_, stop().

In `@src/a5/runtime/tensormap_and_ringbuffer/host/dep_gen_replay.cpp`:
- Around line 568-607: The code currently logs orphan/unterminated overflow
chains inside the loop (checking DEP_GEN_FLAG_OVERFLOW and matching task_id) and
then continues, allowing a truncated deps list to be used; instead, when
encountering these malformed chains (or when chain_complete is false after
scanning overflow records) abort the replay immediately rather than proceeding:
in the block handling orphan overflow and the block handling unterminated
chains, replace the LOG_ERROR-only behavior with a hard failure (e.g., return an
error status or throw an exception) from the enclosing function so that
DepGenRecord/DepGenOverflowRecord chains that are malformed do not lead to using
full_deps_buf/deps_data; ensure the failure path prevents setting deps_data and
propagates a clear error for the caller to detect.
- Around line 540-543: The loop currently reinterprets raw bytes in
DepGenRecord::tensors as const Tensor* which risks object-lifetime UB; instead
allocate a temporary array/vector of real Tensor objects (e.g.
std::vector<Tensor> replay_tensors(tc) or an aligned buffer of Tensor) and for
each i do a memcpy(&replay_tensors[i], &rec.tensors[i][0], sizeof(Tensor)) to
materialize a real aligned Tensor object, then set tref_buf[i].ptr =
&replay_tensors[i] and atype_buf[i] as before (ensure replay_tensors lives long
enough for the replay usage).

In `@src/a5/runtime/tensormap_and_ringbuffer/runtime/pto_orchestrator.cpp`:
- Around line 488-512: The code currently records the raw
args.explicit_deps_data()/args.explicit_dep_count() into
dep_gen_aicpu_record_submit, but you must record the runtime-filtered
explicit-deps list (the array and count produced after invalid/already-dead deps
are dropped) so replay matches actual enforced edges; update the
dep_gen_aicpu_record_submit calls (the one shown and the similar call around the
534-551 region) to pass the filtered deps buffer and its filtered count instead
of args.explicit_deps_data() and args.explicit_dep_count(), using the same
filtered-deps variable(s) produced by the runtime’s dep-filtering code path.

In `@tests/st/a5/tensormap_and_ringbuffer/dfx/dep_gen/test_dep_gen.py`:
- Around line 139-142: The code currently returns silently when deps_path (the
expected deps.json) is missing, which hides failures when dep_gen is enabled;
replace the silent return with a fail-fast check (e.g. assert
deps_path.exists(), f"dep_gen enabled but deps.json missing at {deps_path}" or
pytest.fail(...) ) so the test fails with a clear message; ensure pytest is
imported if you use pytest.fail and reference the symbols deps_path, deps.json
and dep_gen in the failure message to aid debugging.

---

Nitpick comments:
In `@src/a5/runtime/tensormap_and_ringbuffer/runtime/pto_orchestrator.cpp`:
- Around line 42-50: Add a compile-time check that ensures the capture path's
assumed arg-slot parity by static_assert-ing MAX_TENSOR_ARGS ==
CORE_MAX_TENSOR_ARGS so drift is caught at build time; locate the existing
Tensor size static_assert near DepGenRecord/tensors in pto_orchestrator.cpp and
add a sibling static_assert referencing the macros MAX_TENSOR_ARGS and
CORE_MAX_TENSOR_ARGS with a clear error message like "tensor arg slot count
mismatch: MAX_TENSOR_ARGS != CORE_MAX_TENSOR_ARGS".

In `@tests/st/a5/tensormap_and_ringbuffer/dfx/dep_gen/test_dep_gen_chain.py`:
- Around line 200-206: Tighten the barrier explicit-edge validation: compute
outgoing_explicit_from_barrier from explicit_edges (as shown), assert its size
equals (len(producers) + 1) instead of 1, then extract the single non-barrier
consumer target and ensure it is not a self-loop (target != barrier_id) and not
in the producers set; update the assertion message to include the expected count
and offending targets using outgoing_explicit_from_barrier, barrier_id, and
producers for diagnostics.
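For the `DepGenCollector::init()` finding, a minimal all-or-nothing pattern: allocate through caller-supplied callbacks and, if any allocation fails, free everything allocated so far so no partial state survives. `RollbackInitSketch` and its callback types are hypothetical; the real `init()` would also need to undo host shadows and mappings.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical init(): allocate n buffers and roll back on any failure.
class RollbackInitSketch {
public:
    using AllocFn = std::function<void*(std::size_t)>;
    using FreeFn = std::function<void(void*)>;

    RollbackInitSketch(AllocFn alloc, FreeFn free_fn)
        : alloc_(std::move(alloc)), free_(std::move(free_fn)) {}
    ~RollbackInitSketch() { release_all(); }

    // Intended to be called once. Either all n buffers exist afterwards, or none do.
    bool init(std::size_t n, std::size_t size) {
        bufs_.reserve(n);  // reserve up front so push_back cannot throw after an alloc
        for (std::size_t i = 0; i < n; ++i) {
            void* p = alloc_(size);
            if (p == nullptr) {
                release_all();  // undo buffers 0..i-1
                return false;   // initialized_ stays false
            }
            bufs_.push_back(p);
        }
        initialized_ = true;
        return true;
    }

    bool initialized() const { return initialized_; }
    std::size_t live() const { return bufs_.size(); }

private:
    void release_all() {
        for (void* p : bufs_) free_(p);
        bufs_.clear();
        initialized_ = false;
    }

    AllocFn alloc_;
    FreeFn free_;
    std::vector<void*> bufs_;
    bool initialized_ = false;
};
```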


ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 8b369d14-d666-4b3b-98ae-5e9b838f7773

📥 Commits

Reviewing files that changed from the base of the PR and between 61ba501 and 6ba5fa6.

📒 Files selected for processing (25)

docs/dfx/dep_gen.md
src/a5/platform/include/aicpu/dep_gen_collector_aicpu.h
src/a5/platform/include/common/dep_gen.h
src/a5/platform/include/common/kernel_args.h
src/a5/platform/include/common/platform_config.h
src/a5/platform/include/host/dep_gen_collector.h
src/a5/platform/onboard/aicpu/kernel.cpp
src/a5/platform/onboard/host/CMakeLists.txt
src/a5/platform/onboard/host/device_runner.cpp
src/a5/platform/onboard/host/device_runner.h
src/a5/platform/onboard/host/pto_runtime_c_api.cpp
src/a5/platform/sim/host/CMakeLists.txt
src/a5/platform/sim/host/device_runner.cpp
src/a5/platform/sim/host/device_runner.h
src/a5/platform/sim/host/pto_runtime_c_api.cpp
src/a5/platform/src/aicpu/dep_gen_collector_aicpu.cpp
src/a5/platform/src/host/dep_gen_collector.cpp
src/a5/runtime/tensormap_and_ringbuffer/aicpu/aicpu_executor.cpp
src/a5/runtime/tensormap_and_ringbuffer/docs/profiling_levels.md
src/a5/runtime/tensormap_and_ringbuffer/host/dep_gen_replay.cpp
src/a5/runtime/tensormap_and_ringbuffer/host/dep_gen_replay.h
src/a5/runtime/tensormap_and_ringbuffer/runtime/pto_orchestrator.cpp
tests/st/a5/tensormap_and_ringbuffer/dfx/dep_gen/kernels/orchestration/chain_barrier_orch.cpp
tests/st/a5/tensormap_and_ringbuffer/dfx/dep_gen/test_dep_gen.py
tests/st/a5/tensormap_and_ringbuffer/dfx/dep_gen/test_dep_gen_chain.py

coderabbitai · 2026-05-28T10:21:09Z

+    if (enable_dep_gen_) {
+        dep_gen_collector_.stop();
+        if (dep_gen_collector_.reconcile_counters()) {
+            const auto &records = dep_gen_collector_.records();
+            const std::string deps = make_deps_json_path(output_prefix_);
+            int replay_rc = dep_gen_replay_emit_deps_json(records.data(), records.size(), deps.c_str());
+            if (replay_rc != 0) {
+                LOG_ERROR("dep_gen replay failed (%d) — deps.json not produced", replay_rc);
+            }


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fail the run when dep-gen emission fails.

When enable_dep_gen_ is on, missing deps.json is a feature failure, not just a log message. Returning success here hides the regression from callers, and the current dep-gen tests only inspect deps.json when it exists.

Suggested fix

     int replay_rc = dep_gen_replay_emit_deps_json(records.data(), records.size(), deps.c_str());
     if (replay_rc != 0) {
         LOG_ERROR("dep_gen replay failed (%d) — deps.json not produced", replay_rc);
    +    return replay_rc;
     }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

     if (enable_dep_gen_) {
         dep_gen_collector_.stop();
         if (dep_gen_collector_.reconcile_counters()) {
             const auto &records = dep_gen_collector_.records();
             const std::string deps = make_deps_json_path(output_prefix_);
             int replay_rc = dep_gen_replay_emit_deps_json(records.data(), records.size(), deps.c_str());
             if (replay_rc != 0) {
                 LOG_ERROR("dep_gen replay failed (%d) — deps.json not produced", replay_rc);
    +            return replay_rc;
             }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/a5/platform/onboard/host/device_runner.cpp` around lines 753 - 761, The code currently only logs when dep_gen_replay_emit_deps_json fails; instead treat that as a fatal error: after calling dep_gen_replay_emit_deps_json (inside the enable_dep_gen_ block where dep_gen_collector_.reconcile_counters() is true), check replay_rc and on non-zero either return a non-zero error code from this function (or set the overall run/exit status variable and bail out) so the caller sees failure; update the branch that calls dep_gen_replay_emit_deps_json (symbols: enable_dep_gen_, dep_gen_collector_.reconcile_counters(), dep_gen_collector_.records(), make_deps_json_path(), dep_gen_replay_emit_deps_json) to propagate the error instead of only LOG_ERROR.

coderabbitai · 2026-05-28T10:21:09Z

+    if (enable_dep_gen_) {
+        dep_gen_collector_.stop();
+        if (dep_gen_collector_.reconcile_counters()) {
+            const auto &records = dep_gen_collector_.records();
+            const std::string deps = make_deps_json_path(output_prefix_);
+            int replay_rc = dep_gen_replay_emit_deps_json(records.data(), records.size(), deps.c_str());
+            if (replay_rc != 0) {
+                LOG_ERROR("dep_gen replay failed (%d) — deps.json not produced", replay_rc);
+            }


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fail the sim run when dep-gen emission fails.

This has the same silent-failure problem as the onboard path: enable_dep_gen_ can succeed from the caller’s point of view even though deps.json was never written. That makes dep-gen regressions easy to miss.

Suggested fix

 int replay_rc = dep_gen_replay_emit_deps_json(records.data(), records.size(), deps.c_str());
 if (replay_rc != 0) {
     LOG_ERROR("dep_gen replay failed (%d) — deps.json not produced", replay_rc);
+    return replay_rc;
 }
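A minimal sketch of the propagation this comment asks for, written as a free function so it can be read in isolation. `finish_dep_gen` is a hypothetical name, and the `replay` callback stands in for `dep_gen_replay_emit_deps_json()`; the real runner's member functions and return conventions may differ.

```cpp
#include <cassert>
#include <cstdio>
#include <functional>

// Hypothetical teardown step: `replay` plays the role of
// dep_gen_replay_emit_deps_json() and returns 0 on success.
// A non-zero status is returned to the caller instead of only being logged.
int finish_dep_gen(bool enable_dep_gen, const std::function<int()> &replay) {
    if (!enable_dep_gen) {
        return 0;  // dep_gen off: nothing to emit, nothing to fail
    }
    int replay_rc = replay();
    if (replay_rc != 0) {
        std::fprintf(stderr, "dep_gen replay failed (%d): deps.json not produced\n", replay_rc);
        return replay_rc;  // propagate so the run is reported as failed
    }
    return 0;
}
```

The caller would then check the returned status like any other teardown failure, so a missing deps.json surfaces as a failed run rather than as a log line.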

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/a5/platform/sim/host/device_runner.cpp` around lines 709 - 717, The dep-gen path currently logs an error but does not stop the simulation when dep_gen_replay_emit_deps_json fails; update the block in device_runner.cpp (around enable_dep_gen_, dep_gen_collector_.stop(), dep_gen_collector_.reconcile_counters(), make_deps_json_path, dep_gen_replay_emit_deps_json) so that if replay_rc != 0 you propagate failure instead of only logging: e.g., set the function/result state to indicate failure (return an error code or throw an exception / call the routine that aborts the run used by this module) so the caller sees the failure and the sim run terminates when deps.json emission fails.

coderabbitai · 2026-05-28T10:21:09Z

+    s_dep_gen_header->queues[q][current_tail].instance_index = 0;
+    s_dep_gen_header->queues[q][current_tail].buffer_ptr = buffer_ptr;
+    s_dep_gen_header->queues[q][current_tail].buffer_seq = buffer_seq;
+    s_dep_gen_header->queue_tails[q] = next_tail;


⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Publish consumer-visible state only after the payload/reset is visible.

queue_tails[q], current_buf_ptr, and buf->count are the host-visible publish points here, but each is stored before the wmb(). That lets the host observe a ready entry before its fields land, or see a recycled/current buffer before count has been reset, or copy records after count increases but before the payload is fully committed.

Suggested ordering fix

     s_dep_gen_header->queues[q][current_tail].instance_index = 0;
     s_dep_gen_header->queues[q][current_tail].buffer_ptr = buffer_ptr;
     s_dep_gen_header->queues[q][current_tail].buffer_seq = buffer_seq;
+    wmb();
     s_dep_gen_header->queue_tails[q] = next_tail;

-    s_dep_gen_state->current_buf_ptr = new_buf_ptr;
-    s_dep_gen_state->current_buf_seq = seq + 1;
-    wmb();
-
     DepGenBuffer *new_buf = reinterpret_cast<DepGenBuffer *>(new_buf_ptr);
     new_buf->count = 0;
+    wmb();
+    s_dep_gen_state->current_buf_ptr = new_buf_ptr;
+    s_dep_gen_state->current_buf_seq = seq + 1;

-    s_dep_gen_state->current_buf_ptr = buf_ptr;
-    s_dep_gen_state->current_buf_seq = 0;
-    wmb();
     DepGenBuffer *buf = reinterpret_cast<DepGenBuffer *>(buf_ptr);
     buf->count = 0;
+    wmb();
+    s_dep_gen_state->current_buf_ptr = buf_ptr;
+    s_dep_gen_state->current_buf_seq = 0;

-    buf->count = idx + static_cast<uint32_t>(needed);
-    wmb();
+    wmb();
+    buf->count = idx + static_cast<uint32_t>(needed);

Also applies to: 121-126, 149-154, 321-322
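The "write the payload, then publish" rule can be illustrated with a single-producer/single-consumer ring in portable C++. This is a sketch under stated assumptions: `QueueEntry`, `Ring`, and the capacity are hypothetical, and `std::atomic` release/acquire ordering is used as a same-process analogue of the `wmb()` barrier. Whether that carries over to AICPU-to-host shared memory depends on the platform's memory model, which the sketch does not model.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Mirrors the three fields written in the reviewed code; the layout and the
// ring capacity are illustrative assumptions, not the real DepGen ABI.
struct QueueEntry {
    uint32_t instance_index;
    uint64_t buffer_ptr;
    uint32_t buffer_seq;
};

struct Ring {
    static constexpr uint32_t kCapacity = 8;
    QueueEntry entries[kCapacity];
    std::atomic<uint32_t> head{0};  // advanced by the consumer
    std::atomic<uint32_t> tail{0};  // advanced by the producer (the publish point)
};

// Producer: fill every field of the slot first, then publish the new tail with
// release semantics, so a consumer that sees the tail also sees the fields.
bool push(Ring &r, uint64_t buffer_ptr, uint32_t buffer_seq) {
    uint32_t tail = r.tail.load(std::memory_order_relaxed);
    uint32_t next = (tail + 1) % Ring::kCapacity;
    if (next == r.head.load(std::memory_order_acquire)) {
        return false;  // full (one slot is kept empty to tell full from empty)
    }
    r.entries[tail] = QueueEntry{0, buffer_ptr, buffer_seq};
    r.tail.store(next, std::memory_order_release);  // publish last
    return true;
}

// Consumer: acquire-load the tail before reading the slot it covers.
bool pop(Ring &r, QueueEntry &out) {
    uint32_t head = r.head.load(std::memory_order_relaxed);
    if (head == r.tail.load(std::memory_order_acquire)) {
        return false;  // empty
    }
    out = r.entries[head];
    r.head.store((head + 1) % Ring::kCapacity, std::memory_order_release);
    return true;
}
```

Storing the tail before the fields would let the consumer's acquire load succeed while the slot still holds stale data, which is the hazard the comment describes.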

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/a5/platform/src/aicpu/dep_gen_collector_aicpu.cpp` around lines 72 - 75, The host-visible fields are being published before the payload/reset is visible; move the publish points so that the payload and reset are fully committed before updating host-visible pointers: ensure the per-queue entry fields (s_dep_gen_header->queues[q][current_tail].instance_index, .buffer_ptr, .buffer_seq) and buffer reset (current_buf_ptr, buf->count) are written, then execute the memory barrier (wmb()) and only after that assign the host-visible tail pointer (s_dep_gen_header->queue_tails[q]) and any other host-visible pointer updates; apply the same reorder to the other occurrences involving s_dep_gen_header, queue_tails, current_buf_ptr and buf->count (also at the other noted spots).

coderabbitai · 2026-05-28T10:21:09Z

+#include "common/unified_log.h"
+#include "host/profiling_copy.h"
+
+DepGenCollector::~DepGenCollector() { stop(); }


⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Roll back partial init() allocations before returning.

If any alloc_single_buffer() call fails after earlier allocations succeeded, init() returns immediately and leaves those buffers/mappings behind. Because initialized_ is still false, finalize() exits at Line 265 and the destructor only calls stop(), so the partial state is never unwound.

Also applies to: 111-130, 264-266
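One way to make `init()` all-or-nothing is to track each successful allocation and release them all on the first failure. In this sketch, `Allocator` and `init_buffers` are hypothetical stand-ins; the real `alloc_single_buffer()` also registers device mappings, which this version does not model. The point is only that a failed call leaves no partial state behind.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

// Hypothetical stand-in for alloc_single_buffer(): returns nullptr on the
// allocation whose index equals fail_at, and counts outstanding buffers.
struct Allocator {
    int fail_at = -1;  // -1: never fail
    int live = 0;      // outstanding allocations, used to detect leaks
    int calls = 0;
    void *alloc(size_t n) {
        if (calls++ == fail_at) return nullptr;
        void *p = std::malloc(n);
        if (p != nullptr) ++live;
        return p;
    }
    void release(void *p) {
        std::free(p);
        --live;
    }
};

// Allocates `count` buffers. On the first failure it frees everything
// allocated so far and leaves `out` empty, so nothing is left for a
// finalize() that only runs when initialization fully succeeded.
bool init_buffers(Allocator &a, int count, size_t size, std::vector<void *> &out) {
    out.clear();
    for (int i = 0; i < count; i++) {
        void *p = a.alloc(size);
        if (p == nullptr) {
            for (void *q : out) a.release(q);  // roll back earlier successes
            out.clear();
            return false;
        }
        out.push_back(p);
    }
    return true;
}
```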

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/a5/platform/src/host/dep_gen_collector.cpp` at line 38, DepGenCollector::init() must roll back any successfully created buffers/mappings if a later alloc_single_buffer() fails: on error, iterate the list of already-allocated entries (the same containers used in init()), unmap/free each buffer, close any fds, and remove/clear those entries so no partial state remains; ensure initialized_ stays false and any temporary resources are released (or alternatively set initialized_ = true and call finalize() only after the partial state has been made consistent) so the destructor/stop() path won't leak. Reference symbols: DepGenCollector::init(), alloc_single_buffer(), finalize(), DepGenCollector::~DepGenCollector(), initialized_, stop().

coderabbitai · 2026-05-28T10:21:09Z

+        for (int32_t i = 0; i < tc; i++) {
+            tref_buf[i].ptr = reinterpret_cast<const Tensor *>(&rec.tensors[i][0]);
+            atype_buf[i] = static_cast<TensorArgType>(rec.arg_types[i]);
+        }


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

❓ Verification inconclusive

🏁 Script executed:

#!/bin/bash
# Verify whether DepGenRecord::tensors is typed/aligned storage or raw bytes.
fd -i 'dep_gen\.h|tensor\.h' src | xargs -r rg -n -C3 'struct DepGenRecord|tensors\[|alignas\(Tensor\)|struct Tensor|class Tensor'

Repository: hw-native-sys/simpler

Length of output: 7792

🏁 Script executed:

#!/bin/bash
set -euo pipefail
FILE="src/a5/runtime/tensormap_and_ringbuffer/host/dep_gen_replay.cpp"
HDR="src/a5/platform/include/common/dep_gen.h"
echo "=== dep_gen_replay.cpp: cast site (lines ~520-620) ==="
nl -ba "$FILE" | sed -n '500,620p'
echo "=== dep_gen_replay.cpp: where tref_buf.tptr is dereferenced/consumed (lines ~620-740) ==="
nl -ba "$FILE" | sed -n '620,740p'
echo "=== dep_gen_replay.cpp: find usages of tref_buf and '.ptr' ==="
rg -n "tref_buf|\.ptr\b" "$FILE"
echo "=== dep_gen.h: DepGenRecord::tensors definition ==="
nl -ba "$HDR" | sed -n '70,140p'

Repository: hw-native-sys/simpler

Length of output: 181

🏁 Script executed:

#!/bin/bash
set -euo pipefail
FILE="src/a5/runtime/tensormap_and_ringbuffer/host/dep_gen_replay.cpp"
HDR="src/a5/platform/include/common/dep_gen.h"
echo "=== dep_gen_replay.cpp: cast site (lines ~500-620) ==="
cat -n "$FILE" | sed -n '500,620p'
echo "=== dep_gen_replay.cpp: where tref_buf is consumed (lines ~620-740) ==="
cat -n "$FILE" | sed -n '620,740p'
echo "=== dep_gen_replay.cpp: find usages of tref_buf and '.ptr' ==="
rg -n "tref_buf|\.ptr\b" "$FILE"
echo "=== dep_gen.h: DepGenRecord::tensors definition ==="
cat -n "$HDR" | sed -n '70,140p'

Repository: hw-native-sys/simpler

Length of output: 18259

🏁 Script executed:

#!/bin/bash
set -euo pipefail
# Focused file locations
TENSOR_HDRS="$(fd -i 'tensor\.h' src | head -n 20)"
echo "Tensor headers (sample):"
echo "$TENSOR_HDRS"
echo "=== Find Tensor / TensorRef definitions ==="
fd -i 'tensor\.h' src | xargs -r rg -n --hidden --no-ignore-vcs 'struct Tensor\b|class Tensor\b|struct TensorRef\b|class TensorRef\b|using TensorRef\b|DEP_GEN_TENSOR_SIZE\b'
echo "=== dep_gen capture: where DepGenRecord::tensors gets written ==="
fd -i 'dep_gen_.*\.(cpp|cc|cxx|h|hpp|hxx)' src | xargs -r rg -n --hidden --no-ignore-vcs '&\s*rec\.tensors|dep_gen.*tensors\[|memcpy\([^,]*rec\.tensors|rec\.tensors\s*\[|DEP_GEN_TENSOR_SIZE'
# Also search by the field name more broadly (limited to relevant dirs via ripgrep)
rg -n --hidden --no-ignore-vcs '\bDepGenRecord\b|tensors\[\w*\]\[|DEP_GEN_TENSOR_SIZE' src/a5/platform/include/common/dep_gen.h src/a5/runtime/tensormap_and_ringbuffer/host src/a5/runtime/tensormap_and_ringbuffer/runtime || true

Repository: hw-native-sys/simpler

Length of output: 252

🏁 Script executed:

#!/bin/bash
set -euo pipefail
# 1) Definitions: Tensor / TensorRef / TensorArgType / DEP_GEN_TENSOR_SIZE
echo "=== Tensor / TensorRef definitions (search) ==="
rg -n --hidden --no-ignore-vcs --glob 'src/**/tensor.h' \
  'struct TensorRef\b|class TensorRef\b|using TensorRef\b|struct Tensor\b|class Tensor\b|enum class TensorArgType\b|enum TensorArgType\b|DEP_GEN_TENSOR_SIZE\b|TensorArgType\b' \
  src/a5/runtime/tensormap_and_ringbuffer/runtime/tensor.h src/a2a3/runtime/tensormap_and_ringbuffer/runtime/tensor.h
echo
echo "=== dep_gen.h: constants around DEP_GEN_TENSOR_SIZE ==="
rg -n --hidden --no-ignore-vcs 'DEP_GEN_TENSOR_SIZE' src/a5/platform/include/common/dep_gen.h
cat -n src/a5/platform/include/common/dep_gen.h | sed -n '1,220p'
# 2) Where DepGenRecord::tensors gets written in capture path
echo
echo "=== Find writes/initialization to DepGenRecord::tensors ==="
rg -n --hidden --no-ignore-vcs \
  'rec\.tensors\s*\[|\.tensors\s*\[|tensors\s*\[.*DEP_GEN_TENSOR_SIZE|DEP_GEN_TENSOR_SIZE' \
  src/a5 | head -n 200
echo
echo "=== Search for serialization/copy into tensors blobs (memcpy / placement new / make_tensor_* ) ==="
rg -n --hidden --no-ignore-vcs \
  'memcpy\([^;]*tensors|std::memcpy\([^;]*tensors|placement new|::new\s*\([^;]*tensors|reinterpret_cast<\s*Tensor\s*\*>\s*\(&rec\.tensors' \
  src/a5 | head -n 200
# 3) Also check if dep_gen capture exists in other modules
echo
echo "=== Search for DepGenRecord usage outside a5 ==="
rg -n --hidden --no-ignore-vcs 'DepGenRecord\b' src | head -n 200

Repository: hw-native-sys/simpler

Length of output: 113

🏁 Script executed:

#!/bin/bash
set -euo pipefail
A5_TENSOR_HDR="src/a5/runtime/tensormap_and_ringbuffer/runtime/tensor.h"
A2A3_TENSOR_HDR="src/a2a3/runtime/tensormap_and_ringbuffer/runtime/tensor.h"
DEP_GEN_HDR="src/a5/platform/include/common/dep_gen.h"
DEP_GEN_CPP_DIR="src/a5"
echo "=== Tensor: find struct/class Tensor + TensorRef in a5 tensor.h ==="
rg -n --no-heading --hidden --no-ignore-vcs 'struct Tensor\b|class Tensor\b|TensorRef|struct TensorRef\b|alignas\(Tensor\)|make_tensor_' "$A5_TENSOR_HDR"
echo
echo "=== Tensor: show relevant sections in a5 tensor.h ==="
# show around TensorRef first, then Tensor
rg -n --no-heading --hidden --no-ignore-vcs 'TensorRef' "$A5_TENSOR_HDR" | head -n 20 | cut -d: -f1 | while read -r ln; do
  start=$((ln-20)); end=$((ln+80))
  echo "--- $A5_TENSOR_HDR:$start-$end ---"
  sed -n "${start},${end}p" "$A5_TENSOR_HDR" | cat
done
echo
rg -n --no-heading --hidden --no-ignore-vcs 'struct Tensor\b|class Tensor\b' "$A5_TENSOR_HDR" | head -n 20 | cut -d: -f1 | while read -r ln; do
  start=$((ln-20)); end=$((ln+120))
  echo "--- $A5_TENSOR_HDR:$start-$end ---"
  sed -n "${start},${end}p" "$A5_TENSOR_HDR" | cat
done
echo
echo "=== dep_gen.h: confirm DepGenRecord::tensors type ==="
rg -n --no-heading --hidden --no-ignore-vcs 'struct DepGenRecord\b|tensors\[' "$DEP_GEN_HDR"
sed -n '80,140p' "$DEP_GEN_HDR" | cat
echo
echo "=== Capture path: find writes into rec.tensors in src/a5 ==="
rg -n --hidden --no-ignore-vcs \
  'rec\.tensors\s*\[|\.tensors\s*\[.*DEP_GEN_TENSOR_SIZE|tensors\s*\[.*\]\s*\=' \
  "$DEP_GEN_CPP_DIR" | head -n 200
echo
echo "=== Capture path: find memcpy/placement-new into tensors blobs ==="
rg -n --hidden --no-ignore-vcs \
  'memcpy\s*\([^;]*tensors|std::memcpy\s*\([^;]*tensors|::new\s*\([^;]*tensors|placement new|start_lifetime' \
  "$DEP_GEN_CPP_DIR" | head -n 200
echo
echo "=== Capture path: find code that casts blob to Tensor (or TensorRef) ==="
rg -n --hidden --no-ignore-vcs \
  'reinterpret_cast\s*<\s*(const\s+)?Tensor\s*\*>\s*\(&.*tensors|reinterpret_cast<\s*(const\s+)?Tensor\s*\*>' \
  "$DEP_GEN_CPP_DIR" | head -n 200

Repository: hw-native-sys/simpler

Length of output: 276

🏁 Script executed:

#!/bin/bash
set -euo pipefail
A5_TENSOR_HDR="src/a5/runtime/tensormap_and_ringbuffer/runtime/tensor.h"
echo "=== Occurrences of TensorRef in a5 tensor.h ==="
rg -n --hidden --no-ignore-vcs 'TensorRef' "$A5_TENSOR_HDR" || true
echo
echo "=== Occurrences of 'struct Tensor' / 'class Tensor' in a5 tensor.h ==="
rg -n --hidden --no-ignore-vcs 'struct Tensor\b|class Tensor\b' "$A5_TENSOR_HDR" || true
echo
echo "=== Print tail/middle sections of a5 tensor.h (to locate definitions) ==="
echo "--- head ---"
sed -n '1,220p' "$A5_TENSOR_HDR" | cat
echo
echo "--- mid (220-420) ---"
sed -n '220,420p' "$A5_TENSOR_HDR" | cat
echo
echo "--- mid2 (420-620) ---"
sed -n '420,620p' "$A5_TENSOR_HDR" | cat
echo
echo "--- tail (620-900) ---"
sed -n '620,900p' "$A5_TENSOR_HDR" | cat

Repository: hw-native-sys/simpler

Length of output: 25484

🏁 Script executed:

#!/bin/bash
set -euo pipefail
echo "=== Locate TensorRef definition in repo ==="
rg -n --hidden --no-ignore-vcs '\bTensorRef\b' src | head -n 200
echo
echo "=== Find capture code that writes into DepGenRecord::tensors blobs ==="
rg -n --hidden --no-ignore-vcs '\.tensors\s*\[' src/a5 | head -n 200
echo
echo "=== Find where DepGenRecord is constructed/filled (a5 capture path) ==="
rg -n --hidden --no-ignore-vcs 'DepGenRecord' src/a5 | head -n 200
echo
echo "=== Search for memcpy of Tensor-sized blobs into tensors[][] ==="
rg -n --hidden --no-ignore-vcs '\bmemcpy\s*\([^;]*tensors' src/a5 | head -n 200

Repository: hw-native-sys/simpler

Length of output: 9724

🏁 Script executed:

#!/bin/bash
set -euo pipefail
# 1) Look at the memcpy site that fills DepGenRecord::tensors
FILE1="src/a5/platform/src/aicpu/dep_gen_collector_aicpu.cpp"
echo "=== dep_gen_collector_aicpu.cpp around memcpy(rec->tensors[i], ...) ==="
cat -n "$FILE1" | sed -n '240,330p'
# 2) Look at static_assert in orchestrator about tensor blob size
FILE2="src/a5/runtime/tensormap_and_ringbuffer/runtime/pto_orchestrator.cpp"
echo
echo "=== pto_orchestrator.cpp static_assert sizeof(Tensor) == DEP_GEN_TENSOR_SIZE ==="
cat -n "$FILE2" | sed -n '1,120p'
# 3) Look at comment in dep_gen_collector_aicpu.h about DEP_GEN_TENSOR_SIZE bytes memcpy'd
FILE3="src/a5/platform/include/aicpu/dep_gen_collector_aicpu.h"
echo
echo "=== dep_gen_collector_aicpu.h comment about memcpy into record.tensors[i] ==="
cat -n "$FILE3" | sed -n '60,140p'
# 4) Look at TensorRef union definition to see how Tensor/metadata are accessed on host
FILE4="src/a5/runtime/tensormap_and_ringbuffer/runtime/pto_types.h"
echo
echo "=== pto_types.h TensorRef union ==="
cat -n "$FILE4" | sed -n '120,220p'

Repository: hw-native-sys/simpler

Length of output: 19882

🏁 Script executed:

#!/bin/bash
set -euo pipefail
# Search for trivially copyable / implicit lifetime assertions
rg -n --hidden --no-ignore-vcs 'is_trivially_copyable<.*Tensor|is_trivially_copyable|implicit.?lifetime|std::launder|launder\(' \
  src/a5/runtime/tensormap_and_ringbuffer/runtime/tensor.h src/a5/runtime/tensormap_and_ringbuffer/host/dep_gen_replay.cpp src/a5 | head -n 200
echo
echo "=== Search for any Tensor object reconstruction in dep_gen replay ==="
rg -n --hidden --no-ignore-vcs 'dep_gen_replay.*(memcpy|bit_cast|std::launder|placement new|start_lifetime)' \
  src/a5/runtime/tensormap_and_ringbuffer/host/dep_gen_replay.cpp || true
echo
echo "=== Inspect DepGenRecord alignment/tensor size constants in dep_gen.h ==="
rg -n --hidden --no-ignore-vcs 'DEP_GEN_TENSOR_SIZE|CORE_MAX_TENSOR_ARGS|alignof\(DepGenRecord\)|__attribute__\(\(aligned|offsetof\(DepGenRecord, tensors\)' \
  src/a5/platform/include/common/dep_gen.h

Repository: hw-native-sys/simpler

Length of output: 47

🏁 Script executed:

#!/bin/bash
set -euo pipefail
# 1) Check whether Tensor is asserted as trivially copyable
echo "=== Search for trivially copyable assertions for Tensor ==="
rg -n --hidden --no-ignore-vcs \
  'is_trivially_copyable.*Tensor|is_trivially_copyable_v.*Tensor|std::is_trivially_copyable.*Tensor|static_assert\([^)]*trivially_copyable[^)]*Tensor' \
  src | head -n 200 || true
echo
echo "=== Search for std::start_lifetime_as / std::launder usage ==="
rg -n --hidden --no-ignore-vcs \
  'start_lifetime_as<|std::start_lifetime_as|std::launder|__builtin_launder' \
  src | head -n 200 || true
echo
echo "=== Find C++ standard flags (-std=c++*) in build files ==="
rg -n --hidden --no-ignore-vcs \
  '-std=c\+\+[0-9]+\s*|/std:c\+\+[0-9]+' \
  . | head -n 200 || true
echo
echo "=== Show alignas/lifetime relevant traits for Tensor (quick check) ==="
# show type declaration line containing alignas(64)
rg -n --hidden --no-ignore-vcs 'struct alignas\(64\) Tensor|class alignas\(64\) Tensor|struct alignas\(64\) Tensor' \
  src/a5/runtime/tensormap_and_ringbuffer/runtime/tensor.h

Repository: hw-native-sys/simpler

Length of output: 1964

🏁 Script executed:

#!/bin/bash
set -euo pipefail
# 1) Confirm DepGenRecord::tensors is raw byte storage
rg -n 'struct DepGenRecord|uint8_t\s+tensors\[' src/a5/platform/include/common/dep_gen.h
# 2) Confirm capture uses memcpy into rec->tensors[i]
rg -n 'memcpy\s*\(rec->tensors\[' src/a5/platform/src/aicpu/dep_gen_collector_aicpu.cpp
# 3) Confirm Tensor alignment/size
rg -n 'struct\s+alignas\(64\)\s+Tensor|static_assert\(sizeof\(Tensor\)' src/a5/runtime/tensormap_and_ringbuffer/runtime/tensor.h
# 4) Check whether Tensor is treated as implicit-lifetime / trivially copyable anywhere
rg -n --hidden --no-ignore-vcs 'is_trivially_copyable<.*Tensor|is_trivially_copyable_v<.*Tensor|implicit.?lifetime|start_lifetime_as|std::launder|__builtin_launder' src/a5/runtime/tensormap_and_ringbuffer/runtime/tensor.h src/a5/runtime/tensormap_and_ringbuffer/host/dep_gen_replay.cpp src/a5 | head -n 200
# 5) Find C++ standard flags used by this repo (implicit-lifetime rules depend on it)
rg -n --hidden --no-ignore-vcs --glob '*.{cmake,sh,txt,mk}' '-std=c\+\+[0-9]+' . | head -n 200

Repository: hw-native-sys/simpler

Length of output: 1964

🏁 Script executed:

#!/bin/bash
set -euo pipefail
# C++ standard mode affects whether memcpy-written storage is valid for implicit-lifetime types.
echo "== C++ standard flags =="
rg -n --hidden --no-ignore-vcs --glob '*.{cmake,sh,txt,mk,cc,cpp,h,hpp}' '-std=c\+\+[0-9]+' . || true
echo
echo "== Tensor trivial/implicit-lifetime signals (look for traits/asserts) =="
rg -n --hidden --no-ignore-vcs 'is_trivially_copyable.*Tensor|is_trivially_copyable_v.*Tensor|implicit.?lifetime|start_lifetime_as|std::launder|__builtin_launder' \
  src/a5/runtime/tensormap_and_ringbuffer/runtime/tensor.h src/a5/runtime/tensormap_and_ringbuffer/host/dep_gen_replay.cpp src/a5/runtime/tensormap_and_ringbuffer/runtime/pto_types.h || true
echo
echo "== Any existing pattern for materializing tensors from byte blobs =="
rg -n --hidden --no-ignore-vcs 'start_lifetime_as<.*Tensor>|placement new.*Tensor|std::launder.*Tensor|bit_cast.*Tensor' src/a5 || true

Repository: hw-native-sys/simpler

Length of output: 1964

🏁 Script executed:

#!/bin/bash
set -euo pipefail
# 1) Locate this repo's C++ standard flags (-std=c++*)
rg -n --hidden --no-ignore-vcs --glob '*.{cmake,sh,txt,mk,cc,cpp,h,hpp}' '-std=c\+\+[0-9]+' . || true
# 2) Confirm DepGenRecord tensor blob type is bytes
rg -n --hidden --no-ignore-vcs 'struct DepGenRecord|uint8_t\s+tensors\[' src/a5/platform/include/common/dep_gen.h
# 3) Confirm capture uses memcpy into rec->tensors[i]
rg -n --hidden --no-ignore-vcs 'memcpy\s*\(rec->tensors\[.*\],\s*tensor_ptrs\[' \
  src/a5/platform/src/aicpu/dep_gen_collector_aicpu.cpp
# 4) Check for any traits/asserts about Tensor triviality/implicit-lifetime support
rg -n --hidden --no-ignore-vcs 'is_trivially_copyable.*Tensor|is_trivial.*Tensor|implicit.?lifetime|start_lifetime_as<|std::launder|__builtin_launder' \
  src/a5/runtime/tensormap_and_ringbuffer/runtime/tensor.h \
  src/a5/runtime/tensormap_and_ringbuffer/host/dep_gen_replay.cpp \
  src/a5/runtime/tensormap_and_ringbuffer/runtime/pto_types.h || true

Repository: hw-native-sys/simpler

Length of output: 1964

🏁 Script executed:

#!/bin/bash
set -euo pipefail
# 1) C++ standard mode (object-lifetime rules depend on it)
rg -n --hidden --no-ignore-vcs --glob '*.{cmake,sh,txt,mk,cc,cpp,h,hpp}' '-std=c\+\+[0-9]+' . || true
# 2) Confirm tensors[] is raw bytes and capture uses memcpy into it
rg -n --hidden --no-ignore-vcs 'struct DepGenRecord|uint8_t\s+tensors\[' src/a5/platform/include/common/dep_gen.h
rg -n --hidden --no-ignore-vcs 'memcpy\s*\(rec->tensors\[|memcpy\s*\(rec\.tensors\[' src/a5/platform/src/aicpu/dep_gen_collector_aicpu.cpp
# 3) Look for any Tensor implicit-lifetime / triviality guarantees or existing lifetime-materialization patterns
rg -n --hidden --no-ignore-vcs 'is_trivially_copyable.*Tensor|std::is_trivially_copyable.*Tensor|implicit.?lifetime|start_lifetime_as|std::launder|__builtin_launder' \
  src/a5/runtime/tensormap_and_ringbuffer/runtime/tensor.h \
  src/a5/runtime/tensormap_and_ringbuffer/host/dep_gen_replay.cpp \
  src/a5/runtime/tensormap_and_ringbuffer/runtime/pto_types.h || true

Repository: hw-native-sys/simpler

Length of output: 1964

🏁 Script executed:

#!/bin/bash
set -euo pipefail
# 1) Locate C++ standard flags (-std=c++*)
rg -n --hidden --no-ignore-vcs --glob '*.{cmake,sh,txt,mk,cc,cpp,h,hpp}' '-std=c\+\+[0-9]+' . || true
# 2) Reconfirm capture writes tensors as bytes
rg -n --hidden --no-ignore-vcs 'memcpy\s*\(rec->tensors\[|memset\s*\(rec->tensors\[|DEP_GEN_TENSOR_SIZE' \
  src/a5/platform/src/aicpu/dep_gen_collector_aicpu.cpp src/a5/platform/include/common/dep_gen.h
# 3) Look for any established implicit-lifetime / object-materialization helpers
rg -n --hidden --no-ignore-vcs 'start_lifetime_as<|std::launder|__builtin_launder|placement new|memcpy\s*\(.*Tensor' \
  src/a5/runtime/tensormap_and_ringbuffer/host/dep_gen_replay.cpp src/a5/runtime/tensormap_and_ringbuffer/runtime/tensor.h src/a5/runtime || true

Repository: hw-native-sys/simpler

Length of output: 1964

🏁 Script executed:

#!/bin/bash
set -euo pipefail
# Blob type/layout
rg -n 'uint8_t\s+tensors\[' src/a5/platform/include/common/dep_gen.h
rg -n '__attribute__\(\(aligned\(64\)\)\)|offsetof\(DepGenRecord,\s*tensors\)' src/a5/platform/include/common/dep_gen.h
# Capture writes bytes into blob
rg -n 'memcpy\s*\(\s*rec->tensors\[' src/a5/platform/src/aicpu/dep_gen_collector_aicpu.cpp
# Tensor alignment/size and (if present) trivial/implicit-lifetime indicators
rg -n 'struct\s+alignas\(64\)\s+Tensor|static_assert\(sizeof\(Tensor\)' src/a5/runtime/tensormap_and_ringbuffer/runtime/tensor.h
rg -n 'is_trivially_copyable|is_trivial|implicit.?lifetime|start_lifetime_as|std::launder|__builtin_launder' src/a5/runtime/tensormap_and_ringbuffer/runtime/tensor.h src/a5/runtime/tensormap_and_ringbuffer/host/dep_gen_replay.cpp || true
# C++ standard mode (affects implicit-lifetime rules)
rg -n --hidden --no-ignore-vcs --glob '*.{cmake,sh,txt,mk,cc,cpp,h,hpp}' '-std=c\+\+[0-9]+' . || true

Repository: hw-native-sys/simpler

Length of output: 1964

Avoid object-lifetime UB when replaying Tensor from DepGenRecord::tensors

dep_gen_replay.cpp reinterprets the DepGenRecord::tensors byte blobs (uint8_t tensors[...][DEP_GEN_TENSOR_SIZE]) as const Tensor* (reinterpret_cast<const Tensor *>(&rec.tensors[i][0])) and then dereferences them to read Tensor fields. The capture path fills these blobs via memcpy(rec->tensors[i], ..., DEP_GEN_TENSOR_SIZE) (or zeros for null slots), and the intended layout/alignment is enforced (DepGenRecord aligns tensors[] to 64B; Tensor is alignas(64) with static_assert(sizeof(Tensor) == DEP_GEN_TENSOR_SIZE)). The remaining risk is C++ object-lifetime/implicit-lifetime legality for turning memcpy-written bytes into an active Tensor object; if the build mode doesn’t make this pattern language-safe, replay should materialize into a real aligned Tensor object before dereference.
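The materialization the comment recommends can be sketched as follows. The `Tensor` struct and `kTensorBlobSize` are illustrative stand-ins for the runtime's `Tensor` and `DEP_GEN_TENSOR_SIZE`, and the sketch assumes C++17, so that `std::vector` honours the 64-byte alignment of an over-aligned element type. Copying with `std::memcpy` into a `Tensor` that already exists is well-defined for a trivially copyable type, so it does not depend on objects being implicitly created inside the byte array.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Stand-in only: mimics "alignas(64), fixed-size, trivially copyable".
// The real Tensor's fields are not reproduced here.
struct alignas(64) Tensor {
    uint64_t addr;
    uint32_t ndim;
    uint32_t shape[4];
};
constexpr size_t kTensorBlobSize = sizeof(Tensor);
static_assert(std::is_trivially_copyable<Tensor>::value,
              "memcpy round-trip requires a trivially copyable Tensor");

// Copies each raw blob into a real Tensor object instead of reinterpret_cast'ing
// the bytes, so every pointer later handed out refers to a live Tensor.
std::vector<Tensor> materialize(const uint8_t (*blobs)[kTensorBlobSize], int count) {
    std::vector<Tensor> out(static_cast<size_t>(count));
    for (int i = 0; i < count; i++) {
        std::memcpy(&out[static_cast<size_t>(i)], blobs[i], sizeof(Tensor));
    }
    return out;
}
```

Replay would then point `tref_buf[i].ptr` at `&out[i]`, keeping `out` alive for as long as those pointers are used.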

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/a5/runtime/tensormap_and_ringbuffer/host/dep_gen_replay.cpp` around lines 540 - 543, The loop currently reinterprets raw bytes in DepGenRecord::tensors as const Tensor* which risks object-lifetime UB; instead allocate a temporary array/vector of real Tensor objects (e.g. std::vector<Tensor> replay_tensors(tc) or an aligned buffer of Tensor) and for each i do a memcpy(&replay_tensors[i], &rec.tensors[i][0], sizeof(Tensor)) to materialize a real aligned Tensor object, then set tref_buf[i].ptr = &replay_tensors[i] and atype_buf[i] as before (ensure replay_tensors lives long enough for the replay usage).

coderabbitai · 2026-05-28T10:21:10Z

+            for (size_t j = rec_i + 1; j < num_records; j++) {
+                const DepGenRecord &maybe = records[j];
+                if (!(maybe.flags & DEP_GEN_FLAG_OVERFLOW)) {
+                    LOG_ERROR(
+                        "dep_gen replay: unterminated overflow chain at rec_idx=%zu (task_id=%" PRIu64 ")", rec_i,
+                        rec.task_id
+                    );
+                    break;
+                }
+                if (maybe.task_id != rec.task_id) {
+                    LOG_ERROR(
+                        "dep_gen replay: orphan overflow at rec_idx=%zu (expected task_id=%" PRIu64 ", found %" PRIu64
+                        ")",
+                        j, rec.task_id, maybe.task_id
+                    );
+                    break;
+                }
+                const auto *over = reinterpret_cast<const DepGenOverflowRecord *>(&maybe);
+                uint16_t over_dc = over->dep_count;
+                if (over_dc > DEP_GEN_OVERFLOW_DEPS_PER_RECORD) {
+                    LOG_ERROR(
+                        "dep_gen replay: clamping overflow dep_count %u > %d at rec_idx=%zu (task_id=%" PRIu64 ")",
+                        over_dc, DEP_GEN_OVERFLOW_DEPS_PER_RECORD, j, rec.task_id
+                    );
+                    over_dc = DEP_GEN_OVERFLOW_DEPS_PER_RECORD;
+                }
+                full_deps_buf.insert(full_deps_buf.end(), over->deps, over->deps + over_dc);
+                if (over->flags & DEP_GEN_FLAG_LAST_OVERFLOW) {
+                    chain_complete = true;
+                    break;
+                }
+            }
+            if (!chain_complete) {
+                LOG_ERROR(
+                    "dep_gen replay: chain for task_id=%" PRIu64 " missing LAST_OVERFLOW marker — "
+                    "using partial dep list (%zu deps)",
+                    rec.task_id, full_deps_buf.size()
+                );
+            }
+            deps_data = full_deps_buf.data();


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fail replay on malformed overflow chains instead of writing partial deps.

The orphan/unterminated-chain paths only log and keep going with a truncated explicit-dependency list. That still allows a "successful" deps.json with missing arrows, which is worse than failing the replay the way the oracle-divergence path already does.

Suggested fix

             bool chain_complete = false;
+            bool malformed_chain = false;
             for (size_t j = rec_i + 1; j < num_records; j++) {
                 const DepGenRecord &maybe = records[j];
                 if (!(maybe.flags & DEP_GEN_FLAG_OVERFLOW)) {
                     LOG_ERROR(
                         "dep_gen replay: unterminated overflow chain at rec_idx=%zu (task_id=%" PRIu64 ")", rec_i,
                         rec.task_id
                     );
+                    malformed_chain = true;
                     break;
                 }
                 if (maybe.task_id != rec.task_id) {
                     LOG_ERROR(
                         "dep_gen replay: orphan overflow at rec_idx=%zu (expected task_id=%" PRIu64 ", found %" PRIu64
                         ")",
                         j, rec.task_id, maybe.task_id
                     );
+                    malformed_chain = true;
                     break;
                 }
                 const auto *over = reinterpret_cast<const DepGenOverflowRecord *>(&maybe);
                 uint16_t over_dc = over->dep_count;
                 if (over_dc > DEP_GEN_OVERFLOW_DEPS_PER_RECORD) {
@@
                 if (over->flags & DEP_GEN_FLAG_LAST_OVERFLOW) {
                     chain_complete = true;
                     break;
                 }
             }
-            if (!chain_complete) {
+            if (malformed_chain || !chain_complete) {
                 LOG_ERROR(
-                    "dep_gen replay: chain for task_id=%" PRIu64 " missing LAST_OVERFLOW marker — "
-                    "using partial dep list (%zu deps)",
-                    rec.task_id, full_deps_buf.size()
+                    "dep_gen replay: invalid overflow chain for task_id=%" PRIu64,
+                    rec.task_id
                 );
+                tm_oracle.destroy();
+                tm_annot.destroy();
+                return -7;
             }
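A self-contained sketch of the same fail-closed walk, reduced to the checks that matter. `Rec`, the two flag constants, and `collect_overflow_deps` are hypothetical simplifications of `DepGenRecord`/`DepGenOverflowRecord`, with assumed flag values. The sketch accumulates into a scratch vector, so a malformed chain leaves the caller's output untouched; that is one way to guarantee a partial dep list is never used.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Flag values and the record layout are assumptions made for this sketch only.
constexpr uint32_t kFlagOverflow = 1u << 0;
constexpr uint32_t kFlagLastOverflow = 1u << 1;

struct Rec {
    uint64_t task_id;
    uint32_t flags;
    std::vector<uint64_t> deps;
};

// Appends the deps of every overflow record following records[head] to `out`.
// Returns false, leaving `out` unchanged, if the chain is interrupted by a
// non-overflow record, contains an overflow for another task (orphan), or
// runs off the end without a LAST_OVERFLOW marker.
bool collect_overflow_deps(const std::vector<Rec> &records, size_t head, std::vector<uint64_t> &out) {
    std::vector<uint64_t> acc;
    for (size_t j = head + 1; j < records.size(); j++) {
        const Rec &r = records[j];
        if (!(r.flags & kFlagOverflow)) return false;          // unterminated chain
        if (r.task_id != records[head].task_id) return false;  // orphan overflow
        acc.insert(acc.end(), r.deps.begin(), r.deps.end());
        if (r.flags & kFlagLastOverflow) {
            out.insert(out.end(), acc.begin(), acc.end());
            return true;  // chain complete
        }
    }
    return false;  // no terminator before the last record
}
```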

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/a5/runtime/tensormap_and_ringbuffer/host/dep_gen_replay.cpp` around lines 568 - 607, The code currently logs orphan/unterminated overflow chains inside the loop (checking DEP_GEN_FLAG_OVERFLOW and matching task_id) and then continues, allowing a truncated deps list to be used; instead, when encountering these malformed chains (or when chain_complete is false after scanning overflow records) abort the replay immediately rather than proceeding: in the block handling orphan overflow and the block handling unterminated chains, replace the LOG_ERROR-only behavior with a hard failure (e.g., return an error status or throw an exception) from the enclosing function so that DepGenRecord/DepGenOverflowRecord chains that are malformed do not lead to using full_deps_buf/deps_data; ensure the failure path prevents setting deps_data and propagates a clear error for the caller to detect.

coderabbitai · 2026-05-28T10:21:10Z

+    if (is_dep_gen_enabled()) {
+        const void *tensor_ptrs[MAX_TENSOR_ARGS];
+        // TensorArgType is `enum class : int32_t` (4 bytes); the on-disk record
+        // packs arg_types as uint8_t[16] (5-value enum fits in a byte). Narrow
+        // each tag here rather than letting the AICPU writer reinterpret a
+        // 4×-wider array as bytes — that path silently lost two of every three
+        // tags on little-endian and synthesized phantom self-edges in replay.
+        uint8_t arg_types_u8[MAX_TENSOR_ARGS];
+        // Clamp to MAX_TENSOR_ARGS even though the Arg builder caps adds at
+        // MAX_TENSOR_ARGS: defensive against any future builder bypass /
+        // shared-memory bit-flip that could otherwise overrun the two
+        // MAX_TENSOR_ARGS-sized stack buffers above.
+        const int tc_raw = args.tensor_count();
+        const int tc = tc_raw > MAX_TENSOR_ARGS ? MAX_TENSOR_ARGS : tc_raw;
+        for (int i = 0; i < tc; i++) {
+            // OUTPUT slots carry create_info (not yet a Tensor); skip them —
+            // they have no producer to look up and replay's per-tensor loop
+            // also skips OUTPUT.
+            tensor_ptrs[i] = (args.tag(i) == TensorArgType::OUTPUT) ? nullptr : args.tensor(i).ptr;
+            arg_types_u8[i] = static_cast<uint8_t>(args.tag(i));
+        }
+        dep_gen_aicpu_record_submit(
+            task_id.raw, orch->in_manual_scope(), tc, tensor_ptrs, arg_types_u8,
+            static_cast<int>(args.explicit_dep_count()), reinterpret_cast<const uint64_t *>(args.explicit_deps_data())
+        );
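The narrowing issue described in the capture-site comment can be reproduced in isolation. `ArgType` is a stand-in with assumed enumerator values, not the real `TensorArgType`. The sketch contrasts per-element narrowing with copying the first bytes of the 4-byte-per-element array.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Stand-in for TensorArgType; the real enumerators and values may differ.
enum class ArgType : int32_t { INPUT = 0, OUTPUT = 1, INOUT = 2 };

// Correct packing: narrow each 4-byte tag to one byte, element by element.
void pack_tags(const ArgType *tags, int n, uint8_t *out) {
    for (int i = 0; i < n; i++) {
        out[i] = static_cast<uint8_t>(tags[i]);
    }
}

// The pattern the comment warns against: treating the int32 array as bytes.
// Byte i of the source is not tag i; each tag spans four bytes, so for small
// tag values most of the copied bytes are zero rather than real tags.
void pack_tags_by_bytes(const ArgType *tags, int n, uint8_t *out) {
    std::memcpy(out, tags, static_cast<size_t>(n));
}
```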


⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Record the runtime-filtered explicit deps, not the raw input list.

This capture writes args.explicit_deps_data() verbatim, but the runtime immediately below drops invalid/already-dead deps before they ever participate in fanin. Replay can therefore emit explicit edges that the runtime never enforced, which makes deps.json drift from the real dependency graph.

Also applies to: 534-551
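A sketch of the suggested direction, assuming the runtime can expose a liveness check for each explicit dep: filter once, then hand the same filtered list to both fanin wiring and capture. `filter_explicit_deps` and `is_live` are hypothetical names; the runtime's actual filtering code path is not shown in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// `is_live` stands in for whatever check the runtime uses to drop invalid or
// already-retired explicit deps before they participate in fanin.
std::vector<uint64_t> filter_explicit_deps(const uint64_t *deps, int count,
                                           const std::function<bool(uint64_t)> &is_live) {
    std::vector<uint64_t> kept;
    kept.reserve(static_cast<size_t>(count));
    for (int i = 0; i < count; i++) {
        if (is_live(deps[i])) kept.push_back(deps[i]);
    }
    return kept;
}
```

With such a helper, the capture call would pass `static_cast<int>(kept.size())` and `kept.data()` as the last two arguments of `dep_gen_aicpu_record_submit(...)` instead of `args.explicit_dep_count()` and `args.explicit_deps_data()`, so replay sees exactly the edges the runtime enforced.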

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/a5/runtime/tensormap_and_ringbuffer/runtime/pto_orchestrator.cpp` around lines 488 - 512, The code currently records the raw args.explicit_deps_data()/args.explicit_dep_count() into dep_gen_aicpu_record_submit, but you must record the runtime-filtered explicit-deps list (the array and count produced after invalid/already-dead deps are dropped) so replay matches actual enforced edges; update the dep_gen_aicpu_record_submit calls (the one shown and the similar call around the 534-551 region) to pass the filtered deps buffer and its filtered count instead of args.explicit_deps_data() and args.explicit_dep_count(), using the same filtered-deps variable(s) produced by the runtime’s dep-filtering code path.

coderabbitai · 2026-05-28T10:21:10Z

+        if not deps_path.exists():
+            # Output dir exists but no deps.json — another diagnostic flag was
+            # on (e.g. just --enable-l2-swimlane) but not --enable-dep-gen.
+            return


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fail fast when dep_gen is enabled but deps.json is missing.

Line 139 currently returns silently, which can mask the exact regression this test is supposed to catch when dep_gen is effectively enabled.

Proposed fix

         deps_path = out_dir / "deps.json"
-        if not deps_path.exists():
-            # Output dir exists but no deps.json — another diagnostic flag was
-            # on (e.g. just --enable-l2-swimlane) but not --enable-dep-gen.
-            return
+        assert deps_path.exists(), (
+            f"dep_gen was enabled but {deps_path} is missing. "
+            "Likely cause: dep_gen capture/replay did not emit the artifact."
+        )
         with deps_path.open() as f:
             deps = json.load(f)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/st/a5/tensormap_and_ringbuffer/dfx/dep_gen/test_dep_gen.py` around lines 139 - 142, The code currently returns silently when deps_path (the expected deps.json) is missing, which hides failures when dep_gen is enabled; replace the silent return with a fail-fast check (e.g. assert deps_path.exists(), f"dep_gen enabled but deps.json missing at {deps_path}" or pytest.fail(...) ) so the test fails with a clear message; ensure pytest is imported if you use pytest.fail and reference the symbols deps_path, deps.json and dep_gen in the failure message to aid debugging.

gemini-code-assist Bot reviewed May 28, 2026


coderabbitai Bot reviewed May 28, 2026


indigo1973 changed the title ~~Add: dep_gen capture+replay support on a5~~ [WIP] Add: dep_gen capture+replay support on a5 May 28, 2026

		// Free remaining host shadows (per-state buffers + shm region).
		manager_.clear_mappings();

-	uint64_t total_collected_ = 0;
+	std::atomic<uint64_t> total_collected_ = 0;
	std::vector<void*> malloced_host_ptrs_;
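The change of `total_collected_` from `uint64_t` to `std::atomic<uint64_t>` reads as a data-race fix. Assuming the counter is updated by the collector's management thread while another thread reads it, the pattern looks roughly like the sketch below. `CollectorSketch` and `run_concurrent_increments` are hypothetical, and relaxed ordering is an assumption that holds only if no other data is published through the counter.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical collector skeleton; only the counter pattern is illustrated.
class CollectorSketch {
public:
    // Called from worker / management threads as records are drained.
    void on_records_collected(uint64_t n) {
        total_collected_.fetch_add(n, std::memory_order_relaxed);
    }
    // Safe to call concurrently with on_records_collected().
    uint64_t total_collected() const {
        return total_collected_.load(std::memory_order_relaxed);
    }

private:
    std::atomic<uint64_t> total_collected_{0};
};

// Bump the counter from several threads at once. With a plain uint64_t
// this would be a data race (undefined behavior); join() orders all
// increments before the final load.
uint64_t run_concurrent_increments(int threads, int iters) {
    CollectorSketch c;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; t++) {
        pool.emplace_back([&c, iters] {
            for (int i = 0; i < iters; i++) c.on_records_collected(1);
        });
    }
    for (auto &th : pool) th.join();
    return c.total_collected();
}
```

Brace initialization (`{0}`) is used here because `std::atomic<uint64_t> x = 0;` is copy-initialization, which only compiles from C++17 on (guaranteed copy elision); under C++17 the `= 0` form in the diff is equivalent.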


indigo1973 commented May 28, 2026


coderabbitai Bot commented May 28, 2026



❌ Failed checks (1 warning)


gemini-code-assist Bot left a comment


Code Review


gemini-code-assist Bot May 28, 2026


coderabbitai Bot left a comment


coderabbitai Bot May 28, 2026


