Refactor: RAII rollback for profiling collector init paths (+125 lines)#948
Conversation
|
Warning Review limit reached
More reviews will be available in 36 minutes and 10 seconds. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (5)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request introduces an RAII rollback mechanism (InitRollbackGuard) to handle initialization failures across various collectors (PmuCollector, ScopeStatsCollector, and TensorDumpCollector), alongside a new release_all_owned method in BufferPoolManager for abort-path cleanup. The review feedback highlights two key improvement opportunities: first, using malloc_shadows_.erase instead of count to atomically check and remove host shadows to prevent potential double-free issues; second, calling manager_.release_all_owned unconditionally in the guard's destructor to ensure host shadows are cleaned up even if the device release callback is empty.
Plug the latent leak CodeRabbit flagged on hw-native-sys#944: a5 PmuCollector::init, ScopeStatsCollector::init, TensorDumpCollector::init register paired device+host buffers in BufferPoolManager via alloc_paired_buffer BEFORE flipping the initialized_ flag. If a subsequent allocation fails, init() returns -1; finalize() then early-exits (gated on initialized_ / shm_host_) and every registered device buffer + framework-malloc'd host shadow leaks. The pattern existed pre-hw-native-sys#944 (old alloc_single_buffer + manual register_mapping had the same shape), so this is not a regression of the unification work — just the cleanup it enables. Framework changes ----------------- - BufferPoolManager::release_all_owned(release_fn) [new]: abort-path cleanup that releases EVERY framework-tracked dev_ptr (via release_fn) and every framework-malloc'd host shadow (via std::free), then clears all internal containers. Distinct from release_owned_buffers() because this also catches buffers parked in callers' SPSC free_queues (tracked via register_mapping but not framework-owned via a queue). Drains recycled/done/ready first (just clears — release goes via dev_to_host_ to avoid double-free) then walks the full mapping table. - profiling_common::InitRollbackGuard<Manager> [new, profiler_base.h]: RAII scope guard for collector init() rollback. Holds a manager reference + release_fn + a vector of "extra direct dev_ptrs" the collector owns outside the framework (e.g. PMU per-core PmuAicoreRings on a5 — plain alloc_cb allocations with no host shadow). On destruction without commit(), calls manager.release_all_owned + free_cb on each direct ptr. Move-only. Collector wiring ---------------- - common/scope_stats_collector.cpp init(): construct guard after set_memory_context, commit() right before return 0. Catches the shm region + ScopeStatsBuffer entries (free_queue and recycled pool). - common/tensor_dump_collector.cpp init(): same pattern. Catches the shm region + per-thread arenas + DumpMetaBuffers (free_queue and recycled pool). - a5/pmu_collector.cpp init(): same pattern + guard.add_direct_ptr(ring) for each per-core PmuAicoreRing (those don't go through alloc_paired_buffer so the framework doesn't track them — register them with the guard explicitly). Test plan --------- - a2a3sim ST L1+L2: pass (rollback path inert on success — guard.commit short-circuits the destructor). - a5sim ST L1+L2: pass (same). - Build all four libhost_runtime.so (a2a3 onboard/sim, a5 onboard/sim): clean. Net: +122 lines (5 files touched).
fbca3d8 to
6583f39
Compare
…n, consolidate common/, rename platform/src→shared) Mechanical post-hw-native-sys#944/hw-native-sys#945/hw-native-sys#948 layout cleanup, motivated by an audit of duplicated and oddly-placed files that the recent unification work left behind. Four orthogonal changes bundled here because each touches the same set of CMakeLists; splitting would mean three more rebuild rounds and three more cmake-include-path edits across the same files. 1. Extract identical aicpu sources to common/platform/{onboard,sim}/aicpu/ ------------------------------------------------------------------------ Seven files per backend (14 total) were byte-identical (or differed only in a one-line @brief arch qualifier) between a2a3 and a5: cache_ops.cpp, device_log.cpp, device_time.cpp, device_malloc.cpp, orch_so_file.cpp, platform_aicpu_affinity.cpp, spin_hint.h Moved a2a3's copy to common/, deleted a5's duplicate, and extended each arch's onboard/aicpu and sim/aicpu CMakeLists COMMON_SOURCES glob to pick them up from common/platform/{onboard,sim}/aicpu/. The device_malloc.cpp arch tag in its @brief was the only real content diff; generalized to "Real Hardware" / "Simulation" without the arch qualifier. Backfilled a copyright header that was missing on device_time.cpp (caught by the check-headers hook). The remaining files in per-arch aicpu/ (kernel.cpp, inner_platform_regs.cpp) have real arch-specific divergence (register addresses, kernel protocols) and stay where they are. 2. Flatten profiling_common/ subdir ------------------------------------ src/common/platform/include/host/profiling_common/{buffer_pool_manager, profiler_base}.h → src/common/platform/include/host/{buffer_pool_manager, profiler_base}.h. Updated 10 #include sites and the 2 header guards. The profiling_common:: C++ namespace stays — file path and namespace don't have to match. 3. Consolidate small src/common subdirs ---------------------------------------- - src/common/device_comm/device_arena.h → src/common/utils/device_arena.h. The file is a generic bump-arena utility, not a comm primitive; the enclosing dir name was misleading. Updated 10 #include sites "device_arena.h" → "utils/device_arena.h" and dropped the common/device_comm entry from 8 CMakeLists (replaced with common since utils/ resolves there). - src/common/sim_context/ → src/common/platform/sim/sim_context/. The dir is sim-only infrastructure (CPU sim context for CANN intrinsic emulation), so it belongs next to the other common/platform/sim/ shared sim infrastructure. Updated: * the dir's own CMakeLists relative path to log/include; * simpler_setup/runtime_compiler.py::compile_sim_context source path; * 4 sim-host CMakeLists references; * a small handful of docs that named the old path. 4. Rename platform/src → platform/shared ------------------------------------------ Per-arch src/{arch}/platform/src/ was confusingly nested inside the top-level src/ directory and read as "src/src" in many paths. Renamed to shared/ across all 3 trees (a2a3, a5, common), matching its actual semantic ("shared between onboard and sim within one arch"). Updated 21 files that referenced the old path: CMakeLists, host headers, docs, one test file, and the src/{arch}/docs/platform.md map. Test plan --------- - Build all four libhost_runtime.so (a2a3 onboard/sim, a5 onboard/sim) + libcpu_sim_context.so + aicpu and aicore artifacts: clean. - CI will run the full ST + UT suite. Net: ~30 file renames, ~14 files extracted to common, +0 / -0 behavioral changes (pure layout).
…n, consolidate common/, rename platform/src→shared) Mechanical post-hw-native-sys#944/hw-native-sys#945/hw-native-sys#948 layout cleanup, motivated by an audit of duplicated and oddly-placed files that the recent unification work left behind. Four orthogonal changes bundled here because each touches the same set of CMakeLists; splitting would mean three more rebuild rounds and three more cmake-include-path edits across the same files. 1. Extract identical aicpu sources to common/platform/{onboard,sim}/aicpu/ ------------------------------------------------------------------------ Seven files per backend (14 total) were byte-identical (or differed only in a one-line @brief arch qualifier) between a2a3 and a5: cache_ops.cpp, device_log.cpp, device_time.cpp, device_malloc.cpp, orch_so_file.cpp, platform_aicpu_affinity.cpp, spin_hint.h Moved a2a3's copy to common/, deleted a5's duplicate, and extended each arch's onboard/aicpu and sim/aicpu CMakeLists COMMON_SOURCES glob to pick them up from common/platform/{onboard,sim}/aicpu/. The device_malloc.cpp arch tag in its @brief was the only real content diff; generalized to "Real Hardware" / "Simulation" without the arch qualifier. Backfilled a copyright header that was missing on device_time.cpp (caught by the check-headers hook). The remaining files in per-arch aicpu/ (kernel.cpp, inner_platform_regs.cpp) have real arch-specific divergence (register addresses, kernel protocols) and stay where they are. 2. Flatten profiling_common/ subdir ------------------------------------ src/common/platform/include/host/profiling_common/{buffer_pool_manager, profiler_base}.h → src/common/platform/include/host/{buffer_pool_manager, profiler_base}.h. Updated 10 #include sites and the 2 header guards. The profiling_common:: C++ namespace stays — file path and namespace don't have to match. 3. Consolidate small src/common subdirs ---------------------------------------- - src/common/device_comm/device_arena.h → src/common/utils/device_arena.h. The file is a generic bump-arena utility, not a comm primitive; the enclosing dir name was misleading. Updated 10 #include sites "device_arena.h" → "utils/device_arena.h" and dropped the common/device_comm entry from 8 CMakeLists (replaced with common since utils/ resolves there). - src/common/sim_context/ → src/common/platform/sim/sim_context/. The dir is sim-only infrastructure (CPU sim context for CANN intrinsic emulation), so it belongs next to the other common/platform/sim/ shared sim infrastructure. Updated: * the dir's own CMakeLists relative path to log/include; * simpler_setup/runtime_compiler.py::compile_sim_context source path; * 4 sim-host CMakeLists references; * a small handful of docs that named the old path. 4. Rename platform/src → platform/shared ------------------------------------------ Per-arch src/{arch}/platform/src/ was confusingly nested inside the top-level src/ directory and read as "src/src" in many paths. Renamed to shared/ across all 3 trees (a2a3, a5, common), matching its actual semantic ("shared between onboard and sim within one arch"). Updated 21 files that referenced the old path: CMakeLists, host headers, docs, one test file, and the src/{arch}/docs/platform.md map. Test plan --------- - Build all four libhost_runtime.so (a2a3 onboard/sim, a5 onboard/sim) + libcpu_sim_context.so + aicpu and aicore artifacts: clean. - CI will run the full ST + UT suite. Net: ~30 file renames, ~14 files extracted to common, +0 / -0 behavioral changes (pure layout).
…n, consolidate common/, rename platform/src→shared) Mechanical post-hw-native-sys#944/hw-native-sys#945/hw-native-sys#948 layout cleanup, motivated by an audit of duplicated and oddly-placed files that the recent unification work left behind. Four orthogonal changes bundled here because each touches the same set of CMakeLists; splitting would mean three more rebuild rounds and three more cmake-include-path edits across the same files. 1. Extract identical aicpu sources to common/platform/{onboard,sim}/aicpu/ ------------------------------------------------------------------------ Seven files per backend (14 total) were byte-identical (or differed only in a one-line @brief arch qualifier) between a2a3 and a5: cache_ops.cpp, device_log.cpp, device_time.cpp, device_malloc.cpp, orch_so_file.cpp, platform_aicpu_affinity.cpp, spin_hint.h Moved a2a3's copy to common/, deleted a5's duplicate, and extended each arch's onboard/aicpu and sim/aicpu CMakeLists COMMON_SOURCES glob to pick them up from common/platform/{onboard,sim}/aicpu/. The device_malloc.cpp arch tag in its @brief was the only real content diff; generalized to "Real Hardware" / "Simulation" without the arch qualifier. Backfilled a copyright header that was missing on device_time.cpp (caught by the check-headers hook). The remaining files in per-arch aicpu/ (kernel.cpp, inner_platform_regs.cpp) have real arch-specific divergence (register addresses, kernel protocols) and stay where they are. 2. Flatten profiling_common/ subdir ------------------------------------ src/common/platform/include/host/profiling_common/{buffer_pool_manager, profiler_base}.h → src/common/platform/include/host/{buffer_pool_manager, profiler_base}.h. Updated 10 #include sites and the 2 header guards. The profiling_common:: C++ namespace stays — file path and namespace don't have to match. 3. Consolidate small src/common subdirs ---------------------------------------- - src/common/device_comm/device_arena.h → src/common/utils/device_arena.h. The file is a generic bump-arena utility, not a comm primitive; the enclosing dir name was misleading. Updated 10 #include sites "device_arena.h" → "utils/device_arena.h" and dropped the common/device_comm entry from 8 CMakeLists (replaced with common since utils/ resolves there). - src/common/sim_context/ → src/common/platform/sim/sim_context/. The dir is sim-only infrastructure (CPU sim context for CANN intrinsic emulation), so it belongs next to the other common/platform/sim/ shared sim infrastructure. Updated: * the dir's own CMakeLists relative path to log/include; * simpler_setup/runtime_compiler.py::compile_sim_context source path; * 4 sim-host CMakeLists references; * a small handful of docs that named the old path. 4. Rename platform/src → platform/shared ------------------------------------------ Per-arch src/{arch}/platform/src/ was confusingly nested inside the top-level src/ directory and read as "src/src" in many paths. Renamed to shared/ across all 3 trees (a2a3, a5, common), matching its actual semantic ("shared between onboard and sim within one arch"). Updated 21 files that referenced the old path: CMakeLists, host headers, docs, one test file, and the src/{arch}/docs/platform.md map. Test plan --------- - Build all four libhost_runtime.so (a2a3 onboard/sim, a5 onboard/sim) + libcpu_sim_context.so + aicpu and aicore artifacts: clean. - CI will run the full ST + UT suite. Net: ~30 file renames, ~14 files extracted to common, +0 / -0 behavioral changes (pure layout).
…n, consolidate common/, rename platform/src→shared) Mechanical post-hw-native-sys#944/hw-native-sys#945/hw-native-sys#948 layout cleanup, motivated by an audit of duplicated and oddly-placed files that the recent unification work left behind. Four orthogonal changes bundled here because each touches the same set of CMakeLists; splitting would mean three more rebuild rounds and three more cmake-include-path edits across the same files. 1. Extract identical aicpu sources to common/platform/{onboard,sim}/aicpu/ ------------------------------------------------------------------------ Seven files per backend (14 total) were byte-identical (or differed only in a one-line @brief arch qualifier) between a2a3 and a5: cache_ops.cpp, device_log.cpp, device_time.cpp, device_malloc.cpp, orch_so_file.cpp, platform_aicpu_affinity.cpp, spin_hint.h Moved a2a3's copy to common/, deleted a5's duplicate, and extended each arch's onboard/aicpu and sim/aicpu CMakeLists COMMON_SOURCES glob to pick them up from common/platform/{onboard,sim}/aicpu/. The device_malloc.cpp arch tag in its @brief was the only real content diff; generalized to "Real Hardware" / "Simulation" without the arch qualifier. Backfilled a copyright header that was missing on device_time.cpp (caught by the check-headers hook). The remaining files in per-arch aicpu/ (kernel.cpp, inner_platform_regs.cpp) have real arch-specific divergence (register addresses, kernel protocols) and stay where they are. 2. Flatten profiling_common/ subdir ------------------------------------ src/common/platform/include/host/profiling_common/{buffer_pool_manager, profiler_base}.h → src/common/platform/include/host/{buffer_pool_manager, profiler_base}.h. Updated 10 #include sites and the 2 header guards. The profiling_common:: C++ namespace stays — file path and namespace don't have to match. 3. Consolidate small src/common subdirs ---------------------------------------- - src/common/device_comm/device_arena.h → src/common/utils/device_arena.h. The file is a generic bump-arena utility, not a comm primitive; the enclosing dir name was misleading. Updated 10 #include sites "device_arena.h" → "utils/device_arena.h" and dropped the common/device_comm entry from 8 CMakeLists (replaced with common since utils/ resolves there). - src/common/sim_context/ → src/common/platform/sim/sim_context/. The dir is sim-only infrastructure (CPU sim context for CANN intrinsic emulation), so it belongs next to the other common/platform/sim/ shared sim infrastructure. Updated: * the dir's own CMakeLists relative path to log/include; * simpler_setup/runtime_compiler.py::compile_sim_context source path; * 4 sim-host CMakeLists references; * a small handful of docs that named the old path. 4. Rename platform/src → platform/shared ------------------------------------------ Per-arch src/{arch}/platform/src/ was confusingly nested inside the top-level src/ directory and read as "src/src" in many paths. Renamed to shared/ across all 3 trees (a2a3, a5, common), matching its actual semantic ("shared between onboard and sim within one arch"). Updated 21 files that referenced the old path: CMakeLists, host headers, docs, one test file, and the src/{arch}/docs/platform.md map. Test plan --------- - Build all four libhost_runtime.so (a2a3 onboard/sim, a5 onboard/sim) + libcpu_sim_context.so + aicpu and aicore artifacts: clean. - CI will run the full ST + UT suite. Net: ~30 file renames, ~14 files extracted to common, +0 / -0 behavioral changes (pure layout).
…n, consolidate common/, rename platform/src→shared) Mechanical post-hw-native-sys#944/hw-native-sys#945/hw-native-sys#948 layout cleanup, motivated by an audit of duplicated and oddly-placed files that the recent unification work left behind. Four orthogonal changes bundled here because each touches the same set of CMakeLists; splitting would mean three more rebuild rounds and three more cmake-include-path edits across the same files. 1. Extract identical aicpu sources to common/platform/{onboard,sim}/aicpu/ ------------------------------------------------------------------------ Seven files per backend (14 total) were byte-identical (or differed only in a one-line @brief arch qualifier) between a2a3 and a5: cache_ops.cpp, device_log.cpp, device_time.cpp, device_malloc.cpp, orch_so_file.cpp, platform_aicpu_affinity.cpp, spin_hint.h Moved a2a3's copy to common/, deleted a5's duplicate, and extended each arch's onboard/aicpu and sim/aicpu CMakeLists COMMON_SOURCES glob to pick them up from common/platform/{onboard,sim}/aicpu/. The device_malloc.cpp arch tag in its @brief was the only real content diff; generalized to "Real Hardware" / "Simulation" without the arch qualifier. Backfilled a copyright header that was missing on device_time.cpp (caught by the check-headers hook). The remaining files in per-arch aicpu/ (kernel.cpp, inner_platform_regs.cpp) have real arch-specific divergence (register addresses, kernel protocols) and stay where they are. 2. Flatten profiling_common/ subdir ------------------------------------ src/common/platform/include/host/profiling_common/{buffer_pool_manager, profiler_base}.h → src/common/platform/include/host/{buffer_pool_manager, profiler_base}.h. Updated 10 #include sites and the 2 header guards. The profiling_common:: C++ namespace stays — file path and namespace don't have to match. 3. Consolidate small src/common subdirs ---------------------------------------- - src/common/device_comm/device_arena.h → src/common/utils/device_arena.h. The file is a generic bump-arena utility, not a comm primitive; the enclosing dir name was misleading. Updated 10 #include sites "device_arena.h" → "utils/device_arena.h" and dropped the common/device_comm entry from 8 CMakeLists (replaced with common since utils/ resolves there). - src/common/sim_context/ → src/common/platform/sim/sim_context/. The dir is sim-only infrastructure (CPU sim context for CANN intrinsic emulation), so it belongs next to the other common/platform/sim/ shared sim infrastructure. Updated: * the dir's own CMakeLists relative path to log/include; * simpler_setup/runtime_compiler.py::compile_sim_context source path; * 4 sim-host CMakeLists references; * a small handful of docs that named the old path. 4. Rename platform/src → platform/shared ------------------------------------------ Per-arch src/{arch}/platform/src/ was confusingly nested inside the top-level src/ directory and read as "src/src" in many paths. Renamed to shared/ across all 3 trees (a2a3, a5, common), matching its actual semantic ("shared between onboard and sim within one arch"). Updated 21 files that referenced the old path: CMakeLists, host headers, docs, one test file, and the src/{arch}/docs/platform.md map. Test plan --------- - Build all four libhost_runtime.so (a2a3 onboard/sim, a5 onboard/sim) + libcpu_sim_context.so + aicpu and aicore artifacts: clean. - CI will run the full ST + UT suite. Net: ~30 file renames, ~14 files extracted to common, +0 / -0 behavioral changes (pure layout).
Summary
Plugs the latent leak CodeRabbit flagged on #944 and that I deferred to a follow-up: a5
PmuCollector::init,ScopeStatsCollector::init,TensorDumpCollector::initregister paired device + host buffers inBufferPoolManagerviaalloc_paired_bufferBEFORE flippinginitialized_. A later allocation failure returns -1,finalize()early-exits (gated oninitialized_/shm_host_), and every registered device buffer + framework-malloc'd host shadow leaks.The pattern existed pre-#944 (old
alloc_single_buffer+ manualregister_mappinghad the same shape), so this is not a regression — just the cleanup #944'smalloc_shadows_tracking enables.Framework changes
BufferPoolManager::release_all_owned(release_fn)— new method. Abort-path cleanup: releases EVERY framework-tracked dev_ptr (viarelease_fn) and every framework-malloc'd host shadow (viastd::free), then clears all internal containers. Distinct fromrelease_owned_buffers()because this also catches buffers parked in callers' SPSCfree_queues (tracked viaregister_mappingbut not framework-owned via a queue). Drains recycled/done/ready first (just clears — release goes viadev_to_host_to avoid double-free), then walks the full mapping table.profiling_common::InitRollbackGuard<Manager>— new RAII scope guard inprofiler_base.h. Holds a manager reference + release_fn + a vector of "extra direct dev_ptrs" the collector owns outside the framework (e.g. PMU per-corePmuAicoreRings on a5 — plainalloc_cballocations with no host shadow). On destruction withoutcommit(), callsmanager.release_all_owned(release_fn)and thenrelease_fnon each direct ptr. Move-only.Collector wiring
common/scope_stats_collector.cpp::init()— construct guard right afterset_memory_context,guard.commit()right beforereturn 0. Catches the shm region +ScopeStatsBufferentries (free_queue and recycled pool).common/tensor_dump_collector.cpp::init()— same pattern. Catches the shm region + per-thread arenas +DumpMetaBuffers.a5/pmu_collector.cpp::init()— same pattern +guard.add_direct_ptr(ring)for each per-corePmuAicoreRing(those don't go throughalloc_paired_bufferso the framework doesn't track them — register them with the guard explicitly).Success-path cost
Constructing the guard is
std::functionmove +std::vectordefault-construct + bool init.commit()flips one bool. The destructor short-circuits oncommitted_. Hot-path overhead is a single boolean check perinit()call.What this does NOT touch
a2a3 collectors (
dep_gen,pmu,scope_stats,tensor_dumplegacy,l2_swimlane) — they predatealloc_paired_bufferand manage their own buffer lifecycle outside the framework'sdev_to_host_map. They're left as-is; the existing manualrelease_one_buffer+clear_mappingsloop in theirfinalize()handles cleanup.Test plan
guard.commitshort-circuits the destructor)libhost_runtime.so(a2a3 onboard/sim, a5 onboard/sim): cleanNet: +125 lines (5 files touched).