Fix: assert/LOG paths failing when pto2_current_runtime() is null on AICPU#519
Fix: assert/LOG paths failing when pto2_current_runtime() is null on AICPU#519yanghaoran29 wants to merge 1 commit intohw-native-sys:mainfrom
Conversation
79006a7 to
9dc991d
Compare
There was a problem hiding this comment.
Code Review
This pull request introduces environment variable fixes for CMake in the Python runtime compiler and implements explicit runtime binding in the AICPU executors for A2A3 and A5 platforms. The review identifies potential race conditions in the executors, suggesting that runtime binding should occur earlier in the initialization sequence to ensure thread safety. Additionally, optimizations are recommended for the Python environment setup to minimize redundant file system operations and path resolutions.
I am having trouble creating individual review comments. Click here to see my feedback.
src/a2a3/runtime/tensormap_and_ringbuffer/aicpu/aicpu_executor.cpp (2119-2122)
There is a potential race condition here. The orchestrator thread signals that the runtime is ready for schedulers at line 2103 (runtime_init_ready_.store(true)). Scheduler threads waiting for this signal (line 2330) will proceed to resolve_and_dispatch_pto2. If any framework or utility code called by the schedulers relies on pto2_framework_current_runtime() (which accesses the global g_pto2_current_runtime being bound here), they might encounter a nullptr if they reach that code before the orchestrator thread executes line 2119.
To ensure consistency across all threads in the DSO, pto2_framework_bind_runtime(rt) should be called immediately after rt is successfully created (around line 2086), and definitely before signaling runtime_init_ready_.
References
- If a component is accessed by multiple threads (e.g., Orchestrator and Scheduler), it requires protection or guaranteed initialization order to prevent data races, even if current usage patterns do not show overlap.
src/a5/runtime/tensormap_and_ringbuffer/aicpu/aicpu_executor.cpp (2098-2101)
Similar to the A2A3 implementation, there is a race condition where scheduler threads might proceed after runtime_init_ready_ is signaled but before the runtime is bound to the DSO's global pointer. Please move the pto2_framework_bind_runtime(rt) call to occur immediately after rt is created (around line 2065) to ensure it is visible to all threads before they begin processing tasks.
References
- If a component is accessed by multiple threads (e.g., Orchestrator and Scheduler), it requires protection or guaranteed initialization order to prevent data races, even if current usage patterns do not show overlap.
python/runtime_compiler.py (182-206)
The _subprocess_env_for_cmake method is called multiple times during a single compilation process (once in _run_compilation and then again inside every _run_build_step invocation). Since it performs several file system checks (os.path.isfile, os.access) and potentially multiple shutil.which("ninja") calls, it is inefficient to recreate this environment repeatedly.
Consider caching the result of the environment fix or calling it once at the start of the compilation process and passing the fixed environment down to the build steps.
python/runtime_compiler.py (197-201)
The call to shutil.which("ninja") is inside the _fix_make_program nested function. If both CMAKE_MAKE_PROGRAM and NINJA environment variables are invalid, shutil.which will be executed twice. It would be more efficient to resolve the ninja path once outside the nested function.
resolved_ninja = shutil.which("ninja")
def _fix_make_program(key: str) -> None:
path = env.get(key)
if not path:
return
if os.path.isfile(path) and os.access(path, os.X_OK):
return
if resolved_ninja:
env[key] = resolved_ninja
else:
env.pop(key, None)…AICPU Problem: assert_impl and related code in orchestration/common.cpp use LOG_* macros that dereference pto2_current_runtime()->ops. On the device path, pto2_submit_mixed_task and other runtime code run in libaicpu_kernel.so, but only the orchestration plugin (libdevice_orch_*.so) had g_pto2_current_runtime set via dlsym(pto2_framework_bind_runtime). Each .so carries its own copy of common.cpp, so the AICPU image's g_ stayed nullptr and assertion reporting could crash or misbehave. Fix: In AicpuExecutor, call the link-resolved pto2_framework_bind_runtime(rt) for this DSO before orch_bind_runtime_, and pto2_framework_bind_runtime(nullptr) before pto2_runtime_destroy, mirroring the orchestration SO bind. Same change for a5 tensormap_and_ringbuffer.
9dc991d to
df84945
Compare
Problem: assert_impl and related code in orchestration/common.cpp use LOG_*
macros that dereference pto2_current_runtime()->ops. On the device path,
pto2_submit_mixed_task and other runtime code run in libaicpu_kernel.so, but
only the orchestration plugin (libdevice_orch_*.so) had g_pto2_current_runtime
set via dlsym(pto2_framework_bind_runtime). Each .so carries its own copy of
common.cpp, so the AICPU image's g_ stayed nullptr and assertion reporting
could crash or misbehave.
Fix: In AicpuExecutor, call the link-resolved pto2_framework_bind_runtime(rt)
for this DSO before orch_bind_runtime_, and pto2_framework_bind_runtime(nullptr)
before pto2_runtime_destroy, mirroring the orchestration SO bind. Same change
for a5 tensormap_and_ringbuffer.