Add: cann-examples/aicpu-kernel-launch reference tool + mechanisms doc#923
Conversation
|
Important Review skippedAuto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
📝 WalkthroughWalkthroughThis PR introduces a complete, self-contained end-to-end example for launching a custom AICPU kernel using CANN's production dispatcher bootstrap path. It includes a minimal device-side kernel that calls HAL functions, a host-side launcher that manages the full lifecycle, comprehensive documentation covering pipeline flow and adaptation patterns, and CI integration for smoke testing on hardware runners. ChangesAICPU Kernel Launch Example
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request introduces a new end-to-end demonstration tool, aicpu-kernel-launch, under tools/cann-examples/, which showcases launching a custom AICPU kernel from a host process using the production dispatcher bootstrap path. The tool includes a device-side kernel and a host-side launcher. Feedback on the host-side launcher highlights several robustness and security improvements: adding validation checks to the inlined ElfBuildId function to prevent out-of-bounds reads, checking for negative values from f.tellg() to avoid std::bad_alloc exceptions, validating that read buffers are not empty, ensuring the device is reset on all exit paths using an RAII guard, and using an RAII cleanup struct to prevent leaking temporary JSON files on early failure paths.
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tools/cann-examples/aicpu-kernel-launch/host/launch_hello.cpp`:
- Around line 357-378: The current code creates a predictable temp file using
json_path_buf / json_path and std::ofstream which is racey and may leak on early
returns; change this to create the descriptor atomically with mkstemp or
mkstemps (or equivalent secure temp API), write the JSON from MakeJsonDescriptor
into the returned fd, then close and pass that secure pathname to
rtsBinaryLoadFromFile; ensure the temporary file is unlinked/removed via RAII
(scope guard or wrapper) so it is always cleaned up even on errors and avoid
using the fixed /tmp/simpler_inner_<fp>_<pid>.json pattern.
- Around line 170-179: DevBuf destructors call aclrtFree() but the DevBuf
instances (dev_dispatcher, dev_inner, dev_args, dev_result) are locals in main
and are destroyed after main returns — which currently happens after
aclrtDestroyStream/aclrtResetDevice/aclFinalize(); move the GM frees to occur
before ACL teardown by either (A) placing those DevBuf instances into an inner
scope that ends before the calls to
aclrtDestroyStream/aclrtResetDevice/aclFinalize so their destructors run
earlier, or (B) add an explicit Release/Free method to DevBuf that calls
aclrtFree(ptr) and invoke that on dev_dispatcher/dev_inner/dev_args/dev_result
just before calling aclrtDestroyStream/aclrtResetDevice/aclFinalize (references:
DevBuf, dev_dispatcher, dev_inner, dev_args, dev_result, aclrtFree,
aclrtDestroyStream, aclrtResetDevice, aclFinalize, main).
- Around line 157-168: In ReadFile, guard against tellg() failure and verify the
full read: check that f.seekg and auto pos = f.tellg() don't return -1 before
casting and resizing (handle zero-length files separately), resize out using the
validated size, call f.read(...) and then verify f.gcount() (or check read state
and bytes read == size) to ensure a full read succeeded; on any failure clear
out (or leave empty), log the error and return false. Ensure you reference and
update the ReadFile function, validating seekg/tellg results, checking f.read
bytes vs. expected size, and returning false on partial/failed reads.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: aa1a189a-9083-4659-99de-f7178a3bf96b
📒 Files selected for processing (8)
.github/workflows/ci.ymldocs/ci.mdtools/README.mdtools/cann-examples/aicpu-kernel-launch/README.mdtools/cann-examples/aicpu-kernel-launch/device/CMakeLists.txttools/cann-examples/aicpu-kernel-launch/device/hello_aicpu.cpptools/cann-examples/aicpu-kernel-launch/host/CMakeLists.txttools/cann-examples/aicpu-kernel-launch/host/launch_hello.cpp
b460a02 to
0b28422
Compare
0b28422 to
51cf926
Compare
Smallest possible end-to-end demonstration of the AICPU kernel launch pipeline used by this repo's runtime — no scene-test plumbing, no ringbuffer / tensormap, no ChipWorker fork. Strips PR hw-native-sys#883's aicpu-device-query down to the bootstrap path itself with a trivial inner kernel so a reader who wants to add new AICPU work can use it as a copy-paste template. Pipeline (Method 2 / "Path A" — see docs/aicpu-kernel-launch-mechanisms.md): 1. host : rtAicpuKernelLaunchExWithArgs(KERNEL_TYPE_AICPU_KFC, libaicpu_extend_kernels.so) hands dispatcher SO + inner SO bytes to the AICPU OS process. 2. device: dispatcher (DynTileFwkBackendKernelServerInit) writes inner SO to /usr/lib64/aicpu_kernels/0/aicpu_kernels_device/ simpler_inner_<fp>_<dev>.so. 3. host : fingerprint inner SO by ELF Build-ID, emit a JSON descriptor pointing at the preinstall basename, register via rtsBinaryLoadFromFile. 4. host : rtsFuncGetByName for init + run handles. 5. host : rewrite DeviceArgs (offsets 96 / 104) to point at the HelloResult + input_token, rtsLaunchCpuKernel(run). 6. device: kernel logs via DlogRecord, calls halGetDeviceInfo, writes HelloResult{ magic, echoed_token, hal_rc, hal_value }. 7. host : D2H + verify magic == 0xDEADBEEFC0FFEE01 + echoed token. Files: - docs/aicpu-kernel-launch-mechanisms.md : NEW canonical doc covering all three known methods of getting a custom AICPU SO onto the device (tar.gz pre-deployment, Path A dispatcher bootstrap, broken Path B KERNEL_TYPE_AICPU_CUSTOM). Records issue hw-native-sys#822's full failure forensics: cust-subprocess L1 stale on AICore HBM writes, the four user-space workarounds that all fail (volatile / ldar / dc civac / dc ivac) with the architectural reason each fails, and the CANN-side fix options (A/B/C/D). Sedimentation of the PR hw-native-sys#537 debugging session so future readers don't re-derive any of it. - tools/cann-examples/aicpu-kernel-launch/device/hello_aicpu.cpp : simpler_aicpu_init no-op + simpler_aicpu_run reads DeviceArgs, calls halGetDeviceInfo(AICPU, CORE_NUM), writes HelloResult to GM. - tools/cann-examples/aicpu-kernel-launch/device/CMakeLists.txt : aarch64 cross, links ascend_hal, emits --build-id for fingerprint stability. - tools/cann-examples/aicpu-kernel-launch/host/launch_hello.cpp : full Mode A bootstrap + JSON descriptor + binary load + launch + verify. Inlined ELF Build-ID reader so the example is standalone (no headers from src/). - tools/cann-examples/aicpu-kernel-launch/host/CMakeLists.txt : links ascendcl + runtime. - tools/cann-examples/aicpu-kernel-launch/README.md : pipeline diagram, I/O contract, build/run, scope+limits; references the mechanisms doc for the full comparison. Cross-links updated: - src/common/aicpu_dispatcher/README.md and - tools/cann-examples/aicpu-device-query/README.md both now point at the new mechanisms doc so anyone reading the dispatcher or the device-query tool can find the full Path A vs Path B vs tar.gz comparison. Integration: - ut-a2a3 + ut-a5 CI jobs build the device SO (cross-compile) + host launcher as a link smoke test, mirroring the aicpu-device-query precedent. No device locked, no resource-spec conflict. - tools/README.md adds an "aicpu-kernel-launch" subsection. - docs/ci.md job table updated. Why bundle as one PR: the reference tool and the mechanisms doc exist to answer the same question ("how do I launch a custom AICPU kernel"). Splitting them would mean the doc points at a tool that doesn't exist yet, or the tool points at a doc that doesn't exist yet. Reviewable as one logical unit.
51cf926 to
1db3210
Compare
Summary
Adds two coupled artefacts that together answer "how do I launch a
custom AICPU kernel in this repo":
docs/aicpu-kernel-launch-mechanisms.md— canonical referencefor the three known methods of getting a custom AICPU SO onto the
device, and the cement that holds the lessons from
#822 /
PR Feat: AICPU launch via dispatcher bootstrap and per-task rtsLaunchCpuKernel #537. Records:
fleet provisioning)
production runtime + the new reference tool)
KERNEL_TYPE_AICPU_CUSTOM— broken in [Bug] Mode B (kernelType=AICPU_CUSTOM): cust_aicpu_sd subprocess cache stale on AICore HBM writes → AICPU handshake deadlock (507018) #822;full failure forensics including the cust-subprocess L1 stale
read on AICore HBM writes, the four user-space workarounds
that all fail (
volatile/__atomic_load_n ACQUIRE → ldar/dc civac/dc ivac) with the architectural reason eachfails, and the CANN-side fix options (A/B/C/D).
tools/cann-examples/aicpu-kernel-launch/— minimal end-to-endtool implementing Method 2. Strips PR Add: hardware docs and CANN query tools #883's
aicpu-device-querydown to the bootstrap path itself with a trivial inner kernel
(writes a magic value, echoes a host-supplied token, runs one
halGetDeviceInfocall). Copy as a template for new AICPU work.Pipeline (Method 2, what the tool implements)
rtAicpuKernelLaunchExWithArgs(KERNEL_TYPE_AICPU_KFC, libaicpu_extend_kernels.so)hands dispatcher SO + inner SO bytes to the AICPU OS process/usr/lib64/aicpu_kernels/0/aicpu_kernels_device/simpler_inner_<fp>_<dev>.sortsBinaryLoadFromFilertsFuncGetByNameforsimpler_aicpu_init_<fp>+simpler_aicpu_run_<fp>rtsLaunchCpuKernel(run)halGetDeviceInfo(AICPU, CORE_NUM), writesHelloResult{ magic, echoed_token, hal_rc, hal_value }0xDEADBEEFC0FFEE01+ echoed tokenCross-links updated
src/common/aicpu_dispatcher/README.mdandtools/cann-examples/aicpu-device-query/README.mdboth now point atthe new mechanisms doc so anyone reading the dispatcher or the
device-query tool can find the full Path A vs Path B vs tar.gz
comparison without searching the issue tracker.
Integration
ut-a2a3+ut-a5CI jobs build the device SO (cross-compile) +host launcher as a link smoke test, mirroring the
aicpu-device-queryprecedent. No device locked, no resource-specconflict.
tools/README.mdadds anaicpu-kernel-launchsubsection.docs/ci.mdjob table updated.Why bundle as one PR
The reference tool and the mechanisms doc exist to answer the same
question. Splitting them would mean the doc points at a tool that
doesn't exist yet, or the tool points at a doc that doesn't exist
yet. Reviewable as one logical unit.
Test plan
ut-a2a3+ut-a5link-smoke builds pass (cross-compile device SO + native host)task-submit): build both halves, runlaunch_hello $TASK_DEVICE, verify output linemagic = 0xdeadbeefc0ffee01 OKandechoed_token ... OK