Skip to content

Add: cann-examples/aicpu-kernel-launch reference tool + mechanisms doc#923

Merged
ChaoWao merged 1 commit into
hw-native-sys:mainfrom
hw-native-sys-bot:feat/aicpu-kernel-launch-example
May 31, 2026
Merged

Add: cann-examples/aicpu-kernel-launch reference tool + mechanisms doc#923
ChaoWao merged 1 commit into
hw-native-sys:mainfrom
hw-native-sys-bot:feat/aicpu-kernel-launch-example

Conversation

@hw-native-sys-bot
Copy link
Copy Markdown
Collaborator

@hw-native-sys-bot hw-native-sys-bot commented May 30, 2026

Summary

Adds two coupled artefacts that together answer "how do I launch a
custom AICPU kernel in this repo":

  1. docs/aicpu-kernel-launch-mechanisms.md — canonical reference
    for the three known methods of getting a custom AICPU SO onto the
    device, and the cement that holds the lessons from
    #822 /
    PR Feat: AICPU launch via dispatcher bootstrap and per-task rtsLaunchCpuKernel #537. Records:
  2. tools/cann-examples/aicpu-kernel-launch/ — minimal end-to-end
    tool implementing Method 2. Strips PR Add: hardware docs and CANN query tools #883's aicpu-device-query
    down to the bootstrap path itself with a trivial inner kernel
    (writes a magic value, echoes a host-supplied token, runs one
    halGetDeviceInfo call). Copy as a template for new AICPU work.

Pipeline (Method 2, what the tool implements)

Step Side What
1 host rtAicpuKernelLaunchExWithArgs(KERNEL_TYPE_AICPU_KFC, libaicpu_extend_kernels.so) hands dispatcher SO + inner SO bytes to the AICPU OS process
2 device dispatcher writes inner SO to /usr/lib64/aicpu_kernels/0/aicpu_kernels_device/simpler_inner_<fp>_<dev>.so
3 host fingerprint inner SO by ELF Build-ID, emit JSON descriptor, rtsBinaryLoadFromFile
4 host rtsFuncGetByName for simpler_aicpu_init_<fp> + simpler_aicpu_run_<fp>
5 host rewrite DeviceArgs (offsets 96 / 104) to point at HelloResult + input_token, rtsLaunchCpuKernel(run)
6 device kernel logs via DlogRecord, halGetDeviceInfo(AICPU, CORE_NUM), writes HelloResult{ magic, echoed_token, hal_rc, hal_value }
7 host D2H + verify magic == 0xDEADBEEFC0FFEE01 + echoed token

Cross-links updated

src/common/aicpu_dispatcher/README.md and
tools/cann-examples/aicpu-device-query/README.md both now point at
the new mechanisms doc so anyone reading the dispatcher or the
device-query tool can find the full Path A vs Path B vs tar.gz
comparison without searching the issue tracker.

Integration

  • ut-a2a3 + ut-a5 CI jobs build the device SO (cross-compile) +
    host launcher as a link smoke test, mirroring the
    aicpu-device-query precedent. No device locked, no resource-spec
    conflict.
  • tools/README.md adds an aicpu-kernel-launch subsection.
  • docs/ci.md job table updated.

Why bundle as one PR

The reference tool and the mechanisms doc exist to answer the same
question. Splitting them would mean the doc points at a tool that
doesn't exist yet, or the tool points at a doc that doesn't exist
yet. Reviewable as one logical unit.

Test plan

  • CI: ut-a2a3 + ut-a5 link-smoke builds pass (cross-compile device SO + native host)
  • Local (a2a3 or a5 hardware, via task-submit): build both halves, run launch_hello $TASK_DEVICE, verify output line magic = 0xdeadbeefc0ffee01 OK and echoed_token ... OK

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 30, 2026

Review Change Stack

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 539c74b0-4838-4382-8826-0239592d253a

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

This PR introduces a complete, self-contained end-to-end example for launching a custom AICPU kernel using CANN's production dispatcher bootstrap path. It includes a minimal device-side kernel that calls HAL functions, a host-side launcher that manages the full lifecycle, comprehensive documentation covering pipeline flow and adaptation patterns, and CI integration for smoke testing on hardware runners.

Changes

AICPU Kernel Launch Example

Layer / File(s) Summary
Documentation and CI integration
tools/README.md, tools/cann-examples/aicpu-kernel-launch/README.md, docs/ci.md, .github/workflows/ci.yml
README files document the example's purpose, pipeline, I/O contract (DeviceArgs/HelloResult struct layout), build/run instructions, adaptation guide for new device work, and scope/limits. CI workflow and job documentation are updated to build and test both device and host portions on ut-a2a3 and ut-a5 hardware runners.
Device-side kernel implementation
tools/cann-examples/aicpu-kernel-launch/device/CMakeLists.txt, tools/cann-examples/aicpu-kernel-launch/device/hello_aicpu.cpp
Cross-compilation CMake config builds the hello_aicpu shared library with ascend_hal linking and --build-id flags. Device kernel exports simpler_aicpu_init (no-op) and simpler_aicpu_run, which unpacks kernel arguments, validates buffers, issues halGetDeviceInfo call, and writes packed HelloResult (magic, echoed token, HAL rc/value) to GM memory with device-side logging.
Host-side launcher implementation
tools/cann-examples/aicpu-kernel-launch/host/CMakeLists.txt, tools/cann-examples/aicpu-kernel-launch/host/launch_hello.cpp
Native CMake config builds launch_hello executable linking ascendcl and runtime. Host launcher loads dispatcher and kernel SOs, initializes ACL runtime, allocates device memory, bootstraps AICPU via rtAicpuKernelLaunchExWithArgs, fingerprints the kernel (ELF Build-ID or FNV1a fallback), registers via JSON descriptor and rtsBinaryLoadFromFile, resolves init/run symbols, launches with rtsLaunchCpuKernel, verifies magic/token roundtrip from D2H copied HelloResult, prints HAL diagnostics, and exits with status code indicating verification success.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

A kernel wakes on distant sand,
Where host and device join hand in hand—
From dispatcher's nest to HAL's warm call,
This humble example explains it all! 🐰✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 8.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Title check ✅ Passed The title accurately summarizes the main change: adding a reference tool and documentation for the AICPU kernel launch mechanisms.
Description check ✅ Passed The description is comprehensive and directly related to the changeset, explaining the tool, pipeline, integration, and test plan.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new end-to-end demonstration tool, aicpu-kernel-launch, under tools/cann-examples/, which showcases launching a custom AICPU kernel from a host process using the production dispatcher bootstrap path. The tool includes a device-side kernel and a host-side launcher. Feedback on the host-side launcher highlights several robustness and security improvements: adding validation checks to the inlined ElfBuildId function to prevent out-of-bounds reads, checking for negative values from f.tellg() to avoid std::bad_alloc exceptions, validating that read buffers are not empty, ensuring the device is reset on all exit paths using an RAII guard, and using an RAII cleanup struct to prevent leaking temporary JSON files on early failure paths.

Comment thread tools/cann-examples/aicpu-kernel-launch/host/launch_hello.cpp
Comment thread tools/cann-examples/aicpu-kernel-launch/host/launch_hello.cpp Outdated
Comment thread tools/cann-examples/aicpu-kernel-launch/host/launch_hello.cpp
Comment thread tools/cann-examples/aicpu-kernel-launch/host/launch_hello.cpp Outdated
Comment thread tools/cann-examples/aicpu-kernel-launch/host/launch_hello.cpp Outdated
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tools/cann-examples/aicpu-kernel-launch/host/launch_hello.cpp`:
- Around line 357-378: The current code creates a predictable temp file using
json_path_buf / json_path and std::ofstream which is racey and may leak on early
returns; change this to create the descriptor atomically with mkstemp or
mkstemps (or equivalent secure temp API), write the JSON from MakeJsonDescriptor
into the returned fd, then close and pass that secure pathname to
rtsBinaryLoadFromFile; ensure the temporary file is unlinked/removed via RAII
(scope guard or wrapper) so it is always cleaned up even on errors and avoid
using the fixed /tmp/simpler_inner_<fp>_<pid>.json pattern.
- Around line 170-179: DevBuf destructors call aclrtFree() but the DevBuf
instances (dev_dispatcher, dev_inner, dev_args, dev_result) are locals in main
and are destroyed after main returns — which currently happens after
aclrtDestroyStream/aclrtResetDevice/aclFinalize(); move the GM frees to occur
before ACL teardown by either (A) placing those DevBuf instances into an inner
scope that ends before the calls to
aclrtDestroyStream/aclrtResetDevice/aclFinalize so their destructors run
earlier, or (B) add an explicit Release/Free method to DevBuf that calls
aclrtFree(ptr) and invoke that on dev_dispatcher/dev_inner/dev_args/dev_result
just before calling aclrtDestroyStream/aclrtResetDevice/aclFinalize (references:
DevBuf, dev_dispatcher, dev_inner, dev_args, dev_result, aclrtFree,
aclrtDestroyStream, aclrtResetDevice, aclFinalize, main).
- Around line 157-168: In ReadFile, guard against tellg() failure and verify the
full read: check that f.seekg and auto pos = f.tellg() don't return -1 before
casting and resizing (handle zero-length files separately), resize out using the
validated size, call f.read(...) and then verify f.gcount() (or check read state
and bytes read == size) to ensure a full read succeeded; on any failure clear
out (or leave empty), log the error and return false. Ensure you reference and
update the ReadFile function, validating seekg/tellg results, checking f.read
bytes vs. expected size, and returning false on partial/failed reads.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: aa1a189a-9083-4659-99de-f7178a3bf96b

📥 Commits

Reviewing files that changed from the base of the PR and between d9cdf01 and b460a02.

📒 Files selected for processing (8)
  • .github/workflows/ci.yml
  • docs/ci.md
  • tools/README.md
  • tools/cann-examples/aicpu-kernel-launch/README.md
  • tools/cann-examples/aicpu-kernel-launch/device/CMakeLists.txt
  • tools/cann-examples/aicpu-kernel-launch/device/hello_aicpu.cpp
  • tools/cann-examples/aicpu-kernel-launch/host/CMakeLists.txt
  • tools/cann-examples/aicpu-kernel-launch/host/launch_hello.cpp

Comment thread tools/cann-examples/aicpu-kernel-launch/host/launch_hello.cpp
Comment thread tools/cann-examples/aicpu-kernel-launch/host/launch_hello.cpp
Comment thread tools/cann-examples/aicpu-kernel-launch/host/launch_hello.cpp Outdated
@ChaoWao ChaoWao force-pushed the feat/aicpu-kernel-launch-example branch from b460a02 to 0b28422 Compare May 30, 2026 12:24
@hw-native-sys-bot hw-native-sys-bot changed the title Add: cann-examples/aicpu-kernel-launch reference tool Add: cann-examples/aicpu-kernel-launch reference tool + mechanisms doc May 30, 2026
@ChaoWao ChaoWao force-pushed the feat/aicpu-kernel-launch-example branch from 0b28422 to 51cf926 Compare May 30, 2026 12:33
Smallest possible end-to-end demonstration of the AICPU kernel launch
pipeline used by this repo's runtime — no scene-test plumbing, no
ringbuffer / tensormap, no ChipWorker fork. Strips PR hw-native-sys#883's
aicpu-device-query down to the bootstrap path itself with a trivial
inner kernel so a reader who wants to add new AICPU work can use it as
a copy-paste template.

Pipeline (Method 2 / "Path A" — see docs/aicpu-kernel-launch-mechanisms.md):
  1. host  : rtAicpuKernelLaunchExWithArgs(KERNEL_TYPE_AICPU_KFC,
             libaicpu_extend_kernels.so) hands dispatcher SO + inner SO
             bytes to the AICPU OS process.
  2. device: dispatcher (DynTileFwkBackendKernelServerInit) writes
             inner SO to /usr/lib64/aicpu_kernels/0/aicpu_kernels_device/
             simpler_inner_<fp>_<dev>.so.
  3. host  : fingerprint inner SO by ELF Build-ID, emit a JSON descriptor
             pointing at the preinstall basename, register via
             rtsBinaryLoadFromFile.
  4. host  : rtsFuncGetByName for init + run handles.
  5. host  : rewrite DeviceArgs (offsets 96 / 104) to point at the
             HelloResult + input_token, rtsLaunchCpuKernel(run).
  6. device: kernel logs via DlogRecord, calls halGetDeviceInfo, writes
             HelloResult{ magic, echoed_token, hal_rc, hal_value }.
  7. host  : D2H + verify magic == 0xDEADBEEFC0FFEE01 + echoed token.

Files:
- docs/aicpu-kernel-launch-mechanisms.md : NEW canonical doc covering
  all three known methods of getting a custom AICPU SO onto the device
  (tar.gz pre-deployment, Path A dispatcher bootstrap, broken Path B
  KERNEL_TYPE_AICPU_CUSTOM). Records issue hw-native-sys#822's full failure
  forensics: cust-subprocess L1 stale on AICore HBM writes, the four
  user-space workarounds that all fail (volatile / ldar / dc civac /
  dc ivac) with the architectural reason each fails, and the CANN-side
  fix options (A/B/C/D). Sedimentation of the PR hw-native-sys#537 debugging
  session so future readers don't re-derive any of it.
- tools/cann-examples/aicpu-kernel-launch/device/hello_aicpu.cpp :
  simpler_aicpu_init no-op + simpler_aicpu_run reads DeviceArgs, calls
  halGetDeviceInfo(AICPU, CORE_NUM), writes HelloResult to GM.
- tools/cann-examples/aicpu-kernel-launch/device/CMakeLists.txt :
  aarch64 cross, links ascend_hal, emits --build-id for fingerprint
  stability.
- tools/cann-examples/aicpu-kernel-launch/host/launch_hello.cpp :
  full Mode A bootstrap + JSON descriptor + binary load + launch +
  verify. Inlined ELF Build-ID reader so the example is standalone
  (no headers from src/).
- tools/cann-examples/aicpu-kernel-launch/host/CMakeLists.txt :
  links ascendcl + runtime.
- tools/cann-examples/aicpu-kernel-launch/README.md : pipeline
  diagram, I/O contract, build/run, scope+limits; references the
  mechanisms doc for the full comparison.

Cross-links updated:
- src/common/aicpu_dispatcher/README.md and
- tools/cann-examples/aicpu-device-query/README.md
  both now point at the new mechanisms doc so anyone reading the
  dispatcher or the device-query tool can find the full Path A vs
  Path B vs tar.gz comparison.

Integration:
- ut-a2a3 + ut-a5 CI jobs build the device SO (cross-compile) + host
  launcher as a link smoke test, mirroring the aicpu-device-query
  precedent. No device locked, no resource-spec conflict.
- tools/README.md adds an "aicpu-kernel-launch" subsection.
- docs/ci.md job table updated.

Why bundle as one PR: the reference tool and the mechanisms doc
exist to answer the same question ("how do I launch a custom AICPU
kernel"). Splitting them would mean the doc points at a tool that
doesn't exist yet, or the tool points at a doc that doesn't exist
yet. Reviewable as one logical unit.
@ChaoWao ChaoWao force-pushed the feat/aicpu-kernel-launch-example branch from 51cf926 to 1db3210 Compare May 31, 2026 01:27
@ChaoWao ChaoWao merged commit 55c3d8b into hw-native-sys:main May 31, 2026
16 checks passed
@ChaoWao ChaoWao deleted the feat/aicpu-kernel-launch-example branch May 31, 2026 01:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants