Update: close a5 AICPU split question with device-side probe #918

Merged
ChaoWao merged 1 commit into hw-native-sys:main from ChaoWao:probe/a5-aicpu
May 30, 2026

@ChaoWao ChaoWao commented May 30, 2026

Summary

  • Ran tools/cann-examples/aicpu-device-query/ (merged in #883, "Add: hardware docs and CANN query tools") on Ascend950PR_9599 local device 0 and used the results to replace the "calibrated inference" paragraph in src/a5/docs/hardware.md with directly measured device-side ground truth.
  • Key new finding: AICPU + OCCUPY returns a different value host-side (0x1fe) than device-side (0x1f8). The 2-bit difference (bits 1 and 2) coincides exactly with the SMT pair that DSMI CPU_TOPO reports on phy_cpu_id 1: the AICPU OS withholds that pair from the user kernel dispatch pool. So the a5 9 → 6 gap is 1 AICPU OS scheduler core (cpu_id 0) + 2 SMT-pair cores (cpu_id 1, 2) withheld by the AICPU OS, not "AICPU-OS + PG fab-disable" by analogy with a3.
  • Docs touched: src/a5/docs/hardware.md (main writeup), tools/cann-examples/aicpu-device-query/README.md (new "On a5" results block + drops the stale "a3 is the only validated arch" note), tools/README.md (one-line update to mention both arches).

Measured queries (a5, device 0)

AICPU + OS_SCHED      rc=0  val=0x1     ← AICPU OS owns cpu_id 0
AICPU + OCCUPY        rc=0  val=0x1f8   ← 6 user cores at cpu_id 3..8
                                          (host-side returns 0x1fe — divergence is a5-specific)
AICPU + PF_OCCUPY     rc=0  val=0x1f8   ← matches OCCUPY → no vNPU slicing
AICPU + PF_CORE_NUM   rc=0  val=0x6     ← PF view = 6, no virtualization
AICPU + CORE_NUM      rc=3              ← restricted device-side on a5 (unlike a3)
CCPU  + OCCUPY        rc=0  val=0x1     ← CCPU owns 1 core in its namespace
DCPU/TSCPU            rc=3              ← module-level restricted (same as a3)
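
The core ranges quoted in the annotations above follow directly from the set bits of each mask. A minimal sketch of that decoding, assuming (as this PR does) that bit i of an OCCUPY or OS_SCHED value corresponds to cpu_id i; `set_bits` is a hypothetical helper for illustration and not part of the query tool:

```python
def set_bits(mask: int) -> list[int]:
    """Return the positions of the bits set in mask, lowest first."""
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]


# Values copied from the measured queries above.
OS_SCHED = 0x1         # AICPU + OS_SCHED
DEVICE_OCCUPY = 0x1F8  # AICPU + OCCUPY, device-side
HOST_OCCUPY = 0x1FE    # AICPU + OCCUPY, host-side

print(set_bits(OS_SCHED))       # [0]
print(set_bits(DEVICE_OCCUPY))  # [3, 4, 5, 6, 7, 8]
print(set_bits(HOST_OCCUPY))    # [1, 2, 3, 4, 5, 6, 7, 8]
```

The device-side mask yields six positions, which lines up with PF_CORE_NUM = 0x6.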

Reconciliation

| Slot | Owner | Evidence |
| --- | --- | --- |
| cpu_id 0 | AICPU OS scheduler | OS_SCHED bit 0 = 1; cleared in host-side OCCUPY by design |
| cpu_id 1, 2 | SMT pair on phy_cpu_id 1, OS-withheld | present in host OCCUPY (0x1fe), absent from device OCCUPY (0x1f8) → not PG-disabled. DSMI CPU_TOPO labels exactly this pair as the chip's only SMT pair. |
| cpu_id 3..8 | user-schedulable (6) | device-side OCCUPY bits 3..8 set; matches rtGetAiCpuCount=6 and PF_CORE_NUM=6 |
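
The reconciliation can be checked as plain bit arithmetic. The sketch below is illustrative only: it assumes the 9 logical CPUs quoted in this PR map to cpu_id 0..8, and the variable names are made up for the example rather than taken from the tool or the HAL API.

```python
# Values copied from the measured queries in this PR.
TOTAL_LOGICAL_CPUS = 9  # assumption: these are cpu_id 0..8
OS_SCHED = 0x1
HOST_OCCUPY = 0x1FE
DEVICE_OCCUPY = 0x1F8


def popcount(mask: int) -> int:
    """Count the bits set in mask."""
    return bin(mask).count("1")


# Bits visible host-side but not device-side: the OS-withheld SMT pair.
withheld = HOST_OCCUPY ^ DEVICE_OCCUPY
all_cpus = (1 << TOTAL_LOGICAL_CPUS) - 1

# The three groups are disjoint and together cover every logical CPU.
assert (OS_SCHED & withheld) == 0
assert (OS_SCHED & DEVICE_OCCUPY) == 0
assert (withheld & DEVICE_OCCUPY) == 0
assert (OS_SCHED | withheld | DEVICE_OCCUPY) == all_cpus

print(hex(withheld))  # 0x6
print(popcount(OS_SCHED), popcount(withheld), popcount(DEVICE_OCCUPY))  # 1 2 6
```

The XOR isolates bits 1 and 2, and the group sizes 1 + 2 + 6 add up to the 9 logical CPUs, which is the split the table records.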

Test plan

  • task-submit --device auto --device-num 1 --run "query_device_hal $TASK_DEVICE" returned the values above (clean exit, no 0x2a / 507018).
  • Reproduced the same values across two consecutive runs (deterministic).
  • Confirmed host = Ascend950PR_9599 via npu-smi info -t board -i 0 and Short_SoC_version=Ascend950 in Ascend950PR_9599.ini.
  • npx markdownlint-cli2 --config tests/lint/.markdownlint.yaml clean on all touched files.

🤖 Generated with Claude Code

Ran tools/cann-examples/aicpu-device-query/ on Ascend950PR_9599 local
device 0 to close the open "what owns the 3-core gap between 9 logical
CPUs and 6 user-visible cores" question.

Key new finding: AICPU OCCUPY differs between host-side (0x1fe) and
device-side (0x1f8). The 2-bit gap (bits 1, 2) matches DSMI CPU_TOPO's
sole SMT pair on phy_cpu_id 1 — the AICPU OS withholds the hyperthread
pair from the user kernel dispatch pool to avoid intra-pair contention.

So the a5 9 → 6 gap resolves to:
- cpu_id 0 = AICPU OS scheduler (OS_SCHED bit 0)
- cpu_id 1, 2 = SMT pair, AICPU-OS-reserved (not PG-disabled — both
  present in host OCCUPY, unlike the a3 PG slot)
- cpu_id 3..8 = 6 user-schedulable cores

Replaces the "Two-layer AICPU reservation on a5" calibrated-inference
paragraph in src/a5/docs/hardware.md with a "Device-side probe resolves
the AICPU question" section in the same Slot|Owner|Evidence shape used
by the a3 doc. Adds device-side rows to the "Key semantic differences
from a3" table.

Also extends tools/cann-examples/aicpu-device-query/README.md with the
a5 "What it answered" block and drops the now-stale "a3 is the only
arch this has been validated on" note.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request updates the documentation for the a5 hardware architecture and its companion query tool to reflect new findings from a device-side probe. It details how the AICPU 9-to-6 core gap on the Ascend950 is resolved, showing that 1 core is reserved for the AICPU OS scheduler and 2 SMT-paired cores are withheld by the OS, leaving 6 user-schedulable cores. The tool documentation is updated to confirm validation on both a3 and a5 architectures. As there are no review comments, I have no feedback to provide.

@ChaoWao ChaoWao merged commit 69aa249 into hw-native-sys:main May 30, 2026
16 checks passed
@ChaoWao ChaoWao deleted the probe/a5-aicpu branch May 30, 2026 10:21