Update: close a5 AICPU split question with device-side probe #918
Ran tools/cann-examples/aicpu-device-query/ on Ascend950PR_9599 local device 0 to close the open "what owns the 3-core gap between 9 logical CPUs and 6 user-visible cores" question.

Key new finding: AICPU OCCUPY differs between host-side (0x1fe) and device-side (0x1f8). The 2-bit gap (bits 1, 2) matches DSMI CPU_TOPO's sole SMT pair on phy_cpu_id 1: the AICPU OS withholds the hyperthread pair from the user kernel dispatch pool to avoid intra-pair contention.

So the a5 9 → 6 gap resolves to:

- cpu_id 0 = AICPU OS scheduler (OS_SCHED bit 0)
- cpu_id 1, 2 = SMT pair, AICPU-OS-reserved (not PG-disabled: both are present in host OCCUPY, unlike the a3 PG slot)
- cpu_id 3..8 = 6 user-schedulable cores

Replaces the "Two-layer AICPU reservation on a5" calibrated-inference paragraph in src/a5/docs/hardware.md with a "Device-side probe resolves the AICPU question" section in the same Slot|Owner|Evidence shape used by the a3 doc. Adds device-side rows to the "Key semantic differences from a3" table.

Also extends tools/cann-examples/aicpu-device-query/README.md with the a5 "What it answered" block and drops the now-stale "a3 is the only arch this has been validated on" note.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
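The mask arithmetic behind this finding can be checked by hand. The following is a minimal Python sketch, not part of the query tool: it takes the two OCCUPY values quoted above as given, and the helper name `set_bits` is made up for illustration. It assumes bit i of the mask corresponds to cpu_id i, which is how the description above reads the values.

```python
# OCCUPY values as quoted in this PR for a5 (Ascend950PR_9599, device 0).
HOST_OCCUPY = 0x1FE    # host-side query result
DEVICE_OCCUPY = 0x1F8  # device-side query result


def set_bits(mask: int) -> list[int]:
    """Return the indices of the set bits in mask, lowest first (hypothetical helper)."""
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]


# Assumption: bit i of an OCCUPY mask stands for cpu_id i.
host_cpus = set_bits(HOST_OCCUPY)
device_cpus = set_bits(DEVICE_OCCUPY)
# XOR keeps exactly the bits on which the two views disagree.
host_only = set_bits(HOST_OCCUPY ^ DEVICE_OCCUPY)

print("host OCCUPY cpu_ids:  ", host_cpus)
print("device OCCUPY cpu_ids:", device_cpus)
print("host-only cpu_ids:    ", host_only)
```

The XOR isolates bits 1 and 2, the pair the description attributes to the SMT pair, and bit 0 is clear in both masks.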
Code Review
This pull request updates the documentation for the a5 hardware architecture and its companion query tool to reflect new findings from a device-side probe. It details how the AICPU 9-to-6 core gap on the Ascend950 is resolved, showing that 1 core is reserved for the AICPU OS scheduler and 2 SMT-paired cores are withheld by the OS, leaving 6 user-schedulable cores. The tool documentation is updated to confirm validation on both a3 and a5 architectures. As there are no review comments, I have no feedback to provide.
Summary
- Ran tools/cann-examples/aicpu-device-query/ (merged in Add: hardware docs and CANN query tools #883) on Ascend950PR_9599 local device 0 and used the results to replace the "calibrated inference" paragraph in src/a5/docs/hardware.md with directly measured device-side ground truth.
- AICPU + OCCUPY returns a different value host-side (0x1fe) vs. device-side (0x1f8). The 2-bit difference (bits 1, 2) coincides exactly with the SMT pair DSMI CPU_TOPO reports on phy_cpu_id 1. The AICPU OS withholds the SMT pair from the user kernel dispatch pool. So the a5 9 → 6 gap is 1 AICPU OS scheduler (cpu_id 0) + 2 SMT-pair cores (cpu_id 1, 2) withheld by the AICPU OS, not "AICPU-OS + PG fab-disable" by analogy with a3.
- src/a5/docs/hardware.md (main writeup), tools/cann-examples/aicpu-device-query/README.md (new "On a5" results block + drops the stale "a3 is the only validated arch" note), tools/README.md (one-line update to mention both arches).

Measured queries (a5, device 0)
Reconciliation

- cpu_id 1, 2: present in host OCCUPY (0x1fe), absent from device OCCUPY (0x1f8) → not PG-disabled. DSMI CPU_TOPO labels exactly this pair as the chip's only SMT pair.
- cpu_id 3..8: 6 user-schedulable cores, consistent with rtGetAiCpuCount=6 and PF_CORE_NUM=6.

Test plan

- task-submit --device auto --device-num 1 --run "query_device_hal $TASK_DEVICE" returned the values above (clean exit, no 0x2a / 507018).
- Confirmed Ascend950PR_9599 via npu-smi info -t board -i 0 and Short_SoC_version=Ascend950 in Ascend950PR_9599.ini.
- npx markdownlint-cli2 --config tests/lint/.markdownlint.yaml clean on all touched files.

🤖 Generated with Claude Code
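As a cross-check of the reconciliation, the slot ownership can be recomputed from the quoted numbers. This Python sketch is illustrative only and does not call any CANN or DSMI API. The constants are the values reported in this PR, the function name `owner` is hypothetical, and the mapping from mask membership to owner encodes this PR's interpretation rather than a documented rule.

```python
# Values quoted in this PR for a5 (Ascend950PR_9599, device 0).
LOGICAL_CPUS = 9
HOST_OCCUPY = 0x1FE
DEVICE_OCCUPY = 0x1F8
RT_GET_AICPU_COUNT = 6  # reported rtGetAiCpuCount
PF_CORE_NUM = 6         # reported PF_CORE_NUM


def owner(cpu_id: int) -> str:
    """Classify one cpu_id using the PR's reading of the two masks (hypothetical helper)."""
    in_host = (HOST_OCCUPY >> cpu_id) & 1
    in_device = (DEVICE_OCCUPY >> cpu_id) & 1
    if in_device:
        return "user-schedulable"
    if in_host:
        # Visible to the host but withheld on the device side.
        return "AICPU-OS-reserved SMT pair"
    # In neither mask: attributed by the PR to the AICPU OS scheduler.
    return "AICPU OS scheduler"


owners = {cpu_id: owner(cpu_id) for cpu_id in range(LOGICAL_CPUS)}
user_ids = [c for c, o in owners.items() if o == "user-schedulable"]

for cpu_id, o in owners.items():
    print(f"cpu_id {cpu_id}: {o}")

# The user-schedulable count should agree with both reported core counts.
assert len(user_ids) == RT_GET_AICPU_COUNT == PF_CORE_NUM
```

Under these assumptions the 9 logical CPUs split 1 + 2 + 6, which is the a5 9 → 6 gap described above.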