High performance Paged Attention example by MirkoDeVita98 · Pull Request #899 · hw-native-sys/simpler

MirkoDeVita98 · 2026-05-29T14:35:02Z

No description provided.

coderabbitai · 2026-05-29T14:35:21Z

Warning

Review limit reached

@MirkoDeVita98, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 56 minutes and 38 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 7b1bd167-c740-4710-8872-7301c29e27a4

📥 Commits

Reviewing files that changed from the base of the PR and between 22538de and 621782f.

📒 Files selected for processing (13)

tests/st/a2a3/tensormap_and_ringbuffer/spmd_paged_attention_highperf/kernels/.gitignore
tests/st/a2a3/tensormap_and_ringbuffer/spmd_paged_attention_highperf/kernels/README.md
tests/st/a2a3/tensormap_and_ringbuffer/spmd_paged_attention_highperf/kernels/aic/paged_attention_highperf.cpp
tests/st/a2a3/tensormap_and_ringbuffer/spmd_paged_attention_highperf/kernels/bench_pa_performance.py
tests/st/a2a3/tensormap_and_ringbuffer/spmd_paged_attention_highperf/kernels/compile.sh
tests/st/a2a3/tensormap_and_ringbuffer/spmd_paged_attention_highperf/kernels/kernel/pa_entry.cce
tests/st/a2a3/tensormap_and_ringbuffer/spmd_paged_attention_highperf/kernels/kernel/pa_kernel.cce
tests/st/a2a3/tensormap_and_ringbuffer/spmd_paged_attention_highperf/kernels/orchestration/paged_attention_highperf_orch.cpp
tests/st/a2a3/tensormap_and_ringbuffer/spmd_paged_attention_highperf/kernels/pa_accuracy.py
tests/st/a2a3/tensormap_and_ringbuffer/spmd_paged_attention_highperf/kernels/pa_tiling.py
tests/st/a2a3/tensormap_and_ringbuffer/spmd_paged_attention_highperf/kernels/paged_attention_wrapper.cpp
tests/st/a2a3/tensormap_and_ringbuffer/spmd_paged_attention_highperf/kernels/tiling/pa_tiling_struct.h
tests/st/a2a3/tensormap_and_ringbuffer/spmd_paged_attention_highperf/test_spmd_paged_attention_highperf.py


gemini-code-assist

Code Review

This pull request introduces a high-performance SPMD paged attention implementation, including C++ kernels, tiling logic, benchmarks, and correctness tests, alongside compiler updates to dynamically include CANN directories. The review feedback recommends adding defensive input validation in the tiling and kernel code to prevent division-by-zero and null pointer dereferences, checking environment variables in the compilation script, and adding future annotations in the test script.

gemini-code-assist · 2026-05-29T14:36:53Z

+    Returns:
+        (tiling_tensor, effective_block_dim)
+    """
+    kv_real = kv_heads if kv_heads > 0 else num_heads


Always perform defensive validation on input parameters before deriving other dependent variables from them to prevent potential division-by-zero or out-of-bounds errors (e.g., if batch, block_dim, num_heads, or block_size are zero or negative).

Suggested change

+    if batch <= 0 or block_dim <= 0 or num_heads <= 0 or block_size <= 0 or head_dim <= 0 or head_dim_v <= 0:
+        raise ValueError("Input dimensions (batch, block_dim, num_heads, block_size, head_dim, head_dim_v) must be strictly positive.")
     kv_real = kv_heads if kv_heads > 0 else num_heads

References

Always perform defensive validation and normalization/fixups on input parameters before deriving other dependent variables from them, even if the invalid input is theoretically unreachable in practice.
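The guard suggested above can be sketched as a small standalone helper. This is only an illustration: the function name `validate_tiling_inputs` and its signature are hypothetical and are not taken from `pa_tiling.py`; only the parameter names and the `kv_real` fallback come from the review comment.

```python
def validate_tiling_inputs(batch, block_dim, num_heads, block_size,
                           head_dim, head_dim_v, kv_heads=0):
    """Reject non-positive dimensions, then resolve the effective KV head count.

    Hypothetical helper: the name and signature are assumptions for illustration.
    """
    dims = {
        "batch": batch,
        "block_dim": block_dim,
        "num_heads": num_heads,
        "block_size": block_size,
        "head_dim": head_dim,
        "head_dim_v": head_dim_v,
    }
    # Collect every offending parameter so the error names all of them at once.
    bad = [name for name, value in dims.items() if value <= 0]
    if bad:
        raise ValueError(
            "Input dimensions must be strictly positive; got non-positive: "
            + ", ".join(bad)
        )
    # Same fallback as the quoted line: kv_heads <= 0 means "use num_heads".
    return kv_heads if kv_heads > 0 else num_heads
```

Validating before the `kv_real` fallback means that any later division by these values cannot hit zero.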

gemini-code-assist · 2026-05-29T14:36:53Z

+    const int batch = static_cast<int>(query_t->shapes[0]);
+    const int num_heads = static_cast<int>(query_t->shapes[1]);
+    const int head_dim = static_cast<int>(query_t->shapes[2]);
+    const int block_size = static_cast<int>(key_t->shapes[1]);
+    const int num_kv_heads = static_cast<int>(key_t->shapes[2]);
+    const int blocks_per_batch = static_cast<int>(key_t->shapes[0]) / batch;
+    const int max_blocks_per_query = static_cast<int>(block_table_t->shapes[1]);
+    const int heads_per_kv = num_heads / num_kv_heads;
+    const int seq_len = blocks_per_batch * block_size;
+    const float scale = 1.0f / std::sqrt(static_cast<float>(head_dim));


Defensively validate input dimensions to prevent potential division-by-zero or undefined behavior in CPU simulation if batch, num_kv_heads, head_dim, or block_size are zero or negative.

Suggested change

     const int batch = static_cast<int>(query_t->shapes[0]);
     const int num_heads = static_cast<int>(query_t->shapes[1]);
     const int head_dim = static_cast<int>(query_t->shapes[2]);
     const int block_size = static_cast<int>(key_t->shapes[1]);
     const int num_kv_heads = static_cast<int>(key_t->shapes[2]);
+    if (batch <= 0 || num_kv_heads <= 0 || head_dim <= 0 || block_size <= 0) {
+        return;
+    }
     const int blocks_per_batch = static_cast<int>(key_t->shapes[0]) / batch;
     const int max_blocks_per_query = static_cast<int>(block_table_t->shapes[1]);
     const int heads_per_kv = num_heads / num_kv_heads;
     const int seq_len = blocks_per_batch * block_size;
     const float scale = 1.0f / std::sqrt(static_cast<float>(head_dim));

References

Always perform defensive validation and normalization/fixups on input parameters before deriving other dependent variables from them, even if the invalid input is theoretically unreachable in practice.
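To make the risk concrete, the derivation quoted above can be mirrored in Python. This is a sketch under stated assumptions: the function `derive_attention_dims` is hypothetical, the shapes are passed as plain tuples rather than tensor objects, and returning `None` stands in for the early `return` in the suggested C++ change.

```python
import math


def derive_attention_dims(query_shape, key_shape, block_table_shape):
    """Mirror of the quoted C++ derivation (hypothetical helper for illustration)."""
    batch, num_heads, head_dim = query_shape[:3]
    total_blocks, block_size, num_kv_heads = key_shape[:3]
    max_blocks_per_query = block_table_shape[1]

    # batch and num_kv_heads are divisors below, and head_dim feeds 1/sqrt(head_dim),
    # so all of them must be strictly positive before anything is derived.
    if batch <= 0 or num_kv_heads <= 0 or head_dim <= 0 or block_size <= 0:
        return None

    # For positive operands, // matches C++ integer division.
    blocks_per_batch = total_blocks // batch
    heads_per_kv = num_heads // num_kv_heads
    return {
        "blocks_per_batch": blocks_per_batch,
        "max_blocks_per_query": max_blocks_per_query,
        "heads_per_kv": heads_per_kv,
        "seq_len": blocks_per_batch * block_size,
        "scale": 1.0 / math.sqrt(float(head_dim)),
    }


dims = derive_attention_dims((2, 32, 128), (16, 128, 8, 128), (2, 8))
print(dims["heads_per_kv"], dims["seq_len"])
```

With `batch` or `num_kv_heads` equal to zero, the unguarded C++ version would perform an integer division by zero, which is undefined behavior.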

gemini-code-assist · 2026-05-29T14:36:53Z

+static __aicore__ __attribute__((always_inline)) __gm__ uint8_t *tensor_data(__gm__ int64_t *args, int idx) {
+    __gm__ Tensor *tensor = reinterpret_cast<__gm__ Tensor *>(args[idx]);
+    return reinterpret_cast<__gm__ uint8_t *>(tensor->buffer.addr);
+}


Defensively check if tensor is null before dereferencing tensor->buffer.addr to prevent potential null pointer dereferences.

Suggested change

 static __aicore__ __attribute__((always_inline)) __gm__ uint8_t *tensor_data(__gm__ int64_t *args, int idx) {
     __gm__ Tensor *tensor = reinterpret_cast<__gm__ Tensor *>(args[idx]);
+    if (tensor == nullptr) {
+        return nullptr;
+    }
     return reinterpret_cast<__gm__ uint8_t *>(tensor->buffer.addr);
 }

gemini-code-assist · 2026-05-29T14:36:53Z

+#!/usr/bin/env bash
+set -euo pipefail
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+


Defensively check if ASCEND_TOOLKIT_HOME is set before executing bisheng to provide a clear and actionable error message.

Suggested change

 #!/usr/bin/env bash
 set -euo pipefail

 SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+
+if [ -z "${ASCEND_TOOLKIT_HOME:-}" ]; then
+    echo "Error: ASCEND_TOOLKIT_HOME environment variable is not set." >&2
+    exit 1
+fi
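If the build were ever driven from one of the Python scripts instead of from the shell, the same guard could look roughly like the sketch below. The helper `require_env` is hypothetical, and nothing in this PR indicates that `compile.sh` is invoked from Python; the example only shows the fail-early pattern.

```python
import os


def require_env(name, environ=None):
    """Return the value of an environment variable, or fail with a clear message.

    Hypothetical helper; `environ` defaults to os.environ and exists mainly
    so the check can be exercised without touching the real environment.
    """
    env = os.environ if environ is None else environ
    value = env.get(name, "")
    if not value:
        # Treat unset and empty the same way, like `[ -z "${VAR:-}" ]` in bash.
        raise RuntimeError(f"Error: {name} environment variable is not set.")
    return value
```

A caller would run `require_env("ASCEND_TOOLKIT_HOME")` before launching the compiler, so a missing toolkit path is reported directly rather than as an obscure failure later in the build.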

gemini-code-assist · 2026-05-29T14:36:53Z

+#!/usr/bin/env python3
+"""High-performance SPMD paged attention using the scene-test calling interface."""
+


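Include from __future__ import annotations at the top of the file so that annotations are not evaluated at runtime. This lets PEP 585 generic collections (like tuple[...], built in from Python 3.9) and PEP 604 unions (like int | None, built in from Python 3.10) appear in annotations without runtime errors on earlier Python versions.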

Suggested change

 #!/usr/bin/env python3
 """High-performance SPMD paged attention using the scene-test calling interface."""
+
+from __future__ import annotations

References

In projects targeting Python versions earlier than 3.10 (such as Python 3.9), include 'from __future__ import annotations' to enable the use of PEP 604 union type hints (e.g., 'int | None') and avoid runtime errors.
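The effect of the future import can be observed directly: annotations are stored as strings and never evaluated when the function is defined. The sketch below compiles a small module source to show this; the function `build_tiling` and its signature are hypothetical and are not taken from the test file.

```python
# Hypothetical module source; the function name and signature are illustrative only.
SOURCE = '''
from __future__ import annotations

def build_tiling(batch: int, kv_heads: int | None = None) -> tuple[int, int]:
    return batch, kv_heads or 0
'''

namespace = {}
exec(compile(SOURCE, "<tiling_demo>", "exec"), namespace)
build_tiling = namespace["build_tiling"]

# With the future import, each annotation is kept as a plain string, so the
# interpreter never has to evaluate `int | None` or `tuple[int, int]`.
annotations = build_tiling.__annotations__
print(type(annotations["kv_heads"]).__name__, annotations["kv_heads"])
print(type(annotations["return"]).__name__, annotations["return"])
```

Tools that resolve annotations at runtime, such as `typing.get_type_hints`, would still evaluate these strings, so the import only helps where annotations are used purely as hints.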

MirkoDeVita98 force-pushed the pr-655-work branch from a305ea5 to 4f2365b Compare May 29, 2026 14:36

gemini-code-assist Bot reviewed May 29, 2026


working a2a3sim for paged attention with new calling interface

621782f

MirkoDeVita98 force-pushed the pr-655-work branch from 4f2365b to 621782f Compare May 29, 2026 14:38

MirkoDeVita98 mentioned this pull request May 29, 2026

[Bug] A2A3 spmd_paged_attention_highperf hardware run times out or produces partial zero output while a2a3sim passes #900

Open

MirkoDeVita98 wants to merge 1 commit into hw-native-sys:main from MirkoDeVita98:pr-655-work
