
Fix heap OOB write in Attention PrepareMask via negative mask_index values #27789

Merged
vraspar merged 3 commits into main from vraspar/attention-op
Apr 6, 2026

Conversation

@vraspar
Contributor

@vraspar vraspar commented Mar 20, 2026

Description

Fixes a heap out-of-bounds write (underflow) in the Attention contrib operator's PrepareMask function. Negative values in the 1D mask_index tensor were used directly as loop start indices without bounds checking, allowing writes at negative offsets before the p_mask buffer.

In PrepareMask() (attention_helper.h), end_position is read from mask_index[b_i] and used as the starting index of a write loop with no lower-bound validation. When end_position is negative, the loop writes mask_filter_value at negative offsets, a heap buffer underflow. start_position was partially clamped via std::min() but likewise lacked a lower bound.

Motivation and Context

Contributor

Copilot AI left a comment


Pull request overview

This PR addresses a heap buffer underflow risk in the CPU Attention contrib operator’s PrepareMask path when mask_index contains negative 1D end positions, and adds coverage to ensure the invalid input is rejected.

Changes:

  • Clamp 1D mask_index start/end positions in CPU PrepareMask to prevent negative indexing.
  • Add CPU-side input validation to reject negative 1D end positions in AttentionBase::CheckMask.
  • Add a regression test that expects failure for negative 1D end positions.
  • Clamp CUDA softmax’s end_position to be non-negative.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

Reviewed files:

  • onnxruntime/test/contrib_ops/attention_op_test.cc: Adds a regression test for negative 1D mask_index end positions.
  • onnxruntime/contrib_ops/cuda/bert/attention_softmax.cu: Ensures the CUDA masking end position is clamped to >= 0.
  • onnxruntime/contrib_ops/cpu/bert/attention_helper.h: Clamps 1D mask start/end positions to prevent negative indexing in the mask fill loops.
  • onnxruntime/contrib_ops/cpu/bert/attention_base.h: Adds explicit validation rejecting negative 1D end positions in mask_index.


Comment thread onnxruntime/contrib_ops/cpu/bert/attention_helper.h
Comment thread onnxruntime/contrib_ops/cpu/bert/attention_base.h
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.



hariharans29
hariharans29 previously approved these changes Mar 30, 2026
@tianleiwu
Contributor

onnxruntime_provider_test segfaults on CUDA CI (ContribOpAttentionTest.AttentionBatch1) due to a crash in the CUDA Attention kernel for the configuration batch=1, seq=2, hidden=4, heads=2 with a 1D key-sequence-length mask.

@vraspar
Contributor Author

vraspar commented Apr 3, 2026

@tianleiwu Thanks! I fixed it.

Contributor

@tianleiwu tianleiwu left a comment


1. Input Validation (onnxruntime/contrib_ops/cpu/bert/attention_base.h)

Positive:

  • The validation is placed at the right layer: inside CheckMask, which is on the validation path for both the CPU and CUDA Attention operators (the CUDA Attention::ComputeInternal calls CheckInputs, which in turn calls CheckMask).
  • The error message is clear and actionable, including the offending value and index, which aids debugging.
  • The validation fires for all three 1D mask types (MASK_1D_KEY_SEQ_LEN, MASK_1D_END_START, MASK_1D_KEY_SEQ_LEN_START), which is correct since the first batch_size elements serve as key-sequence-length / end-position in all three formats.

Concern:

  • ⚠️ Incomplete validation for MASK_1D_END_START start positions: For the 2 * batch_size mask format, the PR validates that the end_position values at indices [0, batch_size) are non-negative but does not validate the start_position values at indices [batch_size, 2 * batch_size). Although the runtime clamping in PrepareMask and the CUDA kernels prevents OOB access from negative start positions, this creates an asymmetry: a negative end_position produces a clear error, while a negative start_position is silently clamped to 0. For consistent user-facing behavior, consider extending the loop to also validate start positions when mask_type == MASK_1D_END_START:
    // Also validate start_position values for MASK_1D_END_START
    if (mask_type == AttentionMaskType::MASK_1D_END_START) {
      for (int64_t i = 0; i < batch_size; i++) {
        if (mask_data[batch_size + i] < 0) {
          return ORT_MAKE_STATUS(ONNXRUNTIME, INVALID_ARGUMENT,
                                 "mask_index start_position value ", mask_data[batch_size + i],
                                 " at index ", batch_size + i,
                                 " is negative. mask_index start_position values must be non-negative.");
        }
      }
    }

2. CPU Runtime Clamping (onnxruntime/contrib_ops/cpu/bert/attention_helper.h)

Positive:

  • Defense in depth: even though the validation in CheckMask should prevent negative values from reaching PrepareMask, the clamping std::max(0, std::min(..., all_sequence_length)) ensures safety if PrepareMask is ever called from a path that bypasses validation.
  • Both end_position and start_position are clamped symmetrically with the same [0, all_sequence_length] range, which is correct.
  • The static_cast<int> on mask_index[b_i] makes the int32_t-to-int conversion explicit, avoiding potential type-mismatch issues with std::min.
  • The #include <algorithm> addition is correct. The previous code relied on a transitive include for std::min (likely via <limits> or platform headers), which is fragile, so this change also fixes a latent issue.

3. CUDA Kernel Clamping (onnxruntime/contrib_ops/cuda/bert/attention_softmax.cu)

Positive:

  • Both MaskedSoftmaxKernelSmall and MaskedSoftmaxKernel are fixed consistently, wrapping end_position with max(0, min(...)). This matches the CPU-side defense-in-depth approach.
  • The existing start_position logic already had max(0, mask_start[batch]), so the fix correctly targets only the missing lower-bound clamp on end_position.
  • The fix sits in the threadIdx.x == 0 shared-memory initialization, so it runs once per block with negligible performance impact.

Concern:

  • ⚠️ No upper-bound clamp on start_position in CUDA: The CUDA kernel computes start_position = mask_start != nullptr ? max(0, mask_start[batch]) : 0, which applies a lower-bound clamp but no upper-bound clamp against total_sequence_length. A start_position exceeding total_sequence_length would make start_position >= end_position, which triggers the existing fallback (start_position = 0; end_position = total_sequence_length;). So this is functionally safe due to the existing guard, but it bears noting for completeness. No action needed.

4. Unit Test (onnxruntime/test/contrib_ops/attention_op_test.cc)

Positive:

  • The test directly targets the security vulnerability: negative end_position values ({-10, -1}) in a 1D mask_index, verifying the validation produces a failure.
  • Correct use of OpTester::ExpectResult::kExpectFailure with the error string prefix "mask_index value".
  • The test explicitly pins to the CPU execution provider via DefaultCpuExecutionProvider(), which is appropriate since the validation fires in the shared CheckMask code before EP-specific execution.

Concern:

  • ⚠️ No test for MASK_1D_END_START format with negative values: The test covers MASK_1D_KEY_SEQ_LEN (mask dims = {batch_size}). Consider adding a test with mask_index_dims = {2 * batch_size} and negative values in the first batch_size elements to also cover the MASK_1D_END_START path. Negative start positions (second half) should also be tested if validation is extended per the suggestion above.

Summary of Concerns

  1. Suggestion (attention_base.h validation): No validation for negative start_position values in the MASK_1D_END_START format; creates a silent-clamp vs. error asymmetry.
  2. Suggestion (attention_op_test.cc): The test only covers MASK_1D_KEY_SEQ_LEN; no coverage for the MASK_1D_END_START mask format with negative values.
  3. Nitpick (attention_softmax.cu): CUDA start_position lacks an upper-bound clamp, but the existing start >= end guard makes this safe.

Verdict

APPROVE — The PR correctly fixes a real heap buffer underflow vulnerability with a well-layered defense strategy (validation + clamping). The suggestions above would improve validation completeness and test coverage, but the core fix is sound and the remaining gaps are mitigated by the runtime clamping already in place.

@vraspar vraspar merged commit f427e3e into main Apr 6, 2026
108 of 112 checks passed
@vraspar vraspar deleted the vraspar/attention-op branch April 6, 2026 19:43
