
Fix heap OOB write in Attention PrepareMask via negative mask_index values #27789

Merged
vraspar merged 3 commits into main from vraspar/attention-op
Apr 6, 2026

Conversation

@vraspar
Contributor

@vraspar vraspar commented Mar 20, 2026

Description

Fixes a heap out-of-bounds write (underflow) in the Attention contrib operator's PrepareMask function. Negative values in the 1D mask_index tensor were used directly as loop start indices without bounds checking, allowing writes at negative offsets before the p_mask buffer.

In PrepareMask() (attention_helper.h), end_position is read from mask_index[b_i] and used as the starting index of a write loop with no lower-bound validation. When end_position is negative, the loop writes mask_filter_value at negative offsets, a heap buffer underflow. start_position was partially clamped via std::min() but likewise lacked a lower bound.

Motivation and Context

Contributor

Copilot AI left a comment


Pull request overview

This PR addresses a heap buffer underflow risk in the CPU Attention contrib operator’s PrepareMask path when mask_index contains negative 1D end positions, and adds coverage to ensure the invalid input is rejected.

Changes:

  • Clamp 1D mask_index start/end positions in CPU PrepareMask to prevent negative indexing.
  • Add CPU-side input validation to reject negative 1D end positions in AttentionBase::CheckMask.
  • Add a regression test that expects failure for negative 1D end positions.
  • Clamp CUDA softmax’s end_position to be non-negative.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

Reviewed files:

  • onnxruntime/test/contrib_ops/attention_op_test.cc: Adds a regression test for negative 1D mask_index end positions.
  • onnxruntime/contrib_ops/cuda/bert/attention_softmax.cu: Ensures the CUDA masking end position is clamped to >= 0.
  • onnxruntime/contrib_ops/cpu/bert/attention_helper.h: Clamps 1D mask start/end positions to prevent negative indexing in the mask fill loops.
  • onnxruntime/contrib_ops/cpu/bert/attention_base.h: Adds explicit validation rejecting negative 1D end positions in mask_index.


Comment thread onnxruntime/contrib_ops/cpu/bert/attention_helper.h
Comment thread onnxruntime/contrib_ops/cpu/bert/attention_base.h
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.



hariharans29
hariharans29 previously approved these changes Mar 30, 2026
@tianleiwu
Contributor

onnxruntime_provider_test segfaults on CUDA CI (ContribOpAttentionTest.AttentionBatch1) due to a crash in the CUDA Attention kernel for the configuration batch=1, seq=2, hidden=4, heads=2 with a 1D key-sequence-length mask.

@vraspar
Contributor Author

vraspar commented Apr 3, 2026

@tianleiwu Thanks! I fixed it.

Contributor

@tianleiwu tianleiwu left a comment


1. Input Validation (onnxruntime/contrib_ops/cpu/bert/attention_base.h)

Positive:

  • The validation is placed at the right layer: inside CheckMask, which is on the validation path for both the CPU and CUDA Attention operators (the CUDA Attention::ComputeInternal calls CheckInputs, which in turn calls CheckMask).
  • The error message is clear and actionable, including the offending value and index, which aids debugging.
  • The validation fires for all three 1D mask types (MASK_1D_KEY_SEQ_LEN, MASK_1D_END_START, MASK_1D_KEY_SEQ_LEN_START), which is correct since the first batch_size elements serve as key-sequence-length / end-position in all three formats.

Concern:

  • ⚠️ Incomplete validation for MASK_1D_END_START start positions: For the 2 * batch_size mask format, the PR validates that the end_position values at indices [0, batch_size) are non-negative but does not validate the start_position values at indices [batch_size, 2 * batch_size). Although the runtime clamping in PrepareMask and the CUDA kernels prevents OOB access from negative start positions, this creates an asymmetry: a negative end_position produces a clear error, while a negative start_position is silently clamped to 0. For consistent user-facing behavior, consider extending the loop to also validate start positions when mask_type == MASK_1D_END_START:
    // Also validate start_position values for MASK_1D_END_START
    if (mask_type == AttentionMaskType::MASK_1D_END_START) {
      for (int64_t i = 0; i < batch_size; i++) {
        if (mask_data[batch_size + i] < 0) {
          return ORT_MAKE_STATUS(ONNXRUNTIME, INVALID_ARGUMENT,
                                 "mask_index start_position value ", mask_data[batch_size + i],
                                 " at index ", batch_size + i,
                                 " is negative. mask_index start_position values must be non-negative.");
        }
      }
    }

2. CPU Runtime Clamping (onnxruntime/contrib_ops/cpu/bert/attention_helper.h)

Positive:

  • Defense in depth: even though the validation in CheckMask should prevent negative values from reaching PrepareMask, the clamping std::max(0, std::min(..., all_sequence_length)) ensures safety if PrepareMask is ever called from a path that bypasses validation.
  • Both end_position and start_position are clamped symmetrically with the same [0, all_sequence_length] range, which is correct.
  • The static_cast<int> on mask_index[b_i] makes the int32_t-to-int conversion explicit, avoiding potential type-mismatch issues with std::min.
  • The #include <algorithm> addition is correct. The previous code relied on a transitive include for std::min (likely via <limits> or platform headers), which is fragile, so this change also fixes a latent issue.

3. CUDA Kernel Clamping (onnxruntime/contrib_ops/cuda/bert/attention_softmax.cu)

Positive:

  • Both MaskedSoftmaxKernelSmall and MaskedSoftmaxKernel are fixed consistently, wrapping end_position with max(0, min(...)). This matches the CPU-side defense-in-depth approach.
  • The existing start_position logic already had max(0, mask_start[batch]), so the fix correctly targets only the missing lower-bound clamp on end_position.
  • The fix sits in the threadIdx.x == 0 shared-memory initialization, so it runs once per block with negligible performance impact.

Concern:

  • ⚠️ No upper-bound clamp on start_position in CUDA: The CUDA kernel computes start_position = mask_start != nullptr ? max(0, mask_start[batch]) : 0, which applies a lower-bound clamp but no upper-bound clamp against total_sequence_length. A start_position exceeding total_sequence_length would make start_position >= end_position, which triggers the existing fallback (start_position = 0; end_position = total_sequence_length;). So this is functionally safe due to the existing guard, but it bears noting for completeness. No action needed.

4. Unit Test (onnxruntime/test/contrib_ops/attention_op_test.cc)

Positive:

  • The test directly targets the security vulnerability: negative end_position values ({-10, -1}) in a 1D mask_index, verifying the validation produces a failure.
  • Correct use of OpTester::ExpectResult::kExpectFailure with the error string prefix "mask_index value".
  • The test explicitly pins to the CPU execution provider via DefaultCpuExecutionProvider(), which is appropriate since the validation fires in the shared CheckMask code before EP-specific execution.

Concern:

  • ⚠️ No test for MASK_1D_END_START format with negative values: The test covers MASK_1D_KEY_SEQ_LEN (mask dims = {batch_size}). Consider adding a test with mask_index_dims = {2 * batch_size} and negative values in the first batch_size elements to also cover the MASK_1D_END_START path. Negative start positions (second half) should also be tested if validation is extended per the suggestion above.

Summary of Concerns

  1. Suggestion (attention_base.h validation): No validation for negative start_position values in the MASK_1D_END_START format; creates a silent-clamp vs. error asymmetry.
  2. Suggestion (attention_op_test.cc): The test only covers MASK_1D_KEY_SEQ_LEN; no coverage for the MASK_1D_END_START mask format with negative values.
  3. Nitpick (attention_softmax.cu): CUDA start_position lacks an upper-bound clamp, but the existing start >= end guard makes this safe.

Verdict

APPROVE — The PR correctly fixes a real heap buffer underflow vulnerability with a well-layered defense strategy (validation + clamping). The suggestions above would improve validation completeness and test coverage, but the core fix is sound and the remaining gaps are mitigated by the runtime clamping already in place.

@vraspar vraspar merged commit f427e3e into main Apr 6, 2026
108 of 112 checks passed
@vraspar vraspar deleted the vraspar/attention-op branch April 6, 2026 19:43
