Fix heap OOB write in Attention PrepareMask via negative mask_index values#27789
Conversation
Pull request overview
This PR addresses a heap buffer underflow risk in the CPU Attention contrib operator’s PrepareMask path when mask_index contains negative 1D end positions, and adds coverage to ensure the invalid input is rejected.
Changes:
- Clamp 1D `mask_index` start/end positions in CPU `PrepareMask` to prevent negative indexing.
- Add CPU-side input validation to reject negative 1D end positions in `AttentionBase::CheckMask`.
- Add a regression test that expects failure for negative 1D end positions; also clamp CUDA softmax's `end_position` to be non-negative.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| onnxruntime/test/contrib_ops/attention_op_test.cc | Adds a regression test for negative 1D mask_index end positions. |
| onnxruntime/contrib_ops/cuda/bert/attention_softmax.cu | Ensures CUDA masking end position is clamped to >= 0. |
| onnxruntime/contrib_ops/cpu/bert/attention_helper.h | Clamps 1D mask start/end positions to prevent negative indexing in mask fill loops. |
| onnxruntime/contrib_ops/cpu/bert/attention_base.h | Adds explicit validation rejecting negative 1D end positions in mask_index. |
Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.
onnxruntime_provider_test segfaults on CUDA CI (ContribOpAttentionTest.AttentionBatch1) due to a crash in the CUDA Attention kernel for the configuration batch=1, seq=2, hidden=4, heads=2 with a 1D key-sequence-length mask.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@tianleiwu Thanks! I fixed it.
tianleiwu left a comment:
1. Input Validation (onnxruntime/contrib_ops/cpu/bert/attention_base.h)
Positive:
- The validation is placed at the right layer: inside `CheckMask`, which is on the validation path for both the CPU and CUDA Attention operators (the CUDA `Attention::ComputeInternal` calls `CheckInputs` → `CheckMask`).
- The error message is clear and actionable, including the offending value and index, which aids debugging.
- The validation fires for all three 1D mask types (`MASK_1D_KEY_SEQ_LEN`, `MASK_1D_END_START`, `MASK_1D_KEY_SEQ_LEN_START`), which is correct since the first `batch_size` elements serve as key-sequence-length / end-position values in all three formats.
Concern:
⚠️ Incomplete validation for `MASK_1D_END_START` start positions: For the `2 * batch_size` mask format, the PR validates that the `end_position` values at indices `[0, batch_size)` are non-negative but does not validate the `start_position` values at indices `[batch_size, 2 * batch_size)`. Although the runtime clamping in `PrepareMask` and the CUDA kernels prevents OOB access from negative start positions, this creates an asymmetry: a negative `end_position` produces a clear error, while a negative `start_position` is silently clamped to 0. For consistent user-facing behavior, consider extending the loop to also validate start positions when `mask_type == MASK_1D_END_START`:

```cpp
// Also validate start_position values for MASK_1D_END_START
if (mask_type == AttentionMaskType::MASK_1D_END_START) {
  for (int64_t i = 0; i < batch_size; i++) {
    if (mask_data[batch_size + i] < 0) {
      return ORT_MAKE_STATUS(ONNXRUNTIME, INVALID_ARGUMENT,
                             "mask_index start_position value ", mask_data[batch_size + i],
                             " at index ", batch_size + i,
                             " is negative. mask_index start_position values must be non-negative.");
    }
  }
}
```
2. CPU Runtime Clamping (onnxruntime/contrib_ops/cpu/bert/attention_helper.h)
Positive:
- Defense in depth: even though the validation in `CheckMask` should prevent negative values from reaching `PrepareMask`, the clamping `std::max(0, std::min(..., all_sequence_length))` ensures safety if `PrepareMask` is ever called from a path that bypasses validation.
- Both `end_position` and `start_position` are clamped symmetrically to the same `[0, all_sequence_length]` range, which is correct.
- The `static_cast<int>` on `mask_index[b_i]` makes the `int32_t` → `int` conversion explicit, avoiding potential type-mismatch issues with `std::min`.
Positive:
- The `#include <algorithm>` addition is correct. The previous code relied on a transitive include for `std::min` (likely via `<limits>` or platform headers), which is fragile; this fix addresses that latent issue. Good catch.
3. CUDA Kernel Clamping (onnxruntime/contrib_ops/cuda/bert/attention_softmax.cu)
Positive:
- Both `MaskedSoftmaxKernelSmall` and `MaskedSoftmaxKernel` are fixed consistently, wrapping `end_position` with `max(0, min(...))`. This matches the CPU-side defense-in-depth approach.
- The existing `start_position` logic already had `max(0, mask_start[batch])`, so the fix correctly targets only the missing lower-bound clamp on `end_position`.
- The fix is in the `threadIdx.x == 0` shared-memory initialization, so it runs once per block with no performance impact.
Concern:
⚠️ No upper-bound clamp on `start_position` in CUDA: The CUDA kernel computes `start_position = mask_start != nullptr ? max(0, mask_start[batch]) : 0`; this has a lower-bound clamp but no upper-bound clamp against `total_sequence_length`. A very large `start_position` exceeding `total_sequence_length` would make `start_position >= end_position`, which triggers the fallback `start_position = 0; end_position = total_sequence_length;`. So this is functionally safe due to the existing guard, but it bears noting for completeness. No action needed.
4. Unit Test (onnxruntime/test/contrib_ops/attention_op_test.cc)
Positive:
- The test directly targets the security vulnerability: negative `end_position` values (`{-10, -1}`) in a 1D `mask_index`, verifying the validation produces a failure.
- Correct use of `OpTester::ExpectResult::kExpectFailure` with the error string prefix `"mask_index value"`.
- The test explicitly pins to the CPU execution provider via `DefaultCpuExecutionProvider()`, which is appropriate since the validation fires in the shared `CheckMask` code before EP-specific execution.
Concern:
⚠️ No test for the `MASK_1D_END_START` format with negative values: The test covers `MASK_1D_KEY_SEQ_LEN` (mask dims = `{batch_size}`). Consider adding a test with `mask_index_dims = {2 * batch_size}` and negative values in the first `batch_size` elements to also cover the `MASK_1D_END_START` path. Negative start positions (the second half) should also be tested if validation is extended per the suggestion above.
Summary of Concerns
| # | Severity | Component | Issue |
|---|---|---|---|
| 1 | Suggestion | attention_base.h validation | No validation for negative `start_position` values in the `MASK_1D_END_START` format; creates a silent-clamp vs. error asymmetry. |
| 2 | Suggestion | attention_op_test.cc | Test only covers `MASK_1D_KEY_SEQ_LEN`; no coverage for the `MASK_1D_END_START` mask format with negative values. |
| 3 | Nitpick | attention_softmax.cu | CUDA `start_position` lacks an upper-bound clamp, but the existing `start >= end` guard makes this safe. |
Verdict
APPROVE — The PR correctly fixes a real heap buffer underflow vulnerability with a well-layered defense strategy (validation + clamping). The suggestions above would improve validation completeness and test coverage, but the core fix is sound and the remaining gaps are mitigated by the runtime clamping already in place.
Description
Fixes a heap out-of-bounds write (underflow) in the `Attention` contrib operator's `PrepareMask` function. Negative values in the 1D `mask_index` tensor were used directly as loop start indices without bounds checking, allowing writes at negative offsets before the `p_mask` buffer.

In `PrepareMask()` (attention_helper.h), `end_position` is read from `mask_index[b_i]` and used as the starting index in a write loop with no lower-bound validation. When `end_position` is negative, the loop writes `mask_filter_value` at negative offsets, a heap buffer underflow. In contrast, `start_position` had partial clamping via `std::min()` but also lacked a lower bound.

Motivation and Context