fix(ptobc): support dense executed constants in v0 #717

Draft
HecreReed wants to merge 1 commit into hw-native-sys:main from HecreReed:codex/issue18-ptobc-dense-const-v0

@HecreReed
Collaborator

Summary

  • support DenseElementsAttr constants in compact v0 ptobc encoding
  • decode dense constant payloads back into MLIR for roundtrip coverage
  • add a regression test for tmrgsort executed constants with vector<4xi16>
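The encode/decode roundtrip described above can be sketched as follows. This is a minimal illustration, not the ptobc implementation: the helper names `bytesPerElement`, `packDense`, and `unpackDense` are hypothetical, and the little-endian byte order is an assumption, since the PR description does not state the on-disk layout. Only the `(bitWidth + 7) / 8` per-element size is taken from the reviewed code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helpers for illustration only; not part of ptobc.

// Bytes used to store one element of the given bit width.
inline std::size_t bytesPerElement(unsigned bitWidth) {
  return (bitWidth + 7) / 8;
}

// Pack the low bytesPerElement(bitWidth) bytes of each element,
// least significant byte first (assumed byte order).
inline std::vector<std::uint8_t>
packDense(const std::vector<std::uint64_t> &elems, unsigned bitWidth) {
  assert(bitWidth >= 1 && bitWidth <= 64);
  const std::size_t byteLen = bytesPerElement(bitWidth);
  std::vector<std::uint8_t> out;
  out.reserve(elems.size() * byteLen);
  for (std::uint64_t v : elems)
    for (std::size_t b = 0; b < byteLen; ++b)
      out.push_back(static_cast<std::uint8_t>(v >> (8 * b)));
  return out;
}

// Rebuild the elements from a payload, rejecting sizes that overflow
// or that do not match the declared element count.
inline std::vector<std::uint64_t>
unpackDense(const std::vector<std::uint8_t> &bytes, std::size_t numElements,
            unsigned bitWidth) {
  assert(bitWidth >= 1 && bitWidth <= 64);
  const std::size_t byteLen = bytesPerElement(bitWidth);
  if (numElements != 0 && byteLen > SIZE_MAX / numElements)
    throw std::runtime_error("dense payload size overflow");
  if (bytes.size() != numElements * byteLen)
    throw std::runtime_error("dense payload byte_len mismatch");
  std::vector<std::uint64_t> elems(numElements, 0);
  for (std::size_t i = 0; i < numElements; ++i)
    for (std::size_t b = 0; b < byteLen; ++b)
      elems[i] |= static_cast<std::uint64_t>(bytes[i * byteLen + b]) << (8 * b);
  return elems;
}
```

For a `vector<4xi16>` constant, `packDense` yields an 8-byte payload under these assumptions, and `unpackDense(bytes, 4, 16)` returns the original four values.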

Validation

  • cmake --build /Users/laoda/pto/PTOAS/_pr_issue18_fix/build-ptobc --target ptobc -j4
  • ctest --test-dir /Users/laoda/pto/PTOAS/_pr_issue18_fix/build-ptobc --output-on-failure -R 'ptobc_mrgsort_dense_const_v0_encode|ptobc_tstore_fp_v0_encode|ptobc_tdequant_v0_encode'
  • /Users/laoda/pto/PTOAS/_pr_issue18_fix/build-ptobc/tools/ptobc/ptobc encode /Users/laoda/pto/PTOAS/_pr_issue18_fix/test/lit/pto/tmrgsort_executed_constant_emitc.pto -o /tmp/tmrgsort_executed_constant_emitc.ptobc

@reedhecre

reedhecre commented May 28, 2026

Codex Review

This comment is updated automatically by the review bot.

  • PR: #717 fix(ptobc): support dense executed constants in v0
  • Author: HecreReed
  • Base/Head: main / codex/issue18-ptobc-dense-const-v0
  • Head SHA: 6cf4ea594fd2
  • Trigger: new open PR detected
  • Generated At: 2026-05-28T02:32:25Z
  • Status: completed

Summary

No issues were found in PR #717.

Findings

No issues found.


@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request adds support for encoding and decoding dense constant attributes (DenseElementsAttr) in the ptobc tool, along with a corresponding roundtrip test. The code review identified critical security vulnerabilities in ptobc_decode_print.cpp where integer overflows could occur when multiplying numElements by byteLen during decoding and attribute reconstruction (for both integer and float types). These overflows could bypass size validation and lead to out-of-bounds reads, so the reviewer recommended adding overflow checks before performing the multiplications.

Comment on lines +263 to +265
uint64_t numElements = shapedType.getNumElements();
uint64_t byteLen = (bitWidth + 7) / 8;
return r.readBytes(size_t(numElements * byteLen));

security-critical critical

Security Vulnerability: Integer Overflow leading to Out-of-Bounds Read

When decoding a dense constant, numElements and byteLen are multiplied to determine the total number of bytes to read. If a maliciously crafted .ptobc file specifies an extremely large shape, this multiplication can wrap around: both operands are uint64_t, and the product is then cast to size_t. This causes r.readBytes to allocate and read a much smaller buffer than expected, which subsequently leads to out-of-bounds heap reads and potential crashes during attribute reconstruction.

To prevent this, we must check for integer overflow before performing the multiplication.

  uint64_t numElements = shapedType.getNumElements();
  uint64_t byteLen = (bitWidth + 7) / 8;
  if (numElements > 0 && byteLen > SIZE_MAX / numElements)
    throw std::runtime_error("dense const size overflow");
  return r.readBytes(size_t(numElements * byteLen));
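To make the wraparound concrete, here is a small self-contained sketch that contrasts the unchecked product with the guarded one. The function names `uncheckedByteSize` and `checkedByteSize` are hypothetical and exist only for this illustration; the guard itself is the same division-based check as in the suggestion above. The example values assume a 64-bit `size_t`.

```cpp
#include <cassert>  // for the usage asserts mentioned below
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Hypothetical names for illustration; not part of ptobc.

// Unchecked product, as in the original snippet: wraps modulo 2^64.
inline std::uint64_t uncheckedByteSize(std::uint64_t numElements,
                                       std::uint64_t byteLen) {
  return numElements * byteLen;
}

// Checked product: throws instead of wrapping or truncating.
// If the guard passes, numElements * byteLen <= SIZE_MAX, so the cast is safe.
inline std::size_t checkedByteSize(std::uint64_t numElements,
                                   std::uint64_t byteLen) {
  if (numElements != 0 && byteLen > SIZE_MAX / numElements)
    throw std::runtime_error("dense const size overflow");
  return static_cast<std::size_t>(numElements * byteLen);
}
```

With `numElements = 1ULL << 62` and `byteLen = 4`, the unchecked product is exactly 2^64 and wraps to 0, so a reader would be asked for zero bytes, while `checkedByteSize` throws. For ordinary inputs such as `checkedByteSize(4, 2)`, both versions agree on 8.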

Comment on lines +413 to +415
unsigned byteLen = (bitWidth + 7) / 8;
if (entry.denseBytes.size() != size_t(numElements) * byteLen)
throw std::runtime_error("ConstDenseBits integer byte_len mismatch");

security-critical critical

Security Vulnerability: Integer Overflow leading to Out-of-Bounds Read

Similarly to the decoding phase, we must guard against integer overflow when validating the byte length of integer dense constants during attribute reconstruction. If numElements * byteLen overflows size_t, the mismatch check can be bypassed, leading to out-of-bounds reads in the subsequent loop.

Add an overflow check before validating the buffer size.

Suggested change
- unsigned byteLen = (bitWidth + 7) / 8;
- if (entry.denseBytes.size() != size_t(numElements) * byteLen)
-   throw std::runtime_error("ConstDenseBits integer byte_len mismatch");
+ unsigned byteLen = (bitWidth + 7) / 8;
+ if (numElements > 0 && byteLen > SIZE_MAX / numElements)
+   throw std::runtime_error("ConstDenseBits integer size overflow");
+ if (entry.denseBytes.size() != size_t(numElements) * byteLen)
+   throw std::runtime_error("ConstDenseBits integer byte_len mismatch");

Comment on lines +431 to +433
unsigned byteLen = (bitWidth + 7) / 8;
if (entry.denseBytes.size() != size_t(numElements) * byteLen)
throw std::runtime_error("ConstDenseBits float byte_len mismatch");

security-critical critical

Security Vulnerability: Integer Overflow leading to Out-of-Bounds Read

Similarly to the integer path, we must guard against integer overflow when validating the byte length of float dense constants during attribute reconstruction.

Add an overflow check before validating the buffer size.

Suggested change
- unsigned byteLen = (bitWidth + 7) / 8;
- if (entry.denseBytes.size() != size_t(numElements) * byteLen)
-   throw std::runtime_error("ConstDenseBits float byte_len mismatch");
+ unsigned byteLen = (bitWidth + 7) / 8;
+ if (numElements > 0 && byteLen > SIZE_MAX / numElements)
+   throw std::runtime_error("ConstDenseBits float size overflow");
+ if (entry.denseBytes.size() != size_t(numElements) * byteLen)
+   throw std::runtime_error("ConstDenseBits float byte_len mismatch");
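Since the integer and float paths apply the same two checks, one option is to factor them into a single helper. The sketch below is only a suggestion: `validateDenseByteLen` and its `kind` parameter are hypothetical names, not something the PR defines, and the error strings are copied from the review suggestions. The overflow example in the note assumes a 64-bit `size_t`.

```cpp
#include <cassert>  // for the usage asserts mentioned below
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper for illustration; not part of ptobc.
// `kind` is "integer" or "float" and only affects the error message.
inline void validateDenseByteLen(std::size_t actualBytes,
                                 std::uint64_t numElements, unsigned bitWidth,
                                 const std::string &kind) {
  const unsigned byteLen = (bitWidth + 7) / 8;
  // Reject element counts whose byte size cannot be represented in size_t.
  if (numElements > 0 && byteLen > SIZE_MAX / numElements)
    throw std::runtime_error("ConstDenseBits " + kind + " size overflow");
  // Only now is the product safe to compare against the payload size.
  if (actualBytes != static_cast<std::size_t>(numElements) * byteLen)
    throw std::runtime_error("ConstDenseBits " + kind + " byte_len mismatch");
}
```

Calling `validateDenseByteLen(8, 4, 16, "integer")` passes for a `vector<4xi16>` payload. A call with an empty payload, `numElements = 1ULL << 62`, and a 32-bit element width would slip past an unguarded mismatch check, because the wrapped product is 0, but here it throws the overflow error first.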

@HecreReed
Collaborator Author

/run all

@reedhecre

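Received /run all; the A3 board tester will handle this request.

The page refreshes automatically, so you can check the current stage, queue status, and recent results directly.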

@reedhecre

A3 board test failed

  • Trigger mode: manual
  • Source commit: ac2f8f48d86d
  • Result summary: OK 217 / FAIL 2 / SKIP 1
  • Log: /home/zhongxuan/ptoas-board-monitor/runtime/logs/20260528_220305_manual_pr717.log
  • Manual command: /run all
  • Triggered by: HecreReed
  • Trigger comment: fix(ptobc): support dense executed constants in v0 #717 (comment)
  • Failed stage: board-validation / exit=1

Failed cases

  • syncall_binding (run, exit=1)
  • tprefetch_async_binding (run, exit=1)

@reedhecre

A3 board test failure details: PR #717

syncall_binding

stage=run info=exit=1

[ERROR] aclrtSynchronizeStream(stream) failed: 507014 (/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260528_220305_manual_pr717/npu_validation/SyncAll/syncall_binding/main.cpp:84)
[ERROR] RecentErrMsg: EZ9999: Inner Error!
EZ9999[PID: 586012] 2026-05-28-22:46:20.971.603 (EZ9999):  The error from device(chipId:2, dieId:0), serial number is 353, there is an exception of aicore error, core id is 17, error code = 0, dump info: pc start: 0x124800000000, current: 0x124800000188, vec error info: 0, mte error info: 0x210300004c, ifu error info: 0x212c200024600, ccu error info: 0x40a01900000000be, cube error info: 0, biu error info: 0, aic error mask: 0x6500020bd00028c, para base: 0x12c100000000.[FUNC:PrintCoreInfo][FILE:device_error_core_proc.cc][LINE:645]
        TraceBack (most recent call last):
       The extend info: errcode:(0, 0, 0) errorStr: timeout or trap error. fixp_error0 info: 0x300004c, fixp_error1 info: 0x21, fsmId:0, tslot:3, thread:0, ctxid:0, blk:0, sublk:0, subErrType:4.[FUNC:PrintCoreInfo][FILE:device_error_core_proc.cc][LINE:658]
       Kernel task happen error, retCode=0x25, [aicore timeout].[FUNC:PreCheckTaskErr][FILE:davinci_kernel_task.cc][LINE:1729]
       AICORE Kernel task happen error, retCode=0x25.[FUNC:GetError][FILE:stream.cc][LINE:1475]
       [AIC_INFO] after execute:args print end[FUNC:GetError][FILE:stream.cc][LINE:1475]
       [DFX_INFO]Aicore kernel execute failed, device_id=4, stream_id=46, report_stream_id=46, task_id=0, flip_num=0, fault kernel_name=_Z22syncall_binding_kernelPii, fault kernel info ext=_Z22syncall_binding_kernelPii, program id=0, hash=3129332313788381512.[FUNC:GetError][FILE:stream.cc][LINE:1475]
       rtStreamSynchronize execution failed, reason=aicore timeout[FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:65]
       synchronize stream failed, runtime result = 507014[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:148]
[2026-05-28 22:46:22] ERROR: testcase failed (exit 1): syncall_binding

tprefetch_async_binding

stage=run info=exit=1

[ERROR] aclrtSynchronizeStream(stream) failed: 507035 (/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260528_220305_manual_pr717/npu_validation/TPrefetchAsync/tprefetch_async_binding/main.cpp:91)
[ERROR] RecentErrMsg: EZ9999: Inner Error!
EZ9999[PID: 721068] 2026-05-28-22:46:58.714.516 (EZ9999):  The error from device(chipId:2, dieId:0), serial number is 354, there is an exception of aivec error, core id is 37, error code = 0, dump info: pc start: 0x124800000000, current: 0x124800000160, vec error info: 0x1e000000a8, mte error info: 0xa50312808b, ifu error info: 0x200008112db40, ccu error info: 0x52, cube error info: 0, biu error info: 0, aic error mask: 0x6500020bd00028c, para base: 0x12c100000000.[FUNC:PrintCoreInfo][FILE:device_error_core_proc.cc][LINE:645]
        TraceBack (most recent call last):
       The extend info: errcode:(0, 0x200000000000000, 0) errorStr: The MPU address access is invalid. fixp_error0 info: 0x312808b, fixp_error1 info: 0xa5, fsmId:0, tslot:3, thread:0, ctxid:0, blk:0, sublk:0, subErrType:4.[FUNC:PrintCoreInfo][FILE:device_error_core_proc.cc][LINE:658]
       Kernel task happen error, retCode=0x31, [vector core exception].[FUNC:PreCheckTaskErr][FILE:davinci_kernel_task.cc][LINE:1729]
       AIV Kernel happen error, retCode=0x31.[FUNC:GetError][FILE:stream.cc][LINE:1475]
       [AIC_INFO] after execute:args print end[FUNC:GetError][FILE:stream.cc][LINE:1475]
       [DFX_INFO]Aicore kernel execute failed, device_id=4, stream_id=46, report_stream_id=46, task_id=0, flip_num=0, fault kernel_name=_Z30tprefetch_async_binding_kernelPfPa, fault kernel info ext=_Z30tprefetch_async_binding_kernelPfPa, program id=0, hash=8435686547367685641.[FUNC:GetError][FILE:stream.cc][LINE:1475]
       rtStreamSynchronize execution failed, reason=vector core exception[FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:65]
       synchronize stream failed, runtime result = 507035[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:148]
[2026-05-28 22:47:00] ERROR: testcase failed (exit 1): tprefetch_async_binding
