Port frontend tile fusion to EmitC mainline #704

Open
Likai-19 wants to merge 2 commits into hw-native-sys:main from Likai-19:tile_front_fusion

@Likai-19

Summary

Reintroduce frontend tile fusion on the current A5 EmitC mainline behind
--enable-op-fusion, but keep the implementation intentionally small:

  • run fusion planning and scheduling on tile-native PTO IR before
    PTOViewToMemref
  • mark fused tile ops with pto.last_use directly on scheduled block-local
    spans
  • preserve the final EmitC contract by emitting
    [[pto::last_use(... )]] CALLEE(...)
  • do not introduce or preserve a pto.fusion_region / pto.yield
    lifecycle in the shared mainline

In other words, this PR keeps the user-visible goal of "frontend op scheduling
+ final last_use emission", while removing the larger FusionRegion-based IR
contract from the implementation.

What changed

Driver and pipeline

  • add --enable-op-fusion on the current ptoas driver
  • gate it to --pto-arch=a5 with --pto-level=level2|level3
  • run the frontend fusion core on tile-native PTO IR:
    • FusionPlan
    • OpScheduling
    • PTOMarkLastUse
  • keep this pipeline before PTOViewToMemref
  • leave unsupported configurations on the ordinary unfused path with warnings
    instead of failing compilation
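
The gating rules above can be pictured with a minimal sketch. It is a standalone model rather than the ptoas driver code: the `FusionGate` struct, the `decideOpFusion` function, and the warning strings are hypothetical names chosen for illustration. Only the flag names and the a5 / level2|level3 restriction come from the description.

```cpp
#include <cassert>
#include <string>

// Hypothetical model of the driver gate; not the actual ptoas implementation.
struct FusionGate {
  bool runFusion;       // true if the frontend fusion pipeline should run
  std::string warning;  // non-empty when the request falls back to the unfused path
};

// Decide whether --enable-op-fusion takes effect for the given arch and level.
// Unsupported configurations warn and fall back instead of failing compilation.
FusionGate decideOpFusion(bool enableOpFusion, const std::string &arch,
                          const std::string &level) {
  if (!enableOpFusion)
    return {false, ""};
  if (arch != "a5")
    return {false, "--enable-op-fusion ignored: requires --pto-arch=a5"};
  if (level != "level2" && level != "level3")
    return {false,
            "--enable-op-fusion ignored: requires --pto-level=level2|level3"};
  return {true, ""};
}
```

The point of returning a warning rather than an error is that the same command line keeps compiling on unsupported targets, just without fusion.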

Frontend fusion core

  • port the tile-fusion planning/scheduling support needed on the current
    mainline:
    • FusionAnalysis
    • FusionOpSemantics
    • PTOFusionPlan
    • PTOOpScheduling
  • represent accepted fusion groups as contiguous scheduled spans in a block
    rather than wrapping them in a region op

last_use implementation

  • introduce PTOMarkLastUse as the place that computes pto.last_use
  • make the analysis span-based instead of region/yield-based:
    • collect each contiguous scheduled group span from
      pto.fusion.group_id / pto.fusion.order
    • compute last-use per tile operand slot inside that span
    • block a bit if the tile value is used later in the same span
    • also block a bit if the tile value is used later in the parent block after
      the span
  • encode last_use per tile operand slot, with the following rules:
    • scalar operands do not occupy slots
    • DPS init / output tile slots are preserved but always stay 0
    • repeated SSA tile operands are evaluated independently per slot
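
As a rough illustration of the slot rules above, here is a sketch over a simplified block model. The `Operand` and `Op` structs and the `computeLastUseMask` helper are assumptions made for this example (the real pass works on MLIR operations and the pto.fusion.* attributes). It also assumes that any later operand appearance, including as an output, counts as a later use.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified stand-in for an operand of a fused tile op (not the MLIR types).
struct Operand {
  int value;       // SSA value id
  bool isTile;     // scalars do not occupy a slot
  bool isDpsInit;  // DPS init / output tile: slot kept, bit forced to 0
};

struct Op {
  std::vector<Operand> operands;
};

// True if `value` appears as an operand of any op in block[from, end).
static bool usedAtOrAfter(const std::vector<Op> &block, std::size_t from,
                          int value) {
  for (std::size_t i = from; i < block.size(); ++i)
    for (const Operand &o : block[i].operands)
      if (o.value == value)
        return true;
  return false;
}

// Computes one bit per tile operand slot of block[opIndex]. Scanning to the
// end of the block covers both "later in the same span" and "later in the
// parent block after the span".
std::vector<int> computeLastUseMask(const std::vector<Op> &block,
                                    std::size_t opIndex) {
  assert(opIndex < block.size());
  std::vector<int> mask;
  for (const Operand &o : block[opIndex].operands) {
    if (!o.isTile)
      continue;           // scalar operand: no slot at all
    if (o.isDpsInit) {
      mask.push_back(0);  // output slot is preserved but always 0
      continue;
    }
    // Each slot is evaluated on its own, so a value repeated within one op
    // gets the same answer in every slot it occupies.
    mask.push_back(usedAtOrAfter(block, opIndex + 1, o.value) ? 0 : 1);
  }
  return mask;
}
```

For an op that reads tiles %1 and %2 and writes %3, followed by an op that reads %1 again, this model yields the mask 0, 1, 0: %1 is blocked by the later use, %2 is a last use, and the output slot stays 0.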

EmitC last_use output

  • keep the final output contract as [[pto::last_use(... )]] CALLEE(...)
  • lower marked fused tile ops through a PTOAS-local marker callee path in
    PTOToEmitC
  • rewrite that marker to the final C++ attribute spelling in
    CppPostprocess
  • fix marker bit ordering so single-DPS-init tile intrinsics follow the final
    emitted operand order, which keeps the output tile slot at 0 in the final
    emitted attribute

Explicit non-goals / removed scope

  • no pto.fusion_region
  • no pto.yield
  • no PTOFusionRegionGen
  • no PTOFlattenFusionRegion
  • no shared-pass preservation contract for fusion-region lifecycle through
    PTOViewToMemref, memory planning, reserved-buffer resolution, sync
    insertion, or tile-handle materialization

Why this shape

The original larger port bundled three concerns together:

  1. frontend fusion planning/scheduling
  2. region formation / flattening
  3. final EmitC last_use emission

For the current goal, only (1) and (3) are essential. This PR keeps the
useful part of the feature and localizes the extra complexity to
PTOMarkLastUse, instead of requiring multiple existing shared passes to
understand and preserve a new region lifecycle.

Testing

Added focused tile-fusion coverage for:

  • fusion planning:
    • join
    • diamond
    • interleaved join
    • treshape boundary
    • dynamic-shape negative case
  • scheduling:
    • basic compaction
    • treshape bridge
    • pure-op bridge
    • negative region / call / SSA boundary cases
  • last_use:
    • slot-mask encoding
    • repeated SSA operands
    • post-span later-use blocking
  • end-to-end EmitC output:
    • final [[pto::last_use(... )]] emission
    • absence of residual pto.fusion_region / pto.yield
  • control surface:
    • CLI visibility / gating
    • non-fused fallback behavior
    • adapter placement in level2 and level3 shared lowering paths

Focused verification run:

  • llvm-lit -sv build/test/lit/tile_fusion


@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a frontend tile-fusion optimization pipeline for the A5 EmitC mainline, adding passes for fusion planning, instruction scheduling, and last-use marking, along with supporting semantic analyses and C++ post-processing. The review feedback highlights a critical scheduling bug in PTOOpScheduling.cpp where moving only the placement operator breaks the contiguity of the fusion group. Additionally, improvements are suggested to translate a Chinese comment to English in PTOMarkLastUse.cpp, replace std::isdigit with llvm::isDigit in CppPostprocess.cpp to prevent potential undefined behavior, and simplify a redundant ArrayRef conversion in FusionAnalysis.cpp.

!canMoveLaterAcross(placement, blockingOp))
break;

placement->moveAfter(blockingOp);
high

Moving only placement later via placement->moveAfter(blockingOp) leaves the previously scheduled members of the group behind, which breaks the contiguity of the fusion group. To maintain contiguity, all previously scheduled members of the group must be moved together with placement, or the scheduling logic should be revised to avoid breaking contiguity.
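
To make the contiguity concern concrete, here is a small sketch on a toy block of uniquely named ops. `Block`, `isContiguous`, and `moveGroupAfter` are hypothetical helpers, not functions from PTOOpScheduling.cpp. The sketch also skips the legality check, which in the real pass would have to confirm that every moved member may cross the blocking op.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

using Block = std::vector<std::string>;  // unique op names in program order

static bool contains(const std::vector<std::string> &v, const std::string &s) {
  return std::find(v.begin(), v.end(), s) != v.end();
}

// True if all `members` occupy one uninterrupted run of `block`.
bool isContiguous(const Block &block, const std::vector<std::string> &members) {
  std::size_t first = block.size(), last = 0, count = 0;
  for (std::size_t i = 0; i < block.size(); ++i) {
    if (!contains(members, block[i]))
      continue;
    first = std::min(first, i);
    last = std::max(last, i);
    ++count;
  }
  return count == 0 || last - first + 1 == count;
}

// Moves every op in `members` (keeping their relative order) to just after
// `blockingOp`, instead of moving a single member and stranding the rest.
Block moveGroupAfter(const Block &block, const std::vector<std::string> &members,
                     const std::string &blockingOp) {
  assert(!contains(members, blockingOp));
  Block moved, result;
  for (const std::string &op : block)
    if (contains(members, op))
      moved.push_back(op);
  for (const std::string &op : block) {
    if (contains(members, op))
      continue;
    result.push_back(op);
    if (op == blockingOp)
      result.insert(result.end(), moved.begin(), moved.end());
  }
  return result;
}
```

With the block `a, b, x, c` and group `{a, b, c}`, moving only `b` after `x` gives `a, x, b, c`, which is no longer one span. Moving the already scheduled members `a, b` together gives `x, a, b, c`, which is.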

lastUseMask.push_back(0);
continue;
}
// isSpanLocalLastUseCandidate的检查范围大于hasLaterUseAfterSpan
medium

Please translate this Chinese comment to English to maintain consistency and readability for international developers.

Suggested change
// isSpanLocalLastUseCandidate的检查范围大于hasLaterUseAfterSpan
// The check scope of isSpanLocalLastUseCandidate is larger than hasLaterUseAfterSpan

: encoded.slice(pos, next);
if (token.empty())
return false;
if (!llvm::all_of(token, [](char c) { return std::isdigit(c); }))
medium

Using std::isdigit with a char argument can lead to undefined behavior if the character is signed and has a negative value. It is safer and more idiomatic in LLVM/MLIR to use llvm::isDigit.

Suggested change
if (!llvm::all_of(token, [](char c) { return std::isdigit(c); }))
if (!llvm::all_of(token, [](char c) { return llvm::isDigit(c); }))
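
For readers without the LLVM headers at hand, the hazard and the two usual remedies can be sketched in standard C++. The helper names below are illustrative. The range comparison mirrors what llvm::isDigit does as far as I know, and the `unsigned char` cast is the standard way to call `std::isdigit` safely.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Locale-independent ASCII digit test; safe for any char value.
inline bool isAsciiDigit(char c) { return c >= '0' && c <= '9'; }

// If std::isdigit is kept, convert to unsigned char first: passing a negative
// value other than EOF is undefined behavior.
inline bool isDigitViaStd(char c) {
  return std::isdigit(static_cast<unsigned char>(c)) != 0;
}

// Mirrors the token check in the reviewed snippet: non-empty and digits only.
bool allDigits(const std::string &token) {
  if (token.empty())
    return false;
  for (char c : token)
    if (!isAsciiDigit(c))
      return false;
  return true;
}
```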

if (info.vRow == ShapedType::kDynamic || info.vCol == ShapedType::kDynamic)
info.unprovenReason = IterationDomainUnprovenReason::DynamicShape;

for (Value value : ArrayRef<Value>(anchorValues).drop_front()) {
medium

The explicit conversion to ArrayRef<Value> is redundant because anchorValues is already an ArrayRef<Value>. You can simplify this by calling drop_front() directly on anchorValues.

Suggested change
for (Value value : ArrayRef<Value>(anchorValues).drop_front()) {
for (Value value : anchorValues.drop_front()) {

@reedhecre

reedhecre commented May 26, 2026

Codex Review

This comment is updated automatically by the review bot.

  • PR: #704 Port frontend tile fusion to EmitC mainline
  • Author: Likai-19
  • Base/Head: main / tile_front_fusion
  • Head SHA: 433027ba6541
  • Trigger: new commits pushed to the PR
  • Generated At: 2026-05-31T09:46:42Z
  • Previous Head SHA: 71cd913377fd
  • Status: completed

Summary

PR #704 has two merge-blocking issues in op-fusion scheduling / last_use C++ rewriting, plus one important hard-boundary contract mismatch across the new fusion passes.

Findings

  1. P1 Pure helper ops are still treated as hard scheduling barriers (lib/PTO/Transforms/TileFusion/PTOOpScheduling.cpp:85)

classifySchedulingBarrier() consults getFusionOpSemantics() before it falls back to isMemoryEffectFree(). For ops such as arith.constant / arith.index_cast, getFusionOpSemantics() returns FusionOpKind::HardBoundary, so the scheduler refuses to move fused ops across them. That means groups separated only by pure helper ops stay split, which contradicts the intended behavior and makes the newly added test/lit/tile_fusion/op_scheduling_pure_op_bridge.pto case fail. This is both a CI blocker and a real fusion regression for otherwise legal cases.
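
One possible check ordering that avoids this is sketched below on a flattened op description. The `OpInfo` struct and `isHardSchedulingBarrier` are hypothetical, and this is not presented as the fix the PR should adopt. It only shows that testing purity before the catch-all boundary case lets arith.constant-style helpers through.

```cpp
#include <cassert>

// Hypothetical flattened view of the properties the scheduler looks at.
struct OpInfo {
  bool isTerminator;
  bool hasRegions;
  bool isCall;
  bool memoryEffectFree;
  bool isFusibleTileOp;
};

// Structural boundaries stay hard. Everything else is a barrier only if it
// has memory effects, so pure helper ops no longer split a group.
bool isHardSchedulingBarrier(const OpInfo &op) {
  if (op.isTerminator || op.hasRegions || op.isCall)
    return true;
  if (op.isFusibleTileOp)
    return false;
  return !op.memoryEffectFree;
}
```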

  2. P1 Last-use marker rewriting breaks on templated intrinsics (lib/PTO/Transforms/CppPostprocess.cpp:60)

rewriteLastUseMarkersInCpp() assumes the marker name ends immediately before (. For fused calls that also carry template args, EmitC prints forms like PTOAS__LAST_USE__TEXP__...<pto::ExpAlgorithm::HIGH_PRECISION>(...) (same issue for TDIV / TROWEXPANDDIV). The <...> suffix becomes part of the encoded marker payload, parseLastUseMarkerName() rejects it as non-digit text, and the synthetic PTOAS__LAST_USE__... callee is left in the final C++. Any precision-tagged fused op therefore emits invalid/unrewritten C++ instead of [[pto::last_use(...)]] CALLEE<...>(...).
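
A sketch of a template-aware rewrite follows. The marker layout used here, `PTOAS__LAST_USE__<CALLEE>__<bit>_<bit>...`, and the comma-separated attribute arguments are assumptions for illustration only, since the real encoding is not shown in this thread. `rewriteLastUseCallee` is likewise a hypothetical helper. The idea is to split off any `<...>` suffix before parsing the payload and re-attach it to the callee.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Rewrites the text that precedes '(' in a call. Assumed marker layout:
//   PTOAS__LAST_USE__<CALLEE>__<bit>_<bit>...[<template args>]
// Returns false (leaving `out` untouched) if `text` is not a valid marker.
bool rewriteLastUseCallee(const std::string &text, std::string &out) {
  const std::string prefix = "PTOAS__LAST_USE__";
  if (text.compare(0, prefix.size(), prefix) != 0)
    return false;
  // Split off an optional template-argument suffix before parsing the payload.
  const std::size_t lt = text.find('<');
  const std::string name =
      lt == std::string::npos
          ? text.substr(prefix.size())
          : text.substr(prefix.size(), lt - prefix.size());
  const std::string templ = lt == std::string::npos ? "" : text.substr(lt);
  const std::size_t sep = name.rfind("__");
  if (sep == std::string::npos || sep == 0 || sep + 2 >= name.size())
    return false;
  const std::string callee = name.substr(0, sep);
  const std::string bits = name.substr(sep + 2);
  std::string args;
  bool expectBit = true;
  for (char c : bits) {
    if (expectBit) {
      if (c != '0' && c != '1')
        return false;
      if (!args.empty())
        args += ", ";
      args += c;
    } else if (c != '_') {
      return false;
    }
    expectBit = !expectBit;
  }
  if (expectBit)
    return false;  // trailing '_' leaves a bit missing
  out = "[[pto::last_use(" + args + ")]] " + callee + templ;
  return true;
}
```

Under these assumptions, `PTOAS__LAST_USE__TEXP__1_0<pto::ExpAlgorithm::HIGH_PRECISION>` becomes `[[pto::last_use(1, 0)]] TEXP<pto::ExpAlgorithm::HIGH_PRECISION>`, so the template arguments end up on the callee rather than inside the digit payload.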

  3. P2 Fusion planning and last-use marking disagree with scheduling on hard boundaries (lib/PTO/Transforms/TileFusion/PTOFusionPlan.cpp:85)

FusionPlan only treats terminators, region ops and calls as hard boundaries. OpScheduling is stricter and blocks motion across any memory-effecting non-fusion op via !isMemoryEffectFree(). As a result, patterns like tadd ; tstore ; tadd can still be given one pto.fusion.group_id, but the group can never be compacted into one contiguous span. PTOMarkLastUse uses the same narrow barrier notion, so it can still attach pto.last_use to these unschedulable groups. That breaks the pass contract that pto.last_use only annotates scheduled contiguous fusion spans and needs boundary alignment (or at least a negative regression test).
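
Boundary alignment could look roughly like the sketch below, where planning uses the same barrier predicate the scheduler would use. The model is deliberately simplistic: `PlanOp`, `isFusionBarrier`, and `assignGroups` are hypothetical, grouping is purely positional rather than dataflow-driven, and terminators, region ops, and calls are assumed to be represented as non-pure ops.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct PlanOp {
  std::string name;
  bool fusible;           // candidate tile op
  bool memoryEffectFree;  // only consulted for non-fusible ops
};

// One shared notion of "cannot fuse or move across this op".
bool isFusionBarrier(const PlanOp &op) {
  return !op.fusible && !op.memoryEffectFree;
}

// Assigns a group id per op, or -1 for ops outside any group. A barrier closes
// the current group, so planning never creates a group that scheduling cannot
// compact into one span.
std::vector<int> assignGroups(const std::vector<PlanOp> &block) {
  std::vector<int> ids(block.size(), -1);
  int current = 0;
  bool open = false;
  for (std::size_t i = 0; i < block.size(); ++i) {
    if (block[i].fusible) {
      ids[i] = current;
      open = true;
    } else if (isFusionBarrier(block[i]) && open) {
      ++current;
      open = false;
    }
  }
  return ids;
}
```

In this model `tadd ; tstore ; tadd` receives group ids 0, -1, 1, while `tadd ; arith.constant ; tadd` keeps a single group 0, -1, 0.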

@Likai-19 Likai-19 force-pushed the tile_front_fusion branch 3 times, most recently from 0cfcd64 to 71cd913 Compare May 31, 2026 07:33
@Likai-19 Likai-19 force-pushed the tile_front_fusion branch from 71cd913 to 433027b Compare May 31, 2026 08:46
@Likai-19
Author

/run a5

@reedhecre

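Received /run a5; the A5 board tester will process this request.

The page refreshes automatically, so you can check the current stage, queue status, and latest results directly.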

@reedhecre

A5 board validation failed

  • Trigger mode: manual
  • Source commit: 0a977549b996
  • Result summary: OK 212 / FAIL 7 / SKIP 1
  • Log: /root/ptoas-board-monitor-a5/logs/20260531_165906_manual_pr704.log
  • Manual command: /run a5
  • Triggered by: Likai-19
  • Trigger comment: Port frontend tile fusion to EmitC mainline #704 (comment)
  • Failed stage: board-validation / exit=1

Failed cases

  • tprefetch_async_binding (run, exit=1)
  • syncall_binding (run, exit=1)
  • rowsum (run, exit=139)
  • plan_memory_loop_no_reuse_outer_live (run, exit=139)
  • cmps (run, exit=2)
  • cmp (run, exit=2)
  • addptr_dynamic (run, exit=139)

@reedhecre

A5 board validation failure details: PR #704

tprefetch_async_binding

stage=run info=exit=1

[ERROR] aclrtSynchronizeStream(stream) failed: 507035 (/tmp/ptoas-board-monitor-a5/runs/20260531_165906_manual_pr704/npu_validation/TPrefetchAsync/tprefetch_async_binding/main.cpp:91)
[ERROR] RecentErrMsg: EZ9999: Inner Error!
EZ9999[PID: 3215142] 2026-05-31-17:07:23.119.214 (EZ9999):  The error from device(chipId:0, dieId:0), serial number is 9, there is an aivec error exception, core id is 2, error code = 271, dump info: pc start: 0x100040800000, current: 0x1000408001a0, sc error info: 0xffffffffffff, su error info: 0xd8df85df2c9c0059,0x80400000f000cb7f, mte error info: 0x63bd187100077964, vec error info: 0xb3ceb62f007cefcf, cube error info: 0, l1 error info: 0, aic error mask: 0x395856, para base: 0x100040200000, mte error: 0.[FUNC:ProcessDavidStarsCoreErrorInfo][FILE:device_error_proc_c.cc][LINE:580]
        TraceBack (most recent call last):
       The extend info: errcode:(271) errorStr: The MPU address access is invalid. subErrType: 0x4.[FUNC:ProcessDavidStarsCoreErrorInfo][FILE:device_error_proc_c.cc][LINE:583]
       Kernel task happen error, retCode=0x31, [vector core exception].[FUNC:PreCheckTaskErr][FILE:davinci_kernel_task.cc][LINE:1728]
       AIV Kernel happen error, retCode=0x31.[FUNC:GetError][FILE:stream.cc][LINE:1478]
       [AIC_INFO] after execute:args print end[FUNC:GetError][FILE:stream.cc][LINE:1478]
       [DFX_INFO]Aicore kernel execute failed, device_id=0, stream_id=62, report_stream_id=62, task_id=0, flip_num=0, fault kernel_name=_Z30tprefetch_async_binding_kernelPfPa, fault kernel info ext=_Z30tprefetch_async_binding_kernelPfPa, program id=0, hash=1899772384034012286.[FUNC:GetError][FILE:stream.cc][LINE:1478]
       rtStreamSynchronize execution failed, reason=vector core exception[FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:65]
       synchronize stream failed, runtime result = 507035[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:148]
[2026-05-31 17:07:28] ERROR: testcase failed (exit 1): tprefetch_async_binding

syncall_binding

stage=run info=exit=1

[ERROR] aclrtSynchronizeStream(stream) failed: 507015 (/tmp/ptoas-board-monitor-a5/runs/20260531_165906_manual_pr704/npu_validation/SyncAll/syncall_binding/main.cpp:84)
[ERROR] RecentErrMsg: EZ9999: Inner Error!
EZ9999[PID: 3219053] 2026-05-31-17:08:51.456.953 (EZ9999):  The error from device(chipId:0, dieId:0), serial number is 10, there is an aicore error exception, core id is 1, error code = 259, dump info: pc start: 0x100040800000, current: 0x10004080010c, sc error info: 0xffffffffffff, su error info: 0xffffffff0f980030,0x720422087000f7ff, mte error info: 0x3decfd0f0006bfbf, vec error info: 0, cube error info: 0, l1 error info: 0xf192000ae49c, aic error mask: 0x395856, para base: 0x100040200000, mte error: 0.[FUNC:ProcessDavidStarsCoreErrorInfo][FILE:device_error_proc_c.cc][LINE:580]
        TraceBack (most recent call last):
       The extend info: errcode:(259) errorStr: Illegal instruction, which is usually caused by unaligned UUB addresses. subErrType: 0x4.[FUNC:ProcessDavidStarsCoreErrorInfo][FILE:device_error_proc_c.cc][LINE:583]
       Kernel task happen error, retCode=0x26, [aicore exception].[FUNC:PreCheckTaskErr][FILE:davinci_kernel_task.cc][LINE:1728]
       AICORE Kernel task happen error, retCode=0x26.[FUNC:GetError][FILE:stream.cc][LINE:1478]
       [AIC_INFO] after execute:args print end[FUNC:GetError][FILE:stream.cc][LINE:1478]
       [DFX_INFO]Aicore kernel execute failed, device_id=0, stream_id=62, report_stream_id=62, task_id=0, flip_num=0, fault kernel_name=_Z22syncall_binding_kernelPii, fault kernel info ext=_Z22syncall_binding_kernelPii, program id=0, hash=9475521060208115623.[FUNC:GetError][FILE:stream.cc][LINE:1478]
       rtStreamSynchronize execution failed, reason=aicore exception[FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:65]
       synchronize stream failed, runtime result = 507015[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:148]
[2026-05-31 17:08:57] ERROR: testcase failed (exit 1): syncall_binding

rowsum

stage=run info=exit=139

./test/npu_validation/scripts/run_remote_npu_validation.sh: line 366: 3256967 Segmentation fault      (core dumped) LD_LIBRARY_PATH="${LD_LIBRARY_PATH_NPU}" ./build/${testcase}
[2026-05-31 17:22:34] ERROR: testcase failed (exit 139): rowsum

plan_memory_loop_no_reuse_outer_live

stage=run info=exit=139

./test/npu_validation/scripts/run_remote_npu_validation.sh: line 366: 3297945 Segmentation fault      (core dumped) LD_LIBRARY_PATH="${LD_LIBRARY_PATH_NPU}" ./build/${testcase}
[2026-05-31 17:36:52] ERROR: testcase failed (exit 139): plan_memory_loop_no_reuse_outer_live

cmps

stage=run info=exit=2

[ERROR] Packed mask mismatch: golden_v2.bin vs v2.bin, idx=4 (golden=98, out=0)
[ERROR] compare failed
[2026-05-31 18:00:06] ERROR: testcase failed (exit 2): cmps

cmp

stage=run info=exit=2

[ERROR] Packed mask mismatch: golden_v3.bin vs v3.bin, idx=4 (golden=49, out=0)
[ERROR] compare failed
[2026-05-31 18:00:18] ERROR: testcase failed (exit 2): cmp

addptr_dynamic

stage=run info=exit=139

./test/npu_validation/scripts/run_remote_npu_validation.sh: line 366: 3375169 Segmentation fault      (core dumped) LD_LIBRARY_PATH="${LD_LIBRARY_PATH_NPU}" ./build/${testcase}
[2026-05-31 18:02:44] ERROR: testcase failed (exit 139): addptr_dynamic

@Likai-19
Author

/run a3

@reedhecre

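Received /run a3; the A3 board tester will process this request.

The page refreshes automatically, so you can check the current stage, queue status, and latest results directly.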

@reedhecre

A3 board validation failed

  • Trigger mode: manual
  • Source commit: 0a977549b996
  • Result summary: OK 217 / FAIL 2 / SKIP 1
  • Log: /home/zhongxuan/ptoas-board-monitor/runtime/logs/20260531_182105_manual_pr704.log
  • Manual command: /run a3
  • Triggered by: Likai-19
  • Trigger comment: Port frontend tile fusion to EmitC mainline #704 (comment)
  • Failed stage: board-validation / exit=1

Failed cases

  • syncall_binding (run, exit=1)
  • tprefetch_async_binding (run, exit=1)

@reedhecre

A3 board validation failure details: PR #704

syncall_binding

stage=run info=exit=1

[ERROR] aclrtSynchronizeStream(stream) failed: 507014 (/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260531_182105_manual_pr704/npu_validation/SyncAll/syncall_binding/main.cpp:84)
[ERROR] RecentErrMsg: EZ9999: Inner Error!
EZ9999[PID: 1464828] 2026-05-31-19:05:02.203.373 (EZ9999):  The error from device(chipId:2, dieId:1), serial number is 539, there is an exception of aicore error, core id is 11, error code = 0, dump info: pc start: 0x124a00000000, current: 0x124a00000188, vec error info: 0, mte error info: 0xc503000030, ifu error info: 0x312c1c85b3300, ccu error info: 0x1ce600000000009b, cube error info: 0, biu error info: 0, aic error mask: 0x6500020bd00028c, para base: 0x12c100000000.[FUNC:PrintCoreInfo][FILE:device_error_core_proc.cc][LINE:645]
        TraceBack (most recent call last):
       The extend info: errcode:(0, 0, 0) errorStr: timeout or trap error. fixp_error0 info: 0x3000030, fixp_error1 info: 0xc5, fsmId:0, tslot:3, thread:0, ctxid:0, blk:0, sublk:0, subErrType:4.[FUNC:PrintCoreInfo][FILE:device_error_core_proc.cc][LINE:658]
       Kernel task happen error, retCode=0x25, [aicore timeout].[FUNC:PreCheckTaskErr][FILE:davinci_kernel_task.cc][LINE:1729]
       AICORE Kernel task happen error, retCode=0x25.[FUNC:GetError][FILE:stream.cc][LINE:1475]
       [AIC_INFO] after execute:args print end[FUNC:GetError][FILE:stream.cc][LINE:1475]
       [DFX_INFO]Aicore kernel execute failed, device_id=5, stream_id=46, report_stream_id=46, task_id=0, flip_num=0, fault kernel_name=_Z22syncall_binding_kernelPii, fault kernel info ext=_Z22syncall_binding_kernelPii, program id=0, hash=3129332313788381512.[FUNC:GetError][FILE:stream.cc][LINE:1475]
       rtStreamSynchronize execution failed, reason=aicore timeout[FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:65]
       synchronize stream failed, runtime result = 507014[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:148]
[2026-05-31 19:05:03] ERROR: testcase failed (exit 1): syncall_binding

tprefetch_async_binding

stage=run info=exit=1

[ERROR] aclrtSynchronizeStream(stream) failed: 507035 (/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260531_182105_manual_pr704/npu_validation/TPrefetchAsync/tprefetch_async_binding/main.cpp:91)
[ERROR] RecentErrMsg: EZ9999: Inner Error!
EZ9999[PID: 1573908] 2026-05-31-19:05:37.379.730 (EZ9999):  The error from device(chipId:2, dieId:1), serial number is 540, there is an exception of aivec error, core id is 35, error code = 0, dump info: pc start: 0x124a00000000, current: 0x124a00000160, vec error info: 0x22000000dc, mte error info: 0x1403003083, ifu error info: 0x2fffef3fae940, ccu error info: 0x1ce6000000000052, cube error info: 0, biu error info: 0, aic error mask: 0x6500020bd00028c, para base: 0x12c100000000.[FUNC:PrintCoreInfo][FILE:device_error_core_proc.cc][LINE:645]
        TraceBack (most recent call last):
       The extend info: errcode:(0, 0x200000000000000, 0) errorStr: The MPU address access is invalid. fixp_error0 info: 0x3003083, fixp_error1 info: 0x14, fsmId:0, tslot:3, thread:0, ctxid:0, blk:0, sublk:0, subErrType:4.[FUNC:PrintCoreInfo][FILE:device_error_core_proc.cc][LINE:658]
       Kernel task happen error, retCode=0x31, [vector core exception].[FUNC:PreCheckTaskErr][FILE:davinci_kernel_task.cc][LINE:1729]
       AIV Kernel happen error, retCode=0x31.[FUNC:GetError][FILE:stream.cc][LINE:1475]
       [AIC_INFO] after execute:args print end[FUNC:GetError][FILE:stream.cc][LINE:1475]
       [DFX_INFO]Aicore kernel execute failed, device_id=5, stream_id=46, report_stream_id=46, task_id=0, flip_num=0, fault kernel_name=_Z30tprefetch_async_binding_kernelPfPa, fault kernel info ext=_Z30tprefetch_async_binding_kernelPfPa, program id=0, hash=8435686547367685641.[FUNC:GetError][FILE:stream.cc][LINE:1475]
       rtStreamSynchronize execution failed, reason=vector core exception[FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:65]
       synchronize stream failed, runtime result = 507035[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:148]
[2026-05-31 19:05:39] ERROR: testcase failed (exit 1): tprefetch_async_binding

@Likai-19
Author

/run a5 tprefetch_async_binding syncall_binding rowsum plan_memory_loop_no_reuse_outer_live cmps cmp addptr_dynamic

@reedhecre

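Received /run a5 tprefetch_async_binding syncall_binding rowsum plan_memory_loop_no_reuse_outer_live cmps cmp addptr_dynamic; the A5 board tester will process this request.

  • Progress page: http://154.9.227.233/ptoas-board-dashboard/#board-a5
  • Current status: the board tester is idle; this request will start in the current polling round.
  • Specified cases: tprefetch_async_binding, syncall_binding, rowsum, plan_memory_loop_no_reuse_outer_live, cmps, cmp, addptr_dynamic

The page refreshes automatically, so you can check the current stage, queue status, and latest results directly.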

@reedhecre

A5 board validation failed

  • Trigger mode: manual
  • Source commit: 0a977549b996
  • Result summary: OK 5 / FAIL 2 / SKIP 0
  • Log: /root/ptoas-board-monitor-a5/logs/20260531_200206_manual_pr704.log
  • Manual command: /run a5 tprefetch_async_binding syncall_binding rowsum plan_memory_loop_no_reuse_outer_live cmps cmp addptr_dynamic
  • Triggered by: Likai-19
  • Specified cases: tprefetch_async_binding, syncall_binding, rowsum, plan_memory_loop_no_reuse_outer_live, cmps, cmp, addptr_dynamic
  • Trigger comment: Port frontend tile fusion to EmitC mainline #704 (comment)
  • Failed stage: board-validation / exit=1

Failed cases

  • tprefetch_async_binding (run, exit=1)
  • syncall_binding (run, exit=1)

@reedhecre

A5 board validation failure details: PR #704

tprefetch_async_binding

stage=run info=exit=1

[ERROR] aclrtSynchronizeStream(stream) failed: 507035 (/tmp/ptoas-board-monitor-a5/runs/20260531_200206_manual_pr704/npu_validation/TPrefetchAsync/tprefetch_async_binding/main.cpp:91)
[ERROR] RecentErrMsg: EZ9999: Inner Error!
EZ9999[PID: 3391222] 2026-05-31-20:05:43.601.194 (EZ9999):  The error from device(chipId:0, dieId:0), serial number is 11, there is an aivec error exception, core id is 2, error code = 271, dump info: pc start: 0x100040800000, current: 0x1000408001a0, sc error info: 0xffffffffffff, su error info: 0xd8df85df2c9c0059,0x80400000f000cb7f, mte error info: 0x63bd187100077964, vec error info: 0xb3ceb62f007cefcf, cube error info: 0, l1 error info: 0, aic error mask: 0x395856, para base: 0x100040200000, mte error: 0.[FUNC:ProcessDavidStarsCoreErrorInfo][FILE:device_error_proc_c.cc][LINE:580]
        TraceBack (most recent call last):
       The extend info: errcode:(271) errorStr: The MPU address access is invalid. subErrType: 0x4.[FUNC:ProcessDavidStarsCoreErrorInfo][FILE:device_error_proc_c.cc][LINE:583]
       Kernel task happen error, retCode=0x31, [vector core exception].[FUNC:PreCheckTaskErr][FILE:davinci_kernel_task.cc][LINE:1728]
       AIV Kernel happen error, retCode=0x31.[FUNC:GetError][FILE:stream.cc][LINE:1478]
       [AIC_INFO] after execute:args print end[FUNC:GetError][FILE:stream.cc][LINE:1478]
       [DFX_INFO]Aicore kernel execute failed, device_id=0, stream_id=62, report_stream_id=62, task_id=0, flip_num=0, fault kernel_name=_Z30tprefetch_async_binding_kernelPfPa, fault kernel info ext=_Z30tprefetch_async_binding_kernelPfPa, program id=0, hash=1899772384034012286.[FUNC:GetError][FILE:stream.cc][LINE:1478]
       rtStreamSynchronize execution failed, reason=vector core exception[FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:65]
       synchronize stream failed, runtime result = 507035[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:148]
[2026-05-31 20:05:48] ERROR: testcase failed (exit 1): tprefetch_async_binding

syncall_binding

stage=run info=exit=1

[ERROR] aclrtSynchronizeStream(stream) failed: 507015 (/tmp/ptoas-board-monitor-a5/runs/20260531_200206_manual_pr704/npu_validation/SyncAll/syncall_binding/main.cpp:84)
[ERROR] RecentErrMsg: EZ9999: Inner Error!
EZ9999[PID: 3391750] 2026-05-31-20:05:53.823.501 (EZ9999):  The error from device(chipId:0, dieId:0), serial number is 12, there is an aicore error exception, core id is 1, error code = 259, dump info: pc start: 0x100040800000, current: 0x100040800110, sc error info: 0xffffffffffff, su error info: 0xffffffff0f980030,0x720422087000f7ff, mte error info: 0x3decfd0f0006bfbf, vec error info: 0, cube error info: 0, l1 error info: 0xf192000ae49c, aic error mask: 0x395856, para base: 0x100040200000, mte error: 0.[FUNC:ProcessDavidStarsCoreErrorInfo][FILE:device_error_proc_c.cc][LINE:580]
        TraceBack (most recent call last):
       The extend info: errcode:(259) errorStr: Illegal instruction, which is usually caused by unaligned UUB addresses. subErrType: 0x4.[FUNC:ProcessDavidStarsCoreErrorInfo][FILE:device_error_proc_c.cc][LINE:583]
       Kernel task happen error, retCode=0x26, [aicore exception].[FUNC:PreCheckTaskErr][FILE:davinci_kernel_task.cc][LINE:1728]
       AICORE Kernel task happen error, retCode=0x26.[FUNC:GetError][FILE:stream.cc][LINE:1478]
       [AIC_INFO] after execute:args print end[FUNC:GetError][FILE:stream.cc][LINE:1478]
       [DFX_INFO]Aicore kernel execute failed, device_id=0, stream_id=62, report_stream_id=62, task_id=0, flip_num=0, fault kernel_name=_Z22syncall_binding_kernelPii, fault kernel info ext=_Z22syncall_binding_kernelPii, program id=0, hash=9475521060208115623.[FUNC:GetError][FILE:stream.cc][LINE:1478]
       rtStreamSynchronize execution failed, reason=aicore exception[FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:65]
       synchronize stream failed, runtime result = 507015[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:148]
[2026-05-31 20:05:58] ERROR: testcase failed (exit 1): syncall_binding

3 participants