Port frontend tile fusion to EmitC mainline #704
Code Review
This pull request introduces a frontend tile-fusion optimization pipeline for the A5 EmitC mainline, adding passes for fusion planning, instruction scheduling, and last-use marking, along with supporting semantic analyses and C++ post-processing. The review feedback highlights a critical scheduling bug in PTOOpScheduling.cpp where moving only the placement operator breaks the contiguity of the fusion group. Additionally, improvements are suggested to translate a Chinese comment to English in PTOMarkLastUse.cpp, replace std::isdigit with llvm::isDigit in CppPostprocess.cpp to prevent potential undefined behavior, and simplify a redundant ArrayRef conversion in FusionAnalysis.cpp.
    !canMoveLaterAcross(placement, blockingOp))
  break;

placement->moveAfter(blockingOp);
Moving only placement later via placement->moveAfter(blockingOp) leaves the previously scheduled members of the group behind, which breaks the contiguity of the fusion group. To maintain contiguity, all previously scheduled members of the group must be moved together with placement, or the scheduling logic should be revised to avoid breaking contiguity.
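One way to picture the first option (moving the whole group together) is the sketch below. It is not the pass's code: `Block` is a hypothetical stand-in that models a block as a vector of op names, and `moveRunAfter` is an illustrative helper. A real fix would also have to confirm that every group member may legally move across the blocking op, in the way `canMoveLaterAcross` checks a single op.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a block: ops are plain names in program order.
using Block = std::vector<std::string>;

// Moves the contiguous run [first, last] so that it sits immediately after
// the op at index `blocker`, preserving the run's internal order.
// Returns false (and leaves the block untouched) unless the run lies
// entirely before the blocker.
bool moveRunAfter(Block &block, std::size_t first, std::size_t last,
                  std::size_t blocker) {
  if (first > last || last >= blocker || blocker >= block.size())
    return false;
  // std::rotate shifts the ops in (last, blocker] in front of the run,
  // which leaves the run contiguous right after the blocker.
  std::rotate(block.begin() + first, block.begin() + last + 1,
              block.begin() + blocker + 1);
  return true;
}
```

With a block of `g0, g1, x, blocker, y` and the run `g0, g1`, the call `moveRunAfter(block, 0, 1, 3)` yields `x, blocker, g0, g1, y`, so the group stays adjacent instead of being split by the blocking op.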
    lastUseMask.push_back(0);
    continue;
  }
  // isSpanLocalLastUseCandidate checks a wider range than hasLaterUseAfterSpan
    : encoded.slice(pos, next);
if (token.empty())
  return false;
if (!llvm::all_of(token, [](char c) { return std::isdigit(c); }))
Using std::isdigit with a char argument can lead to undefined behavior if the character is signed and has a negative value. It is safer and more idiomatic in LLVM/MLIR to use llvm::isDigit.
- if (!llvm::all_of(token, [](char c) { return std::isdigit(c); }))
+ if (!llvm::all_of(token, [](char c) { return llvm::isDigit(c); }))
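For readers without LLVM at hand, the same hazard and two standard-library ways around it can be sketched as follows. The helper names `isAllDigitsStd` and `isAllDigitsAscii` are illustrative and not part of the PR; the second is a plain ASCII range check, offered as an assumption about what a locale-independent digit test looks like rather than a copy of `llvm::isDigit`.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Returns true when `token` is non-empty and every character is a decimal
// digit. The cast to unsigned char matters: std::isdigit takes an int, and
// passing a negative char value (other than EOF) is undefined behavior.
bool isAllDigitsStd(const std::string &token) {
  if (token.empty())
    return false;
  for (char c : token)
    if (!std::isdigit(static_cast<unsigned char>(c)))
      return false;
  return true;
}

// Locale-independent alternative: a plain ASCII range check, which never
// calls into <cctype> and so has no signed-char pitfall.
bool isAllDigitsAscii(const std::string &token) {
  if (token.empty())
    return false;
  for (char c : token)
    if (c < '0' || c > '9')
      return false;
  return true;
}
```

Both helpers reject the empty token, matching the `token.empty()` early return in the reviewed code.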
if (info.vRow == ShapedType::kDynamic || info.vCol == ShapedType::kDynamic)
  info.unprovenReason = IterationDomainUnprovenReason::DynamicShape;

for (Value value : ArrayRef<Value>(anchorValues).drop_front()) {
The explicit conversion to ArrayRef<Value> is redundant because anchorValues is already an ArrayRef<Value>. You can simplify this by calling drop_front() directly on anchorValues.
- for (Value value : ArrayRef<Value>(anchorValues).drop_front()) {
+ for (Value value : anchorValues.drop_front()) {
Codex Review
This comment is updated automatically by the review bot.
Summary
PR #704 has two merge-blocking issues in op-fusion scheduling / last_use C++ rewriting, plus one important hard-boundary contract mismatch across the new fusion passes.
Findings
0cfcd64 to 71cd913
71cd913 to 433027b
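/run a5
Received
The page refreshes automatically, so you can check the current stage, queue status, and latest results directly.
A5 board test failed
Failed cases
A5 board test failure details: PR #704
tprefetch_async_binding
syncall_binding
rowsum
plan_memory_loop_no_reuse_outer_live
cmps
cmp
addptr_dynamic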
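/run a3
Received
The page refreshes automatically, so you can check the current stage, queue status, and latest results directly.
A3 board test failed
Failed cases
A3 board test failure details: PR #704
syncall_binding
tprefetch_async_binding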
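/run a5 tprefetch_async_binding syncall_binding rowsum plan_memory_loop_no_reuse_outer_live cmps cmp addptr_dynamic
Received
The page refreshes automatically, so you can check the current stage, queue status, and latest results directly.
A5 board test failed
Failed cases
A5 board test failure details: PR #704
tprefetch_async_binding
syncall_binding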
Summary
Reintroduce frontend tile fusion on the current A5 EmitC mainline behind
--enable-op-fusion, but keep the implementation intentionally small:
PTOViewToMemref
pto.last_use directly on scheduled block-local spans
[[pto::last_use(... )]] CALLEE(...)
pto.fusion_region / pto.yield lifecycle in the shared mainline
In other words, this PR keeps the user-visible goal of "frontend op scheduling
FusionRegion-based IR contract from the implementation.
What changed
Driver and pipeline
--enable-op-fusion on the current ptoas driver
--pto-arch=a5 with --pto-level=level2|level3
FusionPlan OpScheduling PTOMarkLastUse PTOViewToMemref
instead of failing compilation
Frontend fusion core
mainline:
FusionAnalysis
FusionOpSemantics
PTOFusionPlan
PTOOpScheduling
rather than wrapping them in a region op
last_use implementation
PTOMarkLastUse as the place that computes pto.last_use
pto.fusion.group_id / pto.fusion.order
the span
last_use per tile operand slot, with the following rules:
0
EmitC last_use output
[[pto::last_use(... )]] CALLEE(...)
PTOToEmitC
CppPostprocess
emitted operand order, which keeps the output tile slot at 0 in the final emitted attribute
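The idea of a per-operand-slot last_use mask over a scheduled span can be illustrated with a generic backward walk. This is a sketch under stated assumptions, not the logic of PTOMarkLastUse: ops are modeled as lists of integer value ids, `liveAfterSpan` is a hypothetical set of values still read once the span ends, and the pass's specific rules are not reproduced.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical model: each op in a span is just the list of value ids it reads.
using Op = std::vector<int>;

// For every op in `span`, returns a 0/1 mask with one entry per operand slot.
// A slot is 1 when its value is not read by any later op in the span and is
// not in `liveAfterSpan`. A value repeated within one op is marked in every
// slot where it appears; a real pass would need its own rule for that case.
std::vector<std::vector<int>> markLastUse(const std::vector<Op> &span,
                                          const std::set<int> &liveAfterSpan) {
  std::vector<std::vector<int>> masks(span.size());
  std::set<int> usedLater = liveAfterSpan;
  // Walk backwards so `usedLater` always holds the values read after op i.
  for (std::size_t i = span.size(); i-- > 0;) {
    for (int value : span[i])
      masks[i].push_back(usedLater.count(value) ? 0 : 1);
    usedLater.insert(span[i].begin(), span[i].end());
  }
  return masks;
}
```

Walking the span in reverse keeps the computation linear in the number of operands (up to the set lookups), which is why the later-use set is accumulated rather than rescanned for each slot.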
Explicit non-goals / removed scope
pto.fusion_region
pto.yield
PTOFusionRegionGen
PTOFlattenFusionRegion
PTOViewToMemref, memory planning, reserved-buffer resolution, sync insertion, or tile-handle materialization
Why this shape
The original larger port bundled three concerns together:
last_use emission
For the current goal, only (1) and (3) are essential. This PR keeps the
useful part of the feature and localizes the extra complexity to
PTOMarkLastUse, instead of requiring multiple existing shared passes to understand and preserve a new region lifecycle.
Testing
Added focused tile-fusion coverage for:
treshape boundary
treshape bridge
last_use: [[pto::last_use(... )]] emission
pto.fusion_region / pto.yield
Focused verification run:
llvm-lit -sv build/test/lit/tile_fusion