fix(ir): handle mixed tensor store-load bridges #1005
lwDavid wants to merge 1 commit into hw-native-sys:main
Conversation
📝 Walkthrough: This PR implements tensor-bridge pattern detection in the mixed-kernel expansion pass to identify cross-core tile.store -> tile.load pairs.
Code Review
This pull request implements a mechanism to handle tensor store/load boundaries during mixed kernel expansion, ensuring that Vector-to-Cube (V->C) tensor bridges are correctly transformed into tpush and tpop operations rather than simple tile aliases. The changes include refactoring BuildCoreBody to use modular emit_push and emit_pop helpers and adding logic to collect tensor store/load pairs. The review feedback correctly identifies a technical issue where the code attempts to access the name_ member directly from an ExprPtr base class; it is recommended to use GetKind() and static_pointer_cast to ensure both correctness and better performance compared to dynamic_pointer_cast.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/ir/transforms/expand_mixed_kernel_pass.cpp`:
- Around lines 229-236: ResolveTensorBridgeInputVar currently checks for Var before IterArg, so an IterArg (which derives from Var) is returned prematurely. Check std::dynamic_pointer_cast<const IterArg> first and, if it matches, recurse on iter_arg->initValue_; only then check for Var and return var.get(). This way loop/while-carried tensor bridges resolve to their originating tensor via initValue_.
- Around lines 295-299: The loop currently skips producers with multiple consumers, which silently drops multi-load bridges. Instead, when producer_candidates.size() != 1, create a single shared boundary for that producer: map push_boundaries[producer_stmt] to it, and map every consumer_stmt in producer_candidates to pop_boundaries with the same boundary, so all cross-core loads use one transfer (avoiding per-side private allocations). Update the loop over candidates to construct one boundary and assign it to producer_stmt and to each consumer_stmt rather than continuing/ignoring, using the existing symbols producer_stmt, producer_candidates, consumer_stmt, boundary, push_boundaries, and pop_boundaries.
📒 Files selected for processing (2)
src/ir/transforms/expand_mixed_kernel_pass.cpp
tests/ut/ir/transforms/test_expand_mixed_kernel_a2a3.py
ea9fc40 to 49de991
The issue can be addressed by removing redundant lines in the frontend kernel. Convert this PR to draft.
I don't think this is an issue with ExpandMixedKernel.
Summary
Fix ExpandMixedKernel for mixed V/C kernels where cross-core data transfer happens through tile.store -> tile.load instead of an explicit tile.move.
Changes
- Emit tpush/tpop for V->C tensor store/load bridges instead of simple tile aliases
- Resolve tensor bridge inputs to their originating Var*
Fixes #965