
fix(ir): handle mixed tensor store-load bridges #1005

Closed
lwDavid wants to merge 1 commit into hw-native-sys:main from lwDavid:v2c

Conversation

lwDavid (Contributor) commented Apr 13, 2026

Summary

Fix ExpandMixedKernel for mixed V/C kernels where cross-core data transfer happens through tile.store -> tile.load
instead of an explicit tile.move.

Changes

  • detect implicit tensor bridges in mixed kernels
  • lower V->C tensor bridges to tpush/tpop
  • match bridge producer/consumer by SSA name instead of Var*
  • reuse existing boundary-move data structures and emission logic
  • add A2A3 regression coverage for the tensor-bridge case

Fixes #965
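The core idea of the fix, pairing a tile.store producer with a tile.load consumer on the opposite core, can be sketched in a minimal model. Everything below (Stmt, CollectBridges, the string core tags) is an illustrative assumption, not the project's actual IR types:

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Illustrative stand-in for an IR statement: the core it runs on and the
// SSA name hint of the tensor it stores or loads.
struct Stmt {
    enum Kind { Store, Load } kind;
    std::string core;    // "VECTOR" or "CUBE"
    std::string tensor;  // SSA name hint of the bridged tensor
};

// Pair each tile.load with a tile.store of the same tensor name on the
// other core; such pairs are tensor bridges that must be lowered to
// tpush/tpop instead of free tile aliasing.
std::vector<std::pair<int, int>> CollectBridges(const std::vector<Stmt>& stmts) {
    std::map<std::string, int> producers;  // tensor name -> store index
    for (int i = 0; i < static_cast<int>(stmts.size()); ++i)
        if (stmts[i].kind == Stmt::Store) producers[stmts[i].tensor] = i;

    std::vector<std::pair<int, int>> bridges;
    for (int i = 0; i < static_cast<int>(stmts.size()); ++i) {
        if (stmts[i].kind != Stmt::Load) continue;
        auto it = producers.find(stmts[i].tensor);
        if (it != producers.end() && stmts[it->second].core != stmts[i].core)
            bridges.emplace_back(it->second, i);  // cross-core store -> load
    }
    return bridges;
}
```

A same-core store/load pair is left untouched; only cross-core pairs become push/pop boundaries.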

coderabbitai Bot commented Apr 13, 2026

📝 Walkthrough

This PR implements tensor-bridge pattern detection in the mixed-kernel expansion pass to identify tile.store and tile.load producer-consumer pairs crossing core affinities (VECTOR ↔ CUBE), emitting tpush/tpop instructions for cross-core transfers instead of free tile aliasing.

Changes

  • Tensor-Bridge Pattern Detection & Emission (src/ir/transforms/expand_mixed_kernel_pass.cpp):
    Added CollectTensorStoreLoadBoundaries() to detect store-load chains via name-hint matching and recursive IterArg resolution; introduced GetTensorBridgeDirection() to infer CVDirection for VECTOR↔CUBE transitions; extended the BuildCoreBody() signature with tensor_bridge_pushes/tensor_bridge_pops maps; added emit_push()/emit_pop() helpers and early dispatch for tensor-bridge statements to emit tpush/tpop instead of tile.move; propagated the new boundary maps through nested MIXED compound statements.
  • Tensor-Bridge Functional Test (tests/ut/ir/transforms/test_expand_mixed_kernel_a2a3.py):
    Added test_v2c_tensor_bridge_store_load_uses_push_pop_on_a2a3(), which constructs a MixedTensorBridge program with an intermediate tensor store-load crossing VECTOR→CUBE; asserts the generated AIC/AIV contain tpush_to_aic/tpop_from_aiv; verifies the absence of _nz/_zn substrings and of bridge-referencing tile.load/tile.store lines.
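The direction inference mentioned above (GetTensorBridgeDirection) plausibly reduces to a small classification of producer and consumer cores. The enum, signature, and string tags below are assumptions for illustration, not the actual code:

```cpp
#include <optional>
#include <string>

// Transfer direction between the vector and cube cores.
enum class CVDirection { V2C, C2V };

// Infer the bridge direction from where the store (producer) and the
// load (consumer) execute; same-core pairs are not bridges at all.
std::optional<CVDirection> GetTensorBridgeDirection(const std::string& producer_core,
                                                    const std::string& consumer_core) {
    if (producer_core == "VECTOR" && consumer_core == "CUBE") return CVDirection::V2C;
    if (producer_core == "CUBE" && consumer_core == "VECTOR") return CVDirection::C2V;
    return std::nullopt;
}
```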

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Suggested reviewers

  • lyfne123
  • Hzfengsy

Poem

🐰 Whiskers twitch with joy ~

A bridge of tensors spans the cores,
From VECTOR dreams to CUBE's great door—
With push and pop, we flow so right,
No free aliases in our sight! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 50.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (2 passed)

  • Title check: ✅ Passed. The title 'fix(ir): handle mixed tensor store-load bridges' directly and concisely summarizes the main change (handling implicit tensor bridges in mixed kernels via store-load patterns).
  • Description check: ✅ Passed. The description directly addresses the changeset, explaining the fix for ExpandMixedKernel to handle implicit tensor bridges in mixed V/C kernels.




gemini-code-assist Bot left a comment


Code Review

This pull request implements a mechanism to handle tensor store/load boundaries during mixed kernel expansion, ensuring that Vector-to-Cube (V->C) tensor bridges are transformed into tpush and tpop operations rather than simple tile aliases. The changes refactor BuildCoreBody into modular emit_push and emit_pop helpers and add logic to collect tensor store/load pairs. The review feedback identifies a technical issue: the code accesses the name_ member directly through an ExprPtr base class. Using GetKind() with static_pointer_cast is recommended for correctness and for better performance than dynamic_pointer_cast.
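The suggested pattern, a kind tag checked before a static downcast, looks roughly like this. The Expr/Var hierarchy below is a hypothetical stand-in for the project's classes:

```cpp
#include <memory>
#include <string>

// Hypothetical node hierarchy: the base Expr has no name_ member, so the
// name must be read through a downcast to Var.
enum class ExprKind { Var, Const };
struct Expr {
    virtual ~Expr() = default;
    virtual ExprKind GetKind() const = 0;
};
struct Var : Expr {
    std::string name_;
    explicit Var(std::string n) : name_(std::move(n)) {}
    ExprKind GetKind() const override { return ExprKind::Var; }
};
struct Const : Expr {
    ExprKind GetKind() const override { return ExprKind::Const; }
};
using ExprPtr = std::shared_ptr<const Expr>;

// Check the kind tag first, then static_pointer_cast: correct (never touches
// a member the base class lacks) and cheaper than dynamic_pointer_cast.
std::string NameHintOf(const ExprPtr& expr) {
    if (expr->GetKind() == ExprKind::Var)
        return std::static_pointer_cast<const Var>(expr)->name_;
    return "";
}
```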


coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/ir/transforms/expand_mixed_kernel_pass.cpp`:
- Around lines 229-236: ResolveTensorBridgeInputVar currently checks for Var before IterArg, so an IterArg (which derives from Var) is returned prematurely. Check std::dynamic_pointer_cast<const IterArg> first and, if present, recurse on iter_arg->initValue_; only then check for Var and return var.get(). That way loop/while-carried tensor bridges resolve to their originating tensor via initValue_.
- Around lines 295-299: the loop currently skips producers with multiple consumers, which silently drops multi-load bridges. Instead, when producer_candidates.size() != 1, construct a single shared boundary for that producer, map push_boundaries[producer_stmt] to it, and map every consumer_stmt to the same boundary in pop_boundaries, so all cross-core loads share one transfer rather than per-side private allocations.
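The first comment's ordering fix can be sketched as follows. IterArg, Var, and initValue_ mirror the names in the comment, but the minimal hierarchy here is an assumption for illustration:

```cpp
#include <memory>
#include <string>

// Minimal stand-ins: IterArg derives from Var, so a dynamic cast to Var
// also succeeds for an IterArg and must therefore be tried LAST.
struct Expr { virtual ~Expr() = default; };
struct Var : Expr {
    std::string name;
    explicit Var(std::string n) : name(std::move(n)) {}
};
struct IterArg : Var {
    std::shared_ptr<Expr> initValue_;  // loop-carried initial value
    IterArg(std::string n, std::shared_ptr<Expr> init)
        : Var(std::move(n)), initValue_(std::move(init)) {}
};

// Resolve the tensor that originates a (possibly loop-carried) bridge input.
// Checking IterArg before Var is the fix: the reversed order would return
// the IterArg itself instead of recursing through initValue_.
const Var* ResolveTensorBridgeInputVar(const std::shared_ptr<Expr>& expr) {
    if (auto iter_arg = std::dynamic_pointer_cast<IterArg>(expr))
        return ResolveTensorBridgeInputVar(iter_arg->initValue_);
    if (auto var = std::dynamic_pointer_cast<Var>(expr))
        return var.get();
    return nullptr;
}
```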
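The second comment's multi-consumer handling, one shared boundary reused by every pop, might look like this. Boundary and the int statement ids are placeholders for the real types:

```cpp
#include <map>
#include <vector>

// Placeholder for the transfer buffer shared by one push and its pops.
struct Boundary { int id; };

// Instead of skipping a producer that has several cross-core consumers,
// allocate ONE boundary, register it as the producer's push, and reuse the
// same boundary for every consumer's pop.
void AssignBoundaries(int producer_stmt,
                      const std::vector<int>& consumer_stmts,
                      std::map<int, Boundary>& push_boundaries,
                      std::map<int, Boundary>& pop_boundaries,
                      int& next_id) {
    Boundary boundary{next_id++};  // a single shared transfer, not one per side
    push_boundaries[producer_stmt] = boundary;
    for (int consumer_stmt : consumer_stmts)
        pop_boundaries[consumer_stmt] = boundary;
}
```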


📥 Commits

Reviewing files that changed from the base of the PR and between 3e680d1 and c8bdcc0.

📒 Files selected for processing (2)
  • src/ir/transforms/expand_mixed_kernel_pass.cpp
  • tests/ut/ir/transforms/test_expand_mixed_kernel_a2a3.py

lwDavid force-pushed the v2c branch 3 times, most recently from ea9fc40 to 49de991 (April 13, 2026 11:57)
lwDavid (Contributor, Author) commented Apr 14, 2026

The issue can be addressed by removing redundant lines in the frontend kernel. Converting this PR to draft.

lwDavid marked this pull request as draft April 14, 2026 01:54
lyfne123 (Collaborator) commented:

I think it is not an issue of ExpandMixedKernel.

lwDavid closed this Apr 14, 2026
lwDavid deleted the v2c branch April 16, 2026 01:43


Development

Successfully merging this pull request may close these issues.

[Bug] ExpandMixedKernel fails with "Tensor view not found" for V→C pattern in split=UP_DOWN block
