[2025 Autumn][T1-2-1] ttaohe #816
Open
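PR Background / Motivation
On NVIDIA, when rearrange encounters the typical row-major ↔ col-major / full-transpose stride patterns, the generic kernel suffers from severely non-coalesced memory access plus indexing overhead, which leaves it clearly behind PyTorch in some cases (especially 6D and large 2D). This PR introduces pattern detection + dedicated transpose fast-path kernels + a safety-net fallback. It significantly improves the common transpose-type cases and falls back to the original generic implementation when no pattern matches.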
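To make the "pattern detection" step concrete, here is a minimal host-side sketch of a strict stride-order reversal check. It is an illustration under stated assumptions, not the PR's isFullTransposePattern(): the names is_row_major, is_col_major and is_full_transpose are hypothetical, strides are assumed to be in elements (not bytes), and size-1 dimensions get no special handling.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Dims = std::vector<std::size_t>;
using Strides = std::vector<std::ptrdiff_t>;

// Hypothetical helper: true if `s` is the contiguous row-major
// (last dimension fastest) stride set for `shape`, in elements.
bool is_row_major(const Dims& shape, const Strides& s) {
    if (s.size() != shape.size()) return false;
    std::ptrdiff_t expected = 1;
    for (std::size_t i = shape.size(); i-- > 0;) {
        if (s[i] != expected) return false;
        expected *= static_cast<std::ptrdiff_t>(shape[i]);
    }
    return true;
}

// Hypothetical helper: true if `s` is the contiguous column-major
// (first dimension fastest) stride set for `shape`, in elements.
bool is_col_major(const Dims& shape, const Strides& s) {
    if (s.size() != shape.size()) return false;
    std::ptrdiff_t expected = 1;
    for (std::size_t i = 0; i < shape.size(); ++i) {
        if (s[i] != expected) return false;
        expected *= static_cast<std::ptrdiff_t>(shape[i]);
    }
    return true;
}

// Strict check: one side is row-major and the other is column-major,
// i.e. the stride order is fully reversed between source and destination.
// A caller would take the fast path only when this returns true and
// otherwise keep using the generic kernel.
bool is_full_transpose(const Dims& shape, const Strides& dst, const Strides& src) {
    if (shape.size() < 2) return false;  // nothing to transpose in 0D/1D
    return (is_row_major(shape, src) && is_col_major(shape, dst)) ||
           (is_col_major(shape, src) && is_row_major(shape, dst));
}

// Small usage example: a 3x4 tensor copied from row-major to column-major.
void self_check() {
    const Dims shape{3, 4};
    const Strides row{4, 1};
    const Strides col{1, 3};
    assert(is_full_transpose(shape, col, row));   // reversed stride order
    assert(!is_full_transpose(shape, row, row));  // plain copy, no transpose
}
```

Keeping the check strict means an unusual layout is simply not recognized and goes through the generic path, so a missed detection costs performance rather than correctness.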
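Main changes
1) NVIDIA rearrange: new transpose fast path (with fallback)
- Descriptor::calculate() recognizes the pattern and tries the fast path first; the prepareRearrangeParams + static/dynamic kernel path is kept to guarantee correctness.
Files involved:
- src/infiniop/ops/rearrange/nvidia/rearrange_nvidia.cu
2) New/enhanced transpose kernels (2D / 5D / 6D)
A new dedicated kernel file was added and continues to be extended:
- src/infiniop/ops/rearrange/nvidia/rearrange_transpose_kernel.cuh
It contains:
- uint16_t bitwise copy)
- medium-sized full-transpose cases such as (3,4,7,53,9)
3) Adjusted heuristic thresholds for the full-transpose pattern
- isFullTransposePattern() keeps the strict stride-order reversal check
Files involved:
- src/infiniop/ops/rearrange/nvidia/rearrange_nvidia.cu
4) Test/benchmark integration and fixes
- test/infinicore/ops/rearrange.py, which supports running functional and performance comparisons with run.py --ops rearrange --bench
- rearrange_tensor() raised an error because it used .view(-1) on a non-contiguous tensor; it now uses .reshape(-1): test/infinicore/framework/utils.py
Benchmark (example)
Under srun --gres=gpu:nvidia:1 python test/infinicore/run.py --ops rearrange --nvidia --bench:
How to build & reproduce
Risks and rollback strategy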
HONER_CODE.md
HONOR_CODE.md
REFERENCE.md