[TEST] add test in /root/paddlejob/workspace/env_run/output/zkk/2026_04_17FL…#7766
zhoutianzi666 wants to merge 2 commits into PaddlePaddle:develop
…ASHMLA/FastDeploy/tests/operators/test_deepgemm_precision.py
Thanks for your contribution!
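CI report generated from the following code (refreshed every 30 minutes):
1 Task overview
⏳ CI in progress: 3 Required tasks are still running. Wait for them to finish before checking the final result.
2 Task status summary
2.1 Required tasks: 7/10 passed
2.2 Optional tasks: 23/26 passed
3 Failure details (required only)
No required tasks failed.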
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## develop #7766 +/- ##
==========================================
Coverage ? 72.22%
==========================================
Files ? 396
Lines ? 55696
Branches ? 8705
==========================================
Hits ? 40225
Misses ? 12704
Partials ? 2767
PaddlePaddle-bot left a comment
🤖 Paddle-CI-Agent | pr_review | 2026-05-10 23:46:39
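📋 Review summary
PR overview: Adds a Dense GEMM precision test for the Blackwell (SM100) architecture, built on the CUTLASS cute/SM100 interfaces, and refactors the error handling in one_invoke.
Scope of changes: tests/operators/test_deepgemm_precision.py
Impact tags: [OP] [CI]
📝 PR convention check
The PR title tag [TEST] is not in the official tag list, and the title contains a local file path that carries no descriptive meaning. The Motivation, Modifications, Usage or Command, and Accuracy Tests sections are all empty, which does not meet the description template requirements.
Suggested title (can be copied as-is):
[CI] Add DenseGemmKernel precision test for Blackwell (SM100)
Suggested PR description (can be copied as-is; it reproduces the full structure of the checklist §D2 template):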
## Motivation
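Add a unit test that verifies Dense GEMM precision on the Blackwell (SM100) architecture. The test implements a custom Dense GEMM kernel on top of the CUTLASS cute SM100 TMA/TMEM interfaces and compares its output against a paddle.matmul baseline.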
## Modifications
- `tests/operators/test_deepgemm_precision.py`:
  - Add the `DenseGemmKernel` class: a Dense GEMM kernel built on the CUTLASS cute/SM100 interfaces (TMA loads, TMEM accumulator, pipeline synchronization)
  - Add the `TestDeepDenseGemm.two_invoke` method: compares the custom kernel output against the `paddle.matmul` baseline and requires zero error
  - Refactor `one_invoke`: guard the `deep_gemm` import with try/except
  - `test_main` now also calls `self.two_invoke(128, 128, 64)`
## Usage or Command
```bash
python -m pytest tests/operators/test_deepgemm_precision.py
```
## Accuracy Tests
`two_invoke(128, 128, 64)`: the custom GEMM kernel output must match the `paddle.matmul` BF16 baseline with zero error (`assert (my_tensor - baseline_out).abs().max().item() == 0.0`)
## Checklist
- [ ] Add at least a tag in the PR title.
- Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
- You can add new tags based on the PR content, but the semantics must be clear.
- [ ] Format your code, run `pre-commit` before commit.
- [x] Add unit tests. Please write the reason in this PR if no unit tests.
- [ ] Provide accuracy results.
- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.

Issues
| Severity | File | Summary |
|---|---|---|
| 🔴 Bug | tests/operators/test_deepgemm_precision.py:36 | Module-level cutlass imports are unguarded. pytest executes them at collection time in every environment, so on non-SM100 or cutlass-less environments the whole test file fails with ImportError |
| 🟡 Suggestion | tests/operators/test_deepgemm_precision.py:271 | two_invoke lacks an SM100 device check and crashes outright on non-Blackwell GPUs |
| ❓ Question | tests/operators/test_deepgemm_precision.py:352 | The precision assertion in one_invoke is commented out, so the test does not actually verify any precision result |
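Overall assessment
The SM100 kernel implementation in DenseGemmKernel is complete, and the precision comparison logic in two_invoke is clear. However, the module-level cutlass imports sit outside the if __name__ == "__main__": guard, so they run unconditionally under pytest and will crash CI on non-SM100 or cutlass-less environments. This must be fixed before the PR can be merged.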
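The first two findings can be addressed with visible skips instead of a crash at collection time. The following is a minimal sketch using only the standard library, not the PR's actual code: `module_available`, `is_blackwell`, `device_major`, and `TestDenseGemmSketch` are hypothetical names, and `device_major` is a stand-in that returns a fixed value where the real test would read `prop.major` from `paddle.device.cuda.get_device_properties()`.

```python
import importlib.util
import unittest


def module_available(name: str) -> bool:
    """Return True if a top-level module can be found, without importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        return False


def is_blackwell(major: int) -> bool:
    """SM100 (Blackwell) devices report compute capability major version 10."""
    return major == 10


def device_major() -> int:
    """Hypothetical stand-in for the device query.

    Assumption: the real test would return the `major` field of
    paddle.device.cuda.get_device_properties().
    """
    return 0  # pretend no Blackwell GPU is present


HAS_CUTLASS = module_available("cutlass")


@unittest.skipUnless(HAS_CUTLASS, "cutlass is not installed")
@unittest.skipUnless(is_blackwell(device_major()), "requires an SM100 (Blackwell) GPU")
class TestDenseGemmSketch(unittest.TestCase):
    def test_two_invoke(self):
        # Importing inside the test keeps module collection free of cutlass.
        import cutlass.cute as cute

        self.assertIsNotNone(cute)
```

Because the decorators are evaluated when the module is imported, they work the same way under pytest and under `python -m unittest`, and the test is reported as skipped with a reason rather than passing silently.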
    exit(0)

    import cutlass.cute as cute
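🔴 Bug: module-level cutlass imports run unconditionally under pytest
The five import cutlass.* lines here sit at module top level. The device check and the import cutlass guard inside the if __name__ == "__main__": block have no effect under pytest, which collects tests by importing the module and never enters the __main__ branch. On CI machines where cutlass is not installed or that are not SM100 environments, these five lines raise ImportError directly, so the whole test file cannot be collected and all of its tests are blocked.
Suggested fix: make the top-level cutlass imports conditional, or use pytest.importorskip: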
    try:
        import cutlass.cute as cute
        import cutlass.pipeline as pipeline
        import cutlass.utils as utils
        import cutlass.utils.blackwell_helpers as sm100_utils
        from cutlass.cute.nvgpu import cpasync, tcgen05
    except ImportError:
        cute = None  # guard later with pytest.skip in two_invoke/DenseGemmKernel

Or use this at the entry of the test method:

    cutlass = pytest.importorskip("cutlass")

    def setUp(self):
        pass

    def two_invoke(self, M, N, K):
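🟡 Suggestion: two_invoke lacks an SM100 device capability check
DenseGemmKernel is designed specifically for Blackwell (SM100, prop.major == 10), but two_invoke performs no device check, so calling it on devices such as H100 (SM90) or A100 (SM80) crashes outright. The original one_invoke has a prop.major != 10 guard; two_invoke should add the same: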
    def two_invoke(self, M, N, K):
        prop = paddle.device.cuda.get_device_properties()
        if prop.major != 10:
            return
        try:
            import cutlass
        except ImportError:
            return
        ...

    @@ -83,6 +351,7 @@ def one_invoke(self, M, N, K):
        )
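❓ Question: the precision assertion in one_invoke is commented out, so the test verifies nothing
one_invoke currently only prints the difference baseline_out - deepgemm_output and makes no assertion. If the deep_gemm output were completely wrong, the test would still pass. Please confirm: is the assertion meant to stay commented out long term (for example, for exploratory output only)? If precision should be verified, uncomment it and set a reasonable threshold, for example:

    assert (baseline_out - deepgemm_output).abs().max().item() < 0.1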
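The threshold check suggested above can be sketched without any GPU dependency. This is an illustrative stand-in, not the test's actual code: `max_abs_error` and `assert_close` are hypothetical helpers, plain Python lists replace the Paddle tensors, and the `0.1` tolerance is simply the reviewer's example value, not a validated bound for BF16 GEMM.

```python
import math


def max_abs_error(expected, actual):
    """Largest element-wise absolute difference between two equal-length sequences."""
    if len(expected) != len(actual):
        raise ValueError("length mismatch")
    worst = 0.0
    for e, a in zip(expected, actual):
        diff = abs(e - a)
        if math.isnan(diff):
            return math.nan  # a NaN anywhere must fail the check
        worst = max(worst, diff)
    return worst


def assert_close(expected, actual, atol):
    """Raise AssertionError unless the max absolute error is strictly below atol."""
    err = max_abs_error(expected, actual)
    # `not (err < atol)` also rejects NaN, because NaN compares False to everything.
    if not (err < atol):
        raise AssertionError(f"max abs error {err} is not below {atol}")


# Stand-in data: one element differs by 0.03125, well inside the tolerance.
baseline = [1.0, 2.0, 3.0]
candidate = [1.0, 2.03125, 3.0]
assert_close(baseline, candidate, atol=0.1)
```

Writing the comparison as `err < atol` rather than only printing the difference means a completely wrong or NaN-filled output fails the test instead of passing unnoticed.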
Motivation
Modifications
Usage or Command
Accuracy Tests
Checklist
- Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
- Format your code, run `pre-commit` before commit.
- If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.