@vickiw973
Contributor

Test results on a B200 GPU with Python 3.13.8 and CuTe DSL 4.3.0.dev0:

(env13_8) nvfp4_dual_gemm$ python3 eval.py test task.yml
compile: start
compile: pass
test-count: 10
test.0.spec: m: 128; n: 256; k: 256; l: 1; seed: 1111
test.0.status: pass
test.1.spec: m: 128; n: 1536; k: 7168; l: 1; seed: 1111
test.1.status: pass
test.2.spec: m: 128; n: 3072; k: 1536; l: 1; seed: 1111
test.2.status: pass
test.3.spec: m: 256; n: 7168; k: 256; l: 1; seed: 1111
test.3.status: pass
test.4.spec: m: 256; n: 7168; k: 2048; l: 1; seed: 1111
test.4.status: pass
test.5.spec: m: 2304; n: 4608; k: 7168; l: 1; seed: 1111
test.5.status: pass
test.6.spec: m: 384; n: 7168; k: 2304; l: 1; seed: 1111
test.6.status: pass
test.7.spec: m: 512; n: 512; k: 7168; l: 1; seed: 1111
test.7.status: pass
test.8.spec: m: 512; n: 4096; k: 512; l: 1; seed: 1111
test.8.status: pass
test.9.spec: m: 512; n: 1536; k: 7168; l: 1; seed: 1111
test.9.status: pass
check: pass
(env13_8) nvfp4_dual_gemm$ python3 eval.py benchmark task.yml
compile: start
compile: pass
benchmark-count: 3
benchmark.0.spec: m: 7168; n: 128; k: 16384; l: 1; seed: 1111
benchmark.0.runs: 200
benchmark.0.mean: 160051.9973784685
benchmark.0.std: 23031.866455664996
benchmark.0.err: 1628.5988954183692
benchmark.0.best: 152575.9994983673
benchmark.0.worst: 472128.0038356781
benchmark.1.spec: m: 4096; n: 128; k: 7168; l: 1; seed: 1111
benchmark.1.runs: 200
benchmark.1.mean: 99979.84111309052
benchmark.1.std: 20095.203008511555
benchmark.1.err: 1420.945431663883
benchmark.1.best: 93184.00174379349
benchmark.1.worst: 378879.99415397644
benchmark.2.spec: m: 7168; n: 128; k: 2048; l: 1; seed: 1111
benchmark.2.runs: 200
benchmark.2.mean: 74724.80170428753
benchmark.2.std: 21870.720279291818
benchmark.2.err: 1546.4934618921386
benchmark.2.best: 69632.00122117996
benchmark.2.worst: 374783.992767334
check: pass
(env13_8) nvfp4_dual_gemm$ python3 eval.py leaderboard task.yml
compile: start
compile: pass
benchmark-count: 3
benchmark.0.spec: m: 7168; n: 128; k: 16384; l: 1; seed: 1111
benchmark.0.runs: 200
benchmark.0.mean: 253803.03986370564
benchmark.0.std: 9263.37772232849
benchmark.0.err: 655.019720415087
benchmark.0.best: 230399.9960422516
benchmark.0.worst: 283648.0140686035
benchmark.1.spec: m: 4096; n: 128; k: 7168; l: 1; seed: 1111
benchmark.1.runs: 200
benchmark.1.mean: 143465.91904759407
benchmark.1.std: 26430.637287532198
benchmark.1.err: 1868.9282857096032
benchmark.1.best: 136191.99395179749
benchmark.1.worst: 509952.00872421265
benchmark.2.spec: m: 7168; n: 128; k: 2048; l: 1; seed: 1111
benchmark.2.runs: 200
benchmark.2.mean: 114432.32048302889
benchmark.2.std: 28682.384716506524
benchmark.2.err: 2028.1508733643152
benchmark.2.best: 107519.99914646149
benchmark.2.worst: 506911.9930267334
check: pass
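
A note on reading the `err` field: the harness output does not say how `err` is defined, but the numbers above are consistent with it being the standard error of the mean, `std / sqrt(runs)`. The sketch below checks that assumption against the three `std`/`err` pairs from the `benchmark` run; `standard_error` is a hypothetical helper, not a function from `eval.py`.

```python
import math


def standard_error(std: float, runs: int) -> float:
    """Standard error of the mean, assuming err = std / sqrt(runs)."""
    return std / math.sqrt(runs)


# (std, reported err) pairs copied from the `benchmark` run above, runs = 200.
reported = [
    (23031.866455664996, 1628.5988954183692),
    (20095.203008511555, 1420.945431663883),
    (21870.720279291818, 1546.4934618921386),
]

# True for each row if the reported err matches std / sqrt(runs).
matches = [
    math.isclose(standard_error(std, 200), err, rel_tol=1e-6)
    for std, err in reported
]
print(matches)
```

If that reading is right, the large `worst` values inflate `std`, and therefore `err`, far more than they move `best`.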

@vickiw973
Contributor Author

(env13_8) vickiw@6u1g-0014:/home/scratch.vickiw_gpu/reference-kernels/problems/nvidia/nvfp4_dual_gemm$ python3 eval.py leaderboard task.yml
compile: start
compile: pass
benchmark-count: 4
benchmark.0.spec: m: 256; n: 4096; k: 7168; l: 1; seed: 1111
benchmark.0.runs: 200
benchmark.0.mean: 161848.16002845764
benchmark.0.std: 14513.790725180155
benchmark.0.err: 1026.2799842497307
benchmark.0.best: 146431.99741840363
benchmark.0.worst: 222207.99326896667
benchmark.1.spec: m: 512; n: 4096; k: 7168; l: 1; seed: 1111
benchmark.1.runs: 200
benchmark.1.mean: 167805.43953180313
benchmark.1.std: 34217.14372801112
benchmark.1.err: 2419.5174362911407
benchmark.1.best: 151552.0066022873
benchmark.1.worst: 596000.0157356262
benchmark.2.spec: m: 256; n: 3072; k: 4096; l: 1; seed: 1111
benchmark.2.runs: 200
benchmark.2.mean: 146943.99930536747
benchmark.2.std: 28466.58417641395
benchmark.2.err: 2012.8914708359976
benchmark.2.best: 134112.00046539307
benchmark.2.worst: 484384.0003013611
benchmark.3.spec: m: 512; n: 3072; k: 7168; l: 1; seed: 1111
benchmark.3.runs: 200
benchmark.3.mean: 167575.03867149353
benchmark.3.std: 32406.27526056167
benchmark.3.err: 2291.469698974101
benchmark.3.best: 152575.9994983673
benchmark.3.worst: 573440.0153160095
check: pass
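
For comparing runs like the ones above, the `benchmark.<i>.<field>: <value>` lines can be grouped by benchmark index. This is a minimal sketch that assumes the line format stays exactly as shown in these logs; `parse_eval_log` is a hypothetical helper and not part of `eval.py`.

```python
def parse_eval_log(text: str) -> dict:
    """Group `benchmark.<i>.<field>: <value>` lines by benchmark index."""
    results = {}
    for line in text.splitlines():
        # Split on the first ": " only, so the spec string stays intact.
        key, sep, value = line.partition(": ")
        parts = key.split(".")
        if not sep or len(parts) != 3 or parts[0] != "benchmark":
            continue  # skip compile/check/benchmark-count lines
        index, field = int(parts[1]), parts[2]
        # Keep the spec as text; every other field is numeric.
        results.setdefault(index, {})[field] = (
            value if field == "spec" else float(value)
        )
    return results


# Excerpt copied from the leaderboard run above.
sample = """\
compile: pass
benchmark-count: 4
benchmark.0.spec: m: 256; n: 4096; k: 7168; l: 1; seed: 1111
benchmark.0.runs: 200
benchmark.0.mean: 161848.16002845764
benchmark.0.best: 146431.99741840363
check: pass
"""

parsed = parse_eval_log(sample)
print(parsed[0]["spec"], parsed[0]["mean"])
```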

@S1ro1 S1ro1 merged commit 0bfe6ad into gpu-mode:main Dec 20, 2025