
[Metal] Reject tensor-scale nvfp4 in qqmm #3551

Open

Brooooooklyn wants to merge 1 commit into ml-explore:main from mlx-node:fix/metal-qqmm-global-scale-guard

Conversation

@Brooooooklyn (Contributor) commented on May 14, 2026

Summary

QQMatmul::eval_gpu on Metal silently dropped global_scale_x / global_scale_w in its gemv special case (pre-quantized w, x.shape(-2) == 1), producing numerically incorrect results when tensor-scale nvfp4 weights were in use. The general (non-gemv) case already throws [QQMatmul] NYI. qqmm() was missed when tensor-scale nvfp4 landed in #3022.

This change adds a backend-level guard at the top of QQMatmul::eval_gpu in mlx/backend/metal/quantized.cpp: when mode_ == Nvfp4 and extra inputs beyond (x, w[, scales_w]) are present (i.e. global scales were packed by ops.cpp), it throws std::runtime_error (a Python RuntimeError). This matches the local throw style used by the existing NYI path and keeps the device-specific check out of the backend-agnostic ops.cpp; CUDA is unaffected.
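
For illustration, below is a minimal, self-contained sketch of the guard. It is not the actual contents of mlx/backend/metal/quantized.cpp: the mode_ member, the Nvfp4 mode, and the (x, w[, scales_w]) input layout come from the description above, while the stand-in types, the other enumerator name, and the exact arity check are assumptions made for the sketch.

```cpp
#include <stdexcept>
#include <vector>

// Stand-ins so the sketch compiles on its own; the real code uses
// mlx::core::array and MLX's own quantization-mode enum.
struct Array {};
enum class QuantizationMode { PerGroup, Nvfp4 };

struct QQMatmul {
  QuantizationMode mode_;

  void eval_gpu(const std::vector<Array>& inputs, Array& out) {
    // ops.cpp packs global_scale_x / global_scale_w as extra inputs after
    // (x, w[, scales_w]); the Metal gemv special case would silently drop
    // them, so reject tensor-scale nvfp4 up front. The arity threshold of 3
    // is an assumption about how many "base" inputs can be present.
    if (mode_ == QuantizationMode::Nvfp4 && inputs.size() > 3) {
      throw std::runtime_error(
          "[QQMatmul] Tensor-scale (global_scale) nvfp4 is not supported "
          "on the Metal backend.");
    }
    (void)out;  // gemv special case / general-case NYI throw would follow
  }
};
```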

Per-group nvfp4 (no global_scale) on Metal continues to work — that path is exercised by the existing test_qqmv and is unchanged.

Fixes #3550.

Test plan

  • Added test_qqmm_metal_global_scale_rejected in python/tests/test_quantized.py, which asserts that RuntimeError is raised when mx.qqmm is evaluated on Metal with both global scales set. Verified the test fails on main (the call silently runs to completion) and passes with this fix.
  • The full python/tests/test_quantized.py suite (29 tests) still passes locally on Apple Silicon, including test_qqmv, which exercises the gemv branch being guarded.

@Brooooooklyn force-pushed the fix/metal-qqmm-global-scale-guard branch from 20f4211 to 0964646 on May 14, 2026 07:26

@zcbenz (Collaborator) left a comment

This check should be done in QQMatmul::eval_gpu; generally we should not check backend types in ops.cpp.

Commit message:

QQMatmul::eval_gpu on Metal silently dropped global_scale_x /
global_scale_w in the gemv special case (pre-quantized w, M==1),
producing numerically incorrect results when tensor-scale nvfp4
weights were in use. The general case already throws NYI.

Add a backend-level guard at the top of QQMatmul::eval_gpu that
rejects nvfp4 with global scales packed into inputs, matching the
local throw style and keeping the check out of the backend-agnostic
ops.cpp.

Fixes ml-explore#3550.

@Brooooooklyn force-pushed the fix/metal-qqmm-global-scale-guard branch from 0964646 to eb9a4d0 on May 17, 2026 02:55

@Brooooooklyn (Contributor, Author) commented:

Updated

@Brooooooklyn requested a review from zcbenz on May 17, 2026 08:38

Development

Successfully merging this pull request may close this issue:

Metal: QQMatmul::eval_gpu gemv path silently drops global_scale_x/global_scale_w for nvfp4 (#3550)

2 participants