Columnwise quantize #2989

nastya236 · 2026-01-12T22:26:36Z

Column-wise quantization for tensors stored in a column-major layout. This is needed when one or both qqmm inputs are passed transposed, which happens in the qqmm backward pass:

- nt layout: used in the VJP to compute dL/dx (the second argument is transposed)
- tt layout: used in the VJP to compute dL/dw (both arguments are transposed)
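To see why exactly these two layouts appear, the sketch below lists, for each gradient matmul, which operands are contracted along their strided (non-contiguous) axis and would therefore need column-wise quantization. It assumes the forward pass is y = x @ w^T with x of shape [M, K] and w of shape [N, K], all stored row-major. The helper names are hypothetical and not part of MLX.

```python
from typing import Dict, List, Tuple

# (name, stored row-major shape, axis that the matmul contracts over)
Operand = Tuple[str, Tuple[int, int], int]


def columnwise_operands(operands: List[Operand]) -> List[str]:
    """Names of operands whose contraction axis is not the contiguous one.

    Row-wise quantization groups elements along the last (contiguous) axis
    of a row-major tensor. If the matmul contracts over axis 0 instead, the
    groups must run along the strided axis, i.e. column-wise.
    """
    return [name for name, _shape, axis in operands if axis == 0]


def qqmm_backward_plan(M: int, N: int, K: int) -> Dict[str, List[str]]:
    # Assumed forward: y[M, N] = x[M, K] @ w[N, K]^T; dL/dy has y's shape.
    # dL/dx[M, K] = dL/dy[M, N] @ w[N, K]: contraction over N.
    dx = [("dL/dy", (M, N), 1), ("w", (N, K), 0)]
    # dL/dw[N, K] = dL/dy^T[N, M] @ x[M, K]: contraction over M.
    dw = [("dL/dy", (M, N), 0), ("x", (M, K), 0)]
    return {"dL/dx": columnwise_operands(dx), "dL/dw": columnwise_operands(dw)}


plan = qqmm_backward_plan(M=16384, N=4096, K=11008)
for grad, ops in plan.items():
    print(f"{grad}: column-wise quantization needed for {ops}")
```

Under this assumption, dL/dx has a single column-wise operand (the second one, matching nt) and dL/dw has two (matching tt).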

Overview:
Input: [M, K], M-major.

1. Each thread processes group_size elements of one column.
2. Load the elements into registers.
3. Compute the scale and store it to shared memory (padded to avoid bank conflicts).
4. Quantize and store the quantized values to shared memory (padded to avoid bank conflicts).
5. Write the scales to [M, K/group_size], K-major.
6. Write the quantized values to [M, K/elements_per_byte], K-major.
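A CPU reference for the steps above may make the index mapping easier to follow. This is a minimal sketch, not the CUDA kernel: it assumes the nvfp4 format (E2M1 values, group_size = 16, two values per byte), keeps scales as plain Python floats instead of FP8, ignores any per-tensor scale, and assumes the even-k element goes into the low nibble. All function names are hypothetical.

```python
from typing import List, Tuple

# FP4 E2M1 magnitudes; the list index is the 3-bit exponent/mantissa code.
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def encode_e2m1(v: float) -> int:
    """Round v to the nearest E2M1 value and return its 4-bit code.

    Ties go to the smaller magnitude here (a simplification).
    """
    sign = 1 if v < 0 else 0
    mag = min(abs(v), 6.0)
    idx = min(range(8), key=lambda i: abs(E2M1_VALUES[i] - mag))
    return (sign << 3) | idx


def quantize_columnwise(buf: List[float], M: int, K: int, group_size: int = 16
                        ) -> Tuple[List[List[float]], List[List[int]]]:
    """Quantize a logical [M, K] tensor stored M-major (column-major).

    Element (m, k) lives at buf[k * M + m]; groups run along K.
    Returns K-major (row-major) outputs:
      scales: [M][K // group_size]
      packed: [M][K // 2], two 4-bit codes per byte
    """
    assert K % group_size == 0 and group_size % 2 == 0
    scales = [[0.0] * (K // group_size) for _ in range(M)]
    packed = [[0] * (K // 2) for _ in range(M)]
    for m in range(M):  # assumed GPU mapping: adjacent threads take adjacent m
        for g in range(K // group_size):
            ks = range(g * group_size, (g + 1) * group_size)
            vals = [buf[k * M + m] for k in ks]  # one group of a column
            amax = max(abs(v) for v in vals)
            scale = amax / 6.0 if amax > 0 else 1.0  # 6.0 = largest E2M1 value
            scales[m][g] = scale
            for k, v in zip(ks, vals):
                # Assumed packing: even k in the low nibble, odd k in the high one.
                packed[m][k // 2] |= encode_e2m1(v / scale) << (4 * (k % 2))
    return scales, packed


def dequantize(scales, packed, M: int, K: int, group_size: int = 16):
    """Inverse of quantize_columnwise, returning a row-major [M][K] list."""
    out = [[0.0] * K for _ in range(M)]
    for m in range(M):
        for k in range(K):
            code = (packed[m][k // 2] >> (4 * (k % 2))) & 0xF
            mag = E2M1_VALUES[code & 0x7]
            out[m][k] = (-mag if code & 0x8 else mag) * scales[m][k // group_size]
    return out


# Usage: build x[m][k] and store it M-major, so buf[k * M + m] == x[m][k].
M, K = 2, 32
x = [[(k - 16) * 0.25 * (m + 1) for k in range(K)] for m in range(M)]
buf = [x[m][k] for k in range(K) for m in range(M)]
scales, packed = quantize_columnwise(buf, M, K)
print(len(scales), len(scales[0]), len(packed[0]))  # 2 2 16
```

For a single m the reads `buf[k * M + m]` are strided by M, but adjacent m values read adjacent addresses, so threads assigned to neighbouring m can load from global memory with coalesced accesses before the data is transposed to the K-major output.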

nvfp4 qqmm:

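| M | N | K | layout | diff % |
|---|---|---|---|---|
| 16384 | 11008 | 4096 | nn | 6.3 |
| 16384 | 11008 | 4096 | tn | 15.9 |
| 32768 | 11008 | 4096 | nn | 0.9 |
| 32768 | 11008 | 4096 | tn | 7.8 |
| 16384 | 4096 | 11008 | nn | 6.3 |
| 16384 | 4096 | 11008 | tn | 20.1 |
| 32768 | 4096 | 11008 | nn | 14.5 |
| 32768 | 4096 | 11008 | tn | 11.4 |
| 16384 | 12288 | 4096 | nn | 2.4 |
| 16384 | 12288 | 4096 | tn | 23.8 |
| 32768 | 12288 | 4096 | nn | 1.3 |
| 32768 | 12288 | 4096 | tn | 15.3 |
| 16384 | 4096 | 12288 | nn | 6.4 |
| 16384 | 4096 | 12288 | tn | 20.0 |
| 32768 | 4096 | 12288 | nn | 8.9 |
| 32768 | 4096 | 12288 | tn | 24.9 |
| 16384 | 27648 | 5120 | nn | 11.7 |
| 16384 | 27648 | 5120 | tn | 15.9 |
| 32768 | 27648 | 5120 | nn | 0.7 |
| 32768 | 27648 | 5120 | tn | 9.4 |
| 16384 | 5120 | 27648 | nn | 10.6 |
| 16384 | 5120 | 27648 | tn | 14.2 |
| 32768 | 5120 | 27648 | nn | 2.3 |
| 32768 | 5120 | 27648 | tn | 21.7 |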

This can probably be optimized further.
Note: also fixed a small bug in `QQMatmul::output_shape` and removed an unused reorder.


nastya236 and others added 12 commits January 6, 2026 00:59

- wip quantize + transpose (63045e4)
- wip, column wise+transpose (ed92c15)
- columnwise quantize transpose (c32827c)
- WIP (3a57c82)
- coalesed to global (b131d54)
- WIP (80ed0b4)
- wip (7030b2f)
- Merge branch 'main' into columnwise-quantize-transpose (3d9ddf0)
- bank conflicts (a98270e)
- Merge branch 'ml-explore:main' into columnwise-quantize-transpose (e71a0a7)
- shared memory for scales (5f33cb3)
- Merge branch 'columnwise-quantize-transpose' of https://github.com/na… …stya236/mlx into columnwise-quantize-transpose (f4010b2)

nastya236 changed the title from "Columnwise quantize transpose" to "Columnwise quantize" on Jan 12, 2026

nastya236 added 11 commits January 14, 2026 22:39

- refactoring (0bb2c22)
- return large arg (e07ce60)
- return large arg (1085cda)
- Revert "return large arg" (f53a6ef): This reverts commit 1085cda.
- missed pragma unroll (e475b45)
- fix output shape in primitive (7f0d61a)
- Merge branch 'main' into columnwise-quantize-transpose (30088cd)
- return large arg (bbea65d)
- Revert "return large arg" (79cd128): This reverts commit bbea65d.
- ensure row/column contiguous with flags (da2bdd6)
- ensure row/column contiguous with flags (ff753c0)

nastya236 marked this pull request as ready for review January 14, 2026 23:13

- Merge branch 'ml-explore:main' into columnwise-quantize-transpose (1890263)
