[CUDA] Branchless NF4/FP4 kDequantizeBlockwise kernel for faster dequantization #1746

Merged
matthewdouglas merged 3 commits into bitsandbytes-foundation:main from Mhmd-Hisham:cuda-branchless-dequantization-float32-lut on Sep 18, 2025

Commits

Commits on Sep 6, 2025