Description
Hi, I've implemented more quant types (IQ1_S, IQ1_M, IQ2_XXS, IQ2_S, IQ3_XXS, IQ3_S) in PyTorch at https://github.com/woct0rdho/transformers-qwen3-moe-fused/blob/master/qwen3_moe_fused/quantize_gguf/dequant.py
I've also found that torch.compile cannot handle some complicated view operations and silently produces wrong numbers or NaN. To work around this, I've added a view_float16 helper to dequant functions such as Q6_K, and rewritten the IQ4_XS dequant function. All dequant functions now produce correct results under torch.compile. Later we should find minimal reproducers and report them to PyTorch.
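For context, a minimal sketch of the kind of helper this refers to (the actual view_float16 in the linked repo may differ): reinterpreting a uint8 buffer as float16 via a dtype view, forcing the tensor contiguous first so torch.compile traces a plain dtype view rather than a complicated strided one.

```python
import torch

def view_float16(x: torch.Tensor) -> torch.Tensor:
    # Hypothetical sketch: make the buffer contiguous, then reinterpret
    # pairs of bytes as float16 values (assumes a little-endian host).
    return x.contiguous().view(torch.float16)

# 4 raw bytes -> 2 float16 values: 0x3C00 is 1.0, 0xBC00 is -1.0
raw = torch.tensor([0, 60, 0, 188], dtype=torch.uint8)
print(view_float16(raw))  # tensor([ 1., -1.], dtype=torch.float16)
```

The dtype view requires the byte count along the last dimension to be a multiple of the element size, which GGUF block layouts already guarantee.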
There is a unit test at https://github.com/woct0rdho/transformers-qwen3-moe-fused/blob/master/test_gguf_dequant.py
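The core test pattern is to run the same dequant function eagerly and under torch.compile and compare the outputs, which is exactly how silent miscompilations like the one above show up. A toy sketch with a Q8_0-style block dequant (illustrative only, not the repo's code; the eager backend is used here to keep the example lightweight, whereas a real test would use the default inductor backend):

```python
import torch

def dequant_q8_0(scales: torch.Tensor, qs: torch.Tensor) -> torch.Tensor:
    # Toy dequant in the spirit of GGUF Q8_0: each block of int8
    # quants is multiplied by its per-block float16 scale.
    return scales.to(torch.float32).unsqueeze(-1) * qs.to(torch.float32)

scales = torch.randn(8, dtype=torch.float16)
qs = torch.randint(-128, 128, (8, 32), dtype=torch.int8)

ref = dequant_q8_0(scales, qs)
compiled = torch.compile(dequant_q8_0, backend="eager")
torch.testing.assert_close(compiled(scales, qs), ref)
```

The same eager-vs-compiled comparison, applied per quant type against known GGUF test blocks, is what catches the wrong-numbers/NaN failures mentioned above.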
I'm not sure whether we're interested in quantizing diffusion models to IQ3 or smaller quants, but this at least lets us load some existing LLMs as text encoders, especially Unsloth UD quants. I can open a PR to this repo if needed.