Description
Hi, I've implemented more quant types (IQ1_S, IQ1_M, IQ2_XXS, IQ2_S, IQ3_XXS, IQ3_S) in PyTorch at https://github.com/woct0rdho/transformers-qwen3-moe-fused/blob/master/qwen3_moe_fused/quantize_gguf/dequant.py
I've also found that torch.compile cannot handle some complicated view operations and silently produces wrong numbers or NaN. To work around this, I've added a view_float16 helper to dequant functions such as Q6_K, and rewritten the IQ4_XS dequant function. All dequant functions now produce correct results under torch.compile. Later we should find minimal reproducers and report them to PyTorch.
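For context, a minimal sketch of the kind of helper this refers to (the actual view_float16 in the linked repo may differ): reinterpreting a uint8 buffer as float16 via a dtype view, forcing the tensor contiguous first so torch.compile traces a plain dtype view rather than a complicated strided one.

```python
import torch

def view_float16(x: torch.Tensor) -> torch.Tensor:
    # Hypothetical sketch: make the buffer contiguous, then reinterpret
    # pairs of bytes as float16 values (assumes a little-endian host).
    return x.contiguous().view(torch.float16)

# 4 raw bytes -> 2 float16 values: 0x3C00 is 1.0, 0xBC00 is -1.0
raw = torch.tensor([0, 60, 0, 188], dtype=torch.uint8)
print(view_float16(raw))  # tensor([ 1., -1.], dtype=torch.float16)
```

The dtype view requires the byte count along the last dimension to be a multiple of the element size, which GGUF block layouts already guarantee.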
There is a unit test at https://github.com/woct0rdho/transformers-qwen3-moe-fused/blob/master/test_gguf_dequant.py
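The core test pattern is to run the same dequant function eagerly and under torch.compile and compare the outputs, which is exactly how silent miscompilations like the one above show up. A toy sketch with a Q8_0-style block dequant (illustrative only, not the repo's code; the eager backend is used here to keep the example lightweight, whereas a real test would use the default inductor backend):

```python
import torch

def dequant_q8_0(scales: torch.Tensor, qs: torch.Tensor) -> torch.Tensor:
    # Toy dequant in the spirit of GGUF Q8_0: each block of int8
    # quants is multiplied by its per-block float16 scale.
    return scales.to(torch.float32).unsqueeze(-1) * qs.to(torch.float32)

scales = torch.randn(8, dtype=torch.float16)
qs = torch.randint(-128, 128, (8, 32), dtype=torch.int8)

ref = dequant_q8_0(scales, qs)
compiled = torch.compile(dequant_q8_0, backend="eager")
torch.testing.assert_close(compiled(scales, qs), ref)
```

The same eager-vs-compiled comparison, applied per quant type against known GGUF test blocks, is what catches the wrong-numbers/NaN failures mentioned above.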
I'm not sure whether we're interested in quantizing diffusion models to IQ3 or smaller quants, but this at least lets us load some existing LLMs as text encoders, especially Unsloth UD quants. I can open a PR to this repo if needed.