More IQ quant types and fix for torch.compile #405

@woct0rdho

Description

Hi, I've implemented more quant types (IQ1_S, IQ1_M, IQ2_XXS, IQ2_S, IQ3_XXS, IQ3_S) in torch at https://github.com/woct0rdho/transformers-qwen3-moe-fused/blob/master/qwen3_moe_fused/quantize_gguf/dequant.py
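As a rough illustration of what these IQ dequant functions do, here is a toy sketch of the codebook-lookup structure the IQ families share: quantized weights are stored as indices into a fixed grid of value groups, plus per-block scales. All names and shapes below are illustrative, not the real bit layouts from dequant.py.

```python
import torch

# Toy sketch of IQ-style dequantization: look up groups of values in a
# fixed codebook ("grid") and apply per-block scales. The real formats
# also pack sign bits and use much larger grids with tighter layouts.
def dequant_codebook(indices: torch.Tensor,
                     scales: torch.Tensor,
                     grid: torch.Tensor) -> torch.Tensor:
    # indices: (n_blocks, n_groups) int64 indices into grid
    # scales:  (n_blocks, 1) per-block float scales
    # grid:    (grid_size, group_len) codebook of value groups
    values = grid[indices]              # (n_blocks, n_groups, group_len)
    return scales[:, :, None] * values  # broadcast scale over each block
```

The real implementations additionally unpack the indices and scales from tightly packed byte tensors before this lookup step.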

I've also found that torch.compile cannot handle some complicated view operations: it silently produces wrong numbers or NaN. I've added view_float16 to dequant functions such as Q6_K to work around this, and rewritten the IQ4_XS dequant function. Now all dequant functions work correctly with torch.compile. Later we should find some minimal reproducers and report them to PyTorch.
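The workaround pattern is roughly the following: instead of chaining many view/reshape operations on integer tensors, reinterpret the raw bytes as float16 in one explicit, isolated step. This is a minimal sketch assuming a made-up block layout (a float16 scale followed by byte quants), not Q6_K's real one.

```python
import torch

def view_float16(blocks: torch.Tensor) -> torch.Tensor:
    # Reinterpret uint8 bytes as float16 without copying. Keeping this
    # reinterpretation as a single, explicit op is the workaround for
    # torch.compile mishandling longer chains of view operations.
    return blocks.view(torch.float16)

def dequant_sketch(blocks: torch.Tensor) -> torch.Tensor:
    # blocks: (n_blocks, block_bytes) uint8. In this toy layout the first
    # 2 bytes of each block hold a little-endian float16 scale and the
    # rest are offset-binary quants (layout is illustrative only).
    scales = view_float16(blocks[:, :2]).to(torch.float32)       # (n_blocks, 1)
    quants = blocks[:, 2:].to(torch.int8).to(torch.float32) - 8.0
    return scales * quants
```

Note that `Tensor.view(dtype)` with a differently sized dtype requires the last dimension to be contiguous and divisible by the element-size ratio, which the real block layouts satisfy.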

There is a unit test at https://github.com/woct0rdho/transformers-qwen3-moe-fused/blob/master/test_gguf_dequant.py
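Since the failure mode was silently wrong numbers rather than an exception, a useful check of the kind the linked test performs is comparing eager and compiled outputs directly. This is a hedged sketch (the helper name is mine, not from test_gguf_dequant.py); it uses `backend="eager"` to stay dependency-free, whereas a real test would use the default inductor backend to catch codegen bugs.

```python
import torch

def check_compile_consistency(fn, blocks: torch.Tensor) -> None:
    # Run fn eagerly and under torch.compile and require bit-identical
    # results; raises AssertionError if the compiled version diverges.
    expected = fn(blocks)
    compiled = torch.compile(fn, backend="eager")(blocks)
    torch.testing.assert_close(compiled, expected, rtol=0, atol=0)
```

A per-quant-type test would call this once for each dequant function over randomized packed blocks.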

I'm not sure if we're interested in quantizing diffusion models into IQ3 or smaller quants, but this at least allows us to load some existing LLMs as text encoders, especially Unsloth UD quants. I can open a PR to this repo if needed.
