[Feature Request] Support for Tencent Hy-MT2-30B (Requires upstream STQ kernel PR #22836)? #134

@qqba

Hi @JamePeng ,

First of all, thank you so much for maintaining these pre-built wheels! They save Windows users a lot of time and effort. 👍

I am currently trying to use the newly released Tencent Hunyuan translation model Hy-MT2-30B-A3B-GGUF. The 7B version loads and runs perfectly with the current package, but loading the 30B version fails with ValueError: Failed to load model from file.
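For reference, a minimal reproduction sketch, assuming the standard `llama_cpp.Llama(model_path=...)` constructor from llama-cpp-python. The helper names (`default_loader`, `try_load`, `report`) and the file paths in the usage note are hypothetical placeholders, not names from the package or the model card.

```python
from typing import Callable, Dict, Iterable, Optional


def default_loader(model_path: str) -> object:
    # Imported lazily so the helpers below can be used without the package installed.
    from llama_cpp import Llama

    # verbose=True prints the llama.cpp loader log, which usually shows
    # which tensor or quantization type the loader rejected.
    return Llama(model_path=model_path, verbose=True)


def try_load(
    model_path: str,
    loader: Callable[[str], object] = default_loader,
) -> Optional[str]:
    """Return None if the model loads, otherwise the ValueError message."""
    try:
        loader(model_path)
    except ValueError as exc:
        return str(exc)
    return None


def report(
    paths: Iterable[str],
    loader: Callable[[str], object] = default_loader,
) -> Dict[str, str]:
    """Map each path to 'loaded OK' or to the error text it produced."""
    return {p: try_load(p, loader) or "loaded OK" for p in paths}
```

Calling `report(["path/to/7b-model.gguf", "path/to/30b-model.gguf"])` with the real local paths should show the 7B file loading and the 30B file returning the error text, if the behaviour matches what is described above.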

According to the official model card on Hugging Face, the 30B GGUF model relies on a new Sparse Tensor Quantization (STQ) kernel, which was recently merged into the upstream llama.cpp repository:

❕❕ This gguf depends on our STQ kernel, which is released at PR #22836. (ggml-org/llama.cpp#22836)

My Request:
Could you please consider syncing the upstream llama.cpp (which now includes this STQ kernel PR) in the next release of your llama-cpp-python wheels? Many users in the community are looking forward to running this powerful 30B translation model locally via your packages.

Thank you again for your hard work and great contributions!
