[Feature Request] Support for Tencent Hy-MT2-30B (Requires upstream STQ kernel PR #22836)? #134

Hi @JamePeng,

First of all, thank you so much for maintaining these pre-built wheels! They save Windows users a lot of time and effort. 👍

I am currently trying to use the newly released Tencent Hunyuan translation model **Hy-MT2-30B-A3B-GGUF**. While the 7B version loads and runs perfectly with the current package, loading the 30B version results in a `ValueError: Failed to load model from file` crash.
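For reference, here is a minimal sketch of how the failure can be reproduced and reported. It assumes the standard `llama_cpp.Llama(model_path=...)` constructor. The helper name `try_load` and the file paths in the usage comment are hypothetical placeholders, not the actual filenames from the model repository.

```python
from typing import Callable, Tuple


def _default_loader(model_path: str) -> object:
    # Imported lazily so the helper itself does not require llama-cpp-python
    # to be installed until a real load is attempted.
    from llama_cpp import Llama

    # verbose=True prints the llama.cpp loader log, which usually shows
    # which tensor or quantization type the build could not handle.
    return Llama(model_path=model_path, verbose=True)


def try_load(
    model_path: str,
    loader: Callable[[str], object] = _default_loader,
) -> Tuple[bool, str]:
    """Attempt to load a GGUF file and return (success, detail)."""
    try:
        loader(model_path)
    except ValueError as exc:
        # llama-cpp-python reports a failed load as a ValueError.
        return False, str(exc)
    return True, "loaded"


# Example usage (hypothetical local paths):
#   print(try_load("models/hy-mt2-7b.gguf"))
#   print(try_load("models/hy-mt2-30b-a3b.gguf"))
```

Running both calls with the same wheel should make it clear whether the failure is specific to the 30B file. Including the output of `llama_cpp.__version__` alongside the result may also help identify which upstream revision the wheel was built from.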

According to the [official model card on Hugging Face](https://huggingface.co/tencent/Hy-MT2-30B-A3B), the 30B GGUF model relies on a new Sparse Tensor Quantization (STQ) kernel, which was recently merged into the upstream `llama.cpp` repository:
> ❕❕ This gguf depends on our STQ kernel, which is released at PR #22836. (ggml-org/llama.cpp#22836)

**My Request:**
Could you please consider syncing the upstream `llama.cpp` (which now includes this STQ kernel PR) in the next release of your `llama-cpp-python` wheels? Many users in the community are looking forward to running this powerful 30B translation model locally via your packages.

Thank you again for your hard work and great contributions!
