Hi @JamePeng ,
First of all, thank you so much for maintaining these pre-built wheels! They save Windows users a lot of time and effort.
I am currently trying to use the newly released Tencent Hunyuan translation model Hy-MT2-30B-A3B-GGUF. The 7B version loads and runs perfectly with the current package, but loading the 30B version crashes with `ValueError: Failed to load model from file`.
According to the official model card on Hugging Face, the 30B GGUF model relies on a new Sparse Tensor Quantization (STQ) kernel, which was recently merged into the upstream `llama.cpp` repository:
> This gguf depends on our STQ kernel, which is released at PR #22836. (ggml-org/llama.cpp#22836)
My Request:
Could you please consider syncing with upstream `llama.cpp` (which now includes the STQ kernel PR) in the next release of your `llama-cpp-python` wheels? Many users in the community are looking forward to running this powerful 30B translation model locally via your packages.
Thank you again for your hard work and great contributions!