Issue description
qmd query fails with CUDA error ggml-cuda.cu:98 despite GPU being available
Expected Behavior
`qmd query` should offload the model to the GPU and return results. After following the troubleshooting steps from node-llama-cpp issue #577 (removing node_modules and reinstalling), node-llama-cpp correctly detects the GPU.
Actual Behavior
`qmd query` crashes with a CUDA error at ggml-cuda.cu:98 even though the GPU is available.
Root cause analysis:
The issue appears to be in qmd's llm.js. The call to `llama.loadModel({ modelPath })` doesn't specify the `gpuLayers` parameter, which causes the model to load on the CPU while some CUDA initialization code is still triggered.
Solution needed:
In llm.js, the loadModel call should include GPU configuration:

```js
const model = await llama.loadModel({
    modelPath,
    gpuLayers: 999 // or an appropriate value based on available VRAM
});
```
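The "appropriate value based on available VRAM" could be estimated with a small helper like the one below. This is a hypothetical heuristic, not part of qmd or node-llama-cpp: it assumes layers are roughly uniform in size and reserves some headroom for the KV cache.

```javascript
// Hypothetical helper: estimate how many transformer layers fit in free VRAM.
// Assumes roughly uniform per-layer size; the 10% headroom for the KV cache
// and scratch buffers is an illustrative guess, not a measured value.
function estimateGpuLayers(freeVramBytes, modelSizeBytes, totalLayers) {
    if (freeVramBytes <= 0 || totalLayers <= 0) return 0; // no GPU budget: stay on CPU
    const bytesPerLayer = modelSizeBytes / totalLayers;   // approximate per-layer footprint
    const budget = freeVramBytes * 0.9;                   // leave ~10% headroom
    return Math.min(totalLayers, Math.floor(budget / bytesPerLayer));
}
```

For a 24GB card and a typical 7B-class GGUF this returns the full layer count, so the effect is the same as `gpuLayers: 999`, which also clamps to offloading every layer.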
Workaround:
BM25 search (`qmd search`) works fine and is recommended for now.
### Steps to reproduce
1. Install @tobilu/qmd on a Windows machine with an NVIDIA GPU and a CUDA-enabled node-llama-cpp build.
2. Run a `qmd query` command.
3. The command fails with a CUDA error at ggml-cuda.cu:98.
### My Environment
- Windows 10 x64
- node-llama-cpp v3.18.1
- @tobilu/qmd latest
- NVIDIA RTX 5090 with 24GB VRAM
- CUDA 13.2, Driver 596.21
### Additional Context
_No response_
### Relevant Features Used
- [ ] Metal support
- [x] CUDA support
- [ ] Vulkan support
- [ ] Grammar
- [ ] Function calling
### Are you willing to resolve this issue by submitting a Pull Request?
Yes, I have the time, and I know how to start.