
bug: qmd query fails with CUDA error ggml-cuda.cu:98 despite GPU being available #598

@wzgrx


Issue description

qmd query fails with CUDA error ggml-cuda.cu:98 despite GPU being available

Expected Behavior

After following the troubleshooting steps from node-llama-cpp issue #577 (removing `node_modules` and reinstalling), node-llama-cpp detects the GPU correctly, so `qmd query` should load the model on the GPU and complete without errors.

Actual Behavior

Root cause analysis:
The issue appears to be in qmd's `llm.js`. The call to `llama.loadModel({ modelPath })` doesn't specify the `gpuLayers` parameter, so the model loads on CPU while some CUDA initialization code is still triggered.

Solution needed:
In `llm.js`, the `loadModel` call should include GPU configuration:

```js
const model = await llama.loadModel({
    modelPath,
    gpuLayers: 999  // or an appropriate value based on available VRAM
});
```


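A hard-coded `gpuLayers: 999` relies on the library clamping the value to what actually fits. A minimal sketch of sizing it explicitly instead (the helper name, the reserve figure, and the per-layer size are illustrative assumptions, not part of qmd or node-llama-cpp):

```javascript
// Hypothetical helper: estimate how many transformer layers fit in free VRAM.
// layerSizeBytes would come from the model's metadata; reserveBytes leaves
// headroom for the KV cache and CUDA runtime allocations.
function estimateGpuLayers(freeVramBytes, layerSizeBytes, totalLayers,
                           reserveBytes = 512 * 1024 * 1024) {
  const usable = Math.max(0, freeVramBytes - reserveBytes);
  return Math.min(totalLayers, Math.floor(usable / layerSizeBytes));
}

// Example with toy numbers: 10 000 bytes free, 1 000-byte layers, 32 layers,
// 2 000 bytes reserved -> floor(8000 / 1000) = 8 layers offloaded.
console.log(estimateGpuLayers(10_000, 1_000, 32, 2_000)); // 8
```

The result could then be passed as `gpuLayers` to `llama.loadModel({ modelPath, gpuLayers })` in place of the fixed 999.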
Workaround:
BM25 search (`qmd search`) works fine and is recommended for now.

### Steps to reproduce

Run `qmd query`; it fails with `CUDA error ggml-cuda.cu:98` even though the GPU is detected.

### My Environment

- Windows 10 x64
- node-llama-cpp v3.18.1
- @tobilu/qmd latest
- NVIDIA RTX 5090 with 24GB VRAM
- CUDA 13.2, Driver 596.21

### Additional Context

_No response_

### Relevant Features Used

- [x] Metal support
- [x] CUDA support
- [x] Vulkan support
- [x] Grammar
- [x] Function calling

### Are you willing to resolve this issue by submitting a Pull Request?

Yes, I have the time, and I know how to start.

    Labels

    bug (Something isn't working), requires triage (Requires triaging)
