Issue description
qmd query fails with CUDA error ggml-cuda.cu:98 despite GPU being available
Expected Behavior
`qmd query` should offload the model to the GPU and return results. After following the troubleshooting steps from node-llama-cpp issue #577 (removing node_modules and reinstalling), node-llama-cpp correctly detects the GPU.
Actual Behavior
`qmd query` crashes with a CUDA error at ggml-cuda.cu:98 even though the GPU is available.
Root cause analysis:
The issue appears to be in qmd's llm.js. The call to `llama.loadModel({ modelPath })` doesn't specify the `gpuLayers` parameter, which causes the model to load on the CPU while some CUDA initialization code is still triggered.
Solution needed:
In llm.js, the loadModel call should include GPU configuration:

```js
const model = await llama.loadModel({
    modelPath,
    gpuLayers: 999 // or an appropriate value based on available VRAM
});
```
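The "appropriate value based on available VRAM" could be estimated with a small helper like the one below. This is a hypothetical heuristic, not part of qmd or node-llama-cpp: it assumes layers are roughly uniform in size and reserves some headroom for the KV cache.

```javascript
// Hypothetical helper: estimate how many transformer layers fit in free VRAM.
// Assumes roughly uniform per-layer size; the 10% headroom for the KV cache
// and scratch buffers is an illustrative guess, not a measured value.
function estimateGpuLayers(freeVramBytes, modelSizeBytes, totalLayers) {
    if (freeVramBytes <= 0 || totalLayers <= 0) return 0; // no GPU budget: stay on CPU
    const bytesPerLayer = modelSizeBytes / totalLayers;   // approximate per-layer footprint
    const budget = freeVramBytes * 0.9;                   // leave ~10% headroom
    return Math.min(totalLayers, Math.floor(budget / bytesPerLayer));
}
```

For a 24GB card and a typical 7B-class GGUF this returns the full layer count, so the effect is the same as `gpuLayers: 999`, which also clamps to offloading every layer.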
Workaround:
BM25 search (`qmd search`) works fine and is recommended for now.
### Steps to reproduce
1. Install @tobilu/qmd on a Windows machine with an NVIDIA GPU and a CUDA-enabled node-llama-cpp build.
2. Run a `qmd query` command.
3. The command fails with a CUDA error at ggml-cuda.cu:98.
### My Environment
- Windows 10 x64
- node-llama-cpp v3.18.1
- @tobilu/qmd latest
- NVIDIA RTX 5090 with 24GB VRAM
- CUDA 13.2, Driver 596.21
### Additional Context
_No response_
### Relevant Features Used
- [ ] Metal support
- [x] CUDA support
- [ ] Vulkan support
- [ ] Grammar
- [ ] Function calling
### Are you willing to resolve this issue by submitting a Pull Request?
Yes, I have the time, and I know how to start.