
bug: loadBackends(backendsPath) skipped when buildGpu === false, silently drops custom GPU backends compiled via NODE_LLAMA_CPP_CMAKE_OPTION_* #599

@Zighy

Issue description

RELATED: #479 (feat: builtin ROCm support)

Expected Behavior

loadBackends(backendsPath) is called regardless of buildGpu, so that custom GPU
backends compiled via NODE_LLAMA_CPP_CMAKE_OPTION_* are loaded and used at runtime.
If no backend initialises, getGpuType() returns false and the existing fallback
path proceeds unchanged.
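
Concretely, with a custom GGML_HIP=ON build in place, the probe would then surface the backend through the public API. A minimal sketch of the intended observable result (mirroring step 4 of the reproduction below; the file name is arbitrary):

// expected-gpu.mjs: sketch of the expected post-fix result, not library code
import {getLlama} from "node-llama-cpp";

const llama = await getLlama({gpu: false});  // selects the --gpu false custom build
console.log("gpu:", llama.gpu);              // expected: "cuda" (ROCm reports its devices as "cuda")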

Actual Behavior

In src/bindings/Llama.ts, the call to loadBackends(backendsPath) is guarded by
buildGpu !== false. When a binary is built with --gpu false, buildGpu is false
and the guard makes the entire block a no-op. Any backend .so placed in the binary's
Release/ directory by a custom cmake build (e.g. libggml-hip.so via GGML_HIP=ON)
is never loaded. Inference silently falls back to CPU with no warning or error.

// src/bindings/Llama.ts — v3.18.1 (affected code)
let loadedGpu = bindings.getGpuType();
// for a --gpu false build, buildGpu === false, so once getGpuType() returns false
// the whole condition is false and the block below is skipped
if (loadedGpu == null || (loadedGpu === false && buildGpu !== false)) {
    const backendsPath = path.dirname(bindingPath);
    const fallbackBackendsDir = path.join(extBackendsPath ?? backendsPath, "fallback");
    bindings.loadBackends(backendsPath);          // ← never reached when buildGpu === false
    loadedGpu = bindings.getGpuType();
    if (loadedGpu == null || (loadedGpu === false && buildGpu !== false))
        bindings.loadBackends(fallbackBackendsDir);
}

loadBackends(backendsPath) is never called when the binary was built with --gpu false.
llama.gpu is false even when a valid GPU backend (e.g. libggml-hip.so) was
compiled into the binary directory. Inference runs on CPU.
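
To spell out why the block is skipped, the guard can be extracted and evaluated for the two relevant cases (a standalone sketch for illustration; the real BuildGpu union is simplified to string | false here):

// The condition from Llama.ts, isolated for illustration.
type GpuType = string | false | null;

const shouldProbe = (loadedGpu: GpuType, buildGpu: string | false) =>
    loadedGpu == null || (loadedGpu === false && buildGpu !== false);

console.log(shouldProbe(false, "cuda"));  // true:  GPU build, the probe runs
console.log(shouldProbe(false, false));   // false: --gpu false build, loadBackends() never runs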

Steps to reproduce

# 1. Set cmake options to compile a custom GPU backend
export NODE_LLAMA_CPP_CMAKE_OPTION_GGML_HIP=ON
export NODE_LLAMA_CPP_CMAKE_OPTION_AMDGPU_TARGETS=gfx1200

# 2. Build with --gpu false
node node-llama-cpp/dist/cli/cli.js source download --gpu false --noUsageExample

# 3. Confirm the backend .so was compiled
find ~/.cache/node-llama-cpp -name "libggml-hip.so"
# → file exists in Release/

# 4. Check llama.gpu at runtime
node -e "
const { getLlama } = require('node-llama-cpp');
getLlama({ gpu: false }).then(l => console.log('gpu:', l.gpu));
"
# Expected: gpu: cuda   (ROCm maps its device names to "cuda" internally)
# Actual:   gpu: false  (libggml-hip.so was never loaded)
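
node-llama-cpp ships as an ES module, so if require(esm) is unavailable in a given Node.js version, the same check can be run as a standalone module instead (sketch; the file name is arbitrary):

// check-gpu.mjs
import {getLlama} from "node-llama-cpp";

const llama = await getLlama({gpu: false});
console.log("gpu:", llama.gpu);  // prints "gpu: false" while the bug is present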

My Environment

node-llama-cpp: 3.18.1
llama.cpp: release b8390
Node.js: 22.22.2
OS: Ubuntu 24.04.4 LTS (Docker, rocm/dev-ubuntu-24.04:latest)
GPU: AMD RX 9060 XT — gfx1200 (RDNA 4)
ROCm: 7.2.2

Additional Context

The buildGpu !== false guard is redundant: loadBackends(backendsPath) already has
no effect if no backend is found — getGpuType() simply returns false again and the
fallback path proceeds. The guard only prevents the probe from being attempted.

Proposed fix — remove buildGpu !== false from both checks:

let loadedGpu = bindings.getGpuType();
if (loadedGpu == null || loadedGpu === false) {
    const backendsPath = path.dirname(bindingPath);
    const fallbackBackendsDir = path.join(extBackendsPath ?? backendsPath, "fallback");
    bindings.loadBackends(backendsPath);
    loadedGpu = bindings.getGpuType();
    if (loadedGpu == null || loadedGpu === false)
        bindings.loadBackends(fallbackBackendsDir);
}

This fix is a prerequisite for any --gpu false + cmake workaround for ROCm/HIP while
native support is pending (#479). It also affects any other custom GPU backend injected
via NODE_LLAMA_CPP_CMAKE_OPTION_* on non-NVIDIA/non-Apple hardware.

Relevant Features Used

  • Metal support
  • CUDA support
  • Vulkan support
  • Grammar
  • Function calling

Are you willing to resolve this issue by submitting a Pull Request?

No, I don’t have the time and I’m okay to wait for the community / maintainers to resolve this issue.

Labels

bug (Something isn't working), requires triage (Requires triaging)
