bare-llama

Native llama.cpp bindings for Bare.

Run LLM inference directly in your Bare JavaScript applications with GPU acceleration support.

Requirements

  • CMake 3.25+
  • C/C++ compiler (clang, gcc, or MSVC)
  • Node.js (for npm/cmake-bare)
  • Bare runtime

Building

Clone with submodules:

git clone --recursive https://github.com/user/bare-llama.cpp
cd bare-llama.cpp

Or if already cloned:

git submodule update --init --recursive

Install dependencies and build:

npm install
npm run prebuild

Or manually:

bare-make generate
bare-make build
bare-make install

This creates prebuilds/<platform>-<arch>/bare-llama.bare.
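
To verify the build, you can load the addon from Bare and check its exports. This is a minimal sketch; check.js is a hypothetical file name, and the exports listed are the ones shown in the Usage section below:

// check.js - confirm the native addon loads and exposes the expected API
const llama = require('bare-llama')
console.log(Object.keys(llama))  // expect: LlamaModel, LlamaContext, LlamaSampler, generate

Run it with bare check.js.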

Build Options

For a debug build:

bare-make generate -- -D CMAKE_BUILD_TYPE=Debug
bare-make build

To disable GPU acceleration:

bare-make generate -- -D GGML_METAL=OFF -D GGML_CUDA=OFF
bare-make build

Usage

const { LlamaModel, LlamaContext, LlamaSampler, generate } = require('bare-llama')

// Load model (GGUF format)
const model = new LlamaModel('./model.gguf', {
  nGpuLayers: 99  // Offload layers to GPU (0 = CPU only)
})

// Create context
const ctx = new LlamaContext(model, {
  contextSize: 2048,  // Max context length
  batchSize: 512      // Batch size for prompt processing
})

// Create sampler
const sampler = new LlamaSampler({
  temp: 0.7,    // Temperature (0 = greedy)
  topK: 40,     // Top-K sampling
  topP: 0.95    // Top-P (nucleus) sampling
})

// Generate text
const output = generate(model, ctx, sampler, 'The meaning of life is', 128)
console.log(output)

// Cleanup
sampler.free()
ctx.free()
model.free()

See examples/ for more:

  • basic.js - Simple generation using the high-level API
  • lowlevel.js - Token-by-token generation with full control

Run examples with:

bare examples/basic.js -- /path/to/model.gguf

API Reference

LlamaModel

new LlamaModel(path, options?)

Option       Type     Default  Description
nGpuLayers   number   0        Number of layers to offload to GPU

Methods:

  • tokenize(text, addBos?) - Convert text to tokens (Int32Array)
  • detokenize(tokens) - Convert tokens back to text
  • isEogToken(token) - Check if token is end-of-generation
  • free() - Release model resources
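
As a rough sketch (assuming a model has already been loaded as in the Usage section), tokenization round-trips back to text:

// Round-trip text through the tokenizer (model loaded as in Usage above)
const tokens = model.tokenize('Hello, world!', true)   // addBos = true
console.log(tokens)                                    // Int32Array of token ids
console.log(model.detokenize(tokens))                  // should approximate the original text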

LlamaContext

new LlamaContext(model, options?)

Option        Type     Default  Description
contextSize   number   512      Maximum context length
batchSize     number   512      Batch size for processing

Methods:

  • decode(tokens) - Process tokens through the model
  • free() - Release context resources
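
For example, to feed a prompt into the model before sampling (a sketch assuming the model and context from the Usage section):

// Tokenize the prompt and run it through the model in one decode call
const promptTokens = model.tokenize('The meaning of life is', true)
ctx.decode(promptTokens)  // the prompt must fit within contextSize / batchSize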

LlamaSampler

new LlamaSampler(options?)

Option  Type     Default  Description
temp    number   0        Temperature (0 = greedy sampling)
topK    number   40       Top-K sampling parameter
topP    number   0.95     Top-P (nucleus) sampling parameter

Methods:

  • sample(ctx, idx) - Sample next token (-1 for last position)
  • accept(token) - Accept token into sampler state
  • free() - Release sampler resources
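
A token-by-token loop (in the spirit of examples/lowlevel.js) combines these calls with decode() and isEogToken(). The sketch below assumes the prompt has already been decoded as in the LlamaContext example above, and that decode() accepts a single-token array for each generated token:

// Sample up to 64 tokens, feeding each one back through the model
let text = ''
for (let i = 0; i < 64; i++) {
  const token = sampler.sample(ctx, -1)       // sample at the last position
  if (model.isEogToken(token)) break          // stop at end-of-generation
  sampler.accept(token)                       // update sampler state
  text += model.detokenize(new Int32Array([token]))
  ctx.decode(new Int32Array([token]))         // make the token visible to the next sample
}
console.log(text)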

generate()

generate(model, ctx, sampler, prompt, maxTokens?)

Convenience function for simple text generation. Returns the generated text (not including the prompt).
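
For instance, with the model, context, and sampler created as in Usage:

// Generate up to 64 tokens of continuation for the prompt
const reply = generate(model, ctx, sampler, 'Bare is', 64)
console.log(reply)  // generated continuation only; the prompt is not included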

Models

This addon works with models in the GGUF format.

Example models to try:

  • TinyLlama-1.1B-Chat-v1.0.Q4_K_M.gguf - Small, fast
  • Mistral-7B-Instruct-v0.2.Q4_K_M.gguf - Good quality
  • Llama-2-13B-chat.Q4_K_M.gguf - Higher quality

Platform Support

Platform  Architecture  GPU Support
macOS     arm64, x64    Metal
Linux     x64, arm64    CUDA (if available)
Windows   x64           CUDA (if available)

License

MIT
