- Local AI Inference: Uses llama.cpp for low-latency, on-device inference, so no data leaves your machine.
- LSP-Backed Context: Leverages your existing LSP servers for rich context (completions & signatures).
- Prompt Caching: Caches suggestions for repeated contexts to reduce latency (see the sketch after this list).
- Cross-file Context Chunks: Automatically extracts and includes relevant code snippets from your project files.
- Git-Aware: Respects `.gitignore` for context extraction.
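The prompt cache mentioned above can be pictured as a small map keyed by the text around the cursor. The sketch below is purely illustrative (none of these names come from quickfill's internals); it shows a FIFO cache capped at 32 entries, matching the `max_cache_entries` default:

```lua
-- Illustrative sketch only: a tiny FIFO suggestion cache keyed by the
-- context around the cursor. All names here are hypothetical, not quickfill's.
local cache, order, MAX_ENTRIES = {}, {}, 32

local function make_key(prefix, suffix)
    return prefix .. "\0" .. suffix -- the surrounding context identifies a request
end

local function get(prefix, suffix)
    return cache[make_key(prefix, suffix)] -- hit: reuse the suggestion, skip inference
end

local function put(prefix, suffix, suggestion)
    local key = make_key(prefix, suffix)
    if cache[key] == nil then
        table.insert(order, key)
        if #order > MAX_ENTRIES then
            cache[table.remove(order, 1)] = nil -- evict the oldest entry
        end
    end
    cache[key] = suggestion
end
```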
Install with Neovim's built-in `vim.pack`:

```lua
vim.pack.add({ "https://github.com/davkk/quickfill.nvim" })

-- No need to call setup!
-- The plugin uses `<Plug>` mappings for flexibility.
-- Map them to your preferred keys like this:
vim.keymap.set("i", "<C-y>", "<Plug>(quickfill-accept)") -- accept the full suggestion
vim.keymap.set("i", "<C-k>", "<Plug>(quickfill-accept-word)") -- accept the next word
vim.keymap.set("i", "<C-x>", "<Plug>(quickfill-trigger)") -- trigger a fresh infill request
```
Customize behavior via `vim.g.quickfill`. Defaults are used for any option you don't set explicitly:
```lua
vim.g.quickfill = {
    url = "http://localhost:8012",       -- llama.cpp server URL
    n_predict = 8,                       -- max tokens to predict
    top_k = 30,                          -- top-k sampling
    top_p = 0.4,                         -- top-p sampling
    repeat_penalty = 1.5,                -- repeat penalty
    stop_chars = { "\n", "\r", "\r\n" }, -- stop characters
    stop_on_trigger_char = true,         -- stop on trigger chars defined by the LSP server
    n_prefix = 16,                       -- prefix context lines
    n_suffix = 8,                        -- suffix context lines
    max_cache_entries = 32,              -- max cache entries
    extra_chunks = false,                -- enable extra project chunks
    max_extra_chunks = 4,                -- max extra chunks
    chunk_lines = 16,                    -- lines per chunk
    lsp_completion = true,               -- enable LSP completions
    max_lsp_completion_items = 15,       -- max LSP completion items
    lsp_signature_help = false,          -- enable signature help
}
```
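Since unset options fall back to their defaults, a config can stay minimal. A sketch that overrides just two options (assuming, per the list above, that everything else keeps its default):

```lua
-- Override only what you need; unset options keep their defaults
vim.g.quickfill = {
    n_predict = 16,      -- allow longer completions
    extra_chunks = true, -- pull in cross-file context chunks
}
```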
Before using the plugin, make sure you have a llama.cpp server running. Here's an example command to start one:
```sh
llama-server \
    -hf bartowski/Qwen2.5-Coder-0.5B-GGUF:Q4_0 \
    --n-gpu-layers 99 \
    --threads 8 \
    --ctx-size 0 \
    --flash-attn on \
    --mlock \
    --cache-reuse 256 \
    --verbose \
    --host localhost \
    --port 8012
```

This starts the server on http://localhost:8012 with optimized settings for the Qwen2.5-Coder-0.5B model. Adjust the host and port as needed.
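If you do change the host or port, point the plugin at the new address via the `url` option from the defaults above (port 8080 here is just an arbitrary example):

```lua
-- Match this to the --host/--port flags passed to llama-server
vim.g.quickfill = { url = "http://localhost:8080" }
```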
- start the plugin with `:AI start` or just `:AI`
- stop the plugin with `:AI stop`
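Since starting is just an Ex command, you can automate it. A sketch (the filetype list is an arbitrary example) that starts the plugin whenever a Lua or Python buffer opens:

```lua
-- Start quickfill automatically for selected filetypes
vim.api.nvim_create_autocmd("FileType", {
    pattern = { "lua", "python" },
    callback = function()
        vim.cmd("AI start")
    end,
})
```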

