Pull requests: ggml-org/llama.cpp
Pull requests list
server, webui: accept continue_final_message flag for vLLM API compat
#23012
opened May 13, 2026 by
ServeurpersoCom
Contributor
ggml-cpu : fix riscv xtheadvector builds and add a q1_0 vec dot kernel
ggml
changes relating to the ggml tensor library for machine learning
#23009
opened May 13, 2026 by
xctan
Collaborator
ggml-cuda: auto apply iGPU flag for CUDA/HIP if integrated device
ggml
changes relating to the ggml tensor library for machine learning
Nvidia GPU
Issues specific to Nvidia GPUs
#23007
opened May 13, 2026 by
fl0rianr
Contributor
ggml-webgpu: Support GPU profiling beyond the maximum query count
ggml
changes relating to the ggml tensor library for machine learning
WebGPU
#22995
opened May 13, 2026 by
yomaytk
Contributor
Fix for issue #22974. Cast intermediate results to float before adding.
ggml
changes relating to the ggml tensor library for machine learning
merge ready
A maintainer can use this label to indicate that they consider the changes final and ready to merge.
Nvidia GPU
Issues specific to Nvidia GPUs
#22994
opened May 13, 2026 by
scutler-nv
Contributor
opencl: add q5_0 and q5_1 MoE for Adreno
ggml
changes relating to the ggml tensor library for machine learning
OpenCL
Issues specific to the OpenCL backend
#22985
opened May 12, 2026 by
shaofeiqi
Contributor
ggml-webgpu: Enable NVIDIA self-hosted CI
devops
improvements to build systems and github actions
ggml
changes relating to the ggml tensor library for machine learning
testing
Everything test related
WebGPU
#22976
opened May 12, 2026 by
reeselevine
Contributor
Draft
vulkan : transpose A-matrix data layout for K-quant mul_mat performance
ggml
changes relating to the ggml tensor library for machine learning
Vulkan
Issues specific to the Vulkan backend
#22970
opened May 12, 2026 by
Alex-JP-93
[SYCL] update by stable version of compute-runtime
devops
improvements to build systems and github actions
#22968
opened May 12, 2026 by
arthw
Contributor
convert : lock MiniCPM-V 4.6 chat_template default enable_thinking in…
python
python script changes
#22963
opened May 12, 2026 by
tc-mb
Contributor
server : emit empty input field in anthropic streaming tool_use content_block_start
examples
server
#22960
opened May 12, 2026 by
Biilow-Bailang
vulkan: Pad Q3_K/Q6_K tensors out to 32-bit alignment
ggml
changes relating to the ggml tensor library for machine learning
Vulkan
Issues specific to the Vulkan backend
#22951
opened May 11, 2026 by
TheBlueMatt
Contributor
[Tensor Parallel] Enable Auto parameter fitting in split-mode tensor
documentation
Improvements or additions to documentation
#22950
opened May 11, 2026 by
gaugarg-nv
Contributor
hexagon: fix OpenCL not found error when building hexagon backend
documentation
Improvements or additions to documentation
#22946
opened May 11, 2026 by
Russyyds
Contributor
ggml-cpu: avoid treating all host RAM as free
ggml
changes relating to the ggml tensor library for machine learning
#22939
opened May 11, 2026 by
fl0rianr
Contributor
webui: Move static build output from repo code to HF Bucket
build
Compilation issues
devops
improvements to build systems and github actions
examples
script
Script related
server/webui
server
#22937
opened May 11, 2026 by
allozaur
Contributor
tests: support multi-op perf groups in test-backend-ops
testing
Everything test related
#22934
opened May 11, 2026 by
zzzzwc
Contributor
vulkan: opt mul_mat_vecq for mi50
ggml
changes relating to the ggml tensor library for machine learning
Vulkan
Issues specific to the Vulkan backend
#22933
opened May 11, 2026 by
chraac
Contributor
UMA buffers prefer host-visible memory
ggml
changes relating to the ggml tensor library for machine learning
Vulkan
Issues specific to the Vulkan backend
#22930
opened May 11, 2026 by
winstonma
server: fix checkpoints creation
examples
server
testing
Everything test related
#22929
opened May 11, 2026 by
jacekpoplawski
Contributor
kv-cache: use -t threads for IQ4 packing from ggml code
ggml
changes relating to the ggml tensor library for machine learning
#22928
opened May 11, 2026 by
shikaku2
common: improve --fit host-memory accounting for CPU and iGPU
ggml
changes relating to the ggml tensor library for machine learning
#22922
opened May 10, 2026 by
fl0rianr
Contributor
webui: fix theme from --webui-config-file not applied on first load (fresh localStorage)
examples
server/webui
server
#22902
opened May 10, 2026 by
ServeurpersoCom
Contributor