forked from TheTom/llama-cpp-turboquant
Forks: 26
Pull requests: AtomicBot-ai/atomic-llama-cpp-turboquant
Phase C.2 dispatch behavior: MTP+mmproj coexistence behind --allow-mtp-with-mmproj (5th first-in-world)
examples
server
testing
#19
opened May 21, 2026 by
WillowOneVision
4 of 9 tasks
Phase C.2 foundational APIs: server_tokens coexistence + common_speculative_reset
examples
server
testing
#18
opened May 21, 2026 by
WillowOneVision
5 of 6 tasks
fix(server): SEGV when --mtp-head + --mmproj are both passed
examples
server
#17
opened May 21, 2026 by
WillowOneVision
4 of 5 tasks
ggml: ARM NEON dequant kernel for turbo4 (vqtbl4q_u8 4-bit PolarQuant)
ggml
#16
opened May 21, 2026 by
WillowOneVision
5 tasks done
feat(wasm): add tools/wasm/ Emscripten entrypoint for browser-resident inference
AMD ZenDNN
Apple Metal
Ascend NPU
build
devops
documentation
Improvements or additions to documentation
examples
ggml
Hexagon
IBM zDNN
jinja parser
model
nix
Nvidia GPU
OpenCL
OpenVINO
python
script
server
SYCL
testing
Vulkan
WebGPU
#15
opened May 17, 2026 by
wordingone
fix: add missing prototype for turbo_cpu_fwht_inverse to resolve -Wmissing-prototypes CI error
ggml
#12
opened May 13, 2026 by
sujitvasanth
feat: one-sided target probability acceptance for MTP drafts increases acceptance rate and throughput compared to argmax alone
examples
ggml
server
#8
opened May 11, 2026 by
sujitvasanth
Enhance CUDA flash attention kernel selection for DKQ=512 with low gq…
ggml
Nvidia GPU
#6
opened May 8, 2026 by
Ooooze
Repro: MTP path on CUDA aborts at fattn.cu:109 (DKQ=512) for Gemma 4 — Blackwell sm_120 + Ampere sm_86
documentation
#5
opened May 8, 2026 by
jameseiten
Draft