convert : lock MiniCPM-V 4.6 chat_template default enable_thinking in…#22963
tc-mb wants to merge 2 commits into
Conversation
Thanks @CISC, and for the cc. MiniCPM-V 4.6 has two separate checkpoints (instruct and Thinking). The instruct one currently defaults to thinking under llama.cpp because the runtime injection short-circuits the author's guard. We don't have a single model that switches thinking via a flag; instead, we have two models, and the GGUF matching the desired behavior has to be selected explicitly. Perhaps this provides some additional background.
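The short-circuit is easy to reproduce with a minimal Jinja template (a hypothetical stand-in for the real chat_template, rendered with Python's jinja2 for illustration):

```python
from jinja2 import Template

# Hypothetical minimal stand-in for the MiniCPM-V 4.6 chat_template guard,
# not the actual template shipped with the model.
tmpl = Template(
    "{% if enable_thinking is not defined %}"
    "{% set enable_thinking = false %}"  # instruct checkpoint's intended default
    "{% endif %}"
    "{{ 'thinking' if enable_thinking else 'no-thinking' }}"
)

# Author-intended behavior: nothing passed in, the guard supplies the default.
print(tmpl.render())                      # -> no-thinking

# A runtime that always passes enable_thinking=true makes the
# "is not defined" guard a no-op, overriding the author's default.
print(tmpl.render(enable_thinking=True))  # -> thinking
```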
Understood, my thinking is that our chat parser should not unilaterally inject `enable_thinking=true`. We should be able to detect a check such as yours and not do that.
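One possible shape for that runtime-side check (a sketch only, not llama.cpp's actual code; the function name and the substring heuristic are assumptions):

```python
def should_inject_enable_thinking(chat_template: str) -> bool:
    # Hypothetical heuristic: if the template already guards on
    # enable_thinking being undefined, injecting a value would
    # short-circuit the author's default, so skip the injection.
    return "enable_thinking is not defined" not in chat_template


# A template with the guard: do not inject.
guarded = (
    "{% if enable_thinking is not defined %}"
    "{% set enable_thinking = false %}"
    "{% endif %}"
)
print(should_inject_enable_thinking(guarded))  # -> False

# A template that merely reads the variable: injecting is safe.
plain = "{% if enable_thinking %}...{% endif %}"
print(should_inject_enable_thinking(plain))    # -> True
```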
Yes, I had the same intuition until I ran into this case. Since the v4.6 GGUFs are already released, would you be open to merging this small convert-side patch first, so users on existing llama.cpp builds get the correct out-of-box behavior? We can keep iterating on the runtime-side design separately afterwards — no urgency to land both together. For context, I'm one of the MiniCPM-V / MiniCPM-o maintainers and happy to keep helping with anything llama.cpp needs from our side to give the community a clean experience with these models.
The parser respects the …
Signed-off-by: tc-mb <tianchi_cai@icloud.com>
Indeed, your observation is accurate; thank you for the reminder. I've modified the patch to only fix the instruct version's chat template.
Overview
MiniCPM-V 4.6 (instruct) and 4.6-Thinking ship the same chat_template guarded by `{% if enable_thinking is not defined %}`, but llama.cpp's runtime always injects `enable_thinking=true` and short-circuits the guard, so the instruct GGUF defaults to thinking mode. At convert time, rewrite the guard into an unconditional top-level `set`, which overrides the runtime injection and locks each GGUF to its author-intended default.
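A convert-side rewrite along these lines could look like the following sketch (a hypothetical helper, not the actual patch in this PR; the regex assumes the guard's common `{% if %}{% set %}{% endif %}` spelling):

```python
import re

def lock_enable_thinking(chat_template: str, default: bool) -> str:
    # Hypothetical convert-time rewrite: replace the
    # "{% if enable_thinking is not defined %}{% set ... %}{% endif %}"
    # guard with an unconditional top-level set, so a runtime-injected
    # enable_thinking can no longer short-circuit the author's default.
    guard = re.compile(
        r"\{%-?\s*if enable_thinking is not defined\s*-?%\}"
        r"\s*\{%-?\s*set enable_thinking\s*=\s*\w+\s*-?%\}"
        r"\s*\{%-?\s*endif\s*-?%\}"
    )
    locked = f"{{% set enable_thinking = {'true' if default else 'false'} %}}"
    return guard.sub(locked, chat_template, count=1)


template = (
    "{% if enable_thinking is not defined %}"
    "{% set enable_thinking = false %}"
    "{% endif %}{{ messages }}"
)
print(lock_enable_thinking(template, default=False))
# -> {% set enable_thinking = false %}{{ messages }}
```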