convert : lock MiniCPM-V 4.6 chat_template default enable_thinking in…#22963

Open
tc-mb wants to merge 2 commits into ggml-org:master from tc-mb:feat/minicpmv4_6-lock-chat-template-default

Conversation

@tc-mb
Contributor

@tc-mb tc-mb commented May 12, 2026

Overview

MiniCPM-V 4.6 (instruct) and 4.6-Thinking ship the same chat_template guarded by {% if enable_thinking is not defined %}, but llama.cpp's runtime always injects enable_thinking=true and short-circuits the guard, so the instruct GGUF defaults to thinking mode. At convert time, rewrite the guard into an unconditional top-level set, which overrides the runtime injection and locks each GGUF to its author-intended default.
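The convert-time rewrite described above can be sketched roughly as follows. This is an illustrative, hypothetical snippet (the helper name `lock_thinking_default` and the regex are assumptions, not the actual patch): it collapses the `{% if enable_thinking is not defined %}` guard around the author's `{% set %}` default into an unconditional top-level set, so a runtime-injected `enable_thinking` can no longer short-circuit it.

```python
import re

# Matches: {% if enable_thinking is not defined %}{% set enable_thinking = ... %}{% endif %}
# and keeps only the inner {% set ... %} (capture group 1).
GUARD = re.compile(
    r"\{%-?\s*if\s+enable_thinking\s+is\s+not\s+defined\s*-?%\}\s*"
    r"(\{%-?\s*set\s+enable_thinking\s*=\s*(?:true|false)\s*-?%\})\s*"
    r"\{%-?\s*endif\s*-?%\}"
)

def lock_thinking_default(template: str) -> str:
    """Rewrite the guarded default into an unconditional top-level set."""
    return GUARD.sub(r"\1", template)

src = ("{% if enable_thinking is not defined %}"
       "{% set enable_thinking = false %}"
       "{% endif %}rest of template")
print(lock_thinking_default(src))
# -> {% set enable_thinking = false %}rest of template
```

With the guard gone, the GGUF's template always applies the checkpoint's author-intended default regardless of what the runtime injects.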

@tc-mb tc-mb requested a review from CISC as a code owner May 12, 2026 07:55
@github-actions github-actions Bot added the python python script changes label May 12, 2026
@CISC
Member

CISC commented May 12, 2026

This is not the correct fix, the chat parser should probably detect such a check. cc/ @aldehir @pwilkin

You can stop it from setting this with --reasoning off BTW.

@tc-mb
Contributor Author

tc-mb commented May 12, 2026

This is not the correct fix, the chat parser should probably detect such a check. cc/ @aldehir @pwilkin

You can stop it from setting this with --reasoning off BTW.

Thanks @CISC, and for the cc.

MiniCPM-V 4.6 has two separate checkpoints (instruct and Thinking). The instruct one currently defaults to thinking under llama.cpp because the runtime injection short-circuits the author's {% if enable_thinking is not defined %} header — that's the regression I need to fix in the artifact users actually download.

There is no single model that toggles thinking via a switch; instead there are two separate models, and users are expected to pick the corresponding GGUF explicitly.

Hopefully that provides some additional background.

@CISC
Member

CISC commented May 12, 2026

There is no single model that toggles thinking via a switch; instead there are two separate models, and users are expected to pick the corresponding GGUF explicitly.

Understood, my thinking is that our chat parser should not unilaterally send enable_thinking unless user-specified, however I think it does so now (if enable_thinking is detected in template?) because it was deemed harmless.

We should be able to detect a check such as yours and not do that.
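A minimal sketch of such detection (hypothetical, not llama.cpp's actual parser code — the function name and the string-matching heuristic are assumptions): before injecting enable_thinking, check whether the template already guards its own default with an "is not defined" test, and if so leave the variable unset.

```python
# Heuristic: only inject enable_thinking when the template references it
# but does NOT ship its own guarded default.
def should_inject_enable_thinking(template: str) -> bool:
    return ("enable_thinking" in template
            and "enable_thinking is not defined" not in template)

guarded = ("{% if enable_thinking is not defined %}"
           "{% set enable_thinking = false %}{% endif %}")
plain = "{% if enable_thinking %}...{% endif %}"

print(should_inject_enable_thinking(guarded))  # -> False (template owns its default)
print(should_inject_enable_thinking(plain))    # -> True  (safe to inject)
```

A real implementation would presumably parse the Jinja AST rather than match substrings, but the check itself is this simple in spirit.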

@tc-mb
Contributor Author

tc-mb commented May 12, 2026

There is no single model that toggles thinking via a switch; instead there are two separate models, and users are expected to pick the corresponding GGUF explicitly.

Understood, my thinking is that our chat parser should not unilaterally send enable_thinking unless user-specified, however I think it does so now (if enable_thinking is detected in template?) because it was deemed harmless.

We should be able to detect a check such as yours and not do that.

Yes, I had the same intuition until I ran into this case.

Since the v4.6 GGUFs are already released, would you be open to merging this small convert-side patch first, so users on the existing llama.cpp builds get the correct out-of-box behavior? We can keep iterating on the runtime-side design separately afterwards — no urgency to land both together.

For context, I'm one of the MiniCPM-V / MiniCPM-o maintainers and happy to keep helping with anything llama.cpp needs from our side to give the community a clean experience with these models.

@pwilkin
Member

pwilkin commented May 12, 2026

The parser respects the --reasoning flag for this, but the best approach would be to simply change the default chat template on the instruct model to the non-reasoning branch in the GGUF.

Signed-off-by: tc-mb <tianchi_cai@icloud.com>
@tc-mb
Contributor Author

tc-mb commented May 12, 2026

The parser respects the --reasoning flag for this, but the best approach would be to simply change the default chat template on the instruct model to the non-reasoning branch in the GGUF.

Indeed, you're right; thanks for the reminder.

I've updated the patch to fix only the instruct version's chat_template.
