convert : lock MiniCPM-V 4.6 chat_template default enable_thinking in…#22963

Open
tc-mb wants to merge 2 commits into ggml-org:master from tc-mb:feat/minicpmv4_6-lock-chat-template-default

Conversation

@tc-mb
Contributor

@tc-mb tc-mb commented May 12, 2026

Overview

MiniCPM-V 4.6 (instruct) and 4.6-Thinking ship the same chat_template guarded by {% if enable_thinking is not defined %}, but llama.cpp's runtime always injects enable_thinking=true and short-circuits the guard, so the instruct GGUF defaults to thinking mode. At convert time, rewrite the guard into an unconditional top-level set, which overrides the runtime injection and locks each GGUF to its author-intended default.
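The convert-time rewrite described above can be sketched roughly as follows. This is an illustrative, hypothetical snippet (the helper name `lock_thinking_default` and the regex are assumptions, not the actual patch): it collapses the `{% if enable_thinking is not defined %}` guard around the author's `{% set %}` default into an unconditional top-level set, so a runtime-injected `enable_thinking` can no longer short-circuit it.

```python
import re

# Matches: {% if enable_thinking is not defined %}{% set enable_thinking = ... %}{% endif %}
# and keeps only the inner {% set ... %} (capture group 1).
GUARD = re.compile(
    r"\{%-?\s*if\s+enable_thinking\s+is\s+not\s+defined\s*-?%\}\s*"
    r"(\{%-?\s*set\s+enable_thinking\s*=\s*(?:true|false)\s*-?%\})\s*"
    r"\{%-?\s*endif\s*-?%\}"
)

def lock_thinking_default(template: str) -> str:
    """Rewrite the guarded default into an unconditional top-level set."""
    return GUARD.sub(r"\1", template)

src = ("{% if enable_thinking is not defined %}"
       "{% set enable_thinking = false %}"
       "{% endif %}rest of template")
print(lock_thinking_default(src))
# -> {% set enable_thinking = false %}rest of template
```

With the guard gone, the GGUF's template always applies the checkpoint's author-intended default regardless of what the runtime injects.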

@tc-mb tc-mb requested a review from CISC as a code owner May 12, 2026 07:55
@github-actions github-actions Bot added the python python script changes label May 12, 2026
@CISC
Member

CISC commented May 12, 2026

This is not the correct fix, the chat parser should probably detect such a check. cc/ @aldehir @pwilkin

You can stop it from setting this with --reasoning off BTW.

@tc-mb
Contributor Author

tc-mb commented May 12, 2026

This is not the correct fix, the chat parser should probably detect such a check. cc/ @aldehir @pwilkin

You can stop it from setting this with --reasoning off BTW.

Thanks @CISC, and for the cc.

MiniCPM-V 4.6 has two separate checkpoints (instruct and Thinking). The instruct one currently defaults to thinking under llama.cpp because the runtime injection short-circuits the author's {% if enable_thinking is not defined %} header — that's the regression I need to fix in the artifact users actually download.

There is no single model that toggles thinking via a switch; instead there are two separate models, and users are expected to pick the corresponding GGUF explicitly.

Hopefully that provides some additional background.

@CISC
Member

CISC commented May 12, 2026

There is no single model that toggles thinking via a switch; instead there are two separate models, and users are expected to pick the corresponding GGUF explicitly.

Understood, my thinking is that our chat parser should not unilaterally send enable_thinking unless user-specified, however I think it does so now (if enable_thinking is detected in template?) because it was deemed harmless.

We should be able to detect a check such as yours and not do that.
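A minimal sketch of such detection (hypothetical, not llama.cpp's actual parser code — the function name and the string-matching heuristic are assumptions): before injecting enable_thinking, check whether the template already guards its own default with an "is not defined" test, and if so leave the variable unset.

```python
# Heuristic: only inject enable_thinking when the template references it
# but does NOT ship its own guarded default.
def should_inject_enable_thinking(template: str) -> bool:
    return ("enable_thinking" in template
            and "enable_thinking is not defined" not in template)

guarded = ("{% if enable_thinking is not defined %}"
           "{% set enable_thinking = false %}{% endif %}")
plain = "{% if enable_thinking %}...{% endif %}"

print(should_inject_enable_thinking(guarded))  # -> False (template owns its default)
print(should_inject_enable_thinking(plain))    # -> True  (safe to inject)
```

A real implementation would presumably parse the Jinja AST rather than match substrings, but the check itself is this simple in spirit.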

@tc-mb
Contributor Author

tc-mb commented May 12, 2026

There is no single model that toggles thinking via a switch; instead there are two separate models, and users are expected to pick the corresponding GGUF explicitly.

Understood, my thinking is that our chat parser should not unilaterally send enable_thinking unless user-specified, however I think it does so now (if enable_thinking is detected in template?) because it was deemed harmless.

We should be able to detect a check such as yours and not do that.

Yes, I had the same intuition until I ran into this case.

Since the v4.6 GGUFs are already released, would you be open to merging this small convert-side patch first, so users on the existing llama.cpp builds get the correct out-of-box behavior? We can keep iterating on the runtime-side design separately afterwards — no urgency to land both together.

For context, I'm one of the MiniCPM-V / MiniCPM-o maintainers and happy to keep helping with anything llama.cpp needs from our side to give the community a clean experience with these models.

@pwilkin
Member

pwilkin commented May 12, 2026

The parser respects the --reasoning flag for this, but the best approach would be to simply change the default chat template on the instruct model to the non-reasoning branch in the GGUF.

Signed-off-by: tc-mb <tianchi_cai@icloud.com>
@tc-mb
Contributor Author

tc-mb commented May 12, 2026

The parser respects the --reasoning flag for this, but the best approach would be to simply change the default chat template on the instruct model to the non-reasoning branch in the GGUF.

Indeed, you're right; thanks for the reminder.

I've updated the patch to fix only the instruct version's chat_template.
