feat: Add support for toggling Qwen3.5 thinking tokens #9
Conversation
Adds a UI toggle in the model config editor to enable or disable thinking
(reasoning) tokens for models like Qwen3.5. Disabling thinking is handled
by dynamically injecting `{% set enable_thinking = false %}` into the raw
Jinja chat template.
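The injection itself is a plain string prepend; a minimal standalone sketch of the idea (the helper name is hypothetical, the directive is the one used in this PR):

```kotlin
// Hypothetical helper mirroring the injection described above: when thinking
// is disabled, a Jinja `set` directive is prepended so the template's
// `enable_thinking` check evaluates to false.
fun injectThinkingFlag(chatTemplate: String, enableThinking: Boolean): String =
    if (!enableThinking) "{% set enable_thinking = false %}\n" + chatTemplate
    else chatTemplate

fun main() {
    val template = "{{ messages }}"
    // Disabled: the Jinja directive is prepended.
    println(injectThinkingFlag(template, enableThinking = false))
    // Enabled: the template passes through unchanged.
    println(injectThinkingFlag(template, enableThinking = true))
}
```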
Summary of Changes
Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request introduces a new feature that gives users direct control over the 'thinking tokens' mechanism in models like Qwen3.5. By integrating a UI toggle and corresponding backend logic, it allows dynamic modification of the chat template, letting users enable or suppress reasoning output depending on the use case.
Code Review
This pull request adds a feature to toggle 'thinking tokens' for certain models by dynamically modifying the Jinja chat template. The changes correctly propagate from the data model to the UI and ViewModel. The core logic in GGUFEngine.kt prepends the necessary template instruction when the feature is disabled. My main feedback is to address code duplication in GGUFEngine.kt by extracting the template modification logic into a shared helper function to improve code maintainability.
```kotlin
val template = if (!inference.enableThinking) {
    "{% set enable_thinking = false %}\n" + inference.chatTemplate
} else {
    inference.chatTemplate
}
nativeLib.nativeSetChatTemplate(template)
```
This logic for preparing the chat template is duplicated in the loadFromFd function (lines 123-128). To improve maintainability and avoid code duplication, consider extracting this logic into a private helper function.
For example, you could add a function to the GGUFEngine class:

```kotlin
private fun prepareChatTemplate(chatTemplate: String, enableThinking: Boolean): String {
    return if (!enableThinking) {
        // Prepend the Jinja variable to disable thinking tokens
        "{% set enable_thinking = false %}\n" + chatTemplate
    } else {
        chatTemplate
    }
}
```

Then, you can call this function from both `load()` and `loadFromFd()`:

```kotlin
if (inference.chatTemplate.isNotEmpty()) {
    val template = prepareChatTemplate(inference.chatTemplate, inference.enableThinking)
    nativeLib.nativeSetChatTemplate(template)
}
```

This will make the code cleaner and easier to modify in the future.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e058450e4d
```kotlin
val template = if (!inference.enableThinking) {
    "{% set enable_thinking = false %}\n" + inference.chatTemplate
```
Honor thinking toggle when template is missing
This logic only applies `enableThinking` inside the chat-template branch, so the new toggle is ignored whenever `inference.chatTemplate` is empty. In the normal GGUF flow, configs are created from `GgufEngineSchema()` (ModelDownloadService.kt), and `chatTemplate` defaults to `""` (GgufEngineSchema.kt), so users can disable "Enable Thinking" in the editor but Qwen-style thinking tokens still remain enabled at runtime. The feature effectively becomes a no-op unless a custom template was manually persisted.
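The no-op can be reproduced from the gated flow in the diff; a standalone sketch with the native call simulated by a lambda (function and parameter names are illustrative, not from the codebase):

```kotlin
// Illustrative reproduction of the gated flow: the thinking flag is only
// consulted inside the non-empty-template branch, so an empty default
// template means the native setter is never called at all.
fun applyChatTemplate(chatTemplate: String, enableThinking: Boolean, setTemplate: (String) -> Unit) {
    if (chatTemplate.isNotEmpty()) {
        val template =
            if (!enableThinking) "{% set enable_thinking = false %}\n" + chatTemplate
            else chatTemplate
        setTemplate(template)
    }
}

fun main() {
    var applied: String? = null
    // Default schema value: chatTemplate = "" — the toggle has no effect.
    applyChatTemplate("", enableThinking = false) { applied = it }
    println(applied)  // null: the template branch was never entered
}
```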
Pull request overview
Adds a configuration toggle to enable/disable "thinking" (reasoning) tokens for GGUF models (intended for Qwen3.5-style Jinja templates) by prepending a Jinja `{% set enable_thinking = false %}` directive to the configured chat template.
Changes:
- Adds `enableThinking` to `GgufInferenceParams` and wires it through the config editor ViewModel.
- Adds an "Enable Thinking" switch to the GGUF config editor UI.
- Injects the Jinja directive into the chat template passed to the native GGUF engine when thinking is disabled.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| app/src/main/java/com/dark/tool_neuron/viewmodel/ModelConfigEditorViewModel.kt | Adds a ViewModel updater to persist the new GGUF enableThinking flag into state/config. |
| app/src/main/java/com/dark/tool_neuron/ui/screen/ModelConfigEditorScreen.kt | Adds a UI switch to toggle enableThinking for GGUF inference params. |
| app/src/main/java/com/dark/tool_neuron/models/engine_schema/GgufEngineSchema.kt | Extends the serialized GGUF inference schema with enableThinking. |
| app/src/main/java/com/dark/tool_neuron/engine/GGUFEngine.kt | Applies the toggle by conditionally prepending a Jinja directive before setting the chat template in the native engine. |
```kotlin
val template = if (!inference.enableThinking) {
    "{% set enable_thinking = false %}\n" + inference.chatTemplate
} else {
    inference.chatTemplate
}
nativeLib.nativeSetChatTemplate(template)
```
The template-selection/injection block is duplicated in both load(...) and loadFromFd(...). Since this is easy to accidentally diverge (and will grow if more template flags are added), consider extracting a small helper (e.g., applyChatTemplate(inference)) or computing template via a shared function/constant.
```kotlin
val template = if (!inference.enableThinking) {
    "{% set enable_thinking = false %}\n" + inference.chatTemplate
} else {
    inference.chatTemplate
}
nativeLib.nativeSetChatTemplate(template)
```
Same duplication as above: the template-selection/injection block is repeated here as well. Extracting shared logic would reduce the chance of future inconsistencies between load(...) and loadFromFd(...).
```kotlin
SwitchField(
    label = "Enable Thinking",
    description = "Enable reasoning tokens for supported models (e.g., Qwen3.5)",
    checked = ggufConfig.inferenceParams.enableThinking,
    onCheckedChange = { viewModel.updateGgufEnableThinking(it) }
)
```
This UI toggle is always shown, but the engine-side behavior only changes anything when a non-empty chatTemplate override is present. Given the current default GGUF config schema uses an empty chatTemplate, users may flip this switch and see no change. Consider disabling/hiding the switch unless a template override is being used, or ensure the model’s built-in chat_template is persisted into the GGUF inference params so the toggle is effective.
```suggestion
if (ggufConfig.inferenceParams.chatTemplate.isNotBlank()) {
    SwitchField(
        label = "Enable Thinking",
        description = "Enable reasoning tokens for supported models (e.g., Qwen3.5)",
        checked = ggufConfig.inferenceParams.enableThinking,
        onCheckedChange = { viewModel.updateGgufEnableThinking(it) }
    )
}
```
```diff
 val chatTemplate: String = "",
-val toolsJson: String = "" // JSON array of tool definitions
+val toolsJson: String = "", // JSON array of tool definitions
+val enableThinking: Boolean = true // Enable/Disable Qwen3.5 thinking tokens
```
Minor formatting/naming clarity: the inline comment for enableThinking doesn’t match the spacing/style used on adjacent fields (two spaces before //) and it hard-codes a single model name (Qwen3.5) even though the flag name is generic. Consider aligning comment formatting with the rest of the file and wording it in a model-agnostic way (e.g., “Enable/disable thinking tokens when supported by the chat template”).
```suggestion
val enableThinking: Boolean = true // Enable/disable thinking tokens when supported by the chat template
```
```diff
 if (inference.chatTemplate.isNotEmpty()) {
-    nativeLib.nativeSetChatTemplate(inference.chatTemplate)
+    val template = if (!inference.enableThinking) {
+        "{% set enable_thinking = false %}\n" + inference.chatTemplate
+    } else {
+        inference.chatTemplate
+    }
+    nativeLib.nativeSetChatTemplate(template)
```
`enableThinking` is only applied when `inference.chatTemplate` is non-empty, because the injection and the `nativeSetChatTemplate(...)` call are gated by `if (inference.chatTemplate.isNotEmpty())`. In this codebase, new GGUF configs are created from `GgufEngineSchema()` (which defaults `chatTemplate` to `""`), and there are no other writers for `chatTemplate`, so this toggle likely has no effect in practice. Consider either persisting the model's built-in `chat_template` into `modelInferenceParams` (so it's available here), adding a native-level way to disable thinking without overriding the template, or at minimum removing the `isNotEmpty()` gate when `enableThinking` is false and a known template is available to apply.
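One way to sketch the first of those options: resolve a template from either the override or a persisted built-in before gating, so the flag still applies when no override is configured. This is a sketch under stated assumptions — `builtIn` stands in for a persisted copy of the model's own `chat_template`, which the current schema does not store:

```kotlin
// Sketch only: `builtIn` is hypothetical. In practice the model's
// chat_template would have to be read from GGUF metadata or persisted
// into the config so it is available at load time.
fun resolveChatTemplate(override: String, enableThinking: Boolean, builtIn: String?): String? {
    // Fall back to the built-in template; return null when neither exists.
    val base = override.ifEmpty { builtIn ?: return null }
    return if (!enableThinking) "{% set enable_thinking = false %}\n" + base else base
}
```

With this shape, an empty override plus a known built-in template still honors the toggle, and the caller only skips the native `nativeSetChatTemplate` call when no template is available at all.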
Address the code reviews and don't comment back.
PR created automatically by Jules for task 18296997036924831771 started by @Godzilla675