
feat: Add support for toggling Qwen3.5 thinking tokens #9

Open
Godzilla675 wants to merge 2 commits into `Add-support-for-Qwen3.5-models-and-option-to-additional-paramters-(to-enable/disable-thinking)` from `feat/qwen-thinking-toggle-18296997036924831771`

Conversation

@Godzilla675 (Owner)

Adds a UI toggle in the model config editor to enable or disable thinking (reasoning) tokens for models like Qwen3.5. Disabling thinking is handled by dynamically injecting `{% set enable_thinking = false %}` into the raw Jinja chat template.


PR created automatically by Jules for task 18296997036924831771 started by @Godzilla675

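The injection the description mentions can be sketched in a few lines (a Python illustration of the Kotlin logic; `prepare_chat_template` is an illustrative name, not code from this PR). A template-level Jinja `{% set %}` shadows the context variable for the rest of the template, so any `{% if enable_thinking %}` branch in the original template sees `false`:

```python
JINJA_DISABLE_THINKING = "{% set enable_thinking = false %}\n"

def prepare_chat_template(chat_template: str, enable_thinking: bool) -> str:
    # Prepending the directive forces enable_thinking to false for the
    # whole template, overriding whatever the runtime passes in context.
    if not enable_thinking:
        return JINJA_DISABLE_THINKING + chat_template
    return chat_template
```

The template itself is left untouched when thinking stays enabled, so the default behavior is unchanged.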
Copilot AI review requested due to automatic review settings March 5, 2026 04:21
@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a new feature that provides users with direct control over the 'thinking tokens' mechanism in models like Qwen3.5. By integrating a UI toggle and corresponding backend logic, it allows for dynamic modification of the chat template, giving users the flexibility to optimize model behavior for specific use cases where reasoning output might be desired or not.

Highlights

  • UI Toggle for Thinking Tokens: A new UI toggle has been added to the model config editor, allowing users to enable or disable thinking tokens for supported models like Qwen3.5.
  • Dynamic Chat Template Injection: `GGUFEngine` now dynamically injects `{% set enable_thinking = false %}` into the raw Jinja chat template when the 'Enable Thinking' toggle is disabled, effectively controlling the model's reasoning output.
  • Inference Parameter Update: The `GgufInferenceParams` schema has been extended with an `enableThinking` boolean parameter to store the state of this new setting.

Changelog

  • app/src/main/java/com/dark/tool_neuron/engine/GGUFEngine.kt
    • Modified the `nativeSetChatTemplate` calls to conditionally prepend `{% set enable_thinking = false %}` to the chat template based on the new `enableThinking` inference parameter.
  • app/src/main/java/com/dark/tool_neuron/models/engine_schema/GgufEngineSchema.kt
    • Added a new `enableThinking` boolean property to the `GgufInferenceParams` data class, defaulting to `true`.
  • app/src/main/java/com/dark/tool_neuron/ui/screen/ModelConfigEditorScreen.kt
    • Introduced a `SwitchField` UI component labeled 'Enable Thinking' into the `GgufConfigEditor` to allow users to toggle the new setting.
  • app/src/main/java/com/dark/tool_neuron/viewmodel/ModelConfigEditorViewModel.kt
    • Added a new function `updateGgufEnableThinking` to the `ModelConfigEditorViewModel` to handle state updates for the `enableThinking` parameter.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds a feature to toggle 'thinking tokens' for certain models by dynamically modifying the Jinja chat template. The changes correctly propagate from the data model to the UI and ViewModel. The core logic in GGUFEngine.kt prepends the necessary template instruction when the feature is disabled. My main feedback is to address code duplication in GGUFEngine.kt by extracting the template modification logic into a shared helper function to improve code maintainability.

Comment on lines +67 to +72:

```kotlin
val template = if (!inference.enableThinking) {
    "{% set enable_thinking = false %}\n" + inference.chatTemplate
} else {
    inference.chatTemplate
}
nativeLib.nativeSetChatTemplate(template)
```


Severity: medium

This logic for preparing the chat template is duplicated in the `loadFromFd` function (lines 123-128). To improve maintainability and avoid code duplication, consider extracting this logic into a private helper function.

For example, you could add a function to the `GGUFEngine` class:

```kotlin
private fun prepareChatTemplate(chatTemplate: String, enableThinking: Boolean): String {
    return if (!enableThinking) {
        // Prepend the Jinja variable to disable thinking tokens
        "{% set enable_thinking = false %}\n" + chatTemplate
    } else {
        chatTemplate
    }
}
```

Then, you can call this function from both `load()` and `loadFromFd()`:

```kotlin
if (inference.chatTemplate.isNotEmpty()) {
    val template = prepareChatTemplate(inference.chatTemplate, inference.enableThinking)
    nativeLib.nativeSetChatTemplate(template)
}
```

This will make the code cleaner and easier to modify in the future.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e058450e4d


Comment on lines +67 to +68:

```kotlin
val template = if (!inference.enableThinking) {
    "{% set enable_thinking = false %}\n" + inference.chatTemplate
```


P1: Honor thinking toggle when template is missing

This logic only applies `enableThinking` inside the chat-template branch, so the new toggle is ignored whenever `inference.chatTemplate` is empty. In the normal GGUF flow, configs are created from `GgufEngineSchema()` (ModelDownloadService.kt), and `chatTemplate` defaults to `""` (GgufEngineSchema.kt), so users can disable "Enable Thinking" in the editor while Qwen-style thinking tokens remain enabled at runtime. The feature is effectively a no-op unless a custom template was manually persisted.
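The gate described above can be illustrated with a minimal sketch (Python, with hypothetical names mirroring the Kotlin flow; this is not the PR's actual code):

```python
def set_chat_template(chat_template: str, enable_thinking: bool):
    """Return the template string that would reach the native engine,
    or None when no nativeSetChatTemplate call would be made.
    Mirrors the isNotEmpty() gate in load()/loadFromFd()."""
    if not chat_template:  # default configs leave chatTemplate == ""
        return None        # the thinking toggle is silently ignored here
    if not enable_thinking:
        return "{% set enable_thinking = false %}\n" + chat_template
    return chat_template

# With the schema default (empty template), disabling thinking is a no-op:
assert set_chat_template("", enable_thinking=False) is None
```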



Copilot AI left a comment


Pull request overview

Adds a configuration toggle to enable/disable "thinking" (reasoning) tokens for GGUF models (intended for Qwen3.5-style Jinja templates) by prepending a Jinja `{% set enable_thinking = false %}` directive to the configured chat template.

Changes:

  • Adds enableThinking to GgufInferenceParams and wires it through the config editor ViewModel.
  • Adds an “Enable Thinking” switch to the GGUF config editor UI.
  • Injects the Jinja directive into the chat template passed to the native GGUF engine when thinking is disabled.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| app/src/main/java/com/dark/tool_neuron/viewmodel/ModelConfigEditorViewModel.kt | Adds a ViewModel updater to persist the new GGUF `enableThinking` flag into state/config. |
| app/src/main/java/com/dark/tool_neuron/ui/screen/ModelConfigEditorScreen.kt | Adds a UI switch to toggle `enableThinking` for GGUF inference params. |
| app/src/main/java/com/dark/tool_neuron/models/engine_schema/GgufEngineSchema.kt | Extends the serialized GGUF inference schema with `enableThinking`. |
| app/src/main/java/com/dark/tool_neuron/engine/GGUFEngine.kt | Applies the toggle by conditionally prepending a Jinja directive before setting the chat template in the native engine. |

Comment on lines +67 to +72:

```kotlin
val template = if (!inference.enableThinking) {
    "{% set enable_thinking = false %}\n" + inference.chatTemplate
} else {
    inference.chatTemplate
}
nativeLib.nativeSetChatTemplate(template)
```

Copilot AI Mar 5, 2026


The template-selection/injection block is duplicated in both load(...) and loadFromFd(...). Since this is easy to accidentally diverge (and will grow if more template flags are added), consider extracting a small helper (e.g., applyChatTemplate(inference)) or computing template via a shared function/constant.

Comment on lines +123 to +128:

```kotlin
val template = if (!inference.enableThinking) {
    "{% set enable_thinking = false %}\n" + inference.chatTemplate
} else {
    inference.chatTemplate
}
nativeLib.nativeSetChatTemplate(template)
```

Copilot AI Mar 5, 2026


Same duplication as above: the template-selection/injection block is repeated here as well. Extracting shared logic would reduce the chance of future inconsistencies between load(...) and loadFromFd(...).

Comment on lines +467 to +472
SwitchField(
label = "Enable Thinking",
description = "Enable reasoning tokens for supported models (e.g., Qwen3.5)",
checked = ggufConfig.inferenceParams.enableThinking,
onCheckedChange = { viewModel.updateGgufEnableThinking(it) }
)

Copilot AI Mar 5, 2026


This UI toggle is always shown, but the engine-side behavior only changes anything when a non-empty chatTemplate override is present. Given the current default GGUF config schema uses an empty chatTemplate, users may flip this switch and see no change. Consider disabling/hiding the switch unless a template override is being used, or ensure the model’s built-in chat_template is persisted into the GGUF inference params so the toggle is effective.

Suggested change:

```diff
-SwitchField(
-    label = "Enable Thinking",
-    description = "Enable reasoning tokens for supported models (e.g., Qwen3.5)",
-    checked = ggufConfig.inferenceParams.enableThinking,
-    onCheckedChange = { viewModel.updateGgufEnableThinking(it) }
-)
+if (ggufConfig.inferenceParams.chatTemplate.isNotBlank()) {
+    SwitchField(
+        label = "Enable Thinking",
+        description = "Enable reasoning tokens for supported models (e.g., Qwen3.5)",
+        checked = ggufConfig.inferenceParams.enableThinking,
+        onCheckedChange = { viewModel.updateGgufEnableThinking(it) }
+    )
+}
```

```diff
 val chatTemplate: String = "",
-val toolsJson: String = "" // JSON array of tool definitions
+val toolsJson: String = "", // JSON array of tool definitions
+val enableThinking: Boolean = true // Enable/Disable Qwen3.5 thinking tokens
```

Copilot AI Mar 5, 2026


Minor formatting/naming clarity: the inline comment for enableThinking doesn’t match the spacing/style used on adjacent fields (two spaces before //) and it hard-codes a single model name (Qwen3.5) even though the flag name is generic. Consider aligning comment formatting with the rest of the file and wording it in a model-agnostic way (e.g., “Enable/disable thinking tokens when supported by the chat template”).

Suggested change:

```diff
-val enableThinking: Boolean = true // Enable/Disable Qwen3.5 thinking tokens
+val enableThinking: Boolean = true  // Enable/disable thinking tokens when supported by the chat template
```

Comment on lines 66 to +72:

```diff
 if (inference.chatTemplate.isNotEmpty()) {
-    nativeLib.nativeSetChatTemplate(inference.chatTemplate)
+    val template = if (!inference.enableThinking) {
+        "{% set enable_thinking = false %}\n" + inference.chatTemplate
+    } else {
+        inference.chatTemplate
+    }
+    nativeLib.nativeSetChatTemplate(template)
```

Copilot AI Mar 5, 2026


`enableThinking` is only applied when `inference.chatTemplate` is non-empty, because the injection + `nativeSetChatTemplate(...)` call is gated by `if (inference.chatTemplate.isNotEmpty())`. In this codebase, new GGUF configs are created from `GgufEngineSchema()` (which defaults `chatTemplate` to `""`), and there are no other writers for `chatTemplate`, so this toggle likely has no effect in practice. Consider either persisting the model's built-in chat_template into `modelInferenceParams` (so it's available here), or adding a native-level way to disable thinking without overriding the template, or at minimum removing the `isNotEmpty()` gate when `enableThinking` is false and you have a known template to apply.
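One possible shape for the first of those options, as a hypothetical sketch (not code from this PR): fall back to a template persisted from the model file whenever thinking is disabled and no manual override exists. `FALLBACK_TEMPLATE` and `resolve_chat_template` are illustrative names, and the fallback value here is a placeholder, not a real Qwen template:

```python
# FALLBACK_TEMPLATE stands in for the model's built-in chat_template,
# persisted into the config so the toggle can still take effect.
FALLBACK_TEMPLATE = "{{ messages }}"  # placeholder for illustration only

def resolve_chat_template(override: str, enable_thinking: bool):
    # Use the override when present; otherwise fall back only when the
    # toggle must be applied, so default behavior is otherwise unchanged.
    template = override if override else (FALLBACK_TEMPLATE if not enable_thinking else "")
    if not template:
        return None  # nothing to send; native engine keeps its default
    if not enable_thinking:
        return "{% set enable_thinking = false %}\n" + template
    return template
```

With this shape, disabling the toggle takes effect even for configs whose `chatTemplate` was never customized.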

Repository owner deleted a comment from google-labs-jules bot Mar 5, 2026
@Godzilla675 (Owner, Author)

address code reviews and dont comment back
