How to use the assistant_prefill with the thinking mode #133
Unanswered
abdullah-cod9 asked this question in Q&A
Replies: 0 comments
I am applying the following example:
```python
messages = [
    {"role": "user", "content": "What are the first 5 planets in the solar system?"},
    {"role": "assistant", "content": "...Wait, I've thought long enough, let's answer.</think>"}
]

# Seamlessly continue the generation
response = visionM.model.create_chat_completion(
    messages=messages,
    max_tokens=512,
    assistant_prefill=True  # <--- Enables seamless continuation
)

prefilled_text = messages[-1]["content"]
# The model will flawlessly continue from " Venus\n3. Earth..."
generated_text = response["choices"][0]["message"]["content"]
print(prefilled_text + generated_text)
```

The model starts thinking again instead of jumping straight to an answer.
Chat Handler:

```text
Qwen35ChatHandler(_init_mtmd_context): Vision support detected.
Qwen35ChatHandler(_init_mtmd_context): Audio is NOT supported by this mmproj model backend.
Qwen35ChatHandler(_process_mtmd_prompt): Rendered prompt length: 658 chars, Media count: 0.
Qwen35ChatHandler(_process_mtmd_prompt): Rendered prompt:
<|im_start|>system
You are an exceptionally capable, precise, and helpful multimodal AI assistant that excels at deeply understanding and richly describing images, charts, diagrams, text in images, scenes, and any visual content, while also answering every question accurately, clearly, and step-by-step when appropriate — always responding in the same language as the user's question, remaining polite, professional, and maximally helpful.<|im_end|>
<|im_start|>user
What are the first 5 planets in the solar system?<|im_end|>
<|im_start|>assistant
<think>
...Wait, I've thought long enough, let's answer.
</think><|im_end|>
<|im_start|>assistant
<think>
Qwen35ChatHandler(__call__): Prepared virtual token ledger of length 130.
Llama.longest_token_prefix [Fast Exit 1]: Empty sequence detected. len(current_ids)=0, len(new_tokens)=130
Qwen35ChatHandler(__call__): Evaluating TEXT chunk (130 tokens) at pos 0...
...Wait, I've thought long enough, let's answer.</think>Thinking Process:

1. **Analyze the Request:**
   * Question: "What are the first 5 planets in the solar system?"
   * Constraint: Respond in the same language as the user's question (English).
   * Constraint: Be polite, professional, and maximally helpful.
   * Constraint: Step-by-step reasoning when appropriate (though this is a factual question, I should explain the order clearly).
```