
server, webui: accept continue_final_message flag for vLLM API compat #23012

Open

ServeurpersoCom wants to merge 3 commits into ggml-org:master from ServeurpersoCom:reasoning-continue-prefill-vllm-compat

Conversation

@ServeurpersoCom
Contributor

Overview

Add the continue_final_message request body flag from the vLLM and transformers APIs. When set together with add_generation_prompt false, it triggers the existing prefill_assistant code path, regardless of the server-side opt.prefill_assistant option. Mutual exclusion with add_generation_prompt true is enforced, matching vLLM behavior.
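
For illustration, a minimal sketch of the two request shapes against the OpenAI-compatible chat endpoint, in the spirit of the Python test script mentioned below. The server URL and message contents are made up; the flag names and the 400 behavior are the ones this PR describes:

```python
import requests

BASE = "http://localhost:8080"  # hypothetical local llama-server instance

# Continue the final assistant message: continue_final_message together
# with add_generation_prompt false routes into the existing prefill path.
resp = requests.post(f"{BASE}/v1/chat/completions", json={
    "messages": [
        {"role": "user", "content": "Write a haiku about autumn."},
        {"role": "assistant", "content": "Crimson leaves drift down"},
    ],
    "continue_final_message": True,
    "add_generation_prompt": False,
})
print(resp.status_code)  # 200: generation resumes from the partial message

# Both flags true is contradictory and is rejected, matching vLLM.
resp = requests.post(f"{BASE}/v1/chat/completions", json={
    "messages": [{"role": "user", "content": "Hi"},
                 {"role": "assistant", "content": "Hel"}],
    "continue_final_message": True,
    "add_generation_prompt": True,
})
print(resp.status_code)  # expected: 400
```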

The WebUI sends continue_final_message and add_generation_prompt false on the Continue button, with a matching opt-in option on the chat service.

Pure API alignment, no change to the prefill logic itself. Paves the way for the upcoming per-template prefill plumbing in common/chat.

Additional information

Follow-up to PR #22727 for vLLM compat

@ServeurpersoCom ServeurpersoCom requested review from a team as code owners May 13, 2026 14:40
@ServeurpersoCom
Contributor Author

cc @aldehir
I'm going to add tests.

@ServeurpersoCom ServeurpersoCom force-pushed the reasoning-continue-prefill-vllm-compat branch from 918af57 to 972f4a7 May 13, 2026 14:51
@ServeurpersoCom
Contributor Author

I've tested this against my production endpoint, and I'm now checking the Python test script; one quick pass and that'll be done.

Two cases on top of the existing assistant prefill coverage. First,
continue_final_message true with add_generation_prompt false produces
the same rendered prompt as the prefill_assistant heuristic, proving
the new flag is a correct alias of the existing path. Second, setting
both flags to true is rejected with HTTP 400, matching the
vLLM/transformers mutual exclusion contract.
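
A rough sketch of what those two checks could look like in pytest form. The helper name, server address, and message fixture are placeholders, not the actual test code; comparing rendered prompts directly would need access to the server's prompt output, so the alias case is reduced here to a status check:

```python
import requests

BASE = "http://localhost:8080"  # placeholder test server address

MESSAGES = [
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi, I was about to say"},
]

def chat(extra):
    # Placeholder helper: posts a chat completion with extra body flags.
    return requests.post(f"{BASE}/v1/chat/completions",
                         json={"messages": MESSAGES, "max_tokens": 8, **extra})

def test_continue_final_message_aliases_prefill():
    # The new flag plus add_generation_prompt false must behave like the
    # existing prefill_assistant heuristic (the real test compares the
    # rendered prompts; here we only assert the request is accepted).
    resp = chat({"continue_final_message": True, "add_generation_prompt": False})
    assert resp.status_code == 200

def test_both_flags_true_rejected():
    # continue_final_message with add_generation_prompt true is
    # contradictory; the server answers HTTP 400, as vLLM/transformers do.
    resp = chat({"continue_final_message": True, "add_generation_prompt": True})
    assert resp.status_code == 400
```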
Contributor

@aldehir aldehir left a comment


Thank you @ServeurpersoCom! Will follow up.

