
server, webui: accept continue_final_message flag for vLLM API compat #23012

Open

ServeurpersoCom wants to merge 3 commits into ggml-org:master from ServeurpersoCom:reasoning-continue-prefill-vllm-compat

Conversation

@ServeurpersoCom
Contributor

Overview

Add the continue_final_message request body flag from the vLLM and transformers APIs. When set together with add_generation_prompt false, it triggers the existing prefill_assistant code path, regardless of the server-side opt.prefill_assistant option. Mutual exclusion with add_generation_prompt true is enforced, matching vLLM behavior.
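
For illustration, a minimal sketch of the two request shapes against the OpenAI-compatible chat endpoint, in the spirit of the Python test script mentioned below. The server URL and message contents are made up; the flag names and the 400 behavior are the ones this PR describes:

```python
import requests

BASE = "http://localhost:8080"  # hypothetical local llama-server instance

# Continue the final assistant message: continue_final_message together
# with add_generation_prompt false routes into the existing prefill path.
resp = requests.post(f"{BASE}/v1/chat/completions", json={
    "messages": [
        {"role": "user", "content": "Write a haiku about autumn."},
        {"role": "assistant", "content": "Crimson leaves drift down"},
    ],
    "continue_final_message": True,
    "add_generation_prompt": False,
})
print(resp.status_code)  # 200: generation resumes from the partial message

# Both flags true is contradictory and is rejected, matching vLLM.
resp = requests.post(f"{BASE}/v1/chat/completions", json={
    "messages": [{"role": "user", "content": "Hi"},
                 {"role": "assistant", "content": "Hel"}],
    "continue_final_message": True,
    "add_generation_prompt": True,
})
print(resp.status_code)  # expected: 400
```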

The WebUI sends continue_final_message and add_generation_prompt false on the Continue button, with a matching opt-in option on the chat service.

Pure API alignment, no change to the prefill logic itself. Paves the way for the upcoming per-template prefill plumbing in common/chat.

Additional information

Follow-up to PR #22727 for vLLM compat

@ServeurpersoCom ServeurpersoCom requested review from a team as code owners May 13, 2026 14:40
@ServeurpersoCom
Contributor Author

cc @aldehir
I'm going to add tests.

@ServeurpersoCom ServeurpersoCom force-pushed the reasoning-continue-prefill-vllm-compat branch from 918af57 to 972f4a7 May 13, 2026 14:51
@ServeurpersoCom
Contributor Author

I've tested this against my production endpoint, and I'm now checking the Python test script; one quick pass and that'll be done.

Two cases on top of the existing assistant prefill coverage. First,
continue_final_message true with add_generation_prompt false produces
the same rendered prompt as the prefill_assistant heuristic, proving
the new flag is a correct alias of the existing path. Second, setting
both flags to true is rejected with HTTP 400, matching the
vLLM/transformers mutual exclusion contract.
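
A rough sketch of what those two checks could look like in pytest form. The helper name, server address, and message fixture are placeholders, not the actual test code; comparing rendered prompts directly would need access to the server's prompt output, so the alias case is reduced here to a status check:

```python
import requests

BASE = "http://localhost:8080"  # placeholder test server address

MESSAGES = [
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi, I was about to say"},
]

def chat(extra):
    # Placeholder helper: posts a chat completion with extra body flags.
    return requests.post(f"{BASE}/v1/chat/completions",
                         json={"messages": MESSAGES, "max_tokens": 8, **extra})

def test_continue_final_message_aliases_prefill():
    # The new flag plus add_generation_prompt false must behave like the
    # existing prefill_assistant heuristic (the real test compares the
    # rendered prompts; here we only assert the request is accepted).
    resp = chat({"continue_final_message": True, "add_generation_prompt": False})
    assert resp.status_code == 200

def test_both_flags_true_rejected():
    # continue_final_message with add_generation_prompt true is
    # contradictory; the server answers HTTP 400, as vLLM/transformers do.
    resp = chat({"continue_final_message": True, "add_generation_prompt": True})
    assert resp.status_code == 400
```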
Contributor

@aldehir aldehir left a comment


Thank you @ServeurpersoCom! Will follow up.

