[Optimization]Streaming requests return complete special tokens. #6998
luukunn wants to merge 6 commits into PaddlePaddle:develop
Pull request overview
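This PR optimizes OpenAI-compatible streaming responses: when an engine output is marked as skipped and the client has enabled `return_token_ids`, the corresponding token ids are still returned, so that special tokens (and therefore the complete token stream) are not lost in streaming mode.
Changes:
- Completion streaming: an output is dropped only when it is `skipped` and `return_token_ids=False`; otherwise a chunk with empty `text` is returned, carrying `token_ids`.
- Chat completion streaming: the `skipped` branch is adjusted in the same way, and empty content (text or multimodal) is returned when the output is `skipped`.
- The `tool_calls` check is moved earlier, so that it still influences the final `finish_reason` even when the frame is skipped.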
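The ordering described in the last bullet can be sketched as follows. This is a minimal, simplified model rather than the PR's actual code: the function name `stream_chat_frames`, the frame dict keys, and the two-valued `finish_reason` are all assumptions made for illustration.

```python
def stream_chat_frames(frames, return_token_ids):
    """Collect content deltas and compute the final finish_reason.

    Hypothetical stand-in for the chat streaming loop; `frames` is a list
    of dicts with assumed keys "skipped", "text" and optional "tool_calls".
    """
    deltas = []
    saw_tool_call = False
    for frame in frames:
        # Check for tool calls first, so a frame dropped below still counts.
        if frame.get("tool_calls"):
            saw_tool_call = True
        # Drop skipped frames only when the client did not ask for token ids.
        if frame["skipped"] and not return_token_ids:
            continue
        # Skipped frames that are kept carry empty content.
        deltas.append("" if frame["skipped"] else (frame["text"] or ""))
    finish_reason = "tool_calls" if saw_tool_call else "stop"
    return deltas, finish_reason
```

If the tool-call check came after the `continue`, a skipped frame that carried a tool call would never be seen and the stream would end with the wrong `finish_reason`.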
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| fastdeploy/entrypoints/openai/serving_completion.py | Adjusts how streaming handles skipped outputs so that token ids are still emitted (with empty text) when return_token_ids=True. |
| fastdeploy/entrypoints/openai/serving_chat.py | Adjusts skipped handling and content population in chat streaming (including the multimodal branch). |
| if response_processor.enable_multimodal_content(): | ||
| delta_message.multimodal_content = output["multipart"] | ||
| delta_message.multimodal_content = [{}] if output["skipped"] else output["multipart"] |
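When multimodal output is enabled and `output["skipped"]` is True, setting `delta_message.multimodal_content` to `[{}]` here produces a multipart item that lacks the fields agreed upstream (for example, the text part in response_processors is `{type: "text", text: ...}`), which may break clients that parse the `multimodal_content` structure. Consider returning a structurally consistent placeholder item when the output is skipped (for example, a text item whose text is an empty string), and then attaching `completion_token_ids` to it.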
| delta_message.multimodal_content = [{}] if output["skipped"] else output["multipart"] | |
| delta_message.multimodal_content = ( | |
| [{"type": "text", "text": ""}] if output["skipped"] else output["multipart"] | |
| ) |
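The suggestion above can be sketched as a small helper. The helper name `skipped_safe_multipart`, the `token_ids` key, and the choice to attach `completion_token_ids` to the first part are assumptions for illustration; the real attachment point in FastDeploy may differ.

```python
def skipped_safe_multipart(output: dict, return_token_ids: bool) -> list:
    """Build multimodal_content that keeps the text-part shape on skipped frames.

    Hypothetical helper; `output` uses assumed keys "skipped", "multipart"
    and "token_ids".
    """
    if output.get("skipped"):
        # Placeholder with the same keys as a normal text part, but empty text.
        parts = [{"type": "text", "text": ""}]
    else:
        # Copy each part so the engine's output dict is not mutated.
        parts = [dict(part) for part in output["multipart"]]
    if return_token_ids and parts:
        # Assumption: token ids are attached to the first part.
        parts[0]["completion_token_ids"] = list(output.get("token_ids", []))
    return parts
```

With this shape, a client that reads `part["type"]` and `part["text"]` unconditionally keeps working on skipped chunks.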
| delta_message = CompletionResponseStreamChoice( | ||
| index=idx, | ||
| text=output["text"], | ||
| text="" if output["skipped"] else (output["text"] or ""), | ||
| prompt_token_ids=None, |
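The PR description is still the unfilled template (Motivation/Modifications/Usage/Accuracy Tests are empty), and the title format should drop the quotation marks and add a space after the tag (e.g. [Optimization] Streaming ...); otherwise it will be hard to trace the purpose of the change and how it was verified later. Please add: why streaming should return the token_ids of skipped tokens, the expected behavior and compatibility notes, and at least one reproducible command or test case for verification.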
| if output["skipped"] and not request.return_token_ids: | ||
| continue | ||
|
|
||
| delta_message = CompletionResponseStreamChoice( | ||
| index=idx, | ||
| text=output["text"], | ||
| text="" if output["skipped"] else (output["text"] or ""), | ||
| prompt_token_ids=None, |
This adds the `output["skipped"] and not request.return_token_ids` branch. The behavior change is that with `return_token_ids=True`, a chunk with empty text (carrying token_ids) is sent even when the output is skipped. The existing streaming cases in tests/entrypoints/openai/test_serving_completion.py only cover skipped=False; consider adding skipped=True scenarios (covering both return_token_ids=True and return_token_ids=False) to guard against regressions (for example, incorrectly continuing to skip and losing token_ids).
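The requested coverage might look roughly like the sketch below. Because the real fixtures in test_serving_completion.py are not shown here, `emit_completion_chunk` is a hypothetical stand-in for the per-frame streaming logic, and the dict keys are assumed; in the real test, the serving class's streaming generator would be exercised instead.

```python
import unittest


def emit_completion_chunk(output, return_token_ids):
    """Hypothetical stand-in for the per-frame logic of completion streaming."""
    if output["skipped"] and not return_token_ids:
        return None  # frame is dropped entirely
    chunk = {"text": "" if output["skipped"] else (output["text"] or "")}
    if return_token_ids:
        chunk["completion_token_ids"] = output["token_ids"]
    return chunk


class SkippedFrameTest(unittest.TestCase):
    SKIPPED = {"skipped": True, "text": "</s>", "token_ids": [2]}

    def test_skipped_without_token_ids_is_dropped(self):
        self.assertIsNone(emit_completion_chunk(self.SKIPPED, return_token_ids=False))

    def test_skipped_with_token_ids_keeps_ids(self):
        chunk = emit_completion_chunk(self.SKIPPED, return_token_ids=True)
        self.assertEqual(chunk["text"], "")
        self.assertEqual(chunk["completion_token_ids"], [2])

    def test_unskipped_none_text_becomes_empty_string(self):
        frame = {"skipped": False, "text": None, "token_ids": [5]}
        chunk = emit_completion_chunk(frame, return_token_ids=False)
        self.assertEqual(chunk, {"text": ""})
```

The second case is the regression guard the comment asks for: if the skip condition is ever reverted to a plain `if output["skipped"]`, that test fails because no chunk is produced.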
| if response_processor.enable_multimodal_content(): | ||
| delta_message.multimodal_content = output["multipart"] | ||
| delta_message.multimodal_content = [{}] if output["skipped"] else output["multipart"] | ||
| else: | ||
| delta_message.content = output["text"] | ||
| delta_message.content = "" if output["skipped"] else (output["text"] or "") |
For the skipped + multimodal + return_token_ids combination, the current implementation produces a multipart item without type/text (only completion_token_ids). tests/entrypoints/openai/test_serving_chat.py does cover skipped=True, but it does not assert the structural consistency of the multipart item. Consider adding or updating a test here to make explicit that the multimodal_content of a skipped chunk contains at least type: "text" and an empty text, together with completion_token_ids.
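One way to express that structural expectation is a small assertion helper that a test could call on each skipped chunk. The helper name and the assumption that `completion_token_ids` is a list on the item itself are illustrative, not taken from the FastDeploy test suite.

```python
def check_skipped_multipart_item(item: dict) -> None:
    """Assert that a skipped chunk's multipart item keeps the text-part shape.

    Hypothetical helper for use inside a test; raises AssertionError on mismatch.
    """
    assert item.get("type") == "text", "skipped item must keep type 'text'"
    assert item.get("text") == "", "skipped item must carry empty text"
    assert isinstance(
        item.get("completion_token_ids"), list
    ), "skipped item must carry completion_token_ids"
```

An item such as `{"completion_token_ids": [2]}`, which is what the `[{}]` placeholder yields once ids are attached, fails the first assertion.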
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## develop #6998 +/- ##
==========================================
Coverage ? 74.16%
==========================================
Files ? 399
Lines ? 56045
Branches ? 8849
==========================================
Hits ? 41568
Misses ? 11531
Partials ? 2946
| if output["skipped"] and not request.return_token_ids: | ||
| continue | ||
|
|
||
| delta_message = CompletionResponseStreamChoice( | ||
| index=idx, | ||
| text=output["text"], | ||
| text="" if output["skipped"] else (output["text"] or ""), |
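The PR title should strictly follow the `[Tag] Title` format required by the template: it is currently `[Optimization]Streaming...`, which is missing the space after the tag. In addition, the PR description is still the unfilled template and lacks key information such as Motivation/Modifications/Usage, which will make later troubleshooting and tracing difficult. Please complete the description and fix the title format.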
Motivation
Modifications
Usage or Command
Accuracy Tests
Checklist
- Tag list: [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
- Run pre-commit before commit.
- For a PR to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.