[Cherry-Pick][BugFix][KVCache][Speculative Decoding] Fix get_max_chunk_tokens for PD-split decode node in MTP scenario(#7756)#7758
Conversation
…PD-split decode node in MTP scenario (PaddlePaddle#7756)

Thanks for your contribution!
PaddlePaddle-bot left a comment
🤖 Paddle-CI-Agent | pr_review |
2026-05-09 15:39:33
📋 Review Summary
PR overview: fixes an issue in the PD-disaggregation + MTP scenario where the D node's get_max_chunk_tokens() does not multiply by mtp_steps, causing the token budget to be underestimated.
Scope of change: fastdeploy/config.py (FDConfig configuration layer)
Impact tags: [FDConfig] [KVCache] [Speculative Decoding]
📝 PR Convention Check
The PR description is missing the required ## Accuracy Tests section, the ## Usage or Command section is empty (the template requires filling in N/A), and the last checklist item (cherry-pick to the release branch) should be checked [x]. The title format is compliant and needs no change.
Suggested PR description (can be copied directly; it must replicate the full structure of the checklist §D2 template):
## Motivation
In the PD-disaggregation + MTP (Multi-Token Prediction) scenario, the return value of the D node's `get_max_chunk_tokens()` is computed incorrectly. The current code returns `max_num_seqs` directly for the D node, ignoring the fact that in the MTP scenario each sequence actually processes `num_speculative_tokens + 1` tokens per decode step, so the token budget is underestimated.
## Modifications
In the `get_max_chunk_tokens()` method of `fastdeploy/config.py`, D-node non-XPU branch:
- Before: `num_tokens = self.scheduler_config.max_num_seqs`
- After: `num_tokens = self.scheduler_config.max_num_seqs * mtp_steps`
where `mtp_steps = num_speculative_tokens + 1` (1 when MTP is disabled, so fully backward compatible), consistent with the `tokens_per_seq` calculation in `check()` in the same file.
## Usage or Command
N/A
## Accuracy Tests
N/A
## Checklist
- [x] Add at least a tag in the PR title.
- Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
- You can add new tags based on the PR content, but the semantics must be clear.
- [x] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. This fix is a one-line logic correction aligned with scenarios covered by existing speculative + PD-disaggregation tests; a standalone unit test would require a combined MTP + PD-disaggregation environment, so none is added for now.
- [ ] Provide accuracy results.
- [x] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.

Issues
| Level | File | Summary |
|---|---|---|
| 📝 PR convention | — | Description is missing the ## Accuracy Tests section; ## Usage or Command is empty and should be filled with N/A; the last checklist item (cherry-pick to release) should be checked [x] |
Overall Assessment
The fix logic is correct: the mtp_steps calculation is fully aligned with tokens_per_seq in check(), and mtp_steps = 1 when speculative_config is None preserves backward compatibility. Only the PR description needs to be completed to meet the convention; the code itself has no blocking issues.
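The fixed decode-node logic can be sketched as follows. This is a minimal standalone sketch based on the description above, not the actual FastDeploy implementation; the class and field names (`FDConfigSketch`, `SpeculativeConfig`, etc.) are simplified stand-ins for the real config objects.

```python
# Hypothetical minimal sketch of the fixed get_max_chunk_tokens() logic
# for a PD-split decode (D) node; not the actual FastDeploy source.
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpeculativeConfig:
    num_speculative_tokens: int = 0


@dataclass
class SchedulerConfig:
    max_num_seqs: int = 64


@dataclass
class FDConfigSketch:
    scheduler_config: SchedulerConfig
    speculative_config: Optional[SpeculativeConfig] = None

    def get_max_chunk_tokens_decode(self) -> int:
        # mtp_steps = num_speculative_tokens + 1 in the MTP case; it falls
        # back to 1 when speculative decoding is disabled, which keeps the
        # pre-fix behavior (fully backward compatible).
        mtp_steps = 1
        if self.speculative_config is not None:
            mtp_steps = self.speculative_config.num_speculative_tokens + 1
        # Before the fix this returned max_num_seqs alone, underestimating
        # the per-step token budget under MTP.
        return self.scheduler_config.max_num_seqs * mtp_steps
```

With MTP disabled the result is unchanged, which is why the review considers the fix backward compatible.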
CI report generated from the code below (updated every 30 minutes):
1 Task overview
⏳ Required tasks in progress: 6 Required tasks are still running; please wait for them to finish before merging.
2 Task status summary
2.1 Required tasks: 4/10 passed
2.2 Optional tasks: 21/25 passed
3 Failure details (required only)
No failed required tasks.
Codecov Report

✅ All modified and coverable lines are covered by tests.

@@            Coverage Diff             @@
##           release/2.6    #7758   +/- ##
==============================================
  Coverage             ?   71.87%
==============================================
  Files                ?      378
  Lines                ?    53897
  Branches             ?     8425
==============================================
  Hits                 ?    38736
  Misses               ?    12402
  Partials             ?     2759

Flags with carried forward coverage won't be shown.
Cherry-pick of #7756 (authored by @kevincheng2) to release/2.6. dev PR: #7756
Motivation

In the PD-disaggregation + MTP (Multi-Token Prediction) scenario, the return value of the D node's `get_max_chunk_tokens()` is computed incorrectly. The current code returns `max_num_seqs` directly for the D node, ignoring the fact that in the MTP scenario each sequence actually processes `num_speculative_tokens + 1` tokens per decode step.

Modifications

In the `get_max_chunk_tokens()` method of `fastdeploy/config.py`, D-node non-XPU branch:
- Before: `num_tokens = self.scheduler_config.max_num_seqs`
- After: `num_tokens = self.scheduler_config.max_num_seqs * mtp_steps`
where `mtp_steps = num_speculative_tokens + 1` (1 when MTP is disabled, fully backward compatible), consistent with the `tokens_per_seq` calculation in `_check_max_num_batched_tokens` in the same file.

Usage or Command

Checklist

Format your code, run `pre-commit` before commit.
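To quantify the underestimate described in Modifications, here is a small illustration with hypothetical numbers (the batch size and draft-token count are examples, not values from the PR):

```python
# Hypothetical numbers illustrating the D-node token-budget underestimate.
# Under MTP, each sequence processes num_speculative_tokens + 1 tokens per
# decode step, so the pre-fix estimate is off by a factor of mtp_steps.
max_num_seqs = 64                      # example D-node batch size
num_speculative_tokens = 1             # one MTP draft token per step
mtp_steps = num_speculative_tokens + 1

old_budget = max_num_seqs              # pre-fix estimate: 64 tokens
new_budget = max_num_seqs * mtp_steps  # post-fix estimate: 128 tokens
print(old_budget, new_budget)
```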