[Cherry-Pick][BugFix][KVCache][Speculative Decoding] Fix get_max_chunk_tokens for PD-split decode node in MTP scenario(#7756)#7758
Conversation
…PD-split decode node in MTP scenario (PaddlePaddle#7756)

Thanks for your contribution!
PaddlePaddle-bot left a comment
🤖 Paddle-CI-Agent | pr_review |
2026-05-09 15:39:33
📋 Review Summary
PR overview: fixes an issue in the PD-disaggregation + MTP scenario where the D node's get_max_chunk_tokens() does not multiply by mtp_steps, causing the token budget to be underestimated.
Scope of change: fastdeploy/config.py (FDConfig configuration layer)
Impact tags: [FDConfig] [KVCache] [Speculative Decoding]
📝 PR Convention Check
The PR description is missing the required ## Accuracy Tests section, the ## Usage or Command section is empty (the template requires filling in N/A), and the last checklist item (cherry-pick to the release branch) should be checked [x]. The title format is compliant and needs no change.
Suggested PR description (can be copied directly; it must replicate the full structure of the checklist §D2 template):
## Motivation
In the PD-disaggregation + MTP (Multi-Token Prediction) scenario, the return value of the D node's `get_max_chunk_tokens()` is computed incorrectly. The current code returns `max_num_seqs` directly for the D node, ignoring the fact that in the MTP scenario each sequence actually processes `num_speculative_tokens + 1` tokens per decode step, so the token budget is underestimated.
## Modifications
In the `get_max_chunk_tokens()` method of `fastdeploy/config.py`, D-node non-XPU branch:
- Before: `num_tokens = self.scheduler_config.max_num_seqs`
- After: `num_tokens = self.scheduler_config.max_num_seqs * mtp_steps`
where `mtp_steps = num_speculative_tokens + 1` (1 when MTP is disabled, so fully backward compatible), consistent with the `tokens_per_seq` calculation in `check()` in the same file.
## Usage or Command
N/A
## Accuracy Tests
N/A
## Checklist
- [x] Add at least a tag in the PR title.
- Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
- You can add new tags based on the PR content, but the semantics must be clear.
- [x] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. This fix is a one-line logic correction aligned with scenarios covered by existing speculative + PD-disaggregation tests; a standalone unit test would require a combined MTP + PD-disaggregation environment, so none is added for now.
- [ ] Provide accuracy results.
- [x] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.

Issues
| Level | File | Summary |
|---|---|---|
| 📝 PR convention | — | Description is missing the ## Accuracy Tests section; ## Usage or Command is empty and should be filled with N/A; the last checklist item (cherry-pick to release) should be checked [x] |
Overall Assessment
The fix logic is correct: the mtp_steps calculation is fully aligned with tokens_per_seq in check(), and mtp_steps = 1 when speculative_config is None preserves backward compatibility. Only the PR description needs to be completed to meet the convention; the code itself has no blocking issues.
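The fixed decode-node logic can be sketched as follows. This is a minimal standalone sketch based on the description above, not the actual FastDeploy implementation; the class and field names (`FDConfigSketch`, `SpeculativeConfig`, etc.) are simplified stand-ins for the real config objects.

```python
# Hypothetical minimal sketch of the fixed get_max_chunk_tokens() logic
# for a PD-split decode (D) node; not the actual FastDeploy source.
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpeculativeConfig:
    num_speculative_tokens: int = 0


@dataclass
class SchedulerConfig:
    max_num_seqs: int = 64


@dataclass
class FDConfigSketch:
    scheduler_config: SchedulerConfig
    speculative_config: Optional[SpeculativeConfig] = None

    def get_max_chunk_tokens_decode(self) -> int:
        # mtp_steps = num_speculative_tokens + 1 in the MTP case; it falls
        # back to 1 when speculative decoding is disabled, which keeps the
        # pre-fix behavior (fully backward compatible).
        mtp_steps = 1
        if self.speculative_config is not None:
            mtp_steps = self.speculative_config.num_speculative_tokens + 1
        # Before the fix this returned max_num_seqs alone, underestimating
        # the per-step token budget under MTP.
        return self.scheduler_config.max_num_seqs * mtp_steps
```

With MTP disabled the result is unchanged, which is why the review considers the fix backward compatible.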
CI report generated from the code below (updated every 30 minutes):
1 Task overview
⏳ Required tasks in progress: 6 Required tasks are still running; please wait for them to finish before merging.
2 Task status summary
2.1 Required tasks: 4/10 passed
2.2 Optional tasks: 21/25 passed
3 Failure details (required only)
No failed required tasks.
Codecov Report

✅ All modified and coverable lines are covered by tests.

@@            Coverage Diff             @@
##           release/2.6    #7758   +/- ##
==============================================
  Coverage             ?   71.87%
==============================================
  Files                ?      378
  Lines                ?    53897
  Branches             ?     8425
==============================================
  Hits                 ?    38736
  Misses               ?    12402
  Partials             ?     2759

Flags with carried forward coverage won't be shown.
Cherry-pick of #7756 (authored by @kevincheng2) to release/2.6. dev PR: #7756
Motivation

In the PD-disaggregation + MTP (Multi-Token Prediction) scenario, the return value of the D node's `get_max_chunk_tokens()` is computed incorrectly. The current code returns `max_num_seqs` directly for the D node, ignoring the fact that in the MTP scenario each sequence actually processes `num_speculative_tokens + 1` tokens per decode step.

Modifications

In the `get_max_chunk_tokens()` method of `fastdeploy/config.py`, D-node non-XPU branch:
- Before: `num_tokens = self.scheduler_config.max_num_seqs`
- After: `num_tokens = self.scheduler_config.max_num_seqs * mtp_steps`
where `mtp_steps = num_speculative_tokens + 1` (1 when MTP is disabled, fully backward compatible), consistent with the `tokens_per_seq` calculation in `_check_max_num_batched_tokens` in the same file.

Usage or Command

Checklist

Format your code, run `pre-commit` before commit.
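To quantify the underestimate described in Modifications, here is a small illustration with hypothetical numbers (the batch size and draft-token count are examples, not values from the PR):

```python
# Hypothetical numbers illustrating the D-node token-budget underestimate.
# Under MTP, each sequence processes num_speculative_tokens + 1 tokens per
# decode step, so the pre-fix estimate is off by a factor of mtp_steps.
max_num_seqs = 64                      # example D-node batch size
num_speculative_tokens = 1             # one MTP draft token per step
mtp_steps = num_speculative_tokens + 1

old_budget = max_num_seqs              # pre-fix estimate: 64 tokens
new_budget = max_num_seqs * mtp_steps  # post-fix estimate: 128 tokens
print(old_budget, new_budget)
```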