[BugFix][KVCache][Speculative Decoding] Fix get_max_chunk_tokens for PD-split decode node in MTP scenario #7756
Conversation
Thanks for your contribution!
PaddlePaddle-bot
left a comment
🤖 Paddle-CI-Agent | pr_review
2026-05-09 13:39:36
📋 Review Summary
PR overview: fixes a bug where, in the PD-disaggregated + MTP scenario, the decode (D) node's `get_max_chunk_tokens()` did not multiply by the speculative-decoding token multiplier, leading to insufficient KV cache allocation.
Scope of change: fastdeploy/config.py
Impact tags: [BugFix] [KVCache] [Speculative Decoding] [PD Disaggregation]
📝 PR Convention Check
Two convention issues: ① the title contains multiple official tags (the convention requires exactly one); ② the PR description is missing the required ## Accuracy Tests section.
Suggested title (copy-paste ready):
[BugFix] Fix get_max_chunk_tokens for PD-split decode node in MTP scenario
Suggested PR description (copy-paste ready, with all required sections filled in):
## Motivation
In the PD-disaggregated + MTP (Multi-Token Prediction) scenario, the return value of `get_max_chunk_tokens()` on the decode (D) node is computed incorrectly.
The current code returns `max_num_seqs` directly for the D node, ignoring that in the MTP scenario each sequence processes `num_speculative_tokens + 1` tokens per decode step. This leads to two problems (a worked example follows the list):
1. **Under-provisioned KV cache planning in `cache_config.postprocess`**: `get_max_chunk_tokens()` feeds the cache-management strategy, so an undersized return value causes block-allocation failures or OOM on the D node during MTP inference.
2. **Underestimated memory in `profile_run`**: the profile run uses this value to choose the number of dummy input tokens, so underestimating it makes the peak-memory estimate inaccurate.
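For concreteness, a quick worked example with hypothetical numbers (not taken from this PR):

```python
# Hypothetical numbers for illustration only.
max_num_seqs = 100          # sequences the D node serves per decode step
num_speculative_tokens = 3  # MTP draft tokens per step

old_estimate = max_num_seqs                            # 100 tokens planned
actual = max_num_seqs * (num_speculative_tokens + 1)   # 400 tokens processed
print(f"planned for {old_estimate} tokens, actually need {actual}")
```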
## Modifications
In the `get_max_chunk_tokens()` method in `fastdeploy/config.py`, D-node non-XPU branch:
- Before: `num_tokens = self.scheduler_config.max_num_seqs`
- After: `num_tokens = self.scheduler_config.max_num_seqs * mtp_steps`
where `mtp_steps = num_speculative_tokens + 1` (1 when MTP is off, so fully backward compatible), matching the `tokens_per_seq` computation in `_check_max_num_batched_tokens` in the same file. A sketch of the fixed branch follows.
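A minimal sketch of the fixed branch, rewritten as a standalone function for illustration; the real method in `fastdeploy/config.py` has additional branches (prefill node, XPU path) that are omitted here, and any structure beyond the snippets quoted above is assumed:

```python
def get_max_chunk_tokens_decode(scheduler_config, speculative_config):
    """Sketch of the D-node (non-XPU) branch after the fix."""
    mtp_steps = 1
    # With speculative decoding enabled, each sequence processes
    # (num_speculative_tokens + 1) tokens per decode step.
    if speculative_config is not None and speculative_config.method is not None:
        mtp_steps = speculative_config.num_speculative_tokens + 1
    # Before the fix this returned max_num_seqs alone, underestimating
    # the per-step token count whenever mtp_steps > 1.
    return scheduler_config.max_num_seqs * mtp_steps
```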
## Usage or Command
Example configuration for launching a D node with MTP enabled in the PD-disaggregated setup:
```bash
# Launch the D node with MTP inference enabled
python -m fastdeploy.entrypoints.openai.api_server \
--model /path/to/model \
--speculative-config '{"method": "mtp", "num_speculative_tokens": 3}' \
--splitwise-config '{"role": "decode"}' \
--max-num-seqs 100 \
...
```
## Accuracy Tests
N/A
## Checklist
- [x] Add at least a tag in the PR title.
- Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
- You can add new tags based on the PR content, but the semantics must be clear.
- [x] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. Please write the reason in this PR if no unit tests.
- [ ] Provide accuracy results.
- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.
Issues
| Level | File | Summary |
|---|---|---|
| 📝 PR convention | — | Title contains multiple tags; description missing the ## Accuracy Tests section |
| ❓ Question | fastdeploy/config.py:2513 | `method is not None` covers all speculative-decoding methods, inconsistent with the comment "In MTP scenario" |
❓ Question: config.py condition is inconsistent with its comment
The current condition `self.speculative_config.method is not None` covers all speculative-decoding methods (ngram, suffix, etc.), while the comment above it says `# In MTP scenario`; the two are semantically inconsistent. Please confirm which is intended (a sketch contrasting both options follows this list):
- If all speculative-decoding methods need the `num_speculative_tokens + 1` multiplier (i.e., the D node processes draft tokens plus one verification token per step), change the comment to a generic description: `# In speculative decoding scenario, each sequence processes (num_speculative_tokens + 1) tokens per decode step (draft tokens + 1 verification token)`
- If this fix applies only to the MTP method, tighten the condition to: `if self.speculative_config is not None and getattr(self.speculative_config, "method", None) == "mtp"`
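A hedged sketch contrasting the two options, using stand-in config objects rather than the project's real classes (all names here are illustrative):

```python
from types import SimpleNamespace

def mtp_steps_broad(spec_cfg):
    """Option 1: broad condition; any speculative method gets the multiplier."""
    if spec_cfg is not None and spec_cfg.method is not None:
        return spec_cfg.num_speculative_tokens + 1
    return 1

def mtp_steps_mtp_only(spec_cfg):
    """Option 2: tightened condition; only the MTP method gets the multiplier."""
    if spec_cfg is not None and getattr(spec_cfg, "method", None) == "mtp":
        return spec_cfg.num_speculative_tokens + 1
    return 1

# An "ngram" config shows where the two options diverge.
ngram = SimpleNamespace(method="ngram", num_speculative_tokens=3)
print(mtp_steps_broad(ngram))     # 4: multiplier applied
print(mtp_steps_mtp_only(ngram))  # 1: multiplier skipped
```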
Overall Assessment
The fix is in the right direction and backward compatible (with num_speculative_tokens=0 it degenerates to the original logic). The author should confirm whether the `method is not None` condition ought to be restricted to MTP only, and add the missing ## Accuracy Tests section to the PR description.
CI report generated from the code below (updated every 30 minutes):
1 Task overview: ⏳ CI in progress; 6/8 required tasks passed, 2 still running, no required failures so far.
2 Task status summary
2.1 Required tasks: 6/8 passed
2.2 Optional tasks: 18/23 passed
3 Failure details (required only): none
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files:
@@ Coverage Diff @@
## develop #7756 +/- ##
==========================================
Coverage ? 72.20%
==========================================
Files ? 396
Lines ? 55595
Branches ? 8691
==========================================
Hits ? 40141
Misses ? 12690
Partials ? 2764
Flags with carried forward coverage won't be shown.
☔ View full report in Codecov by Sentry.
✅ Cherry-pick successful! Created PR: #7758