[Scheduler] Pre match radix tree in schedule by juncaipeng · Pull Request #6989 · PaddlePaddle/FastDeploy

juncaipeng · 2026-03-24T08:47:34Z

Motivation

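Match the GPU cache ahead of scheduling. A request can be scheduled as long as there are enough blocks for the tokens that miss the cache, which lowers the scheduling threshold for long multi-turn requests.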

💡 If this PR is a Cherry Pick, the PR title needs to follow the format by adding the [Cherry-Pick] label at the very beginning and appending the original PR ID at the end. For example, [Cherry-Pick][CI] Add check trigger and logic(#5191)

Modifications

Add pre_match_block_on_gpu.
Adjust the check that runs before the call to get_prefix_cached_blocks.
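
The adjusted check can be pictured with a small sketch. This is only an illustration under stated assumptions, not the code in this PR: `blocks_needed_after_gpu_hit`, `can_schedule`, and all of their parameters are hypothetical names, and the block size of 64 used in the example is an arbitrary value.

```python
def blocks_needed_after_gpu_hit(num_tokens: int, gpu_hit_tokens: int, block_size: int) -> int:
    """Blocks required for the tokens that did NOT hit the GPU cache (hypothetical helper)."""
    miss_tokens = max(num_tokens - gpu_hit_tokens, 0)
    # Ceiling division: a partially filled block still occupies a whole block.
    return -(-miss_tokens // block_size)


def can_schedule(num_tokens: int, gpu_hit_tokens: int, free_blocks: int, block_size: int) -> bool:
    """Admit the request if the free blocks cover only the cache-miss tokens."""
    return blocks_needed_after_gpu_hit(num_tokens, gpu_hit_tokens, block_size) <= free_blocks


# A 1000-token request with 10 free blocks of 64 tokens each:
# without a pre-match it needs 16 blocks and is rejected;
# with 896 tokens already on the GPU it needs only 2 blocks and is admitted.
print(can_schedule(1000, 0, 10, 64))
print(can_schedule(1000, 896, 10, 64))
```

The point of the sketch is that the budget is computed from the miss tokens rather than from the full request length, which is what lowers the threshold for long multi-turn requests.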

Usage or Command

Accuracy Tests

Checklist

Add at least a tag in the PR title.
- Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
- You can add new tags based on the PR content, but the semantics must be clear.
Format your code, run pre-commit before commit.
Add unit tests. Please write the reason in this PR if no unit tests.
Provide accuracy results.
If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

paddle-bot · 2026-03-24T08:47:41Z

Thanks for your contribution!

Copilot

Pull request overview

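This PR adds a pre-check to the V1 scheduling flow that performs a read-only match against the GPU prefix tree ahead of time. Before get_prefix_cached_blocks() is called, it evaluates whether there are enough GPU blocks to cover the tokens that miss the cache, which lowers the resource threshold that hierarchical cache matching imposes when long requests are scheduled over multiple turns.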

Changes:

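Add PrefixCacheManager.pre_match_block_on_gpu(): a read-only traversal of the radix tree that computes the number of prefix-hit tokens resident on the GPU.
In ResourceManagerV1.schedule() and preallocate_resource_in_p(), use the pre-match result to compute need_block_num before performing the can_allocate_gpu_blocks() check.
Fine-tune the conditional branch in the CPU cache preparation stage of request_match_blocks() to avoid a pointless can_allocate check on 0 blocks.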
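
A read-only prefix match of this kind might look like the sketch below. It is a simplified illustration, not the implementation of pre_match_block_on_gpu(): the `Node` class, its `on_gpu` flag, the use of raw token tuples as child keys, and the function name `pre_match_gpu_tokens` are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    """Hypothetical tree node: one cached block, keyed by the tokens it holds."""
    on_gpu: bool = True
    children: Dict[Tuple[int, ...], "Node"] = field(default_factory=dict)


def pre_match_gpu_tokens(root: Node, token_ids: List[int], block_size: int) -> int:
    """Count prefix tokens whose blocks are GPU resident, without mutating the tree."""
    node = root
    hit_tokens = 0
    # Only full blocks can match; a trailing partial block is ignored.
    for start in range(0, len(token_ids) - block_size + 1, block_size):
        key = tuple(token_ids[start:start + block_size])
        child = node.children.get(key)
        if child is None or not child.on_gpu:
            break  # first miss (or non-GPU block) ends the GPU prefix
        hit_tokens += block_size
        node = child
    return hit_tokens


# Build a chain: two GPU-resident blocks followed by one block that is not on the GPU.
root = Node()
b1 = root.children.setdefault((1, 2), Node(on_gpu=True))
b2 = b1.children.setdefault((3, 4), Node(on_gpu=True))
b2.children.setdefault((5, 6), Node(on_gpu=False))

print(pre_match_gpu_tokens(root, [1, 2, 3, 4, 5, 6, 7], block_size=2))
```

Because the traversal only reads `children` and `on_gpu`, it takes no references and allocates nothing, so it can run before the scheduler commits any blocks to the request.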

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File	Description
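fastdeploy/engine/sched/resource_manager_v1.py	Adds a block budget check, based on GPU prefix pre-matching, before scheduling and preallocation, reducing the scheduling threshold and deadlock risk caused by hierarchical cache matching.
fastdeploy/cache_manager/prefix_cache_manager.py	Adds a GPU-only pre-match method and adjusts the CPU cache allocation check.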

codecov-commenter · 2026-03-24T10:37:51Z

Codecov Report

❌ Patch coverage is 32.60870% with 31 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@5e469fc). Learn more about missing BASE report.

Files with missing lines	Patch %	Lines
fastdeploy/cache_manager/prefix_cache_manager.py	12.50%	28 Missing ⚠️
fastdeploy/engine/sched/resource_manager_v1.py	78.57%	3 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             develop    #6989   +/-   ##
==========================================
  Coverage           ?   73.79%           
==========================================
  Files              ?      399           
  Lines              ?    56085           
  Branches           ?     8854           
==========================================
  Hits               ?    41389           
  Misses             ?    11770           
  Partials           ?     2926

Flag	Coverage Δ
GPU	`73.79% <32.60%> (?)`

Pre match radix tree in schedule

c02c2e2

Copilot AI review requested due to automatic review settings March 24, 2026 08:47

juncaipeng temporarily deployed to Metax_ci March 24, 2026 08:47 — with GitHub Actions Inactive

Copilot started reviewing on behalf of juncaipeng March 24, 2026 08:48

Copilot AI reviewed Mar 24, 2026

up

78cca72

juncaipeng temporarily deployed to Metax_ci March 24, 2026 11:04 — with GitHub Actions Inactive

juncaipeng wants to merge 2 commits into PaddlePaddle:develop from juncaipeng:pre_match_tree


3 participants
