[Scheduler] Pre match radix tree in schedule #6989

Open
juncaipeng wants to merge 2 commits into PaddlePaddle:develop from juncaipeng:pre_match_tree

@juncaipeng (Collaborator)

Motivation

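Match the GPU cache ahead of time, so that a request can be scheduled as long as there are enough blocks for the tokens that miss the cache. This lowers the scheduling threshold for long multi-turn requests.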

💡 If this PR is a Cherry Pick, the PR title needs to follow the format by adding the [Cherry-Pick] label at the very beginning and appending the original PR ID at the end. For example, [Cherry-Pick][CI] Add check trigger and logic(#5191)

Modifications

Add pre_match_block_on_gpu.
Adjust the check that precedes the call to get_prefix_cached_blocks.
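
The adjusted check can be illustrated with a minimal sketch. It assumes the block budget is computed only over the tokens that the GPU pre-match did not cover; `blocks_needed`, `can_schedule`, and all parameter names are hypothetical and are not taken from the FastDeploy code.

```python
def blocks_needed(num_prompt_tokens: int, gpu_matched_tokens: int, block_size: int) -> int:
    """Blocks required for the tokens that were NOT found in the GPU cache (hypothetical helper)."""
    unmatched = max(num_prompt_tokens - gpu_matched_tokens, 0)
    # Integer ceiling division: a partially filled block still occupies a whole block.
    return (unmatched + block_size - 1) // block_size


def can_schedule(num_prompt_tokens: int, gpu_matched_tokens: int,
                 block_size: int, free_gpu_blocks: int) -> bool:
    """Admit the request only if the free blocks cover the unmatched tokens."""
    return blocks_needed(num_prompt_tokens, gpu_matched_tokens, block_size) <= free_gpu_blocks


# A 4096-token prompt with 64-token blocks and 10 free GPU blocks.
with_pre_match = can_schedule(4096, 3840, 64, 10)    # 256 unmatched tokens -> 4 blocks
without_pre_match = can_schedule(4096, 0, 64, 10)    # 4096 unmatched tokens -> 64 blocks
```

In this example the request is admitted only when the 3840 already-cached tokens are discounted, which is the effect the Motivation section describes for long multi-turn requests.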

Usage or Command

Accuracy Tests

Checklist

  • Add at least one tag in the PR title.
    • Tag list: [[FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code, run pre-commit before commit.
  • Add unit tests. Please write the reason in this PR if no unit tests.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

Copilot AI review requested due to automatic review settings March 24, 2026 08:47
@paddle-bot
paddle-bot bot commented Mar 24, 2026

Thanks for your contribution!

Copilot AI (Contributor) left a comment

Pull request overview

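This PR adds a pre-check to the V1 scheduling flow that performs a read-only match on the GPU prefix tree ahead of time. Before get_prefix_cached_blocks() is called, the scheduler first evaluates whether there are enough GPU blocks to cover the tokens that miss the cache, which lowers the resource threshold that hierarchical cache matching imposes when long requests are scheduled over multiple turns.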

Changes:

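  • Add PrefixCacheManager.pre_match_block_on_gpu(): a read-only traversal of the radix tree that computes the number of prefix-hit tokens resident on the GPU.
  • In ResourceManagerV1.schedule() and preallocate_resource_in_p(), use the pre-match result to compute need_block_num before performing the can_allocate_gpu_blocks() check.
  • Fine-tune the conditional branch in the CPU cache preparation stage of request_match_blocks() to avoid a pointless can_allocate check on 0 blocks.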
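
A read-only GPU prefix match of the kind listed above could look roughly like the sketch below. It is not the PR's implementation: the `Node` class, the `on_gpu` flag, keying children by a tuple of token ids, and stopping at the first non-GPU block are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    """Hypothetical radix-tree node; each node stands for one full block of tokens."""
    on_gpu: bool = True
    children: Dict[Tuple[int, ...], "Node"] = field(default_factory=dict)


def pre_match_gpu_tokens(root: Node, token_ids: List[int], block_size: int) -> int:
    """Count leading tokens whose blocks are GPU resident, without modifying the tree."""
    node = root
    matched = 0
    # Only full blocks can match; a trailing partial block is ignored.
    for start in range(0, len(token_ids) - block_size + 1, block_size):
        key = tuple(token_ids[start:start + block_size])
        child = node.children.get(key)
        # Assumption: the walk stops at the first missing or non-GPU block.
        if child is None or not child.on_gpu:
            break
        matched += block_size
        node = child
    return matched


# Tiny tree: block (1, 2) is on the GPU, its child block (3, 4) is not.
root = Node()
gpu_block = Node(on_gpu=True)
cpu_block = Node(on_gpu=False)
root.children[(1, 2)] = gpu_block
gpu_block.children[(3, 4)] = cpu_block

matched = pre_match_gpu_tokens(root, [1, 2, 3, 4, 5], block_size=2)  # matched == 2
```

Because the walk only reads `children` and `on_gpu`, it takes no references and moves no blocks, so the scheduler can call it as a cheap estimate before committing to the full match.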

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

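| File | Description |
| --- | --- |
| fastdeploy/engine/sched/resource_manager_v1.py | Adds a block budget check, based on the GPU prefix pre-match, ahead of scheduling and preallocation, reducing the scheduling threshold and deadlock risk caused by hierarchical cache matching. |
| fastdeploy/cache_manager/prefix_cache_manager.py | Adds a GPU-only pre-match method and adjusts the CPU cache allocation check logic. |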

@codecov-commenter
codecov-commenter commented Mar 24, 2026

Codecov Report

❌ Patch coverage is 32.60870% with 31 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@5e469fc). Learn more about missing BASE report.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| fastdeploy/cache_manager/prefix_cache_manager.py | 12.50% | 28 Missing ⚠️ |
| fastdeploy/engine/sched/resource_manager_v1.py | 78.57% | 3 Missing ⚠️ |
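
The per-file figures can be cross-checked against the overall patch coverage with a little arithmetic. The sketch below assumes each file's patch percentage is covered lines divided by changed lines and that partially covered lines do not affect these counts; the helper name is hypothetical.

```python
def changed_lines(missing: int, coverage_pct: float) -> int:
    """Recover the number of changed lines from the missing count and the patch coverage."""
    return round(missing / (1 - coverage_pct / 100))


# (missing lines, patch coverage %) as reported for each file.
reported = {
    "fastdeploy/cache_manager/prefix_cache_manager.py": (28, 12.50),
    "fastdeploy/engine/sched/resource_manager_v1.py": (3, 78.57),
}

total_changed = sum(changed_lines(m, pct) for m, pct in reported.values())  # 32 + 14 = 46
total_missing = sum(m for m, _ in reported.values())                        # 31
patch_coverage = 100 * (total_changed - total_missing) / total_changed      # 15 / 46
```

Under these assumptions the two files account for 46 changed lines, of which 15 are covered, which matches the reported 32.60870% patch coverage with 31 missing lines.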
Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #6989   +/-   ##
==========================================
  Coverage           ?   73.79%           
==========================================
  Files              ?      399           
  Lines              ?    56085           
  Branches           ?     8854           
==========================================
  Hits               ?    41389           
  Misses             ?    11770           
  Partials           ?     2926           
| Flag | Coverage Δ |
| --- | --- |
| GPU | 73.79% <32.60%> (?) |



3 participants