[Scheduler] Pre match radix tree in schedule#6989
Open
juncaipeng wants to merge 2 commits into PaddlePaddle:develop from
Thanks for your contribution!
Pull request overview
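This PR adds a pre-check to the V1 scheduling flow that performs a read-only match against the GPU prefix tree ahead of time. Before calling get_prefix_cached_blocks(), the scheduler first evaluates whether enough GPU blocks are available to cover the tokens that miss the cache. This lowers the resource threshold that hierarchical cache matching imposes when long requests are scheduled over multiple rounds.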
Changes:
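- Adds PrefixCacheManager.pre_match_block_on_gpu(): a read-only traversal of the radix tree that computes the number of prefix-hit tokens resident on the GPU.
- In ResourceManagerV1.schedule() and preallocate_resource_in_p(), uses the pre-match result to compute need_block_num before running the can_allocate_gpu_blocks() check.
- Adjusts the conditional branch in the CPU cache preparation stage of request_match_blocks() to avoid a pointless can_allocate check for 0 blocks.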
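The read-only pre-match described in the list above can be illustrated with a toy example. This is a minimal sketch, not the code from this PR: the `Node` class, the `on_gpu` flag, the block-keyed children, and the function name `pre_match_tokens_on_gpu` are all hypothetical, and the real `pre_match_block_on_gpu()` may have a different signature and tree layout. The sketch only shows the idea of walking a prefix tree block by block, without modifying it, and stopping at the first block that is missing or not resident on the GPU.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    """Hypothetical prefix-tree node; one node represents one full KV block."""
    on_gpu: bool = True
    children: Dict[Tuple[int, ...], "Node"] = field(default_factory=dict)


def pre_match_tokens_on_gpu(root: Node, token_ids: List[int], block_size: int) -> int:
    """Count prefix tokens covered by GPU-resident blocks.

    Read-only: the tree is never modified and no reference counts are taken.
    Only full blocks are matched; a trailing partial block is ignored.
    """
    matched = 0
    node = root
    for start in range(0, len(token_ids) - block_size + 1, block_size):
        key = tuple(token_ids[start:start + block_size])
        child = node.children.get(key)
        # Stop at the first miss, or at a block that lives off the GPU.
        if child is None or not child.on_gpu:
            break
        matched += block_size
        node = child
    return matched


# Toy tree with block_size = 2: (1, 2) and (3, 4) on GPU, (5, 6) swapped out.
root = Node()
a = root.children.setdefault((1, 2), Node(on_gpu=True))
b = a.children.setdefault((3, 4), Node(on_gpu=True))
b.children.setdefault((5, 6), Node(on_gpu=False))

hit = pre_match_tokens_on_gpu(root, [1, 2, 3, 4, 5, 6, 7], block_size=2)
```

In this toy tree `hit` is 4: the third block exists but is not GPU resident, so the walk stops there and the remaining tokens count as misses for budgeting purposes.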
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
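| fastdeploy/engine/sched/resource_manager_v1.py | Adds a block budget check based on the GPU prefix pre-match before scheduling and preallocation, reducing the scheduling threshold and deadlock risk caused by hierarchical cache matching. |
| fastdeploy/cache_manager/prefix_cache_manager.py | Adds a GPU-only pre-match method and adjusts the allocation check logic for the CPU cache. |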
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## develop #6989 +/- ##
==========================================
Coverage ? 73.79%
==========================================
Files ? 399
Lines ? 56085
Branches ? 8854
==========================================
Hits ? 41389
Misses ? 11770
Partials ? 2926
Motivation
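Match the GPU cache ahead of time: a request can be scheduled as long as there are enough blocks for the tokens that miss the cache, which lowers the scheduling threshold for long multi-turn requests.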
Modifications
Add pre_match_block_on_gpu.
Adjust the check that runs before the call to get_prefix_cached_blocks.
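The adjusted check can be sketched as simple block arithmetic. This is an illustrative sketch under stated assumptions, not the PR's code: the helper names `blocks_needed` and `can_schedule`, their parameters, and the numbers used are hypothetical, and the real `need_block_num` computation and `can_allocate_gpu_blocks()` call may account for additional factors.

```python
def blocks_needed(total_tokens: int, gpu_matched_tokens: int, block_size: int) -> int:
    """Blocks required for tokens not already covered by the GPU prefix cache."""
    unmatched = max(total_tokens - gpu_matched_tokens, 0)
    # Ceiling division without floating point.
    return -(-unmatched // block_size)


def can_schedule(total_tokens: int, gpu_matched_tokens: int,
                 block_size: int, free_gpu_blocks: int) -> bool:
    """Hypothetical stand-in for the budget check done before prefix matching."""
    return blocks_needed(total_tokens, gpu_matched_tokens, block_size) <= free_gpu_blocks


# A 1000-token request with 896 tokens already resident on the GPU,
# a block size of 64, and only 2 free GPU blocks.
with_pre_match = can_schedule(1000, 896, 64, free_gpu_blocks=2)
without_pre_match = can_schedule(1000, 0, 64, free_gpu_blocks=2)
```

With the pre-match result, only the 104 unmatched tokens need blocks (2 blocks), so the request fits. Budgeting for all 1000 tokens would require 16 blocks and the same request would be held back.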
Usage or Command
Accuracy Tests
Checklist
[FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
pre-commit before commit.
release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.