[Core] Add SLA-tiered scheduling (opt-in) and docs #30297
…tization

- Introduced SLA-tier scheduling, allowing requests to carry an `sla_tier` attribute for prioritization.
- Updated `PriorityRequestQueue` to order requests based on SLA tier and priority.
- Added tests for the new scheduling behavior, ensuring correct request handling based on SLA tiers.
- Refactored relevant classes and methods to support the new scheduling logic, improving overall request management.

This change enhances the system's ability to manage requests efficiently, particularly in scenarios with varying service level agreements.

Signed-off-by: Billy Coleman III <prodbybuddha@icloud.com>
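The tier-then-priority ordering described in this commit message can be illustrated with a small standalone sketch. This is not the PR's `PriorityRequestQueue`; the class name `TieredQueue`, the `TIER_RANK` mapping, and the arrival-order tie-break are all hypothetical, and the sketch assumes that a lower priority value is served earlier.

```python
import heapq
import itertools

# Hypothetical ranking of the three tiers named in the PR; lower rank is served first.
TIER_RANK = {"interactive": 0, "batch": 1, "background": 2}


class TieredQueue:
    """Illustrative queue ordered by (tier rank, priority, arrival order)."""

    def __init__(self):
        self._heap = []
        # Monotonic counter used as a tie-break so equal keys stay first-come, first-served.
        self._arrival = itertools.count()

    def add(self, request_id, sla_tier, priority=0):
        # Assumption: a lower priority value means the request is handled earlier.
        key = (TIER_RANK[sla_tier], priority, next(self._arrival))
        heapq.heappush(self._heap, (key, request_id))

    def pop(self):
        # Remove and return the request id with the smallest sort key.
        return heapq.heappop(self._heap)[1]

    def __len__(self):
        return len(self._heap)


queue = TieredQueue()
queue.add("bg", "background", priority=0)
queue.add("batch-low", "batch", priority=5)
queue.add("chat", "interactive", priority=9)
queue.add("batch-high", "batch", priority=1)

order = [queue.pop() for _ in range(len(queue))]
print(order)
```

In this sketch the tier always dominates: the interactive request is popped first even though its priority value is the largest, and priority only breaks ties within a tier.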
Documentation preview: https://vllm--30297.org.readthedocs.build/en/30297/
Code Review
This pull request introduces an opt-in SLA-tiered scheduling mechanism, which is a valuable addition for managing different types of workloads. The implementation appears robust, complete with corresponding documentation and unit tests that validate the new logic for request prioritization, preemption, and budgeting. The changes to the priority queue and scheduler are well-designed. However, I've identified an unrelated change that removes the `vllm:request_prefill_kv_computed_tokens` metric, which could negatively impact observability of the prefix caching feature. Please see the detailed comment.
Hi @ProdByBuddha, the pre-commit checks have failed. Please run:
uv pip install pre-commit
pre-commit install
pre-commit run --all-files
Then, commit the changes and push to your branch.
…scheduling-rebased
Signed-off-by: Billy Coleman III <prodbybuddha@icloud.com>
Adds missing Signed-off-by for commit a461d0b. Signed-off-by: Billy Coleman III <prodbybuddha@icloud.com>
Purpose
Add opt-in SLA-tiered scheduling to the V1 engine:
- `SamplingParams.sla_tier` (`interactive`, `batch`, `background`).
- When `SchedulerConfig.sla_tier_enabled` is true, the scheduler orders by tier then priority and enforces an interactive token cap via `max_interactive_batch_tokens`.
- `SchedulerStats`.
- `docs/design/arch_overview.md`.

Test Plan
python -m pytest tests/v1/core/sched/test_scheduler.py

Test Result
- `tests/v1/core/sched/test_scheduler.py` (WSL, Python 3.12.8); only SWIG deprecation warnings.
- `tests/v1/core/test_scheduler.py`, `tests/v1/core/test_priority_scheduler_random.py`, `tests/v1/core/test_async_scheduler.py`, and `tests/plugins_tests/test_scheduler_plugins.py` fail in this CPU-only environment due to missing GPU/EC/KV resources; expected to run in GPU-capable CI.

Essential Elements of an Effective PR Description Checklist
- `supported_models.md` and `examples` for a new model.

See Issue