[Core] Add SLA-tiered scheduling (opt-in) and docs #30297

ProdByBuddha · 2025-12-09T03:31:46Z

Purpose

Add opt-in SLA-tiered scheduling to the V1 engine:

- Requests can set `SamplingParams.sla_tier` (`interactive`, `batch`, or `background`).
- When `SchedulerConfig.sla_tier_enabled` is true, the scheduler orders requests by tier, then by priority, and caps the tokens scheduled for interactive requests at `max_interactive_batch_tokens`.
- Surface per-tier metrics (waiting and preempted counts, plus interactive-cap hits) in `SchedulerStats`.
- Document the feature and the CPU-only limitations in docs/design/arch_overview.md.
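The admission rule described above can be sketched as follows. This is a minimal, hypothetical sketch, not the code in this PR. It assumes the waiting list is already ordered by tier and then priority, that the interactive cap bounds the total tokens admitted for interactive requests in a single scheduling step, and that requests exceeding either budget are skipped and counted as waiting. `PendingRequest`, `TierStats`, and `schedule_step` are illustrative names, not vLLM APIs, and only the waiting and cap-hit counters are modeled (not preemption).

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class PendingRequest:
    request_id: str
    sla_tier: str  # "interactive", "batch", or "background"
    num_tokens: int  # tokens the request asks for in this step


@dataclass
class TierStats:
    # Hypothetical per-tier counters, loosely modeled on the PR description.
    num_waiting: Counter = field(default_factory=Counter)
    interactive_cap_hits: int = 0


def schedule_step(waiting, token_budget, max_interactive_batch_tokens):
    """Admit requests from an already tier-sorted waiting list (sketch).

    Every request draws from the shared token_budget; interactive requests
    additionally share a cap of max_interactive_batch_tokens (assumed
    semantics). Requests that do not fit are skipped and counted as waiting.
    """
    scheduled = []
    interactive_tokens = 0
    stats = TierStats()
    for req in waiting:
        if req.num_tokens > token_budget:
            stats.num_waiting[req.sla_tier] += 1
            continue
        is_interactive = req.sla_tier == "interactive"
        if is_interactive and interactive_tokens + req.num_tokens > max_interactive_batch_tokens:
            # Interactive cap reached: leave the remaining budget to lower tiers.
            stats.interactive_cap_hits += 1
            stats.num_waiting[req.sla_tier] += 1
            continue
        scheduled.append(req.request_id)
        token_budget -= req.num_tokens
        if is_interactive:
            interactive_tokens += req.num_tokens
    return scheduled, stats


waiting = [
    PendingRequest("i1", "interactive", 300),
    PendingRequest("i2", "interactive", 300),
    PendingRequest("b1", "batch", 400),
    PendingRequest("bg1", "background", 500),
]
scheduled, stats = schedule_step(waiting, token_budget=1000, max_interactive_batch_tokens=512)
print(scheduled)  # ['i1', 'b1']
print(stats.interactive_cap_hits)  # 1
```

In this example `i2` would push interactive usage to 600 tokens, over the 512-token cap, so it is deferred while `b1` is admitted; `bg1` no longer fits in the remaining 300 tokens. Capping interactive tokens this way keeps a burst of interactive traffic from consuming the whole step budget.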

Test Plan

python -m pytest tests/v1/core/sched/test_scheduler.py

Test Result

Passed: tests/v1/core/sched/test_scheduler.py (WSL, Python 3.12.8), with only SWIG deprecation warnings.
Note: Broader scheduler suites (tests/v1/core/test_scheduler.py, tests/v1/core/test_priority_scheduler_random.py, tests/v1/core/test_async_scheduler.py, tests/plugins_tests/test_scheduler_plugins.py) fail in this CPU-only environment due to missing GPU/EC/KV resources; these are expected to run in GPU-capable CI.


See Issue

…tization

- Introduced SLA-tier scheduling, allowing requests to carry an `sla_tier` attribute for prioritization.
- Updated `PriorityRequestQueue` to order requests based on SLA tier and priority.
- Added tests for the new scheduling behavior, ensuring correct request handling based on SLA tiers.
- Refactored relevant classes and methods to support the new scheduling logic, improving overall request management.

This change enhances the system's ability to manage requests efficiently, particularly in scenarios with varying service level agreements.

Signed-off-by: Billy Coleman III <prodbybuddha@icloud.com>
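Ordering requests by SLA tier and then priority, as described for `PriorityRequestQueue`, can be sketched with a binary heap keyed on a tuple. This is an illustrative sketch only: the tier ranks, the `Request` fields, and `TieredRequestQueue` are hypothetical, and it assumes vLLM's convention that a lower `priority` value is scheduled earlier, with arrival time as the next tie-breaker.

```python
import heapq
import itertools
from dataclasses import dataclass

# Hypothetical tier ranking: a lower rank is scheduled first.
TIER_RANK = {"interactive": 0, "batch": 1, "background": 2}


@dataclass
class Request:
    request_id: str
    sla_tier: str = "batch"  # hypothetical default tier
    priority: int = 0  # assumed: lower value is scheduled earlier
    arrival_time: float = 0.0


class TieredRequestQueue:
    """Min-heap ordered by (tier rank, priority, arrival time)."""

    def __init__(self) -> None:
        self._heap: list = []
        # Unique sequence numbers break exact ties, so heapq never has to
        # compare Request objects themselves.
        self._seq = itertools.count()

    def add(self, req: Request) -> None:
        key = (TIER_RANK[req.sla_tier], req.priority, req.arrival_time, next(self._seq))
        heapq.heappush(self._heap, (key, req))

    def pop(self) -> Request:
        # Smallest key first: best tier, then best priority, then earliest arrival.
        return heapq.heappop(self._heap)[1]

    def __len__(self) -> int:
        return len(self._heap)


q = TieredRequestQueue()
q.add(Request("bg", "background", priority=0, arrival_time=0.0))
q.add(Request("b1", "batch", priority=5, arrival_time=1.0))
q.add(Request("i1", "interactive", priority=9, arrival_time=2.0))
q.add(Request("b0", "batch", priority=1, arrival_time=3.0))
order = [q.pop().request_id for _ in range(len(q))]
print(order)  # ['i1', 'b0', 'b1', 'bg']
```

Because the tier rank is the first element of the key, an interactive request with a poor priority value still outranks every batch request; priority only orders requests within the same tier.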

mergify · 2025-12-09T03:32:22Z

Documentation preview: https://vllm--30297.org.readthedocs.build/en/30297/

gemini-code-assist

Code Review

This pull request introduces an opt-in SLA-tiered scheduling mechanism, which is a valuable addition for managing different types of workloads. The implementation appears robust, complete with corresponding documentation and unit tests that validate the new logic for request prioritization, preemption, and budgeting. The changes to the priority queue and scheduler are well-designed. However, I've identified an unrelated change that removes the vllm:request_prefill_kv_computed_tokens metric, which could negatively impact observability of the prefix caching feature. Please see the detailed comment.

vllm/v1/metrics/stats.py


mergify · 2025-12-09T04:34:37Z

Hi @ProdByBuddha, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?

mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:

# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint


ProdByBuddha requested review from ApostaC, ProExpertProg, WoosukKwon, alexm-redhat, heheda12345, hmellor, houseroad, markmc, mgoin, njhill, robertgshaw2-redhat, tlrmchlsmth, yewentao256, youkaichao and ywang96 as code owners December 9, 2025 03:31

mergify bot added documentation Improvements or additions to documentation v1 labels Dec 9, 2025

gemini-code-assist bot reviewed Dec 9, 2025


vllm/v1/metrics/stats.py (resolved)

ProdByBuddha added 2 commits December 8, 2025 19:41

Merge branch 'main' of https://github.com/vllm-project/vllm into sla-scheduling-rebased

e90a1f3

Restore num_cached_tokens metric for prefill KV computed

a461d0b

ProdByBuddha and others added 7 commits December 9, 2025 18:26

restore num_cached_tokens

cced739

Merge branch 'main' of https://github.com/vllm-project/vllm into sla-scheduling-rebased

782b95c

Merge branch 'main' of https://github.com/vllm-project/vllm into sla-scheduling-rebased

1a3227f

Apply ruff formatting

dd96c91

Signed-off-by: Billy Coleman III <prodbybuddha@icloud.com>

Merge branch 'main' into sla-scheduling-rebased

ce1f7e0

Restore num_cached_tokens metric for prefill KV computed (sign-off)

e2355c9

Adds missing Signed-off-by for commit a461d0b. Signed-off-by: Billy Coleman III <prodbybuddha@icloud.com>

Merge branch 'main' into sla-scheduling-rebased

686f164

ProdByBuddha closed this Dec 10, 2025
