ProdByBuddha commented Dec 9, 2025

Purpose

Add opt-in SLA-tiered scheduling to the V1 engine:

  • Requests can set SamplingParams.sla_tier to interactive, batch, or background.
  • When SchedulerConfig.sla_tier_enabled is true, the scheduler orders requests by tier, then by priority, and enforces an interactive token cap via max_interactive_batch_tokens.
  • SchedulerStats surfaces per-tier metrics: waiting and preempted counts, plus interactive-cap hits.
  • docs/design/arch_overview.md documents the feature and its CPU-only limitations.
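
For readers who want a concrete picture of the behaviour described above, here is a minimal, standalone sketch of tier-then-priority selection with an interactive token cap. It is not the code in this PR. `Req`, `StepStats`, `schedule_step`, and `TIER_RANK` are hypothetical names, and it assumes that a lower priority value is more urgent and that the cap limits the total tokens granted to interactive requests in a single step.

```python
from dataclasses import dataclass, field

# Assumed ranking: lower rank is served first.
TIER_RANK = {"interactive": 0, "batch": 1, "background": 2}


@dataclass
class Req:
    """Hypothetical stand-in for a scheduler request."""
    req_id: str
    sla_tier: str
    priority: int    # assumed: lower value = more urgent
    num_tokens: int  # tokens the request needs in this step


@dataclass
class StepStats:
    """Hypothetical per-step counters, loosely modelled on the PR text."""
    waiting_by_tier: dict = field(default_factory=dict)
    interactive_cap_hits: int = 0


def schedule_step(waiting, token_budget, max_interactive_batch_tokens):
    """Select requests for one step: order by tier, then by priority."""
    stats = StepStats()
    scheduled = []
    interactive_used = 0
    # sorted() is stable, so equal keys keep their arrival order.
    ordered = sorted(waiting, key=lambda r: (TIER_RANK[r.sla_tier], r.priority))
    for req in ordered:
        fits_budget = req.num_tokens <= token_budget
        over_cap = (
            req.sla_tier == "interactive"
            and interactive_used + req.num_tokens > max_interactive_batch_tokens
        )
        if over_cap:
            stats.interactive_cap_hits += 1
        if fits_budget and not over_cap:
            scheduled.append(req)
            token_budget -= req.num_tokens
            if req.sla_tier == "interactive":
                interactive_used += req.num_tokens
        else:
            # The request stays in the waiting queue for a later step.
            count = stats.waiting_by_tier.get(req.sla_tier, 0)
            stats.waiting_by_tier[req.sla_tier] = count + 1
    return scheduled, stats


waiting = [
    Req("bg", "background", 0, 4),
    Req("i1", "interactive", 1, 6),
    Req("b1", "batch", 0, 4),
    Req("i2", "interactive", 0, 6),
]
scheduled, stats = schedule_step(
    waiting, token_budget=16, max_interactive_batch_tokens=8
)
print([r.req_id for r in scheduled])  # ['i2', 'b1', 'bg']
print(stats.interactive_cap_hits)     # 1
```

In this sketch the second interactive request is held back by the cap even though the overall budget has room, so lower tiers still make progress. Whether the real scheduler skips or stops at a capped request is not stated in the description above.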

Test Plan

  • python -m pytest tests/v1/core/sched/test_scheduler.py

Test Result

  • Passed: tests/v1/core/sched/test_scheduler.py (WSL, Python 3.12.8); only SWIG deprecation warnings.
  • Note: Broader scheduler suites (tests/v1/core/test_scheduler.py, tests/v1/core/test_priority_scheduler_random.py, tests/v1/core/test_async_scheduler.py, tests/plugins_tests/test_scheduler_plugins.py) fail in this CPU-only environment due to missing GPU/EC/KV resources; expected to run in GPU-capable CI.

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

See Issue

…tization

- Introduced SLA-tier scheduling, allowing requests to carry an `sla_tier` attribute for prioritization.
- Updated `PriorityRequestQueue` to order requests based on SLA tier and priority.
- Added tests for the new scheduling behavior, ensuring correct request handling based on SLA tiers.
- Refactored relevant classes and methods to support the new scheduling logic, improving overall request management.

This change enhances the system's ability to manage requests efficiently, particularly in scenarios with varying service level agreements.
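
As an illustration of the ordering described in this commit message, the following is a rough sketch of a heap-backed queue keyed on tier, then priority, then insertion order. It does not reproduce `PriorityRequestQueue`. The `TieredQueue` class, its method names, and the tier ranks are assumptions made for the example, as is the convention that a lower priority value wins.

```python
import heapq
import itertools

# Assumed ranking: lower rank is popped first.
TIER_RANK = {"interactive": 0, "batch": 1, "background": 2}


class TieredQueue:
    """Hypothetical min-heap ordered by (tier rank, priority, insertion order)."""

    def __init__(self):
        self._heap = []
        # Monotonic counter so ties resolve first-in, first-out.
        self._counter = itertools.count()

    def add(self, req_id, sla_tier, priority):
        key = (TIER_RANK[sla_tier], priority, next(self._counter))
        heapq.heappush(self._heap, (key, req_id))

    def pop(self):
        _, req_id = heapq.heappop(self._heap)
        return req_id

    def __len__(self):
        return len(self._heap)


q = TieredQueue()
q.add("a", "batch", 0)
q.add("b", "interactive", 5)
q.add("c", "interactive", 1)
q.add("d", "background", 0)
q.add("e", "batch", 0)
print([q.pop() for _ in range(len(q))])  # ['c', 'b', 'a', 'e', 'd']
```

Because the tier rank comes first in the key, an interactive request with a weak priority still precedes every batch request, which matches the "tier, then priority" wording.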

Signed-off-by: Billy Coleman III <prodbybuddha@icloud.com>

mergify bot commented Dec 9, 2025

Documentation preview: https://vllm--30297.org.readthedocs.build/en/30297/

mergify bot added the documentation and v1 labels Dec 9, 2025

gemini-code-assist bot left a comment

Code Review

This pull request introduces an opt-in SLA-tiered scheduling mechanism, which is a valuable addition for managing different types of workloads. The implementation appears robust, complete with corresponding documentation and unit tests that validate the new logic for request prioritization, preemption, and budgeting. The changes to the priority queue and scheduler are well-designed. However, I've identified an unrelated change that removes the vllm:request_prefill_kv_computed_tokens metric, which could negatively impact observability of the prefix caching feature. Please see the detailed comment.

mergify bot commented Dec 9, 2025

Hi @ProdByBuddha, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?
mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint
