fix: chunk prefill #1032

Open
jiayyu wants to merge 3 commits into main from fpz/dsv4

@jiayyu jiayyu commented Jun 2, 2026

Motivation

Technical Details

Test Plan

Test Result

Submission Checklist

jiayyu added 2 commits June 2, 2026 05:40
Preempted seqs keep their decoded token_ids (preempt() only deallocates
KV blocks), so seq.num_tokens > seq.num_prompt_tokens on re-admit.
Computing num_new_tokens from num_prompt_tokens produced chunk=0 when a
fully cached prefix already covered all num_prompt_tokens, triggering
the "chunk must be positive" assert under high-concurrency benchmarks.
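
To make the state described in this commit message concrete, here is a minimal sketch of a sequence whose preempt() frees only its KV blocks. The `Seq` class, its `block_table` field, and `append_token` are hypothetical stand-ins and are not taken from the repository; only `preempt()`, `token_ids`, `num_tokens`, and `num_prompt_tokens` are names mentioned in the PR.

```python
from dataclasses import dataclass, field


@dataclass
class Seq:
    """Hypothetical stand-in for the scheduler's sequence object."""
    prompt_token_ids: list
    token_ids: list = field(default_factory=list)
    block_table: list = field(default_factory=list)  # assumed KV block ids

    def __post_init__(self):
        # A fresh sequence starts with only its prompt tokens.
        if not self.token_ids:
            self.token_ids = list(self.prompt_token_ids)

    @property
    def num_prompt_tokens(self):
        return len(self.prompt_token_ids)

    @property
    def num_tokens(self):
        # Prompt tokens plus every token decoded so far.
        return len(self.token_ids)

    def append_token(self, token_id):
        self.token_ids.append(token_id)

    def preempt(self):
        # Only the KV blocks are released; token_ids is left untouched.
        self.block_table.clear()


seq = Seq(prompt_token_ids=list(range(8)), block_table=[0, 1])
for token_id in (100, 101, 102):  # three decode steps
    seq.append_token(token_id)
seq.preempt()

print(seq.num_prompt_tokens, seq.num_tokens, seq.block_table)
```

After preemption the sequence has no KV blocks but still reports more tokens than its prompt, which is the condition the commit message calls out.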
Copilot AI review requested due to automatic review settings June 2, 2026 07:14

Copilot AI left a comment

Pull request overview

Fixes chunked-prefill scheduling so preempted sequences (whose KV blocks have been freed but whose decoded token_ids are retained) re-forward all of their tokens — not just the original prompt tokens — when they are re-admitted from the waiting queue. Also removes the DeepSeek-V4 carve-out that auto-disabled chunked prefill, indicating chunked prefill is now considered safe for V4.

Changes:

  • In scheduler.py, base num_new_tokens on seq.num_tokens instead of seq.num_prompt_tokens so previously-decoded tokens of a preempted seq are recomputed.
  • Remove the enable_chunked_prefill = False auto-override and accompanying warning for DeepSeek-V4 in config.py.
  • Drop the corresponding "DeepSeek-V4 auto-disables this" note from the --enable-chunked-prefill CLI help in arg_utils.py.
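
The effect of the scheduler.py change can be illustrated with a small sketch. The helper `new_chunk_size`, the `token_budget` parameter, and the concrete numbers are assumptions made for illustration; the actual scheduler code is not shown on this page and may size chunks differently.

```python
def new_chunk_size(total_tokens, num_cached_tokens, token_budget):
    """Hypothetical helper: tokens to forward in the next prefill chunk.

    total_tokens      -- the count the scheduler uses as the prefill target
    num_cached_tokens -- tokens already covered by a prefix-cache hit
    token_budget      -- assumed per-step cap on batched tokens
    """
    remaining = total_tokens - num_cached_tokens
    return max(0, min(remaining, token_budget))


# A preempted sequence: 8 prompt tokens, 3 decoded tokens retained,
# and a prefix cache that already covers the whole prompt.
num_prompt_tokens = 8
num_tokens = 11
num_cached_tokens = 8
token_budget = 16

# Before the fix: sized from the prompt only, so nothing is left to forward.
old_chunk = new_chunk_size(num_prompt_tokens, num_cached_tokens, token_budget)

# After the fix: sized from prompt + decoded tokens.
new_chunk = new_chunk_size(num_tokens, num_cached_tokens, token_budget)

print(old_chunk, new_chunk)
```

Under these assumptions the prompt-based count yields a zero-length chunk, the case that would trip a "chunk must be positive" assert, while the num_tokens-based count leaves the three retained decoded tokens to recompute.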

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| atom/model_engine/scheduler.py | Use num_tokens (prompt + decoded) when sizing the new prefill chunk for waiting seqs, so preempted/requeued seqs recompute KV for all retained tokens. |
| atom/config.py | Remove DeepSeek-V4 auto-disable of chunked prefill and its warning. |
| atom/model_engine/arg_utils.py | Remove now-stale DeepSeek-V4 note from the chunked-prefill CLI help. |

2 participants