fix: warmup uses full token budget for DP #1024

Open
ZhangLirong-amd wants to merge 1 commit into main from dp_mem

@ZhangLirong-amd
Contributor

Warmup now uses the full max_num_batched_tokens instead of dividing it by dp_size. Under DP attention, each rank's MoE sees up to dp_size * local_tokens after the all-gather, so warmup must exercise the full token budget to capture the true peak activation and CUDA-graph memory footprint. Dividing by dp_size under-sized the warmup, which let decode run out of memory (OOM) later. The warning message is updated accordingly.
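To make the dp_size * local_tokens bound concrete, here is a minimal arithmetic sketch. The helper name moe_tokens_after_all_gather and the numbers are hypothetical and not taken from this PR; it only assumes that the all-gather concatenates every rank's local tokens before the MoE layer.

```python
def moe_tokens_after_all_gather(local_tokens: int, dp_size: int) -> int:
    """Upper bound on the tokens one rank's MoE processes after gathering
    local_tokens from each of the dp_size data-parallel ranks.

    Hypothetical helper for illustration; not part of the PR.
    """
    if local_tokens < 0 or dp_size < 1:
        raise ValueError("local_tokens must be >= 0 and dp_size must be >= 1")
    return dp_size * local_tokens


# Illustrative values only (assumed, not measured).
dp_size = 4
for local_tokens in (2048, 8192):
    gathered = moe_tokens_after_all_gather(local_tokens, dp_size)
    print(f"local_tokens={local_tokens} -> MoE sees up to {gathered} tokens")
```

The point of the sketch is only that the MoE input grows linearly with the per-rank token count, so a warmup run with fewer local tokens than decode can reach will see a smaller peak.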

Copilot AI review requested due to automatic review settings June 2, 2026 02:56
Contributor

Copilot AI left a comment

Pull request overview

Updates model warmup behavior so it exercises the full configured batch token budget (max_num_batched_tokens) rather than scaling it down by data-parallel size, aiming to better match peak activation / CUDA-graph memory seen during real DP-attention decode workloads.

Changes:

  • Set warmup_max_tokens to max_num_batched_tokens (no longer divided by dp_size).
  • Update the warmup warning message to reflect the new sizing behavior.
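The sizing change listed above could be sketched roughly as follows. This is not the actual diff: the function names are hypothetical, and the use of integer division for the old behavior is an assumption based only on the description "divided by dp_size".

```python
def warmup_tokens_before(max_num_batched_tokens: int, dp_size: int) -> int:
    """Old sizing as described in the PR: the budget is divided by dp_size.

    Integer division is an assumption; the original expression is not shown.
    """
    return max_num_batched_tokens // dp_size


def warmup_tokens_after(max_num_batched_tokens: int, dp_size: int) -> int:
    """New sizing as described in the PR: the full budget, regardless of dp_size."""
    return max_num_batched_tokens


# Illustrative values only (assumed).
budget, dp_size = 8192, 4
print(warmup_tokens_before(budget, dp_size))
print(warmup_tokens_after(budget, dp_size))
```

With dp_size equal to 1 the two functions agree, so the change only affects data-parallel deployments.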

Comment on lines +1069 to 1071
f"{self.label}: warmup_max_tokens={warmup_max_tokens} (=max_num_batched_tokens) "
f"< max_model_len={max_model_len}. "
f"Using {num_seqs} seq with length {seq_len} for warmup."
@ZhangLirong-amd changed the title from "fix: warmup uses full token budget" to "fix: warmup uses full token budget for DP" on Jun 2, 2026