[EAGLE] Dynamic sequence length for training samples#1069
Open
benchislett wants to merge 2 commits into pull-request/1044
Conversation
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
What does this PR do?
Type of change: Optimization
Depends on #1044. Currently has that branch as the target branch for easy diff viewing.
By allowing the training sample sequence length to vary, we can greatly increase the efficiency of training with large `training_seq_len` upper bounds. In many cases, the mean sequence length of a training sample is far lower than the max sequence length.

In order to maintain torch.compile specialization (which appears to provide a significant performance boost compared to `dynamic=True`), I propose rounding the max sequence length in each batch up to a fixed "bucket" interval, such as 1024. Finer-grained buckets perform better for long training runs, but trigger much more compilation in the early stages of training.

Usage
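As a rough sketch of the bucketing approach described above (the function and helper names here are illustrative, not the PR's actual implementation):

```python
# Illustrative sketch of sequence-length bucketing. Rounding each batch's max
# length up to a fixed bucket keeps the set of distinct tensor shapes small,
# so torch.compile only re-specializes once per bucket rather than per length.

def bucketed_length(max_seq_len: int, bucket_granularity: int = 1024) -> int:
    """Round max_seq_len up to the nearest multiple of bucket_granularity."""
    return ((max_seq_len + bucket_granularity - 1) // bucket_granularity) * bucket_granularity


def pad_batch(seqs: list[list[int]], pad_id: int = 0, bucket_granularity: int = 1024) -> list[list[int]]:
    """Pad all sequences in a batch to the bucketed max length (hypothetical helper)."""
    target = bucketed_length(max(len(s) for s in seqs), bucket_granularity)
    return [s + [pad_id] * (target - len(s)) for s in seqs]
```

For example, a batch whose longest sample is 3000 tokens would be padded to 3072 (with `bucket_granularity=1024`) instead of the full `training_seq_len` upper bound, while still hitting a previously compiled shape.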
This PR adds `--bucket_granularity` as an optional argument, defaulting to 1024, which yields a significant performance improvement during training for `training_seq_len` > 2048.

Testing
Validated the performance improvement and accuracy with a small sample training run.
Optimized run with seqlen 4k and bucket granularity 1k:
Reference run with no bucketing:
Before your PR is "Ready for review"

- Make sure you read and follow the Contributor guidelines and your commits are signed (`git commit -s -S`).
- Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded `trust_remote_code=True`, `torch.load(..., weights_only=False)`, `pickle`, etc.).
- CONTRIBUTING.md: N/A