
[EAGLE] Dynamic sequence length for training samples#1069

Open
benchislett wants to merge 2 commits into pull-request/1044 from bchislett/eagle-optimize-dynamic-seqlen

Conversation

@benchislett
Contributor

What does this PR do?

Type of change: Optimization

Depends on #1044; that branch is currently set as the target branch for easy diff viewing.

By allowing the training sample sequence length to vary, we can greatly increase the efficiency of training with large training_seq_len upper bounds. In many cases, the mean sequence length of a training sample is far lower than the max sequence length.

In order to maintain torch.compile specialization (which appears to provide a substantial performance boost compared to dynamic=True), I propose rounding the max sequence length in each batch up to a fixed "bucket" interval, such as 1024. Finer-grained buckets perform better over long training runs but trigger much more recompilation in the early stages of training.
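The bucketing idea can be sketched as follows (a minimal illustration, not the PR's actual implementation; function and variable names are hypothetical):

```python
import torch


def round_up_to_bucket(seq_len: int, bucket_granularity: int = 1024) -> int:
    """Round a batch's max sequence length up to the nearest bucket boundary.

    Keeping padded lengths on a coarse grid means torch.compile only ever
    sees a small set of distinct shapes, so shape specialization still pays off.
    """
    return ((seq_len + bucket_granularity - 1) // bucket_granularity) * bucket_granularity


def pad_batch(samples: list[torch.Tensor], bucket_granularity: int = 1024) -> torch.Tensor:
    # Pad every sample up to the bucketed max length (pad id 0 here, as an example).
    max_len = max(s.shape[0] for s in samples)
    target = round_up_to_bucket(max_len, bucket_granularity)
    out = torch.zeros(len(samples), target, dtype=samples[0].dtype)
    for i, s in enumerate(samples):
        out[i, : s.shape[0]] = s
    return out
```

With `bucket_granularity=1024`, a batch whose longest sample is 2500 tokens pads to 3072 rather than the full `training_seq_len` upper bound.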

Usage

This PR adds --bucket_granularity as an optional argument, defaulting to 1024. This yields a significant performance improvement during training when training_seq_len > 2048.

Testing

Validated the performance improvement and accuracy with a small sample training run.

Optimized run with seqlen 4k and bucket granularity 1k:

Step 200 AR: 1.4002
{'loss': 14.1508, 'grad_norm': 4.03125, 'learning_rate': 9.838235294117647e-05, 'epoch': 0.03}
Completed in ~4 mins

Reference run with no bucketing:

Step 200 AR: 1.4094
{'loss': 14.1479, 'grad_norm': 3.5, 'learning_rate': 9.838235294117647e-05, 'epoch': 0.03}         
Completed in ~5 mins                                                

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

  • Is this change backward compatible?: N/A
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: N/A
  • Did you write any new necessary tests?: N/A
  • Did you update N/A

Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
@benchislett benchislett requested review from a team as code owners March 19, 2026 03:30
@benchislett benchislett requested review from AAnoosheh and h-guo18 and removed request for a team March 19, 2026 03:30
@coderabbitai
Contributor

coderabbitai bot commented Mar 19, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

🗂️ Base branches to auto review (3)
  • main
  • release/.*
  • feature/.*

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b47f423a-b55e-49ce-b38a-6c9a7c8497d6

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Comment @coderabbitai help to get the list of available commands and usage tips.

@benchislett benchislett requested review from a team, ChenhanYu and yeyu-nvidia March 19, 2026 03:31
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
@benchislett benchislett changed the title [EAGLE][WIP] Dynamic sequence length for training samples [EAGLE] Dynamic sequence length for training samples Mar 19, 2026