[EAGLE] Dynamic sequence length for training samples#1069
Open
benchislett wants to merge 2 commits into pull-request/1044
Conversation
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
What does this PR do?
Type of change: Optimization
Depends on #1044. Currently has that branch as the target branch for easy diff viewing.
By allowing the training sample sequence length to vary, we can greatly increase the efficiency of training with large `training_seq_len` upper bounds. In many cases, the mean sequence length of a training sample is far lower than the max sequence length.

In order to maintain torch.compile specialization (which appears to provide a significant performance boost compared to `dynamic=True`), I propose rounding the max sequence length in each batch up to a fixed "bucket" interval, such as 1024. Finer-grained buckets perform better for long training runs, but trigger much more compilation in the early stages of training.

Usage
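As a rough sketch of the bucketing approach described above (the function and helper names here are illustrative, not the PR's actual implementation):

```python
# Illustrative sketch of sequence-length bucketing. Rounding each batch's max
# length up to a fixed bucket keeps the set of distinct tensor shapes small,
# so torch.compile only re-specializes once per bucket rather than per length.

def bucketed_length(max_seq_len: int, bucket_granularity: int = 1024) -> int:
    """Round max_seq_len up to the nearest multiple of bucket_granularity."""
    return ((max_seq_len + bucket_granularity - 1) // bucket_granularity) * bucket_granularity


def pad_batch(seqs: list[list[int]], pad_id: int = 0, bucket_granularity: int = 1024) -> list[list[int]]:
    """Pad all sequences in a batch to the bucketed max length (hypothetical helper)."""
    target = bucketed_length(max(len(s) for s in seqs), bucket_granularity)
    return [s + [pad_id] * (target - len(s)) for s in seqs]
```

For example, a batch whose longest sample is 3000 tokens would be padded to 3072 (with `bucket_granularity=1024`) instead of the full `training_seq_len` upper bound, while still hitting a previously compiled shape.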
This PR adds `--bucket_granularity` as an optional argument, defaulting to 1024, which yields a significant performance improvement during training for `training_seq_len` > 2048.

Testing
Validated the performance improvement and accuracy with a small sample training run.
Optimized run with seqlen 4k and bucket granularity 1k:
Reference run with no bucketing:
Before your PR is "Ready for review"

- Make sure you read and follow the Contributor guidelines and your commits are signed (`git commit -s -S`).
- Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded `trust_remote_code=True`, `torch.load(..., weights_only=False)`, `pickle`, etc.).
- CONTRIBUTING.md: N/A