Bugfix: batch_size_warmup_scheduler was taking too long #205
Open
onurgu wants to merge 1 commit into AnswerDotAI:main from
I have a question regarding your statement that using the 'sum(range(x, y))' idiom to sum values in a range is inefficient for large y – to the point of being impractical when y is around 50B, for example. My understanding is that x and y are derived from batch size variables and are not related to the number of tokens. Could you clarify why you consider a scenario where y equals 50B?
BatchSizeWarmupScheduler was taking too long, or was impossible to run, for real-world max_batch_size values
When trying to use the training script like the following:
the script produced no output for a very long time, so I started reading the code. I saw that it used the sum(range(x, y)) idiom to sum the values over a range. This is inefficient for large y, and effectively impossible when y is around 50B.
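To illustrate the cost difference, here is a minimal sketch of a constant-time replacement for sum(range(x, y)). The helper name sum_range is hypothetical and is not taken from the repository; it only shows that the arithmetic series formula gives the same result without iterating.

```python
def sum_range(x: int, y: int) -> int:
    """Closed-form equivalent of sum(range(x, y)) for integers.

    Hypothetical helper for illustration; not part of the repository.
    """
    n = y - x  # number of terms in range(x, y)
    if n <= 0:
        return 0  # empty range, same as sum(range(x, y))
    first, last = x, y - 1
    # Arithmetic series: n * (first + last) / 2. The product is always even,
    # so integer division is exact.
    return n * (first + last) // 2


# Agrees with the iterative idiom on a small range.
print(sum_range(3, 10), sum(range(3, 10)))

# Returns immediately even for a very large upper bound, where
# sum(range(0, 50_000_000_000)) would iterate 50 billion times.
print(sum_range(0, 50_000_000_000))
```

The iterative idiom does work proportional to y - x, while the closed form does a fixed number of integer operations regardless of the range size.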
Changes
Simplify BatchSizeWarmupScheduler Implementation
Summary
This PR simplifies the batch size warmup scheduling logic by replacing the step-based threshold calculation with a more straightforward token-based approach. The new implementation provides a more intuitive and mathematically precise way to handle batch size warmup during training.
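As a rough illustration of what a token-based warmup can look like, the sketch below maps the number of tokens seen so far directly to a batch size. It assumes a simple linear ramp; the function name batch_size_at and its parameters are hypothetical, and this is not the implementation in this PR.

```python
def batch_size_at(token_count: int, min_batch_size: int,
                  max_batch_size: int, warmup_tokens: int) -> int:
    """Batch size after `token_count` tokens, assuming a linear ramp.

    Hypothetical sketch for illustration; the PR's actual schedule may differ.
    """
    if warmup_tokens <= 0 or token_count >= warmup_tokens:
        return max_batch_size  # warmup finished (or disabled)
    # Fraction of the warmup budget consumed so far, clamped at zero.
    fraction = max(token_count, 0) / warmup_tokens
    return min_batch_size + int(fraction * (max_batch_size - min_batch_size))


# The lookup costs the same no matter how large the token budget is.
for tokens in (0, 25_000_000_000, 50_000_000_000):
    print(tokens, batch_size_at(tokens, 8, 64, 50_000_000_000))
```

Because the batch size is computed from the token count on demand, nothing has to be precomputed over the whole warmup range.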
Changes
_calculate_step_thresholds() replaced with _calculate_tokens_per_batch_size()
Tracking changed from steps to tokens (current_step → current_token_count)
Technical Details
The new implementation:
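Uses the arithmetic series sum formula (n(a₁ + aₙ))/2, where n is the number of terms, a₁ the first term, and aₙ the last term
Benefits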
Discussions
If any, please include references to the relevant issues/previous PR/discord discussions around these changes.
Tests