[Feat] Support RLDataPacker with several packing strategies #1438
Conversation
@copilot generate the description for this PR
Pull request overview
This PR adds DataBatchPacker functionality to support efficient batch packing for the RLTrainer. The implementation includes three packing strategies (greedy, balance, native) and utilities for workload calculation and partition balancing using the Karmarkar-Karp algorithm.
Changes:
- Added utility functions for workload calculation and balanced partitioning using the Karmarkar-Karp differencing method
- Implemented DataBatchPacker class with three packing strategies: greedy (maximize pack utilization), balance (token-balanced distribution), and native (simple sample-based splitting)
- Added comprehensive test coverage for all three packing strategies with various edge cases
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| xtuner/v1/rl/utils.py | Added workload calculation function and Karmarkar-Karp algorithm for balanced partitioning of sequence lengths across workers |
| xtuner/v1/rl/base/pack.py | Implemented DataBatchPacker class with three strategies for packing data batches across data parallel ranks and optimizer steps |
| tests/ray/test_pack.py | Added unit tests validating all three packing strategies with various input scenarios and padding expectations |
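For context, here is a minimal sketch of the k-way Karmarkar-Karp (largest differencing) method used for balanced partitioning. The function name, signature, and details below are illustrative assumptions, not necessarily those of the helper in xtuner/v1/rl/utils.py.

```python
import heapq

def kk_partition(lengths, k):
    """Illustrative k-way Karmarkar-Karp (largest differencing) partitioning.

    Each heap entry holds k partial subsets plus their sums; the two entries
    with the largest spread (max sum - min sum) are repeatedly merged so that
    heavy subsets get paired with light ones.
    """
    heap = []
    for idx, length in enumerate(lengths):
        sums = [length] + [0] * (k - 1)
        subsets = [[idx]] + [[] for _ in range(k - 1)]
        # Negate the spread so Python's min-heap pops the largest spread first.
        heapq.heappush(heap, (-(max(sums) - min(sums)), idx, sums, subsets))

    while len(heap) > 1:
        _, _, sums1, subsets1 = heapq.heappop(heap)
        _, tie, sums2, subsets2 = heapq.heappop(heap)
        # Pair the heaviest subset of one entry with the lightest of the other.
        parts1 = sorted(zip(sums1, subsets1), key=lambda p: p[0], reverse=True)
        parts2 = sorted(zip(sums2, subsets2), key=lambda p: p[0])
        sums = [s1 + s2 for (s1, _), (s2, _) in zip(parts1, parts2)]
        subsets = [a + b for (_, a), (_, b) in zip(parts1, parts2)]
        heapq.heappush(heap, (-(max(sums) - min(sums)), tie, sums, subsets))

    _, _, sums, subsets = heap[0]
    return subsets, sums

# Example: partition the test sequence lengths across 2 workers.
subsets, sums = kk_partition(
    [1500, 1000, 2800, 3000, 1500, 2000, 2100, 1000, 800], k=2
)
```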
Key Changes
The PR includes three packing strategies: greedy, balance, and native.
Native Strategy: Preserves strict sample order but incurs higher padding overhead. It first splits samples across DP ranks, then divides them into optimizer steps, and finally performs packing and padding.
Balance Strategy: Balances the token load across all GPUs for each mini training step. Like the native strategy, it splits samples across DP ranks and optimizer steps, then performs packing and padding.
Greedy Strategy: Minimizes the number of padding tokens; this matches XTuner's original packing strategy. Unlike the other methods, it packs samples first to fill max_seq_len as tightly as possible (disregarding the original sample order between steps), and the resulting dense packs are then distributed across DP ranks and optimizer steps (a minimal sketch of this idea follows below).
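As a rough illustration of the greedy idea only: the name, signature, and first-fit-decreasing heuristic below are assumptions for the sketch, not the actual DataBatchPacker internals.

```python
def greedy_pack(lengths, max_seq_len):
    """First-fit-decreasing sketch: fill each pack as close to max_seq_len
    as possible. `lengths` are per-sample token counts; samples longer than
    max_seq_len are assumed to have been truncated beforehand."""
    packs = []      # each pack is a list of sample indices
    remaining = []  # free space left in the corresponding pack
    # Placing longer samples first tends to leave less unusable space.
    for idx in sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True):
        for p, free in enumerate(remaining):
            if lengths[idx] <= free:
                packs[p].append(idx)
                remaining[p] -= lengths[idx]
                break
        else:
            packs.append([idx])
            remaining.append(max_seq_len - lengths[idx])
    return packs
```

The resulting dense packs would then be distributed across DP ranks and optimizer steps, trading sample order for minimal padding.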
Examples
Test Context:
test_variable_packs in tests/ray/test_pack.py
Parameters: sequence lengths [1500, 1000, 2800, 3000, 1500, 2000, 2100, 1000, 800]
Strategy 1: Native
Naively split by sample count: only the number of samples per rank is balanced; sequence lengths are ignored (a small worked sketch of this split follows the steps below).
Pre-processing (Padding for Divisibility)
Pad the sample count so it is divisible by the DP size (a padding item of length 1024 is appended).
Split by DP Rank
Split by Optimizer Steps
Pack & Pad (Independent per Step)
Cross-Rank Alignment (Final Result)
Align the number of packs each rank holds within the same step (pad with empty packs where a rank falls short).
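A small worked sketch of the native split, using assumed values for dp_size and the padding length (the test's actual parameters may differ):

```python
lengths = [1500, 1000, 2800, 3000, 1500, 2000, 2100, 1000, 800]
dp_size, pad_len = 2, 1024  # assumed for illustration

# Pre-processing: pad the sample count so it divides evenly across DP ranks.
while len(lengths) % dp_size != 0:
    lengths.append(pad_len)

# Naive contiguous split by sample count, preserving the original order.
per_rank = len(lengths) // dp_size
rank_samples = [lengths[r * per_rank:(r + 1) * per_rank] for r in range(dp_size)]
# rank_samples[0] -> [1500, 1000, 2800, 3000, 1500]
# rank_samples[1] -> [2000, 2100, 1000, 800, 1024]
```

Because lengths are ignored, rank 0 ends up with noticeably more tokens than rank 1 here (9800 vs. 6924), which is the imbalance the balance and greedy strategies address.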
Strategy 2: Balance (Sorted Split)
Sort first, then split: long and short samples are distributed evenly across GPUs to reduce padding waste (see the sketch after this strategy's steps).
Global Sort
Split by DP Rank & Step (Interleaved)
Samples of similar length are dispatched to different GPUs to balance the load.
Pack & Pad
Cross-Rank Alignment
(Skipped as packs are already balanced in this case)
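One simple way to picture the sorted, interleaved split, with an assumed dp_size; the actual implementation may instead rely on the Karmarkar-Karp partitioning from xtuner/v1/rl/utils.py:

```python
lengths = [1500, 1000, 2800, 3000, 1500, 2000, 2100, 1000, 800]
dp_size = 2  # assumed for illustration

# Global sort by length (longest first), then deal the samples out
# round-robin so that samples of similar length land on different ranks.
order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
rank_samples = [[] for _ in range(dp_size)]
for pos, idx in enumerate(order):
    rank_samples[pos % dp_size].append(lengths[idx])
# rank_samples[0] -> [3000, 2100, 1500, 1000, 800]  (8400 tokens)
# rank_samples[1] -> [2800, 2000, 1500, 1000]       (7300 tokens)
```

The per-rank token totals are much closer here (8400 vs. 7300) than with the native split, which is what keeps per-step padding low.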
Strategy 3: Greedy (Global Packing First)
Perform global packing first, then split the packs across GPUs to maximize pack utilization.
Global Packing
Greedily pack all samples into packs (a worked example with an assumed max_seq_len is shown after the steps below):
Split by DP Rank
Split by Optimizer Steps
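Reusing the illustrative greedy_pack sketch from the Key Changes section, with an assumed max_seq_len of 4096 (not necessarily the test's value), the global packing step would produce:

```python
lengths = [1500, 1000, 2800, 3000, 1500, 2000, 2100, 1000, 800]
packs = greedy_pack(lengths, max_seq_len=4096)
# -> [[3, 1], [2, 7], [6, 0], [5, 4], [8]] as sample indices, i.e. packs of
#    4000, 3800, 3600, 3500 and 800 tokens: densely filled, minimal padding.
```

These dense packs are then split across DP ranks and optimizer steps, so the original sample order between steps is not preserved.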