[Feat] Support RLDataPacker with several packing strategies #1438
Conversation
@copilot generate the description for this PR
Pull request overview
This PR adds DataBatchPacker functionality to support efficient batch packing for the RLTrainer. The implementation includes three packing strategies (greedy, balance, native) and utilities for workload calculation and partition balancing using the Karmarkar-Karp algorithm.
Changes:
- Added utility functions for workload calculation and balanced partitioning using the Karmarkar-Karp differencing method
- Implemented DataBatchPacker class with three packing strategies: greedy (maximize pack utilization), balance (token-balanced distribution), and native (simple sample-based splitting)
- Added comprehensive test coverage for all three packing strategies with various edge cases
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| xtuner/v1/rl/utils.py | Added workload calculation function and Karmarkar-Karp algorithm for balanced partitioning of sequence lengths across workers |
| xtuner/v1/rl/base/pack.py | Implemented DataBatchPacker class with three strategies for packing data batches across data parallel ranks and optimizer steps |
| tests/ray/test_pack.py | Added unit tests validating all three packing strategies with various input scenarios and padding expectations |
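For context, here is a minimal sketch of the k-way Karmarkar-Karp (largest differencing) method used for balanced partitioning. The function name, signature, and details below are illustrative assumptions, not necessarily those of the helper in xtuner/v1/rl/utils.py.

```python
import heapq

def kk_partition(lengths, k):
    """Illustrative k-way Karmarkar-Karp (largest differencing) partitioning.

    Each heap entry holds k partial subsets plus their sums; the two entries
    with the largest spread (max sum - min sum) are repeatedly merged so that
    heavy subsets get paired with light ones.
    """
    heap = []
    for idx, length in enumerate(lengths):
        sums = [length] + [0] * (k - 1)
        subsets = [[idx]] + [[] for _ in range(k - 1)]
        # Negate the spread so Python's min-heap pops the largest spread first.
        heapq.heappush(heap, (-(max(sums) - min(sums)), idx, sums, subsets))

    while len(heap) > 1:
        _, _, sums1, subsets1 = heapq.heappop(heap)
        _, tie, sums2, subsets2 = heapq.heappop(heap)
        # Pair the heaviest subset of one entry with the lightest of the other.
        parts1 = sorted(zip(sums1, subsets1), key=lambda p: p[0], reverse=True)
        parts2 = sorted(zip(sums2, subsets2), key=lambda p: p[0])
        sums = [s1 + s2 for (s1, _), (s2, _) in zip(parts1, parts2)]
        subsets = [a + b for (_, a), (_, b) in zip(parts1, parts2)]
        heapq.heappush(heap, (-(max(sums) - min(sums)), tie, sums, subsets))

    _, _, sums, subsets = heap[0]
    return subsets, sums

# Example: partition the test sequence lengths across 2 workers.
subsets, sums = kk_partition(
    [1500, 1000, 2800, 3000, 1500, 2000, 2100, 1000, 800], k=2
)
```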
Key Changes
The PR includes three packing strategies: greedy, balance, and native.
Native Strategy: Preserves strict sample order but incurs higher padding overhead. It first splits samples across DP ranks, then divides them into optimizer steps, and finally performs packing and padding.
Balance Strategy: Balances the token load across all GPUs for each mini training step. Like the native strategy, it splits samples across DP ranks and optimizer steps, then performs packing and padding.
Greedy Strategy: Minimizes the number of padding tokens; this matches XTuner's original packing strategy. Unlike the other methods, it packs samples first to fill max_seq_len as tightly as possible (disregarding the original sample order between steps), and the resulting dense packs are then distributed across DP ranks and optimizer steps (a minimal sketch of this idea follows below).
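As a rough illustration of the greedy idea only: the name, signature, and first-fit-decreasing heuristic below are assumptions for the sketch, not the actual DataBatchPacker internals.

```python
def greedy_pack(lengths, max_seq_len):
    """First-fit-decreasing sketch: fill each pack as close to max_seq_len
    as possible. `lengths` are per-sample token counts; samples longer than
    max_seq_len are assumed to have been truncated beforehand."""
    packs = []      # each pack is a list of sample indices
    remaining = []  # free space left in the corresponding pack
    # Placing longer samples first tends to leave less unusable space.
    for idx in sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True):
        for p, free in enumerate(remaining):
            if lengths[idx] <= free:
                packs[p].append(idx)
                remaining[p] -= lengths[idx]
                break
        else:
            packs.append([idx])
            remaining.append(max_seq_len - lengths[idx])
    return packs
```

The resulting dense packs would then be distributed across DP ranks and optimizer steps, trading sample order for minimal padding.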
Examples
Test Context:
test_variable_packs in tests/ray/test_pack.py
Parameters: sequence lengths [1500, 1000, 2800, 3000, 1500, 2000, 2100, 1000, 800]
Strategy 1: Native
Naively split by sample count: only the number of samples per rank is balanced; sequence lengths are ignored (a small worked sketch of this split follows the steps below).
Pre-processing (Padding for Divisibility)
Pad the sample count so it is divisible by the DP size (a padding item of length 1024 is appended).
Split by DP Rank
Split by Optimizer Steps
Pack & Pad (Independent per Step)
Cross-Rank Alignment (Final Result)
Align the number of packs each rank holds within the same step (pad with empty packs where a rank falls short).
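A small worked sketch of the native split, using assumed values for dp_size and the padding length (the test's actual parameters may differ):

```python
lengths = [1500, 1000, 2800, 3000, 1500, 2000, 2100, 1000, 800]
dp_size, pad_len = 2, 1024  # assumed for illustration

# Pre-processing: pad the sample count so it divides evenly across DP ranks.
while len(lengths) % dp_size != 0:
    lengths.append(pad_len)

# Naive contiguous split by sample count, preserving the original order.
per_rank = len(lengths) // dp_size
rank_samples = [lengths[r * per_rank:(r + 1) * per_rank] for r in range(dp_size)]
# rank_samples[0] -> [1500, 1000, 2800, 3000, 1500]
# rank_samples[1] -> [2000, 2100, 1000, 800, 1024]
```

Because lengths are ignored, rank 0 ends up with noticeably more tokens than rank 1 here (9800 vs. 6924), which is the imbalance the balance and greedy strategies address.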
Strategy 2: Balance (Sorted Split)
Sort first, then split: long and short samples are distributed evenly across GPUs to reduce padding waste (see the sketch after this strategy's steps).
Global Sort
Split by DP Rank & Step (Interleaved)
Samples of similar length are dispatched to different GPUs to balance the load.
Pack & Pad
Cross-Rank Alignment
(Skipped as packs are already balanced in this case)
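One simple way to picture the sorted, interleaved split, with an assumed dp_size; the actual implementation may instead rely on the Karmarkar-Karp partitioning from xtuner/v1/rl/utils.py:

```python
lengths = [1500, 1000, 2800, 3000, 1500, 2000, 2100, 1000, 800]
dp_size = 2  # assumed for illustration

# Global sort by length (longest first), then deal the samples out
# round-robin so that samples of similar length land on different ranks.
order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
rank_samples = [[] for _ in range(dp_size)]
for pos, idx in enumerate(order):
    rank_samples[pos % dp_size].append(lengths[idx])
# rank_samples[0] -> [3000, 2100, 1500, 1000, 800]  (8400 tokens)
# rank_samples[1] -> [2800, 2000, 1500, 1000]       (7300 tokens)
```

The per-rank token totals are much closer here (8400 vs. 7300) than with the native split, which is what keeps per-step padding low.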
Strategy 3: Greedy (Global Packing First)
Perform global packing first, then split the packs across GPUs to maximize pack utilization.
Global Packing
Greedily pack all samples into packs (a worked example with an assumed max_seq_len is shown after the steps below):
Split by DP Rank
Split by Optimizer Steps
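Reusing the illustrative greedy_pack sketch from the Key Changes section, with an assumed max_seq_len of 4096 (not necessarily the test's value), the global packing step would produce:

```python
lengths = [1500, 1000, 2800, 3000, 1500, 2000, 2100, 1000, 800]
packs = greedy_pack(lengths, max_seq_len=4096)
# -> [[3, 1], [2, 7], [6, 0], [5, 4], [8]] as sample indices, i.e. packs of
#    4000, 3800, 3600, 3500 and 800 tokens: densely filled, minimal padding.
```

These dense packs are then split across DP ranks and optimizer steps, so the original sample order between steps is not preserved.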