
Conversation

@YanhuiDua (Collaborator) commented on Jan 22, 2026:

Key Changes

The PR includes three packing strategies: greedy, balance, and native.

  • Native Strategy: It preserves strict order but results in higher padding overhead. It first splits samples across DP ranks, then divides them into optimizer steps, and finally performs packing and padding.

  • Balance Strategy: It balances the token load across all GPUs within each mini training step. Like the native strategy, it splits samples across DP ranks and optimizer steps, and then performs packing and padding.

  • Greedy Strategy: It minimizes the number of padding tokens and matches XTuner's original packing strategy. Unlike the other methods, it packs samples first, filling max_seq_len as tightly as possible (disregarding the original sample order between steps); these dense packs are then distributed across DP ranks and optimizer steps.

Examples

Test Context: test_variable_packs in tests/ray/test_pack.py
Parameters:

  • Input Seqlens: [1500, 1000, 2800, 3000, 1500, 2000, 2100, 1000, 800]
  • DP Size: 2
  • Optimizer Steps: 2
  • Max Pack Length: 3072 (Target)
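
For reference, a minimal Python sketch of these scenario values plus a padding-overhead helper; the helper is illustrative only and not part of the xtuner API:

    # Scenario values from the parameter list above.
    SEQLENS = [1500, 1000, 2800, 3000, 1500, 2000, 2100, 1000, 800]
    DP_SIZE = 2
    OPTIMIZER_STEPS = 2
    MAX_PACK_LEN = 3072


    def padding_overhead(pack_lens, max_pack_len):
        """Total padding tokens when every pack is padded to max_pack_len."""
        return sum(max_pack_len - length for length in pack_lens)


    # Example: the native result for Rank 0 / Step 0 below pads 2500 and 2800
    # to 3072, costing 572 + 272 = 844 padding tokens.
    assert padding_overhead([2500, 2800], MAX_PACK_LEN) == 844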

Strategy 1: Native

Naive split by sample count: only the number of samples per rank is balanced; sequence lengths are not considered.

  1. Pre-processing (Padding for Divisibility)
    Pad the number of samples so it is divisible by the DP size (a padding item of length 1024 is appended).

    Input:  [1500, 1000, 2800, 3000, 1500, 2000, 2100, 1000, 800]
    Output: [1500, 1000, 2800, 3000, 1500, 2000, 2100, 1000, 800, 1024]
    
  2. Split by DP Rank

    Rank 0: [1500, 1000, 2800, 3000, 1500]
    Rank 1: [2000, 2100, 1000,  800, 1024]
    
  3. Split by Optimizer Steps

    Rank 0: Step 0: [1500, 1000, 2800] | Step 1: [3000, 1500]
    Rank 1: Step 0: [2000, 2100, 1000] | Step 1: [ 800, 1024]
    
  4. Pack & Pad (Independent per Step)

    Rank 0:
      Step 0: [2500 -> 3072], [2800 -> 3072]
      Step 1: [3000 -> 3072], [1500 -> 3072]
    Rank 1:
      Step 0: [2000 -> 3072], [2100 -> 3072], [1000 -> 3072]
      Step 1: [1824 -> 3072]
    
  5. Cross-Rank Alignment (Final Result)
    Align the number of packs across ranks within the same step (pad with empty packs where needed).

    Rank 0:
      Step 0: [2500], [2800], [0 (Pad)]  <-- Aligned to max 3 packs
      Step 1: [3000], [1500]
    Rank 1:
      Step 0: [2100], [2000], [1000]
      Step 1: [1824], [0 (Pad)]          <-- Aligned to max 2 packs
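
A minimal sketch of this native flow (contiguous splits, in-order packing), consistent with the walk-through above; the helper name and signature are illustrative, not the actual xtuner implementation:

    import math


    def native_pack(seqlens, dp_size, steps, max_len, pad_item=1024):
        # 1. Pad the sample count so it divides evenly across DP ranks.
        seqlens = list(seqlens)
        while len(seqlens) % dp_size:
            seqlens.append(pad_item)
        # 2. Contiguous split across DP ranks.
        per_rank = len(seqlens) // dp_size
        ranks = [seqlens[r * per_rank:(r + 1) * per_rank] for r in range(dp_size)]
        result = []
        for rank_samples in ranks:
            # 3. Contiguous (front-loaded) split across optimizer steps.
            chunk = math.ceil(len(rank_samples) / steps)
            step_splits = [rank_samples[s * chunk:(s + 1) * chunk] for s in range(steps)]
            # 4. Pack each step's samples in order, up to max_len per pack.
            rank_packs = []
            for step_samples in step_splits:
                packs, cur = [], 0
                for length in step_samples:
                    if cur and cur + length > max_len:
                        packs.append(cur)
                        cur = 0
                    cur += length
                if cur:
                    packs.append(cur)
                rank_packs.append(packs)
            result.append(rank_packs)
        return result


    # native_pack([1500, 1000, 2800, 3000, 1500, 2000, 2100, 1000, 800], 2, 2, 3072)
    # -> [[[2500, 2800], [3000, 1500]], [[2000, 2100, 1000], [1824]]]
    # Cross-rank alignment with empty packs (step 5) is applied afterwards.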
    

Strategy 2: Balance (Sorted Split)

Sort first, then split: long and short samples are distributed evenly across the ranks to reduce padding waste.

  1. Global Sort

    Original: [1500, 1000, 2800, 3000, 1500, 2000, 2100, 1000, 800]
    Sorted:   [3000, 2800, 2100, 2000, 1500, 1500, 1000, 1000, 800]
    
  2. Split by DP Rank & Step (Interleaved)
    Samples of similar length are distributed to different ranks to balance the load.

    Rank 0: Step 0: [3000, 1500, 800], Step 1: [2100, 1000]
    Rank 1: Step 0: [2800, 1500],  Step 1: [2000, 1000]
    
  3. Pack & Pad

    Rank 0:
      Step 0: [3000 -> 3072], [2300 -> 3072]
      Step 1: [2100 -> 3072], [1000 -> 3072]
    Rank 1:
      Step 0: [2800 -> 3072], [1500 -> 3072]
      Step 1: [3000 -> 3072], [0 (Pad)]
    
  4. Cross-Rank Alignment
    (Skipped as packs are already balanced in this case)
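
A sketch consistent with this walk-through (sort descending, round-robin across DP ranks, then round-robin across optimizer steps); it mirrors the example above rather than the exact xtuner implementation:

    def balance_split(seqlens, dp_size, steps):
        ordered = sorted(seqlens, reverse=True)
        # Interleave samples across DP ranks so each rank sees a mix of lengths.
        per_rank = [ordered[r::dp_size] for r in range(dp_size)]
        # Interleave again across optimizer steps within each rank.
        return [[rank[s::steps] for s in range(steps)] for rank in per_rank]


    # balance_split([1500, 1000, 2800, 3000, 1500, 2000, 2100, 1000, 800], 2, 2)
    # -> [[[3000, 1500, 800], [2100, 1000]], [[2800, 1500], [2000, 1000]]]
    # Each bucket is then packed and padded to 3072 as in step 3 above.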

Strategy 3: Greedy (Global Packing First)

Pack globally first, then split across the ranks, maximizing the fill rate.

  1. Global Packing
    Greedily pack all samples into packs:

    Pack 1: [1500, 1000] -> 2500
    Pack 2: [2800]       -> 2800
    Pack 3: [3000]       -> 3000
    Pack 4: [1500]       -> 1500
    Pack 5: [2000]       -> 2000
    Pack 6: [2100]       -> 2100
    Pack 7: [1000, 800]  -> 1800
    Pack 8: []           -> 0 (Padding Pack)
    
  2. Split by DP Rank

    Rank 0: [Pack 1, Pack 2, Pack 3, Pack 4]
    Rank 1: [Pack 5, Pack 6, Pack 7, Pack 8]
    
  3. Split by Optimizer Steps

    Rank 0: Step 0: [Pack 1, Pack 2] | Step 1: [Pack 3, Pack 4]
    Rank 1: Step 0: [Pack 5, Pack 6] | Step 1: [Pack 7, Pack 8]
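
A minimal sketch of this greedy flow (global in-order packing, pad the pack count, then contiguous splits); illustrative only, not the actual xtuner code:

    def greedy_pack(seqlens, dp_size, steps, max_len):
        # 1. Global greedy packing: fill each pack up to max_len in input order.
        packs, cur = [], 0
        for length in seqlens:
            if cur and cur + length > max_len:
                packs.append(cur)
                cur = 0
            cur += length
        if cur:
            packs.append(cur)
        # 2. Pad with empty packs so the count divides by dp_size * steps.
        while len(packs) % (dp_size * steps):
            packs.append(0)
        # 3. Contiguous split across DP ranks, then across optimizer steps.
        per_rank = len(packs) // dp_size
        per_step = per_rank // steps
        return [[packs[r * per_rank + s * per_step:
                       r * per_rank + (s + 1) * per_step]
                 for s in range(steps)]
                for r in range(dp_size)]


    # greedy_pack([1500, 1000, 2800, 3000, 1500, 2000, 2100, 1000, 800], 2, 2, 3072)
    # -> [[[2500, 2800], [3000, 1500]], [[2000, 2100], [1800, 0]]]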
    

@YanhuiDua (Collaborator, Author) commented:

@copilot generate the description for this PR

Copilot AI (Contributor) left a comment:

Pull request overview

This PR adds DataBatchPacker functionality to support efficient batch packing for the RLTrainer. The implementation includes three packing strategies (greedy, balance, native) and utilities for workload calculation and partition balancing using the Karmarkar-Karp algorithm.

Changes:

  • Added utility functions for workload calculation and balanced partitioning using the Karmarkar-Karp differencing method
  • Implemented DataBatchPacker class with three packing strategies: greedy (maximize pack utilization), balance (token-balanced distribution), and native (simple sample-based splitting)
  • Added comprehensive test coverage for all three packing strategies with various edge cases
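
For reference, an illustrative 2-way Karmarkar-Karp (largest differencing) sketch; the actual utility in xtuner/v1/rl/utils.py may partition into more than two groups and expose a different interface:

    import heapq


    def kk_two_way(seqlens):
        """Split lengths into two groups whose token sums are close."""
        # Max-heap (via negated values) of sub-partitions: (neg_diff, side_a, side_b).
        heap = [(-length, [length], []) for length in seqlens]
        heapq.heapify(heap)
        while len(heap) > 1:
            d1, a1, b1 = heapq.heappop(heap)  # largest remaining difference
            d2, a2, b2 = heapq.heappop(heap)  # second largest
            # Merge so the two differences cancel: the heavy side of one partition
            # joins the light side of the other; new (negated) difference is d1 - d2.
            heapq.heappush(heap, (d1 - d2, a1 + b2, b1 + a2))
        _, group_a, group_b = heap[0]
        return group_a, group_b


    # kk_two_way([1500, 1000, 2800, 3000, 1500, 2000, 2100, 1000, 800])
    # -> two groups with token sums 8100 and 7600 (difference 500).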

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 10 comments.

File descriptions:

  • xtuner/v1/rl/utils.py: Added a workload calculation function and the Karmarkar-Karp algorithm for balanced partitioning of sequence lengths across workers
  • xtuner/v1/rl/base/pack.py: Implemented the DataBatchPacker class with three strategies for packing data batches across data parallel ranks and optimizer steps
  • tests/ray/test_pack.py: Added unit tests validating all three packing strategies with various input scenarios and padding expectations


@YanhuiDua YanhuiDua force-pushed the add_datapacker branch 2 times, most recently from 4ef4b9d to 60b8ec2 Compare January 22, 2026 09:48
@YanhuiDua YanhuiDua changed the title [Feat] Support DataBatchPacker for RLTrainer [Feat] Support RLDistributedDataPacker for RLTrainer Jan 22, 2026
@YanhuiDua YanhuiDua changed the title [Feat] Support RLDistributedDataPacker for RLTrainer [Feat] Support DataPacker for RL Jan 26, 2026
@YanhuiDua YanhuiDua changed the title [Feat] Support DataPacker for RL [Feat] Support RLDataPacker with "native, balance, greedy" packing strategy Jan 26, 2026
@YanhuiDua YanhuiDua changed the title [Feat] Support RLDataPacker with "native, balance, greedy" packing strategy [Feat] Support RLDataPacker with several packing strategy Jan 26, 2026
@YanhuiDua YanhuiDua force-pushed the add_datapacker branch 2 times, most recently from e4600fb to 3e71db0 Compare January 27, 2026 04:03