ColorDavid/pods

 
 

Not All Rollouts are Useful: Down-Sampling Rollouts in LLM Reinforcement Learning

General Information

This repository contains the source code for the experiments in the paper "Not All Rollouts are Useful: Down-Sampling Rollouts in LLM Reinforcement Learning" by Yixuan Even Xu*, Yash Savani*, Fei Fang, and Zico Kolter. We implement GRPO-PODS (Policy Optimization with Down-Sampling) and compare its performance with vanilla GRPO. Our implementation is based on Unsloth and OpenR1.
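To make the idea of down-sampling rollouts concrete, here is a minimal sketch, not the code in this repository. It assumes the selection rule is to keep the size-m subset of rollouts whose rewards have the largest variance; this README does not spell out the rule, so treat it as an illustration only. The function name `max_variance_downsample` and its parameters are hypothetical. The sketch relies on the fact that a variance-maximizing subset can always be formed from the k highest and m - k lowest rewards for some k, so only m + 1 candidates need to be checked.

```python
from statistics import pvariance


def max_variance_downsample(rewards, m):
    """Return sorted indices of m rollouts whose rewards have maximal variance.

    Hypothetical helper for illustration; assumes a max-variance selection rule.
    """
    n = len(rewards)
    if not 0 < m <= n:
        raise ValueError("m must satisfy 0 < m <= len(rewards)")

    # Rollout indices ordered from lowest to highest reward.
    order = sorted(range(n), key=lambda i: rewards[i])

    best_idx, best_var = None, -1.0
    for k in range(m + 1):
        # Candidate subset: the (m - k) lowest and the k highest rewards.
        chosen = order[: m - k] + order[n - k:]
        var = pvariance([rewards[i] for i in chosen])
        if var > best_var:
            best_idx, best_var = chosen, var
    return sorted(best_idx)


# Example: generate 8 rollouts, train on only 4 of them.
rewards = [0.0, 1.0, 0.0, 0.0, 1.0, 0.5, 0.0, 1.0]
kept = max_variance_downsample(rewards, m=4)
kept_rewards = [rewards[i] for i in kept]
```

In a GRPO-style update, only the kept rollouts would then be used to compute advantages and the policy gradient, while the remaining rollouts are discarded after generation.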

Reproducing the Experiments

  • To install the dependencies, install uv and run

    uv sync
  • To re-run the single-GPU experiments, edit config/train.yaml and run

    mkdir -p checkpoints
    uv run python3 train.py
  • To evaluate the saved checkpoints of a single-GPU experiment run, edit config/test.yaml and run

    uv run python3 evaluate-run.py
  • To run the multi-GPU experiments, cd into the open-r1 directory, follow the install instructions in that directory's README.md, and then run

    bash exp.sh

    The resulting data can be downloaded from the corresponding wandb runs and plotted with the plotting scripts.

  • To generate the plots in the paper, run

    uv run python3 scripts/plot.py
    uv run python3 scripts/plot-h100s.py

License

This repository's source code is available under the Apache-2.0 License.

Citation

If you use this code in your research, please cite our paper:

@article{xu2025not,
  title={Not All Rollouts are Useful: Down-Sampling Rollouts in LLM Reinforcement Learning},
  author={Xu, Yixuan Even and Savani, Yash and Fang, Fei and Kolter, Zico},
  journal={arXiv preprint arXiv:2504.13818},
  year={2025}
}

Contact

For any questions or issues, please contact us via email:
