Not All Rollouts are Useful: Down-Sampling Rollouts in LLM Reinforcement Learning

General Information

This repository contains the source code for the experiments in the paper "Not All Rollouts are Useful: Down-Sampling Rollouts in LLM Reinforcement Learning" by Yixuan Even Xu*, Yash Savani*, Fei Fang, Zico Kolter. We implement GRPO-PODS (Policy Optimization with Down-Sampling) and compare its performance with vanilla GRPO. Our implementation is based on Unsloth and OpenR1.
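To give a feel for what down-sampling rollouts means, here is a minimal, self-contained sketch. It is not the code in this repository. It assumes one particular selection rule, keeping the `m` rollouts whose rewards have the largest variance, and it relies on the fact that such a subset can always be formed from the `k` lowest and `m - k` highest rewards for some `k`. The function names `max_variance_downsample` and `group_normalized_advantages`, and the `eps` parameter, are hypothetical and chosen for illustration only.

```python
from itertools import accumulate
from statistics import fmean, pstdev


def max_variance_downsample(rewards, m):
    """Return sorted indices of m rollouts whose rewards have maximal variance.

    Illustrative sketch only: tries every split of "k lowest + (m - k) highest"
    rewards and keeps the split with the largest population variance.
    """
    n = len(rewards)
    if not 1 <= m <= n:
        raise ValueError("m must satisfy 1 <= m <= len(rewards)")

    # Indices of rollouts ordered by reward, lowest first.
    order = sorted(range(n), key=lambda i: rewards[i])
    sorted_r = [rewards[i] for i in order]

    # Prefix sums of r and r^2 let us score each split in O(1).
    ps = [0.0] + list(accumulate(sorted_r))
    ps2 = [0.0] + list(accumulate(r * r for r in sorted_r))

    best_k, best_var = 0, -1.0
    for k in range(m + 1):
        high = m - k  # number of rollouts taken from the top
        s = ps[k] + (ps[n] - ps[n - high])
        s2 = ps2[k] + (ps2[n] - ps2[n - high])
        var = s2 / m - (s / m) ** 2
        if var > best_var:
            best_k, best_var = k, var

    chosen = order[:best_k] + order[n - (m - best_k):]
    return sorted(chosen)


def group_normalized_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: rewards standardized within one group."""
    mean = fmean(rewards)
    std = pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Made-up rewards for six rollouts of one prompt.
    rewards = [0.0, 1.0, 0.5, 0.5, 1.0, 0.0]
    keep = max_variance_downsample(rewards, m=2)
    kept_rewards = [rewards[i] for i in keep]
    print("kept rollouts:", keep)
    print("advantages on kept rollouts:", group_normalized_advantages(kept_rewards))
```

In this sketch the policy update would only use the kept rollouts, so the cost of the update step scales with `m` rather than with the number of generated rollouts.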

Reproducing the Experiments

To install the relevant dependencies, install uv and enter
```
uv sync
```
To re-run the single-GPU experiments, edit config/train.yaml, and enter
```
mkdir -p checkpoints
uv run python3 train.py
```
To evaluate the saved checkpoints of a single-GPU experiment run, edit config/test.yaml, and enter
```
uv run python3 evaluate-run.py
```
To run the multi-GPU experiments, cd into the open-r1 directory, follow the install instructions in the README.md file within the directory, and then run the following script.
```
bash exp.sh
```
The data can be collected and downloaded from the corresponding wandb runs and plotted using the plotting scripts.

To generate the plots in the paper, enter
```
uv run python3 scripts/plot.py
uv run python3 scripts/plot-h100s.py
```

License

This repository's source code is available under the Apache-2.0 License.

Citation

If you use this code in your research, please cite our paper:

```
@article{xu2025not,
  title={Not All Rollouts are Useful: Down-Sampling Rollouts in LLM Reinforcement Learning},
  author={Xu, Yixuan Even and Savani, Yash and Fang, Fei and Kolter, Zico},
  journal={arXiv preprint arXiv:2504.13818},
  year={2025}
}
```

Contact

For any questions or issues, please contact us via email:

Yixuan Even Xu: yixuanx@cs.cmu.edu
Yash Savani: ysavani@cs.cmu.edu
