
build: update trl requirement from <=0.21.0 to <=1.1.0#625

Closed
dependabot[bot] wants to merge 1 commit into main from dependabot/pip/trl-lte-1.1.0

Conversation


dependabot[bot] (Contributor) commented on behalf of github on Apr 13, 2026

Updates the requirements on trl to permit the latest version.

Release notes

Sourced from trl's releases.

v1.1.0

Features

DistillationTrainer for efficient on-policy distillation

Read the blog post: https://huggingface.co/spaces/HuggingFaceTB/trl-distillation-trainer

[Figure: off-policy vs. on-policy distillation]

The new DistillationTrainer implements on-policy knowledge distillation as described in On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes. It extends the ideas from the GKDTrainer with three key optimizations: a generation buffer that decouples the training microbatch size from the generation batch size (up to 40x speedup), external teacher server support so the teacher doesn't need to fit on the training GPUs, and binary-encoded logprob payloads that cut transfer size by ~5x.

from datasets import load_dataset
from trl.experimental.distillation import DistillationConfig, DistillationTrainer

dataset = load_dataset("openai/gsm8k", "main", split="train")
dataset = dataset.map(
    lambda x: {"messages": [{"role": "user", "content": x["question"]}]},
    remove_columns=dataset.column_names,
)

trainer = DistillationTrainer(
    model="Qwen/Qwen2.5-1.5B-Instruct",
    teacher_model="Qwen/Qwen2.5-7B-Instruct",
    args=DistillationConfig(
        output_dir="results/distill-qwen-gsm8k",
        lmbda=1.0,  # fully on-policy (student generates)
        beta=1.0,   # reverse KL
        teacher_model_init_kwargs={"torch_dtype": "bfloat16"},
    ),
    train_dataset=dataset,
)
trainer.train()

by @cmpatino in huggingface/trl#5407, huggingface/trl#5500, and huggingface/trl#5501
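The binary-encoded logprob payload idea mentioned above can be illustrated with a small sketch. This is not TRL's actual wire format: the half-precision packing below is an assumption chosen to show why a binary encoding is several times smaller than sending log-probs as JSON text.

```python
import json
import struct

def encode_logprobs_json(logprobs):
    # Text encoding: each float is serialized as a decimal string plus separators.
    return json.dumps(logprobs).encode("utf-8")

def encode_logprobs_binary(logprobs):
    # Binary encoding: pack each log-prob as a little-endian float16 (2 bytes each).
    return struct.pack(f"<{len(logprobs)}e", *logprobs)

def decode_logprobs_binary(payload):
    n = len(payload) // 2
    return list(struct.unpack(f"<{n}e", payload))

logprobs = [-0.0123456789 * (i % 17 + 1) for i in range(1024)]
text = encode_logprobs_json(logprobs)
binary = encode_logprobs_binary(logprobs)
# The binary payload is a fixed 2 bytes per value, several times smaller
# than the JSON text (which averages well over 10 bytes per value).
print(len(text) / len(binary))
```

Half precision costs some accuracy (about three decimal digits), which is generally acceptable for distillation targets; a real implementation might also batch or compress payloads further.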

Chunked LM head for memory-efficient log-prob computation in AsyncGRPOTrainer

AsyncGRPOTrainer now supports a chunked LM-head path that computes per-token log-probs and entropy via online logsumexp without materializing the full [N, V] logits tensor. Combined with completion_mask filtering to skip prompt tokens, this yields large memory savings on long sequences (up to 44x lower peak-allocated memory on an 8192-token sequence):

chunk_lm_head_size   Peak alloc (GB)   Reduction   Wall time (ms)
None (baseline)      18.55             1.00x       808.7
4096                 0.42              44.32x      459.0
8192                 0.76              24.34x      393.0
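The mechanism behind these numbers can be sketched in a few lines. The NumPy function below is an illustration of the technique, not TRL's implementation: it sweeps the LM head over vocabulary chunks, keeping only one [N, chunk] logits slab alive at a time while a running (online) logsumexp accumulates the normalizer for every token.

```python
import numpy as np

def chunked_token_logprobs(hidden, lm_head, token_ids, chunk_size=4096):
    """Per-token log-probs without materializing the full [N, V] logits.

    hidden:    [N, D] final hidden states
    lm_head:   [D, V] LM-head weight matrix
    token_ids: [N] target token ids
    """
    n = hidden.shape[0]
    v = lm_head.shape[1]
    running_max = np.full(n, -np.inf)
    running_sum = np.zeros(n)      # sum of exp(logit - running_max) so far
    target_logit = np.zeros(n)
    for start in range(0, v, chunk_size):
        # Only this [N, chunk] slice of the logits exists at any moment.
        logits = hidden @ lm_head[:, start:start + chunk_size]
        # Online logsumexp update: rescale the running sum to the new max.
        chunk_max = logits.max(axis=1)
        new_max = np.maximum(running_max, chunk_max)
        running_sum = (running_sum * np.exp(running_max - new_max)
                       + np.exp(logits - new_max[:, None]).sum(axis=1))
        running_max = new_max
        # Grab each target token's logit if it falls inside this chunk.
        in_chunk = (token_ids >= start) & (token_ids < start + chunk_size)
        target_logit[in_chunk] = logits[in_chunk, token_ids[in_chunk] - start]
    # log p(token) = logit(token) - logsumexp over the full vocabulary
    return target_logit - (running_max + np.log(running_sum))
```

Peak memory now scales with chunk_size instead of the vocabulary size V, which is where the reported 24-44x reductions come from on large-vocabulary models.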

Enable it via the new chunk_lm_head_size option in AsyncGRPOConfig:
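The example that followed here was truncated in the release-notes excerpt. Usage presumably looked something like the sketch below; the import path and surrounding arguments are assumptions, and only the chunk_lm_head_size option on AsyncGRPOConfig is confirmed by the text above.

```python
# Import path is an assumption; check the trl docs for the actual location.
from trl.experimental.async_grpo import AsyncGRPOConfig

config = AsyncGRPOConfig(
    output_dir="results/async-grpo",
    # Compute per-token log-probs over 4096-column vocabulary chunks instead
    # of materializing the full [N, V] logits tensor.
    chunk_lm_head_size=4096,
)
```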


... (truncated)

Commits
  • 3179965 Release: v1.1 (#5524)
  • d6d5efc feat: add Qwen2.5 training chat template with generation markers (#5522)
  • ca995b4 Add docs and good defaults for DistillationTrainer (#5500)
  • c73c2ec Add Qwen3-VL tool calling support (#5469)
  • 9c8e191 Add GLM-4-MoE tool calling support (#5463)
  • dbd3fac feat: add Llama 3 training chat template with generation markers (#5493)
  • f2925a8 Add trackio support to DistillationTrainer (#5501)
  • d4caab8 Fix prepare_multimodal_messages not normalizing empty string content for assi...
  • b48c788 [docs] Add code example for completion_only_loss in SFT trainer docs (#5494)
  • d4e8354 Update GitHub Action to use specific version of github-script (#5491)
  • Additional commits viewable in compare view


codacy-production Bot commented Apr 13, 2026

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues

View in Codacy


dependabot[bot] force-pushed the dependabot/pip/trl-lte-1.1.0 branch from 579d2ce to 424f528 on April 16, 2026 at 13:43
Updates the requirements on [trl](https://github.com/huggingface/trl) to permit the latest version.
- [Release notes](https://github.com/huggingface/trl/releases)
- [Changelog](https://github.com/huggingface/trl/blob/main/RELEASE.md)
- [Commits](huggingface/trl@v0.2.0...v1.1.0)

---
updated-dependencies:
- dependency-name: trl
  dependency-version: 1.1.0
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
dependabot[bot] force-pushed the dependabot/pip/trl-lte-1.1.0 branch from 424f528 to cc09448 on April 21, 2026 at 15:32

dependabot[bot] (Contributor, Author) commented on behalf of github on Apr 28, 2026

Superseded by #644.

dependabot[bot] closed this on Apr 28, 2026
dependabot[bot] deleted the dependabot/pip/trl-lte-1.1.0 branch on April 28, 2026 at 00:54
