
build: update trl requirement from <=0.21.0 to <=1.1.0#625

Closed
dependabot[bot] wants to merge 1 commit into main from dependabot/pip/trl-lte-1.1.0

Conversation


dependabot[bot] (Contributor) commented on behalf of github on Apr 13, 2026

Updates the requirements on trl to permit the latest version.

Release notes

Sourced from trl's releases.

v1.1.0

Features

DistillationTrainer for efficient on-policy distillation

Read the blog post: https://huggingface.co/spaces/HuggingFaceTB/trl-distillation-trainer

[Figure: off-policy vs. on-policy distillation]

The new DistillationTrainer implements on-policy knowledge distillation as described in On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes. It extends the ideas from the GKDTrainer with three key optimizations: a generation buffer that decouples the training microbatch size from the generation batch size (up to 40x speedup), external teacher server support so the teacher doesn't need to fit on the training GPUs, and binary-encoded logprob payloads that cut transfer size by ~5x.

from datasets import load_dataset
from trl.experimental.distillation import DistillationConfig, DistillationTrainer

dataset = load_dataset("openai/gsm8k", "main", split="train")
dataset = dataset.map(
    lambda x: {"messages": [{"role": "user", "content": x["question"]}]},
    remove_columns=dataset.column_names,
)

trainer = DistillationTrainer(
    model="Qwen/Qwen2.5-1.5B-Instruct",
    teacher_model="Qwen/Qwen2.5-7B-Instruct",
    args=DistillationConfig(
        output_dir="results/distill-qwen-gsm8k",
        lmbda=1.0,  # fully on-policy (student generates)
        beta=1.0,   # reverse KL
        teacher_model_init_kwargs={"torch_dtype": "bfloat16"},
    ),
    train_dataset=dataset,
)
trainer.train()

by @cmpatino in huggingface/trl#5407, huggingface/trl#5500, and huggingface/trl#5501
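The binary-encoded logprob payload idea mentioned above can be illustrated with a small sketch. This is not TRL's actual wire format: the half-precision packing below is an assumption chosen to show why a binary encoding is several times smaller than sending log-probs as JSON text.

```python
import json
import struct

def encode_logprobs_json(logprobs):
    # Text encoding: each float is serialized as a decimal string plus separators.
    return json.dumps(logprobs).encode("utf-8")

def encode_logprobs_binary(logprobs):
    # Binary encoding: pack each log-prob as a little-endian float16 (2 bytes each).
    return struct.pack(f"<{len(logprobs)}e", *logprobs)

def decode_logprobs_binary(payload):
    n = len(payload) // 2
    return list(struct.unpack(f"<{n}e", payload))

logprobs = [-0.0123456789 * (i % 17 + 1) for i in range(1024)]
text = encode_logprobs_json(logprobs)
binary = encode_logprobs_binary(logprobs)
# The binary payload is a fixed 2 bytes per value, several times smaller
# than the JSON text (which averages well over 10 bytes per value).
print(len(text) / len(binary))
```

Half precision costs some accuracy (about three decimal digits), which is generally acceptable for distillation targets; a real implementation might also batch or compress payloads further.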

Chunked LM head for memory-efficient log-prob computation in AsyncGRPOTrainer

AsyncGRPOTrainer now supports a chunked LM-head path that computes per-token log-probs and entropy via online logsumexp without materializing the full [N, V] logits tensor. Combined with completion_mask filtering to skip prompt tokens, this yields large memory savings on long sequences (up to 44x lower peak-allocated memory on an 8192-token sequence):

chunk_lm_head_size   Peak alloc (GB)   Reduction   Wall time (ms)
None (baseline)      18.55             1.00x       808.7
4096                 0.42              44.32x      459.0
8192                 0.76              24.34x      393.0
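The mechanism behind these numbers can be sketched in a few lines. The NumPy function below is an illustration of the technique, not TRL's implementation: it sweeps the LM head over vocabulary chunks, keeping only one [N, chunk] logits slab alive at a time while a running (online) logsumexp accumulates the normalizer for every token.

```python
import numpy as np

def chunked_token_logprobs(hidden, lm_head, token_ids, chunk_size=4096):
    """Per-token log-probs without materializing the full [N, V] logits.

    hidden:    [N, D] final hidden states
    lm_head:   [D, V] LM-head weight matrix
    token_ids: [N] target token ids
    """
    n = hidden.shape[0]
    v = lm_head.shape[1]
    running_max = np.full(n, -np.inf)
    running_sum = np.zeros(n)      # sum of exp(logit - running_max) so far
    target_logit = np.zeros(n)
    for start in range(0, v, chunk_size):
        # Only this [N, chunk] slice of the logits exists at any moment.
        logits = hidden @ lm_head[:, start:start + chunk_size]
        # Online logsumexp update: rescale the running sum to the new max.
        chunk_max = logits.max(axis=1)
        new_max = np.maximum(running_max, chunk_max)
        running_sum = (running_sum * np.exp(running_max - new_max)
                       + np.exp(logits - new_max[:, None]).sum(axis=1))
        running_max = new_max
        # Grab each target token's logit if it falls inside this chunk.
        in_chunk = (token_ids >= start) & (token_ids < start + chunk_size)
        target_logit[in_chunk] = logits[in_chunk, token_ids[in_chunk] - start]
    # log p(token) = logit(token) - logsumexp over the full vocabulary
    return target_logit - (running_max + np.log(running_sum))
```

Peak memory now scales with chunk_size instead of the vocabulary size V, which is where the reported 24-44x reductions come from on large-vocabulary models.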

Enable it via the new chunk_lm_head_size option in AsyncGRPOConfig:
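The example that followed here was truncated in the release-notes excerpt. Usage presumably looked something like the sketch below; the import path and surrounding arguments are assumptions, and only the chunk_lm_head_size option on AsyncGRPOConfig is confirmed by the text above.

```python
# Import path is an assumption; check the trl docs for the actual location.
from trl.experimental.async_grpo import AsyncGRPOConfig

config = AsyncGRPOConfig(
    output_dir="results/async-grpo",
    # Compute per-token log-probs over 4096-column vocabulary chunks instead
    # of materializing the full [N, V] logits tensor.
    chunk_lm_head_size=4096,
)
```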


... (truncated)

Commits
  • 3179965 Release: v1.1 (#5524)
  • d6d5efc feat: add Qwen2.5 training chat template with generation markers (#5522)
  • ca995b4 Add docs and good defaults for DistillationTrainer (#5500)
  • c73c2ec Add Qwen3-VL tool calling support (#5469)
  • 9c8e191 Add GLM-4-MoE tool calling support (#5463)
  • dbd3fac feat: add Llama 3 training chat template with generation markers (#5493)
  • f2925a8 Add trackio support to DistillationTrainer (#5501)
  • d4caab8 Fix prepare_multimodal_messages not normalizing empty string content for assi...
  • b48c788 [docs] Add code example for completion_only_loss in SFT trainer docs (#5494)
  • d4e8354 Update GitHub Action to use specific version of github-script (#5491)
  • Additional commits viewable in compare view


codacy-production Bot commented Apr 13, 2026

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues

View in Codacy


dependabot[bot] force-pushed the dependabot/pip/trl-lte-1.1.0 branch from 579d2ce to 424f528 on April 16, 2026 at 13:43
Updates the requirements on [trl](https://github.com/huggingface/trl) to permit the latest version.
- [Release notes](https://github.com/huggingface/trl/releases)
- [Changelog](https://github.com/huggingface/trl/blob/main/RELEASE.md)
- [Commits](huggingface/trl@v0.2.0...v1.1.0)

---
updated-dependencies:
- dependency-name: trl
  dependency-version: 1.1.0
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
dependabot[bot] force-pushed the dependabot/pip/trl-lte-1.1.0 branch from 424f528 to cc09448 on April 21, 2026 at 15:32

dependabot[bot] (Contributor, Author) commented on behalf of github on Apr 28, 2026

Superseded by #644.

dependabot[bot] closed this on Apr 28, 2026
dependabot[bot] deleted the dependabot/pip/trl-lte-1.1.0 branch on April 28, 2026 at 00:54
