fix: avoid applying rollout temperature to critic values by Baiyu-Su · Pull Request #1928 · THUDM/slime

Baiyu-Su · 2026-05-21T02:01:44Z

What

Avoid applying rollout_temperature to critic value-head outputs.

get_responses() is shared by response-aligned policy-logit extraction and value extraction. Temperature scaling is needed for policy logits when reconstructing rollout log probabilities, but critic values are scalar predictions and should not be divided by the rollout sampling temperature.
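The distinction can be illustrated with a small, dependency-free sketch. It is not code from this PR: the `log_softmax` helper, the example logits, and the temperature of 2.0 are assumptions chosen for illustration. Dividing logits by the sampling temperature changes the normalized distribution, which is what the log-probability reconstruction needs, whereas dividing a scalar value prediction by the same number only rescales the critic's estimate.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a plain list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


# Illustrative numbers only; none of these come from the PR.
rollout_temperature = 2.0
logits = [2.0, 0.0, -2.0]

# Policy path: the rollout sampled from softmax(logits / T), so the same
# scaling is needed to reconstruct the rollout log probabilities.
logp_sampling = log_softmax([x / rollout_temperature for x in logits])
logp_unscaled = log_softmax(logits)

# Critic path: a value is a scalar prediction, not a distribution.
# Dividing it by T just produces a different (wrong) estimate.
value = 3.0
wrongly_scaled_value = value / rollout_temperature

print(logp_sampling[0], logp_unscaled[0])
print(value, wrongly_scaled_value)
```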

Changes

Add an apply_temperature flag to get_responses().
Keep temperature scaling enabled by default for existing policy/logprob paths.
Disable temperature scaling in get_values().
Add a zero-GPU unit test for non-unit rollout_temperature.
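The shape of the change listed above might look roughly like the following sketch. It is a simplified stand-in, not the actual code in slime/backends/megatron_utils/loss.py: the function names `get_responses_sketch` and `get_values_sketch`, their signatures, and the use of plain Python lists instead of tensors are all assumptions made so the example runs without a GPU or any dependencies. Only the `apply_temperature` flag and its default-on behavior are taken from the description.

```python
from typing import Iterator, List, Sequence


def get_responses_sketch(
    outputs: Sequence[Sequence[float]],
    response_lengths: Sequence[int],
    rollout_temperature: float,
    apply_temperature: bool = True,
) -> Iterator[List[float]]:
    """Yield the response-aligned tail of each sample's outputs.

    Hypothetical simplification: each sample is a flat list and the
    response is its last `n` entries.
    """
    for out, n in zip(outputs, response_lengths):
        response = list(out[len(out) - n:])
        if apply_temperature:
            # Default path, kept for policy-logit / logprob callers.
            response = [x / rollout_temperature for x in response]
        yield response


def get_values_sketch(
    values: Sequence[Sequence[float]],
    response_lengths: Sequence[int],
    rollout_temperature: float,
) -> List[List[float]]:
    """Value extraction opts out of temperature scaling."""
    return list(
        get_responses_sketch(
            values, response_lengths, rollout_temperature, apply_temperature=False
        )
    )


# Check in the spirit of a test with non-unit rollout_temperature.
values = [[0.5, 1.0, 2.0, 4.0]]
unscaled = get_values_sketch(values, [2], rollout_temperature=2.0)
scaled = list(get_responses_sketch(values, [2], rollout_temperature=2.0))
print(unscaled, scaled)
```

Keeping the flag on by default means existing callers on the policy path need no changes; only the value path has to opt out explicitly.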

Tested

ruff check slime/backends/megatron_utils/loss.py tests/test_value_temperature.py
isort --profile=black --filter-files --check-only slime/backends/megatron_utils/loss.py tests/test_value_temperature.py
black --check slime/backends/megatron_utils/loss.py tests/test_value_temperature.py
PYTHONDONTWRITEBYTECODE=1 python -m py_compile slime/backends/megatron_utils/loss.py tests/test_value_temperature.py

Fix value temperature scaling

d2bef9a

Baiyu-Su wants to merge 1 commit into THUDM:main from Baiyu-Su:fix-value-temperature-scaling