[Feature] chunk actor logprob computation for memory saving#1555

Open
tina-wen wants to merge 1 commit into InternLM:main from tina-wen:rl_chunk_logprobs

Conversation

@tina-wen

Description

This PR adds chunking along the seq_len dimension when computing actor_logprob.
The log-probabilities are now computed chunk by chunk, significantly reducing peak memory usage.

Key Changes

  • Add chunking logic for seq_len dimension in actor_logprob computation
  • Process logprob calculation in chunks to trade compute for memory
  • Configurable via chunk_size in WorkerConfig.loss_cfg (BaseRLLossConfig, default: 1024)
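
The chunking described above can be sketched as follows. This is a minimal illustration, not the PR's actual implementation: the `gather_logprobs` body and the chunk loop are assumptions based on the description; only the names `gather_logprobs` and `chunk_size` come from the PR.

```python
import torch
import torch.nn.functional as F


def gather_logprobs(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Per-token log-probability of each label under the model's distribution."""
    logprobs = F.log_softmax(logits.float(), dim=-1)
    return logprobs.gather(-1, labels.unsqueeze(-1)).squeeze(-1)


def chunked_gather_logprobs(logits: torch.Tensor, labels: torch.Tensor,
                            chunk_size: int = 1024) -> torch.Tensor:
    """Process the seq_len dimension in chunks, trading compute for memory.

    Only one (batch, chunk_size, vocab) float32 softmax buffer is alive at a
    time, instead of materializing the full (batch, seq_len, vocab) tensor.
    """
    chunks = []
    for start in range(0, logits.shape[1], chunk_size):
        chunks.append(gather_logprobs(logits[:, start:start + chunk_size],
                                      labels[:, start:start + chunk_size]))
    return torch.cat(chunks, dim=1)
```

Because `log_softmax` is independent per position, the chunked result is numerically equivalent to the full-sequence gather; the chunk loop only bounds the size of the temporary softmax buffer.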

@HAOCHENYE
Collaborator

@claude review

output = self._engine.forward_only(seq_ctx=seq_ctx)
old_logprobs = gather_logprobs(output["logits"], shifted_labels)
if self.logprob_chunk_size is not None:
    loss_ctx = LogProbContext(chunk_size=self.logprob_chunk_size, shifted_labels=shifted_labels)

Please add a LogProbContextConfig analogous to CELossConfig; loss_ctx should be built via loss_cfg.build.
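
The requested Config/Context split might look like the sketch below. This is hypothetical: the field names and `build` signature are modeled on the reviewer's reference to CELossConfig, not taken from the xtuner source.

```python
from dataclasses import dataclass

import torch


@dataclass
class LogProbContextConfig:
    """Config object; the worker holds this and calls build() per batch."""
    chunk_size: int = 1024

    def build(self, shifted_labels: torch.Tensor) -> "LogProbContext":
        # Mirrors the CELossConfig -> loss context pattern the reviewer cites.
        return LogProbContext(chunk_size=self.chunk_size,
                              shifted_labels=shifted_labels)


@dataclass
class LogProbContext:
    """Runtime context carrying per-batch state for chunked logprob gather."""
    chunk_size: int
    shifted_labels: torch.Tensor
```

The config is serializable and batch-independent, while the context carries per-batch tensors, which is why the reviewer wants construction routed through `loss_cfg.build`.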

from xtuner.v1.rl.utils import gather_logprobs


class LogProbContext:

Claude: Nit: Public class LogProbContext is missing a docstring (Google Style per CLAUDE.md). Should document its purpose, the chunk_size and shifted_labels parameters, and that forward() returns log-probabilities (not a traditional loss).
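
A docstring along the lines the nit asks for might read as follows; the wording is a suggestion, not the PR's text.

```python
class LogProbContext:
    """Loss context that computes per-token log-probabilities in chunks.

    Unlike a conventional loss context, ``forward`` returns gathered
    log-probabilities rather than a scalar training loss.

    Args:
        chunk_size (int): Number of positions along seq_len processed per
            chunk; smaller values lower peak memory at some compute cost.
        shifted_labels (torch.Tensor): Labels already shifted for next-token
            prediction, used to gather log-probabilities from the logits.
    """
```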

@claude

claude bot commented Mar 17, 2026

Claude:

Summary

Adds chunked log-probability computation along the seq_len dimension to reduce peak memory during compute_actor_logprobs. New LogProbContext class processes logits in configurable chunks via WorkerConfig.loss_cfg.chunk_size.

Issues

Critical

  • [worker.py:380-383] Fallback path broken when logprob_chunk_size is None: The else branch sets loss_ctx = None, but the code unconditionally reads output["loss"] which will be None. The original gather_logprobs(output["logits"], shifted_labels) call was removed without a replacement for the non-chunked path.
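
One way to repair that fallback is sketched below. The surrounding names (`engine.forward_only`, the `output` dict keys) follow the snippets quoted in this review, but the function wrapper and the `LogProbContext` stub are illustrative assumptions.

```python
from dataclasses import dataclass

import torch
import torch.nn.functional as F


@dataclass
class LogProbContext:  # minimal stand-in for the PR's loss context
    chunk_size: int
    shifted_labels: torch.Tensor


def gather_logprobs(logits, labels):
    return F.log_softmax(logits.float(), dim=-1).gather(
        -1, labels.unsqueeze(-1)).squeeze(-1)


def compute_actor_logprobs(engine, seq_ctx, shifted_labels, chunk_size=None):
    """Read output["loss"] only on the chunked path; the eager path keeps
    the original full-sequence gather instead of dereferencing None."""
    if chunk_size is not None:
        loss_ctx = LogProbContext(chunk_size=chunk_size,
                                  shifted_labels=shifted_labels)
        output = engine.forward_only(seq_ctx=seq_ctx, loss_ctx=loss_ctx)
        return output["loss"]  # chunked logprobs produced via the context
    output = engine.forward_only(seq_ctx=seq_ctx)
    return gather_logprobs(output["logits"], shifted_labels)
```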

Warning

  • [rl_loss.py:7] LogProbContext does not inherit from BaseLossContext, breaking the type contract expected by LMHead and the model's __call__ signature. Should follow the established Config/Context pattern (as the other reviewer also noted).
  • [rl_loss.py:25] Return value (loss, (None, None)) doesn't match LMHead's return contract — inner tuple second element should be {} not None.
  • [rl_loss.py:19] Variable named loss actually holds log-probabilities — misleading.
  • [train_engine.py:173] Missing type annotation on loss_ctx parameter; should also default to None for backward compatibility.
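
The annotation/default fix for the last warning could look like this hypothetical signature (class and return shapes are placeholders, not the real train engine):

```python
from typing import Optional


class BaseLossContext:  # stand-in for xtuner's loss-context base class
    pass


class TrainEngine:
    def forward_only(self, seq_ctx,
                     loss_ctx: Optional[BaseLossContext] = None) -> dict:
        # Defaulting to None keeps existing callers working unchanged.
        if loss_ctx is None:
            return {"logits": "full-logits path"}
        return {"loss": "chunked-logprob path"}
```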

Nit

  • [rl_loss.py:7] Public class LogProbContext missing Google Style docstring.

Verdict

REQUEST_CHANGES

@tina-wen tina-wen force-pushed the rl_chunk_logprobs branch from 3368beb to 0a5b2d8 Compare March 18, 2026 07:30
    mode = "chunk"
else:
    mode = "eager"
loss_ctx = LogProbConfig(chunk_size=self.logprob_chunk_size, mode=mode).build(

Shouldn't the build method be called at init time instead?

@tina-wen tina-wen force-pushed the rl_chunk_logprobs branch 2 times, most recently from 2ff2436 to 947f826 Compare March 19, 2026 15:40
@tina-wen tina-wen force-pushed the rl_chunk_logprobs branch from 947f826 to c82f48f Compare March 19, 2026 15:57
