[Doc] Memory-efficient RL training tutorial + cross-refs#3745

Open
vmoens wants to merge 1 commit into
pytorch:mainfrom
vmoens:feature/memory-efficient-docs

Conversation

@vmoens (Collaborator) commented on May 12, 2026

Summary

This PR ties the three recently merged memory-efficiency PRs into a
single story, in two parts:

1. Runnable Sphinx-gallery tutorial at
tutorials/sphinx-tutorials/memory_efficient_rl.py. Sections:

  • Where the observation memory goes (concrete td.bytes() numbers)
  • Why ("next", obs) is kept by default — bootstrap target at
    trajectory ends, MultiStepTransform n-step fallback
  • Knob 1 — SyncDataCollector(compact_obs=True)
  • Knob 2 — NextStateReconstructor with the traj_id + done contract
  • Knob 2.5 — value-estimator NaN safety
    (_sanitize_next_obs_nan), GAE finite everywhere
  • When not to take this path — MultiStepTransform incompatibility,
    the V(obs[t]) ≈ V(real_next_obs) approximation at truncated steps,
    and how shifted=True interacts
  • Knob 3 — LazyMemmapStorage for buffers ≥ VRAM
  • Knob 4 — SliceSampler + the new "scan" / "triton"
    recurrent backends for padding-free sequence training
  • End-to-end pipeline snippet
  • Conclusion + Further reading
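The padding-free idea behind Knob 4 can be illustrated with a pure-Python stand-in. This is not TorchRL's `SliceSampler` (whose API is richer), only the invariant it enforces: sampled windows of contiguous indices never straddle a trajectory boundary, so a recurrent network can consume them without padding.

```python
# Pure-Python sketch of slice sampling over a flat buffer annotated with
# trajectory ids. A window is valid only if every index in it belongs to
# the same trajectory; recurrent training then needs no padding tokens.
def slice_starts(traj_id, slice_len):
    """All start indices whose [start, start+slice_len) window stays in one traj."""
    ok = []
    for s in range(len(traj_id) - slice_len + 1):
        window = traj_id[s : s + slice_len]
        if all(t == window[0] for t in window):
            ok.append(s)
    return ok

traj_id = [0, 0, 0, 0, 1, 1, 1]   # two trajectories, lengths 4 and 3
starts = slice_starts(traj_id, 3)
print(starts)  # → [0, 1, 4]
```

A real sampler would pick randomly among these valid starts; the point is that index 2 and 3 are excluded because a 3-step window from there would cross the boundary between trajectories 0 and 1.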

The tutorial runs end-to-end on CPU (CartPole-v1, 200 frames; <2 s wall
time) and reports the byte-level savings concretely via td.bytes().

2. Docstring cross-references so a reader landing on any of the
three new APIs finds the other two:

  • Collector(compact_obs=…) (and the multi-process collectors):
    pointers to NextStateReconstructor, the value-estimator
    sanitizer, the MultiStepTransform incompatibility note, and the
    new tutorial.
  • NextStateReconstructor: .. seealso:: block covering
    compact_obs, the sanitizer, MultiStepTransform, and the
    tutorial.
  • ValueEstimatorBase._sanitize_next_obs_nan: .. seealso::
    block to compact_obs, NextStateReconstructor, and the
    tutorial.
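The cross-reference blocks might look like the following sketch. The class name is taken from the PR description, and the dummy docstring only illustrates the `.. seealso::` convention, not the real API or its wording:

```python
# Hypothetical shape of the cross-reference docstring the PR adds; the
# targets are the names listed in the PR description, not verified symbols.
class NextStateReconstructor:
    """Rebuilds ("next", obs) at sampling time from the stored obs stream.

    .. seealso::
        :class:`~torchrl.collectors.SyncDataCollector` (``compact_obs``),
        ``ValueEstimatorBase._sanitize_next_obs_nan``, and the
        memory-efficient RL tutorial.
    """

print(".. seealso::" in NextStateReconstructor.__doc__)  # → True
```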

docs/source/index.rst registers the new tutorial under "Basics".

Test plan

  • Tutorial runs end-to-end with concrete output (memory savings
    reported, NaN at slice boundaries confirmed to coincide with
    trajectory boundaries, GAE advantage finite everywhere, memmap
    roundtrip works).
  • All cross-references resolve to existing public symbols
    (verified by reading the rendered class docstrings via
    Collector.__doc__ etc.).
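The NaN-at-boundaries check from the test plan can be sketched in pure Python. The real check runs on sampled replay slices; this stand-in only shows the invariant being asserted: the reconstructed next-obs is NaN exactly where a trajectory ends.

```python
# Invariant: wherever the reconstructed ("next", obs) is NaN, the
# transition must be a trajectory end, and vice versa.
import math

next_obs = [0.1, 0.2, float("nan"), 0.4, float("nan")]
done     = [False, False, True, False, True]

nan_mask = [math.isnan(x) for x in next_obs]
assert nan_mask == done  # NaNs coincide exactly with trajectory boundaries
print("boundary check passed")
```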

🤖 Generated with Claude Code

New tutorial under tutorials/sphinx-tutorials/memory_efficient_rl.py
that ties together the three recent memory-efficiency PRs:
  - compact_obs flag on the collector (pytorch#3742)
  - NextStateReconstructor RB transform (pytorch#3743)
  - NaN-safe value-estimator forward (pytorch#3744)

The tutorial walks through:
  - Where the observation memory goes and why TorchRL keeps both
    obs and ("next", obs) by default (bootstrap targets, MultiStep
    n-step fallback)
  - Knob 1: SyncDataCollector(compact_obs=True) — halves the obs
    footprint at the producer side
  - Knob 2: NextStateReconstructor — rebuilds ("next", obs) at
    sampling time, NaN at trajectory ends
  - Knob 2.5: ValueEstimatorBase._sanitize_next_obs_nan keeps GAE/TD
    targets numerically defined
  - When NOT to take this path: MultiStepTransform, truncated
    transitions where the V(obs[t]) ≈ V(real_next_obs) approximation
    is unacceptable
  - Knob 3: LazyMemmapStorage for buffers larger than VRAM
  - Knob 4: SliceSampler + scan/Triton recurrent backends for
    padding-free sequence training
  - End-to-end pipeline snippet
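The traj_id + done contract described for NextStateReconstructor can be sketched in pure Python. This is an illustrative stand-in, not the library's implementation: `("next", obs)[t]` is `obs[t+1]` while `t+1` stays in the same trajectory, and NaN at trajectory ends.

```python
# Stand-in for the reconstruction contract: next-obs comes from the next
# stored frame when it belongs to the same trajectory and the current step
# is not terminal; otherwise there is no real successor and we emit NaN.
NAN = float("nan")

def reconstruct_next_obs(obs, traj_id, done):
    n = len(obs)
    out = []
    for t in range(n):
        same_traj = t + 1 < n and traj_id[t + 1] == traj_id[t] and not done[t]
        out.append(obs[t + 1] if same_traj else NAN)
    return out

obs     = [1.0, 2.0, 3.0, 4.0, 5.0]
traj_id = [0, 0, 0, 1, 1]
done    = [False, False, True, False, True]
print(reconstruct_next_obs(obs, traj_id, done))
# → [2.0, 3.0, nan, 5.0, nan]
```

Downstream, a NaN-safe value estimator (Knob 2.5) replaces these NaNs so GAE/TD targets stay finite at the boundaries.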

The tutorial runs end-to-end on CPU (CartPole-v1, 200 frames) and
reports concrete byte-level savings from `td.bytes()`.
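Knob 3's mechanism can be illustrated with the standard library alone: a memory-mapped file holds the buffer, so capacity is bounded by disk rather than RAM or VRAM. TorchRL's `LazyMemmapStorage` is far more featureful; the roundtrip below only shows the core idea, mirroring the "memmap roundtrip" item in the test plan.

```python
# A tiny memmap-backed buffer: space is reserved on disk up front (lazily,
# as a sparse file on most filesystems), and slots are written/read through
# a memory map instead of living in process RAM.
import mmap
import os
import struct
import tempfile

OBS_DIM, CAPACITY = 4, 1000
ITEM = struct.calcsize(f"{OBS_DIM}f")     # bytes per float32 observation

path = os.path.join(tempfile.mkdtemp(), "buffer.bin")
with open(path, "wb") as f:
    f.truncate(CAPACITY * ITEM)           # reserve capacity without filling RAM

with open(path, "r+b") as f:
    mm = mmap.mmap(f.fileno(), 0)
    mm[0:ITEM] = struct.pack(f"{OBS_DIM}f", 0.1, 0.2, 0.3, 0.4)  # write slot 0
    roundtrip = struct.unpack(f"{OBS_DIM}f", mm[0:ITEM])          # read it back
    mm.close()

print([round(x, 1) for x in roundtrip])  # → [0.1, 0.2, 0.3, 0.4]
```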

Cross-references added to:
  - SyncDataCollector / MultiSyncCollector / MultiAsyncCollector
    (`compact_obs` docstring) — pointers to NextStateReconstructor,
    the value-estimator sanitizer, MultiStep incompatibility note,
    and the new tutorial.
  - NextStateReconstructor — `.. seealso::` block to compact_obs,
    the sanitizer, MultiStep incompatibility, and the tutorial.
  - ValueEstimatorBase._sanitize_next_obs_nan — `.. seealso::` to
    compact_obs, NextStateReconstructor, and the tutorial.

docs/source/index.rst — register the new tutorial under "Basics".

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
pytorch-bot Bot commented May 12, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/3745

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 2 New Failures, 2 Pending

As of commit bf4a3f6 with merge base cc31dc3:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

meta-cla Bot added the CLA Signed label on May 12, 2026.

Labels

CLA Signed · Collectors · Documentation · Integrations/torch_geometric · Integrations · Objectives · Transforms · tutorials/
