[Feature] Collector.fake_tensordict() / MultiCollector.fake_tensordict() by vmoens · Pull Request #3761 · pytorch/rl

vmoens · 2026-05-15T07:49:29Z

Stack from ghstack (oldest at bottom):

-> [Feature] Collector.fake_tensordict() / MultiCollector.fake_tensordict() #3761
[Example] Isaac Lab RNN PPO with compact memory + knowledge-base notes #3760
[Feature] MultiCollector: track policy version per fresh continue command #3759
[Feature] Collector final_obs: store true boundary next-obs for shifted-GAE #3758
[BugFix] GAE shifted=True: tolerate missing next obs, non-canonical strides, docs #3757
[Feature] timeit.mark_start/mark_end for non-context-manager timing #3756
[BugFix] PolicyVersion: int64 dtype + preserve tensordict device #3755
[Performance] LSTM/GRU scan: canonical strides + cuDNN flat-storage clones + thread-local recurrent mode #3754
[BugFix] Recurrent policy auto-register with policy_factory #3753

Public method that returns a zero-filled tensordict shaped exactly like
one batch yielded by the collector, useful for storage initialization
and torch.compile / cudagraph warmup without having to step the env
or spin up the worker processes first.

On Collector (single):

- Reuses the existing _final_rollout template; builds it lazily via
  _maybe_make_final_rollout(make_rollout=True) even when
  use_buffers=False so the public API is consistent.
- Mirrors the rollout post-pipeline: _maybe_attach_final_obs,
  _maybe_set_truncated, then _postproc (which runs
  split_trajectories, the user postproc, and private-key
  exclusion).
- Result: env keys + policy out-keys + ("collector", "traj_ids"),
  compact_obs exclusions and final_obs UnbatchedTensor
  leaves applied, last dim named "time".

On MultiCollector:

- Builds a per-worker fake from create_env_fn[0] (mirroring the
  legacy replay-buffer init path), applies _add_policy_outputs_to_fake_td,
  expands to (*env.batch_size, frames_per_worker), refines "time".
- For MultiSyncCollector, stacks num_workers copies along dim 0
  (or concatenates along cat_results when an integer was provided);
  for MultiAsyncCollector, returns a single worker's shape (async
  yields one batch at a time).
- Applies split_trajs / postproc / private-key exclusion to
  match the iterator pipeline.
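
The shape rules described above for the multi-collector case can be sketched in plain Python. This is only an illustration of the arithmetic stated in the description, not TorchRL code: the function name `fake_batch_shape` and its parameters are hypothetical, and it assumes that concatenating along `cat_results` multiplies that dimension by the number of workers.

```python
from typing import Optional, Tuple


def fake_batch_shape(
    env_batch_size: Tuple[int, ...],
    frames_per_worker: int,
    num_workers: int,
    *,
    sync: bool = True,
    cat_dim: Optional[int] = None,
) -> Tuple[int, ...]:
    """Hypothetical helper: expected batch shape of the fake tensordict."""
    # Per-worker shape as stated in the description.
    per_worker = (*env_batch_size, frames_per_worker)
    if not sync:
        # Async collectors yield one worker's batch at a time.
        return per_worker
    if cat_dim is None:
        # Sync collectors stack num_workers copies along a new dim 0.
        return (num_workers, *per_worker)
    ndim = len(per_worker)
    if not -ndim <= cat_dim < ndim:
        raise IndexError(f"cat_dim {cat_dim} out of range for {ndim} dims")
    dim = cat_dim % ndim
    # Assumption: concatenation grows the chosen dim by a factor num_workers.
    return tuple(
        size * num_workers if i == dim else size
        for i, size in enumerate(per_worker)
    )
```

For example, with `env_batch_size=(4,)`, `frames_per_worker=16` and three workers, the stacked sync case gives `(3, 4, 16)`, while the async case keeps the single-worker shape `(4, 16)`.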

Tests pin: shape / names / keys / zero-fill parity between
fake_tensordict() and next(iter(collector)) (with and without
buffers); compact_obs drops ("next", obs) and final_obs
attaches ("final", obs) as UnbatchedTensor; multi-sync stacks
along worker dim 0.

[ghstack-poisoned]

pytorch-bot · 2026-05-15T07:49:33Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/3761

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There is 1 currently active SEV. If your PR is affected, please view it below:

Run pull request jobs on OSDC runners in shadow mode

This comment was automatically generated by Dr. CI and updates every 15 minutes.

vmoens · 2026-05-15T08:04:37Z

+        """
+        # Build / borrow one env to read fake_tensordict, compact-obs leaf
+        # keys, and final-obs leaf shapes from.
+        env_fn = self.create_env_fn[0]


I'm not a big fan of this implementation.
We are creating an env in the main process, which defeats the purpose of the MultiCollector.
If we cannot easily get the fake data from the inner collector, we should just raise a NotImplementedError. We should not pretend to do that (which would be the right thing) while actually doing it via a ton of custom code in the main process.
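
A minimal sketch of the alternative suggested in this comment, assuming a worker can optionally report its own template. All names here (`WorkerHandle`, `request_fake`, `MultiCollectorSketch`) are hypothetical and are not part of the TorchRL API; plain dicts stand in for tensordicts.

```python
class WorkerHandle:
    """Hypothetical stand-in for a worker-side (inner) collector."""

    def __init__(self, template=None):
        # The template is whatever the worker would report as its fake
        # batch; None means the worker cannot provide one.
        self._template = template

    def request_fake(self):
        return self._template


class MultiCollectorSketch:
    """Hypothetical multi-collector that never builds an env itself."""

    def __init__(self, workers):
        self.workers = workers

    def fake_tensordict(self):
        # Delegate to the inner collector instead of creating an env
        # in the main process.
        template = self.workers[0].request_fake()
        if template is None:
            # Fail loudly rather than rebuilding the pipeline locally.
            raise NotImplementedError(
                "fake_tensordict() is unavailable: the inner collector "
                "did not provide a template."
            )
        return template
```

The point of the design is that the main process either forwards what the worker already knows or refuses, so no environment is instantiated outside the workers.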

Update

612f857

[ghstack-poisoned]

vmoens mentioned this pull request May 15, 2026

[BugFix] Recurrent policy auto-register with policy_factory #3753

Merged

meta-cla Bot added the CLA Signed label May 15, 2026

This was referenced May 15, 2026

[Performance] LSTM/GRU scan: canonical strides + cuDNN flat-storage clones + thread-local recurrent mode #3754

Merged

[BugFix] PolicyVersion: int64 dtype + preserve tensordict device #3755

Merged

github-actions Bot added the Feature New feature label May 15, 2026

vmoens mentioned this pull request May 15, 2026

[Feature] timeit.mark_start/mark_end for non-context-manager timing #3756

Closed

github-actions Bot added Collectors Integrations/torch_geometric Integrations and removed Feature New feature labels May 15, 2026

vmoens closed this May 15, 2026

vmoens deleted the gh/vmoens/277/head branch May 15, 2026 08:44

vmoens wants to merge 1 commit into gh/vmoens/277/base from gh/vmoens/277/head