[WIP] Fix triton RNN kernel#3785
Conversation
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/3785
Note: Links to docs will display an error until the docs builds have been completed. ❌ 8 New Failures, 2 Unrelated FailuresAs of commit 79a2eb2 with merge base d596150 ( NEW FAILURES - The following jobs have failed:
BROKEN TRUNK - The following jobs failed but were present on the merge base:👉 Rebase onto the `viable/strict` branch to avoid these failures
This comment was automatically generated by Dr. CI and updates every 15 minutes. |
|
| Prefix | Label Applied | Example |
|---|---|---|
[BugFix] |
BugFix | [BugFix] Fix memory leak in collector |
[Feature] |
Feature | [Feature] Add new optimizer |
[Doc] or [Docs] |
Documentation | [Doc] Update installation guide |
[Refactor] |
Refactoring | [Refactor] Clean up module imports |
[CI] |
CI | [CI] Fix workflow permissions |
[Test] or [Tests] |
Tests | [Tests] Add unit tests for buffer |
[Environment] or [Environments] |
Environments | [Environments] Add Gymnasium support |
[Data] |
Data | [Data] Fix replay buffer sampling |
[Performance] or [Perf] |
Performance | [Performance] Optimize tensor ops |
[BC-Breaking] |
bc breaking | [BC-Breaking] Remove deprecated API |
[Deprecation] |
Deprecation | [Deprecation] Mark old function |
[Quality] |
Quality | [Quality] Fix typos and add codespell |
Note: Common variations like singular/plural are supported (e.g., [Doc] or [Docs]).
4 similar comments
|
| Prefix | Label Applied | Example |
|---|---|---|
[BugFix] |
BugFix | [BugFix] Fix memory leak in collector |
[Feature] |
Feature | [Feature] Add new optimizer |
[Doc] or [Docs] |
Documentation | [Doc] Update installation guide |
[Refactor] |
Refactoring | [Refactor] Clean up module imports |
[CI] |
CI | [CI] Fix workflow permissions |
[Test] or [Tests] |
Tests | [Tests] Add unit tests for buffer |
[Environment] or [Environments] |
Environments | [Environments] Add Gymnasium support |
[Data] |
Data | [Data] Fix replay buffer sampling |
[Performance] or [Perf] |
Performance | [Performance] Optimize tensor ops |
[BC-Breaking] |
bc breaking | [BC-Breaking] Remove deprecated API |
[Deprecation] |
Deprecation | [Deprecation] Mark old function |
[Quality] |
Quality | [Quality] Fix typos and add codespell |
Note: Common variations like singular/plural are supported (e.g., [Doc] or [Docs]).
|
| Prefix | Label Applied | Example |
|---|---|---|
[BugFix] |
BugFix | [BugFix] Fix memory leak in collector |
[Feature] |
Feature | [Feature] Add new optimizer |
[Doc] or [Docs] |
Documentation | [Doc] Update installation guide |
[Refactor] |
Refactoring | [Refactor] Clean up module imports |
[CI] |
CI | [CI] Fix workflow permissions |
[Test] or [Tests] |
Tests | [Tests] Add unit tests for buffer |
[Environment] or [Environments] |
Environments | [Environments] Add Gymnasium support |
[Data] |
Data | [Data] Fix replay buffer sampling |
[Performance] or [Perf] |
Performance | [Performance] Optimize tensor ops |
[BC-Breaking] |
bc breaking | [BC-Breaking] Remove deprecated API |
[Deprecation] |
Deprecation | [Deprecation] Mark old function |
[Quality] |
Quality | [Quality] Fix typos and add codespell |
Note: Common variations like singular/plural are supported (e.g., [Doc] or [Docs]).
|
| Prefix | Label Applied | Example |
|---|---|---|
[BugFix] |
BugFix | [BugFix] Fix memory leak in collector |
[Feature] |
Feature | [Feature] Add new optimizer |
[Doc] or [Docs] |
Documentation | [Doc] Update installation guide |
[Refactor] |
Refactoring | [Refactor] Clean up module imports |
[CI] |
CI | [CI] Fix workflow permissions |
[Test] or [Tests] |
Tests | [Tests] Add unit tests for buffer |
[Environment] or [Environments] |
Environments | [Environments] Add Gymnasium support |
[Data] |
Data | [Data] Fix replay buffer sampling |
[Performance] or [Perf] |
Performance | [Performance] Optimize tensor ops |
[BC-Breaking] |
bc breaking | [BC-Breaking] Remove deprecated API |
[Deprecation] |
Deprecation | [Deprecation] Mark old function |
[Quality] |
Quality | [Quality] Fix typos and add codespell |
Note: Common variations like singular/plural are supported (e.g., [Doc] or [Docs]).
|
| Prefix | Label Applied | Example |
|---|---|---|
[BugFix] |
BugFix | [BugFix] Fix memory leak in collector |
[Feature] |
Feature | [Feature] Add new optimizer |
[Doc] or [Docs] |
Documentation | [Doc] Update installation guide |
[Refactor] |
Refactoring | [Refactor] Clean up module imports |
[CI] |
CI | [CI] Fix workflow permissions |
[Test] or [Tests] |
Tests | [Tests] Add unit tests for buffer |
[Environment] or [Environments] |
Environments | [Environments] Add Gymnasium support |
[Data] |
Data | [Data] Fix replay buffer sampling |
[Performance] or [Perf] |
Performance | [Performance] Optimize tensor ops |
[BC-Breaking] |
bc breaking | [BC-Breaking] Remove deprecated API |
[Deprecation] |
Deprecation | [Deprecation] Mark old function |
[Quality] |
Quality | [Quality] Fix typos and add codespell |
Note: Common variations like singular/plural are supported (e.g., [Doc] or [Docs]).
Inductor's Triton 3.6 frontend returned ``None`` when typing the 4-term
inline cell update ``c = f * c + i * g`` in the LSTM forward kernel
(observed inside torch.compile graphs), surfacing as
``AttributeError("'NoneType' object has no attribute 'type'")`` at the
``i``-token position. Rename the single-letter gate variables and split
the cell update into two adds so each subexpression is typed
independently.
Adds an optional ``log_env_packages`` flag (default ``True``) to :class:`~torchrl.record.loggers.WandbLogger` that snapshots the Python runtime, installed package versions, and editable source locations into ``wandb.config["env"]`` at run creation. Makes wandb runs comparable across environments without manual book-keeping; covers the pip-installed distribution set and surfaces editable installs via PEP 610 ``direct_url.json``. Plumbed through the Hydra ``WandbLoggerConfig`` and exercised by ``test_logs_env_packages``.
Adds two off-by-default flags to the Isaac Lab recurrent PPO example: - ``--compile-gae``: wraps the ``GAE`` advantage module with :func:`torch.compile`, mirroring the existing ``--compile-update`` switch and exercising the value path under ``torch.compile`` when the scan/triton recurrent backends are used. - ``--env-compile``: forwards ``compile=True`` through ``make_env`` to the ``TransformedEnv`` constructor, opting into the recently added env-side ``step_and_maybe_reset`` compilation. Both flags also flip ``torch._dynamo.config.capture_scalar_outputs`` so the existing recurrent-state branching paths trace cleanly.
|
| Prefix | Label Applied | Example |
|---|---|---|
[BugFix] |
BugFix | [BugFix] Fix memory leak in collector |
[Feature] |
Feature | [Feature] Add new optimizer |
[Doc] or [Docs] |
Documentation | [Doc] Update installation guide |
[Refactor] |
Refactoring | [Refactor] Clean up module imports |
[CI] |
CI | [CI] Fix workflow permissions |
[Test] or [Tests] |
Tests | [Tests] Add unit tests for buffer |
[Environment] or [Environments] |
Environments | [Environments] Add Gymnasium support |
[Data] |
Data | [Data] Fix replay buffer sampling |
[Performance] or [Perf] |
Performance | [Performance] Optimize tensor ops |
[BC-Breaking] |
bc breaking | [BC-Breaking] Remove deprecated API |
[Deprecation] |
Deprecation | [Deprecation] Mark old function |
[Quality] |
Quality | [Quality] Fix typos and add codespell |
Note: Common variations like singular/plural are supported (e.g., [Doc] or [Docs]).
Compiling ``TransformedEnv.step_and_maybe_reset`` via the ``compile=True`` env constructor kwarg crashed with ``RecursionError`` while dynamo traced over the env. The trigger was ``nn.Module.__dir__`` reading ``self._parameters`` / ``_buffers`` / ``_modules`` through the descriptor protocol; when those reads fell through ``__getattr__``, the existing fallback hit ``"base_env" in self.__dir__()``, which re-entered ``__getattr__`` for ``_parameters`` and infinite-looped under the tracer. Short-circuit any attribute starting with ``_`` (already done for ``__``-prefixed names) before the ``__dir__``-based base-env lookup so ``nn.Module``'s own internals raise ``AttributeError`` immediately. Forwarding private attributes to ``base_env`` was never the intent; the public-attribute path is unchanged. Add a regression test that calls ``__getattr__`` directly on the nn.Module internals and asserts ``base_env`` is still present in ``dir(env)``.
|
| Prefix | Label Applied | Example |
|---|---|---|
[BugFix] |
BugFix | [BugFix] Fix memory leak in collector |
[Feature] |
Feature | [Feature] Add new optimizer |
[Doc] or [Docs] |
Documentation | [Doc] Update installation guide |
[Refactor] |
Refactoring | [Refactor] Clean up module imports |
[CI] |
CI | [CI] Fix workflow permissions |
[Test] or [Tests] |
Tests | [Tests] Add unit tests for buffer |
[Environment] or [Environments] |
Environments | [Environments] Add Gymnasium support |
[Data] |
Data | [Data] Fix replay buffer sampling |
[Performance] or [Perf] |
Performance | [Performance] Optimize tensor ops |
[BC-Breaking] |
bc breaking | [BC-Breaking] Remove deprecated API |
[Deprecation] |
Deprecation | [Deprecation] Mark old function |
[Quality] |
Quality | [Quality] Fix typos and add codespell |
Note: Common variations like singular/plural are supported (e.g., [Doc] or [Docs]).
Replace boolean --env-compile with --env-compile-warmup N (default 0,
disabled). When N>0, pass compile_env={"warmup": N} to make_env so the
first N step_and_maybe_reset calls run eagerly and populate
Transform.parent / done_keys / obs_keys caches before torch.compile
traces.
This avoids the failure mode where dynamo wraps a half-initialised
TransformedEnv built by Compose._rebuild_up_to during the compiled
trace and trips an AttributeError on _parameters in nn.Module
machinery.
Co-authored-by: Cursor <cursoragent@cursor.com>
|
| Prefix | Label Applied | Example |
|---|---|---|
[BugFix] |
BugFix | [BugFix] Fix memory leak in collector |
[Feature] |
Feature | [Feature] Add new optimizer |
[Doc] or [Docs] |
Documentation | [Doc] Update installation guide |
[Refactor] |
Refactoring | [Refactor] Clean up module imports |
[CI] |
CI | [CI] Fix workflow permissions |
[Test] or [Tests] |
Tests | [Tests] Add unit tests for buffer |
[Environment] or [Environments] |
Environments | [Environments] Add Gymnasium support |
[Data] |
Data | [Data] Fix replay buffer sampling |
[Performance] or [Perf] |
Performance | [Performance] Optimize tensor ops |
[BC-Breaking] |
bc breaking | [BC-Breaking] Remove deprecated API |
[Deprecation] |
Deprecation | [Deprecation] Mark old function |
[Quality] |
Quality | [Quality] Fix typos and add codespell |
Note: Common variations like singular/plural are supported (e.g., [Doc] or [Docs]).
Avoid passing a TensorDict keys view directly through starred arguments in TransformedEnv._reset. Dynamo cannot unpack that user-defined iterable when tracing the compiled reset path, so turn it into a plain list before calling select. Co-authored-by: Cursor <cursoragent@cursor.com>
|
| Prefix | Label Applied | Example |
|---|---|---|
[BugFix] |
BugFix | [BugFix] Fix memory leak in collector |
[Feature] |
Feature | [Feature] Add new optimizer |
[Doc] or [Docs] |
Documentation | [Doc] Update installation guide |
[Refactor] |
Refactoring | [Refactor] Clean up module imports |
[CI] |
CI | [CI] Fix workflow permissions |
[Test] or [Tests] |
Tests | [Tests] Add unit tests for buffer |
[Environment] or [Environments] |
Environments | [Environments] Add Gymnasium support |
[Data] |
Data | [Data] Fix replay buffer sampling |
[Performance] or [Perf] |
Performance | [Performance] Optimize tensor ops |
[BC-Breaking] |
bc breaking | [BC-Breaking] Remove deprecated API |
[Deprecation] |
Deprecation | [Deprecation] Mark old function |
[Quality] |
Quality | [Quality] Fix typos and add codespell |
Note: Common variations like singular/plural are supported (e.g., [Doc] or [Docs]).
Plumb a configurable `tl.dot` precision through the triton GRU/LSTM kernels and align the wrapper-side `F.linear` and `dW` matmuls. Adds a new ``recurrent_matmul_precision`` kwarg on ``LSTMModule``/``GRUModule`` with modes ``"ieee"``, ``"tf32"``, ``"tf32x3"`` and ``"auto"``; a global setter ``set_recurrent_matmul_precision`` and a ``TORCHRL_RNN_PRECISION`` env var; and an ``"auto"`` resolution that falls back to ``torch.get_float32_matmul_precision`` with ``"high"`` mapped to ``"tf32x3"``. This fixes the silent precision mismatch where the recurrent matmul ran TF32 (Triton default) while the input matmul ran IEEE FP32 (cuBLAS default with ``allow_tf32=False``), which produced a small but systematic gradient bias in long-T RL rollouts and pushed the triton training curve below the scan curve on long PPO runs. Tests: - 9 new CPU tests for the global setter, env var, validation, resolution precedence and the cuBLAS context manager (flip + restore + exception safety). - 6 new GPU-only parametrized backward parity tests (``ieee`` / ``tf32`` / ``tf32x3``) on ``LSTMModule`` and ``GRUModule``. - 1 new GPU-only test that the per-module kwarg beats the process-global override. Authored with Claude. Co-authored-by: Cursor <cursoragent@cursor.com>
|
| Prefix | Label Applied | Example |
|---|---|---|
[BugFix] |
BugFix | [BugFix] Fix memory leak in collector |
[Feature] |
Feature | [Feature] Add new optimizer |
[Doc] or [Docs] |
Documentation | [Doc] Update installation guide |
[Refactor] |
Refactoring | [Refactor] Clean up module imports |
[CI] |
CI | [CI] Fix workflow permissions |
[Test] or [Tests] |
Tests | [Tests] Add unit tests for buffer |
[Environment] or [Environments] |
Environments | [Environments] Add Gymnasium support |
[Data] |
Data | [Data] Fix replay buffer sampling |
[Performance] or [Perf] |
Performance | [Performance] Optimize tensor ops |
[BC-Breaking] |
bc breaking | [BC-Breaking] Remove deprecated API |
[Deprecation] |
Deprecation | [Deprecation] Mark old function |
[Quality] |
Quality | [Quality] Fix typos and add codespell |
Note: Common variations like singular/plural are supported (e.g., [Doc] or [Docs]).
The GPU CI runner uses ``-m gpu`` to filter pytest collection, which means every test gated only by ``skipif(not _has_triton, ...)`` was being deselected on GPU runners and skipped on CPU runners — i.e. not covered by CI at all. This is what allowed the triton recurrent matmul precision bug to ship unnoticed. Add ``@pytest.mark.gpu`` to all 15 triton-gated tests in ``test/modules/test_rnn.py`` (existing ones plus the precision-control tests added in the previous commit) so they get picked up by the ``tests-gpu`` job, shard 3. After this change pytest reports 40 GPU-marked tests vs 187 CPU tests in this file. Authored with Claude. Co-authored-by: Cursor <cursoragent@cursor.com>
|
| Prefix | Label Applied | Example |
|---|---|---|
[BugFix] |
BugFix | [BugFix] Fix memory leak in collector |
[Feature] |
Feature | [Feature] Add new optimizer |
[Doc] or [Docs] |
Documentation | [Doc] Update installation guide |
[Refactor] |
Refactoring | [Refactor] Clean up module imports |
[CI] |
CI | [CI] Fix workflow permissions |
[Test] or [Tests] |
Tests | [Tests] Add unit tests for buffer |
[Environment] or [Environments] |
Environments | [Environments] Add Gymnasium support |
[Data] |
Data | [Data] Fix replay buffer sampling |
[Performance] or [Perf] |
Performance | [Performance] Optimize tensor ops |
[BC-Breaking] |
bc breaking | [BC-Breaking] Remove deprecated API |
[Deprecation] |
Deprecation | [Deprecation] Mark old function |
[Quality] |
Quality | [Quality] Fix typos and add codespell |
Note: Common variations like singular/plural are supported (e.g., [Doc] or [Docs]).
No description provided.