[CI] Selective PR test matrix gated by changed-files + ciflow/* labels #3674

Open
vmoens wants to merge 1 commit into main from ci/run-selected-tests

@vmoens (Collaborator) commented Apr 27, 2026

Summary

PRs currently run a heavy matrix on every push: 5 Python versions × CPU, 3 GPU shards, tests-olddeps, tests-optdeps, tests-stable-gpu (3 more shards) + distributed, plus SOTA / tutorials / Windows. This PR prunes that matrix on PRs to only what the diff demands, with ciflow/* labels as opt-in escape hatches. The full matrix is unchanged on push to main / nightly / release/*.

The label-gating pattern already used by test-linux-libs.yml, test-linux-llm.yml, and test-linux-habitat.yml is the model — this PR extends it to the heavy workflows and adds a small prepare job that maps the diff to which GPU shards / peer workflows should run.

Behaviour

| Track | PR default | Full trigger |
| --- | --- | --- |
| tests-cpu | Python 3.12 only | push / ciflow/cpu-matrix / ciflow/full |
| tests-gpu | only the shard(s) whose paths changed (shard 3 fallback) | push / ciflow/gpu / ciflow/full |
| tests-gpu-distributed | only if distributed code changed | push / ciflow/distributed / ciflow/full |
| tests-olddeps | skipped | push / ciflow/olddeps / ciflow/full |
| tests-optdeps | skipped | push / ciflow/optdeps / ciflow/full |
| tests-stable-gpu* | skipped | push / ciflow/stable / ciflow/full |
| test-linux-sota | only if `sota-implementations/**` or `torchrl/trainers/**` changed | push / ciflow/sota / ciflow/full |
| test-linux-tutorials | only if `tutorials/**` changed | push / ciflow/tutorials / ciflow/full |
| test-windows-optdepts | only if `torchrl/**` / `test/**` / `setup.py` / `pyproject.toml` changed | push / ciflow/windows / ciflow/full |

Docs-only, `tutorials/`-only, and `sota-implementations/`-only PRs skip the relevant heavy workflows entirely via `paths-ignore:`.
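
The gating described above reduces to a small precedence rule: a push always runs everything, `ciflow/full` forces everything on a PR, and otherwise a track runs only if its own label is present or its file flag is set. The Python sketch below illustrates that rule only. The real logic lives in `.github/scripts/ci_decide.sh`, and the function name `should_run` and its parameters are hypothetical.

```python
def should_run(event: str, labels: set[str], track_label: str, files_changed: bool) -> bool:
    """Decide whether one CI track runs (illustrative sketch, not the real script).

    Precedence as stated in this PR: push > ciflow/full > per-track label + file flag.
    """
    if event == "push":
        # Pushes to main / nightly / release/* keep the full matrix.
        return True
    if "ciflow/full" in labels:
        # Opt-in escape hatch: run every track on this PR.
        return True
    # Otherwise a track runs if its own label is set or its paths changed.
    # Tracks that are "skipped" by default simply pass files_changed=False.
    return track_label in labels or files_changed


# A PR with no labels that touches no sota paths skips test-linux-sota.
print(should_run("pull_request", set(), "ciflow/sota", files_changed=False))  # False
# Adding the per-track label pulls it back in.
print(should_run("pull_request", {"ciflow/sota"}, "ciflow/sota", files_changed=False))  # True
```

Keeping the rule in one function (or one script) means every workflow asks the same question with different arguments, which is what makes it unit-testable.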

Implementation

  • .github/scripts/ci_decide.sh — centralised precedence logic (push > ciflow/full > per-track labels + file flags). Unit-tested by test_ci_decide.sh (40 assertions, all passing locally).
  • .github/workflows/test-linux.yml — new prepare job using tj-actions/changed-files@v45 to compute shard1/shard2/shard3/distributed change flags. All test jobs gain needs: prepare + if: gates; tests-cpu and tests-gpu matrices are now driven by fromJSON(needs.prepare.outputs.…).
  • Peer workflows (test-linux-sota.yml, test-linux-tutorials.yml, test-windows-optdepts.yml) — small changes job + label/file if: gate.
  • run_all.sh — new optional TORCHRL_TEST_PATHS env var that short-circuits the shard dispatch and passes its value verbatim to pytest. Useful for workflow_dispatch runs and local reproduction.
  • .github/labels.yml + create_labels.sh — 10 new ciflow/* labels registered (already created on the repo).
  • .github/CI.md — end-to-end docs for contributors.
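
To illustrate the `TORCHRL_TEST_PATHS` override, here is a minimal Python sketch of the behaviour described for `run_all.sh`: if the variable is set, its value replaces the shard-specific arguments. This is not the shell script itself. The helper name `pytest_args`, the `shard_args` parameter, and the example test paths are hypothetical, and `shlex.split` stands in for the shell's word splitting as an assumption about how the value is expanded.

```python
import shlex


def pytest_args(env: dict[str, str], shard_args: list[str]) -> list[str]:
    """Build a pytest command line, letting TORCHRL_TEST_PATHS override the shard."""
    override = env.get("TORCHRL_TEST_PATHS", "").strip()
    if override:
        # Short-circuit the shard dispatch: forward the user's paths/expressions.
        return ["pytest", *shlex.split(override)]
    # No override: fall back to whatever the shard dispatch selected.
    return ["pytest", *shard_args]


# Hypothetical local reproduction of a single test selection.
print(pytest_args({"TORCHRL_TEST_PATHS": "test/test_env.py -k vec_norm"}, ["test/a.py"]))
# ['pytest', 'test/test_env.py', '-k', 'vec_norm']
```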

Test plan

  • `bash .github/scripts/test_ci_decide.sh` → 40 passed, 0 failed
  • `actionlint` clean on all 4 modified workflows
  • `bash -n` clean on `run_all.sh`
  • All 10 `ciflow/*` labels created on `pytorch/rl`
  • After merge, open a draft PR touching only `torchrl/envs/transforms/vec_norm.py`. Expect: `tests-cpu` matrix is `["3.12"]`; `tests-gpu` matrix is `["1"]`; `tests-olddeps` / `tests-optdeps` / `tests-stable-gpu*` / sota / tutorials / windows all skipped.
  • After merge, open a draft PR editing only a `.md`. Expect: `test-linux.yml`, `test-linux-sota.yml`, `test-linux-tutorials.yml`, `test-windows-optdepts.yml` do not enqueue.
  • After merge, open a draft PR with `ciflow/full`. Expect: pre-change baseline matrix runs.
  • Verify the next push to `main` runs the full matrix (no regression vs. today's behaviour).
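
The expected matrices in the test plan can be reproduced with a small sketch of what the `prepare` job computes. This is an assumption-laden illustration, not the workflow's code: the function names are hypothetical, the list of five Python versions is a placeholder (the PR only states that there are five), and "shard 3 fallback" is read here as "run shard 3 when no shard's paths changed".

```python
import json

GPU_SHARDS = ("1", "2", "3")


def cpu_matrix(full: bool, all_versions: list[str]) -> list[str]:
    """PRs default to Python 3.12 only; the full trigger restores every version."""
    return list(all_versions) if full else ["3.12"]


def gpu_matrix(full: bool, shard_changed: dict[str, bool]) -> list[str]:
    """Select the GPU shards whose paths changed, falling back to shard 3."""
    if full:
        return list(GPU_SHARDS)
    selected = [s for s in GPU_SHARDS if shard_changed.get(s, False)]
    # Assumed meaning of "shard 3 fallback": never emit an empty matrix.
    return selected or ["3"]


# Placeholder version list; the PR does not enumerate the versions.
versions = ["3.10", "3.11", "3.12", "3.13", "3.14"]

# A PR whose diff only sets the shard-1 flag, with no ciflow labels.
print(json.dumps(cpu_matrix(False, versions)))        # ["3.12"]
print(json.dumps(gpu_matrix(False, {"1": True})))     # ["1"]
```

Emitting the result as a JSON string is what lets a workflow consume it with `fromJSON(needs.prepare.outputs.…)`.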

Rollback

All changes are additive `needs:` / `if:` conditions plus a new `prepare` job; reverting this PR restores current behaviour.

PRs now run only the tracks relevant to the diff (Python 3.12 CPU + a
filtered subset of GPU shards), with the full matrix preserved on push to
main / nightly / release/*. Maintainers can pull individual tracks back in
on a PR by applying ciflow/full, ciflow/cpu-matrix, ciflow/gpu, ciflow/stable,
ciflow/olddeps, ciflow/optdeps, ciflow/distributed, ciflow/sota,
ciflow/tutorials, or ciflow/windows.

- New prepare job in test-linux.yml uses tj-actions/changed-files to compute
  shard flags; .github/scripts/ci_decide.sh centralises the precedence logic
  (push > ciflow/full > per-track labels + file flags) and is unit-tested by
  test_ci_decide.sh (40 assertions).
- Heavy peer workflows (test-linux-sota, test-linux-tutorials,
  test-windows-optdepts) gain a small changes job + if: gate using the same
  pattern.
- run_all.sh gains an optional TORCHRL_TEST_PATHS env var that short-circuits
  the shard dispatch and runs pytest against arbitrary paths/expressions, for
  workflow_dispatch and local reproduction.
- New ciflow/* labels registered in labels.yml and added to create_labels.sh.
- .github/CI.md documents the gating model end-to-end.

@pytorch-bot (Bot) commented Apr 27, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/3674

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 2 New Failures

As of commit 1ed531b with merge base 306cb92:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla (Bot) added the "CLA Signed" label (This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.) on Apr 27, 2026
@github-actions (Bot) added the "CI" label (Has to do with CI setup, e.g. wheels & builds, tests...) on Apr 27, 2026
@github-actions (Contributor)

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 172. Improved: $\large\color{#35bf28}16$. Worsened: $\large\color{#d91a1a}7$.

Columns: Name, Max, Mean, Ops, Ops on Repo HEAD, Change
test_tensor_to_bytestream_speed[pickle] 79.2588μs 78.6170μs 12.7199 KOps/s 12.4515 KOps/s $\color{#35bf28}+2.16\%$
test_tensor_to_bytestream_speed[torch.save] 0.1380ms 0.1372ms 7.2903 KOps/s 7.2745 KOps/s $\color{#35bf28}+0.22\%$
test_tensor_to_bytestream_speed[untyped_storage] 0.1041s 0.1038s 9.6306 Ops/s 9.7535 Ops/s $\color{#d91a1a}-1.26\%$
test_tensor_to_bytestream_speed[numpy] 2.4741μs 2.4687μs 405.0666 KOps/s 401.0089 KOps/s $\color{#35bf28}+1.01\%$
test_tensor_to_bytestream_speed[safetensors] 38.1069μs 37.9405μs 26.3571 KOps/s 28.1660 KOps/s $\textbf{\color{#d91a1a}-6.42\%}$
test_simple 0.5314s 0.5309s 1.8836 Ops/s 1.7817 Ops/s $\textbf{\color{#35bf28}+5.72\%}$
test_transformed 1.0678s 1.0613s 0.9422 Ops/s 0.9145 Ops/s $\color{#35bf28}+3.03\%$
test_serial 1.6902s 1.6813s 0.5948 Ops/s 0.5963 Ops/s $\color{#d91a1a}-0.26\%$
test_parallel 1.1185s 1.0298s 0.9711 Ops/s 0.9951 Ops/s $\color{#d91a1a}-2.42\%$
test_step_mdp_speed[True-True-True-True-True] 0.4568ms 41.3272μs 24.1971 KOps/s 24.6501 KOps/s $\color{#d91a1a}-1.84\%$
test_step_mdp_speed[True-True-True-True-False] 0.4516ms 22.8452μs 43.7729 KOps/s 44.6476 KOps/s $\color{#d91a1a}-1.96\%$
test_step_mdp_speed[True-True-True-False-True] 58.5810μs 23.5595μs 42.4457 KOps/s 43.4322 KOps/s $\color{#d91a1a}-2.27\%$
test_step_mdp_speed[True-True-True-False-False] 0.4489ms 12.4289μs 80.4574 KOps/s 79.6056 KOps/s $\color{#35bf28}+1.07\%$
test_step_mdp_speed[True-True-False-True-True] 0.4739ms 43.0758μs 23.2149 KOps/s 22.5174 KOps/s $\color{#35bf28}+3.10\%$
test_step_mdp_speed[True-True-False-True-False] 54.6210μs 24.8474μs 40.2456 KOps/s 40.0571 KOps/s $\color{#35bf28}+0.47\%$
test_step_mdp_speed[True-True-False-False-True] 0.4513ms 25.6519μs 38.9834 KOps/s 38.5796 KOps/s $\color{#35bf28}+1.05\%$
test_step_mdp_speed[True-True-False-False-False] 0.4352ms 15.2032μs 65.7755 KOps/s 65.6278 KOps/s $\color{#35bf28}+0.23\%$
test_step_mdp_speed[True-False-True-True-True] 70.4220μs 46.1096μs 21.6875 KOps/s 21.5726 KOps/s $\color{#35bf28}+0.53\%$
test_step_mdp_speed[True-False-True-True-False] 0.4579ms 27.6516μs 36.1643 KOps/s 36.6850 KOps/s $\color{#d91a1a}-1.42\%$
test_step_mdp_speed[True-False-True-False-True] 0.4520ms 25.5616μs 39.1212 KOps/s 37.9302 KOps/s $\color{#35bf28}+3.14\%$
test_step_mdp_speed[True-False-True-False-False] 48.4110μs 15.1873μs 65.8447 KOps/s 65.2775 KOps/s $\color{#35bf28}+0.87\%$
test_step_mdp_speed[True-False-False-True-True] 0.4808ms 48.5107μs 20.6140 KOps/s 20.3249 KOps/s $\color{#35bf28}+1.42\%$
test_step_mdp_speed[True-False-False-True-False] 0.4606ms 30.4417μs 32.8497 KOps/s 32.9522 KOps/s $\color{#d91a1a}-0.31\%$
test_step_mdp_speed[True-False-False-False-True] 0.4574ms 28.6272μs 34.9318 KOps/s 34.8378 KOps/s $\color{#35bf28}+0.27\%$
test_step_mdp_speed[True-False-False-False-False] 54.0010μs 17.7051μs 56.4808 KOps/s 56.3036 KOps/s $\color{#35bf28}+0.31\%$
test_step_mdp_speed[False-True-True-True-True] 0.4689ms 45.9624μs 21.7569 KOps/s 21.3154 KOps/s $\color{#35bf28}+2.07\%$
test_step_mdp_speed[False-True-True-True-False] 2.3549ms 27.7872μs 35.9878 KOps/s 36.0292 KOps/s $\color{#d91a1a}-0.11\%$
test_step_mdp_speed[False-True-True-False-True] 0.4534ms 29.5408μs 33.8514 KOps/s 33.2144 KOps/s $\color{#35bf28}+1.92\%$
test_step_mdp_speed[False-True-True-False-False] 54.8310μs 16.4862μs 60.6570 KOps/s 59.9761 KOps/s $\color{#35bf28}+1.14\%$
test_step_mdp_speed[False-True-False-True-True] 0.4706ms 48.0782μs 20.7994 KOps/s 20.2121 KOps/s $\color{#35bf28}+2.91\%$
test_step_mdp_speed[False-True-False-True-False] 0.4514ms 29.7689μs 33.5921 KOps/s 32.6416 KOps/s $\color{#35bf28}+2.91\%$
test_step_mdp_speed[False-True-False-False-True] 64.4310μs 31.3534μs 31.8944 KOps/s 31.9804 KOps/s $\color{#d91a1a}-0.27\%$
test_step_mdp_speed[False-True-False-False-False] 0.4407ms 18.8495μs 53.0519 KOps/s 52.4271 KOps/s $\color{#35bf28}+1.19\%$
test_step_mdp_speed[False-False-True-True-True] 0.1250ms 49.8532μs 20.0589 KOps/s 19.2780 KOps/s $\color{#35bf28}+4.05\%$
test_step_mdp_speed[False-False-True-True-False] 65.5310μs 32.3879μs 30.8758 KOps/s 30.7874 KOps/s $\color{#35bf28}+0.29\%$
test_step_mdp_speed[False-False-True-False-True] 59.6710μs 31.2557μs 31.9942 KOps/s 31.3337 KOps/s $\color{#35bf28}+2.11\%$
test_step_mdp_speed[False-False-True-False-False] 95.3820μs 18.5985μs 53.7678 KOps/s 52.3202 KOps/s $\color{#35bf28}+2.77\%$
test_step_mdp_speed[False-False-False-True-True] 0.4646ms 52.3980μs 19.0847 KOps/s 18.8150 KOps/s $\color{#35bf28}+1.43\%$
test_step_mdp_speed[False-False-False-True-False] 0.4576ms 35.2812μs 28.3437 KOps/s 28.1009 KOps/s $\color{#35bf28}+0.86\%$
test_step_mdp_speed[False-False-False-False-True] 62.0510μs 33.1475μs 30.1682 KOps/s 30.2262 KOps/s $\color{#d91a1a}-0.19\%$
test_step_mdp_speed[False-False-False-False-False] 48.7010μs 21.5029μs 46.5054 KOps/s 46.4907 KOps/s $\color{#35bf28}+0.03\%$
test_non_tensor_env_rollout_speed[1000-single-True] 0.7078s 0.7007s 1.4271 Ops/s 1.3622 Ops/s $\color{#35bf28}+4.77\%$
test_non_tensor_env_rollout_speed[1000-single-False] 0.6997s 0.5891s 1.6976 Ops/s 1.6627 Ops/s $\color{#35bf28}+2.10\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] 1.6918s 1.6094s 0.6213 Ops/s 0.6167 Ops/s $\color{#35bf28}+0.75\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] 1.4751s 1.3867s 0.7211 Ops/s 0.7140 Ops/s $\color{#35bf28}+1.00\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-True] 1.9441s 1.8535s 0.5395 Ops/s 0.5367 Ops/s $\color{#35bf28}+0.53\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-False] 1.7157s 1.6316s 0.6129 Ops/s 0.6062 Ops/s $\color{#35bf28}+1.11\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] 4.6482s 4.5495s 0.2198 Ops/s 0.2184 Ops/s $\color{#35bf28}+0.65\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] 4.4761s 4.3451s 0.2301 Ops/s 0.2306 Ops/s $\color{#d91a1a}-0.20\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] 1.9173s 1.8422s 0.5428 Ops/s 0.5399 Ops/s $\color{#35bf28}+0.55\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] 1.6975s 1.5874s 0.6300 Ops/s 0.6438 Ops/s $\color{#d91a1a}-2.14\%$
test_values[generalized_advantage_estimate-True-True] 10.1695ms 9.8800ms 101.2146 Ops/s 99.7525 Ops/s $\color{#35bf28}+1.47\%$
test_values[vec_generalized_advantage_estimate-True-True] 19.8827ms 17.6602ms 56.6246 Ops/s 56.9592 Ops/s $\color{#d91a1a}-0.59\%$
test_values[td0_return_estimate-False-False] 0.2174ms 0.1269ms 7.8831 KOps/s 7.9852 KOps/s $\color{#d91a1a}-1.28\%$
test_values[td1_return_estimate-False-False] 28.3369ms 26.6391ms 37.5388 Ops/s 37.5453 Ops/s $\color{#d91a1a}-0.02\%$
test_values[vec_td1_return_estimate-False-False] 18.6128ms 17.7487ms 56.3421 Ops/s 56.6305 Ops/s $\color{#d91a1a}-0.51\%$
test_values[td_lambda_return_estimate-True-False] 41.9088ms 39.8318ms 25.1056 Ops/s 25.3695 Ops/s $\color{#d91a1a}-1.04\%$
test_values[vec_td_lambda_return_estimate-True-False] 18.0571ms 17.6434ms 56.6783 Ops/s 56.5201 Ops/s $\color{#35bf28}+0.28\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 8.9240ms 8.7017ms 114.9203 Ops/s 114.6010 Ops/s $\color{#35bf28}+0.28\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 1.7533ms 1.4717ms 679.4653 Ops/s 676.5096 Ops/s $\color{#35bf28}+0.44\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.6435ms 0.4025ms 2.4848 KOps/s 2.4945 KOps/s $\color{#d91a1a}-0.39\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 35.5986ms 34.7957ms 28.7392 Ops/s 29.1356 Ops/s $\color{#d91a1a}-1.36\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 1.9649ms 1.6999ms 588.2539 Ops/s 589.4209 Ops/s $\color{#d91a1a}-0.20\%$
test_dqn_speed[False-None] 1.4589ms 1.3617ms 734.3720 Ops/s 732.0193 Ops/s $\color{#35bf28}+0.32\%$
test_dqn_speed[False-backward] 1.9333ms 1.8575ms 538.3509 Ops/s 534.0848 Ops/s $\color{#35bf28}+0.80\%$
test_dqn_speed[True-None] 0.9628ms 0.5590ms 1.7889 KOps/s 1.7826 KOps/s $\color{#35bf28}+0.35\%$
test_dqn_speed[True-backward] 1.0534ms 1.0112ms 988.8764 Ops/s 967.1607 Ops/s $\color{#35bf28}+2.25\%$
test_dqn_speed[reduce-overhead-None] 0.9606ms 0.5395ms 1.8534 KOps/s 1.8136 KOps/s $\color{#35bf28}+2.19\%$
test_ddpg_speed[False-None] 3.1573ms 2.7983ms 357.3567 Ops/s 363.5839 Ops/s $\color{#d91a1a}-1.71\%$
test_ddpg_speed[False-backward] 4.3850ms 3.9991ms 250.0577 Ops/s 254.7583 Ops/s $\color{#d91a1a}-1.85\%$
test_ddpg_speed[True-None] 1.6126ms 1.4393ms 694.7610 Ops/s 689.6279 Ops/s $\color{#35bf28}+0.74\%$
test_ddpg_speed[True-backward] 2.4937ms 2.4221ms 412.8698 Ops/s 413.1064 Ops/s $\color{#d91a1a}-0.06\%$
test_ddpg_speed[reduce-overhead-None] 1.7942ms 1.4123ms 708.0615 Ops/s 700.8302 Ops/s $\color{#35bf28}+1.03\%$
test_sac_speed[False-None] 8.5084ms 7.8825ms 126.8633 Ops/s 125.1330 Ops/s $\color{#35bf28}+1.38\%$
test_sac_speed[False-backward] 11.8353ms 11.0752ms 90.2918 Ops/s 90.2084 Ops/s $\color{#35bf28}+0.09\%$
test_sac_speed[True-None] 2.5563ms 2.1622ms 462.4984 Ops/s 448.5891 Ops/s $\color{#35bf28}+3.10\%$
test_sac_speed[True-backward] 4.2517ms 4.0736ms 245.4839 Ops/s 225.7241 Ops/s $\textbf{\color{#35bf28}+8.75\%}$
test_sac_speed[reduce-overhead-None] 2.5515ms 2.1432ms 466.5921 Ops/s 453.0109 Ops/s $\color{#35bf28}+3.00\%$
test_redq_speed[False-None] 14.2016ms 10.4605ms 95.5976 Ops/s 95.3473 Ops/s $\color{#35bf28}+0.26\%$
test_redq_speed[False-backward] 19.5858ms 17.6427ms 56.6807 Ops/s 56.4498 Ops/s $\color{#35bf28}+0.41\%$
test_redq_speed[True-None] 5.0087ms 4.5810ms 218.2924 Ops/s 211.2615 Ops/s $\color{#35bf28}+3.33\%$
test_redq_speed[reduce-overhead-None] 5.6995ms 4.6120ms 216.8243 Ops/s 219.0296 Ops/s $\color{#d91a1a}-1.01\%$
test_redq_deprec_speed[False-None] 11.6729ms 10.9799ms 91.0752 Ops/s 90.5841 Ops/s $\color{#35bf28}+0.54\%$
test_redq_deprec_speed[False-backward] 17.1397ms 15.6393ms 63.9416 Ops/s 63.2256 Ops/s $\color{#35bf28}+1.13\%$
test_redq_deprec_speed[True-None] 4.1510ms 3.6766ms 271.9869 Ops/s 270.5650 Ops/s $\color{#35bf28}+0.53\%$
test_redq_deprec_speed[True-backward] 7.9315ms 7.4721ms 133.8307 Ops/s 131.7414 Ops/s $\color{#35bf28}+1.59\%$
test_redq_deprec_speed[reduce-overhead-None] 3.7562ms 3.6148ms 276.6431 Ops/s 277.9835 Ops/s $\color{#d91a1a}-0.48\%$
test_td3_speed[False-None] 8.1628ms 7.9058ms 126.4894 Ops/s 126.0876 Ops/s $\color{#35bf28}+0.32\%$
test_td3_speed[False-backward] 10.8828ms 10.6687ms 93.7323 Ops/s 93.6880 Ops/s $\color{#35bf28}+0.05\%$
test_td3_speed[True-None] 1.8720ms 1.8267ms 547.4404 Ops/s 548.7830 Ops/s $\color{#d91a1a}-0.24\%$
test_td3_speed[True-backward] 3.7031ms 3.5906ms 278.5030 Ops/s 239.0137 Ops/s $\textbf{\color{#35bf28}+16.52\%}$
test_td3_speed[reduce-overhead-None] 1.8212ms 1.7833ms 560.7608 Ops/s 560.8812 Ops/s $\color{#d91a1a}-0.02\%$
test_cql_speed[False-None] 29.4684ms 25.8870ms 38.6295 Ops/s 38.4896 Ops/s $\color{#35bf28}+0.36\%$
test_cql_speed[False-backward] 37.9435ms 35.0793ms 28.5068 Ops/s 28.6536 Ops/s $\color{#d91a1a}-0.51\%$
test_cql_speed[True-None] 12.7688ms 12.4812ms 80.1202 Ops/s 79.8636 Ops/s $\color{#35bf28}+0.32\%$
test_cql_speed[True-backward] 18.4967ms 18.0967ms 55.2587 Ops/s 55.3521 Ops/s $\color{#d91a1a}-0.17\%$
test_cql_speed[reduce-overhead-None] 13.1021ms 12.5353ms 79.7747 Ops/s 78.8754 Ops/s $\color{#35bf28}+1.14\%$
test_a2c_speed[False-None] 5.6306ms 5.3919ms 185.4636 Ops/s 187.4510 Ops/s $\color{#d91a1a}-1.06\%$
test_a2c_speed[False-backward] 12.1634ms 11.7925ms 84.7994 Ops/s 85.2986 Ops/s $\color{#d91a1a}-0.59\%$
test_a2c_speed[True-None] 4.2438ms 3.8902ms 257.0546 Ops/s 256.2717 Ops/s $\color{#35bf28}+0.31\%$
test_a2c_speed[True-backward] 8.8783ms 8.6707ms 115.3311 Ops/s 112.2510 Ops/s $\color{#35bf28}+2.74\%$
test_a2c_speed[reduce-overhead-None] 4.1940ms 3.8760ms 258.0012 Ops/s 258.4619 Ops/s $\color{#d91a1a}-0.18\%$
test_ppo_speed[False-None] 6.3673ms 5.9932ms 166.8558 Ops/s 167.0476 Ops/s $\color{#d91a1a}-0.11\%$
test_ppo_speed[False-backward] 12.8590ms 12.5257ms 79.8360 Ops/s 79.9074 Ops/s $\color{#d91a1a}-0.09\%$
test_ppo_speed[True-None] 4.0081ms 3.8305ms 261.0636 Ops/s 255.1276 Ops/s $\color{#35bf28}+2.33\%$
test_ppo_speed[True-backward] 9.1290ms 8.7966ms 113.6802 Ops/s 114.3283 Ops/s $\color{#d91a1a}-0.57\%$
test_ppo_speed[reduce-overhead-None] 4.2569ms 3.8090ms 262.5376 Ops/s 262.0976 Ops/s $\color{#35bf28}+0.17\%$
test_reinforce_speed[False-None] 4.9448ms 4.5143ms 221.5166 Ops/s 218.1581 Ops/s $\color{#35bf28}+1.54\%$
test_reinforce_speed[False-backward] 7.6204ms 7.3594ms 135.8800 Ops/s 135.0911 Ops/s $\color{#35bf28}+0.58\%$
test_reinforce_speed[True-None] 3.2258ms 3.0042ms 332.8703 Ops/s 331.3155 Ops/s $\color{#35bf28}+0.47\%$
test_reinforce_speed[True-backward] 8.2007ms 7.9260ms 126.1670 Ops/s 118.4799 Ops/s $\textbf{\color{#35bf28}+6.49\%}$
test_reinforce_speed[reduce-overhead-None] 3.3687ms 2.9863ms 334.8663 Ops/s 334.0941 Ops/s $\color{#35bf28}+0.23\%$
test_iql_speed[False-None] 21.2454ms 19.9893ms 50.0267 Ops/s 50.4064 Ops/s $\color{#d91a1a}-0.75\%$
test_iql_speed[False-backward] 30.9458ms 30.2098ms 33.1019 Ops/s 33.3635 Ops/s $\color{#d91a1a}-0.78\%$
test_iql_speed[True-None] 9.0016ms 8.5673ms 116.7228 Ops/s 116.9477 Ops/s $\color{#d91a1a}-0.19\%$
test_iql_speed[True-backward] 17.0685ms 16.8164ms 59.4658 Ops/s 58.0324 Ops/s $\color{#35bf28}+2.47\%$
test_iql_speed[reduce-overhead-None] 9.0671ms 8.5718ms 116.6611 Ops/s 110.2414 Ops/s $\textbf{\color{#35bf28}+5.82\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.0777ms 5.8627ms 170.5697 Ops/s 166.8266 Ops/s $\color{#35bf28}+2.24\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 3.0482ms 0.2911ms 3.4350 KOps/s 3.0634 KOps/s $\textbf{\color{#35bf28}+12.13\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.5256ms 0.2703ms 3.7001 KOps/s 3.0756 KOps/s $\textbf{\color{#35bf28}+20.31\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 5.9609ms 5.6508ms 176.9660 Ops/s 174.6431 Ops/s $\color{#35bf28}+1.33\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.8018ms 0.2893ms 3.4570 KOps/s 3.0468 KOps/s $\textbf{\color{#35bf28}+13.46\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.4987ms 0.2684ms 3.7256 KOps/s 2.8844 KOps/s $\textbf{\color{#35bf28}+29.16\%}$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.7037ms 1.2653ms 790.3235 Ops/s 745.7970 Ops/s $\textbf{\color{#35bf28}+5.97\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.6164ms 1.1628ms 860.0074 Ops/s 780.1531 Ops/s $\textbf{\color{#35bf28}+10.24\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 10.1620ms 5.9355ms 168.4779 Ops/s 169.9821 Ops/s $\color{#d91a1a}-0.88\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 0.8468ms 0.4270ms 2.3418 KOps/s 1.8763 KOps/s $\textbf{\color{#35bf28}+24.81\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.7317ms 0.4088ms 2.4464 KOps/s 2.0777 KOps/s $\textbf{\color{#35bf28}+17.75\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 5.6775ms 5.5873ms 178.9773 Ops/s 174.3806 Ops/s $\color{#35bf28}+2.64\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.9570s 1.9446ms 514.2356 Ops/s 2.8024 KOps/s $\textbf{\color{#d91a1a}-81.65\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.5552ms 0.3517ms 2.8437 KOps/s 2.9982 KOps/s $\textbf{\color{#d91a1a}-5.15\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 5.8369ms 5.6082ms 178.3101 Ops/s 175.5063 Ops/s $\color{#35bf28}+1.60\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.6849ms 0.3448ms 2.8999 KOps/s 2.8895 KOps/s $\color{#35bf28}+0.36\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.6402ms 0.3115ms 3.2107 KOps/s 2.8556 KOps/s $\textbf{\color{#35bf28}+12.43\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.0099ms 5.8366ms 171.3333 Ops/s 170.9969 Ops/s $\color{#35bf28}+0.20\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.1078ms 0.4902ms 2.0401 KOps/s 1.9724 KOps/s $\color{#35bf28}+3.43\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.6224ms 0.4386ms 2.2801 KOps/s 2.1942 KOps/s $\color{#35bf28}+3.91\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 6.4483ms 4.9905ms 200.3809 Ops/s 46.5901 Ops/s $\textbf{\color{#35bf28}+330.09\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 11.0998ms 2.1832ms 458.0409 Ops/s 516.8839 Ops/s $\textbf{\color{#d91a1a}-11.38\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 1.1481ms 0.9132ms 1.0951 KOps/s 805.8727 Ops/s $\textbf{\color{#35bf28}+35.89\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 0.6523s 17.9946ms 55.5721 Ops/s 197.8238 Ops/s $\textbf{\color{#d91a1a}-71.91\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 11.5047ms 1.9948ms 501.3119 Ops/s 539.7421 Ops/s $\textbf{\color{#d91a1a}-7.12\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 0.9893ms 0.8658ms 1.1550 KOps/s 1.1180 KOps/s $\color{#35bf28}+3.31\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 10.6443ms 5.2846ms 189.2278 Ops/s 190.3927 Ops/s $\color{#d91a1a}-0.61\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 3.9350ms 1.9035ms 525.3411 Ops/s 524.0821 Ops/s $\color{#35bf28}+0.24\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 2.9639ms 1.0689ms 935.5753 Ops/s 933.2646 Ops/s $\color{#35bf28}+0.25\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 42.7497ms 38.5126ms 25.9655 Ops/s 25.6081 Ops/s $\color{#35bf28}+1.40\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 19.3769ms 17.9086ms 55.8390 Ops/s 54.7816 Ops/s $\color{#35bf28}+1.93\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 41.6192ms 39.5878ms 25.2603 Ops/s 24.7764 Ops/s $\color{#35bf28}+1.95\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 20.1495ms 18.4732ms 54.1326 Ops/s 54.2453 Ops/s $\color{#d91a1a}-0.21\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 42.5024ms 40.9969ms 24.3921 Ops/s 23.3577 Ops/s $\color{#35bf28}+4.43\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 21.4578ms 19.7553ms 50.6192 Ops/s 51.5081 Ops/s $\color{#d91a1a}-1.73\%$
test_storage_write_lazystack[50-img_shape0-small] 0.8634ms 0.2182ms 4.5831 KOps/s 4.3652 KOps/s $\color{#35bf28}+4.99\%$
test_storage_write_lazystack[100-img_shape1-atari] 1.5520ms 1.3766ms 726.4142 Ops/s 711.9213 Ops/s $\color{#35bf28}+2.04\%$
test_storage_write_lazystack[100-img_shape2-large_img] 2.7521ms 2.3145ms 432.0672 Ops/s 432.9618 Ops/s $\color{#d91a1a}-0.21\%$
test_storage_write_lazystack[200-img_shape3-large_batch] 3.0844ms 2.8861ms 346.4870 Ops/s 338.0426 Ops/s $\color{#35bf28}+2.50\%$
test_storage_write_contiguous[50-img_shape0-small] 0.2155ms 0.1332ms 7.5099 KOps/s 7.4800 KOps/s $\color{#35bf28}+0.40\%$
test_storage_write_contiguous[100-img_shape1-atari] 0.3353ms 0.1822ms 5.4886 KOps/s 5.2315 KOps/s $\color{#35bf28}+4.92\%$
test_storage_write_contiguous[100-img_shape2-large_img] 1.9050ms 1.7767ms 562.8393 Ops/s 577.4944 Ops/s $\color{#d91a1a}-2.54\%$
test_storage_write_contiguous[200-img_shape3-large_batch] 1.3975ms 1.2768ms 783.2076 Ops/s 780.7633 Ops/s $\color{#35bf28}+0.31\%$
test_collector_stack_then_write[50-img_shape0-small] 1.2073ms 1.0964ms 912.1090 Ops/s 909.1688 Ops/s $\color{#35bf28}+0.32\%$
test_collector_stack_then_write[100-img_shape1-atari] 7.5149ms 3.5359ms 282.8097 Ops/s 278.1414 Ops/s $\color{#35bf28}+1.68\%$
test_collector_stack_then_write[100-img_shape2-large_img] 11.1685ms 5.6906ms 175.7278 Ops/s 178.2876 Ops/s $\color{#d91a1a}-1.44\%$
test_collector_stack_then_write[200-img_shape3-large_batch] 14.9593ms 7.0552ms 141.7404 Ops/s 143.2483 Ops/s $\color{#d91a1a}-1.05\%$
test_collector_lazystack_then_write[50-img_shape0-small] 0.4420ms 0.2758ms 3.6258 KOps/s 3.6472 KOps/s $\color{#d91a1a}-0.59\%$
test_collector_lazystack_then_write[100-img_shape1-atari] 1.7026ms 1.5136ms 660.6624 Ops/s 661.2937 Ops/s $\color{#d91a1a}-0.10\%$
test_collector_lazystack_then_write[100-img_shape2-large_img] 2.5748ms 2.4444ms 409.0909 Ops/s 411.2209 Ops/s $\color{#d91a1a}-0.52\%$
test_collector_lazystack_then_write[200-img_shape3-large_batch] 3.4247ms 3.0886ms 323.7662 Ops/s 316.1881 Ops/s $\color{#35bf28}+2.40\%$
test_collector_without_rb[100-img_shape0-atari] 32.2848ms 31.8345ms 31.4125 Ops/s 31.0017 Ops/s $\color{#35bf28}+1.33\%$
test_collector_without_rb[200-img_shape1-large_batch] 64.2693ms 62.6401ms 15.9642 Ops/s 15.4474 Ops/s $\color{#35bf28}+3.35\%$
test_collector_with_rb[100-img_shape0-atari] 0.7122s 60.5420ms 16.5175 Ops/s 26.7551 Ops/s $\textbf{\color{#d91a1a}-38.26\%}$
test_collector_with_rb[200-img_shape1-large_batch] 72.2606ms 71.6638ms 13.9541 Ops/s 13.7638 Ops/s $\color{#35bf28}+1.38\%$
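
For readers checking the numbers, the Change column is consistent with the relative difference between Ops on this PR and Ops on Repo HEAD, once both are in the same unit. The sketch below recomputes a few rows. The 5% cut-off for flagging a row as improved or worsened is inferred from which rows are bold in the table, not from documented bot settings, and the function names are hypothetical.

```python
def pct_change(ops_pr: float, ops_head: float) -> float:
    """Relative change in throughput, in percent (both values in the same unit)."""
    return (ops_pr - ops_head) / ops_head * 100.0


def flag(change: float, threshold: float = 5.0) -> str:
    """Classify a row; the 5% threshold is an inference from the table."""
    if change > threshold:
        return "improved"
    if change < -threshold:
        return "worsened"
    return "neutral"


# test_tensor_to_bytestream_speed[pickle]: 12.7199 vs 12.4515 KOps/s
print(round(pct_change(12.7199, 12.4515), 2))  # 2.16
# test_tensor_to_bytestream_speed[safetensors]: 26.3571 vs 28.1660 KOps/s
print(round(pct_change(26.3571, 28.1660), 2))  # -6.42
print(flag(pct_change(26.3571, 28.1660)))      # worsened
```

Rows that mix units need converting first: for example, 514.2356 Ops/s against 2.8024 KOps/s must be compared as 514.2356 against 2802.4.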

@github-actions (Contributor)

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 172. Improved: $\large\color{#35bf28}16$. Worsened: $\large\color{#d91a1a}4$.

Columns: Name, Max, Mean, Ops, Ops on Repo HEAD, Change
test_tensor_to_bytestream_speed[pickle] 82.0306μs 80.8881μs 12.3628 KOps/s 12.4316 KOps/s $\color{#d91a1a}-0.55\%$
test_tensor_to_bytestream_speed[torch.save] 0.1419ms 0.1414ms 7.0710 KOps/s 7.0053 KOps/s $\color{#35bf28}+0.94\%$
test_tensor_to_bytestream_speed[untyped_storage] 0.1118s 0.1113s 8.9847 Ops/s 8.9324 Ops/s $\color{#35bf28}+0.59\%$
test_tensor_to_bytestream_speed[numpy] 2.7088μs 2.7041μs 369.8030 KOps/s 377.8188 KOps/s $\color{#d91a1a}-2.12\%$
test_tensor_to_bytestream_speed[safetensors] 37.3690μs 37.0870μs 26.9636 KOps/s 26.2202 KOps/s $\color{#35bf28}+2.84\%$
test_simple 0.7913s 0.7895s 1.2666 Ops/s 1.2100 Ops/s $\color{#35bf28}+4.68\%$
test_transformed 1.3925s 1.3838s 0.7226 Ops/s 0.7024 Ops/s $\color{#35bf28}+2.88\%$
test_serial 2.3188s 2.3166s 0.4317 Ops/s 0.4242 Ops/s $\color{#35bf28}+1.75\%$
test_parallel 1.9330s 1.8384s 0.5439 Ops/s 0.5570 Ops/s $\color{#d91a1a}-2.35\%$
test_step_mdp_speed[True-True-True-True-True] 0.1506ms 40.5242μs 24.6766 KOps/s 23.6588 KOps/s $\color{#35bf28}+4.30\%$
test_step_mdp_speed[True-True-True-True-False] 51.5100μs 22.8872μs 43.6925 KOps/s 43.8075 KOps/s $\color{#d91a1a}-0.26\%$
test_step_mdp_speed[True-True-True-False-True] 52.0410μs 23.3435μs 42.8385 KOps/s 43.2028 KOps/s $\color{#d91a1a}-0.84\%$
test_step_mdp_speed[True-True-True-False-False] 37.6600μs 12.6896μs 78.8044 KOps/s 79.8916 KOps/s $\color{#d91a1a}-1.36\%$
test_step_mdp_speed[True-True-False-True-True] 67.6210μs 43.8373μs 22.8116 KOps/s 22.7503 KOps/s $\color{#35bf28}+0.27\%$
test_step_mdp_speed[True-True-False-True-False] 51.8510μs 25.4447μs 39.3010 KOps/s 40.0821 KOps/s $\color{#d91a1a}-1.95\%$
test_step_mdp_speed[True-True-False-False-True] 54.2110μs 25.8040μs 38.7537 KOps/s 38.7063 KOps/s $\color{#35bf28}+0.12\%$
test_step_mdp_speed[True-True-False-False-False] 45.4800μs 15.3071μs 65.3290 KOps/s 65.8244 KOps/s $\color{#d91a1a}-0.75\%$
test_step_mdp_speed[True-False-True-True-True] 79.8410μs 47.1072μs 21.2282 KOps/s 21.1963 KOps/s $\color{#35bf28}+0.15\%$
test_step_mdp_speed[True-False-True-True-False] 63.1110μs 28.2127μs 35.4450 KOps/s 35.8260 KOps/s $\color{#d91a1a}-1.06\%$
test_step_mdp_speed[True-False-True-False-True] 57.0110μs 26.4272μs 37.8397 KOps/s 38.6676 KOps/s $\color{#d91a1a}-2.14\%$
test_step_mdp_speed[True-False-True-False-False] 44.6900μs 15.3484μs 65.1532 KOps/s 65.9954 KOps/s $\color{#d91a1a}-1.28\%$
test_step_mdp_speed[True-False-False-True-True] 88.9310μs 49.4873μs 20.2072 KOps/s 20.3055 KOps/s $\color{#d91a1a}-0.48\%$
test_step_mdp_speed[True-False-False-True-False] 57.9610μs 30.6947μs 32.5789 KOps/s 33.3069 KOps/s $\color{#d91a1a}-2.19\%$
test_step_mdp_speed[True-False-False-False-True] 55.6010μs 29.3089μs 34.1193 KOps/s 34.5927 KOps/s $\color{#d91a1a}-1.37\%$
test_step_mdp_speed[True-False-False-False-False] 51.9510μs 17.7883μs 56.2168 KOps/s 56.7239 KOps/s $\color{#d91a1a}-0.89\%$
test_step_mdp_speed[False-True-True-True-True] 83.7320μs 46.9616μs 21.2940 KOps/s 21.4754 KOps/s $\color{#d91a1a}-0.84\%$
test_step_mdp_speed[False-True-True-True-False] 2.2320ms 28.5570μs 35.0177 KOps/s 35.9777 KOps/s $\color{#d91a1a}-2.67\%$
test_step_mdp_speed[False-True-True-False-True] 60.8710μs 29.9585μs 33.3795 KOps/s 32.9669 KOps/s $\color{#35bf28}+1.25\%$
test_step_mdp_speed[False-True-True-False-False] 42.1410μs 16.9440μs 59.0181 KOps/s 58.6474 KOps/s $\color{#35bf28}+0.63\%$
test_step_mdp_speed[False-True-False-True-True] 76.7010μs 49.2680μs 20.2971 KOps/s 20.0147 KOps/s $\color{#35bf28}+1.41\%$
test_step_mdp_speed[False-True-False-True-False] 70.5710μs 30.8570μs 32.4075 KOps/s 32.7704 KOps/s $\color{#d91a1a}-1.11\%$
test_step_mdp_speed[False-True-False-False-True] 67.9510μs 32.2027μs 31.0533 KOps/s 30.9145 KOps/s $\color{#35bf28}+0.45\%$
test_step_mdp_speed[False-True-False-False-False] 45.7010μs 19.5403μs 51.1764 KOps/s 51.1722 KOps/s $+0.01\%$
test_step_mdp_speed[False-False-True-True-True] 84.6410μs 52.7539μs 18.9560 KOps/s 19.2394 KOps/s $\color{#d91a1a}-1.47\%$
test_step_mdp_speed[False-False-True-True-False] 59.8010μs 33.5940μs 29.7672 KOps/s 30.4637 KOps/s $\color{#d91a1a}-2.29\%$
test_step_mdp_speed[False-False-True-False-True] 58.9510μs 32.4712μs 30.7965 KOps/s 31.5037 KOps/s $\color{#d91a1a}-2.24\%$
test_step_mdp_speed[False-False-True-False-False] 43.5210μs 19.4166μs 51.5023 KOps/s 51.6393 KOps/s $\color{#d91a1a}-0.27\%$
test_step_mdp_speed[False-False-False-True-True] 78.4810μs 55.2671μs 18.0939 KOps/s 18.3917 KOps/s $\color{#d91a1a}-1.62\%$
test_step_mdp_speed[False-False-False-True-False] 61.7710μs 35.9432μs 27.8217 KOps/s 28.4126 KOps/s $\color{#d91a1a}-2.08\%$
test_step_mdp_speed[False-False-False-False-True] 71.7920μs 34.4338μs 29.0412 KOps/s 29.0483 KOps/s $\color{#d91a1a}-0.02\%$
test_step_mdp_speed[False-False-False-False-False] 46.8610μs 22.0463μs 45.3590 KOps/s 45.7797 KOps/s $\color{#d91a1a}-0.92\%$
test_non_tensor_env_rollout_speed[1000-single-True] 0.7274s 0.7186s 1.3916 Ops/s 1.3411 Ops/s $\color{#35bf28}+3.76\%$
test_non_tensor_env_rollout_speed[1000-single-False] 0.7110s 0.6079s 1.6450 Ops/s 1.6493 Ops/s $\color{#d91a1a}-0.26\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] 1.7199s 1.6435s 0.6084 Ops/s 0.6084 Ops/s $-0.00\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] 1.5153s 1.4283s 0.7001 Ops/s 0.6985 Ops/s $\color{#35bf28}+0.23\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-True] 1.9872s 1.9023s 0.5257 Ops/s 0.5270 Ops/s $\color{#d91a1a}-0.25\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-False] 1.7668s 1.6774s 0.5962 Ops/s 0.5951 Ops/s $\color{#35bf28}+0.18\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] 4.7515s 4.6615s 0.2145 Ops/s 0.2148 Ops/s $\color{#d91a1a}-0.13\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] 4.5370s 4.4263s 0.2259 Ops/s 0.2239 Ops/s $\color{#35bf28}+0.92\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] 1.9499s 1.8772s 0.5327 Ops/s 0.5246 Ops/s $\color{#35bf28}+1.54\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] 1.6611s 1.5772s 0.6340 Ops/s 0.6374 Ops/s $\color{#d91a1a}-0.53\%$
test_values[generalized_advantage_estimate-True-True] 20.6902ms 20.3310ms 49.1859 Ops/s 49.5241 Ops/s $\color{#d91a1a}-0.68\%$
test_values[vec_generalized_advantage_estimate-True-True] 0.1366s 3.6561ms 273.5154 Ops/s 265.1883 Ops/s $\color{#35bf28}+3.14\%$
test_values[td0_return_estimate-False-False] 0.1233ms 85.2106μs 11.7356 KOps/s 12.0123 KOps/s $\color{#d91a1a}-2.30\%$
test_values[td1_return_estimate-False-False] 51.2510ms 48.7393ms 20.5173 Ops/s 20.1553 Ops/s $\color{#35bf28}+1.80\%$
test_values[vec_td1_return_estimate-False-False] 1.3488ms 1.0918ms 915.9276 Ops/s 913.2059 Ops/s $\color{#35bf28}+0.30\%$
test_values[td_lambda_return_estimate-True-False] 83.6943ms 81.6643ms 12.2453 Ops/s 12.2233 Ops/s $\color{#35bf28}+0.18\%$
test_values[vec_td_lambda_return_estimate-True-False] 1.3418ms 1.0941ms 913.9860 Ops/s 914.6745 Ops/s $\color{#d91a1a}-0.08\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 22.4782ms 22.1070ms 45.2346 Ops/s 48.5539 Ops/s $\textbf{\color{#d91a1a}-6.84\%}$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 1.0144ms 0.7567ms 1.3215 KOps/s 1.3189 KOps/s $\color{#35bf28}+0.20\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.7863ms 0.7029ms 1.4226 KOps/s 1.4677 KOps/s $\color{#d91a1a}-3.07\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 1.6629ms 1.5080ms 663.1283 Ops/s 671.1680 Ops/s $\color{#d91a1a}-1.20\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 0.7759ms 0.7150ms 1.3987 KOps/s 1.4352 KOps/s $\color{#d91a1a}-2.55\%$
test_dqn_speed[False-None] 1.7244ms 1.6024ms 624.0552 Ops/s 622.5630 Ops/s $\color{#35bf28}+0.24\%$
test_dqn_speed[False-backward] 2.4427ms 2.2708ms 440.3781 Ops/s 439.5773 Ops/s $\color{#35bf28}+0.18\%$
test_dqn_speed[True-None] 0.9273ms 0.6238ms 1.6030 KOps/s 1.5697 KOps/s $\color{#35bf28}+2.13\%$
test_dqn_speed[True-backward] 1.1971ms 1.1692ms 855.2579 Ops/s 835.1321 Ops/s $\color{#35bf28}+2.41\%$
test_dqn_speed[reduce-overhead-None] 0.6652ms 0.6123ms 1.6333 KOps/s 1.5350 KOps/s $\textbf{\color{#35bf28}+6.40\%}$
test_ddpg_speed[False-None] 3.4626ms 3.0229ms 330.8087 Ops/s 324.6221 Ops/s $\color{#35bf28}+1.91\%$
test_ddpg_speed[False-backward] 4.7699ms 4.3243ms 231.2507 Ops/s 230.2357 Ops/s $\color{#35bf28}+0.44\%$
test_ddpg_speed[True-None] 1.5291ms 1.3961ms 716.2822 Ops/s 712.5990 Ops/s $\color{#35bf28}+0.52\%$
test_ddpg_speed[True-backward] 2.5289ms 2.4195ms 413.3053 Ops/s 389.4797 Ops/s $\textbf{\color{#35bf28}+6.12\%}$
test_ddpg_speed[reduce-overhead-None] 1.4779ms 1.3866ms 721.1832 Ops/s 704.4889 Ops/s $\color{#35bf28}+2.37\%$
test_sac_speed[False-None] 8.9080ms 8.5036ms 117.5972 Ops/s 116.3390 Ops/s $\color{#35bf28}+1.08\%$
test_sac_speed[False-backward] 12.0144ms 11.5490ms 86.5877 Ops/s 85.6936 Ops/s $\color{#35bf28}+1.04\%$
test_sac_speed[True-None] 2.3042ms 1.9552ms 511.4681 Ops/s 508.6984 Ops/s $\color{#35bf28}+0.54\%$
test_sac_speed[True-backward] 4.1220ms 3.6314ms 275.3732 Ops/s 275.4775 Ops/s $\color{#d91a1a}-0.04\%$
test_sac_speed[reduce-overhead-None] 17.2688ms 10.1369ms 98.6494 Ops/s 96.3816 Ops/s $\color{#35bf28}+2.35\%$
test_redq_deprec_speed[False-None] 10.4135ms 9.5682ms 104.5134 Ops/s 104.0618 Ops/s $\color{#35bf28}+0.43\%$
test_redq_deprec_speed[False-backward] 13.1400ms 12.6798ms 78.8655 Ops/s 78.6265 Ops/s $\color{#35bf28}+0.30\%$
test_redq_deprec_speed[True-None] 3.1976ms 2.7687ms 361.1758 Ops/s 354.7120 Ops/s $\color{#35bf28}+1.82\%$
test_redq_deprec_speed[True-backward] 4.9688ms 4.4895ms 222.7401 Ops/s 222.2511 Ops/s $\color{#35bf28}+0.22\%$
test_redq_deprec_speed[reduce-overhead-None] 14.6501ms 9.7482ms 102.5828 Ops/s 101.7234 Ops/s $\color{#35bf28}+0.84\%$
test_td3_speed[False-None] 8.6662ms 8.3671ms 119.5155 Ops/s 118.4528 Ops/s $\color{#35bf28}+0.90\%$
test_td3_speed[False-backward] 11.4050ms 10.9988ms 90.9193 Ops/s 90.6641 Ops/s $\color{#35bf28}+0.28\%$
test_td3_speed[True-None] 1.7635ms 1.7187ms 581.8467 Ops/s 572.9943 Ops/s $\color{#35bf28}+1.54\%$
test_td3_speed[True-backward] 3.3861ms 3.3020ms 302.8476 Ops/s 296.8796 Ops/s $\color{#35bf28}+2.01\%$
test_td3_speed[reduce-overhead-None] 99.8028ms 26.2316ms 38.1220 Ops/s 37.4315 Ops/s $\color{#35bf28}+1.84\%$
test_cql_speed[False-None] 18.1539ms 17.7501ms 56.3378 Ops/s 55.7626 Ops/s $\color{#35bf28}+1.03\%$
test_cql_speed[False-backward] 23.7908ms 23.3569ms 42.8139 Ops/s 42.3647 Ops/s $\color{#35bf28}+1.06\%$
test_cql_speed[True-None] 3.6497ms 3.4883ms 286.6717 Ops/s 278.9512 Ops/s $\color{#35bf28}+2.77\%$
test_cql_speed[True-backward] 9.1024ms 6.1288ms 163.1633 Ops/s 168.0468 Ops/s $\color{#d91a1a}-2.91\%$
test_cql_speed[reduce-overhead-None] 17.5311ms 11.9765ms 83.4971 Ops/s 81.8423 Ops/s $\color{#35bf28}+2.02\%$
test_a2c_speed[False-None] 3.4612ms 3.3580ms 297.7983 Ops/s 298.7816 Ops/s $\color{#d91a1a}-0.33\%$
test_a2c_speed[False-backward] 6.9836ms 6.5040ms 153.7517 Ops/s 152.6939 Ops/s $\color{#35bf28}+0.69\%$
test_a2c_speed[True-None] 1.6771ms 1.4757ms 677.6575 Ops/s 685.4017 Ops/s $\color{#d91a1a}-1.13\%$
test_a2c_speed[True-backward] 3.4872ms 3.3215ms 301.0726 Ops/s 311.4539 Ops/s $\color{#d91a1a}-3.33\%$
test_a2c_speed[reduce-overhead-None] 1.1891ms 1.1190ms 893.6775 Ops/s 896.3309 Ops/s $\color{#d91a1a}-0.30\%$
test_ppo_speed[False-None] 4.2688ms 4.1001ms 243.8951 Ops/s 242.1110 Ops/s $\color{#35bf28}+0.74\%$
test_ppo_speed[False-backward] 7.8442ms 7.4548ms 134.1424 Ops/s 135.0358 Ops/s $\color{#d91a1a}-0.66\%$
test_ppo_speed[True-None] 1.7537ms 1.6332ms 612.3101 Ops/s 609.7996 Ops/s $\color{#35bf28}+0.41\%$
test_ppo_speed[True-backward] 3.6382ms 3.5377ms 282.6674 Ops/s 278.6589 Ops/s $\color{#35bf28}+1.44\%$
test_ppo_speed[reduce-overhead-None] 1.2979ms 1.1801ms 847.3887 Ops/s 830.2176 Ops/s $\color{#35bf28}+2.07\%$
test_reinforce_speed[False-None] 2.5151ms 2.4071ms 415.4458 Ops/s 415.3195 Ops/s $\color{#35bf28}+0.03\%$
test_reinforce_speed[False-backward] 3.8428ms 3.4379ms 290.8731 Ops/s 281.6058 Ops/s $\color{#35bf28}+3.29\%$
test_reinforce_speed[True-None] 1.5579ms 1.4766ms 677.2101 Ops/s 682.9928 Ops/s $\color{#d91a1a}-0.85\%$
test_reinforce_speed[True-backward] 3.3267ms 3.1974ms 312.7546 Ops/s 296.4748 Ops/s $\textbf{\color{#35bf28}+5.49\%}$
test_reinforce_speed[reduce-overhead-None] 0.6790s 10.4286ms 95.8897 Ops/s 111.2260 Ops/s $\textbf{\color{#d91a1a}-13.79\%}$
test_iql_speed[False-None] 9.9833ms 9.7583ms 102.4771 Ops/s 102.0052 Ops/s $\color{#35bf28}+0.46\%$
test_iql_speed[False-backward] 13.9747ms 13.5521ms 73.7890 Ops/s 72.4229 Ops/s $\color{#35bf28}+1.89\%$
test_iql_speed[True-None] 2.5723ms 2.3482ms 425.8583 Ops/s 420.0694 Ops/s $\color{#35bf28}+1.38\%$
test_iql_speed[True-backward] 5.4834ms 4.9782ms 200.8768 Ops/s 192.0583 Ops/s $\color{#35bf28}+4.59\%$
test_iql_speed[reduce-overhead-None] 16.7456ms 10.0445ms 99.5570 Ops/s 97.6947 Ops/s $\color{#35bf28}+1.91\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.2009ms 6.0023ms 166.6039 Ops/s 163.6810 Ops/s $\color{#35bf28}+1.79\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 2.7019ms 0.2918ms 3.4273 KOps/s 2.9625 KOps/s $\textbf{\color{#35bf28}+15.69\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.5063ms 0.2698ms 3.7066 KOps/s 2.8376 KOps/s $\textbf{\color{#35bf28}+30.63\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.1141ms 5.8800ms 170.0676 Ops/s 170.4773 Ops/s $\color{#d91a1a}-0.24\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.9867ms 0.3334ms 2.9994 KOps/s 2.9793 KOps/s $\color{#35bf28}+0.68\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.5630ms 0.3275ms 3.0533 KOps/s 3.1439 KOps/s $\color{#d91a1a}-2.88\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.7234ms 1.4391ms 694.8963 Ops/s 690.9860 Ops/s $\color{#35bf28}+0.57\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.5969ms 1.3416ms 745.3512 Ops/s 770.1278 Ops/s $\color{#d91a1a}-3.22\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 9.5609ms 6.1180ms 163.4529 Ops/s 164.6113 Ops/s $\color{#d91a1a}-0.70\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 0.9859ms 0.4382ms 2.2823 KOps/s 2.0234 KOps/s $\textbf{\color{#35bf28}+12.80\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.7478ms 0.4165ms 2.4010 KOps/s 2.0054 KOps/s $\textbf{\color{#35bf28}+19.73\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 5.9781ms 5.8379ms 171.2939 Ops/s 169.0529 Ops/s $\color{#35bf28}+1.33\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.6867ms 0.2882ms 3.4695 KOps/s 2.5699 KOps/s $\textbf{\color{#35bf28}+35.01\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.5286ms 0.2717ms 3.6808 KOps/s 2.7124 KOps/s $\textbf{\color{#35bf28}+35.70\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.0715ms 5.8050ms 172.2639 Ops/s 168.9180 Ops/s $\color{#35bf28}+1.98\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.5803ms 0.2866ms 3.4897 KOps/s 2.6057 KOps/s $\textbf{\color{#35bf28}+33.93\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.6122ms 0.2885ms 3.4665 KOps/s 2.7053 KOps/s $\textbf{\color{#35bf28}+28.14\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 8.5142ms 6.0177ms 166.1755 Ops/s 165.5075 Ops/s $\color{#35bf28}+0.40\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.3170ms 0.5207ms 1.9206 KOps/s 1.8524 KOps/s $\color{#35bf28}+3.68\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.8038ms 0.5091ms 1.9643 KOps/s 2.0093 KOps/s $\color{#d91a1a}-2.24\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 0.9783s 24.5379ms 40.7533 Ops/s 34.3034 Ops/s $\textbf{\color{#35bf28}+18.80\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 8.8227ms 2.0591ms 485.6603 Ops/s 514.8506 Ops/s $\textbf{\color{#d91a1a}-5.67\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 3.5101ms 1.0355ms 965.7174 Ops/s 855.8421 Ops/s $\textbf{\color{#35bf28}+12.84\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 7.6304ms 5.1959ms 192.4584 Ops/s 192.0031 Ops/s $\color{#35bf28}+0.24\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 3.9477ms 1.8625ms 536.9015 Ops/s 560.4372 Ops/s $\color{#d91a1a}-4.20\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 1.1777ms 0.9769ms 1.0237 KOps/s 720.2249 Ops/s $\textbf{\color{#35bf28}+42.13\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 6.8210ms 5.2789ms 189.4323 Ops/s 187.0897 Ops/s $\color{#35bf28}+1.25\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 4.3144ms 2.0508ms 487.6109 Ops/s 465.6887 Ops/s $\color{#35bf28}+4.71\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 3.8766ms 1.1754ms 850.7844 Ops/s 777.1068 Ops/s $\textbf{\color{#35bf28}+9.48\%}$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 43.1204ms 39.8291ms 25.1073 Ops/s 24.9846 Ops/s $\color{#35bf28}+0.49\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 19.8097ms 18.3935ms 54.3669 Ops/s 53.3061 Ops/s $\color{#35bf28}+1.99\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 43.2940ms 41.1524ms 24.2999 Ops/s 17.3652 Ops/s $\textbf{\color{#35bf28}+39.93\%}$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 20.0976ms 18.7048ms 53.4622 Ops/s 53.5695 Ops/s $\color{#d91a1a}-0.20\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 44.5805ms 42.9597ms 23.2776 Ops/s 23.2058 Ops/s $\color{#35bf28}+0.31\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 21.5574ms 20.4623ms 48.8704 Ops/s 49.2075 Ops/s $\color{#d91a1a}-0.69\%$
test_storage_write_lazystack[50-img_shape0-small] 0.8605ms 0.2209ms 4.5261 KOps/s 4.3805 KOps/s $\color{#35bf28}+3.32\%$
test_storage_write_lazystack[100-img_shape1-atari] 1.5684ms 1.4015ms 713.5079 Ops/s 714.8308 Ops/s $\color{#d91a1a}-0.19\%$
test_storage_write_lazystack[100-img_shape2-large_img] 2.6390ms 2.2901ms 436.6536 Ops/s 437.7063 Ops/s $\color{#d91a1a}-0.24\%$
test_storage_write_lazystack[200-img_shape3-large_batch] 3.0860ms 2.9017ms 344.6300 Ops/s 338.8066 Ops/s $\color{#35bf28}+1.72\%$
test_storage_write_contiguous[50-img_shape0-small] 0.2496ms 0.1687ms 5.9272 KOps/s 5.9764 KOps/s $\color{#d91a1a}-0.82\%$
test_storage_write_contiguous[100-img_shape1-atari] 0.3958ms 0.2740ms 3.6494 KOps/s 4.3753 KOps/s $\textbf{\color{#d91a1a}-16.59\%}$
test_storage_write_contiguous[100-img_shape2-large_img] 1.9403ms 1.7735ms 563.8670 Ops/s 560.3013 Ops/s $\color{#35bf28}+0.64\%$
test_storage_write_contiguous[200-img_shape3-large_batch] 1.5466ms 1.4072ms 710.6168 Ops/s 744.1700 Ops/s $\color{#d91a1a}-4.51\%$
test_collector_stack_then_write[50-img_shape0-small] 1.2750ms 1.1633ms 859.6579 Ops/s 858.7995 Ops/s $\color{#35bf28}+0.10\%$
test_collector_stack_then_write[100-img_shape1-atari] 4.1013ms 3.6067ms 277.2600 Ops/s 273.0822 Ops/s $\color{#35bf28}+1.53\%$
test_collector_stack_then_write[100-img_shape2-large_img] 9.7425ms 5.7736ms 173.2032 Ops/s 174.9033 Ops/s $\color{#d91a1a}-0.97\%$
test_collector_stack_then_write[200-img_shape3-large_batch] 15.2165ms 7.1507ms 139.8455 Ops/s 135.1323 Ops/s $\color{#35bf28}+3.49\%$
test_collector_lazystack_then_write[50-img_shape0-small] 0.5332ms 0.2771ms 3.6087 KOps/s 3.5925 KOps/s $\color{#35bf28}+0.45\%$
test_collector_lazystack_then_write[100-img_shape1-atari] 1.7332ms 1.5197ms 658.0431 Ops/s 654.3500 Ops/s $\color{#35bf28}+0.56\%$
test_collector_lazystack_then_write[100-img_shape2-large_img] 2.5476ms 2.3934ms 417.8192 Ops/s 417.1555 Ops/s $\color{#35bf28}+0.16\%$
test_collector_lazystack_then_write[200-img_shape3-large_batch] 3.2938ms 3.1023ms 322.3458 Ops/s 319.4007 Ops/s $\color{#35bf28}+0.92\%$
test_collector_without_rb[100-img_shape0-atari] 33.5582ms 33.1597ms 30.1571 Ops/s 29.8698 Ops/s $\color{#35bf28}+0.96\%$
test_collector_without_rb[200-img_shape1-large_batch] 65.3875ms 65.1449ms 15.3504 Ops/s 15.3130 Ops/s $\color{#35bf28}+0.24\%$
test_collector_with_rb[100-img_shape0-atari] 38.4318ms 37.8334ms 26.4317 Ops/s 26.2825 Ops/s $\color{#35bf28}+0.57\%$
test_collector_with_rb[200-img_shape1-large_batch] 74.3119ms 73.8096ms 13.5484 Ops/s 13.0676 Ops/s $\color{#35bf28}+3.68\%$
test_collector_without_rb_cuda[100-img_shape0-atari] 56.4042ms 56.1888ms 17.7971 Ops/s 17.2663 Ops/s $\color{#35bf28}+3.07\%$
test_collector_without_rb_cuda[200-img_shape1-large_batch] 0.1156s 0.1129s 8.8606 Ops/s 8.6845 Ops/s $\color{#35bf28}+2.03\%$
test_collector_with_rb_cuda[100-img_shape0-atari] 60.5848ms 58.8633ms 16.9885 Ops/s 16.9433 Ops/s $\color{#35bf28}+0.27\%$
test_collector_with_rb_cuda[200-img_shape1-large_batch] 0.1198s 0.1179s 8.4847 Ops/s 8.5387 Ops/s $\color{#d91a1a}-0.63\%$
