
Emit v1.0 schema bundles from benchmark scripts #5842

Draft
AntoineRichard wants to merge 2 commits into isaac-sim:develop from AntoineRichard:antoiner/feat/benchmark-scripts-v1


@AntoineRichard
Collaborator

Description

Stacked on #5840 (isaaclab.benchmark.schema). This PR's diff includes #5840's commit until that lands; once #5840 merges, I'll rebase and the diff will shrink to just the script changes. Marked Draft for that reason — please review the schema PR first, then this one.

Wire the three standalone benchmark scripts under scripts/benchmarks/ to emit self-contained JSON bundles conforming to the v1.0 schema added in #5840 (isaaclab.benchmark.schema).

What's in here

Scripts (opt-in via new --schema_v1_output <path> flag):

  • benchmark_startup.py writes a StartupBundle with per-phase cProfile top-N data and total durations.
  • benchmark_rsl_rl.py writes a TrainingBundle with run identity, captured Versions / Hardware, aggregated Runtime + Resources, and EMA-smoothed Learning curves. EMA factor is configurable via --ema_alpha (default 0.05); --no_series drops per-iteration curves and keeps only the final_raw + final_ema scalars.
  • benchmark_skrl.py is new: the SKRL-framework counterpart that emits the same TrainingBundle with framework: "skrl". Pairs with a small skrl_benchmark_trainer.PerIterRewardTrainer subclass that exposes per-iteration reward and episode-length values to the script without patching upstream skrl.
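A minimal sketch of the EMA smoothing and the --no_series behaviour, assuming the common convention ema_t = alpha * x_t + (1 - alpha) * ema_{t-1} seeded with the first raw value. The function names and the "raw" / "ema" curve keys are hypothetical; only the final_raw / final_ema scalars and the 0.05 default come from this PR, and this is not the script's actual _compute_ema.

```python
from typing import Optional


def compute_ema(values: list[float], alpha: float = 0.05) -> list[float]:
    """Exponential moving average; the first sample seeds the average (assumed convention)."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError(f"alpha must be in (0, 1], got {alpha}")
    ema: list[float] = []
    current: Optional[float] = None
    for x in values:
        current = x if current is None else alpha * x + (1.0 - alpha) * current
        ema.append(current)
    return ema


def summarise_curve(values: list[float], alpha: float = 0.05, no_series: bool = False) -> dict:
    """Always return final_raw / final_ema; add the full curves unless no_series is set."""
    ema = compute_ema(values, alpha)
    summary = {
        "final_raw": values[-1] if values else None,
        "final_ema": ema[-1] if ema else None,
    }
    if not no_series:
        # Hypothetical key names for the per-iteration curves.
        summary["raw"] = list(values)
        summary["ema"] = ema
    return summary
```

With the default alpha of 0.05, each new iteration contributes 5% to the smoothed value, so the curve reacts slowly to noisy per-iteration rewards; a larger alpha tracks the raw curve more closely.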

The legacy per-backend output format remains the default when --schema_v1_output is omitted, so existing invocations and CI keep working unchanged.
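The opt-in switch can be sketched with argparse. Only the flag names --schema_v1_output, --ema_alpha (default 0.05) and --no_series come from this PR; build_parser and select_output are hypothetical. Because the new flag defaults to None, omitting it leaves the legacy path untouched, and calling parse_args on an explicit list is one way a parse-only test can exercise the CLI without launching Isaac Sim.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Benchmark output flags (hypothetical sketch).")
    parser.add_argument("--schema_v1_output", type=str, default=None,
                        help="Path of the v1.0 JSON bundle; omit to keep the legacy output.")
    parser.add_argument("--ema_alpha", type=float, default=0.05,
                        help="Smoothing factor for the EMA learning curves.")
    parser.add_argument("--no_series", action="store_true",
                        help="Drop per-iteration curves and keep only the final scalars.")
    return parser


def select_output(args: argparse.Namespace) -> str:
    # The legacy format stays the default; v1.0 output is strictly opt-in.
    return "legacy" if args.schema_v1_output is None else "schema_v1"
```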

Shared helpers:

  • scripts/benchmarks/_action_sampling.py — single-agent + multi-agent action sampling for the benchmark's first-step phase. Multi-agent envs expose action_spaces (dict); single-agent envs expose single_action_space. The helper picks the right shape.
  • scripts/benchmarks/_schema_helpers.py — builds Versions / Hardware from the recorder metadata and synthesises a fallback run_id of the form <framework>_<backend>_<task>_<YYYYMMDD-HHMMSS>_seed<seed>.
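The fallback run_id format above can be reproduced with the standard library. This is a sketch only: fallback_run_id is a hypothetical name, the example framework/backend/task values are illustrative, and whether the real helper stamps local time or UTC is not stated (this version uses local time unless a timestamp is passed in).

```python
from datetime import datetime
from typing import Optional


def fallback_run_id(framework: str, backend: str, task: str, seed: int,
                    now: Optional[datetime] = None) -> str:
    """Build <framework>_<backend>_<task>_<YYYYMMDD-HHMMSS>_seed<seed>."""
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    return f"{framework}_{backend}_{task}_{stamp}_seed{seed}"
```

Accepting an explicit timestamp keeps the function deterministic, which makes it easy to unit-test.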

Other changes:

  • scripts/benchmarks/utils.parse_cprofile_stats now returns a 4-tuple (function_label, tottime_ms, cumtime_ms, ncalls) instead of a 3-tuple, exposing the primitive call count from pstats so the schema's CProfileFunction.calls field can be populated. Whitelist placeholder rows carry ncalls=0.
  • scripts/benchmarks/startup_whitelist.yaml reworked to track the IsaacLab v3 configclass / cloner / scene-init call paths. Adds an explicit task_config phase entry; python_imports and first_step intentionally fall through to top_n (documented in file comments).
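A sketch of producing 4-tuple rows from pstats. It relies on pstats.Stats.stats, which maps (filename, lineno, funcname) to (primitive_calls, total_calls, tottime, cumtime, callers); the function name top_n_functions, the label format and sorting by cumulative time are assumptions, not the actual parse_cprofile_stats.

```python
import cProfile
import pstats


def top_n_functions(profiler: cProfile.Profile, n: int = 10) -> list[tuple[str, float, float, int]]:
    """Return (function_label, tottime_ms, cumtime_ms, ncalls) rows, highest cumtime first."""
    stats = pstats.Stats(profiler)
    rows = []
    for (filename, lineno, funcname), (prim_calls, _total, tottime, cumtime, _callers) in stats.stats.items():
        label = f"{filename}:{lineno}({funcname})"
        # ncalls is the primitive (non-recursive) call count; times are converted from s to ms.
        rows.append((label, tottime * 1000.0, cumtime * 1000.0, prim_calls))
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows[:n]
```

Callers that previously unpacked three values need a fourth name (for example `label, tot_ms, cum_ms, ncalls = row`).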

Tests:

  • scripts/benchmarks/tests/ covering: action-sampling shape for single-agent and multi-agent envs; CLI surface tests for benchmark_rsl_rl.py and benchmark_skrl.py (parse-only, no Isaac Sim launch); skrl_benchmark_trainer reward/ep-length collection.
  • source/isaaclab/test/benchmark/test_parse_cprofile_stats.py for the ncalls extension to utils.parse_cprofile_stats.

Docs:

  • docs/source/features/benchmarking.rst — invocation examples per script, v1.0 schema summary, and CLI-flag reference. Wired into the Features TOC in docs/index.rst.

Compatibility

  • All new behavior is opt-in via the new --schema_v1_output flag. Pre-existing CI and ad-hoc invocations are byte-identical without it.
  • parse_cprofile_stats is a private helper (scripts/benchmarks/utils.py) and only used by the benchmark scripts themselves, so the 3→4-tuple change has no external callers.

Fixes # (no issue)

Type of change

  • New feature (non-breaking change which adds functionality)
  • Documentation update

Screenshots

N/A — JSON-emitter additions.

Checklist

  • I have read and understood the contribution guidelines
  • I have run the `pre-commit` checks with `./isaaclab.sh --format`
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I have added a changelog fragment under `source//changelog.d/` for every touched package
  • I have added my name to the `CONTRIBUTORS.md` or my name already exists there

Promote the JSON bundle schema produced by the standalone benchmark
scripts under scripts/benchmarks/ into a real public-API module,
isaaclab.benchmark.schema. Until now there was no single place in
lab that defined the shape of training.json / startup.json, even
though three lab scripts emit it and downstream tooling (e.g. the
in-tree Odin evaluation harness) is starting to consume it.

The module ships frozen dataclasses for TrainingBundle, StartupBundle,
and all their building blocks, plus a small write_bundle_file helper
that serialises any dataclass tree as schema-v1 JSON. The package
__init__ re-exports the public surface so callers can write
`from isaaclab.benchmark import TrainingBundle`.
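The "frozen dataclasses plus one serialiser" design can be sketched in a few lines. Versions, DemoBundle and their fields (including schema_version) are hypothetical stand-ins, not the actual isaaclab.benchmark.schema definitions; dataclasses.asdict recurses into nested dataclasses, which is what lets a single helper serialise any bundle tree.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class Versions:
    # Hypothetical fields; the real Versions dataclass may differ.
    isaaclab: str
    python: str


@dataclass(frozen=True)
class DemoBundle:
    # Hypothetical stand-in for TrainingBundle / StartupBundle.
    run_id: str
    versions: Versions
    phases_s: dict[str, float] = field(default_factory=dict)
    schema_version: str = "1.0"


def bundle_to_json(bundle: object) -> str:
    """Serialise any dataclass tree; asdict() recurses into nested dataclasses."""
    return json.dumps(asdict(bundle), indent=2, sort_keys=True)
```

Note that frozen=True blocks attribute reassignment, but a dict field such as phases_s can still be mutated in place.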

This commit also extends GPUInfoRecorder and MemoryInfoRecorder to
report per-device peak alongside the existing mean/std rows. The
peak rows are always emitted (initialised to 0.0) so dashboards see
a consistent key set regardless of whether any sample was recorded.
Existing rows are unchanged.
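A toy recorder illustrating the "always emit peak" rule. The class name, the "<device>_peak" key format and the choice to skip mean/std for a device with no samples are hypothetical; only the idea that peak rows start at 0.0 and are always present comes from the commit message above.

```python
import statistics


class PeakTrackingRecorder:
    """Per-device mean/std plus a peak that always exists (hypothetical sketch)."""

    def __init__(self, devices: list[str]):
        self._samples: dict[str, list[float]] = {d: [] for d in devices}
        self._peak: dict[str, float] = {d: 0.0 for d in devices}  # initialised to 0.0

    def record(self, device: str, value: float) -> None:
        self._samples[device].append(value)
        self._peak[device] = max(self._peak[device], value)

    def rows(self) -> dict[str, float]:
        out: dict[str, float] = {}
        for device, samples in self._samples.items():
            if samples:
                out[f"{device}_mean"] = statistics.fmean(samples)
                out[f"{device}_std"] = statistics.pstdev(samples)
            # The peak key is emitted even with zero samples, so the key set stays stable.
            out[f"{device}_peak"] = self._peak[device]
        return out
```

Starting the peak at 0.0 is safe for memory and utilisation, which are never negative; a metric that can go negative would need a different sentinel.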

The benchmark scripts themselves continue to use the legacy output
format on develop today; a follow-up PR rewrites them to emit
schema-v1 bundles directly via this module.

Wire the three standalone benchmark scripts under scripts/benchmarks/
to emit self-contained JSON bundles conforming to the v1.0 schema
added in the previous commit (isaaclab.benchmark.schema):

- benchmark_startup.py now optionally writes a StartupBundle to the
  path given by --schema_v1_output, with per-phase cProfile top-N
  data and total durations.
- benchmark_rsl_rl.py now optionally writes a TrainingBundle with
  the run identity, captured versions/hardware, aggregated runtime
  and resource metrics, and EMA-smoothed reward / episode-length
  curves. The EMA factor is configurable via --ema_alpha; --no_series
  drops the full per-iteration curves and keeps only the scalars.
- benchmark_skrl.py is new: a SKRL-framework counterpart that emits
  the same TrainingBundle with framework set to "skrl". Pairs with a
  small skrl_benchmark_trainer subclass that exposes per-iteration
  reward / episode-length values to the script without touching
  upstream skrl.

The legacy per-backend output format remains the default when
--schema_v1_output is omitted, so existing CI and ad-hoc invocations
keep working unchanged.

Shared helpers (_action_sampling.sample_random_actions to keep
single-agent + multi-agent benchmark startup working, _schema_helpers
to build Versions/Hardware from the recorder metadata and synthesise
a fallback run_id) live alongside the scripts.
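A dependency-free toy version of the single-agent / multi-agent dispatch. sample_actions_sketch is a hypothetical name, not the real _action_sampling.sample_random_actions: it returns plain Python lists rather than torch tensors on the env device, and it only assumes the env has num_envs and each space has a sample() method.

```python
from typing import Any


def sample_actions_sketch(env: Any) -> Any:
    """Sample one batch of actions for a single- or multi-agent env (hypothetical sketch)."""
    num_envs = env.num_envs
    if hasattr(env, "action_spaces"):
        # Multi-agent: one batch per agent, keyed by agent name.
        return {agent: [space.sample() for _ in range(num_envs)]
                for agent, space in env.action_spaces.items()}
    # Single-agent: one batch drawn from the per-env action space.
    space = env.single_action_space
    return [space.sample() for _ in range(num_envs)]
```

Because each agent's space is sampled independently, heterogeneous action dimensions across agents need no special handling.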

utils.parse_cprofile_stats now returns ncalls as a fourth tuple
element so the schema's CProfileFunction.calls field can be populated.

Updated startup_whitelist.yaml to track the IsaacLab v3 configclass /
cloner / scene-init call paths and explicitly fall through to top_n
for python_imports and first_step (per file comments).
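The whitelist-then-top_n fallthrough can be sketched as follows. The function name, exact-label matching (the real YAML may use patterns) and zero tottime/cumtime in placeholder rows are assumptions; the source only states that placeholder rows carry ncalls=0 and that phases without a whitelist entry fall through to top_n.

```python
def select_phase_rows(phase: str,
                      profiled_rows: list[tuple[str, float, float, int]],
                      whitelist: dict[str, list[str]],
                      top_n: int = 5) -> list[tuple[str, float, float, int]]:
    """Pick (function_label, tottime_ms, cumtime_ms, ncalls) rows for one startup phase."""
    wanted = whitelist.get(phase)
    if not wanted:
        # No whitelist entry (e.g. python_imports, first_step): fall through to top-N by cumtime.
        return sorted(profiled_rows, key=lambda r: r[2], reverse=True)[:top_n]
    by_label = {row[0]: row for row in profiled_rows}
    # Whitelisted functions that never ran become placeholder rows with ncalls=0.
    return [by_label.get(label, (label, 0.0, 0.0, 0)) for label in wanted]
```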

Added scripts/benchmarks/tests/ covering the new helpers and CLI
surfaces, plus source/isaaclab/test/benchmark/test_parse_cprofile_stats.py
for the ncalls extension. Added docs/source/features/benchmarking.rst
documenting the scripts and the schema.
@github-actions (bot) added the documentation and isaac-lab labels on May 28, 2026

@isaaclab-review-bot (bot) left a comment

🤖 Isaac Lab Review Bot

Summary

This PR introduces a comprehensive v1.0 JSON schema (isaaclab.benchmark.schema) for benchmark bundles and wires the three standalone benchmark scripts (benchmark_startup.py, benchmark_rsl_rl.py, and the new benchmark_skrl.py) to emit self-contained JSON bundles via an opt-in --schema_v1_output flag. The implementation is well-structured with frozen dataclasses, proper helper modules, and extensive test coverage.

Findings

🔵 Suggestion: scripts/benchmarks/_action_sampling.py:47-52
The multi-agent action sampling creates numpy arrays via list comprehension and stacks, then converts to torch. For large num_envs × number of agents, this could be optimized by sampling directly into a pre-allocated tensor using torch.rand with the appropriate bounds, avoiding the numpy intermediate allocation.

🔵 Suggestion: scripts/benchmarks/benchmark_rsl_rl.py:269-276
The _compute_ema() function is duplicated verbatim in benchmark_skrl.py. Consider extracting this into _schema_helpers.py to keep these two training bundle emitters DRY.

🔵 Suggestion: scripts/benchmarks/skrl_benchmark_trainer.py:90-95
The episode length tracking falls back to 0.0 when no episodes have terminated. While documented, consider whether None or float("nan") would be more semantically correct for "no data available" vs. "actual episode length of zero".
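The practical difference between the options shows up at serialisation time. In this sketch (mean_episode_length is a hypothetical helper), None becomes JSON null, whereas Python's json module writes float("nan") as the bare token NaN, which strict JSON parsers reject. Returning None would also assume the schema field is declared optional, which this PR does not state.

```python
import json
from typing import Optional


def mean_episode_length(lengths: list[float]) -> Optional[float]:
    """Mean of completed-episode lengths, or None when no episode has terminated yet."""
    return sum(lengths) / len(lengths) if lengths else None


print(json.dumps({"ep_len": mean_episode_length([])}))  # {"ep_len": null}
print(json.dumps({"ep_len": float("nan")}))             # {"ep_len": NaN}  (not valid strict JSON)
```

Passing allow_nan=False to json.dumps turns the NaN case into a ValueError instead of silently emitting non-standard JSON.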

🔵 Suggestion: source/isaaclab/isaaclab/benchmark/schema.py:249
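write_bundle_file creates the parent directory via os.path.dirname(os.path.abspath(path)) or ".". Because os.path.abspath always returns an absolute path, dirname never returns "" here (for a relative path like "output.json" it yields the current working directory), so the or "." fallback is effectively unreachable; consider dropping it or documenting why it is kept.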
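The edge case is easy to check with the standard library: os.path.dirname returns "" for a bare filename such as "output.json" only when os.path.abspath has not been applied first. The ensure_parent_dir helper below is a hypothetical illustration, not the schema module's code.

```python
import os


def ensure_parent_dir(path: str) -> str:
    """Create (if needed) and return the directory that will contain ``path``."""
    parent = os.path.dirname(os.path.abspath(path))
    # exist_ok=True makes repeated writes into the same directory safe.
    os.makedirs(parent, exist_ok=True)
    return parent


print(repr(os.path.dirname("output.json")))                   # ''
print(os.path.dirname(os.path.abspath("output.json")) != "")  # True
```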

🟡 Warning: scripts/benchmarks/benchmark_rsl_rl.py:595-598
The first_step_s proxy uses the first iteration's collection + learning time from rl_training_times. If rl_training_times has fewer than expected entries (e.g., early termination), the contextlib.suppress(IndexError, KeyError, ValueError) silently falls back to 0.0. This is safe but may mask real issues; consider logging when this fallback is hit.
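One way to act on this warning is to replace contextlib.suppress with an explicit except that logs before falling back. The dictionary keys, logger name and function name are hypothetical; only the 0.0 fallback and the three exception types come from the comment above.

```python
import logging

logger = logging.getLogger("benchmark_sketch")


def first_step_seconds(rl_training_times: dict[str, list[float]]) -> float:
    """First-iteration collection + learning time, logging when the proxy is unavailable."""
    try:
        collection = float(rl_training_times["collection_time"][0])
        learning = float(rl_training_times["learning_time"][0])
        return collection + learning
    except (IndexError, KeyError, ValueError) as exc:
        # Same 0.0 fallback as before, but the cause is now visible in the logs.
        logger.warning("first_step_s proxy unavailable (%r); falling back to 0.0", exc)
        return 0.0
```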

Test Coverage

Excellent test coverage. The PR includes:

  • Unit tests for sample_random_actions covering single-agent, multi-agent, heterogeneous action dims, and device placement
  • CLI surface tests for both RSL-RL and SKRL scripts (parse-only, no Isaac Sim)
  • BenchmarkTrainer unit tests with fake env/agent covering timing, reward tracking, multi-env vs single-env reset behavior
  • parse_cprofile_stats tests validating the new ncalls 4-tuple return
  • Recorder tests for peak memory/utilization tracking

Verdict

Minor suggestions only. This is a well-designed, non-breaking feature addition with proper opt-in behavior. The schema design is clean (frozen dataclasses, clear separation of concerns), backward compatibility is maintained (legacy output when --schema_v1_output is omitted), and the test coverage is thorough. Ready to merge once CI passes.
