Emit v1.0 schema bundles from benchmark scripts by AntoineRichard · Pull Request #5842 · isaac-sim/IsaacLab

AntoineRichard · 2026-05-28T15:49:54Z

Description

Stacked on #5840 (isaaclab.benchmark.schema). This PR's diff includes #5840's commit until that lands; once #5840 merges, I'll rebase and the diff will shrink to just the script changes. Marked Draft for that reason — please review the schema PR first, then this one.

Wire the three standalone benchmark scripts under scripts/benchmarks/ to emit self-contained JSON bundles conforming to the v1.0 schema added in #5840 (isaaclab.benchmark.schema).

What's in here

Scripts (opt-in via new --schema_v1_output <path> flag):

benchmark_startup.py writes a StartupBundle with per-phase cProfile top-N data and total durations.
benchmark_rsl_rl.py writes a TrainingBundle with run identity, captured Versions / Hardware, aggregated Runtime + Resources, and EMA-smoothed Learning curves. EMA factor is configurable via --ema_alpha (default 0.05); --no_series drops per-iteration curves and keeps only the final_raw + final_ema scalars.
benchmark_skrl.py is new: the SKRL-framework counterpart that emits the same TrainingBundle with framework: "skrl". It pairs with a small skrl_benchmark_trainer.PerIterRewardTrainer subclass that exposes per-iteration reward and episode-length values to the script without patching upstream skrl.

The legacy per-backend output format remains the default when --schema_v1_output is omitted, so existing invocations and CI keep working unchanged.
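
As a rough illustration of the opt-in surface described above, the flags could be declared with argparse along these lines. This is a sketch, not the scripts' actual parser: only the flag names and the 0.05 default come from this description, while `build_parser`, `wants_schema_v1`, the help strings, and treating `--no_series` as a boolean switch are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags named in the PR description."""
    parser = argparse.ArgumentParser(description="Benchmark CLI sketch")
    # Opt-in: when this flag is omitted, the legacy output format stays in effect.
    parser.add_argument("--schema_v1_output", type=str, default=None,
                        help="Path of the v1.0 JSON bundle to write (assumed help text).")
    # EMA smoothing factor for the learning curves; 0.05 is the stated default.
    parser.add_argument("--ema_alpha", type=float, default=0.05)
    # Assumed to be a plain switch: drop per-iteration series, keep final scalars.
    parser.add_argument("--no_series", action="store_true")
    return parser


def wants_schema_v1(args: argparse.Namespace) -> bool:
    """Return True only when the caller explicitly asked for a v1.0 bundle."""
    return args.schema_v1_output is not None
```

With no arguments, `wants_schema_v1` returns False, which corresponds to the unchanged legacy path.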

Shared helpers:

scripts/benchmarks/_action_sampling.py — single-agent + multi-agent action sampling for the benchmark's first-step phase. Multi-agent envs expose action_spaces (dict); single-agent envs expose single_action_space. The helper picks the right shape.
scripts/benchmarks/_schema_helpers.py — builds Versions / Hardware from the recorder metadata and synthesises a fallback run_id of the form <framework>_<backend>_<task>_<YYYYMMDD-HHMMSS>_seed<seed>.
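
The fallback run_id format above can be sketched as follows. The function name, its signature, and the use of local time are assumptions for illustration; the real helper in _schema_helpers.py may differ.

```python
from datetime import datetime
from typing import Optional


def fallback_run_id(framework: str, backend: str, task: str, seed: int,
                    now: Optional[datetime] = None) -> str:
    """Build <framework>_<backend>_<task>_<YYYYMMDD-HHMMSS>_seed<seed>.

    Hypothetical helper: `now` is injectable so the result is reproducible in tests.
    """
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    return f"{framework}_{backend}_{task}_{stamp}_seed{seed}"
```

Because the timestamp has one-second resolution, two runs with the same framework, backend, task, and seed started in the same second would get the same id under this sketch.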

Other changes:

scripts/benchmarks/utils.parse_cprofile_stats now returns a 4-tuple (function_label, tottime_ms, cumtime_ms, ncalls) instead of a 3-tuple, exposing the primitive call count from pstats so the schema's CProfileFunction.calls field can be populated. Whitelist placeholder rows carry ncalls=0.
scripts/benchmarks/startup_whitelist.yaml reworked to track the IsaacLab v3 configclass / cloner / scene-init call paths. Adds an explicit task_config phase entry; python_imports and first_step intentionally fall through to top_n (documented in file comments).
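
For readers unfamiliar with pstats, the 4-tuple shape can be sketched like this. It is not the actual utils.parse_cprofile_stats: the function names, the label format, and sorting by cumulative time are assumptions. It relies only on the standard `pstats.Stats.stats` mapping, whose values are `(primitive_calls, total_calls, tottime, cumtime, callers)`.

```python
import cProfile
import pstats
from typing import Callable, List, Tuple

Row = Tuple[str, float, float, int]  # (function_label, tottime_ms, cumtime_ms, ncalls)


def profile_call(fn: Callable[[], object]) -> cProfile.Profile:
    """Run fn under cProfile and return the finished profile."""
    profile = cProfile.Profile()
    profile.enable()
    fn()
    profile.disable()
    return profile


def top_functions(profile: cProfile.Profile, top_n: int = 5) -> List[Row]:
    """Return the top_n functions by cumulative time as 4-tuples."""
    stats = pstats.Stats(profile)
    rows: List[Row] = []
    for (filename, lineno, name), (prim_calls, _total, tottime, cumtime, _callers) in stats.stats.items():
        label = f"{filename}:{lineno}({name})"  # assumed label format
        # pstats reports seconds; convert to milliseconds. ncalls is the primitive count.
        rows.append((label, tottime * 1000.0, cumtime * 1000.0, prim_calls))
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows[:top_n]
```

Any caller that unpacked three elements per row would need updating for the fourth, which is why the tuple change is called out under Compatibility.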

Tests:

scripts/benchmarks/tests/ covering: action-sampling shape for single-agent and multi-agent envs; CLI surface tests for benchmark_rsl_rl.py and benchmark_skrl.py (parse-only, no Isaac Sim launch); skrl_benchmark_trainer reward/ep-length collection.
source/isaaclab/test/benchmark/test_parse_cprofile_stats.py for the ncalls extension to utils.parse_cprofile_stats.

Docs:

docs/source/features/benchmarking.rst — invocation examples per script, v1.0 schema summary, and CLI-flag reference. Wired into the Features TOC in docs/index.rst.

Compatibility

All new behavior is opt-in via the new --schema_v1_output flag. Pre-existing CI and ad-hoc invocations are byte-identical without it.
parse_cprofile_stats is a private helper (scripts/benchmarks/utils.py) and only used by the benchmark scripts themselves, so the 3→4-tuple change has no external callers.

Fixes # (no issue)

Type of change

New feature (non-breaking change which adds functionality)
Documentation update

Screenshots

N/A — JSON-emitter additions.

Checklist

I have read and understood the contribution guidelines
I have run the `pre-commit` checks with `./isaaclab.sh --format`
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
I have added a changelog fragment under `source//changelog.d/` for every touched package
I have added my name to the `CONTRIBUTORS.md` or my name already exists there

Promote the JSON bundle schema produced by the standalone benchmark scripts under scripts/benchmarks/ into a real public-API module, isaaclab.benchmark.schema. Until now there was no single place in lab that defined the shape of training.json / startup.json, even though three lab scripts emit it and downstream tooling (e.g. the in-tree Odin evaluation harness) is starting to consume it.

The module ships frozen dataclasses for TrainingBundle, StartupBundle, and all their building blocks, plus a small write_bundle_file helper that serialises any dataclass tree as schema-v1 JSON. The package __init__ re-exports the public surface so callers can write `from isaaclab.benchmark import TrainingBundle`.

This commit also extends GPUInfoRecorder and MemoryInfoRecorder to report per-device peak alongside the existing mean/std rows. The peak rows are always emitted (initialised to 0.0) so dashboards see a consistent key set regardless of whether any sample was recorded. Existing rows are unchanged.

The benchmark scripts themselves continue to use the legacy output format on develop today; a follow-up PR rewrites them to emit schema-v1 bundles directly via this module.
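
The frozen-dataclass-plus-writer pattern described in this commit message might look roughly like the sketch below. All class, field, and function names here are hypothetical stand-ins, deliberately not the real isaaclab.benchmark.schema definitions; the sketch only assumes that a dataclass tree is flattened with `dataclasses.asdict` and dumped as JSON.

```python
import dataclasses
import json
from pathlib import Path


@dataclasses.dataclass(frozen=True)
class SketchVersions:
    """Hypothetical building block; the real Versions fields are not shown in this PR text."""
    python: str


@dataclasses.dataclass(frozen=True)
class SketchTrainingBundle:
    """Hypothetical, heavily reduced stand-in for a training bundle."""
    schema_version: str
    run_id: str
    framework: str
    versions: SketchVersions


def write_bundle_sketch(bundle, path: str) -> None:
    """Serialise any dataclass tree to JSON, creating the parent directory first."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    with target.open("w", encoding="utf-8") as handle:
        # asdict recurses into nested dataclasses, so the whole tree becomes plain dicts.
        json.dump(dataclasses.asdict(bundle), handle, indent=2, sort_keys=True)
```

Freezing the dataclasses means a bundle cannot be mutated after construction; assigning to a field raises `dataclasses.FrozenInstanceError`.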

Wire the three standalone benchmark scripts under scripts/benchmarks/ to emit self-contained JSON bundles conforming to the v1.0 schema added in the previous commit (isaaclab.benchmark.schema):

- benchmark_startup.py now optionally writes a StartupBundle to the path given by --schema_v1_output, with per-phase cProfile top-N data and total durations.
- benchmark_rsl_rl.py now optionally writes a TrainingBundle with the run identity, captured versions/hardware, aggregated runtime and resource metrics, and EMA-smoothed reward / episode-length curves. The EMA factor is configurable via --ema_alpha; --no_series drops the full per-iteration curves and keeps only the scalars.
- benchmark_skrl.py is new: a SKRL-framework counterpart that emits the same TrainingBundle with framework set to "skrl". Pairs with a small skrl_benchmark_trainer subclass that exposes per-iteration reward / episode-length values to the script without touching upstream skrl.

The legacy per-backend output format remains the default when --schema_v1_output is omitted, so existing CI and ad-hoc invocations keep working unchanged.

Shared helpers (_action_sampling.sample_random_actions to keep single-agent + multi-agent benchmark startup working, _schema_helpers to build Versions/Hardware from the recorder metadata and synthesise a fallback run_id) live alongside the scripts.

utils.parse_cprofile_stats now returns ncalls as a fourth tuple element so the schema's CProfileFunction.calls field can be populated.

Updated startup_whitelist.yaml to track the IsaacLab v3 configclass / cloner / scene-init call paths and explicitly fall through to top_n for python_imports and first_step (per file comments).

Added scripts/benchmarks/tests/ covering the new helpers and CLI surfaces, plus source/isaaclab/test/benchmark/test_parse_cprofile_stats.py for the ncalls extension.

Added docs/source/features/benchmarking.rst documenting the scripts and the schema.

isaaclab-review-bot

🤖 Isaac Lab Review Bot

Summary

This PR introduces a comprehensive v1.0 JSON schema (isaaclab.benchmark.schema) for benchmark bundles and wires the three standalone benchmark scripts (benchmark_startup.py, benchmark_rsl_rl.py, and the new benchmark_skrl.py) to emit self-contained JSON bundles via an opt-in --schema_v1_output flag. The implementation is well-structured with frozen dataclasses, proper helper modules, and extensive test coverage.

Findings

🔵 Suggestion — scripts/benchmarks/_action_sampling.py:47-52
The multi-agent action sampling creates numpy arrays via list comprehension and stacks, then converts to torch. For large num_envs × number of agents, this could be optimized by sampling directly into a pre-allocated tensor using torch.rand with the appropriate bounds, avoiding the numpy intermediate allocation.

🔵 Suggestion — scripts/benchmarks/benchmark_rsl_rl.py:269-276
The _compute_ema() function is duplicated verbatim in benchmark_skrl.py. Consider extracting this into _schema_helpers.py to keep these two training bundle emitters DRY.
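
A shared helper of the kind suggested here could look like the following sketch. It is not the PR's `_compute_ema()`: seeding the average with the first sample, and the convention that alpha weights the newest sample, are assumptions, as are the names `compute_ema` and `summarise_curve`.

```python
from typing import Dict, List, Optional, Sequence


def compute_ema(values: Sequence[float], alpha: float = 0.05) -> List[float]:
    """Exponential moving average, seeded with the first sample (assumed convention)."""
    smoothed: List[float] = []
    ema: Optional[float] = None
    for value in values:
        # alpha weights the newest sample; (1 - alpha) keeps the running average.
        ema = value if ema is None else alpha * value + (1.0 - alpha) * ema
        smoothed.append(ema)
    return smoothed


def summarise_curve(values: Sequence[float], alpha: float = 0.05,
                    keep_series: bool = True) -> Dict[str, object]:
    """Return final scalars, plus the full series unless keep_series is False."""
    ema = compute_ema(values, alpha)
    summary: Dict[str, object] = {
        "final_raw": values[-1] if values else None,
        "final_ema": ema[-1] if ema else None,
    }
    if keep_series:
        summary["raw"] = list(values)
        summary["ema"] = ema
    return summary
```

Placing one such function in _schema_helpers.py would let both training scripts import it instead of carrying their own copies.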

🔵 Suggestion — scripts/benchmarks/skrl_benchmark_trainer.py:90-95
The episode length tracking falls back to 0.0 when no episodes have terminated. While documented, consider whether None or float("nan") would be more semantically correct for "no data available" vs. "actual episode length of zero".
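
One point to weigh when choosing between the two: Python's `json.dumps` writes `float("nan")` as the bare token `NaN`, which strict JSON parsers reject, whereas `None` becomes `null`. A minimal sketch of a "no data" fallback that stays JSON-safe, with hypothetical function names:

```python
import json
import math
from typing import Optional, Sequence


def mean_episode_length(finished_lengths: Sequence[float]) -> float:
    """Mean length of finished episodes, or NaN when none have terminated yet."""
    if not finished_lengths:
        return float("nan")
    return sum(finished_lengths) / len(finished_lengths)


def to_json_value(value: float) -> Optional[float]:
    """Map NaN to None so the serialised bundle contains null rather than NaN."""
    return None if math.isnan(value) else value


def dump_episode_length(finished_lengths: Sequence[float]) -> str:
    """Serialise the mean episode length as a small JSON object."""
    return json.dumps({"episode_length": to_json_value(mean_episode_length(finished_lengths))})
```

Whether the v1.0 schema allows a nullable episode length is not stated in this PR, so this is only one possible direction.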

🔵 Suggestion — source/isaaclab/isaaclab/benchmark/schema.py:249
write_bundle_file creates the parent directory using os.path.dirname(os.path.abspath(path)) or ".". os.path.dirname on its own returns "" for a bare relative path like "output.json"; the or "." guards against an empty result, but consider documenting this edge case.
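
The path behaviour in question can be checked with the standard library alone. In the sketch below, `parent_dir` is a hypothetical name that reproduces the expression quoted in the finding; note that once `os.path.abspath` has been applied, `os.path.dirname` yields an absolute directory for a bare filename, so the `or "."` appears to be a purely defensive guard.

```python
import os


def parent_dir(path: str) -> str:
    """Directory that must exist before `path` can be written (expression from the finding)."""
    return os.path.dirname(os.path.abspath(path)) or "."


def bare_dirname(path: str) -> str:
    """os.path.dirname without abspath, for comparison."""
    return os.path.dirname(path)
```

For "output.json", `bare_dirname` returns an empty string, while `parent_dir` returns an absolute path based on the current working directory.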

🟡 Warning — scripts/benchmarks/benchmark_rsl_rl.py:595-598
The first_step_s proxy uses the first iteration's collection + learning time from rl_training_times. If rl_training_times has fewer than expected entries (e.g., early termination), the contextlib.suppress(IndexError, KeyError, ValueError) silently falls back to 0.0. This is safe but may mask real issues; consider logging when this fallback is hit.
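
A logged fallback along the lines suggested could be sketched as below. The structure of rl_training_times is not shown in this PR text, so the list-of-dicts layout, the key names, and the function and logger names are all assumptions made only to keep the example runnable.

```python
import logging
from typing import Mapping, Sequence

logger = logging.getLogger("benchmark_sketch")  # hypothetical logger name


def first_step_seconds(rl_training_times: Sequence[Mapping[str, object]]) -> float:
    """Collection + learning time of the first iteration, or 0.0 with a warning."""
    try:
        first = rl_training_times[0]
        # Key names are illustrative; the real recorder layout may differ.
        return float(first["collection_time"]) + float(first["learning_time"])
    except (IndexError, KeyError, ValueError) as exc:
        # Same exception set as the suppress() in the finding, but the fallback is visible.
        logger.warning("first_step_s fallback to 0.0: %s: %s", type(exc).__name__, exc)
        return 0.0
```

The return value matches the silent version, so downstream consumers see no difference; only the log gains a trace of why 0.0 was emitted.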

Test Coverage

Excellent test coverage. The PR includes:

Unit tests for sample_random_actions covering single-agent, multi-agent, heterogeneous action dims, and device placement
CLI surface tests for both RSL-RL and SKRL scripts (parse-only, no Isaac Sim)
BenchmarkTrainer unit tests with fake env/agent covering timing, reward tracking, multi-env vs single-env reset behavior
parse_cprofile_stats tests validating the new ncalls 4-tuple return
Recorder tests for peak memory/utilization tracking

Verdict

Minor suggestions only — This is a well-designed, non-breaking feature addition with proper opt-in behavior. The schema design is clean (frozen dataclasses, clear separation of concerns), the backward compatibility is maintained (legacy output when --schema_v1_output is omitted), and the test coverage is thorough. Ready to merge once CI passes.

AntoineRichard added 2 commits May 28, 2026 17:32

github-actions Bot added the documentation and isaac-lab labels May 28, 2026

isaaclab-review-bot Bot reviewed May 28, 2026

zoctipus approved these changes Jun 2, 2026

AntoineRichard wants to merge 2 commits into isaac-sim:develop from AntoineRichard:antoiner/feat/benchmark-scripts-v1
