ci: upgrade to GPU runner via moveit_pro_ci v0.3.1, enrich test diagnostics #655

Draft
davetcoleman wants to merge 3 commits into main from ci-bisect-9.3.0-rc9-only

davetcoleman (Member) commented May 21, 2026

CI infra upgrade and test-diagnostics improvements for objectives_integration_test. Three commits, kept separate intentionally (please do not squash).


Commit 1 — enable_gpu: true on a picknik-16-amd64-gpu runner

Bumps the reusable workflow ref to PickNikRobotics/moveit_pro_ci@v0.3.1, sets enable_gpu: true, and switches the runner label from picknik-16-amd64 to picknik-16-amd64-gpu. v0.3.1 appends the CUDA suffix to the image when enable_gpu is true (moveit_pro_ci#26) — without that, v0.3.0 set the runner label but kept the non-CUDA image, so MuJoCo's EGL rendering still went through llvmpipe on CPU.

image_tag is pinned to 9.3.0-rc9 until the main-*-cuda12.6-cudnn9 images are published.

Test-diagnostics addition: src/lab_sim/test/conftest.py

Two pytest hooks (pytest_runtest_logstart, pytest_runtest_logreport) write directly to fd 2, bypassing pytest's --capture=fd. Without this, a CTest timeout kills pytest before any per-test output is flushed, leaving the CI log silent past "collected N items". Now each test prints START <nodeid> on entry and PASSED|FAILED|SKIPPED <nodeid> (<elapsed>s) on completion, so CI logs always show which objective was running and how long each one took — critical for triaging flakes and timeouts.
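
A minimal sketch of what such a conftest.py could look like, not the actual file from this PR. The helper names (`_emit`, `format_result`) and the one-decimal elapsed format are assumptions for illustration; `pytest_runtest_logstart` and `pytest_runtest_logreport` are the standard pytest hook names mentioned above.

```python
# conftest.py (sketch): per-test progress lines written straight to fd 2.
import os

_STDERR_FD = 2  # raw stderr file descriptor, as named in the description


def _emit(line: str) -> None:
    """Write one line to fd 2 without going through sys.stderr buffering."""
    os.write(_STDERR_FD, (line + "\n").encode("utf-8", errors="replace"))


def format_result(outcome: str, nodeid: str, elapsed: float) -> str:
    """Build the completion line, e.g. 'PASSED <nodeid> (<elapsed>s)'."""
    return f"{outcome.upper()} {nodeid} ({elapsed:.1f}s)"


def pytest_runtest_logstart(nodeid, location):
    # Called once per test, before its setup phase runs.
    _emit(f"START {nodeid}")


def pytest_runtest_logreport(report):
    # pytest sends one report per phase (setup, call, teardown).
    # Report the call phase, plus a setup phase that did not pass
    # (a skip or a setup error never reaches the call phase).
    if report.when == "call" or (
        report.when == "setup" and report.outcome != "passed"
    ):
        _emit(format_result(report.outcome, report.nodeid, report.duration))
```

Using `os.write` on the descriptor rather than `print(..., file=sys.stderr)` avoids Python-level buffering, so a line is handed to the OS before a timeout can kill the process.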


Commit 2 — .github/scripts/render_report.py (HTML report)

Self-contained Python script that turns pytest's objectives_integration_test.xunit.xml artifact into a single-file HTML report:

  • Groups tests by parent XML directory; row shows only the filename, group header shows the full path and is collapsible.
  • Resolves the human-readable objective name (e.g. move_flasks_to_burners.xml → Move Flasks to Burners) by reading each XML's main_tree_to_execute attribute against the local moveit_pro / moveit_pro_example_ws checkouts.
  • Filter buttons (All / Failed / Passed / Skipped), live filename search, click-to-expand failure messages.
  • No external JS/CSS dependencies — everything inlined so the report opens straight from a CI artifact download.
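
The name-resolution bullet above can be sketched roughly as follows. This is an illustration only: the function name `objective_name` and the fallback behaviour are assumptions, and it assumes main_tree_to_execute sits on the root element of the objective XML.

```python
import xml.etree.ElementTree as ET


def objective_name(xml_text: str, fallback: str) -> str:
    """Return the root element's main_tree_to_execute, or a fallback label."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        # Unreadable objective file: show the filename instead of failing.
        return fallback
    return root.get("main_tree_to_execute") or fallback


# Hypothetical objective file content, for demonstration only.
sample = '<root BTCPP_format="4" main_tree_to_execute="Move Flasks to Burners"/>'
print(objective_name(sample, "move_flasks_to_burners.xml"))
```

Falling back to the filename keeps the report usable when a checkout is missing or an objective file does not parse.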

Usage: python3 .github/scripts/render_report.py <xunit.xml> <out.html>.
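
As a rough sketch of the first processing step, the xunit file can be read with the standard library and grouped by directory. This is not the script from the PR. In particular, it assumes each testcase name carries the objective XML path as a bracketed pytest parameter id (e.g. `test_objective[some/dir/file.xml]`), which may not match the real test ids.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from pathlib import PurePosixPath


def case_status(testcase: ET.Element) -> str:
    """Map a JUnit <testcase> element to failed / skipped / passed."""
    if testcase.find("failure") is not None or testcase.find("error") is not None:
        return "failed"
    if testcase.find("skipped") is not None:
        return "skipped"
    return "passed"


def group_by_directory(xunit_text: str) -> dict:
    """Group (filename, status) pairs under their parent directory."""
    groups = defaultdict(list)
    for case in ET.fromstring(xunit_text).iter("testcase"):
        name = case.get("name", "")
        # Assumption: the parametrized id in brackets is the XML path.
        start, end = name.find("["), name.rfind("]")
        param = name[start + 1:end] if 0 <= start < end else name
        path = PurePosixPath(param)
        groups[str(path.parent)].append((path.name, case_status(case)))
    return dict(groups)
```

Each key of the returned dict would become one collapsible group header, and each tuple one table row.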


Commit 3 — redirect ROS node logs into the test_results artifact

ament_add_pytest_test does not set ROS_LOG_DIR, so launched nodes write to ~/.ros/log/<ts>/ inside the doomed CI container — those logs never get uploaded, making post-mortem of objective failures impossible. Points ROS_LOG_DIR at build/lab_sim/test_results/lab_sim/ros_logs/, which is already inside the existing test-results artifact glob, so launch.log + per-node *.log come back with each CI run.

reset_simulation_before_test relaunches the stack per-test, so each test gets its own timestamped <ts>/ subdirectory under ros_logs/.
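
The effect of the redirect can be illustrated from Python, although the PR may well set the variable elsewhere (for example in the CMake test registration). The helper name `env_with_ros_log_dir` is hypothetical; ROS_LOG_DIR itself is the environment variable named above.

```python
import os
from pathlib import Path


def env_with_ros_log_dir(log_dir, base_env=None) -> dict:
    """Return a copy of the environment with ROS_LOG_DIR pointing at log_dir."""
    env = dict(os.environ if base_env is None else base_env)
    path = Path(log_dir).resolve()
    # Create the directory up front so the artifact glob always matches.
    path.mkdir(parents=True, exist_ok=True)
    env["ROS_LOG_DIR"] = str(path)
    return env
```

A launcher would pass the returned dict as the `env` argument of `subprocess.run` (or equivalent) so that every node started by the test inherits the redirected log directory.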

github-actions Bot commented May 21, 2026

⚠️ This PR modifies 1 file(s) that also exist in PickNikRobotics/moveit_pro_empty_ws.

Consider whether the change should land upstream in moveit_pro_empty_ws first so downstream forks pick it up on the next sync.

Overlapping files
  • .github/workflows/ci.yaml

@davetcoleman davetcoleman changed the title CI bisect: does 9.3.0-rc9 + GPU runner pass with no scene changes? ci: upgrade to GPU runner via moveit_pro_ci v0.3.1, enrich test diagnostics May 21, 2026
@davetcoleman davetcoleman force-pushed the ci-bisect-9.3.0-rc9-only branch from 5efd377 to 8640403 Compare May 22, 2026 00:43
@davetcoleman
Member Author

… comment

Three CI-infra changes folded together:

1. Bump the reusable workflow ref. v0.3.1 (d490a1d) had the GPU + CUDA-suffix
   fix. v0.3.2 (90b506e, currently pre-tag) adds the test-results artifact-name
   suffix `-${{ matrix.ros_distro }}`, so the humble and jazzy jobs no longer
   both upload to the same artifact name (moveit_pro_ci#27). The pin is by
   SHA, not tag, so this works against the merged branch before the tag is
   formally cut.

2. Add render-report job (matrix on humble/jazzy) that downloads each distro's
   test-results artifact and runs .github/scripts/render_report.py against it,
   uploading report.html as integration-test-report-${{ matrix.ros_distro }}.
   Runs whether the integration test passed, failed, or timed out -- the
   report is most useful for failure post-mortem.

3. Add post-report-comment job that posts (or updates in place via a sticky
   marker) a single PR comment linking to the rendered reports for that run.

Also retained from before:
- enable_gpu: true on a picknik-16-amd64-gpu runner so MuJoCo EGL rendering
  uses the GPU instead of llvmpipe.
- src/lab_sim/test/conftest.py pytest hooks (logstart, logreport) writing to
  fd 2 directly so per-test progress survives a CTest timeout.
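
The "sticky marker" idea in item 3 can be sketched as a small decision function: look for an earlier comment carrying a hidden marker and update it, otherwise create a new one. The marker string, function name, and comment dict shape (`id`, `body`) are assumptions for illustration, not taken from the workflow.

```python
from typing import Optional, Tuple

# Hypothetical hidden marker embedded in the comment body.
MARKER = "<!-- integration-test-report -->"


def plan_comment(existing_comments: list) -> Tuple[str, Optional[int]]:
    """Decide whether to update an earlier report comment or create one."""
    for comment in existing_comments:
        if MARKER in comment.get("body", ""):
            return ("update", comment["id"])
    return ("create", None)


comments = [
    {"id": 1, "body": "LGTM"},
    {"id": 2, "body": MARKER + "\nIntegration test report"},
]
print(plan_comment(comments))  # ('update', 2)
```

Because the marker is an HTML comment, it stays invisible in the rendered comment while still letting the job find its own earlier post.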

Reads pytest's JUnit xunit XML (the test artifact already published by moveit_pro_ci's reusable workflow) and produces a self-contained, single-file HTML report. Groups tests by their parent XML directory, shows the human-readable objective name extracted from each objective XML's main_tree_to_execute attribute, and surfaces filter/search/collapse UI without any external JS dependencies.

ament_add_pytest_test does not set ROS_LOG_DIR, so launched nodes write to the default ~/.ros/log/<ts>/ inside the doomed CI container -- never uploaded. Point ROS_LOG_DIR at build/lab_sim/test_results/lab_sim/ros_logs/ instead, which is already inside the existing 'test-results' artifact glob, so launch.log + per-node *.log come back with each CI run.
@davetcoleman davetcoleman force-pushed the ci-bisect-9.3.0-rc9-only branch from 8640403 to 608fabe Compare May 22, 2026 01:22
@github-actions

📊 Integration test report

Per-distro HTML reports (status table + per-test ROS log slices) are attached to this run's artifacts:

  • integration-test-report-humble
  • integration-test-report-jazzy

Download the zip, extract, and open report.html in a browser.
