ci: upgrade to GPU runner via moveit_pro_ci v0.3.1, enrich test diagnostics#655
Draft
davetcoleman wants to merge 3 commits into
Consider whether the change should land upstream in Overlapping files
This was referenced May 21, 2026
5efd377 to
8640403
Compare
Failing because of https://github.com/PickNikRobotics/moveit_pro/issues/19269
… comment
Three CI-infra changes folded together:
1. Bump the reusable workflow ref. v0.3.1 (d490a1d) had the GPU + CUDA-suffix
fix. v0.3.2 (90b506e, currently pre-tag) adds the test-results artifact-name
suffix `-${{ matrix.ros_distro }}`, so the humble and jazzy jobs no longer
both upload to the same artifact name (moveit_pro_ci#27). The pin is by
SHA, not tag, so this works against the merged branch before the tag is
formally cut.
2. Add render-report job (matrix on humble/jazzy) that downloads each distro's
test-results artifact and runs .github/scripts/render_report.py against it,
uploading report.html as integration-test-report-${{ matrix.ros_distro }}.
Runs whether the integration test passed, failed, or timed out -- the
report is most useful for failure post-mortem.
3. Add post-report-comment job that posts (or updates in place via a sticky
marker) a single PR comment linking to the rendered reports for that run.
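The "sticky marker" behaviour in item 3 can be sketched roughly as below. This is only an illustration of the find-or-update decision, not the job's actual implementation: the marker string, the `plan_sticky_comment` helper, and the comment dictionaries (`id`, `body` keys) are all hypothetical, and the API call that lists or writes PR comments is left out.

```python
from typing import Iterable, Mapping, Optional, Tuple

# Hypothetical marker: the PR does not state which hidden marker the job uses.
STICKY_MARKER = "<!-- integration-test-report -->"


def plan_sticky_comment(
    comments: Iterable[Mapping],
    body: str,
    marker: str = STICKY_MARKER,
) -> Tuple[str, Optional[int], str]:
    """Decide whether to update an existing marked comment or create a new one.

    Returns (action, comment_id, full_body) where action is "update" or
    "create". The marker is prepended to the body so that the next run can
    find this comment again.
    """
    full_body = f"{marker}\n{body}"
    for comment in comments:
        # The first comment carrying the marker is treated as the sticky one.
        if marker in comment.get("body", ""):
            return ("update", comment["id"], full_body)
    return ("create", None, full_body)
```

With this shape, repeated CI runs on the same PR keep rewriting one comment instead of appending a new one each time.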
Also retained from before:
- enable_gpu: true on a picknik-16-amd64-gpu runner so MuJoCo EGL rendering
uses the GPU instead of llvmpipe.
- src/lab_sim/test/conftest.py pytest hooks (logstart, logreport) writing to
fd 2 directly so per-test progress survives a CTest timeout.
Reads pytest's JUnit xunit XML (the test artifact already published by moveit_pro_ci's reusable workflow) and produces a self-contained, single-file HTML report. Groups tests by their parent XML directory, shows the human-readable objective name extracted from each objective XML's main_tree_to_execute attribute, and surfaces filter/search/collapse UI without any external JS dependencies.
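The core of such a script might look like the sketch below. It is not render_report.py itself: `parse_xunit` and `render_html` are hypothetical names, grouping by the `classname` attribute is an assumption standing in for the "parent XML directory" grouping, and the objective-name lookup and filter/search UI are omitted.

```python
import html
import xml.etree.ElementTree as ET
from collections import defaultdict


def parse_xunit(xml_text):
    """Map group name -> list of (test name, status, seconds) from JUnit XML."""
    root = ET.fromstring(xml_text)
    groups = defaultdict(list)
    for case in root.iter("testcase"):
        if case.find("failure") is not None or case.find("error") is not None:
            status = "failed"
        elif case.find("skipped") is not None:
            status = "skipped"
        else:
            status = "passed"
        seconds = float(case.get("time") or 0.0)
        # Assumption: group by classname; the real script groups differently.
        groups[case.get("classname", "")].append(
            (case.get("name", ""), status, seconds)
        )
    return dict(groups)


def render_html(groups):
    """Render a single-file HTML table with no external assets."""
    rows = []
    for group in sorted(groups):
        rows.append(f"<tr><th colspan='3'>{html.escape(group)}</th></tr>")
        for name, status, seconds in groups[group]:
            rows.append(
                f"<tr class='{status}'><td>{html.escape(name)}</td>"
                f"<td>{status}</td><td>{seconds:.1f}s</td></tr>"
            )
    return (
        "<!DOCTYPE html><html><body><table>"
        + "".join(rows)
        + "</table></body></html>"
    )
```

Only the standard library is used, which keeps the output a single self-contained file in the spirit of the commit.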
ament_add_pytest_test does not set ROS_LOG_DIR, so launched nodes write to the default ~/.ros/log/<ts>/ inside the doomed CI container -- never uploaded. Point ROS_LOG_DIR at build/lab_sim/test_results/lab_sim/ros_logs/ instead, which is already inside the existing 'test-results' artifact glob, so launch.log + per-node *.log come back with each CI run.
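The redirect amounts to exporting ROS_LOG_DIR before the nodes are launched. The sketch below shows one way to build such an environment from Python; the `ros_log_env` helper and its parameters are hypothetical, and the PR may well set the variable elsewhere (for example in the CMake test registration).

```python
import os
from pathlib import Path


def ros_log_env(build_dir, package="lab_sim", base_env=None):
    """Return a copy of the environment with ROS_LOG_DIR set.

    The directory mirrors the layout named in the commit message:
    <build_dir>/<package>/test_results/<package>/ros_logs
    """
    log_dir = Path(build_dir) / package / "test_results" / package / "ros_logs"
    # Create the directory up front so the path exists before launch.
    log_dir.mkdir(parents=True, exist_ok=True)
    env = dict(os.environ if base_env is None else base_env)
    env["ROS_LOG_DIR"] = str(log_dir)
    return env
```

Passing the returned mapping as the environment of the launched process keeps the logs under the directory that the test-results artifact already collects.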
8640403 to
608fabe
Compare
📊 Integration test report
Per-distro HTML reports (status table + per-test ROS log slices) are attached to this run's artifacts:
Download the zip, extract, and open |
CI infra upgrade and test-diagnostics improvements for `objectives_integration_test`. Three commits, kept separate intentionally (please do not squash).
Commit 1: `enable_gpu: true` on a `picknik-16-amd64-gpu` runner
Bumps the reusable workflow ref to `PickNikRobotics/moveit_pro_ci@v0.3.1`, sets `enable_gpu: true`, and switches the runner label from `picknik-16-amd64` to `picknik-16-amd64-gpu`. v0.3.1 appends the CUDA suffix to the image when `enable_gpu` is true (moveit_pro_ci#26). Without that, `v0.3.0` set the runner label but kept the non-CUDA image, so MuJoCo's EGL rendering still went through llvmpipe on the CPU. `image_tag` is pinned to `9.3.0-rc9` until the `main-*-cuda12.6-cudnn9` images are being published.
Test-diagnostics addition: `src/lab_sim/test/conftest.py`
Two pytest hooks (`pytest_runtest_logstart`, `pytest_runtest_logreport`) write directly to fd 2, bypassing pytest's `--capture=fd`. Without this, a CTest timeout kills pytest before any per-test output is flushed, leaving the CI log silent past "collected N items". Now each test prints `START <nodeid>` on entry and `PASSED|FAILED|SKIPPED <nodeid> (<elapsed>s)` on completion, so CI logs always show which objective was running and how long each one took, which is critical for triaging flakes and timeouts.
Commit 2: `.github/scripts/render_report.py` (HTML report)
Self-contained Python script that turns pytest's `objectives_integration_test.xunit.xml` artifact into a single-file HTML report. It shows human-readable objective names (`move_flasks_to_burners.xml` → `Move Flasks to Burners`) by reading each XML's `main_tree_to_execute` attribute against the local moveit_pro/moveit_pro_example_ws checkouts.
Usage: `python3 .github/scripts/render_report.py <xunit.xml> <out.html>`.
Commit 3: redirect ROS node logs into the test_results artifact
`ament_add_pytest_test` does not set `ROS_LOG_DIR`, so launched nodes write to `~/.ros/log/<ts>/` inside the doomed CI container. Those logs never get uploaded, which makes post-mortem of objective failures impossible. This commit points `ROS_LOG_DIR` at `build/lab_sim/test_results/lab_sim/ros_logs/`, which is already inside the existing `test-results` artifact glob, so `launch.log` + per-node `*.log` come back with each CI run. `reset_simulation_before_test` relaunches the stack per-test, so each test gets its own timestamped `<ts>/` subdirectory under `ros_logs/`.