Fix: Coarsen MuJoCo timestep on CI to stop slower-than-realtime flakes #615
Open
JWhitleyWork wants to merge 3 commits into
ee5d05a to 7611d04
Pull request overview
Updates the repository CI workflow to reduce MuJoCo integration-test flakiness by overriding the simulator timestep only in CI, giving the heavier MuJoCo 3.6.0 solver more wall-clock budget per step while keeping local development behavior unchanged.
Changes:
- Pin the reusable `workspace_integration_test.yaml` workflow to a newer `moveit_pro_ci` commit that supports the new `mujoco_ci_timestep` input.
- Pass `mujoco_ci_timestep: "0.004"` to run the CI lab simulation at 250 Hz instead of the default 500 Hz.
…me flakes

The MuJoCo 3.2.7 -> 3.6.0 upgrade in moveit_pro (6eedef88a5) made the constraint solver heavier per step, so the lab_sim scene runs slower than realtime on CI runners. That surfaces as `Mujoco model timestep not running in realtime` warnings and timing-related test failures (MoveGripperAction 15s timeouts, GetImage 5s wrist-camera timeouts). CI on main has been ~92% red since Apr 17 as a result, and the in-tree mitigations applied so far (constraint-arena memory, MPC retunes, push-button tolerance, publisher timeout fixes) did not address the underlying realtime gap.

Pin to the moveit_pro_ci branch that adds the new `mujoco_ci_timestep` input (PR PickNikRobotics/moveit_pro_ci#18) and pass "0.004" -- 250 Hz, ~2x the wall-clock budget per step versus the MuJoCo default of 500 Hz. This only takes effect on CI; local dev runs the scene unmodified.

After moveit_pro_ci tags a release containing this input, swap the SHA pin for that tag.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
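The timestep arithmetic in this commit message can be checked with a short sketch. The helper names and the per-step solver cost below are hypothetical and chosen only for illustration; the two timestep values are the ones quoted above.

```python
def step_rate_hz(timestep_s: float) -> float:
    """Physics steps required per simulated second."""
    return 1.0 / timestep_s


def keeps_realtime(timestep_s: float, step_cost_s: float) -> bool:
    """True when one step costs no more wall-clock time than it simulates."""
    return step_cost_s <= timestep_s


DEFAULT_TIMESTEP_S = 0.002  # 500 Hz, the default quoted in the commit message
CI_TIMESTEP_S = 0.004       # value passed as mujoco_ci_timestep

# Assumption: an illustrative per-step cost on a CI runner, not a measurement.
ASSUMED_STEP_COST_S = 0.003

# Ratio of wall-clock budget per step between the CI and default timesteps.
budget_ratio = CI_TIMESTEP_S / DEFAULT_TIMESTEP_S

if __name__ == "__main__":
    for name, dt in (("default", DEFAULT_TIMESTEP_S), ("ci", CI_TIMESTEP_S)):
        print(name, round(step_rate_hz(dt)), "Hz, realtime:",
              keeps_realtime(dt, ASSUMED_STEP_COST_S))
    print("budget ratio:", budget_ratio)
```

Under the assumed 3 ms step cost, the default timestep falls behind realtime while the CI timestep does not. The real per-step cost on CI runners would need to be measured.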
7611d04 to c70604d
shaur-k previously approved these changes May 8, 2026
The objective integration test runs ~117 parametrized objectives against a single shared backend and MuJoCo simulation. Pick/place, push-button, and similar objectives leave residual world state that caused order-dependent failures after the MuJoCo 3.6.0 upgrade. Re-export reset_simulation_before_test from moveit_pro_test_utils so pytest activates the autouse reset fixture for this test module.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
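A toy illustration of why resetting before each test removes order dependence. `ToySim`, `pick`, `push_button`, and `run_all` are hypothetical stand-ins written for this sketch; they are not the `moveit_pro_test_utils` fixture or any MoveIt Pro API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class ToySim:
    """Hypothetical shared simulation whose state persists across objectives."""
    held_object: Optional[str] = None
    button_pressed: bool = False

    def reset(self) -> None:
        self.held_object = None
        self.button_pressed = False


def pick(sim: ToySim) -> None:
    if sim.held_object is not None:
        raise RuntimeError("gripper is already holding an object")
    sim.held_object = "cube"  # residual state left behind for later objectives


def push_button(sim: ToySim) -> None:
    if sim.held_object is not None:
        raise RuntimeError("cannot push the button while holding an object")
    sim.button_pressed = True


def run_all(objectives: List[Callable[[ToySim], None]],
            reset_before_each: bool) -> Dict[str, str]:
    """Run objectives against one shared sim, optionally resetting before each."""
    sim = ToySim()
    results: Dict[str, str] = {}
    for objective in objectives:
        if reset_before_each:
            sim.reset()
        try:
            objective(sim)
            results[objective.__name__] = "passed"
        except RuntimeError:
            results[objective.__name__] = "failed"
    return results


if __name__ == "__main__":
    print(run_all([pick, push_button], reset_before_each=False))
    print(run_all([pick, push_button], reset_before_each=True))
```

Without the reset, `push_button` passes or fails depending on whether `pick` ran first. With the reset, the outcome no longer depends on ordering.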
Summary
- Pin to the `moveit_pro_ci` branch that adds the new `mujoco_ci_timestep` input (companion PR: PickNikRobotics/moveit_pro_ci#18).
- Pass `mujoco_ci_timestep: "0.004"` so CI runs the `lab_sim` scene at 250 Hz instead of MuJoCo's 500 Hz default, doubling the wall-clock budget per step.

Why
The MuJoCo `3.2.7 → 3.6.0` upgrade in `moveit_pro` (6eedef88a5, Apr 14) made the constraint solver heavier per step. Within 24h, `main` CI went from 100% green to flaky, and to ~92% red within three days.

Failure logs always include the warning `Mujoco model timestep not running in realtime. Increase the model timestep.`, and the timing-sensitive failures follow from it: a `MoveGripperAction` 15s timeout in `Push Button With a Trajectory` (~9/10 runs), a `GetImage` 5s wrist-camera timeout in `ML Segment Point Cloud` (~4/10), and various MPC pose-tracking variants. Several mitigations have already been merged (`memory="64M"` arena fix, MPC retunes, tolerance loosening, publisher timeout fixes); none addressed the underlying realtime gap.

This PR fixes the root cause for CI specifically, by coarsening the MuJoCo timestep to give the heavier 3.6.0 solver enough wall-clock budget, without changing the experience on dev machines (where the simulator generally runs faster than realtime and the warning is diagnostic).
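For illustration, coarsening the timestep amounts to rewriting the `timestep` attribute on the MJCF `<option>` element. The sketch below is one way that could be done, assuming the attribute sits on an `<option>` element directly under the root; `override_timestep` and the toy scene are hypothetical, and this is not the implementation of the CI step in `moveit_pro_ci`.

```python
import xml.etree.ElementTree as ET


def override_timestep(mjcf_text: str, timestep: str) -> str:
    """Return MJCF text with every top-level <option> set to the given timestep.

    Assumption: the scene keeps its timestep on <option> directly under the
    root. Included files are not followed, and XML comments are dropped by
    ElementTree's default parser.
    """
    root = ET.fromstring(mjcf_text)
    options = root.findall("option")
    if not options:
        # No <option> present: add one so the override still takes effect.
        options = [ET.SubElement(root, "option")]
    for option in options:
        option.set("timestep", timestep)
    return ET.tostring(root, encoding="unicode")


# Toy scene for illustration only; not the contents of lab_sim's scene.xml.
TOY_SCENE = (
    '<mujoco model="toy">'
    '<option timestep="0.002" integrator="implicitfast" impratio="10"/>'
    '<worldbody/>'
    '</mujoco>'
)

if __name__ == "__main__":
    print(override_timestep(TOY_SCENE, "0.004"))
```

Only the `timestep` attribute changes; other `<option>` attributes such as `integrator` and `impratio` are left as they were.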
Why CI-only
Bumping the timestep in the scene file would affect local dev too. With `integrator="implicitfast"` and `impratio="10"` the scene is well within MuJoCo's stability envelope at 0.004s, but contact stability for tight grasps on small objects is a real concern that warrants a separate validation pass. Doing this CI-only is the cheapest, lowest-risk route to a green main; we can revisit a global bump (or, longer-term, the test-harness rethink Shaur called out in #610) as a follow-up.

Test plan
- `integration-test-in-studio-container` passes.
- The `Override MuJoCo timestep for CI` step's log shows the expected scene files were patched (`lab_sim/description/scene.xml`, etc.).
- After `moveit_pro_ci#18` merges and a new tag is cut, swap the SHA pin for that tagged release.

🤖 Generated with Claude Code