Skip to content

Fix ReplicationPad3d failing at CoreML ML Program runtime#2697

Open
tritolol wants to merge 3 commits into
apple:mainfrom
tritolol:fix/replication-pad3d-runtime-failure
Open

Fix ReplicationPad3d failing at CoreML ML Program runtime#2697
tritolol wants to merge 3 commits into
apple:mainfrom
tritolol:fix/replication-pad3d-runtime-failure

Conversation

@tritolol
Copy link
Copy Markdown
Contributor

@tritolol tritolol commented May 14, 2026

Summary

  • Fixes torch.nn.ReplicationPad3d fails at runtime when there is padding #2571: torch.nn.ReplicationPad3d converted silently but failed at prediction time with "Padding for more than two dimensions only supports constant mode"
  • Adds a new MIL backend pass split_non_constant_pads that decomposes non-constant pad ops (reflect/replicate) covering >2 dimensions into sequential pads of ≤2 dimensions each, which CoreML supports
  • The pass runs in _BACKEND_MIL_PASSES after merge_consecutive_paddings so splits aren't merged back

Test plan

  • TestPad::test_replication_pad3d — regression test for symmetric 3D replication padding
  • TestPad::test_replication_pad3d_asymmetric — regression test for asymmetric 3D replication padding
  • Full TestPad suite passes (26 passed, 2 skipped for NN backend which doesn't support this case)
  • All mil/passes/tests/test_passes.py -k pad tests pass (215 passed)

🤖 Generated with Claude Code

@LeSingh1
Copy link
Copy Markdown
Contributor

@TobyRoseman, @tritolol -- I went through this carefully (was sent here from #2701). The approach looks right to me and is strictly better than what I had in #2701 (which only raised earlier); this one actually makes the case work.

Why the split is semantically safe

Sequential 1D reflect/replicate along independent axes is provably equivalent to a single multi-axis op, because each axis operation is index-independent:

  • replicate is just per-axis clamp; replicating an already-replicated edge value still gives the edge value.
  • reflect: O[i,j,k] = I[reflect_idx(i,D), reflect_idx(j,H), reflect_idx(k,W)] factors cleanly into sequential per-axis reflection over the previous result. Corner cells come out identical either way.

So the chain in _split_pads_block preserves semantics for both modes -- no corner-region drift.

Pipeline placement

Confirmed by reading pass_pipeline.py: common::merge_consecutive_paddings lives in the main _PASSES block (line 63), which runs well before _BACKEND_MIL_PASSES. The new mil_backend::split_non_constant_pads sits inside _BACKEND_MIL_PASSES (before sanitize_name_strings), so nothing later in the pipeline re-merges the chain. The placement comment in the diff is accurate.

Tests look right, plus a couple of suggestions

The two new ReplicationPad3d tests are good. Possible additions, in priority order:

  1. ReflectionPad3d regression test -- the pass handles both modes but only replicate is exercised end-to-end. A symmetric and/or asymmetric torch.nn.ReflectionPad3d case would close the loop. Issue ReflectionPad3D fails at runtime #2576 specifically mentions reflect as one of the affected modes.
  2. torch.nn.functional.pad rank-5 reflect/replicate -- the existing test_pad_reflect_replicate only parametrizes rank in {3, 4}. Extending it (or adding a sibling) to rank 5 with a >2D pad would catch any future regression in the functional path.
  3. (Optional) Negative test for constant mode -- a quick assertion that the pass is a no-op when mode == "constant" (e.g., a 5D F.pad(..., mode="constant") test that completes without splitting). test_pad_constant already exercises rank 5 constants and would catch any breakage, so this is nice-to-have rather than required.

Minor

  • The pass skips op.inputs["pad"].val is None (dynamic pad values). Worth a one-line docstring note that models with a runtime-computed pad still hit the original runtime error -- not a blocker, just a known limit.
  • Naming: last chunk reuses op.name, intermediates use {op.name}_split_{i} -- good, preserves identifiability and the post-pass sanitize_name_strings handles the new names.

#2701

Happy to close #2701 once this lands -- they cover the same issues (#2576 / #2571) and this is the cleaner fix.

Nice work, @tritolol.

Adrian Bauer and others added 3 commits May 20, 2026 16:42
CoreML's ML Program runtime rejects non-constant padding modes
(reflect, replicate) when more than 2 dimensions are padded, with:
"Padding for more than two dimensions only supports constant mode".
Previously, torch.nn.ReplicationPad3d converted silently but failed
at prediction time.

Add a new mil_backend pass `split_non_constant_pads` that decomposes
such pad ops into sequential pads each covering at most 2 dimensions,
which CoreML supports. The pass runs in _BACKEND_MIL_PASSES after
merge_consecutive_paddings to prevent splits from being merged back.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Extend test_pad_reflect_replicate parametrization to rank 5 to cover
  the functional path for both reflect and replicate modes on >2 dims
- Add test_reflection_pad3d and test_reflection_pad3d_asymmetric
  regression tests (reflect mode was handled by the pass but untested)
- Add docstring note to split_non_constant_pads about the known
  limitation with dynamic (runtime-computed) pad values

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Explain why the skip applies only to rank 5: the NN backend's
add_padding (op_mapping.py:2604) requires all padding outside the
last 2 dimensions to be zero. Rank-5 with 3 spatial dims padded
violates this; ranks 3 and 4 pad only 1-2 spatial dims and pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@tritolol tritolol force-pushed the fix/replication-pad3d-runtime-failure branch from 608c8dd to 847ad7e Compare May 20, 2026 14:43
@tritolol
Copy link
Copy Markdown
Contributor Author

tritolol commented May 20, 2026

Thanks for the thorough review @LeSingh1! Addressed all points:

Tests added

  • Extended test_pad_reflect_replicate to rank 5 (range(3, 6)) — covers the functional F.pad path for both reflect and replicate with 3 spatial dims padded
  • Added test_reflection_pad3d and test_reflection_pad3d_asymmetric — closes the gap where reflect mode was handled by the pass but not exercised end-to-end

Docstring

  • Added a **Limitation** note to split_non_constant_pads explaining that pad ops with runtime-computed (dynamic) pad values are skipped and will still fail at CoreML runtime

Clarified the NN backend skip

  • The skip in test_pad_reflect_replicate now has an inline comment pointing to op_mapping.py:2604 and explaining why it only applies to rank 5: the NN backend's add_padding requires all(i == 0 for i in pad[:-4]), which holds for ranks 3–4 (1–2 spatial dims padded) but fails for rank 5 (3 spatial dims padded, so the D-dimension leaks outside the pad[-4:] window)

Also rebased onto the latest main.

@TobyRoseman
Copy link
Copy Markdown
Collaborator

@tritolol - can you please explain, in your own words (i.e. not output from Claude), why this is implemented as a pass. Why can't this be implemented at op conversion time? That would be a much more self-contained and straightforward change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

torch.nn.ReplicationPad3d fails at runtime when there is padding

3 participants