
[inductor] allow reorder_for_locality on training graphs (opt-in) #3235

Open
reger-men wants to merge 1 commit into ROCm:develop from reger-men:pr4-reorder-locality


@reger-men

reorder_for_locality is a bitwise-equivalent FX reorder (walks the graph in reverse and pulls each producer next to its sole consumer to improve L2 locality), so enabling it on training graphs cannot change operator semantics. Currently the pass is gated on is_inference.
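For readers who want the "walk in reverse, pull producers next to consumers" idea made concrete, here is a toy sketch on plain Python lists. It is not the Inductor implementation: `reorder_for_locality_sketch`, `order`, and `inputs` are hypothetical names, the graph is just a topologically sorted list of node names, and the rule used (move a producer once every one of its consumers has been visited) is an assumption that reduces to "next to its sole consumer" for single-consumer producers. Any additional guards the real pass applies are not modelled.

```python
def reorder_for_locality_sketch(order, inputs):
    """Toy locality reorder on a topologically sorted list of node names.

    order:  node names, producers before consumers.
    inputs: maps a node name to the names of the nodes it consumes.
    Returns a new order that contains the same nodes and is still topological.
    """
    # Build the reverse mapping: producer -> set of consumers.
    users = {name: set() for name in order}
    for name in order:
        for producer in inputs.get(name, ()):
            users[producer].add(name)

    result = list(order)
    seen = set()
    i = len(result) - 1
    while i >= 0:
        node = result[i]
        seen.add(node)
        for producer in inputs.get(node, ()):
            # Only move a producer once all of its consumers were visited,
            # so it can never end up after one of them.
            if users[producer] <= seen:
                result.remove(producer)
                result.insert(result.index(node), producer)
        # Continue with whatever now sits directly before `node`;
        # this mimics iterating a live linked list in reverse.
        i = result.index(node) - 1
    return result


# "a" is produced early but only consumed by the last node.
example = reorder_for_locality_sketch(
    ["a", "b", "c", "use_a"],
    {"c": ["b"], "use_a": ["a"]},
)
print(example)
```

In this example `a` is moved down to sit directly before `use_a`. Only the order of the list changes; no node is added, removed, or rewritten, which is the property the description relies on when it calls the pass a pure reorder.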

This adds config.reorder_for_locality_in_training (env override TORCHINDUCTOR_REORDER_LOCALITY_TRAINING=1) that opts training graphs into the same pass. Default off, so upstream behaviour on training paths is preserved.

Per-workload tuning knob; do not set system-wide.
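As a rough illustration of how the opt-in is meant to behave, the sketch below models the env override and the gate in plain Python. Only the env variable name, the flag name, and `is_inference` come from this PR; the helper names, the exact parsing rule (only the string "1" enables the flag), and the shape of the gate expression are assumptions, not the code in the diff.

```python
import os

ENV_VAR = "TORCHINDUCTOR_REORDER_LOCALITY_TRAINING"


def parse_reorder_locality_in_training(environ=None):
    """Return True only when the override is exactly "1" (assumed rule)."""
    environ = os.environ if environ is None else environ
    return environ.get(ENV_VAR, "0") == "1"


def should_run_reorder_for_locality(is_inference, reorder_for_locality_in_training):
    """Assumed gate: inference graphs always qualify, training graphs only on opt-in."""
    return is_inference or reorder_for_locality_in_training


# Default: training graphs are left alone.
default_flag = parse_reorder_locality_in_training({})
# Opt-in for a single workload.
opted_in_flag = parse_reorder_locality_in_training({ENV_VAR: "1"})

print(should_run_reorder_for_locality(False, default_flag))    # False
print(should_run_reorder_for_locality(False, opted_in_flag))   # True
```

Passing the environment in as a mapping keeps the sketch testable without touching the real process environment, which fits the advice to set the variable per workload rather than system-wide.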

Test plan

  • test_default_off / test_env_one_turns_on / test_env_zero_keeps_off validate the env parsing in a fresh subprocess
  • test_pass_does_not_run_on_training_when_flag_off / test_pass_runs_on_training_when_flag_on spy on reorder_for_locality to confirm the gate
  • test_inference_path_unchanged confirms the existing default-on inference path is untouched
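The spy-based checks in the test plan could look roughly like the sketch below. It uses stand-in objects instead of Inductor itself: `passes`, `config`, and `run_post_grad_passes` are hypothetical placeholders for the real modules and pipeline entry point, and the gate expression is an assumption. The point is only to show how `unittest.mock.patch.object` can confirm whether the pass was invoked.

```python
from types import SimpleNamespace
from unittest import mock

# Stand-ins for the real pass module and config (hypothetical).
passes = SimpleNamespace(reorder_for_locality=lambda graph: None)
config = SimpleNamespace(reorder_for_locality_in_training=False)


def run_post_grad_passes(graph, is_inference):
    """Stand-in pipeline with the assumed gate around the reorder pass."""
    if is_inference or config.reorder_for_locality_in_training:
        passes.reorder_for_locality(graph)


def spy_call_count(is_inference, flag):
    """Run the stand-in pipeline once and report how often the pass was called."""
    with mock.patch.object(config, "reorder_for_locality_in_training", flag), \
            mock.patch.object(passes, "reorder_for_locality") as spy:
        run_post_grad_passes(graph=object(), is_inference=is_inference)
        return spy.call_count


training_flag_off = spy_call_count(is_inference=False, flag=False)
training_flag_on = spy_call_count(is_inference=False, flag=True)
inference_default = spy_call_count(is_inference=True, flag=False)
```

Under these assumptions the three counts correspond to the three gate tests listed above: no call for training with the flag off, one call for training with the flag on, and one call for inference regardless of the flag.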

@rocm-repo-management-api

rocm-repo-management-api Bot commented May 19, 2026

Jenkins build for f3ba1dc00ccf1b91c81c5c3efdd825e1367c600d commit finished as FAILURE
Links: Pipeline Overview / Build artifacts / Test Results

@reger-men reger-men force-pushed the pr4-reorder-locality branch from f3ba1dc to f22830d on May 20, 2026 18:03
@rocm-repo-management-api

rocm-repo-management-api Bot commented May 20, 2026

Jenkins build for f22830d4d9e3d0c54a60852deb43781bfd31fb98 commit finished as FAILURE
Links: Pipeline Overview / Build artifacts / Test Results

@reger-men reger-men force-pushed the pr4-reorder-locality branch from f22830d to 3edffd2 on May 21, 2026 09:23
reorder_for_locality is a bitwise-equivalent FX reorder (walks the
graph in reverse and pulls each producer next to its sole consumer to
improve L2 locality), so enabling it on training graphs cannot change
operator semantics. Currently it is gated on `is_inference`.

This adds `config.reorder_for_locality_in_training` (env override
`TORCHINDUCTOR_REORDER_LOCALITY_TRAINING=1`) that opts training
graphs into the same pass. Default off, so upstream behaviour on
training paths is preserved.

The tests assert that the gate works in both directions and that the
inference default-on path is unchanged. They use
`torch._inductor.config.patch` for in-process attribute patching and a
subprocess for the import-time env-parsing path.
@rocm-repo-management-api

rocm-repo-management-api Bot commented May 21, 2026

Jenkins build for 3edffd2daa3ffa89b9ed91f783e58266cc2a6d35 commit finished as FAILURE
Links: Pipeline Overview / Build artifacts / Test Results

