[inductor] allow reorder_for_locality on training graphs (opt-in) #3235
Open
reger-men wants to merge 1 commit into
Jenkins build for f3ba1dc00ccf1b91c81c5c3efdd825e1367c600d commit finished as FAILURE
f3ba1dc to f22830d
Jenkins build for f22830d4d9e3d0c54a60852deb43781bfd31fb98 commit finished as FAILURE
f22830d to 3edffd2
`reorder_for_locality` is a bitwise-equivalent FX reorder (it walks the graph in reverse and pulls each producer next to its sole consumer to improve L2 locality), so enabling it on training graphs cannot change operator semantics. Currently it is gated on `is_inference`. This adds `config.reorder_for_locality_in_training` (env override `TORCHINDUCTOR_REORDER_LOCALITY_TRAINING=1`), which opts training graphs into the same pass. It defaults to off, so upstream behaviour on training paths is preserved. The tests assert that the gate works in both directions and that the inference default-on path is unchanged. They use `torch._inductor.config.patch` for in-process attribute patching and a subprocess for the import-time env-parsing path.
Jenkins build for 3edffd2daa3ffa89b9ed91f783e58266cc2a6d35 commit finished as FAILURE
`reorder_for_locality` is a bitwise-equivalent FX reorder (walks the graph in reverse and pulls each producer next to its sole consumer to improve L2 locality), so enabling it on training graphs cannot change operator semantics. Currently the pass is gated on `is_inference`.

This adds `config.reorder_for_locality_in_training` (env override `TORCHINDUCTOR_REORDER_LOCALITY_TRAINING=1`) that opts training graphs into the same pass. Default off, so upstream behaviour on training paths is preserved.

Per-workload tuning knob; do not set system-wide.
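To make the "walk in reverse, pull producers next to their consumer" idea concrete, here is a toy model on plain lists. It is a sketch under assumptions, not the Inductor pass: `reorder_for_locality_toy` is a hypothetical name, nodes are plain strings, and a producer is moved once all of its users have been visited (which covers the sole-consumer case described above). Only the order changes; every node still runs after its inputs, so the computed values are the same.

```python
from typing import Dict, List


def reorder_for_locality_toy(order: List[str], inputs: Dict[str, List[str]]) -> List[str]:
    """Toy model: `order` is a topological order, `inputs[n]` lists the producers of n."""
    users: Dict[str, List[str]] = {n: [] for n in order}
    for node, srcs in inputs.items():
        for src in srcs:
            users[src].append(node)

    result = list(order)
    seen = set()
    i = len(result) - 1
    while i >= 0:  # walk the *live* list backwards, so moved nodes are visited next
        node = result[i]
        seen.add(node)
        for src in inputs.get(node, []):
            # Move a producer only once every one of its users has been visited,
            # so it can never land after something that reads it.
            if all(u in seen for u in users[src]):
                result.remove(src)         # src sat before node; node is now at i - 1
                result.insert(i - 1, src)  # src at i - 1, node back at i
        i -= 1
    return result


# a and b are produced up front but consumed late; the toy pass moves each
# producer directly in front of its only consumer.
print(reorder_for_locality_toy(["a", "b", "c", "d"], {"c": ["a"], "d": ["b"]}))
# → ['a', 'c', 'b', 'd']
```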
Test plan
- `test_default_off` / `test_env_one_turns_on` / `test_env_zero_keeps_off` validate the env parsing in a fresh subprocess
- `test_pass_does_not_run_on_training_when_flag_off` / `test_pass_runs_on_training_when_flag_on` spy on `reorder_for_locality` to confirm the gate
- `test_inference_path_unchanged` confirms the existing default-on inference path is untouched
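The two testing techniques in this plan (a fresh subprocess for import-time env parsing, and a spy on the pass for the gate) might look roughly like this. It is a stdlib-only sketch that does not import torch: `_CHILD`, `flag_in_fresh_process`, `run_post_grad`, and `check_gate` are hypothetical, and the child script only imitates "import the config module and print the flag".

```python
import os
import subprocess
import sys
from unittest import mock

ENV_NAME = "TORCHINDUCTOR_REORDER_LOCALITY_TRAINING"

# Hypothetical child script standing in for importing the real config module.
_CHILD = f"import os; print(os.environ.get({ENV_NAME!r}, '0') == '1')"


def flag_in_fresh_process(value=None):
    """Evaluate the flag in a new interpreter so import-time parsing is exercised."""
    env = dict(os.environ)
    env.pop(ENV_NAME, None)  # start from a clean state regardless of the parent env
    if value is not None:
        env[ENV_NAME] = value
    proc = subprocess.run(
        [sys.executable, "-c", _CHILD],
        env=env, capture_output=True, text=True, check=True,
    )
    return proc.stdout.strip() == "True"


def run_post_grad(graph, is_inference, flag_on, reorder):
    # Hypothetical stand-in for the gated call site.
    if is_inference or flag_on:
        reorder(graph)


def check_gate():
    """Spy on the pass and confirm the gate in both directions."""
    spy = mock.Mock(name="reorder_for_locality")
    run_post_grad("g", is_inference=False, flag_on=False, reorder=spy)
    spy.assert_not_called()
    run_post_grad("g", is_inference=False, flag_on=True, reorder=spy)
    spy.assert_called_once_with("g")
    # Inference path: the pass runs even with the training flag off.
    spy.reset_mock()
    run_post_grad("g", is_inference=True, flag_on=False, reorder=spy)
    spy.assert_called_once_with("g")
```

Running the env checks in a child process matters because a flag parsed at import time cannot be re-read by changing `os.environ` after the module is already loaded.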