
Arm backend: Eliminate redundant NCHW↔NHWC permute_copy and NHWC-safe view_copy transposes in ToTosaMemoryFormatPass (#18167) #18167

Open
3l1 wants to merge 1 commit into pytorch:main from 3l1:export-D96432610

Conversation

3l1 (Contributor) commented Mar 13, 2026

Summary:

Two optimizations in ToTosaMemoryFormatPass to reduce TOSA TRANSPOSE nodes:

  1. NHWC-safe reshape detection: When a 4D→4D view_copy has monotonic
    shape_indices on the raw shapes and preserves the last dimension (NHWC
    channel), skip inserting input/output transposes. The view_copy can
    operate directly on NHWC data.

  2. Redundant permute_copy elimination: Model-level permute_copy ops whose
    permutation matches channels_last_order (NCHW→NHWC) or its inverse
    (NHWC→NCHW) are redundant with the tosa_dim_order annotation that
    already handles format conversion. Replace them with view_copy (identity
    reshape) to avoid generating TOSA TRANSPOSE nodes.
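
As a rough illustration of the first check, the sketch below encodes one conservative reading of the condition: both shapes are 4D, the last dimension is unchanged, and every output dimension is the product of consecutive input dimensions taken in order. The helper names `group_source_dims` and `is_nhwc_safe_reshape` are hypothetical and are not taken from the pass; the actual shape_indices computation in ToTosaMemoryFormatPass may differ.

```python
from typing import List, Optional, Sequence


def group_source_dims(
    src: Sequence[int], tgt: Sequence[int]
) -> Optional[List[List[int]]]:
    """For each target dim, list the consecutive source dims whose product equals it.

    Returns None when the reshape cannot be expressed as merging adjacent
    source dims (for example, when a source dim would have to be split).
    This is a hypothetical stand-in for the pass's shape_indices.
    """
    groups: List[List[int]] = []
    i = 0  # index of the next unconsumed source dim
    for size in tgt:
        group: List[int] = []
        product = 1
        # Consume source dims until the running product reaches the target size.
        while product < size and i < len(src):
            product *= src[i]
            group.append(i)
            i += 1
        if product != size:
            return None
        groups.append(group)
    # Leftover source dims must all be size 1, otherwise element counts differ.
    if any(src[j] != 1 for j in range(i, len(src))):
        return None
    return groups


def is_nhwc_safe_reshape(src: Sequence[int], tgt: Sequence[int]) -> bool:
    """Assumed rule: 4D -> 4D, last (channel) dim preserved, in-order grouping exists."""
    if len(src) != 4 or len(tgt) != 4:
        return False
    if src[-1] != tgt[-1]:
        return False
    return group_source_dims(src, tgt) is not None
```

Because the grouping only walks forward through the source dims, a successful grouping is monotonic by construction. Reshapes that would need to split a source dim are rejected here, which may be stricter than what the pass accepts.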

Reviewed By: digantdesai

Differential Revision: D96432610

@3l1 3l1 requested a review from digantdesai as a code owner March 13, 2026 19:59
pytorch-bot bot commented Mar 13, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18167

Note: Links to docs will display an error until the docs builds have been completed.

❌ 8 New Failures, 2 Cancelled Jobs, 2 Pending, 3 Unrelated Failures

As of commit 79951e1 with merge base bb8318d:

NEW FAILURES - The following jobs have failed:

CANCELLED JOBS - The following jobs were cancelled. Please retry:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

BROKEN TRUNK - The following job failed but was also failing on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed label Mar 13, 2026
meta-codesync bot (Contributor) commented Mar 13, 2026

@3l1 has exported this pull request. If you are a Meta employee, you can view the originating Diff in D96432610.

@github-actions

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

digantdesai (Contributor) left a comment

Review automatically exported from Phabricator review in Meta.

3l1 added a commit to 3l1/executorch that referenced this pull request Mar 13, 2026
…ansposes in ToTosaMemoryFormatPass (pytorch#18167)

Summary:
Pull Request resolved: pytorch#18167

Two optimizations in ToTosaMemoryFormatPass to reduce TOSA TRANSPOSE nodes:

1. NHWC-safe reshape detection: When a 4D→4D view_copy has monotonic
   shape_indices on the raw shapes and preserves the last dimension (NHWC
   channel), skip inserting input/output transposes. The view_copy can
   operate directly on NHWC data.

2. Redundant permute_copy elimination: Model-level permute_copy ops whose
   permutation matches channels_last_order (NCHW→NHWC) or its inverse
   (NHWC→NCHW) are redundant with the tosa_dim_order annotation that
   already handles format conversion. Replace them with view_copy (identity
   reshape) to avoid generating TOSA TRANSPOSE nodes. Handles both 4D
   (rank>=4, sr>=2) and 3D (rank>=3, sr>=1) permutations.
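
The redundancy test in item 2 can be sketched as a comparison against the channels-last order and its inverse. This is an illustrative sketch, not the pass's code: the function names are hypothetical, and reading `sr` as the spatial rank (the number of trailing spatial dimensions) is an assumption.

```python
from typing import Sequence, Tuple


def channels_last_order(rank: int, spatial_rank: int) -> Tuple[int, ...]:
    """Permutation that moves the channel axis behind the spatial axes.

    Assumes the channel axis sits directly before the trailing spatial dims,
    e.g. NCHW (rank 4, spatial_rank 2) -> (0, 2, 3, 1).
    """
    channel_axis = rank - spatial_rank - 1
    if channel_axis < 0:
        raise ValueError("rank is too small for the given spatial rank")
    leading = list(range(channel_axis))
    spatial = list(range(channel_axis + 1, rank))
    return tuple(leading + spatial + [channel_axis])


def inverse_permutation(perm: Sequence[int]) -> Tuple[int, ...]:
    """Return q such that applying perm and then q restores the original order."""
    inv = [0] * len(perm)
    for dst, src in enumerate(perm):
        inv[src] = dst
    return tuple(inv)


def is_redundant_layout_permute(perm: Sequence[int], spatial_rank: int) -> bool:
    """True if perm is the NCHW->NHWC order or its inverse for this rank."""
    order = channels_last_order(len(perm), spatial_rank)
    return tuple(perm) in (order, inverse_permutation(order))
```

Under these assumptions, `(0, 2, 3, 1)` and `(0, 3, 1, 2)` are flagged for 4D tensors with two spatial dims, and `(0, 2, 1)`, which is its own inverse, is flagged for 3D tensors with one spatial dim.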

This reduces Vela Transpose entries from 75→33
(-56%), Transpose op cycles from 33.4K→6.1K (-82%), and NPU operators
from 367→329 (-38 operators, about -10%).

Reviewed By: digantdesai

Differential Revision: D96432610
@3l1 3l1 force-pushed the export-D96432610 branch from f9a57c8 to 6aac88a Compare March 13, 2026 20:45
@meta-codesync meta-codesync bot changed the title from "Eliminate redundant NCHW↔NHWC permute_copy and NHWC-safe view_copy transposes in ToTosaMemoryFormatPass" to "Eliminate redundant NCHW↔NHWC permute_copy and NHWC-safe view_copy transposes in ToTosaMemoryFormatPass (#18167)" Mar 13, 2026
3l1 added a commit to 3l1/executorch that referenced this pull request Mar 13, 2026
…ansposes in ToTosaMemoryFormatPass (pytorch#18167)

@3l1 3l1 force-pushed the export-D96432610 branch from 6aac88a to 7f039b2 Compare March 13, 2026 20:47
@3l1 3l1 force-pushed the export-D96432610 branch from 7f039b2 to c019a17 Compare March 13, 2026 21:21
3l1 added a commit to 3l1/executorch that referenced this pull request Mar 13, 2026
…ansposes in ToTosaMemoryFormatPass (pytorch#18167)

@3l1 3l1 force-pushed the export-D96432610 branch from c019a17 to 41e5640 Compare March 16, 2026 20:10
3l1 added a commit to 3l1/executorch that referenced this pull request Mar 16, 2026
…ansposes in ToTosaMemoryFormatPass (pytorch#18167)

@meta-codesync meta-codesync bot changed the title from "Eliminate redundant NCHW↔NHWC permute_copy and NHWC-safe view_copy transposes in ToTosaMemoryFormatPass (#18167)" to "Eliminate redundant NCHW↔NHWC permute_copy and NHWC-safe view_copy transposes in ToTosaMemoryFormatPass" Mar 16, 2026
@3l1 3l1 force-pushed the export-D96432610 branch from 41e5640 to d6ab788 Compare March 16, 2026 23:46
@meta-codesync meta-codesync bot changed the title from "Eliminate redundant NCHW↔NHWC permute_copy and NHWC-safe view_copy transposes in ToTosaMemoryFormatPass" to "Eliminate redundant NCHW↔NHWC permute_copy and NHWC-safe view_copy transposes in ToTosaMemoryFormatPass (#18167)" Mar 17, 2026
3l1 added a commit to 3l1/executorch that referenced this pull request Mar 17, 2026
…ansposes in ToTosaMemoryFormatPass (pytorch#18167)

@3l1 3l1 force-pushed the export-D96432610 branch from d6ab788 to 347dbac Compare March 17, 2026 00:37
3l1 added a commit to 3l1/executorch that referenced this pull request Mar 17, 2026
…ansposes in ToTosaMemoryFormatPass (pytorch#18167)

@3l1 3l1 force-pushed the export-D96432610 branch 2 times, most recently from 8fff8d2 to 04da7fe Compare March 17, 2026 17:55
3l1 added a commit to 3l1/executorch that referenced this pull request Mar 17, 2026
…ansposes in ToTosaMemoryFormatPass (pytorch#18167)

@meta-codesync meta-codesync bot changed the title from "Eliminate redundant NCHW↔NHWC permute_copy and NHWC-safe view_copy transposes in ToTosaMemoryFormatPass (#18167)" to "Eliminate redundant NCHW↔NHWC permute_copy and NHWC-safe view_copy transposes in ToTosaMemoryFormatPass" Mar 18, 2026
@3l1 3l1 force-pushed the export-D96432610 branch from 04da7fe to db00a4c Compare March 18, 2026 05:00
@meta-codesync meta-codesync bot changed the title from "Eliminate redundant NCHW↔NHWC permute_copy and NHWC-safe view_copy transposes in ToTosaMemoryFormatPass" to "Eliminate redundant NCHW↔NHWC permute_copy and NHWC-safe view_copy transposes in ToTosaMemoryFormatPass (#18167)" Mar 18, 2026
@3l1 3l1 force-pushed the export-D96432610 branch from db00a4c to 9cd9fcd Compare March 18, 2026 20:19
3l1 added a commit to 3l1/executorch that referenced this pull request Mar 18, 2026
…ansposes in ToTosaMemoryFormatPass (pytorch#18167)

Summary:

Two optimizations in ToTosaMemoryFormatPass to reduce TOSA TRANSPOSE nodes:

1. NHWC-safe reshape detection: When a 4D→4D view_copy has monotonic
   shape_indices on the raw shapes and preserves the last dimension (NHWC
   channel), skip inserting input/output transposes. The view_copy can
   operate directly on NHWC data.

2. Redundant permute_copy elimination: Model-level permute_copy ops whose
   permutation matches channels_last_order (NCHW→NHWC) or its inverse
   (NHWC→NCHW) are redundant with the tosa_dim_order annotation that
   already handles format conversion. Replace them with view_copy (identity
   reshape) to avoid generating TOSA TRANSPOSE nodes. Handles both 4D
   (rank>=4, sr>=2) and 3D (rank>=3, sr>=1) permutations.

For the CC EMG model, this reduces Vela Transpose entries from 75→33
(-56%), Transpose op cycles from 33.4K→6.1K (-82%), and NPU operators
from 367→329 (-38 operators, about -10%).
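
The reductions quoted above can be rechecked from the before/after counts. The snippet below is only an arithmetic check on the numbers in this message; the dictionary keys are descriptive labels chosen for illustration, not identifiers from Vela's output.

```python
def relative_change(before: float, after: float) -> float:
    """Percentage change from `before` to `after` (negative means a reduction)."""
    return (after - before) / before * 100.0


# Before/after values copied from the commit message (cycles in thousands).
metrics = {
    "Transpose entries": (75, 33),
    "Transpose op cycles (K)": (33.4, 6.1),
    "NPU operators": (367, 329),
}

for name, (before, after) in metrics.items():
    change = relative_change(before, after)
    print(f"{name}: {before} -> {after} ({change:+.1f}%)")
```

Note that the operator count drops by 38 in absolute terms, which is a much smaller relative change than the two Transpose figures.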

Also removes the failed ReorderToNHWCPass, which targeted
permute_copy→view_copy→permute_copy patterns that don't exist in the Edge IR graph.

Reviewed By: digantdesai

Differential Revision: D96432610
3l1 added a commit to 3l1/executorch that referenced this pull request Mar 18, 2026
…ansposes in ToTosaMemoryFormatPass (pytorch#18167)

@3l1 3l1 force-pushed the export-D96432610 branch from 9cd9fcd to bee51a7 Compare March 18, 2026 20:35
@meta-codesync meta-codesync bot changed the title from "Eliminate redundant NCHW↔NHWC permute_copy and NHWC-safe view_copy transposes in ToTosaMemoryFormatPass (#18167)" to "Eliminate redundant NCHW↔NHWC permute_copy and NHWC-safe view_copy transposes in ToTosaMemoryFormatPass" Mar 18, 2026
@3l1 3l1 force-pushed the export-D96432610 branch from bee51a7 to a8d109d Compare March 18, 2026 21:21
@meta-codesync meta-codesync bot changed the title from "Eliminate redundant NCHW↔NHWC permute_copy and NHWC-safe view_copy transposes in ToTosaMemoryFormatPass" to "Eliminate redundant NCHW↔NHWC permute_copy and NHWC-safe view_copy transposes in ToTosaMemoryFormatPass (#18167)" Mar 18, 2026
3l1 added a commit to 3l1/executorch that referenced this pull request Mar 18, 2026
…ansposes in ToTosaMemoryFormatPass (pytorch#18167)

Summary:

Two optimizations in ToTosaMemoryFormatPass to reduce TOSA TRANSPOSE nodes:

1. **NHWC-safe reshape detection:** When a 4D→4D view_copy has monotonic
   shape_indices on the raw shapes and preserves the last dimension (NHWC
   channel), skip inserting input/output transposes. The view_copy can
   operate directly on NHWC data.

2. **Redundant permute_copy elimination:** Model-level permute_copy ops whose
   permutation matches channels_last_order (NCHW→NHWC) or its inverse
   (NHWC→NCHW) are redundant with the tosa_dim_order annotation that
   already handles format conversion. Replace them with view_copy (identity
   reshape) to avoid generating TOSA TRANSPOSE nodes.

Reviewed By: digantdesai

Differential Revision: D96432610
@3l1 3l1 force-pushed the export-D96432610 branch from a8d109d to a3b93f5 Compare March 18, 2026 21:32
3l1 added a commit to 3l1/executorch that referenced this pull request Mar 18, 2026
…ansposes in ToTosaMemoryFormatPass (pytorch#18167)

@3l1 3l1 force-pushed the export-D96432610 branch from a3b93f5 to 27fb858 Compare March 18, 2026 21:35
3l1 added a commit to 3l1/executorch that referenced this pull request Mar 18, 2026
…ansposes in ToTosaMemoryFormatPass (pytorch#18167)

@3l1 3l1 force-pushed the export-D96432610 branch from 27fb858 to e13f461 Compare March 18, 2026 21:52
@zingo zingo added the partner: arm label Mar 18, 2026
3l1 added a commit to 3l1/executorch that referenced this pull request Mar 18, 2026
…ansposes in ToTosaMemoryFormatPass (pytorch#18167)

@3l1 3l1 force-pushed the export-D96432610 branch from e13f461 to e54aac7 Compare March 18, 2026 22:14
3l1 added a commit to 3l1/executorch that referenced this pull request Mar 18, 2026
…ansposes in ToTosaMemoryFormatPass (pytorch#18167)

@3l1 3l1 force-pushed the export-D96432610 branch from e54aac7 to c65b675 Compare March 18, 2026 22:18
3l1 added a commit to 3l1/executorch that referenced this pull request Mar 18, 2026
…ansposes in ToTosaMemoryFormatPass (pytorch#18167)

@3l1 3l1 force-pushed the export-D96432610 branch from c65b675 to ec1f9ae Compare March 18, 2026 22:30
3l1 added a commit to 3l1/executorch that referenced this pull request Mar 18, 2026
…ansposes in ToTosaMemoryFormatPass (pytorch#18167)

@3l1 3l1 force-pushed the export-D96432610 branch from ec1f9ae to 52a3927 Compare March 18, 2026 22:34
…ansposes in ToTosaMemoryFormatPass (pytorch#18167)

@3l1 3l1 force-pushed the export-D96432610 branch from 52a3927 to 79951e1 Compare March 18, 2026 22:41
@zingo zingo changed the title from "Eliminate redundant NCHW↔NHWC permute_copy and NHWC-safe view_copy transposes in ToTosaMemoryFormatPass (#18167)" to "Arm backend: Eliminate redundant NCHW↔NHWC permute_copy and NHWC-safe view_copy transposes in ToTosaMemoryFormatPass (#18167)" Mar 18, 2026
3l1 (Contributor, Author) commented Mar 18, 2026

I'm investigating the failing tests...


Labels

ciflow/trunk, CLA Signed, fb-exported, meta-exported, partner: arm
