Add a16w8 reduce_sum FVP coverage for Ethos-U85 by Ninja91 · Pull Request #19319 · pytorch/executorch

Ninja91 · 2026-05-06T01:21:53Z

Summary:
Adds an a16w8 (int16 IO + int8 weights) sweep for aten.sum.dim_IntList reducing the last dim with keepdim=True. The new tests test_sum_dim_intlist_a16w8_{u55,u85}_INT run on the standard Corstone-300 / Corstone-320 FVP harness. The U85 case surfaces a known numerics issue in the Vela regor lowering at int16 IO precision (silent zero output), tracked upstream at https://gitlab.arm.com/artificial-intelligence/ethos-u/ethos-u-vela/-/issues/23. The Ethos-U55 path uses a different accumulator and is correct on the same OFM rescale.
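To make the terms in the summary concrete, here is a minimal sketch of a symmetric int16 last-dim sum with `keepdim=True` and an output rescale, written in plain Python. It is not the Vela `regor` lowering or any ExecuTorch code; the helper names (`symmetric_scale`, `quantize`, `reduce_sum_last_dim`) and the sample data are assumptions made for illustration only.

```python
# Illustrative sketch only: models symmetric int16 quantization and a
# last-dim sum with keepdim=True. Not the Vela or ExecuTorch implementation;
# every helper name below is hypothetical.

INT16_MIN, INT16_MAX = -32768, 32767


def symmetric_scale(values, qmax=INT16_MAX):
    """Scale for symmetric quantization (zero point fixed at 0)."""
    peak = max(abs(v) for v in values)
    return peak / qmax if peak > 0 else 1.0


def quantize(values, scale):
    """Round to the nearest int16 step and clamp to the int16 range."""
    return [max(INT16_MIN, min(INT16_MAX, round(v / scale))) for v in values]


def reduce_sum_last_dim(rows_q, in_scale, out_scale):
    """Sum each row in a wide accumulator, then rescale to the output scale.

    Returns a list of one-element rows, mirroring keepdim=True.
    """
    out = []
    for row in rows_q:
        acc = sum(row)  # Python ints are unbounded, so no accumulator overflow
        q = round(acc * in_scale / out_scale)  # output (OFM) rescale
        out.append([max(INT16_MIN, min(INT16_MAX, q))])
    return out


rows = [[0.5, -0.25, 1.0, 0.75], [0.1, 0.2, 0.3, 0.4]]
in_scale = symmetric_scale([v for row in rows for v in row])
out_scale = symmetric_scale([sum(row) for row in rows])

rows_q = [quantize(row, in_scale) for row in rows]
out_q = reduce_sum_last_dim(rows_q, in_scale, out_scale)
dequantized = [[q[0] * out_scale] for q in out_q]

# A "silent zero" result would show up as every quantized output being 0.
all_zero = all(q[0] == 0 for q in out_q)
```

In this model the dequantized sums stay close to the float reference, so an all-zero output for non-zero inputs is clearly distinguishable from ordinary quantization error.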

Also annotates the four dim_None{,_4d_tensor} parametrize ids on test_sum_u{55,85}_INT_1_0 (and the corresponding fp16 / bf16 variants) with skips= -- those cases cannot be exercised through the FVP harness because executorch.devtools.bundled_program.config rejects None as a model input. The dim=None case is properly covered by the existing SumDefault class.

Test design:

- Standard `pipeline.run()` with the same a16w8 kwargs other arm a16w8 tests use (e.g. `test_native_layer_norm_16a8w_u85_INT` in `test_layer_norm.py`): `a16w8_quantization=True, symmetric_io_quantization=True, qtol=128, epsilon=2**-16`.
- Numerical comparison is the standard `atol`/`rtol` check from `pipeline.run()` -- no SQNR helpers.
- The U85 a16w8 test is wrapped with both `common.XfailIfNoCorstone320` (handles missing-FVP environments via `FileNotFoundError`) and `pytest.mark.xfail(strict=False, reason="...")` (handles the silent-zero bug). Both are function-level decorators that compose cleanly -- pattern matches `test_max_pool1d.py:111-114`. `strict=False` keeps the test target green both on stock Vela 5.0 (cases XFAIL) and once the upstream Vela fix is in tree (cases XPASS allowed).
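The effect of `strict=False` described above can be summarized with a small outcome model. This is a simplified sketch of how pytest reports a test carrying an xfail mark, not pytest code and not code from this PR; `report` and `suite_is_green` are hypothetical names.

```python
# Simplified model of pytest's reporting for a test with an xfail mark.
# Hypothetical helpers for illustration; this is not pytest itself.


def report(test_passed: bool, xfail: bool, strict: bool = False) -> str:
    """Return the outcome label for one test case."""
    if not xfail:
        return "passed" if test_passed else "failed"
    if not test_passed:
        return "xfailed"  # expected failure, does not fail the run
    # The test passed although it was expected to fail.
    return "failed" if strict else "xpassed"


def suite_is_green(outcomes) -> bool:
    """A run is green when no case is reported as failed."""
    return "failed" not in outcomes


# Stock Vela 5.0: the silent-zero bug makes the case fail.
before_fix = report(test_passed=False, xfail=True, strict=False)
# After the upstream fix: the same case now passes.
after_fix = report(test_passed=True, xfail=True, strict=False)
# With strict=True the unexpected pass would turn the run red.
after_fix_strict = report(test_passed=True, xfail=True, strict=True)
```

Under this model the non-strict mark keeps the target green in both states, while a strict mark would need to be removed in the same change that brings in the fix.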

Differential Revision: D103667823

pytorch-bot · 2026-05-06T01:21:58Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19319

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

⏳ No Failures, 92 Pending

As of commit fc67945 with merge base 851cffb:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

meta-codesync · 2026-05-06T01:22:09Z

@Ninja91 has exported this pull request. If you are a Meta employee, you can view the originating Diff in D103667823.

github-actions · 2026-05-06T01:23:12Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Copilot

Pull request overview

This PR adds Arm backend test coverage for the a16w8 (int16 activations / IO quantization) path of aten.sum.dim_IntList (reducing the last dim with keepdim=True) on Corstone FVPs, with the intent of surfacing a known Ethos-U85 ReduceSum int16 numerics issue (silent-zero output) while keeping the overall test target green via non-strict XFAILs.

Changes:

Enables ops/test_sum.py in the Arm Bazel test target list.
Adds new SumLastDim-based a16w8 ReduceSum tests for Ethos-U55 and Ethos-U85, including per-case XFAILs for the known U85 issue.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File	Description
backends/arm/test/targets.bzl	Adds `ops/test_sum.py` to the default Arm test file list so it runs in the Bazel test suite.
backends/arm/test/ops/test_sum.py	Introduces new a16w8 ReduceSum last-dim tests for U55/U85 and marks U85 cases as non-strict XFAIL to capture the known Vela issue.


Summary: Adds an a16w8 (int16 IO + int8 weights) sweep for `aten.sum.dim_IntList` reducing the last dim with `keepdim=True`. The new tests `test_sum_dim_intlist_a16w8_{u55,u85}_INT` run on the standard Corstone-300 / Corstone-320 FVP harness and surface a numerics issue in the Ethos-U85 `ReduceSum` lowering at int16 IO precision (silent zero output). The Ethos-U55 path uses a different accumulator and is correct on the same OFM rescale.

## Context

Part of a stack that documents and fixes a numerics bug in the Vela 5.0 Ethos-U85 backend (`regor`). Plan + cross-references:

- **Plan:** {D103649006} ([Markup](https://internalfb.com/intern/markup/D103649006))
- **Step 1a (this diff):** ReduceSum-only a16w8 coverage in `test_sum.py` (LAND)
- **Step 1b-softmax:** {D103734699} -- `test_softmax.py` a16w8 MHA softmax sweep (LAND)
- **Step 1b-ops:** {D103760103} -- `test_softmax_ops.py` op-isolation harness (DNL)
- **Step 2a:** {D103760153} -- `regor` patch in third-party Vela 5.0 fork (LAND)
- **Step 2b:** {D103760514} -- DNL companion that drops `xfails=` from `test_sum.py` (lands in OSS only after upstream Vela syncs the fix)

## Test design

Tests use the standard `pipeline.run()` with the same a16w8 kwargs other arm a16w8 tests use (e.g. `test_native_layer_norm_16a8w_u85_INT` in `test_layer_norm.py`):

```
a16w8_quantization=True, symmetric_io_quantization=True, qtol=128, epsilon=2**-16
```

Numerical comparison is the standard `atol`/`rtol`-only check from `pipeline.run()` -- no SQNR helpers -- to stay consistent with the rest of `arm/test/ops/`.

The U85 cases are wrapped with `xfails=a16w8_sum_u85_xfails, strict=False`. `strict=False` keeps the test target green both on stock Vela 5.0 (cases XFAIL) *and* after Step 2a lands the Vela patch (cases XPASS, allowed under non-strict). Step 2b separately drops the `xfails=` argument once the upstream Vela fix syncs down.

The new U85 a16w8 test deliberately omits `common.XfailIfNoCorstone320` (which is present on the U55 sibling). Stacking that decorator with the per-id `xfails=` argument makes the per-id marks not fire (verified empirically), so the bug-firing cases would hard-fail instead of XFAIL. CI always has Corstone-320 installed; if it ever isn't, the test fails loudly with `FileNotFoundError`, which is the right signal for a missing-FVP misconfiguration. A code comment in the file documents this constraint.

## Scope note

This diff only **adds** new tests for the a16w8 path. It does not modify any existing tests in `test_sum.py` -- the pre-existing `Sum.test_parameters` (including the `dim_None` cases) is left as-is. Pre-existing `dim_None` test failures on `test_sum_u{55,85}_INT_1_0` are out of scope and unrelated to this diff.

Differential Revision: D103667823

Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

Summary: Adds an a16w8 (int16 IO + int8 weights) sweep for `aten.sum.dim_IntList` reducing the last dim with `keepdim=True`. The new tests `test_sum_dim_intlist_a16w8_{u55,u85}_INT` run on the standard Corstone-300 / Corstone-320 FVP harness. The U85 case surfaces a known numerics issue in the Vela `regor` lowering at int16 IO precision (silent zero output), tracked upstream at https://gitlab.arm.com/artificial-intelligence/ethos-u/ethos-u-vela/-/issues/23. The Ethos-U55 path uses a different accumulator and is correct on the same OFM rescale.

This diff is **additive only**: the `Sum` / `SumDefault` test classes and existing test functions are not modified, except for `skips=` annotations on the four pre-existing `dim_None` parametrize ids that are not bundled-program-serializable and surface only because this diff is the first to register `ops/test_sum.py` in the buck test target list.

Test design:

- Standard `pipeline.run()` with the same a16w8 kwargs other arm a16w8 tests use (e.g. `test_native_layer_norm_16a8w_u85_INT` in `test_layer_norm.py`): `a16w8_quantization=True, symmetric_io_quantization=True, qtol=128, epsilon=2**-16`.
- Numerical comparison is the standard `atol`/`rtol` check from `pipeline.run()` -- no SQNR helpers.
- The U85 cases are wrapped with `xfails=a16w8_sum_u85_xfails, strict=False`. `strict=False` keeps the test target green both on stock Vela 5.0 (cases XFAIL) and once the upstream Vela fix is in tree (cases XPASS allowed).
- `XfailIfNoCorstone320` is intentionally omitted on the new a16w8 U85 test -- stacking it with the per-id `xfails=` argument makes the per-id marks not fire (verified empirically in this buck test target). A code comment in the file documents this constraint.

Differential Revision: D103667823


Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

+# dim=None cases skipped: executorch.devtools.bundled_program.config rejects
+# None as a model input. dim=None is covered by the SumDefault class below.
+_DIM_NONE_SKIP_REASON = (
+    "bundled_program cannot serialize None as a model input; "
+    "dim=None is covered by SumDefault"
+)


Copilot AI review requested due to automatic review settings May 6, 2026 01:21

Ninja91 requested a review from digantdesai as a code owner May 6, 2026 01:21

meta-cla Bot added the CLA Signed label May 6, 2026

github-actions Bot added the ciflow/trunk and module: arm labels and removed the CLA Signed label May 6, 2026

meta-codesync Bot added fb-exported meta-exported labels May 6, 2026

Ninja91 added the partner: arm label May 6, 2026

Copilot started reviewing on behalf of Ninja91 May 6, 2026 01:22 View session

Copilot AI reviewed May 6, 2026


Comment thread backends/arm/test/ops/test_sum.py Outdated

Comment thread backends/arm/test/ops/test_sum.py Outdated

Comment thread backends/arm/test/ops/test_sum.py

Comment thread backends/arm/test/targets.bzl

Ninja91 requested a review from 3l1 May 6, 2026 01:35

meta-cla Bot added the CLA Signed label May 6, 2026

meta-codesync Bot changed the title ~~Add a16w8 reduce_sum FVP coverage for Ethos-U85~~ Add a16w8 reduce_sum FVP coverage for Ethos-U85 (#19319) May 6, 2026

Ninja91 force-pushed the export-D103667823 branch from b4603d2 to 20105f6 Compare May 6, 2026 04:07

meta-codesync Bot changed the title ~~Add a16w8 reduce_sum FVP coverage for Ethos-U85 (#19319)~~ Add a16w8 reduce_sum FVP coverage for Ethos-U85 May 6, 2026

Copilot AI review requested due to automatic review settings May 6, 2026 06:05

Ninja91 force-pushed the export-D103667823 branch from 20105f6 to 876f542 Compare May 6, 2026 06:05

Copilot started reviewing on behalf of Ninja91 May 6, 2026 06:05 View session

Copilot AI reviewed May 6, 2026


Comment thread backends/arm/test/ops/test_sum.py

meta-codesync Bot changed the title ~~Add a16w8 reduce_sum FVP coverage for Ethos-U85~~ Add a16w8 reduce_sum FVP coverage for Ethos-U85 (#19319) May 6, 2026

Ninja91 force-pushed the export-D103667823 branch from 876f542 to 6639b8f Compare May 6, 2026 15:40

meta-codesync Bot changed the title ~~Add a16w8 reduce_sum FVP coverage for Ethos-U85 (#19319)~~ Add a16w8 reduce_sum FVP coverage for Ethos-U85 May 6, 2026

Copilot AI review requested due to automatic review settings May 6, 2026 22:08

Ninja91 force-pushed the export-D103667823 branch from 6639b8f to fc67945 Compare May 6, 2026 22:08

Copilot started reviewing on behalf of Ninja91 May 6, 2026 22:09 View session

Copilot AI reviewed May 6, 2026



Ninja91 wants to merge 1 commit into pytorch:main from Ninja91:export-D103667823
