Add a16w8 reduce_sum FVP coverage for Ethos-U85 #19319
Ninja91 wants to merge 1 commit into pytorch:main
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19319
Note: Links to docs will display an error until the docs builds have been completed.
⏳ No Failures, 92 Pending
As of commit fc67945 with merge base 851cffb.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@Ninja91 has exported this pull request. If you are a Meta employee, you can view the originating Diff in D103667823.
Pull request overview
This PR adds Arm backend test coverage for the a16w8 (int16 activations / IO quantization) path of aten.sum.dim_IntList (reducing the last dim with keepdim=True) on Corstone FVPs, with the intent of surfacing a known Ethos-U85 ReduceSum int16 numerics issue (silent-zero output) while keeping the overall test target green via non-strict XFAILs.
Changes:
- Enables `ops/test_sum.py` in the Arm Bazel test target list.
- Adds new `SumLastDim`-based a16w8 ReduceSum tests for Ethos-U55 and Ethos-U85, including per-case XFAILs for the known U85 issue.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| backends/arm/test/targets.bzl | Adds ops/test_sum.py to the default Arm test file list so it runs in the Bazel test suite. |
| backends/arm/test/ops/test_sum.py | Introduces new a16w8 ReduceSum last-dim tests for U55/U85 and marks U85 cases as non-strict XFAIL to capture the known Vela issue. |
Summary:
Adds an a16w8 (int16 IO + int8 weights) sweep for `aten.sum.dim_IntList` reducing the last dim with `keepdim=True`. The new tests `test_sum_dim_intlist_a16w8_{u55,u85}_INT` run on the standard Corstone-300 / Corstone-320 FVP harness and surface a numerics issue in the Ethos-U85 `ReduceSum` lowering at int16 IO precision (silent zero output). The Ethos-U55 path uses a different accumulator and is correct on the same OFM rescale.
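To make the reduction under test concrete, here is a minimal pure-Python sketch of a symmetric int16 sum over the last dim with `keepdim=True`. The helper names, the scales, and the clamp range are illustrative assumptions; this is a reference model of the arithmetic, not the Arm backend's or Vela's implementation, and it says nothing about the cause of the U85 issue.

```python
from typing import List

INT16_MIN, INT16_MAX = -32768, 32767  # assumed clamp range for int16 IO


def quantize(x: float, scale: float) -> int:
    """Symmetric quantization (zero point 0) with saturation to int16."""
    q = round(x / scale)
    return max(INT16_MIN, min(INT16_MAX, q))


def reduce_sum_last_dim_int16(
    rows: List[List[float]], in_scale: float, out_scale: float
) -> List[List[int]]:
    """Quantized sum over the last dim with keepdim=True for a 2-D input."""
    out = []
    for row in rows:
        # Accumulate in a wide integer so summing int16 values cannot overflow.
        acc = sum(quantize(v, in_scale) for v in row)
        # Rescale the accumulator from the input scale to the output scale.
        q_out = round(acc * in_scale / out_scale)
        # keepdim=True: each reduced row stays a length-1 list.
        out.append([max(INT16_MIN, min(INT16_MAX, q_out))])
    return out


def dequantize(q: List[List[int]], scale: float) -> List[List[float]]:
    """Map quantized values back to floats for comparison with a reference."""
    return [[v * scale for v in row] for row in q]
```

With hypothetical scales `in_scale=2**-10` and `out_scale=2**-8`, the input `[[0.5, 0.25, -0.125], [1.0, 1.0, 1.0]]` reduces to a `2x1` result that dequantizes to `[[0.625], [3.0]]`. A silent-zero output would instead dequantize to `[[0.0], [0.0]]`.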
## Context
Part of a stack that documents and fixes a numerics bug in the Vela 5.0 Ethos-U85 backend (`regor`). Plan + cross-references:
- **Plan:** {D103649006} ([Markup](https://internalfb.com/intern/markup/D103649006))
- **Step 1a (this diff):** ReduceSum-only a16w8 coverage in `test_sum.py` (LAND)
- **Step 1b-softmax:** {D103734699} -- `test_softmax.py` a16w8 MHA softmax sweep (LAND)
- **Step 1b-ops:** {D103760103} -- `test_softmax_ops.py` op-isolation harness (DNL)
- **Step 2a:** {D103760153} -- `regor` patch in third-party Vela 5.0 fork (LAND)
- **Step 2b:** {D103760514} -- DNL companion that drops `xfails=` from `test_sum.py` (lands in OSS only after upstream Vela syncs the fix)
## Test design
Tests use the standard `pipeline.run()` with the same a16w8 kwargs other arm a16w8 tests use (e.g. `test_native_layer_norm_16a8w_u85_INT` in `test_layer_norm.py`):
```
a16w8_quantization=True, symmetric_io_quantization=True, qtol=128, epsilon=2**-16
```
Numerical comparison is the standard `atol`/`rtol`-only check from `pipeline.run()` -- no SQNR helpers -- to stay consistent with the rest of `arm/test/ops/`.
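As an illustration of why a plain tolerance check is enough to catch the silent-zero case, the sketch below uses the same elementwise criterion as `torch.allclose`, `|actual - expected| <= atol + rtol * |expected|`. The function name, the sample values, and the tolerances are assumptions chosen for the example; the values `pipeline.run()` actually uses are not stated here.

```python
from typing import Sequence


def within_tolerance(
    actual: Sequence[float],
    expected: Sequence[float],
    atol: float,
    rtol: float,
) -> bool:
    """Elementwise |actual - expected| <= atol + rtol * |expected|."""
    if len(actual) != len(expected):
        return False
    return all(
        abs(a - e) <= atol + rtol * abs(e) for a, e in zip(actual, expected)
    )


# Hypothetical reference and outputs, for illustration only.
reference = [0.625, 3.0, -1.5]
good_output = [0.6251, 2.9998, -1.5002]
silent_zero_output = [0.0, 0.0, 0.0]

good_ok = within_tolerance(good_output, reference, atol=1e-3, rtol=1e-3)
zero_ok = within_tolerance(silent_zero_output, reference, atol=1e-3, rtol=1e-3)
```

An all-zero output only passes such a check when the reference itself is within `atol` of zero, so non-trivial inputs are what make the comparison meaningful.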
The U85 cases are wrapped with `xfails=a16w8_sum_u85_xfails, strict=False`. `strict=False` keeps the test target green both on stock Vela 5.0 (cases XFAIL) *and* after Step 2a lands the Vela patch (cases XPASS, allowed under non-strict). Step 2b separately drops the `xfails=` argument once the upstream Vela fix syncs down.
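The effect of `strict=False` can be summarised with a small model of how pytest reports a test that carries an xfail mark. This is a simplified sketch (it ignores the `raises=` and `run=` options), and the function names are made up for the example.

```python
from typing import Iterable


def xfail_outcome(test_passed: bool, strict: bool) -> str:
    """Reported outcome of a test that carries an xfail mark."""
    if not test_passed:
        # Expected failure: reported as XFAIL, not counted as a failure.
        return "XFAIL"
    # Unexpected pass: only strict=True turns it into a failure.
    return "FAILED" if strict else "XPASS"


def target_is_green(outcomes: Iterable[str]) -> bool:
    """A test target stays green as long as nothing is reported FAILED."""
    return "FAILED" not in set(outcomes)


# Stock Vela 5.0: the bug fires, so the marked cases fail as expected.
before_fix = [xfail_outcome(test_passed=False, strict=False)]
# After the Vela fix: the marked cases pass unexpectedly.
after_fix_non_strict = [xfail_outcome(test_passed=True, strict=False)]
after_fix_strict = [xfail_outcome(test_passed=True, strict=True)]
```

Only the strict variant would turn red once the fix lands, which is why the non-strict mark lets the test change and the Vela patch land independently.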
The new U85 a16w8 test deliberately omits `common.XfailIfNoCorstone320` (which is present on the U55 sibling). Stacking that decorator with the per-id `xfails=` argument prevents the per-id marks from firing (verified empirically), so the bug-firing cases would hard-fail instead of XFAIL. CI always has Corstone-320 installed; if it is ever missing, the test fails loudly with `FileNotFoundError`, which is the right signal for a missing-FVP misconfiguration. A code comment in the file documents this constraint.
## Scope note
This diff only **adds** new tests for the a16w8 path. It does not modify any existing tests in `test_sum.py` -- the pre-existing `Sum.test_parameters` (including the `dim_None` cases) is left as-is. Pre-existing `dim_None` test failures on `test_sum_u{55,85}_INT_1_0` are out of scope and unrelated to this diff.
Differential Revision: D103667823
Summary:
Adds an a16w8 (int16 IO + int8 weights) sweep for `aten.sum.dim_IntList` reducing the last dim with `keepdim=True`. The new tests `test_sum_dim_intlist_a16w8_{u55,u85}_INT` run on the standard Corstone-300 / Corstone-320 FVP harness. The U85 case surfaces a known numerics issue in the Vela `regor` lowering at int16 IO precision (silent zero output), tracked upstream at https://gitlab.arm.com/artificial-intelligence/ethos-u/ethos-u-vela/-/issues/23. The Ethos-U55 path uses a different accumulator and is correct on the same OFM rescale.
This diff is **additive only**: the `Sum` / `SumDefault` test classes and existing test functions are not modified, except for `skips=` annotations on the four pre-existing `dim_None` parametrize ids that are not bundled-program-serializable and surface only because this diff is the first to register `ops/test_sum.py` in the buck test target list.
Test design:
- Standard `pipeline.run()` with the same a16w8 kwargs other arm a16w8 tests use (e.g. `test_native_layer_norm_16a8w_u85_INT` in `test_layer_norm.py`): `a16w8_quantization=True, symmetric_io_quantization=True, qtol=128, epsilon=2**-16`.
- Numerical comparison is the standard `atol`/`rtol` check from `pipeline.run()` — no SQNR helpers.
- The U85 cases are wrapped with `xfails=a16w8_sum_u85_xfails, strict=False`. `strict=False` keeps the test target green both on stock Vela 5.0 (cases XFAIL) and once the upstream Vela fix is in tree (cases XPASS allowed).
- `XfailIfNoCorstone320` is intentionally omitted on the new a16w8 U85 test — stacking it with the per-id `xfails=` argument makes the per-id marks not fire (verified empirically in this buck test target). A code comment in the file documents this constraint.
Differential Revision: D103667823
Summary:
Adds an a16w8 (int16 IO + int8 weights) sweep for `aten.sum.dim_IntList` reducing the last dim with `keepdim=True`. The new tests `test_sum_dim_intlist_a16w8_{u55,u85}_INT` run on the standard Corstone-300 / Corstone-320 FVP harness. The U85 case surfaces a known numerics issue in the Vela `regor` lowering at int16 IO precision (silent zero output), tracked upstream at https://gitlab.arm.com/artificial-intelligence/ethos-u/ethos-u-vela/-/issues/23. The Ethos-U55 path uses a different accumulator and is correct on the same OFM rescale.
Also annotates the four `dim_None{,_4d_tensor}` parametrize ids on `test_sum_u{55,85}_INT_1_0` (and the corresponding fp16 / bf16 variants) with `skips=` -- those cases cannot be exercised through the FVP harness because `executorch.devtools.bundled_program.config` rejects `None` as a model input. The dim=None case is properly covered by the existing `SumDefault` class.
Test design:
- Standard `pipeline.run()` with the same a16w8 kwargs other arm a16w8 tests use (e.g. `test_native_layer_norm_16a8w_u85_INT` in `test_layer_norm.py`): `a16w8_quantization=True, symmetric_io_quantization=True, qtol=128, epsilon=2**-16`.
- Numerical comparison is the standard `atol`/`rtol` check from `pipeline.run()` -- no SQNR helpers.
- The U85 a16w8 test is wrapped with both `common.XfailIfNoCorstone320` (handles missing-FVP environments via `FileNotFoundError`) and `pytest.mark.xfail(strict=False, reason="...")` (handles the silent-zero bug). Both are function-level decorators that compose cleanly -- pattern matches `test_max_pool1d.py:111-114`. `strict=False` keeps the test target green both on stock Vela 5.0 (cases XFAIL) and once the upstream Vela fix is in tree (cases XPASS allowed).
Differential Revision: D103667823
```
# dim=None cases skipped: executorch.devtools.bundled_program.config rejects
# None as a model input. dim=None is covered by the SumDefault class below.
_DIM_NONE_SKIP_REASON = (
    "bundled_program cannot serialize None as a model input; "
    "dim=None is covered by SumDefault"
)
```