Arm backend: Modernise and standalone the executor runner (pytorch#19018)
## Summary
This PR modernizes the ExecuTorch Arm bare-metal runner workflow so
users can move from a PyTorch model to a runnable
Arm executor runner with fewer manual build-system steps, stronger
validation, and faster repeated local iteration.
The main change is a new standalone Arm executor runner CMake entry
point. `run.sh` now acts as the orchestration
layer for common Ethos-U bare-metal flows: it can derive build
directories, configure the standalone runner with Arm
bare-metal defaults, stage generated PTE/BPTE files, validate reused
CMake caches, build the needed runner target,
locate the runner binary, and invoke FVP.
## Problem
Before this change, the Arm runner workflow depended on manually
stitching together ExecuTorch build/install
artifacts, runner CMake configuration, PTE input wiring, toolchain and
target settings, optional debug features, and
repeated install/export steps.
That made the workflow harder to explain, fragile in CI, slower to
iterate on locally, and easy to break when reusing
a build directory configured for a different target or feature set.
## CMake Architecture Change
```mermaid
flowchart LR
subgraph Before
A1["Build ExecuTorch<br/>arm-baremetal preset"] --> A2["Install/export artifacts"]
A2 --> A3["Configure runner CMake<br/>examples/arm/executor_runner"]
A4["PTE / BPTE"] --> A3
A3 --> A5["arm_executor_runner ELF"]
end
subgraph After
B1["run.sh"] --> B2["Validate / choose build dir"]
B2 --> B3["Standalone runner CMake<br/>examples/arm/executor_runner/standalone"]
B4["PTE / BPTE"] --> B1
B3 --> B5["ExecuTorch top-level CMake<br/>as subdirectory"]
B3 --> B6["Arm CMake helpers + presets"]
B5 --> B7["arm_executor_runner ELF"]
B6 --> B7
end
```
## What Changed
- Added `examples/arm/executor_runner/standalone` as the supported
standalone CMake entry point for
`arm_executor_runner`.
- Added shared Arm CMake helpers for Ethos-U SDK setup, required target
validation, and predictable runner output
paths.
- Updated `build_executor_runner.sh` and `run.sh` to use the standalone
runner workflow.
- Added deterministic default build directories under `--et_build_root`.
- Added cache validation for reused build directories, including target,
toolchain, selected ops, PTE placement,
BundleIO, ETDump, and devtools settings.
- Added PTE/BPTE staging so repeated runs can reuse the same configured
CMake build directory.
- Integrated selective-op handling into the standalone runner path.
- Cleaned up bare-metal install/export behavior so standalone builds can
consume reusable build-tree artifacts.
- Updated Arm README and notebooks for the new workflow.
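The cache validation item above can be sketched as follows. This is a minimal illustration, not the logic in `run.sh`: the function names `cache_value` and `cache_matches` are hypothetical, and the only thing assumed about CMake is the standard `NAME:TYPE=VALUE` entry format of `CMakeCache.txt`.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: helper names are illustrative, not taken from run.sh.

# Print the value of a cache entry (NAME:TYPE=VALUE) from a CMakeCache.txt file.
cache_value() {
    local cache_file="$1" name="$2"
    sed -n "s/^${name}:[^=]*=//p" "${cache_file}" | head -n 1
}

# Return 0 if the cached value equals the requested one, 1 otherwise.
# A missing cache file counts as a mismatch, so the caller can reconfigure.
cache_matches() {
    local cache_file="$1" name="$2" expected="$3"
    [ -f "${cache_file}" ] || return 1
    [ "$(cache_value "${cache_file}" "${name}")" = "${expected}" ]
}
```

A caller could check each setting it cares about (target, toolchain, feature switches) and refuse to reuse the directory, or reconfigure it, as soon as one check fails.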
## Iteration Speed
Repeated local PTE-to-runner iteration is now **8x faster** because
`run.sh` can reuse the configured standalone CMake build directory,
stage updated PTE/BPTE payloads into the existing cache wiring, and
rebuild only the needed runner target instead of repeating the full
manual configure/install/export flow.
This is a developer workflow speedup, not a model runtime speedup.
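One way to picture the staging step is the sketch below. It assumes a fixed staging path inside the build directory (`staged/model.pte`, a hypothetical name) and copies the payload only when its contents differ, so an unchanged PTE does not trigger a rebuild. The actual staging layout used by `run.sh` may differ.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: the staging path and function name are assumptions.

# Copy a PTE/BPTE payload to a fixed location inside the build directory.
# Prints "staged" when the file was copied and "unchanged" when the staged
# copy already has identical contents.
stage_payload() {
    local src="$1" build_dir="$2"
    local dest="${build_dir}/staged/model.pte"
    mkdir -p "$(dirname "${dest}")"
    if [ -f "${dest}" ] && cmp -s "${src}" "${dest}"; then
        echo "unchanged"
    else
        cp "${src}" "${dest}"
        echo "staged"
    fi
}
```

Because the configured CMake cache always points at the same staged path, only the file contents change between iterations.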
## Result
For common Ethos-U bare-metal usage, the user-facing path is now
script-owned and repeatable:
1. Run Arm setup.
2. Run `examples/arm/run.sh` with a model and target.
3. Reuse or inspect the generated build directory under
`--et_build_root`.
4. Iterate by regenerating the PTE/BPTE and rebuilding through the same
validated CMake cache.
VGF host flows remain explicit: `run.sh` requires an existing
`--build-dir` for VGF-style host builds rather than
auto-configuring them as bare-metal runner builds.
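The split between derived bare-metal build directories and explicit VGF build directories could look roughly like this. The function name, the `vgf` flow label, and the `<root>/<target>` layout are all assumptions made for illustration; only the behavior (derive for bare-metal, require an existing directory for VGF) comes from the description above.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: names and directory layout are illustrative only.

# Resolve the build directory for a flow.
#   $1 flow          "vgf" or anything else (treated as bare-metal)
#   $2 build_root    root under which bare-metal directories are derived
#   $3 target        target name used in the derived directory
#   $4 explicit_dir  optional user-supplied build directory
resolve_build_dir() {
    local flow="$1" build_root="$2" target="$3" explicit_dir="${4:-}"
    if [ "${flow}" = "vgf" ]; then
        # VGF host builds are never auto-configured: the directory must exist.
        if [ -z "${explicit_dir}" ] || [ ! -d "${explicit_dir}" ]; then
            echo "error: VGF host flows need an existing build directory" >&2
            return 1
        fi
        echo "${explicit_dir}"
        return 0
    fi
    # Bare-metal: honor an explicit directory, otherwise derive a stable one.
    echo "${explicit_dir:-${build_root}/${target}}"
}
```

Deriving the same directory from the same inputs is what makes repeated runs land on the same CMake cache.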
## Testing
Validated through the Arm backend runner, bare-metal, VGF, and CI
workflows covered by this stack.
cc @digantdesai @freddan80 @per @zingo @oscarandersson8218 @mansnils @Sebastian-Larsson @robell @rascani
---------
Signed-off-by: Usamah Zaheer <usamah.zaheer@arm.com>
echo "Build Arm ${toolchain/-gcc/} executor_runner for ${target} PTE: ${pte_file} using ${system_config}${memory_mode}${extra_build_flags} to '${output_folder}'"
backends/arm/scripts/docgen/ethos-u/backends-arm-ethos-u-overview.md.in (2 additions & 2 deletions)
@@ -27,7 +27,7 @@ For the AOT flow, compilation of a model to `.pte` format using the Ethos-U back
 - [TOSA Serialization Library](https://www.mlplatform.org/tosa/software.html) for serializing the Exir IR graph into TOSA IR.
 - [Ethos-U Vela graph compiler](https://pypi.org/project/ethos-u-vela/) for compiling TOSA flatbuffers into an Ethos-U command stream.

-And for building and running the example application available in `examples/arm/executor_runner/`:
+And for building and running the example application available in `examples/arm/executor_runner/` through the standalone CMake entry point:
 - [Arm GNU Toolchain](https://developer.arm.com/Tools%20and%20Software/GNU%20Toolchain) for cross compilation.
 - [Arm® Corstone™ SSE-300 FVP](https://developer.arm.com/documentation/100966/1128/Arm--Corstone-SSE-300-FVP) for testing on a Arm® Cortex®-M55+Ethos-U55 reference design.
 - [Arm® Corstone™ SSE-320 FVP](https://developer.arm.com/documentation/109760/0000/SSE-320-FVP) for testing on a Arm® Cortex®-M85+Ethos-U85 reference design.
@@ -55,7 +55,7 @@ For more information on quantization, see [Quantization](arm-ethos-u-quantizatio

 ## Runtime Integration

-An example runtime application is available in [examples/arm/executor_runner](https://github.com/pytorch/executorch/blob/main/examples/arm/executor_runner/), and the steps requried for building and deploying it on a FVP it is explained in the previously mentioned [Arm Ethos-U Backend Tutorial](tutorials/ethos-u-getting-started.md). <!-- @lint-ignore -->
+An example runtime application is available in [examples/arm/executor_runner](https://github.com/pytorch/executorch/blob/main/examples/arm/executor_runner/), with a standalone CMake entry point in `examples/arm/executor_runner/standalone`. The steps required for building and deploying it on an FVP are explained in the previously mentioned [Arm Ethos-U Backend Tutorial](tutorials/ethos-u-getting-started.md). <!-- @lint-ignore -->
 The example application is recommended to use for testing basic functionality of your lowered models, as well as a starting point for developing runtime integrations for your own targets.
 For an in-depth explanation of the architecture of the executor_runner and the steps required for doing such an integration, please refer to [Ethos-U porting guide](https://github.com/pytorch/executorch/blob/main/examples/arm/ethos-u-porting-guide.md).
backends/arm/scripts/docgen/ethos-u/ethos-u-getting-started-tutorial.md.in (8 additions & 15 deletions)
@@ -76,35 +76,28 @@ To produce a pte file equivalent to the one above, run

 ### Runtime:

-After the AOT compilation flow is done, the runtime can be cross compiled and linked to the produced `.pte`-file using the Arm cross-compilation toolchain. This is done in two steps:
+After the AOT compilation flow is done, the runtime can be cross compiled and linked to the produced `.pte`-file using the Arm cross-compilation toolchain. Configure the standalone Arm executor runner CMake project to pull in the ExecuTorch build graph, link the Ethos-U delegate, and generate kernel bindings for any non-delegated ops. This produces the `arm_executor_runner` program that will run on target.

-First, build and install the ExecuTorch libraries and EthosUDelegate:
```
# In ExecuTorch top-level, with sourced setup_path.sh
Second, build and link the `arm_executor_runner` and generate kernel bindings for any non delegated ops. This is the actual program that will run on target.
```
-# In ExecuTorch top-level, with sourced setup_path.sh
The block diagram below shows, at the high level, how the various build artifacts are generated and are linked together to generate the final bare-metal executable.