| 1 | +--- |
| 2 | +name: binary-size |
| 3 | +description: Analyze and reduce ExecuTorch binary size. Use when investigating binary size, running size tests, or optimizing the runtime for size-constrained deployments. |
| 4 | +--- |
| 5 | + |
| 6 | +# Binary Size |
| 7 | + |
| 8 | +## Start from the `main` branch of executorch |
| 9 | +Ask the user where the executorch repo is. |
| 10 | + |
| 11 | +```bash |
| 12 | +git checkout main && git pull |
| 13 | +``` |
| 14 | + |
| 15 | +## Build and measure baseline |
| 16 | +```bash |
| 17 | +conda activate executorch |
| 18 | +bash test/build_size_test.sh |
| 19 | +strip -o /tmp/size_test_stripped cmake-out/test/size_test |
| 20 | +strip -o /tmp/size_test_all_ops_stripped cmake-out/test/size_test_all_ops |
| 21 | +ls -la /tmp/size_test_stripped /tmp/size_test_all_ops_stripped |
| 22 | +``` |
| 23 | + |
| 24 | +The build produces two binaries:
| 25 | +- `cmake-out/test/size_test` — ExecuTorch runtime without operator implementations |
| 26 | +- `cmake-out/test/size_test_all_ops` — ExecuTorch runtime with portable ops |
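
Comparing `ls -la` output by eye is error-prone for small changes. A minimal sketch of a helper that reports exact byte counts and the difference between two stripped binaries follows; the function names `file_bytes` and `size_diff` are hypothetical and not part of the ExecuTorch repo.

```shell
# Hypothetical helpers, not part of ExecuTorch.

# file_bytes FILE -> size of FILE in bytes
file_bytes() {
  # tr strips the padding some wc implementations add
  wc -c < "$1" | tr -d '[:space:]'
}

# size_diff BEFORE AFTER -> bytes(AFTER) - bytes(BEFORE); negative means smaller
size_diff() {
  echo $(( $(file_bytes "$2") - $(file_bytes "$1") ))
}
```

For example, after copying the baseline to a path of your choice (say `/tmp/size_test_stripped.main`, an assumed name), `size_diff /tmp/size_test_stripped.main /tmp/size_test_stripped` prints the signed change in bytes.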
| 27 | + |
| 28 | +## Analyze with bloaty |
| 29 | +```bash |
| 30 | +bloaty cmake-out/test/size_test -d symbols -n 30 # by symbol |
| 31 | +bloaty cmake-out/test/size_test -d sections # by ELF section |
| 32 | +bloaty <after> -- <before> # diff two builds |
| 33 | +nm -S --size-sort -r <binary> | head -30       # largest symbols first (sizes are hex)
| 34 | +strings <binary> | less                         # printable strings, mostly from .rodata
| 35 | +``` |
| 36 | + |
| 37 | +Note: `bloaty -d compileunits` requires debug info (`-g`). The Release build does not include it. |
| 38 | + |
| 39 | +## Key build flags |
| 40 | +Set by `test/build_size_test.sh`: |
| 41 | +- `CMAKE_BUILD_TYPE=Release` |
| 42 | +- `EXECUTORCH_OPTIMIZE_SIZE=ON` — enables `-Os`, `-fno-exceptions`, `-fno-rtti`, unwind table suppression |
| 43 | +- `CXXFLAGS="-fno-exceptions -fno-rtti -Wall -Werror"` |
| 44 | + |
| 45 | +## Constraints |
| 46 | +- Use **CMake** to build (not Buck) |
| 47 | +- **C++17 minimum** language standard |
| 48 | +- Must build on **GCC 9** (CI uses `executorch-ubuntu-22.04-gcc9-nopytorch`) and **Clang 12** — avoid compiler-specific flags or pragmas without version guards |
| 49 | +- Do not regress existing functionality — run tests for modified files |
| 50 | +- Do not change build flags in `build_size_test.sh` for size reductions |
| 51 | +- Do not increase latency in the core runtime |
| 52 | + |
| 53 | +## Where to look for size reductions |
| 54 | +- `.text`: look for large functions, template bloat, duplicate instantiations |
| 55 | +- `.rodata`: verbose error messages, format strings, embedded file paths (`__FILE__`) |
| 56 | +- `.eh_frame`: should already be suppressed when `EXECUTORCH_OPTIMIZE_SIZE=ON` |
| 57 | +- Static init functions (`nm -S <binary> | grep GLOBAL__sub_I`): use `constexpr` constructors to constant-initialize static arrays |
| 58 | +- Logging strings: `ET_LOG_ENABLED=0` in Release eliminates format strings; ensure it propagates to consumers via `PUBLIC` compile definitions on cmake targets |
| 59 | +- Inline header functions: watch for compile-define mismatches between library and consumer TUs (e.g. `ET_LOG_ENABLED` set in library but not in consumer) |
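
To judge whether static initializers are worth chasing, it can help to total their size rather than read the `nm` listing line by line. The sketch below assumes GNU `nm -S` output (address, hex size, type, name) on stdin; `sum_static_init` is a hypothetical helper name.

```shell
# Hypothetical helper: sum the sizes of _GLOBAL__sub_I_* symbols.
# Reads `nm -S <binary>` output on stdin and prints the total in bytes.
sum_static_init() {
  total=0
  # Lines without a size column have fewer fields and never match the pattern.
  while read -r _addr size _type name; do
    case "$name" in
      *GLOBAL__sub_I*) total=$(( total + 0x$size )) ;;  # size column is hex
    esac
  done
  echo "$total"
}
```

Usage would look like `nm -S cmake-out/test/size_test | sum_static_init`. Run it on the unstripped binary, since `strip` removes the symbol table.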
| 60 | + |
| 61 | +## For each change |
| 62 | +1. Create a branch: `git checkout -b binary-size-<N>` |
| 63 | +2. Implement, rebuild, measure stripped sizes |
| 64 | +3. Create a separate PR — one logical change per PR |
| 65 | +4. Record results in `binary-size-<N>.md`: |
| 66 | + |
| 67 | +| Binary | This change (N vs N-1) | Cumulative (N vs main) | |
| 68 | +|---|---|---| |
| 69 | +| `size_test` (stripped) | -X | -Y | |
| 70 | +| `size_test_all_ops` (stripped) | -X | -Y | |
| 71 | + |
| 72 | +5. Update the CI size threshold in `.github/workflows/pull.yml` if sizes decrease |
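
The results table in step 4 can be generated from three byte counts per binary: the size on `main`, the size on the previous branch (N-1), and the size on the current branch (N). This is a sketch only; `fmt_delta` and `table_row` are hypothetical names, and the numbers in the usage line are placeholders rather than measured results.

```shell
# Hypothetical helpers for filling in binary-size-<N>.md.

# fmt_delta BEFORE AFTER -> signed byte delta, e.g. "-2048" or "+512"
fmt_delta() {
  d=$(( $2 - $1 ))
  if [ "$d" -gt 0 ]; then
    printf '+%d\n' "$d"
  else
    printf '%d\n' "$d"
  fi
}

# table_row NAME MAIN_BYTES PREV_BYTES CUR_BYTES -> one Markdown table row
table_row() {
  printf '| `%s` (stripped) | %s | %s |\n' \
    "$1" "$(fmt_delta "$3" "$4")" "$(fmt_delta "$2" "$4")"
}
```

For instance, `table_row size_test 1000 900 850` emits a row whose second column is the change against N-1 and whose third column is the cumulative change against `main`.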