Confine SIMD code to runtime-dispatched tiers (fixes #628) by andrewkern · Pull Request #630 · MesserLab/SLiM

andrewkern · 2026-05-21T22:02:12Z

Problem. SLiM was built with -mavx2 -mfma applied project-wide, so the compiler emitted AVX2/FMA throughout the whole binary — not just the SIMD kernels. Pre-Haswell x86_64 CPUs (no AVX2) crashed with SIGILL (see issue #628).

Fix. The kernels formerly in eidos_simd.h move to eidos_simd_impl.h, a tier-parameterized body compiled once per instruction-set tier — scalar, SSE4.2, AVX2+FMA (x86_64), NEON (ARM64).
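To make "one body, compiled once per tier" concrete, here is a minimal sketch. It is not the contents of eidos_simd_impl.h: it uses a template parameter where the PR uses a header included by each tier's .cpp, plain C++ stands in for intrinsics, and every name (`TierScalarTag`, `tier_sum`, and so on) is hypothetical.

```cpp
#include <cstddef>

// Hypothetical tier tags; `lanes` is how many doubles one vector register would hold.
struct TierScalarTag { static constexpr std::size_t lanes = 1; };
struct TierSSE42Tag  { static constexpr std::size_t lanes = 2; };
struct TierAVX2Tag   { static constexpr std::size_t lanes = 4; };

// Shared kernel body, written once. It keeps `lanes` partial sums,
// the way a vector accumulator would, using ordinary scalar arithmetic.
template <typename Tier>
double tier_sum(const double *values, std::size_t count)
{
    double partial[Tier::lanes] = {};
    std::size_t i = 0;

    // Main loop: consume one "vector" of `lanes` elements per iteration.
    for (; i + Tier::lanes <= count; i += Tier::lanes)
        for (std::size_t lane = 0; lane < Tier::lanes; ++lane)
            partial[lane] += values[i + lane];

    // Combine the partial sums, then add the leftover tail elements.
    double total = 0.0;
    for (std::size_t lane = 0; lane < Tier::lanes; ++lane)
        total += partial[lane];
    for (; i < count; ++i)
        total += values[i];
    return total;
}

// One instantiation per tier. In the layout described above, each of these
// would sit in its own .cpp file built with that tier's flags.
double tier_sum_scalar(const double *v, std::size_t n) { return tier_sum<TierScalarTag>(v, n); }
double tier_sum_sse42(const double *v, std::size_t n)  { return tier_sum<TierSSE42Tag>(v, n); }
double tier_sum_avx2(const double *v, std::size_t n)   { return tier_sum<TierAVX2Tag>(v, n); }
```

Because the tiers accumulate in a different order, floating-point results can differ in the last bits between tiers for general inputs; that is one reason to test every tier rather than only the one the CPU picks.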

Only the per-tier .cpp files get ISA flags; everything else, including the dispatcher, builds at the baseline x86_64 ABI. Eidos_SIMD_Init() checks the CPU with __builtin_cpu_supports() and points the Eidos_SIMD function pointers at the fastest supported tier.

One binary should now run correctly on any x86_64 CPU: the baseline contains no AVX2/SSE4.2 instructions, and tier code runs only after the CPU is confirmed to support it. USE_SIMD is now ON/OFF (OFF and MSVC build scalar only).
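A rough sketch of the dispatch pattern described above, assuming GCC or Clang on x86. The names (`dispatch_sum`, `dispatch_simd_init`, `DispatchTier`) are hypothetical and are not SLiM's declarations, and the three "tier" kernels are identical plain loops so the sketch builds without any ISA flags. Only `__builtin_cpu_init()` and `__builtin_cpu_supports()` are real compiler builtins.

```cpp
#include <cstddef>

// All names below are hypothetical; they are not SLiM's actual declarations.
using DispatchSumKernel = double (*)(const double *, std::size_t);
enum class DispatchTier { Scalar, SSE42, AVX2 };

// Stand-ins for the per-tier kernels. In the layout the PR describes, each would
// live in its own .cpp built with that tier's flags; here all three are plain loops.
static double dispatch_sum_loop(const double *v, std::size_t n)
{
    double total = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        total += v[i];
    return total;
}
double dispatch_sum_scalar(const double *v, std::size_t n) { return dispatch_sum_loop(v, n); }
double dispatch_sum_sse42(const double *v, std::size_t n)  { return dispatch_sum_loop(v, n); }
double dispatch_sum_avx2(const double *v, std::size_t n)   { return dispatch_sum_loop(v, n); }

// Dispatcher state, built at baseline: callers only ever go through the pointer.
DispatchSumKernel dispatch_sum = dispatch_sum_scalar;
DispatchTier dispatch_active_tier = DispatchTier::Scalar;

// Probe the CPU and return the fastest tier it supports.
DispatchTier dispatch_detect_tier()
{
#if (defined(__x86_64__) || defined(__i386__)) && (defined(__GNUC__) || defined(__clang__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2") && __builtin_cpu_supports("fma"))
        return DispatchTier::AVX2;
    if (__builtin_cpu_supports("sse4.2"))
        return DispatchTier::SSE42;
#endif
    return DispatchTier::Scalar;  // non-x86 target or other compiler: scalar only
}

// Called once at startup, before any kernel is used.
void dispatch_simd_init()
{
    dispatch_active_tier = dispatch_detect_tier();
    switch (dispatch_active_tier)
    {
        case DispatchTier::AVX2:   dispatch_sum = dispatch_sum_avx2;   break;
        case DispatchTier::SSE42:  dispatch_sum = dispatch_sum_sse42;  break;
        case DispatchTier::Scalar: dispatch_sum = dispatch_sum_scalar; break;
    }
}
```

The pointer defaults to the scalar kernel, so a call made before initialization is still safe on any CPU. The safety argument depends on the probing code itself being compiled without tier flags, which is why the dispatcher is built at baseline.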

Verification.

-testEidos (7464) and -testSLiM (36853) pass.
Object-file inspection: AVX2/FMA only in eidos_simd_avx2.cpp.o, SSE4.2-only in eidos_simd_sse42.cpp.o, every other .o baseline-clean.
All three x86 tiers exercised at runtime (AVX2 / SSE4.2 / scalar) — all pass.
USE_SIMD=OFF build: zero AVX2/FMA, tests pass.

Things still needed!

Xcode project. SLiM.xcodeproj needs updating before the macOS build will work:
Add the 6 new files — eidos_simd.cpp, eidos_simd_scalar.cpp, eidos_simd_sse42.cpp, eidos_simd_avx2.cpp, eidos_simd_neon.cpp, and the header eidos_simd_impl.h — and add the 5 .cpp files to the Compile Sources phase of each Eidos/SLiM target.
Set per-file flags in the Compile Sources build phase: -mavx2 -mfma on eidos_simd_avx2.cpp, -msse4.2 on eidos_simd_sse42.cpp. Without those flags I think that target will fail to compile (AVX2 intrinsics with no -mavx2).
The qmake build is scalar-only (EIDOS_SUPPRESS_SIMD_DISPATCH), I think?

SLiM was built with -mavx2 -mfma applied to the whole project, which let the compiler emit AVX2/FMA instructions throughout the entire binary, not only in the explicit SIMD kernels. The resulting build crashed with SIGILL on x86_64 CPUs without AVX2 (pre-Haswell, ~2012 and earlier). The kernels formerly in eidos_simd.h are moved to a tier-parameterized body, eidos_simd_impl.h, compiled once per instruction-set tier: scalar, SSE4.2, and AVX2+FMA on x86_64, and NEON on ARM64. Only the per-tier translation units receive instruction-set flags; every other translation unit, including the dispatcher, is compiled at the baseline x86_64 ABI. At startup Eidos_SIMD_Init() probes the CPU with __builtin_cpu_supports() and points the Eidos_SIMD function pointers at the fastest supported tier. A single binary is therefore correct on any x86_64 CPU: the baseline of the executable contains no AVX2/SSE4.2 instructions, and tier code runs only after the CPU has been confirmed to support it. USE_SIMD is now a simple ON/OFF switch; OFF (and MSVC) builds the scalar tier only. The qmake build sets EIDOS_SUPPRESS_SIMD_DISPATCH, keeping its prior scalar-only behavior since it applies no per-file SIMD flags.

These files were created from copies of eidos_simd.h and carried its original creation date; set them to their actual creation date.

codecov · 2026-05-21T22:05:22Z

Codecov Report

❌ Patch coverage is 97.46377% with 14 lines in your changes missing coverage. Please review.
✅ Project coverage is 75.57%. Comparing base (b7435e2) to head (df85904).

Files with missing lines	Patch %	Lines
eidos/eidos_test_functions_math.cpp	90.38%	10 Missing ⚠️
eidos/eidos_simd.cpp	85.18%	4 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #630      +/-   ##
==========================================
- Coverage   75.64%   75.57%   -0.08%     
==========================================
  Files         114      118       +4     
  Lines       72808    73048     +240     
  Branches    12873    12915      +42     
==========================================
+ Hits        55079    55207     +128     
- Misses      17729    17841     +112

Files with missing lines	Coverage Δ
eidos/eidos_globals.cpp	`72.06% <100.00%> (+0.02%)`	⬆️
eidos/eidos_simd_avx2.cpp	`100.00% <100.00%> (ø)`
eidos/eidos_simd_impl.h	`100.00% <100.00%> (ø)`
eidos/eidos_simd_scalar.cpp	`100.00% <100.00%> (ø)`
eidos/eidos_simd_sse42.cpp	`100.00% <100.00%> (ø)`
eidos/eidos_simd.cpp	`85.18% <85.18%> (ø)`
eidos/eidos_test_functions_math.cpp	`97.01% <90.38%> (-0.51%)`	⬇️

... and 12 files with indirect coverage changes


andrewkern · 2026-05-21T22:18:56Z

grrr... some of the Windows tests are failing... working on it

The SIMD kernels are compiled once per instruction-set tier, but only the tier the CPU selects at startup ever runs, so on a modern CI machine the scalar and SSE4.2 kernels were never executed or covered. Add Eidos_SIMD_SelectTier(), which forces a specific tier, and rewrite Eidos_SIMD_Init() in terms of it. The SIMD math tests now cycle through every tier the CPU supports -- running the full battery against scalar, SSE4.2, and AVX2+FMA -- then restore the best tier. The battery also gains direct tests for the kernels that previously had no per-tier coverage: sqrt, abs, the rounding family, the reductions, the convolution helpers, and the single-precision spatial-interaction kernels. This exercises the scalar and SSE4.2 code paths that nothing tested before.
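The commit message above describes three steps: select a tier, run the battery, restore the best tier. A minimal sketch of that shape follows, assuming GCC or Clang for the CPU probe. It is not the PR's code: `cycle_select_tier`, `cycle_init`, `cycle_test_all_tiers` and the other names are hypothetical, and the kernels are plain loops standing in for real per-tier implementations.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical names throughout; only the overall shape follows the description above.
enum class CycleTier { Scalar, SSE42, AVX2 };
using CycleAbsSumKernel = double (*)(const double *, std::size_t);

// Stand-in kernels (plain loops); real ones would use each tier's intrinsics.
static double cycle_abs_sum_loop(const double *v, std::size_t n)
{
    double total = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        total += std::fabs(v[i]);
    return total;
}
double cycle_abs_sum_scalar(const double *v, std::size_t n) { return cycle_abs_sum_loop(v, n); }
double cycle_abs_sum_sse42(const double *v, std::size_t n)  { return cycle_abs_sum_loop(v, n); }
double cycle_abs_sum_avx2(const double *v, std::size_t n)   { return cycle_abs_sum_loop(v, n); }

CycleAbsSumKernel cycle_abs_sum = cycle_abs_sum_scalar;
CycleTier cycle_active_tier = CycleTier::Scalar;

// True if this CPU can run the given tier; scalar is always available.
bool cycle_tier_supported(CycleTier tier)
{
    if (tier == CycleTier::Scalar)
        return true;
#if (defined(__x86_64__) || defined(__i386__)) && (defined(__GNUC__) || defined(__clang__))
    __builtin_cpu_init();
    if (tier == CycleTier::SSE42)
        return __builtin_cpu_supports("sse4.2") != 0;
    if (tier == CycleTier::AVX2)
        return __builtin_cpu_supports("avx2") != 0 && __builtin_cpu_supports("fma") != 0;
#endif
    return false;
}

// Force a specific tier; returns false and changes nothing if the CPU lacks it.
bool cycle_select_tier(CycleTier tier)
{
    if (!cycle_tier_supported(tier))
        return false;
    switch (tier)
    {
        case CycleTier::AVX2:   cycle_abs_sum = cycle_abs_sum_avx2;   break;
        case CycleTier::SSE42:  cycle_abs_sum = cycle_abs_sum_sse42;  break;
        case CycleTier::Scalar: cycle_abs_sum = cycle_abs_sum_scalar; break;
    }
    cycle_active_tier = tier;
    return true;
}

// Startup selection written in terms of the selector: try fastest first.
CycleTier cycle_init()
{
    const CycleTier order[] = {CycleTier::AVX2, CycleTier::SSE42, CycleTier::Scalar};
    for (CycleTier tier : order)
        if (cycle_select_tier(tier))
            break;
    return cycle_active_tier;
}

struct CycleReport { int tiers_run; int failures; };

// Run a (one-check) battery against every supported tier, then restore the prior tier.
CycleReport cycle_test_all_tiers()
{
    const CycleTier previous = cycle_active_tier;
    const double input[5] = {-1.0, 2.0, -3.0, 4.0, -5.0};
    CycleReport report = {0, 0};
    const CycleTier all[] = {CycleTier::Scalar, CycleTier::SSE42, CycleTier::AVX2};
    for (CycleTier tier : all)
    {
        if (!cycle_select_tier(tier))
            continue;  // skip tiers this CPU cannot run
        ++report.tiers_run;
        if (cycle_abs_sum(input, 5) != 15.0)
            ++report.failures;
    }
    cycle_select_tier(previous);
    return report;
}
```

Having the selector refuse unsupported tiers keeps the test loop from ever jumping into code the CPU cannot execute, which would reproduce the original SIGILL inside the test suite.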

…lags

On Windows the WIN32 target blocks run set_source_files_properties() over every source file to add "-include config.h". COMPILE_FLAGS is a single string property, so that overwrote the "-mavx2 -mfma" / "-msse4.2" set earlier on the tier files, and eidos_simd_avx2.cpp then failed to compile its AVX2 intrinsics. Apply the per-tier ISA flags at the end of the file instead, after the WIN32 blocks, using set_property(... APPEND_STRING ...) so they extend rather than replace COMPILE_FLAGS.

andrewkern · 2026-05-22T15:23:13Z

okay @bhaller -- this is ready for review

andrewkern added 2 commits May 21, 2026 10:21

Update creation dates on the new SIMD dispatch files

f0d2060


andrewkern added 2 commits May 21, 2026 15:34

andrewkern wants to merge 4 commits into MesserLab:master from andrewkern:fix/simd-runtime-dispatch
