Confine SIMD code to runtime-dispatched tiers (fixes #628) #630

Open
andrewkern wants to merge 4 commits into MesserLab:master from andrewkern:fix/simd-runtime-dispatch

@andrewkern
Collaborator

Problem. SLiM was built with -mavx2 -mfma applied project-wide, so the compiler emitted AVX2/FMA throughout the whole binary — not just the SIMD kernels. Pre-Haswell x86_64 CPUs (no AVX2) crashed with SIGILL (see issue #628).

Fix. The kernels formerly in eidos_simd.h move to eidos_simd_impl.h, a tier-parameterized body compiled once per instruction-set tier — scalar, SSE4.2, AVX2+FMA (x86_64), NEON (ARM64).

Only the per-tier .cpp files get ISA flags; everything else, including the dispatcher, builds at the baseline x86_64 ABI. Eidos_SIMD_Init() checks the CPU with __builtin_cpu_supports() and points the Eidos_SIMD function pointers at the fastest supported tier.

One binary should now run correctly on any x86_64 CPU: the baseline contains no AVX2/SSE4.2 instructions, and tier code runs only after the CPU is confirmed to support it. USE_SIMD is now ON/OFF (OFF and MSVC build scalar only).
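A minimal sketch of the dispatch pattern described above, for readers who want to see its shape. All names here (`SumFn`, `g_sum`, `simd_init`, `detect_best_tier`) are hypothetical and are not taken from the PR. The three "tier" kernels are the same portable loop, standing in for separately compiled translation units. Only `__builtin_cpu_init()` and `__builtin_cpu_supports()` are real GCC/Clang builtins.

```cpp
#include <cstddef>

// Hypothetical kernel signature, for illustration only.
using SumFn = double (*)(const double *, std::size_t);

enum class Tier { Scalar, SSE42, AVX2 };

// Stand-ins for the per-tier kernels. In the PR each tier is a separate .cpp
// built with its own ISA flags; here every "tier" is the same portable loop.
double sum_scalar(const double *v, std::size_t n) {
    double s = 0.0;
    for (std::size_t i = 0; i < n; ++i) s += v[i];
    return s;
}
double sum_sse42(const double *v, std::size_t n) { return sum_scalar(v, n); }
double sum_avx2(const double *v, std::size_t n) { return sum_scalar(v, n); }

// Dispatch state: defaults to the scalar tier, which is always safe to run.
SumFn g_sum = sum_scalar;
Tier g_tier = Tier::Scalar;

// Probe the CPU; falls back to Scalar on non-x86 targets or other compilers.
Tier detect_best_tier() {
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2") && __builtin_cpu_supports("fma")) return Tier::AVX2;
    if (__builtin_cpu_supports("sse4.2")) return Tier::SSE42;
#endif
    return Tier::Scalar;
}

// Point the function pointer at the fastest tier the CPU reports.
void simd_init() {
    g_tier = detect_best_tier();
    switch (g_tier) {
        case Tier::AVX2:   g_sum = sum_avx2;   break;
        case Tier::SSE42:  g_sum = sum_sse42;  break;
        case Tier::Scalar: g_sum = sum_scalar; break;
    }
}

// Callers go through the pointer and never name a tier directly.
double example_usage() {
    simd_init();
    const double v[] = {1.0, 2.0, 3.0, 4.0};
    return g_sum(v, 4);
}
```

The point of the design is that the file containing `simd_init()` is built without ISA flags, so the probe itself cannot trigger SIGILL on an older CPU.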

Verification.

  • -testEidos (7464) and -testSLiM (36853) pass.
  • Object-file inspection: AVX2/FMA only in eidos_simd_avx2.cpp.o, SSE4.2-only in eidos_simd_sse42.cpp.o, every other .o baseline-clean.
  • All three x86 tiers exercised at runtime (AVX2 / SSE4.2 / scalar) — all pass.
  • USE_SIMD=OFF build: zero AVX2/FMA, tests pass.

Things still needed!

  • Xcode project. SLiM.xcodeproj needs updating before the macOS build will work:
  • Add the 6 new files — eidos_simd.cpp, eidos_simd_scalar.cpp, eidos_simd_sse42.cpp, eidos_simd_avx2.cpp, eidos_simd_neon.cpp, and the header eidos_simd_impl.h — and add the 5 .cpp files to the Compile Sources phase of each Eidos/SLiM target.
  • Set per-file flags in the Compile Sources build phase: -mavx2 -mfma on eidos_simd_avx2.cpp, -msse4.2 on eidos_simd_sse42.cpp. Without those flags I think the target will fail to compile (AVX2 intrinsics with no -mavx2).
  • The qmake build is scalar-only (EIDOS_SUPPRESS_SIMD_DISPATCH), I think?

SLiM was built with -mavx2 -mfma applied to the whole project, which let
the compiler emit AVX2/FMA instructions throughout the entire binary, not
only in the explicit SIMD kernels. The resulting build crashed with SIGILL
on x86_64 CPUs without AVX2 (pre-Haswell, ~2012 and earlier).

The kernels formerly in eidos_simd.h are moved to a tier-parameterized body,
eidos_simd_impl.h, compiled once per instruction-set tier: scalar, SSE4.2,
and AVX2+FMA on x86_64, and NEON on ARM64. Only the per-tier translation
units receive instruction-set flags; every other translation unit, including
the dispatcher, is compiled at the baseline x86_64 ABI. At startup
Eidos_SIMD_Init() probes the CPU with __builtin_cpu_supports() and points the
Eidos_SIMD function pointers at the fastest supported tier.

A single binary is therefore correct on any x86_64 CPU: the baseline of the
executable contains no AVX2/SSE4.2 instructions, and tier code runs only
after the CPU has been confirmed to support it.

USE_SIMD is now a simple ON/OFF switch; OFF (and MSVC) builds the scalar tier
only. The qmake build sets EIDOS_SUPPRESS_SIMD_DISPATCH, keeping its prior
scalar-only behavior since it applies no per-file SIMD flags.

These files were created from copies of eidos_simd.h and carried its
original creation date; set the date to each file's actual creation date.
@codecov

codecov Bot commented May 21, 2026

Codecov Report

❌ Patch coverage is 97.46377% with 14 lines in your changes missing coverage. Please review.
✅ Project coverage is 75.57%. Comparing base (b7435e2) to head (df85904).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| eidos/eidos_test_functions_math.cpp | 90.38% | 10 Missing ⚠️ |
| eidos/eidos_simd.cpp | 85.18% | 4 Missing ⚠️ |

@@            Coverage Diff             @@
##           master     #630      +/-   ##
==========================================
- Coverage   75.64%   75.57%   -0.08%     
==========================================
  Files         114      118       +4     
  Lines       72808    73048     +240     
  Branches    12873    12915      +42     
==========================================
+ Hits        55079    55207     +128     
- Misses      17729    17841     +112     
| Files with missing lines | Coverage Δ |
|---|---|
| eidos/eidos_globals.cpp | 72.06% <100.00%> (+0.02%) ⬆️ |
| eidos/eidos_simd_avx2.cpp | 100.00% <100.00%> (ø) |
| eidos/eidos_simd_impl.h | 100.00% <100.00%> (ø) |
| eidos/eidos_simd_scalar.cpp | 100.00% <100.00%> (ø) |
| eidos/eidos_simd_sse42.cpp | 100.00% <100.00%> (ø) |
| eidos/eidos_simd.cpp | 85.18% <85.18%> (ø) |
| eidos/eidos_test_functions_math.cpp | 97.01% <90.38%> (-0.51%) ⬇️ |

... and 12 files with indirect coverage changes

@andrewkern
Collaborator Author

grrr... some of the Windows tests are failing... working on it

The SIMD kernels are compiled once per instruction-set tier, but only the
tier the CPU selects at startup ever runs, so on a modern CI machine the
scalar and SSE4.2 kernels were never executed or covered.

Add Eidos_SIMD_SelectTier(), which forces a specific tier, and rewrite
Eidos_SIMD_Init() in terms of it. The SIMD math tests now cycle through
every tier the CPU supports -- running the full battery against scalar,
SSE4.2, and AVX2+FMA -- then restore the best tier. The battery also gains
direct tests for the kernels that previously had no per-tier coverage:
sqrt, abs, the rounding family, the reductions, the convolution helpers,
and the single-precision spatial-interaction kernels.

This exercises the scalar and SSE4.2 code paths that nothing tested before.
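As a rough sketch of the idea in this commit, the following shows one way a forced-tier selector and a tier-cycling test could fit together. The names (`select_tier`, `tier_supported`, `run_battery_on_all_tiers`, `g_sqrt`) are hypothetical and are not the PR's API. The per-tier kernels are again portable stand-ins, not real SSE4.2 or AVX2 code.

```cpp
#include <cmath>
#include <cstddef>
#include <initializer_list>
#include <vector>

enum class Tier { Scalar, SSE42, AVX2 };
using SqrtFn = void (*)(const double *, double *, std::size_t);

// Portable stand-ins for the per-tier kernels (hypothetical).
void sqrt_scalar(const double *in, double *out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) out[i] = std::sqrt(in[i]);
}
void sqrt_sse42(const double *in, double *out, std::size_t n) { sqrt_scalar(in, out, n); }
void sqrt_avx2(const double *in, double *out, std::size_t n) { sqrt_scalar(in, out, n); }

SqrtFn g_sqrt = sqrt_scalar;
Tier g_active = Tier::Scalar;

// Scalar is always available; other tiers require a positive CPU probe.
bool tier_supported(Tier t) {
    if (t == Tier::Scalar) return true;
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (t == Tier::SSE42) return __builtin_cpu_supports("sse4.2") != 0;
    return __builtin_cpu_supports("avx2") != 0 && __builtin_cpu_supports("fma") != 0;
#else
    return false;
#endif
}

// Force a specific tier; refuses tiers the CPU cannot run.
bool select_tier(Tier t) {
    if (!tier_supported(t)) return false;
    switch (t) {
        case Tier::AVX2:   g_sqrt = sqrt_avx2;   break;
        case Tier::SSE42:  g_sqrt = sqrt_sse42;  break;
        case Tier::Scalar: g_sqrt = sqrt_scalar; break;
    }
    g_active = t;
    return true;
}

// Init expressed in terms of the selector: take the first tier that succeeds.
void simd_init() {
    for (Tier t : {Tier::AVX2, Tier::SSE42, Tier::Scalar})
        if (select_tier(t)) return;
}

// Run the same check on every supported tier, then restore the best tier.
// Returns the number of tiers tested, or -1 on a mismatch.
int run_battery_on_all_tiers() {
    const std::vector<double> in = {0.0, 1.0, 4.0, 9.0};
    int tiers_tested = 0;
    for (Tier t : {Tier::Scalar, Tier::SSE42, Tier::AVX2}) {
        if (!select_tier(t)) continue;
        std::vector<double> out(in.size());
        g_sqrt(in.data(), out.data(), in.size());
        for (std::size_t i = 0; i < in.size(); ++i)
            if (out[i] != std::sqrt(in[i])) return -1;
        ++tiers_tested;
    }
    simd_init();
    return tiers_tested;
}
```

Skipping unsupported tiers instead of failing keeps the same test usable on older CI hardware, at the cost of covering fewer tiers there.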
…lags

On Windows the WIN32 target blocks run set_source_files_properties() over
every source file to add "-include config.h".  COMPILE_FLAGS is a single
string property, so that overwrote the "-mavx2 -mfma" / "-msse4.2" set
earlier on the tier files, and eidos_simd_avx2.cpp then failed to compile
its AVX2 intrinsics.

Apply the per-tier ISA flags at the end of the file instead, after the
WIN32 blocks, using set_property(... APPEND_STRING ...) so they extend
rather than replace COMPILE_FLAGS.
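The fix described above might look roughly like the fragment below. It is a sketch: the source paths are assumed from the coverage report, and the guard condition is a guess at how USE_SIMD and MSVC are handled. `set_property(SOURCE ... APPEND_STRING PROPERTY ...)` is a standard CMake command.

```cmake
# Place this after every set_source_files_properties(... COMPILE_FLAGS ...) call,
# including the WIN32 blocks that add "-include config.h".
# APPEND_STRING concatenates with no separator, so each value starts with a space.
if(USE_SIMD AND NOT MSVC)
    set_property(SOURCE eidos/eidos_simd_avx2.cpp
                 APPEND_STRING PROPERTY COMPILE_FLAGS " -mavx2 -mfma")
    set_property(SOURCE eidos/eidos_simd_sse42.cpp
                 APPEND_STRING PROPERTY COMPILE_FLAGS " -msse4.2")
endif()
```

A real build would presumably also restrict these flags to x86_64 targets, since the NEON tier on ARM64 does not take them.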
@andrewkern
Collaborator Author

okay @bhaller -- this is ready for review
