Confine SIMD code to runtime-dispatched tiers (fixes #628)#630
Open
andrewkern wants to merge 4 commits into
Open
Confine SIMD code to runtime-dispatched tiers (fixes #628)#630andrewkern wants to merge 4 commits into
andrewkern wants to merge 4 commits into
Conversation
SLiM was built with -mavx2 -mfma applied to the whole project, which let the compiler emit AVX2/FMA instructions throughout the entire binary, not only in the explicit SIMD kernels. The resulting build crashed with SIGILL on x86_64 CPUs without AVX2 (pre-Haswell, ~2012 and earlier). The kernels formerly in eidos_simd.h are moved to a tier-parameterized body, eidos_simd_impl.h, compiled once per instruction-set tier: scalar, SSE4.2, and AVX2+FMA on x86_64, and NEON on ARM64. Only the per-tier translation units receive instruction-set flags; every other translation unit, including the dispatcher, is compiled at the baseline x86_64 ABI. At startup Eidos_SIMD_Init() probes the CPU with __builtin_cpu_supports() and points the Eidos_SIMD function pointers at the fastest supported tier. A single binary is therefore correct on any x86_64 CPU: the baseline of the executable contains no AVX2/SSE4.2 instructions, and tier code runs only after the CPU has been confirmed to support it. USE_SIMD is now a simple ON/OFF switch; OFF (and MSVC) builds the scalar tier only. The qmake build sets EIDOS_SUPPRESS_SIMD_DISPATCH, keeping its prior scalar-only behavior since it applies no per-file SIMD flags.
These files were created from copies of eidos_simd.h and carried its original creation date; set it to their actual creation date.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #630 +/- ##
==========================================
- Coverage 75.64% 75.57% -0.08%
==========================================
Files 114 118 +4
Lines 72808 73048 +240
Branches 12873 12915 +42
==========================================
+ Hits 55079 55207 +128
- Misses 17729 17841 +112
🚀 New features to boost your workflow:
|
Collaborator
Author
|
grrr... some of the windows tests are failing... working on it |
The SIMD kernels are compiled once per instruction-set tier, but only the tier the CPU selects at startup ever runs, so on a modern CI machine the scalar and SSE4.2 kernels were never executed or covered. Add Eidos_SIMD_SelectTier(), which forces a specific tier, and rewrite Eidos_SIMD_Init() in terms of it. The SIMD math tests now cycle through every tier the CPU supports -- running the full battery against scalar, SSE4.2, and AVX2+FMA -- then restore the best tier. The battery also gains direct tests for the kernels that previously had no per-tier coverage: sqrt, abs, the rounding family, the reductions, the convolution helpers, and the single-precision spatial-interaction kernels. This exercises the scalar and SSE4.2 code paths that nothing tested before.
…lags On Windows the WIN32 target blocks run set_source_files_properties() over every source file to add "-include config.h". COMPILE_FLAGS is a single string property, so that overwrote the "-mavx2 -mfma" / "-msse4.2" set earlier on the tier files, and eidos_simd_avx2.cpp then failed to compile its AVX2 intrinsics. Apply the per-tier ISA flags at the end of the file instead, after the WIN32 blocks, using set_property(... APPEND_STRING ...) so they extend rather than replace COMPILE_FLAGS.
Collaborator
Author
|
okay @bhaller -- this is ready for review |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem. SLiM was built with -mavx2 -mfma applied project-wide, so the compiler emitted AVX2/FMA throughout the whole binary — not just the SIMD kernels. Pre-Haswell x86_64 CPUs (no AVX2) crashed with SIGILL (see issue #628).
Fix. The kernels formerly in
eidos_simd.hmove toeidos_simd_impl.h, a tier-parameterized body compiled once per instruction-set tier — scalar, SSE4.2, AVX2+FMA (x86_64), NEON (ARM64).Only the per-tier
.cppfiles get ISA flags; everything else, including the dispatcher, builds at the baseline x86_64 ABI.Eidos_SIMD_Init()checks the CPU with__builtin_cpu_supports()and points the Eidos_SIMD function pointers at the fastest supported tier.One binary should now correct on any x86_64 CPU — baseline contains no AVX2/SSE4.2, tier code runs only after the CPU is confirmed to support it. USE_SIMD is now ON/OFF (OFF and MSVC build scalar only).
Verification.
Things still needed!