Add FMA Compiler Flag to Improve Reproducibility Across Compilers#3823
Add FMA Compiler Flag to Improve Reproducibility Across Compilers#3823jtramm wants to merge 8 commits intoopenmc-dev:developfrom
Conversation
paulromano
left a comment
There was a problem hiding this comment.
Thanks so much for digging into all of this @jtramm. A few questions/comments:
- The option name is a little unintuitive to me because enabling it means that we don't pass an extra flag. I would rather it be called something like
OPENMC_ENABLE_STRICT_FPand default to "ON", meaning pass extra arguments that give better floating point reproducibility. - Whatever the name is, it should be documented in
docs/source/usersguide/install.rst - In principle, you could use
check_cxx_compiler_flagrather than always using it for any compiler other than MSVC - I didn't really see it discussed in #3820, but do we know if this flag is the primary reason that test results differ between the
DebugandRelWithDebInfo/Releasecmake build types? If so, it would mean that with it enabled we could probably switch to usingRelWithDebInfoduring CI rather thanDebug, which would help speed up CI runs significantly.
… to install.rst. Force casserts to run when using RelWithDebInfo.
…in test configuration warning about XS data and build type
For reference, the warnings from (4) and (5) look like: Another note: now that we are doing our CI coverage tests with optimizations enabled, the coverage info may not be as accurate. The coverage report should still reveal if major regions are being missed, but may not be as line-by-line accurate as it was before. |
paulromano
left a comment
There was a problem hiding this comment.
This looks great @jtramm. Really looking forward to being able to test (both locally and in CI) without having a debug build. Only additional comment I have at this point is that for consistency, I think it would better to name the option OPENMC_ENABLE_STRICT_FP. This goes with the current naming scheme:
OPENMC_USE_*-- enables support for an external library/featureOPENMC_ENABLE_*-- adds one or more compile/linker flags
The Problem with FMAs
As discussed in much more depth in Issue #3820 - our regression tests are not currently perfectly portable between different compilers and systems.
In this PR, we specifically address Item (3) identified in Issue #3820: Differences in compiler FMA contraction.
To refresh our memories from the #3820 writeup, operations like:
Will change results depending on the approach to FMA contraction that the compiler is taking. I.e., whether or not to combine multiple operations into one FMA operation or leave them as is - resulting in different rounding. Thus, before we've even sampled the first particle, our underlying material/XS data is a little different. It won't take long before you get macro-scale particle divergence.
The Fix
I've found that this problem tends to produce differences between older and new versions of gcc, as well as gcc vs. clang. This is partly due to differences in how the two compilers go about making choices of what to contract, and partially due to differences in their default flags. Thankfully, there are flags that can offer a little more explicit control over FMA contraction globally.
Choices for this are:
For a linux image, I had claude run all the tests across a variety of different gcc/clang versions and with each of the different flags (plus default where no flag was provided). The results are below:
Number of Regression Test Failures
default-ffp-contract=off-ffp-contract=on-ffp-contract=fastNotably, this comparison is against the current expected results in OpenMC (i.e., what the CI has). It shows that by default, none of the gcc or clang versions on this linux image (a fairly default ubuntu 24.04) give the same result as the CI. Setting things to
-ffp-contract=onhelps for gcc but not clang. So: the only viable options are-ffp-contract=offor-ffp-contract=fast.On the one hand, going with
fastseems attractive as this guarantees FMAs are used the most aggressively, giving slightly higher precision due to reduced rounding and potentially better runtime performance. The downsides to-ffp-contract=fastare:pytest crashes out (looks like a seg fault?) on
cmfd_feed_ng, so that would need to be tracked down as it prevents the rest of the tests from running, anddifferent compilers/versions may still have different internal rules/logic they follow for deciding on FMA utilization. Some compilers may be able to deduce that an FMA can be used to combine disjoint lines whereas another might not be advanced enough to realize that, such that we may still get differences.
In this light, I'm proposing in this PR to have OpenMC specify
-ffp-contract=offby default. I don't think this will have any significant impact on performance as most simulations are typically memory/latency bound, such that the difference between 1 FMA vs. two regular FP operations would not be observable. If a user has a really floating-point intensive workload they are interested in, they can now compile withOPENMC_ENABLE_FMA_CONTRACTION=ONto easily test out if they get a performance benefit or not. Additionally, in the case where clear use cases are identified where the flag is making a performance impact, we can always do some analysis on the hot kernels and add in a fewstd::fma()operations manually to explicitly use FMAs. When explicit FMA intrinsics orstd::fma()are used, then the flag won't override that, so behavior would be consistent. An added benefit to-ffp-contract=offis that it appears to match the behavior of the compiler that the CI uses (I believe a fairly old version of gcc?).I'm open to other ideas though. I'm curious if anyone has in mind a use case where the potential for instruction count savings via FMA would actually be useful.
Bad news for MacOS
The bad news here is that item (2) listed in the writeup for Issue #3820 still causes FP differences between linux (GNU glibc) and MacOS (Apple System math library). This PR fixes the FMA issues, but I still get 18 failures (down from 48) on macOS which are likely attributed to differences in transcendental operations (e.g.,
exp()) between the library implementations. Fixing that will be left for another PR.Checklist