Add FMA Compiler Flag to Improve Reproducibility Across Compilers#3823

Open · jtramm wants to merge 8 commits into openmc-dev:develop from jtramm:fma_fix

Conversation

@jtramm (Contributor) commented Feb 18, 2026

The Problem with FMAs

As discussed in more depth in Issue #3820, our regression tests are not currently portable between different compilers and systems.

In this PR, we specifically address Item (3) identified in Issue #3820: Differences in compiler FMA contraction.

To refresh our memories from the #3820 writeup, operations like:

  for (int i = 0; i < nuclide_.size(); ++i) {
    int i_nuc = nuclide_[i];
    double awr = settings::run_CE ? data::nuclides[i_nuc]->awr_ : 1.0;
    int z = settings::run_CE ? data::nuclides[i_nuc]->Z_ : 0;
    density_gpcc_ += atom_density_(i) * awr * MASS_NEUTRON / N_AVOGADRO; // Potential FMA
    charge_density_ += atom_density_(i) * z; // Potential FMA
  }

will change results depending on the compiler's approach to FMA contraction, i.e., whether multiple operations are combined into one FMA instruction or left as is, resulting in different rounding. Thus, before we've even sampled the first particle, our underlying material/XS data is a little different. It won't take long before you get macro-scale particle divergence.

The Fix

I've found that this problem tends to produce differences between older and newer versions of gcc, as well as between gcc and clang. This is partly due to differences in how the two compilers choose what to contract, and partly due to differences in their default flags. Thankfully, there are flags that offer more explicit control over FMA contraction globally.

Choices for this are:

  • -ffp-contract=off
  • -ffp-contract=on (default for clang)
  • -ffp-contract=fast (default for gcc)

For a Linux image, I had Claude run all the tests across a variety of gcc/clang versions with each of the different flags (plus the default, where no flag was provided). The results are below:

Number of Regression Test Failures

Compiler   default   -ffp-contract=off   -ffp-contract=on   -ffp-contract=fast
gcc-11     49        0                   0                  49
gcc-13     49        0                   0                  49
gcc-14     49        0                   47                 49
clang-14   47        0                   47                 49
clang-18   47        0                   47                 49

Notably, this comparison is against the current expected results in OpenMC (i.e., what the CI has). It shows that by default, none of the gcc or clang versions on this Linux image (a fairly default Ubuntu 24.04) gives the same results as the CI. Setting -ffp-contract=on helps for gcc but not clang. So the only viable options are -ffp-contract=off or -ffp-contract=fast.

Going with fast seems attractive, as it guarantees FMAs are used most aggressively, giving slightly higher precision due to reduced rounding and potentially better runtime performance. The downsides to -ffp-contract=fast are:

  1. pytest crashes out (looks like a seg fault?) on cmfd_feed_ng, so that would need to be tracked down as it prevents the rest of the tests from running, and

  2. different compilers/versions may still follow different internal rules for deciding on FMA utilization. Some compilers may be able to deduce that an FMA can combine operations from disjoint lines, whereas others might not be advanced enough to realize that, so we may still get differences.

In this light, I'm proposing in this PR to have OpenMC specify -ffp-contract=off by default. I don't think this will have any significant impact on performance, as most simulations are typically memory/latency bound, such that the difference between one FMA and two regular FP operations would not be observable. If a user has a really floating-point-intensive workload, they can now compile with OPENMC_ENABLE_FMA_CONTRACTION=ON to easily test whether they get a performance benefit. Additionally, if clear use cases are identified where the flag makes a performance impact, we can always analyze the hot kernels and add a few std::fma() operations manually to use FMAs explicitly. The flag does not override explicit FMA intrinsics or std::fma() calls, so behavior would be consistent. An added benefit of -ffp-contract=off is that it appears to match the behavior of the compiler the CI uses (I believe a fairly old version of gcc?).

I'm open to other ideas though. I'm curious if anyone has in mind a use case where the potential for instruction count savings via FMA would actually be useful.

Bad news for macOS

The bad news here is that item (2) listed in the writeup for Issue #3820 still causes FP differences between Linux (GNU glibc) and macOS (Apple's system math library). This PR fixes the FMA issues, but I still get 18 failures (down from 48) on macOS, likely attributable to differences in transcendental operations (e.g., exp()) between the library implementations. Fixing that will be left for another PR.

Checklist

  • I have performed a self-review of my own code
  • I have run clang-format (version 15) on any C++ source files (if applicable)
  • I have followed the style guidelines for Python source files (if applicable)
  • I have made corresponding changes to the documentation (if applicable)
  • I have added tests that prove my fix is effective or that my feature works (if applicable)

@paulromano (Contributor) left a comment:

Thanks so much for digging into all of this @jtramm. A few questions/comments:

  • The option name is a little unintuitive to me because enabling it means that we don't pass an extra flag. I would rather it be called something like OPENMC_ENABLE_STRICT_FP and default to "ON", meaning pass extra arguments that give better floating point reproducibility.
  • Whatever the name is, it should be documented in docs/source/usersguide/install.rst
  • In principle, you could use check_cxx_compiler_flag rather than always passing the flag for any compiler other than MSVC
  • I didn't really see it discussed in #3820, but do we know if this flag is the primary reason that test results differ between the Debug and RelWithDebInfo/Release cmake build types? If so, it would mean that with it enabled we could probably switch to using RelWithDebInfo during CI rather than Debug, which would help speed up CI runs significantly.

@jtramm (Contributor, Author) commented Feb 24, 2026

  1. So in digging around some more, it looks like compiling with -DCMAKE_BUILD_TYPE=Debug only gives consistency for gcc. For LLVM, the -ffp-contract=off flag still needs to be passed.

  2. To accelerate the CI, it would be nice to run the whole CI matrix with -DCMAKE_BUILD_TYPE=RelWithDebInfo instead of Debug. In my local tests, doing so makes it run about 7x faster, so it would greatly accelerate our CI testing. I was able to get consistency on my local machine regardless of whether -DCMAKE_BUILD_TYPE=RelWithDebInfo or -DCMAKE_BUILD_TYPE=Debug was passed, but on the CI, using -DCMAKE_BUILD_TYPE=RelWithDebInfo gives failures. To remedy this, I experimented with a few different flags and found that if we also pass -fno-builtin, we get agreement again. As such, I've rolled -ffp-contract=off and -fno-builtin together into one new OpenMC compiler setting: OPENMC_STRICT_FP. This is off by default, with the idea being that folks will enable it when running tests or generating new expected results. I've updated the testing docs to recommend this flag.

  3. I also bundled in another change triggered by the OPENMC_STRICT_FP flag: we now remove the -DNDEBUG flag when OPENMC_STRICT_FP is set, so that assertions remain active in the code even when using a RelWithDebInfo build. This ensures the CI will exercise all assertions even though it now uses a RelWithDebInfo build.

  4. Given the changes to testing, I have also added a new build parameter so that Python can determine whether OPENMC_STRICT_FP was set when OpenMC was built. Pytest will now query this info and warn if a test is run without this variable set (both at the start of testing and again near the end, so it's hard to miss). Hopefully this cuts down on user confusion from tests not passing with an out-of-the-box default build.

  5. Given the change in (4), I also added another pytest check to ensure that the NNDC data set is being used. This is another common problem I've had: I get a bunch of test failures and start debugging, only to realize a few minutes later that I had the wrong XS data in my environment. The check determines whether the correct library is present via an MD5 hash of the NNDC cross_sections.xml file. This isn't foolproof, but it's something! I'm open to better ideas on how to confirm that the NNDC data is being used. In any event, if the wrong checksum is found, it gives a warning before/after testing, similar to the warning in (4).

For reference, the warnings from (4) and (5) look like:

root@ci_testing:~/openmc/tests/regression_tests$ pytest
================================= OpenMC Environment Warnings =================================
WARNING: OpenMC was NOT built with -DOPENMC_STRICT_FP=on. Regression test results may not match reference values due to compiler floating-point optimizations. Rebuild with -DOPENMC_STRICT_FP=on for reproducible results.
WARNING: OPENMC_CROSS_SECTIONS (/data/nndc_hdf5/cross_sections.xml) does not match the official NNDC HDF5 dataset. Regression tests expect the NNDC data; results may differ with other cross section libraries.

===================================== test session starts =====================================
platform linux -- Python 3.12.3, pytest-9.0.2, pluggy-1.6.0
rootdir: /workspace/openmc
configfile: pytest.ini
plugins: cov-7.0.0, rerunfailures-16.1
collected 308 items / 1 skipped                                                               

adj_cell_rotation/test.py .                                                             [  0%]
albedo_box/test.py .                                                                    [  0%]

Another note: now that we are running our CI coverage tests with optimizations enabled, the coverage info may be less accurate. The coverage report should still reveal if major regions are being missed, but it may not be as line-by-line accurate as before.

@paulromano (Contributor) left a comment:

This looks great @jtramm. Really looking forward to being able to test (both locally and in CI) without a debug build. The only additional comment I have at this point is that, for consistency, I think it would be better to name the option OPENMC_ENABLE_STRICT_FP. This fits the current naming scheme:

  • OPENMC_USE_* -- enables support for an external library/feature
  • OPENMC_ENABLE_* -- adds one or more compile/linker flags
