Fixes for LoongArch LSX + fast math #1369

iv-m · 2026-01-15T10:08:05Z

Fix a couple of errors I found when testing current simde master on my loongarch64 machine with GCC 14.3.1 and -Ofast -mlsx -mlasx in CFLAGS and CXXFLAGS.

iv-m · 2026-01-15T10:12:08Z

@HecaiYuan, @mr-c, please take a look.

mr-c

Thank you for the PR, @iv-m ; please address the issues at https://github.com/simd-everywhere/simde/actions/runs/21027953168/job/60456937165?pr=1369

iv-m · 2026-01-15T12:47:17Z

Please address the issues at [...]

Well, I just submitted fixes for two of the most obvious issues I found(

Other issues require more investigation. For example, the error from the CI:

 ../test/x86/avx512/../../../simde/x86/sse2.h: In function ‘simde_mm_cvtps_epi32’:
../test/x86/avx512/../../../simde/x86/sse2.h:3316:7: note: use ‘-flax-vector-conversions’ to permit conversions between vectors with differing element types or numbers of subparts
 3316 |       r_.lsx_i32 = __lsx_vftintrne_w_s(a_.lsx_f32);
      |       ^~
../test/x86/avx512/../../../simde/x86/sse2.h:3316:20: error: incompatible types when assigning to type ‘v4i32’ from type ‘__m128i’
 3316 |       r_.lsx_i32 = __lsx_vftintrne_w_s(a_.lsx_f32);
      |                    ^~~~~~~~~~~~~~~~~~~

It looks like a GCC bug to me: __lsx_vftintrne_w_s, as far as I know, returns four 32-bit integers packed into an __m128i, but GCC 14 for some reason thinks otherwise.

I'm not sure I'm qualified enough, or have the time, to address all the issues there right now. I can probably disable the preprocessor branches for the optimizations that break CI (that's more or less what I did for my local simde copy) and hope that the LoongArch community (me included) will eventually implement them again. If this is the way to go, what is the preferred convention for that? Something along the lines of #if 0 && ... or just plain removal?

mr-c · 2026-01-15T13:37:31Z

If this is the way to go, what is the preferred convention for that?

Please file issues with GCC, then we can add an entry near

simde/simde/simde-common.h

Line 1058 in 613c365

# if defined(SIMDE_ARCH_POWER)

to define SIMDE_BUG_GCC_NNNN only for circumstances where it is active.

Then we can add && !defined(SIMDE_BUG_GCC_NNNN) to

simde/simde/x86/sse2.h

Line 3314 in 613c365

    
#elif defined(SIMDE_LOONGARCH_LSX_NATIVE) && defined(SIMDE_FAST_CONVERSION_RANGE) && defined(SIMDE_FAST_ROUND_TIES)

You can also experiment with adding a cast using HEDLEY_REINTERPRET_CAST if you'd like to use this function before a fix is released for GCC.
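A minimal sketch of the cast idea, not the change that landed in this PR. It assumes GCC or Clang vector extensions and runs on any host; `example_v4i32`, `example_v2i64`, `EXAMPLE_REINTERPRET_CAST` and `example_misdeclared_convert` are hypothetical stand-ins for the LSX types, HEDLEY_REINTERPRET_CAST and the intrinsic. The stand-in truncates instead of rounding to nearest even, because only the type mismatch is being illustrated.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the 128-bit LSX vector typedefs. */
typedef int32_t example_v4i32 __attribute__((vector_size(16)));
typedef int64_t example_v2i64 __attribute__((vector_size(16)));

/* Hypothetical stand-in for HEDLEY_REINTERPRET_CAST when compiling as C. */
#define EXAMPLE_REINTERPRET_CAST(T, expr) ((T)(expr))

/* Hypothetical stand-in for an intrinsic that produces four 32-bit lanes
 * but is declared as returning two 64-bit lanes. */
static example_v2i64 example_misdeclared_convert(const float in[4]) {
  example_v4i32 lanes = {
    (int32_t) in[0], (int32_t) in[1], (int32_t) in[2], (int32_t) in[3]
  };
  return (example_v2i64) lanes; /* same 128 bits, different lane type */
}

static example_v4i32 example_convert(const float in[4]) {
  /* A plain assignment from example_v2i64 to example_v4i32 is rejected
   * unless -flax-vector-conversions is given; the explicit cast only
   * reinterprets the bits and is accepted. */
  assert(sizeof(example_v4i32) == sizeof(example_v2i64));
  return EXAMPLE_REINTERPRET_CAST(example_v4i32, example_misdeclared_convert(in));
}
```

The cast does not change the generated code; it only tells the compiler which lane layout the caller expects.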

HecaiYuan · 2026-01-17T03:00:53Z

@iv-m @mr-c Thanks a lot for reporting this issue! I'll look into it.

__lsx_vftintrz_w_d accepts two __m128d arguments, so it should be called with zero_f64, which is the variable that is actually declared.

This fixes the following compilation error that I get when compiling current simde master for loongarch64-linux-gnu with gcc 14.3.1 and `-Ofast -mlsx -mlasx` in CFLAGS:

../test/x86/avx512/../../../simde/x86/sse2.h: In function ‘simde__m128i simde_mm_cvttpd_epi32(simde__m128d)’:
../test/x86/avx512/../../../simde/x86/sse2.h:3736:39: error: ‘zero_i64’ was not declared in this scope; did you mean ‘zero_f64’?
 3736 |       r_.lsx_i64 = __lsx_vftintrz_w_d(zero_i64, simde__m128d_to_private(a).lsx_f64);
      |                                       ^~~~~~~~
      |                                       zero_f64

Signed-off-by: Ivan A. Melnikov <iv@altlinux.org>

Similarly to what other architectures do, the result of __lsx_vftintrz_w_s should be used when both SIMDE_FAST_CONVERSION_RANGE and SIMDE_FAST_NANS are defined, not just stored to a temporary and lost.

Signed-off-by: Ivan A. Melnikov <iv@altlinux.org>

mr-c · 2026-01-21T11:28:00Z

Another compiler bug? 😝

[15/4028] ccache loongarch64-linux-gnu-gcc-14 -Itest/x86/avx512/abs-native-c.p -Itest/x86/avx512 -I../test/x86/avx512 -I. -I.. -fdiagnostics-color=always -pipe -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -std=c99 -g -march=loongarch64 -Wextra -Werror -mlsx -mlasx -Ofast -fopenmp-simd -DSIMDE_CONSTRAINED_COMPILATION -DSIMDE_ENABLE_OPENMP -DSIMDE_TEST_BARE -MD -MQ test/x86/avx512/abs-native-c.p/abs.c.o -MF test/x86/avx512/abs-native-c.p/abs.c.o.d -o test/x86/avx512/abs-native-c.p/abs.c.o -c ../test/x86/avx512/abs.c
{standard input}: Assembler messages:
{standard input}:164: Warning: Right shift of negative numbers may be changed from arithmetic right shift to logical right shift!

(It is unclear where this comes from; I don't think it is abs.[ch].)

HecaiYuan · 2026-01-21T11:31:52Z

I didn't encounter the above problem when performing cross-compilation. On the contrary, I ran into other bugs, which are currently being fixed. @mr-c @iv-m

iv-m · 2026-01-21T11:40:54Z

Another compiler bug? 😝

Yup, and there are more(

I didn't encounter the above problem when performing cross-compilation.

It's not reproducible on my system either; it is probably something specific to the compiler or binutils version that CI uses.

HecaiYuan · 2026-01-21T13:55:03Z

I fixed some typos. @iv-m @mr-c
0001-sse2.h-Fix-typo-about-SIMDE_LOONGARCH_LSX_NATIVE.patch

But I encountered this error on my system.

../test/x86/avx512/fixupimm.c:529: assertion failed: r[6] ~= simde_mm256_loadu_ps(test_vec[4].r)[6] (-inf ~= -inf)
../test/x86/avx512/fixupimm.c:689: assertion failed: r[2] ~= simde_mm256_loadu_ps(test_vec[3].r)[2] (-inf ~= -inf)
../test/x86/avx512/fixupimm.c:855: assertion failed: r[4] ~= simde_mm256_loadu_ps(test_vec[3].r)[4] (-inf ~= -inf)

iv-m · 2026-01-21T14:03:28Z

I fixed some typos. @iv-m @mr-c
0001-sse2.h-Fix-typo-about-SIMDE_LOONGARCH_LSX_NATIVE.patch

Awesome! I've added it to this PR.

iv-m · 2026-01-21T14:19:39Z

(-inf ~= -inf)

16 tests are currently failing on my machine in this way.

-Ofast implies -ffast-math, which in turn implies -ffinite-math-only. And with -ffinite-math-only GCC considers any comparison with constant inf or -inf to be false, and optimizes them away.

I think tests that deal with infinities should be skipped when SIMDE_FAST_MATH is defined, in the same way many tests that deal with NaNs are skipped currently. I think I'll add a patch for that real quick.

Also it's interesting why other architectures never hit this -- at least I don't see any traces of anything like that in the tests. AFAIK this GCC behavior is totally portable.
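To make the failure mode concrete, here is a small sketch, assuming IEEE 754 binary32 floats, of an infinity check that inspects the bit pattern instead of using a floating-point comparison. It is an illustration only, not what this PR does; `example_is_inf_bits` is a hypothetical helper. Because the test is done on integers, it should not be affected by the assumption that infinities never occur.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 for +inf or -inf. In binary32 an infinity has all exponent
 * bits set and a zero mantissa, so after masking off the sign bit the
 * pattern is exactly 0x7f800000. */
static int example_is_inf_bits(float v) {
  uint32_t bits;
  assert(sizeof(bits) == sizeof(v));
  memcpy(&bits, &v, sizeof(bits)); /* well-defined type punning */
  return (bits & UINT32_C(0x7fffffff)) == UINT32_C(0x7f800000);
}

/* Builds a float from a raw bit pattern, so the example never has to
 * spell an infinite constant in floating-point arithmetic. */
static float example_f32_from_bits(uint32_t bits) {
  float v;
  memcpy(&v, &bits, sizeof(v));
  return v;
}
```

By contrast, an expression such as `x == -INFINITY` is a floating-point comparison and may be folded to false when the file is built with -ffinite-math-only, which matches the `(-inf ~= -inf)` failures quoted above.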

iv-m · 2026-01-21T14:34:25Z

Also it's interesting why other architectures never hit this

On my x86_64 machine with gcc 14.3.1, 22 tests failed when built with -Ofast -mavx2, including those 16 that fail on loongarch64 because of infinities.

AFAIK this GCC behavior is totally portable.

100%

mr-c · 2026-01-21T14:42:11Z

Also it's interesting why other architectures never hit this

On my x86_64 machine with gcc 14.3.1, 22 tests failed when built with -Ofast -mavx2, including those 16 that fail on loongarch64 because of infinities.

Until this PR, we never tested -Ofast on any architecture; so that doesn't surprise me

Can you share more about where you are using SIMDe + -Ofast?

mr-c · 2026-01-21T14:43:36Z

I didn't encounter the above problem when performing cross-compilation.

It's not reproducible on my system either; it is probably something specific to the compiler or binutils version that CI uses.

What are your versions @iv-m @HecaiYuan ?

In CI, we are running loongarch64-linux-gnu-gcc-14 (gcc 14.2.0 "loongarch64-linux-gnu-gcc-14 (Ubuntu 14.2.0-4ubuntu2~24.04) 14.2.0")

We can try GCC 15, if you wish

mr-c · 2026-01-21T14:45:02Z

Another compiler error, congratulations!

test/x86/avx512/extract.cpp:784:1: error: unrecognizable insn:
  784 | }
      | ^
(insn 144 143 145 2 (set (reg:V4DF 254 [ r_$f64_100 ])
        (vec_merge:V4DF (vec_duplicate:V4DF (const_double:DF 0.0 [0x0.0p+0]))
            (reg:V4DF 254 [ r_$f64_100 ])
            (const_int 2 [0x2]))) "../test/x86/avx512/../../../simde/x86/avx.h":1073:17 -1
     (nil))
during RTL pass: vregs
test/x86/avx512/extract.cpp:784:1: internal compiler error: in extract_insn, at recog.cc:2812
0x196dccd internal_error(char const*, ...)
	???:0
0x669aca fancy_abort(char const*, int, char const*)
	???:0
0x649a60 _fatal_insn(char const*, rtx_def const*, char const*, int, char const*)
	???:0
0x649a82 _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
	???:0

https://github.com/simd-everywhere/simde/actions/runs/21213583598/job/61028496796#step:9:73842

iv-m · 2026-01-21T14:50:07Z

Can you share more about where you are using SIMDe + -Ofast?

I first encountered the issue when I tried to build some DSP code on loongarch64. I guess that was parts of Surge XT, but I don't remember exactly. I've seen several LV2/VST3 plugins which are built with -ffast-math or -Ofast and use simde for portability.

iv-m · 2026-01-21T17:03:25Z

I'm experimenting with an interesting (but partial) solution for -Ofast tests: #1373. That change fixes all failing tests on my loongarch64 machine, and most of the test failures I've seen with -Ofast on x86_64 (the few failing tests that are left are not related to infinities).

HecaiYuan · 2026-01-22T02:04:08Z

Another compiler error, congratulations!

test/x86/avx512/extract.cpp:784:1: error: unrecognizable insn:
  784 | }
      | ^
(insn 144 143 145 2 (set (reg:V4DF 254 [ r_$f64_100 ])
        (vec_merge:V4DF (vec_duplicate:V4DF (const_double:DF 0.0 [0x0.0p+0]))
            (reg:V4DF 254 [ r_$f64_100 ])
            (const_int 2 [0x2]))) "../test/x86/avx512/../../../simde/x86/avx.h":1073:17 -1
     (nil))
during RTL pass: vregs
test/x86/avx512/extract.cpp:784:1: internal compiler error: in extract_insn, at recog.cc:2812
0x196dccd internal_error(char const*, ...)
	???:0
0x669aca fancy_abort(char const*, int, char const*)
	???:0
0x649a60 _fatal_insn(char const*, rtx_def const*, char const*, int, char const*)
	???:0
0x649a82 _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
	???:0

https://github.com/simd-everywhere/simde/actions/runs/21213583598/job/61028496796#step:9:73842

I encountered a similar problem. It was a compiler bug, which my colleague has fixed. However, the fix hasn't been merged yet (https://gcc.gnu.org/pipermail/gcc-patches/2026-January/706166.html).

HecaiYuan · 2026-01-22T02:34:58Z

Apologies. The author info in this patch is wrong and requires correction. @iv-m @mr-c
0001-sse2.h-Fix-typo-about-SIMDE_LOONGARCH_LSX_NATIVE.patch

__lsx_vftintrne_w_s actually returns a vector of 4 ints, but lsxintrin.h from GCC 14 and 15 declares it as returning a vector of 2 longs. We use HEDLEY_REINTERPRET_CAST to work around this. See: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123759

__lsx_vfcmp_cun_s actually returns a vector of 4 ints, but lsxintrin.h from GCC 14 and 15 declares it as returning a vector of 2 longs. Use HEDLEY_REINTERPRET_CAST to work around this and assign the correct member of simde__m128_private. See: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123759

Change from SIMD_LOONGARCH_LSX_NATIVE to SIMDE_LOONGARCH_LSX_NATIVE.

…4.04 To avoid binutils mismatch

With `-ffinite-math-only` (implied by `-ffast-math` and `-Ofast`), GCC considers all comparisons against infinities to be false and can optimize them away. This breaks several tests that rely on infinities being compared correctly. To avoid this, we add a GCC-specific optimize attribute that disables the `-ffinite-math-only` optimization for the floating-point comparisons used in assertions.

Signed-off-by: Ivan A. Melnikov <iv@altlinux.org>
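The attribute approach can be sketched roughly as follows. This is an assumption-laden illustration rather than the code from the commit: the exact spelling used there is not shown in this thread, and `EXAMPLE_KEEP_INFINITIES` and `example_equal_f32` are hypothetical names. GCC documents that a string argument to the optimize attribute that does not start with `O` gets `-f` prepended, so `"no-finite-math-only"` requests `-fno-finite-math-only` for that one function.

```c
#include <assert.h>
#include <math.h>

/* Only GCC understands the optimize attribute; other compilers get an
 * empty macro and keep their default behaviour. */
#if defined(__GNUC__) && !defined(__clang__)
#  define EXAMPLE_KEEP_INFINITIES \
     __attribute__((__optimize__("no-finite-math-only")))
#else
#  define EXAMPLE_KEEP_INFINITIES
#endif

/* Comparison helper of the kind a test assertion could call. With the
 * attribute, GCC compiles this function as if -fno-finite-math-only had
 * been passed, so comparisons against infinities are not folded away. */
EXAMPLE_KEEP_INFINITIES
static int example_equal_f32(float a, float b) {
  return a == b;
}

/* Usage example: equal infinities compare equal, opposite ones do not. */
static void example_check_infinities(void) {
  assert(example_equal_f32(-INFINITY, -INFINITY));
  assert(!example_equal_f32(INFINITY, -INFINITY));
}
```

The GCC manual recommends the optimize attribute mainly for debugging, so confining it to test helpers, as the commit message describes, keeps it out of the library code itself.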

This works around two similar GCC 14 ICEs:

test/x86/avx512/range.cpp: In function ‘int test_simde_mm256_maskz_range_ps()’:
test/x86/avx512/range.cpp:702:1: error: unrecognizable insn:
  702 | }
      | ^
(insn 191 190 192 2 (set (reg:V8SF 446 [ r_$f32_514 ])
        (vec_merge:V8SF (vec_duplicate:V8SF (const_double:SF 0.0 [0x0.0p+0]))
            (reg:V8SF 446 [ r_$f32_514 ])
            (const_int 1 [0x1]))) "../test/x86/avx512/../../../simde/x86/avx.h":1041:17 -1
     (nil))
[...]

A similar workaround is already present in simde_mm256_set_ps.

Link: https://gcc.gnu.org/pipermail/gcc-patches/2026-January/706166.html
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117575

iv-m · 2026-01-22T11:51:58Z

Apologies. The author info in this patch is wrong and requires correction. @iv-m @mr-c 0001-sse2.h-Fix-typo-about-SIMDE_LOONGARCH_LSX_NATIVE.patch

I've applied the updated patch, please check that everything is correct now.

iv-m · 2026-01-22T11:55:00Z

I encountered a similar problem. It was a compiler bug, which my colleague has fixed. However, the fix hasn't been merged yet (https://gcc.gnu.org/pipermail/gcc-patches/2026-January/706166.html).

@HecaiYuan, thank you for the information. It seems the fix has been merged since.

I've added a workaround for a couple of related ICEs I've seen on my loongarch64 machine with the links to this thread and the corresponding GCC bugzilla issue to the PR. The tests are now fully passing on my machine. Let's see what CI says.

iv-m · 2026-01-22T12:01:49Z

Let's see what CI says.

CI said:

../test/x86/../../simde/x86/sse2.h:7497:5: error: ‘r’ may be used uninitialized [-Werror=maybe-uninitialized]
 7497 |     __lsx_vst(simde__m128i_to_private(a).lsx_i64, mem_addr, 0);
      |     ^~~~~~~~~

I've added -Wextra -Werror to my CFLAGS and reproduced this. Fixing.

simde/x86/avx.h

iv-m · 2026-01-22T16:14:50Z

I've added -Wextra -Werror to my CFLAGS and reproduced this. Fixing.

Well, not so fast.

I've submitted one more GCC bug, this time for __lsx_vst and __lasx_xvst causing -Wmaybe-uninitialized for the second argument, which they should only write to: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123766 . I believe the problem is in the optimizer, not in the intrinsic definitions.

But when I disable a few preprocessor branches where __lsx_vst and __lasx_xvst are employed, I still get lots of -Wuninitialized and -Wmaybe-uninitialized warnings from generic implementations of NEON intrinsics, in addition to those that were addressed by @mr-c in #1373. And I'm a bit lost at what can be done there.

I also do feel like I'm breaking more things than fixing when I hunt those warnings.

So can I humbly ask if we can just give up and add -Wno-error=maybe-uninitialized -Wno-error=uninitialized to CFLAGS/CXXFLAGS for -Ofast runs on CI?)

HecaiYuan · 2026-01-23T01:18:35Z

I've added -Wextra -Werror to my CFLAGS and reproduced this. Fixing.

Well, not so fast.

I've submitted one more GCC bug, this time for __lsx_vst and __lasx_xvst causing -Wmaybe-uninitialized for the second argument, which they should only write to: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123766 . I believe the problem is in the optimizer, not in the intrinsic definitions.

But when I disable a few preprocessor branches where __lsx_vst and __lasx_xvst are employed, I still get lots of -Wuninitialized and -Wmaybe-uninitialized warnings from generic implementations of NEON intrinsics, in addition to those that were addressed by @mr-c in #1373. And I'm a bit lost at what can be done there.

I also do feel like I'm breaking more things than fixing when I hunt those warnings.

So can I humbly ask if we can just give up and add -Wno-error=maybe-uninitialized -Wno-error=uninitialized to CFLAGS/CXXFLAGS for -Ofast runs on CI?)

I will seek help from my colleagues.

…arch64

Avoid some usages of __lsx_vst and __lasx_xvst, as they may trigger maybe-uninitialized warnings: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123766

The optimizing compiler still generates optimal vectorized code for fixed-size __builtin_memcpy, so no performance loss is expected.
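The memcpy-based store described in this commit message can be sketched as below. This is a host-independent illustration assuming GCC or Clang vector extensions; `example_vec128` and `example_storeu_128` are hypothetical names, not SIMDe functions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a 128-bit integer vector type. */
typedef int64_t example_vec128 __attribute__((vector_size(16)));

/* Unaligned 128-bit store written as a fixed-size copy. The size is a
 * compile-time constant, so an optimizing compiler is free to lower the
 * call to a single vector store instead of calling memcpy. */
static void example_storeu_128(void *mem_addr, example_vec128 a) {
  __builtin_memcpy(mem_addr, &a, sizeof(a));
}

/* Usage example: store two 64-bit lanes and read them back. */
static void example_check_store(void) {
  int64_t out[2] = { 0, 0 };
  example_vec128 v = { INT64_C(0x1122334455667788), INT64_C(-1) };
  example_storeu_128(out, v);
  assert(out[0] == INT64_C(0x1122334455667788));
  assert(out[1] == INT64_C(-1));
}
```

Since the destination is only written through memcpy, the compiler has no reason to treat it as a value that is read before being initialized, which is the warning the intrinsic-based store ran into.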

…ialized

... in the same way it's already done for RISC-V GCC.

HecaiYuan · 2026-01-23T07:18:12Z

My colleague responded to this issue. @iv-m
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123766

iv-m · 2026-01-23T07:20:31Z

And I'm a bit lost at what can be done there.

... and then I learned about SIMDE_DIAGNOSTIC_DISABLE_MAYBE_UNINITIAZILED_ and how it's already used to disable the exact warnings I see on some other architectures.
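For readers unfamiliar with this kind of macro, here is a rough sketch of how a warning can be silenced for one region with GCC diagnostic pragmas. The `EXAMPLE_*` macros and `example_sum4` are hypothetical and only mimic the general pattern; they are not the SIMDe definitions.

```c
#include <assert.h>

/* _Pragma lets a pragma be produced by a macro. Clang does not know
 * -Wmaybe-uninitialized, so the macros are empty everywhere except GCC. */
#if defined(__GNUC__) && !defined(__clang__)
#  define EXAMPLE_DIAGNOSTIC_PUSH _Pragma("GCC diagnostic push")
#  define EXAMPLE_DIAGNOSTIC_POP  _Pragma("GCC diagnostic pop")
#  define EXAMPLE_DISABLE_MAYBE_UNINITIALIZED \
     _Pragma("GCC diagnostic ignored \"-Wmaybe-uninitialized\"")
#else
#  define EXAMPLE_DIAGNOSTIC_PUSH
#  define EXAMPLE_DIAGNOSTIC_POP
#  define EXAMPLE_DISABLE_MAYBE_UNINITIALIZED
#endif

EXAMPLE_DIAGNOSTIC_PUSH
EXAMPLE_DISABLE_MAYBE_UNINITIALIZED

/* Code between push and pop is compiled with the warning disabled.
 * tmp is fully written before it is read, which is the kind of pattern
 * that can still draw a false positive after aggressive optimization. */
static int example_sum4(const int in[4]) {
  int tmp[4];
  int sum = 0;
  for (int i = 0; i < 4; i++) tmp[i] = in[i];
  for (int i = 0; i < 4; i++) sum += tmp[i];
  return sum;
}

EXAMPLE_DIAGNOSTIC_POP
```

The push/pop pair restores the previous diagnostic state afterwards, so the warning stays active for the rest of the translation unit.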

iv-m · 2026-01-23T07:55:38Z

../test/x86/avx512/fixupimm_round.c:1621: assertion failed: r[1] ~= simde_mm_loadu_ps(test_vec[1].r)[1] (0.000000 ~= 196.860001)
../test/arm/neon/ext.c:931: assertion failed: r[0] ~= simde_vld1q_f32(test_vec[i].r)[0] (0.000000 ~= -600.229980)
../test/arm/neon/ext.c:1120: assertion failed: r[0] == simde_vld1q_s8(test_vec[i].r)[0] (4 == -92)
../test/arm/neon/ext.c:1202: assertion failed: r[0] == simde_vld1q_s16(test_vec[i].r)[0] (2 == -3612)
../test/arm/neon/ext.c:1279: assertion failed: r[0] == simde_vld1q_s32(test_vec[i].r)[0] (2 == 1887280373)
../test/arm/neon/ext.c:1354: assertion failed: r[0] == simde_vld1q_s64(test_vec[i].r)[0] (2 == 1054151440654452764)
../test/arm/neon/ext.c:1468: assertion failed: r[0] == simde_vld1q_u8(test_vec[i].r)[0] (4 == 124)
../test/arm/neon/ext.c:1549: assertion failed: r[0] == simde_vld1q_u16(test_vec[i].r)[0] (2 == 4173)
../test/arm/neon/ext.c:1627: assertion failed: r[0] == simde_vld1q_u32(test_vec[i].r)[0] (2 == 2577931479)
../test/arm/neon/ext.c:1702: assertion failed: r[0] == simde_vld1q_u64(test_vec[i].r)[0] (2 == 12195681843175063656)

Well, on my machine (with gcc 14.3.1) all the tests now pass cleanly.

mr-c · 2026-01-24T12:08:54Z

FYI, I found a LoongArch ICE with a recent GCC 16 snapshot https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123807 (CI link: https://github.com/simd-everywhere/simde/actions/runs/21312327281/job/61350343527#step:6:438 )

mr-c requested changes Jan 15, 2026

iv-m and others added 3 commits January 20, 2026 19:25

gh-actions: test loongarch64 with -Ofast

89eaa4b

iv-m force-pushed the fixes-for-loongarch-lsx-fast-math branch from 8be0f4a to 8431927 Compare January 20, 2026 15:26

iv-m and others added 7 commits January 22, 2026 15:41

sse2.h: Fix typo about SIMDE_LOONGARCH_LSX_NATIVE

95ed60c

Change from SIMD_LOONGARCH_LSX_NATIVE to SIMDE_LOONGARCH_LSX_NATIVE.

DO NOT MERGE; skip other CI

64b4b8b

gh-actions gcc-qemu: only add extra repository for gcc-15 on Ubuntu 2…

da8efa7

…4.04 To avoid binutils mismatch

DO NOT MERGE: run emulated tests as well

3113494

iv-m force-pushed the fixes-for-loongarch-lsx-fast-math branch from 30b3cfa to 48abcfb Compare January 22, 2026 11:43

mr-c reviewed Jan 22, 2026

iv-m and others added 3 commits January 23, 2026 10:02

arm neon ext: small adjustment to reduce risk of -Werror=maybe-uninit…

92c79c2

…ialized

arm/neon: Disable uninitialized variable warnings on loongarch64

3ada3b9

... in the same way it's already done for RISC-V GCC.
