@iv-m

@iv-m iv-m commented Jan 15, 2026

Fix a couple of errors I found when testing current simde master on my loongarch64 machine with GCC 14.3.1 and -Ofast -mlsx -mlasx in CFLAGS and CXXFLAGS.

@iv-m
Author

iv-m commented Jan 15, 2026

@HecaiYuan, @mr-c, please take a look.

Collaborator

@mr-c mr-c left a comment

@iv-m
Author

iv-m commented Jan 15, 2026

Please address the issues at [...]

Well, I just submitted fixes for two of the most obvious issues I found(

Other issues require more investigation. For example, the error from the CI:

 ../test/x86/avx512/../../../simde/x86/sse2.h: In function ‘simde_mm_cvtps_epi32’:
../test/x86/avx512/../../../simde/x86/sse2.h:3316:7: note: use ‘-flax-vector-conversions’ to permit conversions between vectors with differing element types or numbers of subparts
 3316 |       r_.lsx_i32 = __lsx_vftintrne_w_s(a_.lsx_f32);
      |       ^~
../test/x86/avx512/../../../simde/x86/sse2.h:3316:20: error: incompatible types when assigning to type ‘v4i32’ from type ‘__m128i’
 3316 |       r_.lsx_i32 = __lsx_vftintrne_w_s(a_.lsx_f32);
      |                    ^~~~~~~~~~~~~~~~~~~

It looks like a GCC bug to me: __lsx_vftintrne_w_s, as far as I know, returns four 32-bit integers packed into an __m128i, but GCC 14 for some reason thinks otherwise.

I'm not sure I'm qualified enough or have the time to address all the issues there rn. I can probably disable the preprocessor branches for optimizations that break CI -- that's more or less what I did for my local simde copy -- and hope that the LoongArch community (me included) will eventually implement them back. If this is the way to go, what is the preferred convention for that? Something along the lines of #if 0 && ... or just plain removal?

@mr-c
Collaborator

mr-c commented Jan 15, 2026

If this is the way to go, what is the preferred convention for that?

Please file issues with GCC, then we can add an entry near

# if defined(SIMDE_ARCH_POWER)
to define SIMDE_BUG_GCC_NNNN only for circumstances where it is active.

Then we can add && !defined(SIMDE_BUG_GCC_NNNN) to

#elif defined(SIMDE_LOONGARCH_LSX_NATIVE) && defined(SIMDE_FAST_CONVERSION_RANGE) && defined(SIMDE_FAST_ROUND_TIES)

You can also experiment with adding a cast using HEDLEY_REINTERPRET_CAST if you'd like to use this function before a fix is released for GCC.
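
The guard pattern described above could look roughly like the sketch below. Every `DEMO_*` name is a hypothetical stand-in rather than a SIMDe macro, `NNNN` is kept as a placeholder for whatever GCC bug number ends up being assigned, and the condition that defines the bug macro is only an assumption for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bug macro, defined only where the problem is believed to be
 * active. The compiler/architecture conditions here are assumptions. */
#if defined(__GNUC__) && !defined(__clang__) && defined(__loongarch__)
#  define DEMO_BUG_GCC_NNNN 1
#endif

/* DEMO_HAVE_NATIVE is a hypothetical feature macro standing in for checks
 * such as SIMDE_LOONGARCH_LSX_NATIVE. */
#if defined(DEMO_HAVE_NATIVE) && !defined(DEMO_BUG_GCC_NNNN)
#  define DEMO_USE_NATIVE_PATH 1
#endif

/* Truncating float -> int32 conversion of four lanes. */
static void demo_cvtt_4xf32(const float in[4], int32_t out[4]) {
  assert(in != NULL && out != NULL);
#if defined(DEMO_USE_NATIVE_PATH)
  /* Stand-in for a single native intrinsic call. */
  out[0] = (int32_t) in[0]; out[1] = (int32_t) in[1];
  out[2] = (int32_t) in[2]; out[3] = (int32_t) in[3];
#else
  /* Portable fallback, also taken whenever the bug macro is defined. */
  for (size_t i = 0; i < 4; i++) {
    out[i] = (int32_t) in[i];
  }
#endif
}
```

Keeping the bug check in one macro means the native branch comes back automatically once the macro's condition is narrowed to the affected compiler versions.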

@HecaiYuan
Contributor

@iv-m @mr-c Thanks a lot for reporting this issue! I'll look into it.

iv-m and others added 3 commits January 20, 2026 19:25
__lsx_vftintrz_w_d accepts two __m128d arguments, so it
should be called with zero_f64, which is declared.

This fixes the following compilation error that I get when
compiling current simde master for loongarch64-linux-gnu
with gcc 14.3.1 and `-Ofast -mlsx -mlasx` in CFLAGS:

../test/x86/avx512/../../../simde/x86/sse2.h: In function ‘simde__m128i simde_mm_cvttpd_epi32(simde__m128d)’:
../test/x86/avx512/../../../simde/x86/sse2.h:3736:39: error: ‘zero_i64’ was not declared in this scope; did you mean ‘zero_f64’?
 3736 |       r_.lsx_i64 = __lsx_vftintrz_w_d(zero_i64, simde__m128d_to_private(a).lsx_f64);
      |                                       ^~~~~~~~
      |                                       zero_f64

Signed-off-by: Ivan A. Melnikov <iv@altlinux.org>
Similarly to what other architectures do, the result of
__lsx_vftintrz_w_s should be used when both
SIMDE_FAST_CONVERSION_RANGE and SIMDE_FAST_NANS are defined,
not just stored to a temporary and lost.

Signed-off-by: Ivan A. Melnikov <iv@altlinux.org>
@iv-m iv-m force-pushed the fixes-for-loongarch-lsx-fast-math branch from 8be0f4a to 8431927 January 20, 2026 15:26
@mr-c
Collaborator

mr-c commented Jan 21, 2026

Another compiler bug? 😝

[15/4028] ccache loongarch64-linux-gnu-gcc-14 -Itest/x86/avx512/abs-native-c.p -Itest/x86/avx512 -I../test/x86/avx512 -I. -I.. -fdiagnostics-color=always -pipe -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -std=c99 -g -march=loongarch64 -Wextra -Werror -mlsx -mlasx -Ofast -fopenmp-simd -DSIMDE_CONSTRAINED_COMPILATION -DSIMDE_ENABLE_OPENMP -DSIMDE_TEST_BARE -MD -MQ test/x86/avx512/abs-native-c.p/abs.c.o -MF test/x86/avx512/abs-native-c.p/abs.c.o.d -o test/x86/avx512/abs-native-c.p/abs.c.o -c ../test/x86/avx512/abs.c
{standard input}: Assembler messages:
{standard input}:164: Warning: Right shift of negative numbers may be changed from arithmetic right shift to logical right shift!

(it is unclear where the source of this is, not abs.[ch] I think)

@HecaiYuan
Contributor

I didn't encounter the above problem when performing cross-compilation. On the contrary, I ran into other bugs, which are currently being fixed. @mr-c @iv-m

@iv-m
Author

iv-m commented Jan 21, 2026

Another compiler bug? 😝

Yup, and there are more(

I didn't encounter the above problem when performing cross-compilation.

It's not reproducible on my system either; it's probably something specific to the compiler or binutils version that CI uses.

@HecaiYuan
Contributor

I fixed some typos. @iv-m @mr-c
0001-sse2.h-Fix-typo-about-SIMDE_LOONGARCH_LSX_NATIVE.patch

But I encountered this error on my system.

../test/x86/avx512/fixupimm.c:529: assertion failed: r[6] ~= simde_mm256_loadu_ps(test_vec[4].r)[6] (-inf ~= -inf)
../test/x86/avx512/fixupimm.c:689: assertion failed: r[2] ~= simde_mm256_loadu_ps(test_vec[3].r)[2] (-inf ~= -inf)
../test/x86/avx512/fixupimm.c:855: assertion failed: r[4] ~= simde_mm256_loadu_ps(test_vec[3].r)[4] (-inf ~= -inf)

@iv-m
Author

iv-m commented Jan 21, 2026

I fixed some typos. @iv-m @mr-c
0001-sse2.h-Fix-typo-about-SIMDE_LOONGARCH_LSX_NATIVE.patch

Awesome! I've added it to this PR.

@iv-m
Author

iv-m commented Jan 21, 2026

(-inf ~= -inf)

16 tests are currently failing on my machine in this way.

-Ofast implies -ffast-math, which in turn implies -ffinite-math-only. With -ffinite-math-only, GCC is allowed to assume that infinities never occur, so it considers any comparison with a constant inf or -inf to be false and optimizes it away.

I think tests that deal with infinities should be skipped when SIMDE_FAST_MATH is defined, in the same way many tests that deal with NaNs are currently skipped. I think I'll add a patch for that real quick.

Also it's interesting why other architectures never hit this -- at least I don't see any traces of anything like that in the tests. AFAIK this GCC behavior is totally portable.
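
A minimal sketch of the skipping idea, assuming the test harness can report a "skipped" status. `__FINITE_MATH_ONLY__` is the macro GCC predefines to 1 under -ffinite-math-only; `DEMO_FAST_MATH`, `demo_test_status` and `demo_test_neg_inf` are hypothetical names and not part of SIMDe.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical stand-in for a check like SIMDE_FAST_MATH. */
#if defined(__FINITE_MATH_ONLY__) && __FINITE_MATH_ONLY__
#  define DEMO_FAST_MATH 1
#endif

enum demo_test_status {
  DEMO_TEST_PASS = 0,
  DEMO_TEST_FAIL = 1,
  DEMO_TEST_SKIP = 77
};

/* Checks that a computed value is -inf, unless the build promised the
 * compiler that infinities never occur, in which case the check is skipped. */
static enum demo_test_status demo_test_neg_inf(float computed) {
#if defined(DEMO_FAST_MATH)
  (void) computed;
  return DEMO_TEST_SKIP;
#else
  return (computed == -INFINITY) ? DEMO_TEST_PASS : DEMO_TEST_FAIL;
#endif
}

/* Example use: the check passes normally and is skipped under fast math. */
static void demo_run(void) {
  enum demo_test_status s = demo_test_neg_inf(-INFINITY);
  assert(s == DEMO_TEST_PASS || s == DEMO_TEST_SKIP);
}
```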

@iv-m
Author

iv-m commented Jan 21, 2026

Also it's interesting why other architectures never hit this

On my x86_64 machine with gcc 14.3.1, 22 tests failed when built with -Ofast -mavx2, including those 16 that fail on loongarch64 because of infinities.

AFAIK this GCC behavior is totally portable.

100%

@mr-c
Collaborator

mr-c commented Jan 21, 2026

Also it's interesting why other architectures never hit this

On my x86_64 machine with gcc 14.3.1, 22 tests failed when built with -Ofast -mavx2, including those 16 that fail on loongarch64 because of infinities.

Until this PR, we never tested -Ofast on any architecture; so that doesn't surprise me

Can you share more about where you are using SIMDe + -Ofast?

@mr-c
Collaborator

mr-c commented Jan 21, 2026

I didn't encounter the above problem when performing cross-compilation.

It's not reproducible on my system either; it's probably something specific to the compiler or binutils version that CI uses.

What are your versions @iv-m @HecaiYuan ?

In CI, we are running loongarch64-linux-gnu-gcc-14 (gcc 14.2.0 "loongarch64-linux-gnu-gcc-14 (Ubuntu 14.2.0-4ubuntu2~24.04) 14.2.0")

We can try GCC 15, if you wish

@mr-c
Collaborator

mr-c commented Jan 21, 2026

Another compiler error, congratulations!

test/x86/avx512/extract.cpp:784:1: error: unrecognizable insn:
  784 | }
      | ^
(insn 144 143 145 2 (set (reg:V4DF 254 [ r_$f64_100 ])
        (vec_merge:V4DF (vec_duplicate:V4DF (const_double:DF 0.0 [0x0.0p+0]))
            (reg:V4DF 254 [ r_$f64_100 ])
            (const_int 2 [0x2]))) "../test/x86/avx512/../../../simde/x86/avx.h":1073:17 -1
     (nil))
during RTL pass: vregs
test/x86/avx512/extract.cpp:784:1: internal compiler error: in extract_insn, at recog.cc:2812
0x196dccd internal_error(char const*, ...)
	???:0
0x669aca fancy_abort(char const*, int, char const*)
	???:0
0x649a60 _fatal_insn(char const*, rtx_def const*, char const*, int, char const*)
	???:0
0x649a82 _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
	???:0

https://github.com/simd-everywhere/simde/actions/runs/21213583598/job/61028496796#step:9:73842

@iv-m
Author

iv-m commented Jan 21, 2026

Can you share more about where you are using SIMDe + -Ofast?

I first encountered the issue when I tried to build some DSP code on loongarch64. I guess that was parts of Surge XT, but I don't remember exactly. I've seen several LV2/VST3 plugins which are built with -ffast-math or -Ofast and use simde for portability.

@iv-m
Author

iv-m commented Jan 21, 2026

I'm experimenting with an interesting (but partial) solution for -Ofast tests: #1373. That change fixes all failing tests on my loongarch64 machine, and most of the test failures I've seen with -Ofast on x86_64 (the few failing tests that remain are not related to infinities).

@HecaiYuan
Contributor

Another compiler error, congratulations!

test/x86/avx512/extract.cpp:784:1: error: unrecognizable insn:
  784 | }
      | ^
(insn 144 143 145 2 (set (reg:V4DF 254 [ r_$f64_100 ])
        (vec_merge:V4DF (vec_duplicate:V4DF (const_double:DF 0.0 [0x0.0p+0]))
            (reg:V4DF 254 [ r_$f64_100 ])
            (const_int 2 [0x2]))) "../test/x86/avx512/../../../simde/x86/avx.h":1073:17 -1
     (nil))
during RTL pass: vregs
test/x86/avx512/extract.cpp:784:1: internal compiler error: in extract_insn, at recog.cc:2812
0x196dccd internal_error(char const*, ...)
	???:0
0x669aca fancy_abort(char const*, int, char const*)
	???:0
0x649a60 _fatal_insn(char const*, rtx_def const*, char const*, int, char const*)
	???:0
0x649a82 _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
	???:0

https://github.com/simd-everywhere/simde/actions/runs/21213583598/job/61028496796#step:9:73842

I encountered a similar problem. It was a compiler bug, which my colleague has fixed. However, the fix hasn't been merged yet (https://gcc.gnu.org/pipermail/gcc-patches/2026-January/706166.html).

@HecaiYuan
Contributor

Apologies. The author info in this patch is wrong and requires correction. @iv-m @mr-c
0001-sse2.h-Fix-typo-about-SIMDE_LOONGARCH_LSX_NATIVE.patch

iv-m and others added 7 commits January 22, 2026 15:41
__lsx_vftintrne_w_s actually returns a vector of 4 ints,
but lsxintrin.h from GCC 14 and 15 declares it as returning
a vector of 2 longs. We use HEDLEY_REINTERPRET_CAST to
work around this.

See: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123759
__lsx_vfcmp_cun_s actually returns a vector of 4 ints, but
lsxintrin.h from GCC 14 and 15 declares it as returning two longs.
Use HEDLEY_REINTERPRET_CAST to work around this and assign
the correct member of simde__m128_private.

See: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123759
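
The cast workaround from the two commits above can be sketched with GCC/Clang vector extensions instead of the real LSX types, so it compiles on any target. `demo_intrinsic` is a hypothetical stand-in for an intrinsic whose declared return type has the wrong lane layout, and `DEMO_REINTERPRET_CAST` mirrors what HEDLEY_REINTERPRET_CAST amounts to in C (a plain cast).

```c
#include <assert.h>
#include <stdint.h>

/* GCC/Clang vector extensions; both types are 16 bytes wide. */
typedef int32_t demo_v4i32 __attribute__((vector_size(16)));
typedef int64_t demo_v2i64 __attribute__((vector_size(16)));

/* In C, a reinterpreting cast between same-sized vectors is a plain cast. */
#define DEMO_REINTERPRET_CAST(T, expr) ((T)(expr))

/* Stand-in for an intrinsic whose result is really four 32-bit lanes but
 * whose declared return type is two 64-bit lanes. */
static demo_v2i64 demo_intrinsic(demo_v4i32 a) {
  return (demo_v2i64) a;
}

static demo_v4i32 demo_wrapper(demo_v4i32 a) {
  /* Without the cast, GCC rejects the implicit conversion between the two
   * vector types unless -flax-vector-conversions is in effect. */
  return DEMO_REINTERPRET_CAST(demo_v4i32, demo_intrinsic(a));
}

/* Example use: the bit pattern survives the round trip unchanged. */
static void demo_selfcheck(void) {
  demo_v4i32 a = {1, -2, 3, -4};
  demo_v4i32 r = demo_wrapper(a);
  assert(r[0] == 1 && r[1] == -2 && r[2] == 3 && r[3] == -4);
}
```

The cast only changes how the 128 bits are typed, so it stays correct if the header declaration is fixed later.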
Change from SIMD_LOONGARCH_LSX_NATIVE to SIMDE_LOONGARCH_LSX_NATIVE.
With `-ffinite-math-only` (implied by `-ffast-math` and `-Ofast`),
GCC considers all comparisons against infinities to be false
and can optimize them away. This breaks several tests that rely
on infinities being compared correctly. To avoid this, we add a
GCC-specific optimize attribute that disables the `-ffinite-math-only`
optimization for floating point comparisons used in assertions.

Signed-off-by: Ivan A. Melnikov <iv@altlinux.org>
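
A rough sketch of the attribute approach described in the commit message above, not the code from the PR itself. `DEMO_NO_FINITE_MATH_ONLY` and `demo_f32_close` are hypothetical names. The GCC manual describes the `optimize` attribute as intended mainly for debugging, and how well it protects a comparison can depend on inlining and the compiler version, so treat this as an illustration only.

```c
#include <assert.h>
#include <math.h>

/* GCC-only: request -fno-finite-math-only for a single function.
 * Clang does not implement the optimize attribute, so it is excluded. */
#if defined(__GNUC__) && !defined(__clang__)
#  define DEMO_NO_FINITE_MATH_ONLY __attribute__((optimize("no-finite-math-only")))
#else
#  define DEMO_NO_FINITE_MATH_ONLY
#endif

/* Hypothetical comparison helper for test assertions: exact equality (which
 * covers matching infinities) or a small absolute difference. */
DEMO_NO_FINITE_MATH_ONLY
static int demo_f32_close(float a, float b, float tolerance) {
  float diff;
  if (a == b) return 1;            /* also true for +inf/+inf and -inf/-inf */
  if (a != a || b != b) return 0;  /* NaN is never considered close here */
  diff = a - b;
  if (diff < 0.0f) diff = -diff;
  return diff <= tolerance;
}

/* Example use: matching infinities compare close, opposite ones do not. */
static void demo_selfcheck(void) {
  assert(demo_f32_close(-INFINITY, -INFINITY, 1e-6f));
  assert(!demo_f32_close(INFINITY, -INFINITY, 1e-6f));
}
```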
This works around two similar instances of an ICE in GCC 14:

  test/x86/avx512/range.cpp: In function ‘int test_simde_mm256_maskz_range_ps()’:
  test/x86/avx512/range.cpp:702:1: error: unrecognizable insn:
    702 | }
        | ^
  (insn 191 190 192 2 (set (reg:V8SF 446 [ r_$f32_514 ])
          (vec_merge:V8SF (vec_duplicate:V8SF (const_double:SF 0.0 [0x0.0p+0]))
              (reg:V8SF 446 [ r_$f32_514 ])
              (const_int 1 [0x1]))) "../test/x86/avx512/../../../simde/x86/avx.h":1041:17 -1
       (nil))
  [...]

A similar workaround is already present in simde_mm256_set_ps.

Link: https://gcc.gnu.org/pipermail/gcc-patches/2026-January/706166.html
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117575
@iv-m iv-m force-pushed the fixes-for-loongarch-lsx-fast-math branch from 30b3cfa to 48abcfb January 22, 2026 11:43
@iv-m
Author

iv-m commented Jan 22, 2026

Apologies. The author info in this patch is wrong and requires correction. @iv-m @mr-c 0001-sse2.h-Fix-typo-about-SIMDE_LOONGARCH_LSX_NATIVE.patch

I've applied the updated patch, please check that everything is correct now.

@iv-m
Author

iv-m commented Jan 22, 2026

I encountered a similar problem. It was a compiler bug, which my colleague has fixed. However, the fix hasn't been merged yet (https://gcc.gnu.org/pipermail/gcc-patches/2026-January/706166.html).

@HecaiYuan, thank you for the information. It seems the fix has been merged since.

I've added to the PR a workaround for a couple of related ICEs I've seen on my loongarch64 machine, with links to this thread and the corresponding GCC bugzilla issue. The tests are now fully passing on my machine. Let's see what CI says.

@iv-m
Author

iv-m commented Jan 22, 2026

Let's see what CI says.

CI said:

../test/x86/../../simde/x86/sse2.h:7497:5: error: ‘r’ may be used uninitialized [-Werror=maybe-uninitialized]
 7497 |     __lsx_vst(simde__m128i_to_private(a).lsx_i64, mem_addr, 0);
      |     ^~~~~~~~~

I've added -Wextra -Werror to my CFLAGS and reproduced this. Fixing.

@iv-m
Author

iv-m commented Jan 22, 2026

I've added -Wextra -Werror to my CFLAGS and reproduced this. Fixing.

Well, not so fast.

I've submitted one more GCC bug, this time for __lsx_vst and __lasx_xvst causing -Wmaybe-uninitialized for the second argument, which they should only write to: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123766 . I believe it's something in the optimizer, not in the instruction definitions.

But when I disable a few preprocessor branches where __lsx_vst and __lasx_xvst are employed, I still get lots of -Wuninitialized and -Wmaybe-uninitialized warnings from generic implementations of NEON intrinsics, in addition to those that were addressed by @mr-c in #1373. And I'm a bit lost as to what can be done there.

I also do feel like I'm breaking more things than fixing when I hunt those warnings.

So can I humbly ask if we can just give up and add -Wno-error=maybe-uninitialized -Wno-error=uninitialized to CFLAGS/CXXFLAGS for -Ofast runs on CI?)

@HecaiYuan
Contributor

I've added -Wextra -Werror to my CFLAGS and reproduced this. Fixing.

Well, not so fast.

I've submitted one more GCC bug, this time for __lsx_vst and __lasx_xvst causing -Wmaybe-uninitialized for the second argument, which they should only write to: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123766 . I believe it's something in the optimizer, not in the instruction definitions.

But when I disable a few preprocessor branches where __lsx_vst and __lasx_xvst are employed, I still get lots of -Wuninitialized and -Wmaybe-uninitialized warnings from generic implementations of NEON intrinsics, in addition to those that were addressed by @mr-c in #1373. And I'm a bit lost as to what can be done there.

I also do feel like I'm breaking more things than fixing when I hunt those warnings.

So can I humbly ask if we can just give up and add -Wno-error=maybe-uninitialized -Wno-error=uninitialized to CFLAGS/CXXFLAGS for -Ofast runs on CI?)

I will seek help from my colleagues.

iv-m and others added 3 commits January 23, 2026 10:02
…arch64

Avoid some usages of __lsx_vst and __lasx_xvst, as they may
cause maybe-uninitialized warnings to be triggered:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123766

The optimizing compiler still generates optimal vectorized
code for fixed-size __builtin_memcpy, so no performance
loss is expected.
... in the same way it's already done for RISC-V GCC.
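
The fixed-size copy mentioned in the commit message above can be sketched as follows. `demo_m128i` and `demo_storeu_si128` are hypothetical stand-ins for the SIMDe types and functions, and plain `memcpy` is used here where the commit message mentions `__builtin_memcpy`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 128-bit value type standing in for simde__m128i. */
typedef struct { int64_t i64[2]; } demo_m128i;

/* Unaligned 128-bit store via a fixed-size copy. Because the size is a
 * compile-time constant, optimizing compilers typically lower this to a
 * single vector store instead of a library call. */
static void demo_storeu_si128(void *mem_addr, demo_m128i a) {
  assert(mem_addr != NULL);
  memcpy(mem_addr, &a, sizeof(a));
}
```

Going through the copy also sidesteps alignment and aliasing questions, since the destination is only ever accessed as bytes.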
@HecaiYuan
Contributor

My colleague responded to this issue. @iv-m
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123766

@iv-m
Author

iv-m commented Jan 23, 2026

And I'm a bit lost as to what can be done there.

... and then I learned about SIMDE_DIAGNOSTIC_DISABLE_MAYBE_UNINITIAZILED_ and how it's already used to disable the exact warnings I see on some other architectures.
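
The macro's actual definition is not shown in this thread, so the following only illustrates the general GCC pragma technique such a macro can wrap. All `DEMO_*` names and `demo_sum4` are hypothetical. Note that -Wmaybe-uninitialized is emitted late in compilation, so where the pragma has to sit can depend on inlining.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Clang does not know -Wmaybe-uninitialized, so the pragmas are GCC-only. */
#if defined(__GNUC__) && !defined(__clang__)
#  define DEMO_DIAGNOSTIC_PUSH _Pragma("GCC diagnostic push")
#  define DEMO_DIAGNOSTIC_POP  _Pragma("GCC diagnostic pop")
#  define DEMO_DIAGNOSTIC_DISABLE_MAYBE_UNINITIALIZED \
     _Pragma("GCC diagnostic ignored \"-Wmaybe-uninitialized\"")
#else
#  define DEMO_DIAGNOSTIC_PUSH
#  define DEMO_DIAGNOSTIC_POP
#  define DEMO_DIAGNOSTIC_DISABLE_MAYBE_UNINITIALIZED
#endif

DEMO_DIAGNOSTIC_PUSH
DEMO_DIAGNOSTIC_DISABLE_MAYBE_UNINITIALIZED
/* The warning is silenced only between the push and the pop. */
static int32_t demo_sum4(const int32_t src[4]) {
  int32_t tmp[4];
  assert(src != NULL);
  memcpy(tmp, src, sizeof(tmp));
  return tmp[0] + tmp[1] + tmp[2] + tmp[3];
}
DEMO_DIAGNOSTIC_POP
```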

@iv-m
Author

iv-m commented Jan 23, 2026

../test/x86/avx512/fixupimm_round.c:1621: assertion failed: r[1] ~= simde_mm_loadu_ps(test_vec[1].r)[1] (0.000000 ~= 196.860001)
../test/arm/neon/ext.c:931: assertion failed: r[0] ~= simde_vld1q_f32(test_vec[i].r)[0] (0.000000 ~= -600.229980)
../test/arm/neon/ext.c:1120: assertion failed: r[0] == simde_vld1q_s8(test_vec[i].r)[0] (4 == -92)
../test/arm/neon/ext.c:1202: assertion failed: r[0] == simde_vld1q_s16(test_vec[i].r)[0] (2 == -3612)
../test/arm/neon/ext.c:1279: assertion failed: r[0] == simde_vld1q_s32(test_vec[i].r)[0] (2 == 1887280373)
../test/arm/neon/ext.c:1354: assertion failed: r[0] == simde_vld1q_s64(test_vec[i].r)[0] (2 == 1054151440654452764)
../test/arm/neon/ext.c:1468: assertion failed: r[0] == simde_vld1q_u8(test_vec[i].r)[0] (4 == 124)
../test/arm/neon/ext.c:1549: assertion failed: r[0] == simde_vld1q_u16(test_vec[i].r)[0] (2 == 4173)
../test/arm/neon/ext.c:1627: assertion failed: r[0] == simde_vld1q_u32(test_vec[i].r)[0] (2 == 2577931479)
../test/arm/neon/ext.c:1702: assertion failed: r[0] == simde_vld1q_u64(test_vec[i].r)[0] (2 == 12195681843175063656)

Well, on my machine (with gcc 14.3.1) all the tests are passing now cleanly.

@mr-c
Collaborator

mr-c commented Jan 24, 2026
