-
Notifications
You must be signed in to change notification settings - Fork 301
Fixes for LoongArch LSX + fast math #1369
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Fixes for LoongArch LSX + fast math #1369
Conversation
|
@HecaiYuan, @mr-c, please take a look. |
mr-c
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the PR, @iv-m ; please address the issues at https://github.com/simd-everywhere/simde/actions/runs/21027953168/job/60456937165?pr=1369
Well, I just submitted fixes for two of the most obvious issues I found( Other issues require more investigation. For example, the error from the CI: It looks like GCC bug to me: I'm not sure I'm qualified enough and have time to address all the issues there rn. I probably can disable the preporcessor branches for optimizations that break CI -- that's more or less what I did for my local simde copy -- and hope that LoongArch community (me included) will eventually implement them back. If this is the way to go, what is the preferred conventions for that? Something along the lines of |
Please file issues with GCC, then we can add a entry near Line 1058 in 613c365
SIMDE_BUG_GCC_NNNN only for circumstances where it is active.
Then we can add Line 3314 in 613c365
You can also experiment with adding cast using |
__lsx_vftintrz_w_d accepts two __m128d arguments, so it's
should be called with zero_f64 that is declared.
This fixes the following compilation error that I get when
compiling current simde master for loongarch64-linux-gnu
with gcc 14.3.1 and `-Ofast -mlsx -mlasx` in CFLAGS:
../test/x86/avx512/../../../simde/x86/sse2.h: In function ‘simde__m128i simde_mm_cvttpd_epi32(simde__m128d)’:
../test/x86/avx512/../../../simde/x86/sse2.h:3736:39: error: ‘zero_i64’ was not declared in this scope; did you mean ‘zero_f64’?
3736 | r_.lsx_i64 = __lsx_vftintrz_w_d(zero_i64, simde__m128d_to_private(a).lsx_f64);
| ^~~~~~~~
| zero_f64
Signed-off-by: Ivan A. Melnikov <iv@altlinux.org>
Similarly to what other architectures do, __lsx_vftintrz_w_s should be used when both SIMDE_FAST_CONVERSION_RANGE and SIMDE_FAST_NANS are declared, not just stored to a temporary and lost. Signed-off-by: Ivan A. Melnikov <iv@altlinux.org>
8be0f4a to
8431927
Compare
|
Another compiler bug? 😝
|
Yup, and there are more(
It's not reproducible on my system, too, probably something specific to compiler or binutils version that CI uses. |
|
I fixed some typos. @iv-m @mr-c But I encountered this error on my system. |
16 tests are currently failing on my machine in this way.
I think tests that deal with infinities should be skipped when Also it's interesting why other architectures never hit this -- at least I don't see any traces of anything like that in the tests. AFAIK this GCC behavior is totally portable. |
On my x86_64 machine with gcc 14.3.1, 22 tests failed when build with
100% |
Until this PR, we never tested Can you share more where you are using SIMDe + |
What are your versions @iv-m @HecaiYuan ? In CI, we are running We can try GCC 15, if you wish |
|
Another compiler error, congratulations! https://github.com/simd-everywhere/simde/actions/runs/21213583598/job/61028496796#step:9:73842 |
I first encountered the issue when tried to build some DSP code on loongarch64. I guess that was parts of Surge XT, but I don't remember exactly. I've seen several LV2/VST3 plugins which are build with -ffast-math or -Ofast and use simde for portability. |
|
I'm experimenting with an interesting (but partial) solution for |
I encountered a similar problem. It was a compiler bug, which my colleague has fixed. However, it hasn't been merged yet.(https://gcc.gnu.org/pipermail/gcc-patches/2026-January/706166.html) |
|
Apologies. The author info in this patch is wrong and requires correction. @iv-m @mr-c |
__lsx_vftintrne_w_s actually returns a vector of 4 ints, but lsxintrin.h from gcc 14 and 15 declares it as returning a vector of 2 longs. We use HEDLEY_REINTERPRET_CAST to work this around. See: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123759
__lsx_vfcmp_cun_s actually retuns a vector of 4 ints, but lsxintrin.h from GCC 14 and 15 declares it as returning two longs. Use HEDLEY_REINTERPRET_CAST to work this around and assign the correct member of simde__m128_private. See: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123759
Change from SIMD_LOONGARCH_LSX_NATIVE to SIMDE_LOONGARCH_LSX_NATIVE.
…4.04 To avoid binutils mismatch
With `-ffinite-math-only` (implied by `-ffast-math` and -Ofast), GCC considers all comparisons against infinities to be false and can optimize them away. This breaks several tests that rely on infinities to be compared correctly. To avoid this, we add GCC-specific optimize attribute that disables `-ffinite-math-only` optimization for floating point comarisions used in assertions. Signed-off-by: Ivan A. Melnikov <iv@altlinux.org>
This works around two similar instances of ICE of GCC 14:
test/x86/avx512/range.cpp: In function ‘int test_simde_mm256_maskz_range_ps()’:
test/x86/avx512/range.cpp:702:1: error: unrecognizable insn:
702 | }
| ^
(insn 191 190 192 2 (set (reg:V8SF 446 [ r_$f32_514 ])
(vec_merge:V8SF (vec_duplicate:V8SF (const_double:SF 0.0 [0x0.0p+0]))
(reg:V8SF 446 [ r_$f32_514 ])
(const_int 1 [0x1]))) "../test/x86/avx512/../../../simde/x86/avx.h":1041:17 -1
(nil))
[...]
The similar workaround is already present in simde_mm256_set_ps.
Link: https://gcc.gnu.org/pipermail/gcc-patches/2026-January/706166.html
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117575
30b3cfa to
48abcfb
Compare
I've applied the updated patch, please check that everything is correct now. |
@HecaiYuan, thank you for the information. It seems the fix has been merged since. I've added a workaround for a couple of related ICEs I've seen on my loongarch64 machine with the links to this thread and the corresponding GCC bugzilla issue to the PR. The tests are now fully passing on my machine. Let's see what CI says. |
CI said: I've added |
Well, not so fast. I've submitted one more GCC bug, this time for But when I disable a few preprocessor branches where I also do feel like I'm breaking more things than fixing when I hunt those warnings. So can I humbly ask if we can just give up and add |
I will seek help from my colleagues. |
…arch64 Avoid some usages of __lsx_vst and __lasx_xvst, as they may cause maybe-uninitialized warnings to be triggered: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123766 The optimizing compiler still generates optimal vectorized code for fixed-size __builtin_memcpy, so no performance loss is expected.
... in the same way it's already done for RISC-V GCC.
|
My colleague responded to this issue. @ iv-m |
... and then I learned about |
Well, on my machine (with gcc 14.3.1) all the tests are passing now cleanly. |
|
FYI, I found a LoongArch ICE with a recent GCC 16 snapshot https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123807 (CI link: https://github.com/simd-everywhere/simde/actions/runs/21312327281/job/61350343527#step:6:438 ) |
Fix a couple of errors I found when testing current simde master on my loongarch64 machine with GCC 14.3.1 and
-Ofast -mlsx -mlasxinCFLAGSandCXXFLAGS.