x86: use intrinsics::simd for masked truncating stores #2030

Open
okaneco wants to merge 1 commit into rust-lang:main from okaneco:avx512f_masked_truncating_stores

okaneco commented Feb 18, 2026

Use intrinsics for the `_mask_cvt` truncating-cast, unaligned stores.

Note: 2 of the 15 stores were omitted because they fail their instruction tests:

  • avx512f::_mm512_mask_cvtepi32_storeu_epi16
  • avx512f::_mm512_mask_cvtepi32_storeu_epi8
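
For readers unfamiliar with these intrinsics, here is a minimal scalar sketch of what a masked, truncating, unaligned store is expected to do, using the 32-bit to 16-bit case as the example. The function name and the array-based signature are hypothetical and chosen for illustration; this is not the code in the PR, which goes through `intrinsics::simd`.

```rust
/// Scalar reference model (hypothetical, for illustration only) of a masked,
/// truncating store from sixteen 32-bit lanes to sixteen 16-bit slots.
/// A slot is written only if the corresponding mask bit is set;
/// all other slots in `dst` are left untouched.
fn mask_cvtepi32_storeu_epi16_model(dst: &mut [i16; 16], mask: u16, src: [i32; 16]) {
    for (lane, &value) in src.iter().enumerate() {
        if (mask >> lane) & 1 == 1 {
            // `as i16` keeps the low 16 bits: truncation, not saturation.
            dst[lane] = value as i16;
        }
    }
}
```

The per-lane `if` in this model corresponds to the per-bit `test`/`jne` chains that show up when the store is scalarized instead of being lowered to a single masked `vpmov*` store.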

rustbot commented Feb 18, 2026

r? @Amanieu

rustbot has assigned @Amanieu.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

Why was this reviewer chosen?

The reviewer was selected based on:

  • Owners of files modified in this PR: @Amanieu, @folkertdev, @sayantn
  • @Amanieu, @folkertdev, @sayantn expanded to Amanieu, folkertdev, sayantn
  • Random selection from Amanieu, folkertdev, sayantn

okaneco commented Feb 18, 2026

These were the 2 instruction tests that failed locally for me on this PR.

Failed _mm512 intrinsics
failures:

---- core_arch::x86::avx512f::assert__mm512_mask_cvtepi32_storeu_epi16_vpmovdw stdout ----
disassembly for stdarch_test_shim__mm512_mask_cvtepi32_storeu_epi16_vpmovdw:
         0: vpmovdw ymm0 zmm0
         1: test sil 1
         2: jne 000000014018542f
         3: test sil 2
         4: jne 000000014018543f
         5: test sil 4
         6: jne 0000000140185450
         7: test sil 8
         8: jne 0000000140185461
         9: test sil 10h
        10: jne 0000000140185472
        11: test sil 20h
        12: jne 0000000140185483
        13: test sil 40h
        14: jne 0000000140185494
        15: test sil sil
        16: js 00000001401854a4
        17: vextracti128 xmm0 ymm0 1
        18: test esi 100h
        19: jne 00000001401854bd
        20: test esi 200h
        21: jne 00000001401854d0
        22: test esi 400h
        23: jne 00000001401854e3
        24: test esi 800h
        25: jne 00000001401854f6
        26: test esi 1000h
        27: jne 0000000140185509
        28: test esi 2000h
        29: jne 000000014018551c
        30: test esi 4000h
        31: jne 000000014018552f
        32: test esi 8000h
        33: jne 0000000140185542
        34: vzeroupper
        35: ret

thread 'core_arch::x86::avx512f::assert__mm512_mask_cvtepi32_storeu_epi16_vpmovdw' (15836) panicked at crates\stdarch-test\src\lib.rs:206:9:
instruction found, but the disassembly contains too many instructions: #instructions = 36 >= 22 (limit)
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

---- core_arch::x86::avx512f::assert__mm512_mask_cvtepi32_storeu_epi8_vpmovdb stdout ----
disassembly for stdarch_test_shim__mm512_mask_cvtepi32_storeu_epi8_vpmovdb:
         0: vpmovdb xmm0 zmm0
         1: test sil 1
         2: jne 0000000140185609
         3: test sil 2
         4: jne 0000000140185619
         5: test sil 4
         6: jne 000000014018562a
         7: test sil 8
         8: jne 000000014018563b
         9: test sil 10h
        10: jne 000000014018564c
        11: test sil 20h
        12: jne 000000014018565d
        13: test sil 40h
        14: jne 000000014018566e
        15: test sil sil
        16: js 000000014018567e
        17: test esi 100h
        18: jne 0000000140185691
        19: test esi 200h
        20: jne 00000001401856a4
        21: test esi 400h
        22: jne 00000001401856b7
        23: test esi 800h
        24: jne 00000001401856ca
        25: test esi 1000h
        26: jne 00000001401856dd
        27: test esi 2000h
        28: jne 00000001401856f0
        29: test esi 4000h
        30: jne 0000000140185703
        31: test esi 8000h
        32: jne 0000000140185716
        33: vzeroupper
        34: ret

thread 'core_arch::x86::avx512f::assert__mm512_mask_cvtepi32_storeu_epi8_vpmovdb' (10772) panicked at crates\stdarch-test\src\lib.rs:206:9:
instruction found, but the disassembly contains too many instructions: #instructions = 35 >= 22 (limit)


failures:
    core_arch::x86::avx512f::assert__mm512_mask_cvtepi32_storeu_epi16_vpmovdw
    core_arch::x86::avx512f::assert__mm512_mask_cvtepi32_storeu_epi8_vpmovdb
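
The two kinds of panic in these logs suggest that the instruction test checks two things: that the expected mnemonic appears in the disassembly, and that the function stays under an instruction-count limit. The sketch below models only that observable behaviour. The function name, signature, and the first-token matching rule are assumptions made for illustration; it is not the `stdarch-test` implementation.

```rust
/// Hypothetical model of the checks implied by the panic messages:
/// the expected mnemonic must be present, and the instruction count
/// must stay below `limit`.
fn check_disassembly(instrs: &[&str], expected: &str, limit: usize) -> Result<(), String> {
    // Assumption: compare against the first whitespace-separated token of each line.
    let found = instrs
        .iter()
        .any(|line| line.split_whitespace().next() == Some(expected));
    if !found {
        return Err(format!(
            "failed to find instruction `{expected}` in the disassembly"
        ));
    }
    if instrs.len() >= limit {
        return Err(format!(
            "instruction found, but the disassembly contains too many instructions: \
             #instructions = {} >= {} (limit)",
            instrs.len(),
            limit
        ));
    }
    Ok(())
}
```

Under this model, the two `_mm512` failures are of the second kind: `vpmovdw` and `vpmovdb` are present, but the scalarized masked store pushes the function to 36 and 35 instructions against a limit of 22.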

I was also looking into a PR for the unsigned and signed saturating AVX-512F intrinsics, but 4 of the 10 tests failed for the 128-bit intrinsics alone. There seem to be gaps in LLVM's coverage of 64->32-bit conversions and 32->8/16-bit conversions.

Failed signed/unsigned saturation tests for _mm-sized (128-bit) intrinsics
failures:

---- core_arch::x86::avx512f::assert__mm_mask_cvtsepi32_storeu_epi8_vpmovsdb stdout ----
disassembly for stdarch_test_shim__mm_mask_cvtsepi32_storeu_epi8_vpmovsdb:
         0: vpackssdw xmm0 xmm0 xmm0
         1: vpacksswb xmm0 xmm0 xmm0
         2: test sil 1
         3: jne 0000000140188e81
         4: test sil 2
         5: jne 0000000140188e8d
         6: test sil 4
         7: jne 0000000140188e9a
         8: test sil 8
         9: jne 0000000140188ea7
        10: ret

thread 'core_arch::x86::avx512f::assert__mm_mask_cvtsepi32_storeu_epi8_vpmovsdb' (21112) panicked at crates\stdarch-test\src\lib.rs:204:9:
failed to find instruction `vpmovsdb` in the disassembly

---- core_arch::x86::avx512f::assert__mm_mask_cvtsepi32_storeu_epi16_vpmovsdw stdout ----
disassembly for stdarch_test_shim__mm_mask_cvtsepi32_storeu_epi16_vpmovsdw:
         0: vpackssdw xmm0 xmm0 xmm0
         1: test sil 1
         2: jne 0000000140188e2d
         3: test sil 2
         4: jne 0000000140188e39
         5: test sil 4
         6: jne 0000000140188e46
         7: test sil 8
         8: jne 0000000140188e53
         9: ret

thread 'core_arch::x86::avx512f::assert__mm_mask_cvtsepi32_storeu_epi16_vpmovsdw' (22040) panicked at crates\stdarch-test\src\lib.rs:204:9:
failed to find instruction `vpmovsdw` in the disassembly
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

---- core_arch::x86::avx512f::assert__mm_mask_cvtsepi64_storeu_epi32_vpmovsqd stdout ----
disassembly for stdarch_test_shim__mm_mask_cvtsepi64_storeu_epi32_vpmovsqd:
         0: kmovw k1 esi
         1: vpmaxsq xmm0 xmm0 qword bcst [__real@ffffffff80000000]
         2: vpminsq xmm0 xmm0 qword bcst [__real@000000007fffffff]
         3: vpmovqd mmword ptr [rdi]{k1} xmm0
         4: ret

thread 'core_arch::x86::avx512f::assert__mm_mask_cvtsepi64_storeu_epi32_vpmovsqd' (21136) panicked at crates\stdarch-test\src\lib.rs:204:9:
failed to find instruction `vpmovsqd` in the disassembly

---- core_arch::x86::avx512f::assert__mm_mask_cvtusepi64_storeu_epi32_vpmovusqd stdout ----
disassembly for stdarch_test_shim__mm_mask_cvtusepi64_storeu_epi32_vpmovusqd:
         0: kmovw k1 esi
         1: vpminuq xmm0 xmm0 qword bcst [__real@00000000ffffffff]
         2: vpmovqd mmword ptr [rdi]{k1} xmm0
         3: ret

thread 'core_arch::x86::avx512f::assert__mm_mask_cvtusepi64_storeu_epi32_vpmovusqd' (23764) panicked at crates\stdarch-test\src\lib.rs:204:9:
failed to find instruction `vpmovusqd` in the disassembly


failures:
    core_arch::x86::avx512f::assert__mm_mask_cvtsepi32_storeu_epi16_vpmovsdw
    core_arch::x86::avx512f::assert__mm_mask_cvtsepi32_storeu_epi8_vpmovsdb
    core_arch::x86::avx512f::assert__mm_mask_cvtsepi64_storeu_epi32_vpmovsqd
    core_arch::x86::avx512f::assert__mm_mask_cvtusepi64_storeu_epi32_vpmovusqd
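
In the two 64->32-bit cases above, the disassembly clamps each lane (`vpmaxsq`/`vpminsq`, or `vpminuq` for unsigned) and then does a plain truncating `vpmovqd` store, rather than emitting a single saturating `vpmovsqd`/`vpmovusqd`. The result should be the same and only the instruction selection differs. A minimal scalar sketch of that equivalence, with hypothetical function names:

```rust
/// Signed saturating narrowing written as clamp-then-truncate,
/// mirroring the `vpmaxsq` + `vpminsq` + `vpmovqd` sequence (scalar model).
fn saturate_i64_to_i32(x: i64) -> i32 {
    let clamped = x.max(i32::MIN as i64).min(i32::MAX as i64);
    // After clamping, truncation can no longer lose information.
    clamped as i32
}

/// Unsigned saturating narrowing, mirroring `vpminuq` + `vpmovqd` (scalar model).
fn saturate_u64_to_u32(x: u64) -> u32 {
    x.min(u32::MAX as u64) as u32
}
```

For contrast, a plain truncating cast such as `(1i64 << 40) as i32` yields `0`, whereas the saturating version yields `i32::MAX`.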

folkertdev commented

Neat. For the failures, can you make a godbolt that shows the problem? (As a template, I was just working on https://godbolt.org/z/5TPT84T4z.) I've had decent luck either reporting or fixing this sort of problem in LLVM.

okaneco commented Feb 18, 2026

Filed upstream - llvm/llvm-project#182034

Thanks, your template came in very handy. This is what I ended up with - https://rust.godbolt.org/z/8PE81Thds
In the LLVM IR output window, make sure to uncheck Filter > Filter Attribute Groups so that the emitted IR works in the LLVM tools.
