Shuffle scalable vector in CodeGen_ARM by stevesuzuki-arm · Pull Request #8898 · halide/Halide

stevesuzuki-arm · 2025-12-11T19:27:47Z

By design, LLVM shufflevector cannot express arbitrary index patterns on scalable vectors.
So, we use the llvm.vector.* intrinsics where possible.
However, they are not enough to cover the wide range of shuffles used in Halide.
To handle arbitrary index patterns, we decompose a shuffle operation
into a sequence of native shuffles, which are lowered to
the Arm SVE2 TBL or TBL2 intrinsics.
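The decomposition described above can be illustrated with a minimal sketch in plain C++. It assumes every native vector holds a fixed W lanes (in real SVE code W is only known at run time) and only emulates the lookup semantics: SVE TBL writes zero to any lane whose index is out of range. The names tbl, tbl2 and decomposed_shuffle are hypothetical and are not the API of Halide's DecomposeVectorShuffle. Grouping the referenced source vectors two at a time and merging the partial results lane by lane is an assumption made for illustration, not necessarily how the PR combines them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

using Vec = std::vector<int32_t>;
using Indices = std::vector<int>;

// Emulates SVE TBL on one table register: lanes whose index is out of
// range (negative or >= table size) are set to zero.
Vec tbl(const Vec &table, const Indices &idx) {
    Vec out(idx.size(), 0);
    for (size_t i = 0; i < idx.size(); i++) {
        if (idx[i] >= 0 && idx[i] < static_cast<int>(table.size())) {
            out[i] = table[idx[i]];
        }
    }
    return out;
}

// Emulates the two-register table form ("TBL2"): indices 0..W-1 read from a,
// W..2W-1 read from b, anything else produces zero.
Vec tbl2(const Vec &a, const Vec &b, const Indices &idx) {
    assert(a.size() == b.size());
    Vec table(a);
    table.insert(table.end(), b.begin(), b.end());
    return tbl(table, idx);
}

// Hypothetical helper: shuffles the concatenation of `src` (native vectors of
// W lanes each) by an arbitrary index list. Each W-lane output chunk is built
// from TBL/TBL2 lookups over only the source vectors it references, two at a
// time; partial results are merged lane by lane. Negative indices are
// "don't care" lanes and come out as zero.
Vec decomposed_shuffle(const std::vector<Vec> &src, const Indices &indices, int W) {
    assert(W > 0);
    Vec result;
    for (size_t base = 0; base < indices.size(); base += W) {
        Indices chunk(W, -1);   // index list for this output chunk
        std::set<int> used;     // source vectors this chunk reads from
        for (int i = 0; i < W && base + i < indices.size(); i++) {
            chunk[i] = indices[base + i];
            if (chunk[i] >= 0) {
                assert(chunk[i] / W < static_cast<int>(src.size()));
                used.insert(chunk[i] / W);
            }
        }
        std::vector<int> srcs(used.begin(), used.end());
        Vec out(W, 0);
        for (size_t s = 0; s < srcs.size(); s += 2) {
            int a = srcs[s];
            int b = (s + 1 < srcs.size()) ? srcs[s + 1] : -1;
            Indices local(W, 2 * W);            // out of range for TBL and TBL2
            std::vector<bool> mine(W, false);   // lanes served by this lookup
            for (int i = 0; i < W; i++) {
                if (chunk[i] < 0) continue;
                int which = chunk[i] / W, lane = chunk[i] % W;
                if (which == a) { local[i] = lane; mine[i] = true; }
                else if (which == b) { local[i] = W + lane; mine[i] = true; }
            }
            Vec part = (b < 0) ? tbl(src[a], local) : tbl2(src[a], src[b], local);
            for (int i = 0; i < W; i++) {
                if (mine[i]) out[i] = part[i];
            }
        }
        result.insert(result.end(), out.begin(), out.end());
    }
    result.resize(indices.size());
    return result;
}
```

For example, with W = 4 and three source vectors, the output chunk {8, 0, 5, 11} references all three sources, so this sketch issues one TBL2 over the first two sources and one TBL over the third.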

Another approach would be to perform the shuffle on fixed-size vectors
by converting between scalable and fixed vectors.
However, that conversion seems to be possible only through a store/load round trip via memory,
which would presumably perform poorly.

This change also includes:

- Peephole a particular predicate pattern to emit the WHILELT instruction
- Shuffle 1-bit scalable vectors as 8-bit vectors, with type casts
- Peephole concat_vectors used to pad a vector up to an aligned length
- Fix a redundant broadcast in CodeGen_LLVM

stevesuzuki-arm · 2025-12-11T19:41:31Z

stevesuzuki-arm · 2025-12-11T20:10:25Z

The CI test failure below is a known issue, which should be fixed by #8888.

st2w_int32_x8                   (arm-64-linux-no_neon-sve2-vector_bits_256)
StartAssertion failed: (!isScalable() || isZero()) && "Request for a fixed element count on a scalable object", file C:\build_bot\worker\llvm-main-x86-32-windows\llvm-project\llvm\include\llvm/Support/TypeSize.h, line 202

I will rebase once #8888 is merged.

The workaround of checking wide_enough in get_vector_type() caused FixedVector and ScalableVector types to be mixed when generating an intrinsic instruction in SVE2 codegen. With this change, we select a scalable vector in most cases. Note that the workaround for the vscale > 1 case will be addressed in a separate commit.

alexreinking requested a review from halidebuildbots December 11, 2025 19:35

stevesuzuki-arm added 6 commits December 15, 2025 10:50

Add helpers for shuffle operations of scalable vector

1fd7c1b

Move helpers for shuffle scalable vectors to CodeGen_ARM

9d8fe11

Theoretically, these helpers are common to LLVM rather than ARM-specific, but for now we keep them ARM-only to avoid affecting other targets.

Add DecomposeVectorShuffle to Makefile

a7bc84b

Improve performance of vector broadcast in SVE2

9c9e621

Modify the SVE2 codegen for vector broadcast to emit the Arm TBL intrinsic instead of llvm.vector.insert. This fixes the performance test failure in nested_vectorization_gemm.

stevesuzuki-arm force-pushed the pr-shuffle_sve2 branch from 4a40326 to 9c9e621 on December 15, 2025 11:30

stevesuzuki-arm wants to merge 6 commits into halide:main from stevesuzuki-arm:pr-shuffle_sve2