Skip to content

The representation of vector shuffles needs reworking #9129

@abadams

Description

@abadams

Current a Shuffle node takes a vector of Exprs and a vector of ints, and concatenates the Exprs and slices out the corresponding ints. This has three main problems:

  • It cannot represent dynamic shuffles (computed indices)
  • All operations and storage is O(lanes), which is a problem for larger and larger vectors. Simple things like is_transpose have to check every lane, and there could be up to 2^16-1 of them.
  • It's so general that operations on it get very complex, because there are no invariants baked in. All code that touches a shuffle typically has to verify a bunch of things (e.g. is there a single Expr?)

I propose we break the Shuffle node up into a collection of nodes, one per shuffle type: concat, transpose, slice, and a general one which takes a single Expr for the vector and a single Expr for the indices into that vector. Note that none of these are O(lanes), and the dynamic case is handled by the general shuffle. Only concat takes multiple Expr args (either just two and you chain them, or a vector of them). An interleave is just a transpose of a concat. An LLVM shufflevector instruction would then be a concat followed by our general shuffle where the indices happen to be constants (e.g. a concat of ints).

The general one is equivalent to spilling the value to the stack and then loading from that memory at the given indices. That could be the baseline lowering of it for architectures without, e.g. tbl, vlut, pshufb instructions.

We could also just have concat and the general one, where transpose and slice are a 2D multiramp index and a constant ramp index respectively. Maybe that's better?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions