MLIR n-D vector types are currently represented as (n-1)-D arrays of 1-D vectors when lowered to LLVM.

The implication of the physical HW constraints on the programming model is that one cannot index dynamically across hardware registers: a register file can generally not be indexed dynamically. This is because the register number is fixed, and one either needs to unroll explicitly to obtain fixed register numbers or go through memory. This is a constraint familiar to CUDA programmers: declaring a private float a[4]; and subsequently indexing it with a dynamic value results in so-called local memory usage (i.e. roundtripping to memory).
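The CUDA behavior described above can be sketched as follows (a hypothetical kernel for illustration; names are not from the original text):

```cuda
__global__ void f(float *out, int i) {
  // A small private array normally maps to registers.
  float a[4] = {0.f, 1.f, 2.f, 3.f};
  // Indexing with a dynamic value prevents register allocation:
  // the array is placed in "local memory" and every access
  // roundtrips to memory.
  out[threadIdx.x] = a[i % 4];
}
```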

Implication on codegen

This raises the concerns on static vs dynamic indexing discussed previously: extractelement, insertelement and shufflevector on n-D vectors in MLIR only support static indices. Dynamic indices are only supported on the most minor 1-D vector but not the outer (n-1)-D. For such cases, explicit load / stores are required.

The implications on the MLIR n-D vector type are as follows:
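This asymmetry can be sketched with current vector dialect operations (op syntax is illustrative and may differ across MLIR versions; vector.extract takes static positions, while vector.extractelement accepts a dynamic index only on a 1-D vector):

```mlir
// Static indices into an n-D vector are supported.
%a = vector.extract %v[3, 7] : vector<4x8xf32>

// A dynamic index is only allowed on the most minor 1-D vector:
// first extract the row statically, then index it dynamically.
%row = vector.extract %v[3] : vector<4x8xf32>
%b = vector.extractelement %row[%i : index] : vector<8xf32>
```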

  1. Loops around n-D vector values are indirect addressing of vector values; they must operate on explicit load / store operations over n-D vector types.
  2. Once an n-D vector type is loaded into an SSA value (which may or may not live in n registers, with or without spilling, when eventually lowered), it may be unrolled to smaller k-D vector types and operations that correspond to the HW. This level of MLIR codegen is related to register allocation and spilling, which occur much later in the LLVM pipeline.
  3. HW may support >1-D vectors with intrinsics for indirect addressing within these vectors. These can be targeted via explicit vector_cast operations from MLIR k-D vector types and operations to LLVM 1-D vectors + intrinsics.
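Steps 1 and 2 can be sketched as follows (a hypothetical example with illustrative op syntax: a memref of vector element type for the explicit load, then unrolling to HW-sized 1-D pieces by extraction):

```mlir
// 1. Loops over vector values go through explicit loads/stores.
%v = memref.load %buf[%i] : memref<128xvector<4x8xf32>>

// 2. The n-D SSA value may be unrolled into 1-D pieces that
//    match the HW, e.g. one 8-wide row at a time.
%r0 = vector.extract %v[0] : vector<4x8xf32>
%s0 = arith.mulf %r0, %r0 : vector<8xf32>
```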

Conversely, we believe that directly lowering to a linearized abstraction hides away the codegen complexities related to memory accesses by giving a false impression of magical dynamic indexing across registers. Instead we prefer to make those very explicit in MLIR and allow codegen to explore tradeoffs. Different HW will require different tradeoffs in the sizes involved in steps 1., 2. and 3.

Decisions made at the MLIR level will have implications at a much later stage in LLVM (after register allocation). We do not envision exposing concerns related to modeling of register allocation and spilling to MLIR explicitly. Instead, each target will expose a set of "good" target operations and n-D vector types, associated with costs that PatternRewriters at the MLIR level will be able to target. Such costs at the MLIR level will be abstract and used for ranking, not for accurate performance modeling. In the future, such costs will be learned.

Implication on Lowering to Accelerators

To target accelerators that support higher dimensional vectors natively, we can start from either 1-D or n-D vectors in MLIR and use vector.cast to flatten the most minor dimensions to 1-D vector<Kxf32>, where K is an appropriate constant. Then, the existing lowering to LLVM-IR immediately applies, with extensions for accelerator-specific intrinsics.
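For example, using the vector.cast notation from this document (shapes are illustrative; here the two most minor dimensions are flattened to K = 16):

```mlir
// Flatten the two most minor dimensions: 4 * 4 = 16.
%1 = vector.cast %0 : vector<8x4x4xf32> to vector<8x16xf32>
```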

It is the role of an Accelerator-specific vector dialect (see codegen flow in the figure above) to lower the vector.cast. Accelerator -> LLVM lowering would then consist of a bunch of Accelerator -> Accelerator rewrites to perform the casts, composed with Accelerator -> LLVM conversions + intrinsics that operate on 1-D vector<Kxf32>.

Some of those rewrites may need extra handling, especially if a reduction is involved. For example, vector.cast %0: vector<K1x...xKnxf32> to vector<Kxf32> when K != K1 * … * Kn, and some arbitrary irregular vector.cast %0: vector<4x4x17xf32> to vector<Kxf32>, may introduce masking and intra-vector shuffling that may not be worthwhile or even feasible, i.e. infinite cost.

However, vector.cast %0: vector<K1x...xKnxf32> to vector<Kxf32> when K = K1 * … * Kn should be close to a noop.
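Concretely, with illustrative shapes and K = 16:

```mlir
// Close to a noop: 16 = 4 * 4, the rows concatenate contiguously.
%1 = vector.cast %a : vector<4x4xf32> to vector<16xf32>

// Potentially very costly: 17 does not match the 16-lane target,
// so masking and intra-vector shuffling would be required.
%2 = vector.cast %b : vector<4x4x17xf32> to vector<16xf32>
```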