[LLVMdev] [Mesa3d-dev] Folding vector instructions
clattner at apple.com
Tue Dec 30 14:30:35 CST 2008
On Dec 30, 2008, at 6:39 AM, Corbin Simpson wrote:
>> However, the special instructions cannot directly be mapped to LLVM
>> IR, like
>> "min", the conversion involves in 'extract' the vector, create
>> less-than-compare, create 'select' instruction, and create 'insert-
Using scalar operations obviously works, but will probably produce
very inefficient code. One positive thing is that all target-specific
operations of supported vector ISAs (Altivec and SSE[1-4] currently)
are exposed either through LLVM IR ops or through target-specific
builtins/intrinsics. This means that you can get access to all the
crazy SSE instructions, but it also means that your codegen would have
to handle the target-specific code generation itself.
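To make the trade-off concrete, here is a minimal sketch (not taken from Mesa or LLVM) that contrasts the scalar extract/compare/select/insert sequence with a single target-specific operation. The names Vec4, min4_scalar and min4_target are hypothetical; the SSE path uses the standard _mm_min_ps intrinsic from <xmmintrin.h> and falls back to the scalar loop when the compiler does not define __SSE__.

```cpp
#include <array>
#if defined(__SSE__)
#include <xmmintrin.h>  // _mm_loadu_ps, _mm_min_ps, _mm_storeu_ps
#endif

using Vec4 = std::array<float, 4>;  // hypothetical 4-wide float vector

// Scalar path: per lane, extract both operands, compare, select, insert.
Vec4 min4_scalar(const Vec4& a, const Vec4& b) {
    Vec4 r{};
    for (int i = 0; i < 4; ++i) {
        float x = a[i];          // extract
        float y = b[i];          // extract
        r[i] = (x < y) ? x : y;  // compare + select, then insert
    }
    return r;
}

// Target-specific path: a single SSE min (MINPS) when SSE is available.
Vec4 min4_target(const Vec4& a, const Vec4& b) {
#if defined(__SSE__)
    __m128 va = _mm_loadu_ps(a.data());
    __m128 vb = _mm_loadu_ps(b.data());
    Vec4 r{};
    _mm_storeu_ps(r.data(), _mm_min_ps(va, vb));
    return r;
#else
    return min4_scalar(a, b);  // no SSE on this target: reuse the scalar path
#endif
}
```

Both functions give the same result for ordinary (non-NaN) inputs; the difference is that the second one ties the caller to a particular target.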
The direction we're going is to expose more and more vector operations
in LLVM IR. For example, compares and select are currently being
worked on, so you can do a comparison of two vectors which returns a
vector of bools, and use that as the compare value of a select
instruction (selecting between two vectors). This would allow
implementing min and a variety of other operations and is easier for
the codegen to reassemble into a first-class min operation etc.
I don't know what the status of this is; I think it is partially
implemented but may not be complete yet.
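As a rough illustration of the compare-plus-select idea, the sketch below models the semantics in plain C++. Vec4, Mask4, cmp_lt, vselect, vmin and vmax are hypothetical names chosen for this example, not LLVM APIs. The IR in the comment is how this is spelled in later LLVM releases, where fcmp and select accept vector operands.

```cpp
#include <array>

using Vec4  = std::array<float, 4>;  // hypothetical 4-wide float vector
using Mask4 = std::array<bool, 4>;   // one bool per lane (a <4 x i1> in IR terms)

// Element-wise "less than": returns a vector of bools.
Mask4 cmp_lt(const Vec4& a, const Vec4& b) {
    Mask4 m{};
    for (int i = 0; i < 4; ++i) m[i] = a[i] < b[i];
    return m;
}

// Element-wise select: lane i takes t[i] where m[i] is true, else f[i].
Vec4 vselect(const Mask4& m, const Vec4& t, const Vec4& f) {
    Vec4 r{};
    for (int i = 0; i < 4; ++i) r[i] = m[i] ? t[i] : f[i];
    return r;
}

// min as compare + select, roughly:
//   %m = fcmp olt <4 x float> %a, %b
//   %r = select <4 x i1> %m, <4 x float> %a, <4 x float> %b
Vec4 vmin(const Vec4& a, const Vec4& b) { return vselect(cmp_lt(a, b), a, b); }

// max reuses the same mask with the select operands swapped.
Vec4 vmax(const Vec4& a, const Vec4& b) { return vselect(cmp_lt(a, b), b, a); }
```

Because the pattern is just one compare feeding one select, a backend can recognize it and emit a native min or max instruction where the target has one.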
>> I don't have experience of the new vector instructions in LLVM, and
>> that's why it makes me feel it's complicated to fold the swizzle and
We have really good support for swizzling operations already with the
shufflevector instruction. I'm not sure about writemask.
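For readers unfamiliar with the vector shuffle instruction (shufflevector in the IR), here is a small C++ model of what it computes. The function names are hypothetical. The write_masked helper is only one possible way to express a writemask, as a two-input shuffle that picks each lane from either the old destination or the new value; it is an assumption of this sketch, not something the thread settles.

```cpp
#include <array>

using Vec4 = std::array<float, 4>;  // hypothetical 4-wide float vector
using Idx4 = std::array<int, 4>;    // shuffle mask: one source index per lane

// Swizzle: result lane i is v[idx[i]]; idx = {2, 1, 0, 3} gives v.zyxw.
// In IR, roughly:
//   shufflevector <4 x float> %v, <4 x float> undef,
//                 <4 x i32> <i32 2, i32 1, i32 0, i32 3>
Vec4 swizzle(const Vec4& v, const Idx4& idx) {
    Vec4 r{};
    for (int i = 0; i < 4; ++i) r[i] = v[idx[i]];
    return r;
}

// Two-input shuffle over the concatenation of a and b:
// indices 0..3 pick from a, indices 4..7 pick from b.
Vec4 shuffle2(const Vec4& a, const Vec4& b, const Idx4& idx) {
    Vec4 r{};
    for (int i = 0; i < 4; ++i) r[i] = (idx[i] < 4) ? a[idx[i]] : b[idx[i] - 4];
    return r;
}

// Writemask sketch: lane i is overwritten by src only if bit i of mask is set.
Vec4 write_masked(const Vec4& dst, const Vec4& src, unsigned mask) {
    Idx4 idx{};
    for (int i = 0; i < 4; ++i) idx[i] = ((mask >> i) & 1u) ? 4 + i : i;
    return shuffle2(dst, src, idx);
}
```

With this modelling, a swizzled and write-masked shader operand becomes two shuffles with constant masks.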
> Um, I was thinking that we should eventually create intrinsics
> for some of the commands, like LIT, that might not be
> single-instruction, but that can be lowered eventually, and for
> commands like LG2, that might be single-instruction for shaders, but
> probably not for non-shader chipsets.
Sure, it would be very reasonable to make these target-specific
builtins when targeting a GPU, the same way we have target-specific
builtins for SSE.
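To show what "lowering" a multi-instruction command could look like, here is a hedged sketch of LIT expanded into simpler operations. It assumes the ARB vertex program style definition of LIT (clamp x and y at zero, clamp the exponent in w to [-128, 128], ignoring the epsilon in the spec); lower_lit is a hypothetical name, and a real lowering would emit IR rather than compute values directly.

```cpp
#include <algorithm>
#include <array>
#include <cmath>

using Vec4 = std::array<float, 4>;  // hypothetical 4-wide float vector

// LIT expanded into max, clamp, compare, select and pow.
// Assumed semantics: result = (1, max(x, 0), x > 0 ? max(y, 0)^w : 0, 1).
Vec4 lower_lit(const Vec4& src) {
    float diffuse   = std::max(src[0], 0.0f);                         // clamp x at 0
    float spec_base = std::max(src[1], 0.0f);                         // clamp y at 0
    float exponent  = std::min(std::max(src[3], -128.0f), 128.0f);    // clamp w
    float specular  = (diffuse > 0.0f) ? std::pow(spec_base, exponent) : 0.0f;
    return {1.0f, diffuse, specular, 1.0f};
}
```

A target with a native LIT could map the builtin to one instruction, while other targets would use an expansion along these lines.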