From evan.cheng at apple.com  Mon Apr 24 00:37:47 2006
From: evan.cheng at apple.com (Evan Cheng)
Date: Mon, 24 Apr 2006 00:37:47 -0500
Subject: [llvm-commits] CVS: llvm/lib/Target/X86/X86RegisterInfo.cpp
Message-ID: <200604240537.AAA11890@zion.cs.uiuc.edu>

Changes in directory llvm/lib/Target/X86:

X86RegisterInfo.cpp updated: 1.143 -> 1.144
---
Log message:

MakeMIInst() should handle jump table index operands.

---
Diffs of the changes:  (+3 -0)

 X86RegisterInfo.cpp |    3 +++
 1 files changed, 3 insertions(+)

Index: llvm/lib/Target/X86/X86RegisterInfo.cpp
diff -u llvm/lib/Target/X86/X86RegisterInfo.cpp:1.143 llvm/lib/Target/X86/X86RegisterInfo.cpp:1.144
--- llvm/lib/Target/X86/X86RegisterInfo.cpp:1.143	Tue Apr 18 16:59:43 2006
+++ llvm/lib/Target/X86/X86RegisterInfo.cpp	Mon Apr 24 00:37:35 2006
@@ -151,6 +151,9 @@
     return addFrameReference(BuildMI(Opcode, 5), FrameIndex)
       .addGlobalAddress(MI->getOperand(1).getGlobal(),
                         false, MI->getOperand(1).getOffset());
+  else if (MI->getOperand(1).isJumpTableIndex())
+    return addFrameReference(BuildMI(Opcode, 5), FrameIndex)
+      .addJumpTableIndex(MI->getOperand(1).getJumpTableIndex());
   assert(0 && "Unknown operand for MakeMI!");
   return 0;
 }

From evan.cheng at apple.com  Mon Apr 24 01:42:27 2006
From: evan.cheng at apple.com (Evan Cheng)
Date: Mon, 24 Apr 2006 01:42:27 -0500
Subject: [llvm-commits] CVS: llvm/include/llvm/CodeGen/MachineInstrBuilder.h
Message-ID: <200604240642.BAA12479@zion.cs.uiuc.edu>

Changes in directory llvm/include/llvm/CodeGen:

MachineInstrBuilder.h updated: 1.29 -> 1.30
---
Log message:

Added addJumpTableIndex

---
Diffs of the changes:  (+5 -0)

 MachineInstrBuilder.h |    5 +++++
 1 files changed, 5 insertions(+)

Index: llvm/include/llvm/CodeGen/MachineInstrBuilder.h
diff -u llvm/include/llvm/CodeGen/MachineInstrBuilder.h:1.29 llvm/include/llvm/CodeGen/MachineInstrBuilder.h:1.30
--- llvm/include/llvm/CodeGen/MachineInstrBuilder.h:1.29	Sat Feb 25 03:52:55 2006
+++ llvm/include/llvm/CodeGen/MachineInstrBuilder.h	Mon Apr 24 01:42:15 2006
@@ -131,6 +131,11 @@
     return *this;
   }
 
+  const MachineInstrBuilder &addJumpTableIndex(unsigned Idx) const {
+    MI->addJumpTableIndexOperand(Idx);
+    return *this;
+  }
+
   const MachineInstrBuilder &addGlobalAddress(GlobalValue *GV,
                                               bool isPCRelative = false,
                                               int Offset = 0) const {

From lattner at cs.uiuc.edu  Mon Apr 24 11:35:09 2006
From: lattner at cs.uiuc.edu (Chris Lattner)
Date: Mon, 24 Apr 2006 11:35:09 -0500
Subject: [llvm-commits] CVS: llvm/docs/WritingAnLLVMBackend.html
Message-ID: <200604241635.LAA09166@zion.cs.uiuc.edu>

Changes in directory llvm/docs:

WritingAnLLVMBackend.html updated: 1.10 -> 1.11
---
Log message:

Suggest looking at the SPARC backend. How weird is that?? :)

---
Diffs of the changes:  (+4 -3)

 WritingAnLLVMBackend.html |    7 ++++---
 1 files changed, 4 insertions(+), 3 deletions(-)

Index: llvm/docs/WritingAnLLVMBackend.html
diff -u llvm/docs/WritingAnLLVMBackend.html:1.10 llvm/docs/WritingAnLLVMBackend.html:1.11
--- llvm/docs/WritingAnLLVMBackend.html:1.10	Mon Mar 13 23:39:39 2006
+++ llvm/docs/WritingAnLLVMBackend.html	Mon Apr 24 11:34:45 2006
@@ -61,8 +61,9 @@
-In general, you want to follow the format of X86 or PowerPC (in
-lib/Target).
+In general, you want to follow the format of SPARC, X86 or PowerPC (in
+lib/Target). SPARC is the simplest backend, and is RISC, so if
+you're working on a RISC target, it is a good one to start with.

To create a static compiler (one that emits text assembly), you need to implement the following:

@@ -252,7 +253,7 @@
   Misha Brukman
   The LLVM Compiler Infrastructure
-  Last modified: $Date: 2006/03/14 05:39:39 $
+  Last modified: $Date: 2006/04/24 16:34:45 $

From evan.cheng at apple.com  Mon Apr 24 12:38:28 2006
From: evan.cheng at apple.com (Evan Cheng)
Date: Mon, 24 Apr 2006 12:38:28 -0500
Subject: [llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Message-ID: <200604241738.MAA13465@zion.cs.uiuc.edu>

Changes in directory llvm/lib/Target/X86:

README.txt updated: 1.96 -> 1.97
---
Log message:

Remove a completed entry.

---
Diffs of the changes:  (+0 -55)

 README.txt |   55 -------------------------------------------------------
 1 files changed, 55 deletions(-)

Index: llvm/lib/Target/X86/README.txt
diff -u llvm/lib/Target/X86/README.txt:1.96 llvm/lib/Target/X86/README.txt:1.97
--- llvm/lib/Target/X86/README.txt:1.96	Sun Apr 23 14:47:09 2006
+++ llvm/lib/Target/X86/README.txt	Mon Apr 24 12:38:16 2006
@@ -999,61 +999,6 @@
 
 //===---------------------------------------------------------------------===//
 
-Use the 0's in the top part of movss from memory (and from other instructions
-that generate them) to build vectors more efficiently. Consider:
-
-vector float test(float a) {
-  return (vector float){ 0.0, a, 0.0, 0.0};
-}
-
-We currently generate this as:
-
-_test:
-        sub %ESP, 28
-        movss %XMM0, DWORD PTR [%ESP + 32]
-        movss DWORD PTR [%ESP + 4], %XMM0
-        mov DWORD PTR [%ESP + 12], 0
-        mov DWORD PTR [%ESP + 8], 0
-        mov DWORD PTR [%ESP], 0
-        movaps %XMM0, XMMWORD PTR [%ESP]
-        add %ESP, 28
-        ret
-
-Something like this should be sufficient:
-
-_test:
-        movss %XMM0, DWORD PTR [%ESP + 4]
-        shufps %XMM0, %XMM0, 81
-        ret
-
-... which takes advantage of the zero elements provided by movss.
-Even xoring a register and shufps'ing IT would be better than the
-above code.
-
-Likewise, for this:
-
-vector float test(float a, float b) {
-  return (vector float){ b, a, 0.0, 0.0};
-}
-
-_test:
-        pxor %XMM0, %XMM0
-        movss %XMM1, %XMM0
-        movss %XMM2, DWORD PTR [%ESP + 4]
-        unpcklps %XMM2, %XMM1
-        movss %XMM0, DWORD PTR [%ESP + 8]
-        unpcklps %XMM0, %XMM1
-        unpcklps %XMM0, %XMM2
-        ret
-
-... where we do use pxor, it would be better to use the zero'd
-elements that movss provides to turn this into 2 shufps's instead
-of 3 unpcklps's.
-
-Another example: {0.0, 0.0, a, b }
-
-//===---------------------------------------------------------------------===//
-
 Consider:
 
 __m128 test(float a) {

From evan.cheng at apple.com  Mon Apr 24 13:01:57 2006
From: evan.cheng at apple.com (Evan Cheng)
Date: Mon, 24 Apr 2006 13:01:57 -0500
Subject: [llvm-commits] CVS: llvm/lib/Target/X86/X86ISelLowering.cpp
Message-ID: <200604241801.NAA16389@zion.cs.uiuc.edu>

Changes in directory llvm/lib/Target/X86:

X86ISelLowering.cpp updated: 1.184 -> 1.185
---
Log message:

A little bit more build_vector enhancement for v8i16 cases.

---
Diffs of the changes:  (+105 -42)

 X86ISelLowering.cpp |  147 +++++++++++++++++++++++++++++++++++++---------------
 1 files changed, 105 insertions(+), 42 deletions(-)

Index: llvm/lib/Target/X86/X86ISelLowering.cpp
diff -u llvm/lib/Target/X86/X86ISelLowering.cpp:1.184 llvm/lib/Target/X86/X86ISelLowering.cpp:1.185
--- llvm/lib/Target/X86/X86ISelLowering.cpp:1.184	Sun Apr 23 01:35:19 2006
+++ llvm/lib/Target/X86/X86ISelLowering.cpp	Mon Apr 24 13:01:45 2006
@@ -2154,6 +2154,78 @@
   return DAG.getNode(ISD::VECTOR_SHUFFLE, VT, V1, V2, Mask);
 }
 
+/// LowerBuildVectorv16i8 - Custom lower build_vector of v16i8.
+///
+static SDOperand LowerBuildVectorv16i8(SDOperand Op, unsigned NonZeros,
+                                       unsigned NumNonZero, unsigned NumZero,
+                                       SelectionDAG &DAG) {
+  if (NumNonZero > 8)
+    return SDOperand();
+
+  SDOperand V(0, 0);
+  bool First = true;
+  for (unsigned i = 0; i < 16; ++i) {
+    bool ThisIsNonZero = (NonZeros & (1 << i)) != 0;
+    if (ThisIsNonZero && First) {
+      if (NumZero)
+        V = getZeroVector(MVT::v8i16, DAG);
+      else
+        V = DAG.getNode(ISD::UNDEF, MVT::v8i16);
+      First = false;
+    }
+
+    if ((i & 1) != 0) {
+      SDOperand ThisElt(0, 0), LastElt(0, 0);
+      bool LastIsNonZero = (NonZeros & (1 << (i-1))) != 0;
+      if (LastIsNonZero) {
+        LastElt = DAG.getNode(ISD::ZERO_EXTEND, MVT::i16, Op.getOperand(i-1));
+      }
+      if (ThisIsNonZero) {
+        ThisElt = DAG.getNode(ISD::ZERO_EXTEND, MVT::i16, Op.getOperand(i));
+        ThisElt = DAG.getNode(ISD::SHL, MVT::i16,
+                              ThisElt, DAG.getConstant(8, MVT::i8));
+        if (LastIsNonZero)
+          ThisElt = DAG.getNode(ISD::OR, MVT::i16, ThisElt, LastElt);
+      } else
+        ThisElt = LastElt;
+
+      if (ThisElt.Val)
+        V = DAG.getNode(ISD::INSERT_VECTOR_ELT, MVT::v8i16, V, ThisElt,
+                        DAG.getConstant(i/2, MVT::i32));
+    }
+  }
+
+  return DAG.getNode(ISD::BIT_CONVERT, MVT::v16i8, V);
+}
+
+/// LowerBuildVectorv16i8 - Custom lower build_vector of v8i16.
+///
+static SDOperand LowerBuildVectorv8i16(SDOperand Op, unsigned NonZeros,
+                                       unsigned NumNonZero, unsigned NumZero,
+                                       SelectionDAG &DAG) {
+  if (NumNonZero > 4)
+    return SDOperand();
+
+  SDOperand V(0, 0);
+  bool First = true;
+  for (unsigned i = 0; i < 8; ++i) {
+    bool isNonZero = (NonZeros & (1 << i)) != 0;
+    if (isNonZero) {
+      if (First) {
+        if (NumZero)
+          V = getZeroVector(MVT::v8i16, DAG);
+        else
+          V = DAG.getNode(ISD::UNDEF, MVT::v8i16);
+        First = false;
+      }
+      V = DAG.getNode(ISD::INSERT_VECTOR_ELT, MVT::v8i16, V, Op.getOperand(i),
+                      DAG.getConstant(i, MVT::i32));
+    }
+  }
+
+  return V;
+}
+
 /// LowerOperation - Provide custom lowering hooks for some operations.
 ///
 SDOperand X86TargetLowering::LowerOperation(SDOperand Op, SelectionDAG &DAG) {
@@ -3152,38 +3224,49 @@
     return SDOperand();
   }
   case ISD::BUILD_VECTOR: {
+    // All zero's are handled with pxor.
+    if (ISD::isBuildVectorAllZeros(Op.Val))
+      return Op;
+
     // All one's are handled with pcmpeqd.
     if (ISD::isBuildVectorAllOnes(Op.Val))
       return Op;
 
-    unsigned NumElems = Op.getNumOperands();
-    if (NumElems == 2)
-      return SDOperand();
-
-    unsigned Half = NumElems/2;
     MVT::ValueType VT = Op.getValueType();
     MVT::ValueType EVT = MVT::getVectorBaseType(VT);
+    unsigned EVTBits = MVT::getSizeInBits(EVT);
+
+    // Let legalizer expand 2-widde build_vector's.
+    if (EVTBits == 64)
+      return SDOperand();
+
+    unsigned NumElems = Op.getNumOperands();
     unsigned NumZero = 0;
+    unsigned NumNonZero = 0;
     unsigned NonZeros = 0;
     std::set<SDOperand> Values;
     for (unsigned i = 0; i < NumElems; ++i) {
       SDOperand Elt = Op.getOperand(i);
-      Values.insert(Elt);
-      if (isZeroNode(Elt))
-        NumZero++;
-      else if (Elt.getOpcode() != ISD::UNDEF)
-        NonZeros |= (1 << i);
+      if (Elt.getOpcode() != ISD::UNDEF) {
+        Values.insert(Elt);
+        if (isZeroNode(Elt))
+          NumZero++;
+        else {
+          NonZeros |= (1 << i);
+          NumNonZero++;
+        }
+      }
     }
 
-    unsigned NumNonZero = CountPopulation_32(NonZeros);
     if (NumNonZero == 0)
-      return Op;
+      // Must be a mix of zero and undef. Return a zero vector.
+      return getZeroVector(VT, DAG);
 
     // Splat is obviously ok. Let legalizer expand it to a shuffle.
     if (Values.size() == 1)
       return SDOperand();
 
-    // If element VT is >= 32 bits, turn it into a number of shuffles.
+    // Special case for single non-zero element.
     if (NumNonZero == 1) {
       unsigned Idx = CountTrailingZeros_32(NonZeros);
       SDOperand Item = Op.getOperand(Idx);
@@ -3193,7 +3276,7 @@
         return getShuffleVectorZeroOrUndef(Item, VT, NumElems, Idx,
                                            NumZero > 0, DAG);
 
-      if (MVT::getSizeInBits(EVT) >= 32) {
+      if (EVTBits == 32) {
         // Turn it into a shuffle of zero and zero-extended scalar to vector.
         Item = getShuffleVectorZeroOrUndef(Item, VT, NumElems, 0,
                                            NumZero > 0, DAG);
@@ -3209,37 +3292,17 @@
     }
 
     // If element VT is < 32 bits, convert it to inserts into a zero vector.
-    if (MVT::getSizeInBits(EVT) <= 16) {
-      if (NumNonZero <= Half) {
-        SDOperand V(0, 0);
-
-        for (unsigned i = 0; i < NumNonZero; ++i) {
-          unsigned Idx = CountTrailingZeros_32(NonZeros);
-          NonZeros ^= (1 << Idx);
-          SDOperand Item = Op.getOperand(Idx);
-          if (i == 0) {
-            if (NumZero)
-              V = getZeroVector(MVT::v8i16, DAG);
-            else
-              V = DAG.getNode(ISD::UNDEF, MVT::v8i16);
-          }
-          if (EVT == MVT::i8) {
-            Item = DAG.getNode(ISD::ANY_EXTEND, MVT::i16, Item);
-            if ((Idx % 2) != 0)
-              Item = DAG.getNode(ISD::SHL, MVT::i16,
-                                 Item, DAG.getConstant(8, MVT::i8));
-            Idx /= 2;
-          }
-          V = DAG.getNode(ISD::INSERT_VECTOR_ELT, MVT::v8i16, V, Item,
-                          DAG.getConstant(Idx, MVT::i32));
-        }
+    if (EVTBits == 8) {
+      SDOperand V = LowerBuildVectorv16i8(Op, NonZeros,NumNonZero,NumZero, DAG);
+      if (V.Val) return V;
+    }
 
-        if (EVT == MVT::i8)
-          V = DAG.getNode(ISD::BIT_CONVERT, VT, V);
-        return V;
-      }
+    if (EVTBits == 16) {
+      SDOperand V = LowerBuildVectorv8i16(Op, NonZeros,NumNonZero,NumZero, DAG);
+      if (V.Val) return V;
     }
 
+    // If element VT is == 32 bits, turn it into a number of shuffles.
     std::vector<SDOperand> V(NumElems);
     if (NumElems == 4 && NumZero > 0) {
       for (unsigned i = 0; i < 4; ++i) {

From evan.cheng at apple.com  Mon Apr 24 16:58:32 2006
From: evan.cheng at apple.com (Evan Cheng)
Date: Mon, 24 Apr 2006 16:58:32 -0500
Subject: [llvm-commits] CVS: llvm/lib/Target/X86/X86InstrSSE.td
Message-ID: <200604242158.QAA20267@zion.cs.uiuc.edu>

Changes in directory llvm/lib/Target/X86:

X86InstrSSE.td updated: 1.110 -> 1.111
---
Log message:

Some missing movlps, movhps, movlpd, and movhpd patterns.

---
Diffs of the changes:  (+14 -6)

 X86InstrSSE.td |   20 ++++++++++++++------
 1 files changed, 14 insertions(+), 6 deletions(-)

Index: llvm/lib/Target/X86/X86InstrSSE.td
diff -u llvm/lib/Target/X86/X86InstrSSE.td:1.110 llvm/lib/Target/X86/X86InstrSSE.td:1.111
--- llvm/lib/Target/X86/X86InstrSSE.td:1.110	Thu Apr 20 20:05:10 2006
+++ llvm/lib/Target/X86/X86InstrSSE.td	Mon Apr 24 16:58:20 2006
@@ -2462,18 +2462,26 @@
                   MOVHP_shuffle_mask)),
           (MOVHPDrm VR128:$src1, addr:$src2)>, Requires<[HasSSE2]>;
-def : Pat<(v4i32 (vector_shuffle VR128:$src1, VR128:$src2,
-                  MOVL_shuffle_mask)),
-          (MOVLPSrr VR128:$src1, VR128:$src2)>, Requires<[HasSSE2]>;
 def : Pat<(v4i32 (vector_shuffle VR128:$src1, (bc_v4i32 (loadv2i64 addr:$src2)),
                   MOVLP_shuffle_mask)),
           (MOVLPSrm VR128:$src1, addr:$src2)>, Requires<[HasSSE2]>;
+def : Pat<(v2i64 (vector_shuffle VR128:$src1, (loadv2i64 addr:$src2),
+                  MOVLP_shuffle_mask)),
+          (MOVLPDrm VR128:$src1, addr:$src2)>, Requires<[HasSSE2]>;
+def : Pat<(v4i32 (vector_shuffle VR128:$src1, (bc_v4i32 (loadv2i64 addr:$src2)),
+                  MOVHP_shuffle_mask)),
+          (MOVHPSrm VR128:$src1, addr:$src2)>, Requires<[HasSSE1]>;
+def : Pat<(v2i64 (vector_shuffle VR128:$src1, (loadv2i64 addr:$src2),
+                  MOVLP_shuffle_mask)),
+          (MOVLPDrm VR128:$src1, addr:$src2)>, Requires<[HasSSE2]>;
+
+// Setting the lowest element in the vector.
+def : Pat<(v4i32 (vector_shuffle VR128:$src1, VR128:$src2,
+                  MOVL_shuffle_mask)),
+          (MOVLPSrr VR128:$src1, VR128:$src2)>, Requires<[HasSSE2]>;
 def : Pat<(v2i64 (vector_shuffle VR128:$src1, VR128:$src2,
                   MOVL_shuffle_mask)),
           (MOVLPDrr VR128:$src1, VR128:$src2)>, Requires<[HasSSE2]>;
-def : Pat<(v2i64 (vector_shuffle VR128:$src1, (loadv2i64 addr:$src2),
-                  MOVHP_shuffle_mask)),
-          (MOVHPDrm VR128:$src1, addr:$src2)>, Requires<[HasSSE2]>;
 }
 
 // 128-bit logical shifts

From evan.cheng at apple.com  Mon Apr 24 17:59:04 2006
From: evan.cheng at apple.com (Evan Cheng)
Date: Mon, 24 Apr 2006 17:59:04 -0500
Subject: [llvm-commits] CVS: llvm/lib/Target/X86/X86ISelLowering.cpp
Message-ID: <200604242259.RAA25870@zion.cs.uiuc.edu>

Changes in directory llvm/lib/Target/X86:

X86ISelLowering.cpp updated: 1.185 -> 1.186
---
Log message:

Special case handling two wide build_vector(0, x).

---
Diffs of the changes:  (+4 -4)

 X86ISelLowering.cpp |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

Index: llvm/lib/Target/X86/X86ISelLowering.cpp
diff -u llvm/lib/Target/X86/X86ISelLowering.cpp:1.185 llvm/lib/Target/X86/X86ISelLowering.cpp:1.186
--- llvm/lib/Target/X86/X86ISelLowering.cpp:1.185	Mon Apr 24 13:01:45 2006
+++ llvm/lib/Target/X86/X86ISelLowering.cpp	Mon Apr 24 17:58:52 2006
@@ -3236,10 +3236,6 @@
     MVT::ValueType EVT = MVT::getVectorBaseType(VT);
     unsigned EVTBits = MVT::getSizeInBits(EVT);
 
-    // Let legalizer expand 2-widde build_vector's.
-    if (EVTBits == 64)
-      return SDOperand();
-
     unsigned NumElems = Op.getNumOperands();
     unsigned NumZero = 0;
     unsigned NumNonZero = 0;
@@ -3291,6 +3287,10 @@
       }
     }
 
+    // Let legalizer expand 2-widde build_vector's.
+    if (EVTBits == 64)
+      return SDOperand();
+
     // If element VT is < 32 bits, convert it to inserts into a zero vector.
     if (EVTBits == 8) {
       SDOperand V = LowerBuildVectorv16i8(Op, NonZeros,NumNonZero,NumZero, DAG);

From evan.cheng at apple.com  Mon Apr 24 18:03:34 2006
From: evan.cheng at apple.com (Evan Cheng)
Date: Mon, 24 Apr 2006 18:03:34 -0500
Subject: [llvm-commits] CVS: llvm/test/Regression/CodeGen/X86/vec_set-7.ll
Message-ID: <200604242303.SAA25955@zion.cs.uiuc.edu>

Changes in directory llvm/test/Regression/CodeGen/X86:

vec_set-7.ll added (r1.1)
---
Log message:

Added a movq test case.

---
Diffs of the changes:  (+10 -0)

 vec_set-7.ll |   10 ++++++++++
 1 files changed, 10 insertions(+)

Index: llvm/test/Regression/CodeGen/X86/vec_set-7.ll
diff -c /dev/null llvm/test/Regression/CodeGen/X86/vec_set-7.ll:1.1
*** /dev/null	Mon Apr 24 18:03:32 2006
--- llvm/test/Regression/CodeGen/X86/vec_set-7.ll	Mon Apr 24 18:03:22 2006
***************
*** 0 ****
--- 1,10 ----
+ ; RUN: llvm-as < %s | llc -march=x86 -mattr=+sse2 | grep movq | wc -l | grep 1
+ 
+ <2 x long> %test(<2 x long>* %p) {
+         %tmp = cast <2 x long>* %p to double*
+         %tmp = load double* %tmp
+         %tmp = insertelement <2 x double> undef, double %tmp, uint 0
+         %tmp5 = insertelement <2 x double> %tmp, double 0.000000e+00, uint 1
+         %tmp = cast <2 x double> %tmp5 to <2 x long>
+         ret <2 x long> %tmp
+ }

From evan.cheng at apple.com  Mon Apr 24 18:30:21 2006
From: evan.cheng at apple.com (Evan Cheng)
Date: Mon, 24 Apr 2006 18:30:21 -0500
Subject: [llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Message-ID: <200604242330.SAA26138@zion.cs.uiuc.edu>

Changes in directory llvm/lib/Target/X86:

README.txt updated: 1.97 -> 1.98
---
Log message:

Add a new entry.

---
Diffs of the changes:  (+32 -0)

 README.txt |   32 ++++++++++++++++++++++++++++++++
 1 files changed, 32 insertions(+)

Index: llvm/lib/Target/X86/README.txt
diff -u llvm/lib/Target/X86/README.txt:1.97 llvm/lib/Target/X86/README.txt:1.98
--- llvm/lib/Target/X86/README.txt:1.97	Mon Apr 24 12:38:16 2006
+++ llvm/lib/Target/X86/README.txt	Mon Apr 24 18:30:10 2006
@@ -1075,3 +1075,35 @@
 There is also one case we do worse on PPC.
 
 //===---------------------------------------------------------------------===//
+
+For this:
+
+#include <emmintrin.h>
+void test(__m128d *r, __m128d *A, double B) {
+  *r = _mm_loadl_pd(*A, &B);
+}
+
+We generates:
+
+        subl $12, %esp
+        movsd 24(%esp), %xmm0
+        movsd %xmm0, (%esp)
+        movl 20(%esp), %eax
+        movapd (%eax), %xmm0
+        movlpd (%esp), %xmm0
+        movl 16(%esp), %eax
+        movapd %xmm0, (%eax)
+        addl $12, %esp
+        ret
+
+icc generates:
+
+        movl 4(%esp), %edx          #3.6
+        movl 8(%esp), %eax          #3.6
+        movapd (%eax), %xmm0        #4.22
+        movlpd 12(%esp), %xmm0      #4.8
+        movapd %xmm0, (%edx)        #4.3
+        ret                         #5.1
+
+So icc is smart enough to know that B is in memory so it doesn't load it and
+store it back to stack.

From evan.cheng at apple.com  Mon Apr 24 18:35:08 2006
From: evan.cheng at apple.com (Evan Cheng)
Date: Mon, 24 Apr 2006 18:35:08 -0500
Subject: [llvm-commits] CVS: llvm/include/llvm/IntrinsicsX86.td
Message-ID: <200604242335.SAA26172@zion.cs.uiuc.edu>

Changes in directory llvm/include/llvm:

IntrinsicsX86.td updated: 1.29 -> 1.30
---
Log message:

Added X86 SSE2 intrinsics which can be represented as vector_shuffles. This is
a temporary workaround for the 2-wide vector_shuffle problem (i.e. its mask
would have type v2i32 which is not legal).

---
Diffs of the changes:  (+29 -1)

 IntrinsicsX86.td |   30 +++++++++++++++++++++++++++++-
 1 files changed, 29 insertions(+), 1 deletion(-)

Index: llvm/include/llvm/IntrinsicsX86.td
diff -u llvm/include/llvm/IntrinsicsX86.td:1.29 llvm/include/llvm/IntrinsicsX86.td:1.30
--- llvm/include/llvm/IntrinsicsX86.td:1.29	Fri Apr 14 16:59:03 2006
+++ llvm/include/llvm/IntrinsicsX86.td	Mon Apr 24 18:34:56 2006
@@ -445,7 +445,6 @@
   def int_x86_sse2_packuswb_128 : GCCBuiltin<"__builtin_ia32_packuswb128">,
               Intrinsic<[llvm_v8i16_ty, llvm_v8i16_ty,
                          llvm_v8i16_ty], [IntrNoMem]>;
-  // FIXME: Temporary workaround since 2-wide shuffle is broken.
   def int_x86_sse2_movl_dq : GCCBuiltin<"__builtin_ia32_movqv4si">,
               Intrinsic<[llvm_v4i32_ty, llvm_v4i32_ty], [IntrNoMem]>;
   def int_x86_sse2_movmsk_pd : GCCBuiltin<"__builtin_ia32_movmskpd">,
@@ -463,6 +462,35 @@
               Intrinsic<[llvm_void_ty], [IntrWriteMem]>;
 }
 
+// Shuffles.
+// FIXME: Temporary workarounds since 2-wide shuffle is broken.
+let TargetPrefix = "x86" in {  // All intrinsics start with "llvm.x86.".
+  def int_x86_sse2_movs_d : GCCBuiltin<"__builtin_ia32_movsd">,
+              Intrinsic<[llvm_v2f64_ty, llvm_v2f64_ty,
+                         llvm_v2f64_ty], [IntrNoMem]>;
+  def int_x86_sse2_loadh_pd : GCCBuiltin<"__builtin_ia32_loadhpd">,
+              Intrinsic<[llvm_v2f64_ty, llvm_v2f64_ty,
+                         llvm_ptr_ty], [IntrReadMem]>;
+  def int_x86_sse2_loadl_pd : GCCBuiltin<"__builtin_ia32_loadlpd">,
+              Intrinsic<[llvm_v2f64_ty, llvm_v2f64_ty,
+                         llvm_ptr_ty], [IntrReadMem]>;
+  def int_x86_sse2_shuf_pd : GCCBuiltin<"__builtin_ia32_shufpd">,
+              Intrinsic<[llvm_v2f64_ty, llvm_v2f64_ty,
+                         llvm_v2f64_ty, llvm_int_ty], [IntrNoMem]>;
+  def int_x86_sse2_unpckh_pd : GCCBuiltin<"__builtin_ia32_unpckhpd">,
+              Intrinsic<[llvm_v2f64_ty, llvm_v2f64_ty,
+                         llvm_v2f64_ty], [IntrNoMem]>;
+  def int_x86_sse2_unpckl_pd : GCCBuiltin<"__builtin_ia32_unpcklpd">,
+              Intrinsic<[llvm_v2f64_ty, llvm_v2f64_ty,
+                         llvm_v2f64_ty], [IntrNoMem]>;
+  def int_x86_sse2_punpckh_qdq : GCCBuiltin<"__builtin_ia32_punpckhqdq128">,
+              Intrinsic<[llvm_v2i64_ty, llvm_v2i64_ty,
+                         llvm_v2i64_ty], [IntrNoMem]>;
+  def int_x86_sse2_punpckl_qdq : GCCBuiltin<"__builtin_ia32_punpcklqdq128">,
+              Intrinsic<[llvm_v2i64_ty, llvm_v2i64_ty,
+                         llvm_v2i64_ty], [IntrNoMem]>;
+}
+
 //===----------------------------------------------------------------------===//
 // SSE3

From evan.cheng at apple.com  Mon Apr 24 18:35:09 2006
From: evan.cheng at apple.com (Evan Cheng)
Date: Mon, 24 Apr 2006 18:35:09 -0500
Subject: [llvm-commits] CVS: llvm/lib/Target/X86/X86InstrSSE.td
Message-ID: <200604242335.SAA26176@zion.cs.uiuc.edu>

Changes in directory llvm/lib/Target/X86:

X86InstrSSE.td updated: 1.111 -> 1.112
---
Log message:

Added X86 SSE2 intrinsics which can be represented as vector_shuffles. This is
a temporary workaround for the 2-wide vector_shuffle problem (i.e. its mask
would have type v2i32 which is not legal).

---
Diffs of the changes:  (+44 -12)

 X86InstrSSE.td |   56 ++++++++++++++++++++++++++++++++++++++++++++------------
 1 files changed, 44 insertions(+), 12 deletions(-)

Index: llvm/lib/Target/X86/X86InstrSSE.td
diff -u llvm/lib/Target/X86/X86InstrSSE.td:1.111 llvm/lib/Target/X86/X86InstrSSE.td:1.112
--- llvm/lib/Target/X86/X86InstrSSE.td:1.111	Mon Apr 24 16:58:20 2006
+++ llvm/lib/Target/X86/X86InstrSSE.td	Mon Apr 24 18:34:56 2006
@@ -2212,11 +2212,6 @@
                   "movq {$src, $dst|$dst, $src}",
                   [(int_x86_sse2_storel_dq addr:$dst, VR128:$src)]>;
 
-// FIXME: Temporary workaround since 2-wide shuffle is broken.
-def MOVLQ128rr : PDI<0xD6, MRMSrcReg, (ops VR128:$dst, VR128:$src),
-                     "movq {$src, $dst|$dst, $src}",
-                     [(set VR128:$dst, (int_x86_sse2_movl_dq VR128:$src))]>;
-
 // Move to lower bits of a VR128 and zeroing upper bits.
 // Loading from memory automatically zeroing upper bits.
 let AddedComplexity = 20 in {
@@ -2241,13 +2236,16 @@
                   [(set VR128:$dst, (v4i32 (vector_shuffle immAllZerosV,
                                (v4i32 (scalar_to_vector (loadi32 addr:$src))),
                                             MOVL_shuffle_mask)))]>;
-def MOVZQI2PQIrr : PDI<0x7E, MRMSrcMem, (ops VR128:$dst, VR64:$src),
-                       "movq {$src, $dst|$dst, $src}", []>;
-def MOVZQI2PQIrm : PDI<0x7E, MRMSrcMem, (ops VR128:$dst, i64mem:$src),
-                       "movq {$src, $dst|$dst, $src}",
-                       [(set VR128:$dst, (bc_v2i64 (vector_shuffle immAllZerosV,
-                                    (v2f64 (scalar_to_vector (loadf64 addr:$src))),
-                                                     MOVL_shuffle_mask)))]>;
+// Moving from XMM to XMM but still clear upper 64 bits.
+def MOVZQI2PQIrr : I<0x7E, MRMSrcReg, (ops VR128:$dst, VR128:$src),
+                     "movq {$src, $dst|$dst, $src}",
+                     [(set VR128:$dst, (int_x86_sse2_movl_dq VR128:$src))]>,
+                   XS, Requires<[HasSSE2]>;
+def MOVZQI2PQIrm : I<0x7E, MRMSrcMem, (ops VR128:$dst, i64mem:$src),
+                     "movq {$src, $dst|$dst, $src}",
+                     [(set VR128:$dst, (int_x86_sse2_movl_dq
+                                        (bc_v4i32 (loadv2i64 addr:$src))))]>,
+                   XS, Requires<[HasSSE2]>;
 }
 
 //===----------------------------------------------------------------------===//
@@ -2482,8 +2480,42 @@
 def : Pat<(v2i64 (vector_shuffle VR128:$src1, VR128:$src2,
                   MOVL_shuffle_mask)),
           (MOVLPDrr VR128:$src1, VR128:$src2)>, Requires<[HasSSE2]>;
+
+// Set lowest element and zero upper elements.
+def : Pat<(bc_v2i64 (vector_shuffle immAllZerosV,
+                     (v2f64 (scalar_to_vector (loadf64 addr:$src))),
+                     MOVL_shuffle_mask)),
+          (MOVZQI2PQIrm addr:$src)>, Requires<[HasSSE2]>;
 }
 
+// FIXME: Temporary workaround since 2-wide shuffle is broken.
+def : Pat<(int_x86_sse2_movs_d VR128:$src1, VR128:$src2),
+          (MOVLPDrr VR128:$src1, VR128:$src2)>, Requires<[HasSSE2]>;
+def : Pat<(int_x86_sse2_loadh_pd VR128:$src1, addr:$src2),
+          (MOVHPDrm VR128:$src1, addr:$src2)>, Requires<[HasSSE2]>;
+def : Pat<(int_x86_sse2_loadl_pd VR128:$src1, addr:$src2),
+          (MOVLPDrm VR128:$src1, addr:$src2)>, Requires<[HasSSE2]>;
+def : Pat<(int_x86_sse2_shuf_pd VR128:$src1, VR128:$src2, imm:$src3),
+          (SHUFPDrri VR128:$src1, VR128:$src2, imm:$src3)>, Requires<[HasSSE2]>;
+def : Pat<(int_x86_sse2_shuf_pd VR128:$src1, (load addr:$src2), imm:$src3),
+          (SHUFPDrmi VR128:$src1, addr:$src2, imm:$src3)>, Requires<[HasSSE2]>;
+def : Pat<(int_x86_sse2_unpckh_pd VR128:$src1, VR128:$src2),
+          (UNPCKHPDrr VR128:$src1, VR128:$src2)>, Requires<[HasSSE2]>;
+def : Pat<(int_x86_sse2_unpckh_pd VR128:$src1, (load addr:$src2)),
+          (UNPCKHPDrm VR128:$src1, addr:$src2)>, Requires<[HasSSE2]>;
+def : Pat<(int_x86_sse2_unpckl_pd VR128:$src1, VR128:$src2),
+          (UNPCKLPDrr VR128:$src1, VR128:$src2)>, Requires<[HasSSE2]>;
+def : Pat<(int_x86_sse2_unpckl_pd VR128:$src1, (load addr:$src2)),
+          (UNPCKLPDrm VR128:$src1, addr:$src2)>, Requires<[HasSSE2]>;
+def : Pat<(int_x86_sse2_punpckh_qdq VR128:$src1, VR128:$src2),
+          (PUNPCKHQDQrr VR128:$src1, VR128:$src2)>, Requires<[HasSSE2]>;
+def : Pat<(int_x86_sse2_punpckh_qdq VR128:$src1, (load addr:$src2)),
+          (PUNPCKHQDQrm VR128:$src1, addr:$src2)>, Requires<[HasSSE2]>;
+def : Pat<(int_x86_sse2_punpckl_qdq VR128:$src1, VR128:$src2),
+          (PUNPCKLQDQrr VR128:$src1, VR128:$src2)>, Requires<[HasSSE2]>;
+def : Pat<(int_x86_sse2_punpckl_qdq VR128:$src1, (load addr:$src2)),
+          (PUNPCKLQDQrm VR128:$src1, addr:$src2)>, Requires<[HasSSE2]>;
+
 // 128-bit logical shifts
 def : Pat<(int_x86_sse2_psll_dq VR128:$src1, imm:$src2),
           (v2i64 (PSLLDQri VR128:$src1, (PSxLDQ_imm imm:$src2)))>,

From evan.cheng at apple.com  Mon Apr 24 19:50:13 2006
From: evan.cheng at apple.com (Evan Cheng)
Date: Mon, 24 Apr 2006 19:50:13 -0500
Subject: [llvm-commits] CVS: llvm/lib/Target/X86/X86InstrSSE.td
Message-ID: <200604250050.TAA26790@zion.cs.uiuc.edu>

Changes in directory llvm/lib/Target/X86:

X86InstrSSE.td updated: 1.112 -> 1.113
---
Log message:

Explicitly specify result type for def : Pat<> patterns (if it produces a vector
result). Otherwise tblgen will pick the default (v16i8 for 128-bit vector).

---
Diffs of the changes:  (+47 -45)

 X86InstrSSE.td |   92 +++++++++++++++++++++++++++++----------------------------
 1 files changed, 47 insertions(+), 45 deletions(-)

Index: llvm/lib/Target/X86/X86InstrSSE.td
diff -u llvm/lib/Target/X86/X86InstrSSE.td:1.112 llvm/lib/Target/X86/X86InstrSSE.td:1.113
--- llvm/lib/Target/X86/X86InstrSSE.td:1.112	Mon Apr 24 18:34:56 2006
+++ llvm/lib/Target/X86/X86InstrSSE.td	Mon Apr 24 19:50:01 2006
@@ -2281,9 +2281,9 @@
 
 // Scalar to v8i16 / v16i8. The source may be a R32, but only the lower 8 or
 // 16-bits matter.
-def : Pat<(v8i16 (X86s2vec R32:$src)), (MOVDI2PDIrr R32:$src)>,
+def : Pat<(v8i16 (X86s2vec R32:$src)), (v8i16 (MOVDI2PDIrr R32:$src))>,
   Requires<[HasSSE2]>;
-def : Pat<(v16i8 (X86s2vec R32:$src)), (MOVDI2PDIrr R32:$src)>,
+def : Pat<(v16i8 (X86s2vec R32:$src)), (v16i8 (MOVDI2PDIrr R32:$src))>,
   Requires<[HasSSE2]>;
 
 // bit_convert
@@ -2353,17 +2353,17 @@
 let AddedComplexity = 20 in {
 def : Pat<(v8i16 (vector_shuffle immAllZerosV, (v8i16 (X86s2vec R32:$src)),
                   MOVL_shuffle_mask)),
-          (MOVZDI2PDIrr R32:$src)>, Requires<[HasSSE2]>;
+          (v8i16 (MOVZDI2PDIrr R32:$src))>, Requires<[HasSSE2]>;
 def : Pat<(v16i8 (vector_shuffle immAllZerosV, (v16i8 (X86s2vec R32:$src)),
                   MOVL_shuffle_mask)),
-          (MOVZDI2PDIrr R32:$src)>, Requires<[HasSSE2]>;
+          (v16i8 (MOVZDI2PDIrr R32:$src))>, Requires<[HasSSE2]>;
 // Zeroing a VR128 then do a MOVS{S|D} to the lower bits.
 def : Pat<(v2f64 (vector_shuffle immAllZerosV,
                   (v2f64 (scalar_to_vector FR64:$src)), MOVL_shuffle_mask)),
-          (MOVLSD2PDrr (V_SET0_PD), FR64:$src)>, Requires<[HasSSE2]>;
+          (v2f64 (MOVLSD2PDrr (V_SET0_PD), FR64:$src))>, Requires<[HasSSE2]>;
 def : Pat<(v4f32 (vector_shuffle immAllZerosV,
                   (v4f32 (scalar_to_vector FR32:$src)), MOVL_shuffle_mask)),
-          (MOVLSS2PSrr (V_SET0_PS), FR32:$src)>, Requires<[HasSSE2]>;
+          (v4f32 (MOVLSS2PSrr (V_SET0_PS), FR32:$src))>, Requires<[HasSSE2]>;
 }
 
 // Splat v2f64 / v2i64
@@ -2404,115 +2404,117 @@
 let AddedComplexity = 10 in {
 def : Pat<(v4f32 (vector_shuffle VR128:$src, (undef),
                   UNPCKL_v_undef_shuffle_mask)),
-          (UNPCKLPSrr VR128:$src, VR128:$src)>, Requires<[HasSSE2]>;
+          (v4f32 (UNPCKLPSrr VR128:$src, VR128:$src))>, Requires<[HasSSE2]>;
 def : Pat<(v16i8 (vector_shuffle VR128:$src, (undef),
                   UNPCKL_v_undef_shuffle_mask)),
-          (PUNPCKLBWrr VR128:$src, VR128:$src)>, Requires<[HasSSE2]>;
+          (v16i8 (PUNPCKLBWrr VR128:$src, VR128:$src))>, Requires<[HasSSE2]>;
 def : Pat<(v8i16 (vector_shuffle VR128:$src, (undef),
                   UNPCKL_v_undef_shuffle_mask)),
-          (PUNPCKLWDrr VR128:$src, VR128:$src)>, Requires<[HasSSE2]>;
+          (v8i16 (PUNPCKLWDrr VR128:$src, VR128:$src))>, Requires<[HasSSE2]>;
 def : Pat<(v4i32 (vector_shuffle VR128:$src, (undef),
                   UNPCKL_v_undef_shuffle_mask)),
-          (PUNPCKLDQrr VR128:$src, VR128:$src)>, Requires<[HasSSE1]>;
+          (v4i32 (PUNPCKLDQrr VR128:$src, VR128:$src))>, Requires<[HasSSE1]>;
 }
 
 let AddedComplexity = 20 in {
 // vector_shuffle v1, <1, 1, 3, 3>
 def : Pat<(v4i32 (vector_shuffle VR128:$src, (undef),
                   MOVSHDUP_shuffle_mask)),
-          (MOVSHDUPrr VR128:$src)>, Requires<[HasSSE3]>;
+          (v4i32 (MOVSHDUPrr VR128:$src))>, Requires<[HasSSE3]>;
 def : Pat<(v4i32 (vector_shuffle (bc_v4i32 (loadv2i64 addr:$src)), (undef),
                   MOVSHDUP_shuffle_mask)),
-          (MOVSHDUPrm addr:$src)>, Requires<[HasSSE3]>;
+          (v4i32 (MOVSHDUPrm addr:$src))>, Requires<[HasSSE3]>;
 
 // vector_shuffle v1, <0, 0, 2, 2>
 def : Pat<(v4i32 (vector_shuffle VR128:$src, (undef),
                   MOVSLDUP_shuffle_mask)),
-          (MOVSLDUPrr VR128:$src)>, Requires<[HasSSE3]>;
+          (v4i32 (MOVSLDUPrr VR128:$src))>, Requires<[HasSSE3]>;
 def : Pat<(v4i32 (vector_shuffle (bc_v4i32 (loadv2i64 addr:$src)), (undef),
                   MOVSLDUP_shuffle_mask)),
-          (MOVSLDUPrm addr:$src)>, Requires<[HasSSE3]>;
+          (v4i32 (MOVSLDUPrm addr:$src))>, Requires<[HasSSE3]>;
 }
 
 let AddedComplexity = 20 in {
 // vector_shuffle v1, v2 <0, 1, 4, 5> using MOVLHPS
 def : Pat<(v4i32 (vector_shuffle VR128:$src1, VR128:$src2,
                   MOVHP_shuffle_mask)),
-          (MOVLHPSrr VR128:$src1, VR128:$src2)>;
+          (v4i32 (MOVLHPSrr VR128:$src1, VR128:$src2))>;
 
 // vector_shuffle v1, v2 <6, 7, 2, 3> using MOVHLPS
 def : Pat<(v4i32 (vector_shuffle VR128:$src1, VR128:$src2,
                   MOVHLPS_shuffle_mask)),
-          (MOVHLPSrr VR128:$src1, VR128:$src2)>;
+          (v4i32 (MOVHLPSrr VR128:$src1, VR128:$src2))>;
 
 // vector_shuffle v1, (load v2) <4, 5, 2, 3> using MOVLPS
 // vector_shuffle v1, (load v2) <0, 1, 4, 5> using MOVHPS
 def : Pat<(v4f32 (vector_shuffle VR128:$src1, (loadv4f32 addr:$src2),
                   MOVLP_shuffle_mask)),
-          (MOVLPSrm VR128:$src1, addr:$src2)>, Requires<[HasSSE1]>;
+          (v4f32 (MOVLPSrm VR128:$src1, addr:$src2))>, Requires<[HasSSE1]>;
 def : Pat<(v2f64 (vector_shuffle VR128:$src1, (loadv2f64 addr:$src2),
                   MOVLP_shuffle_mask)),
-          (MOVLPDrm VR128:$src1, addr:$src2)>, Requires<[HasSSE2]>;
+          (v2f64 (MOVLPDrm VR128:$src1, addr:$src2))>, Requires<[HasSSE2]>;
 def : Pat<(v4f32 (vector_shuffle VR128:$src1, (loadv4f32 addr:$src2),
                   MOVHP_shuffle_mask)),
-          (MOVHPSrm VR128:$src1, addr:$src2)>, Requires<[HasSSE1]>;
+          (v4f32 (MOVHPSrm VR128:$src1, addr:$src2))>, Requires<[HasSSE1]>;
 def : Pat<(v2f64 (vector_shuffle VR128:$src1, (loadv2f64 addr:$src2),
                   MOVHP_shuffle_mask)),
-          (MOVHPDrm VR128:$src1, addr:$src2)>, Requires<[HasSSE2]>;
+          (v2f64 (MOVHPDrm VR128:$src1, addr:$src2))>, Requires<[HasSSE2]>;
 
 def : Pat<(v4i32 (vector_shuffle VR128:$src1, (bc_v4i32 (loadv2i64 addr:$src2)),
                   MOVLP_shuffle_mask)),
-          (MOVLPSrm VR128:$src1, addr:$src2)>, Requires<[HasSSE2]>;
+          (v4i32 (MOVLPSrm VR128:$src1, addr:$src2))>, Requires<[HasSSE2]>;
 def : Pat<(v2i64 (vector_shuffle VR128:$src1, (loadv2i64 addr:$src2),
                   MOVLP_shuffle_mask)),
-          (MOVLPDrm VR128:$src1, addr:$src2)>, Requires<[HasSSE2]>;
+          (v2i64 (MOVLPDrm VR128:$src1, addr:$src2))>, Requires<[HasSSE2]>;
 def : Pat<(v4i32 (vector_shuffle VR128:$src1, (bc_v4i32 (loadv2i64 addr:$src2)),
                   MOVHP_shuffle_mask)),
-          (MOVHPSrm VR128:$src1, addr:$src2)>, Requires<[HasSSE1]>;
+          (v4i32 (MOVHPSrm VR128:$src1, addr:$src2))>, Requires<[HasSSE1]>;
 def : Pat<(v2i64 (vector_shuffle VR128:$src1, (loadv2i64 addr:$src2),
                   MOVLP_shuffle_mask)),
-          (MOVLPDrm VR128:$src1, addr:$src2)>, Requires<[HasSSE2]>;
+          (v2i64 (MOVLPDrm VR128:$src1, addr:$src2))>, Requires<[HasSSE2]>;
 
 // Setting the lowest element in the vector.
 def : Pat<(v4i32 (vector_shuffle VR128:$src1, VR128:$src2,
                   MOVL_shuffle_mask)),
-          (MOVLPSrr VR128:$src1, VR128:$src2)>, Requires<[HasSSE2]>;
+          (v4i32 (MOVLPSrr VR128:$src1, VR128:$src2))>, Requires<[HasSSE2]>;
 def : Pat<(v2i64 (vector_shuffle VR128:$src1, VR128:$src2,
                   MOVL_shuffle_mask)),
-          (MOVLPDrr VR128:$src1, VR128:$src2)>, Requires<[HasSSE2]>;
+          (v2i64 (MOVLPDrr VR128:$src1, VR128:$src2))>, Requires<[HasSSE2]>;
 
 // Set lowest element and zero upper elements.
 def : Pat<(bc_v2i64 (vector_shuffle immAllZerosV,
                      (v2f64 (scalar_to_vector (loadf64 addr:$src))),
                      MOVL_shuffle_mask)),
-          (MOVZQI2PQIrm addr:$src)>, Requires<[HasSSE2]>;
+          (v2i64 (MOVZQI2PQIrm addr:$src))>, Requires<[HasSSE2]>;
 }
 
 // FIXME: Temporary workaround since 2-wide shuffle is broken.
 def : Pat<(int_x86_sse2_movs_d VR128:$src1, VR128:$src2),
-          (MOVLPDrr VR128:$src1, VR128:$src2)>, Requires<[HasSSE2]>;
+          (v2f64 (MOVLPDrr VR128:$src1, VR128:$src2))>, Requires<[HasSSE2]>;
 def : Pat<(int_x86_sse2_loadh_pd VR128:$src1, addr:$src2),
-          (MOVHPDrm VR128:$src1, addr:$src2)>, Requires<[HasSSE2]>;
+          (v2f64 (MOVHPDrm VR128:$src1, addr:$src2))>, Requires<[HasSSE2]>;
 def : Pat<(int_x86_sse2_loadl_pd VR128:$src1, addr:$src2),
-          (MOVLPDrm VR128:$src1, addr:$src2)>, Requires<[HasSSE2]>;
+          (v2f64 (MOVLPDrm VR128:$src1, addr:$src2))>, Requires<[HasSSE2]>;
 def : Pat<(int_x86_sse2_shuf_pd VR128:$src1, VR128:$src2, imm:$src3),
-          (SHUFPDrri VR128:$src1, VR128:$src2, imm:$src3)>, Requires<[HasSSE2]>;
+          (v2f64 (SHUFPDrri VR128:$src1, VR128:$src2, imm:$src3))>,
+      Requires<[HasSSE2]>;
 def : Pat<(int_x86_sse2_shuf_pd VR128:$src1, (load addr:$src2), imm:$src3),
-          (SHUFPDrmi VR128:$src1, addr:$src2, imm:$src3)>, Requires<[HasSSE2]>;
+          (v2f64 (SHUFPDrmi VR128:$src1, addr:$src2, imm:$src3))>,
+      Requires<[HasSSE2]>;
 def : Pat<(int_x86_sse2_unpckh_pd VR128:$src1, VR128:$src2),
-          (UNPCKHPDrr VR128:$src1, VR128:$src2)>, Requires<[HasSSE2]>;
+          (v2f64 (UNPCKHPDrr VR128:$src1, VR128:$src2))>, Requires<[HasSSE2]>;
 def : Pat<(int_x86_sse2_unpckh_pd VR128:$src1, (load addr:$src2)),
-          (UNPCKHPDrm VR128:$src1, addr:$src2)>, Requires<[HasSSE2]>;
+          (v2f64 (UNPCKHPDrm VR128:$src1, addr:$src2))>, Requires<[HasSSE2]>;
 def : Pat<(int_x86_sse2_unpckl_pd VR128:$src1, VR128:$src2),
-          (UNPCKLPDrr VR128:$src1, VR128:$src2)>, Requires<[HasSSE2]>;
+          (v2f64 (UNPCKLPDrr VR128:$src1, VR128:$src2))>, Requires<[HasSSE2]>;
 def : Pat<(int_x86_sse2_unpckl_pd VR128:$src1, (load addr:$src2)),
-          (UNPCKLPDrm VR128:$src1, addr:$src2)>, Requires<[HasSSE2]>;
+          (v2f64 (UNPCKLPDrm VR128:$src1, addr:$src2))>, Requires<[HasSSE2]>;
 def : Pat<(int_x86_sse2_punpckh_qdq VR128:$src1, VR128:$src2),
-          (PUNPCKHQDQrr VR128:$src1, VR128:$src2)>, Requires<[HasSSE2]>;
+          (v2i64 (PUNPCKHQDQrr VR128:$src1, VR128:$src2))>, Requires<[HasSSE2]>;
 def : Pat<(int_x86_sse2_punpckh_qdq VR128:$src1, (load addr:$src2)),
-          (PUNPCKHQDQrm VR128:$src1, addr:$src2)>, Requires<[HasSSE2]>;
+          (v2i64 (PUNPCKHQDQrm VR128:$src1, addr:$src2))>, Requires<[HasSSE2]>;
 def : Pat<(int_x86_sse2_punpckl_qdq VR128:$src1, VR128:$src2),
-          (PUNPCKLQDQrr VR128:$src1, VR128:$src2)>, Requires<[HasSSE2]>;
+          (v2i64 (PUNPCKLQDQrr VR128:$src1, VR128:$src2))>, Requires<[HasSSE2]>;
 def : Pat<(int_x86_sse2_punpckl_qdq VR128:$src1, (load addr:$src2)),
           (PUNPCKLQDQrm VR128:$src1, addr:$src2)>, Requires<[HasSSE2]>;
@@ -2527,20 +2529,20 @@
 
 // Some special case pandn patterns.
def : Pat<(v2i64 (and (xor VR128:$src1, (bc_v2i64 (v4i32 immAllOnesV))), VR128:$src2)), - (PANDNrr VR128:$src1, VR128:$src2)>, Requires<[HasSSE2]>; + (v2i64 (PANDNrr VR128:$src1, VR128:$src2))>, Requires<[HasSSE2]>; def : Pat<(v2i64 (and (xor VR128:$src1, (bc_v2i64 (v8i16 immAllOnesV))), VR128:$src2)), - (PANDNrr VR128:$src1, VR128:$src2)>, Requires<[HasSSE2]>; + (v2i64 (PANDNrr VR128:$src1, VR128:$src2))>, Requires<[HasSSE2]>; def : Pat<(v2i64 (and (xor VR128:$src1, (bc_v2i64 (v16i8 immAllOnesV))), VR128:$src2)), - (PANDNrr VR128:$src1, VR128:$src2)>, Requires<[HasSSE2]>; + (v2i64 (PANDNrr VR128:$src1, VR128:$src2))>, Requires<[HasSSE2]>; def : Pat<(v2i64 (and (xor VR128:$src1, (bc_v2i64 (v4i32 immAllOnesV))), (load addr:$src2))), - (PANDNrm VR128:$src1, addr:$src2)>, Requires<[HasSSE2]>; + (v2i64 (PANDNrm VR128:$src1, addr:$src2))>, Requires<[HasSSE2]>; def : Pat<(v2i64 (and (xor VR128:$src1, (bc_v2i64 (v8i16 immAllOnesV))), (load addr:$src2))), - (PANDNrm VR128:$src1, addr:$src2)>, Requires<[HasSSE2]>; + (v2i64 (PANDNrm VR128:$src1, addr:$src2))>, Requires<[HasSSE2]>; def : Pat<(v2i64 (and (xor VR128:$src1, (bc_v2i64 (v16i8 immAllOnesV))), (load addr:$src2))), - (PANDNrm VR128:$src1, addr:$src2)>, Requires<[HasSSE2]>; + (v2i64 (PANDNrm VR128:$src1, addr:$src2))>, Requires<[HasSSE2]>; From natebegeman at mac.com Mon Apr 24 23:46:11 2006 From: natebegeman at mac.com (Nate Begeman) Date: Mon, 24 Apr 2006 23:46:11 -0500 Subject: [llvm-commits] CVS: llvm/lib/Target/PowerPC/PPCJITInfo.cpp Message-ID: <200604250446.XAA28185@zion.cs.uiuc.edu> Changes in directory llvm/lib/Target/PowerPC: PPCJITInfo.cpp updated: 1.18 -> 1.19 --- Log message: No functionality changes, but cleaner code with correct comments. 
--- Diffs of the changes: (+41 -35) PPCJITInfo.cpp | 76 ++++++++++++++++++++++++++++++--------------------------- 1 files changed, 41 insertions(+), 35 deletions(-) Index: llvm/lib/Target/PowerPC/PPCJITInfo.cpp diff -u llvm/lib/Target/PowerPC/PPCJITInfo.cpp:1.18 llvm/lib/Target/PowerPC/PPCJITInfo.cpp:1.19 --- llvm/lib/Target/PowerPC/PPCJITInfo.cpp:1.18 Fri Apr 21 17:04:15 2006 +++ llvm/lib/Target/PowerPC/PPCJITInfo.cpp Mon Apr 24 23:45:59 2006 @@ -86,49 +86,55 @@ #endif extern "C" void PPC32CompilationCallbackC(unsigned *IntRegs, double *FPRegs) { - unsigned *CameFromStub = (unsigned*)__builtin_return_address(0+1); - unsigned *CameFromOrig = (unsigned*)__builtin_return_address(1+1); - unsigned *CCStackPtr = (unsigned*)__builtin_frame_address(0); -//unsigned *StubStackPtr = (unsigned*)__builtin_frame_address(1); - unsigned *OrigStackPtr = (unsigned*)__builtin_frame_address(2+1); - - // Adjust pointer to the branch, not the return address. - --CameFromStub; - - void *Target = JITCompilerFunction(CameFromStub); - - // Check to see if CameFromOrig[-1] is a 'bl' instruction, and if we can - // rewrite it to branch directly to the destination. If so, rewrite it so it - // does not need to go through the stub anymore. - unsigned CameFromOrigInst = CameFromOrig[-1]; - if ((CameFromOrigInst >> 26) == 18) { // Direct call. - intptr_t Offset = ((intptr_t)Target-(intptr_t)CameFromOrig+4) >> 2; + unsigned *StubCallAddrPlus4 = (unsigned*)__builtin_return_address(0+1); + unsigned *OrigCallAddrPlus4 = (unsigned*)__builtin_return_address(1+1); + unsigned *CurStackPtr = (unsigned*)__builtin_frame_address(0); + unsigned *OrigStackPtr = (unsigned*)__builtin_frame_address(2+1); + + // Adjust the pointer to the address of the call instruction in the stub + // emitted by emitFunctionStub, rather than the instruction after it. 
+ unsigned *StubCallAddr = StubCallAddrPlus4 - 1; + unsigned *OrigCallAddr = OrigCallAddrPlus4 - 1; + + void *Target = JITCompilerFunction(StubCallAddr); + + // Check to see if *OrigCallAddr is a 'bl' instruction, and if we can rewrite + // it to branch directly to the destination. If so, rewrite it so it does not + // need to go through the stub anymore. + unsigned OrigCallInst = *OrigCallAddr; + if ((OrigCallInst >> 26) == 18) { // Direct call. + intptr_t Offset = ((intptr_t)Target - (intptr_t)OrigCallAddr) >> 2; + if (Offset >= -(1 << 23) && Offset < (1 << 23)) { // In range? // Clear the original target out. - CameFromOrigInst &= (63 << 26) | 3; + OrigCallInst &= (63 << 26) | 3; // Fill in the new target. - CameFromOrigInst |= (Offset & ((1 << 24)-1)) << 2; + OrigCallInst |= (Offset & ((1 << 24)-1)) << 2; // Replace the call. - CameFromOrig[-1] = CameFromOrigInst; + *OrigCallAddr = OrigCallInst; } } - // Locate the start of the stub. If this is a short call, adjust backwards - // the short amount, otherwise the full amount. - bool isShortStub = (*CameFromStub >> 26) == 18; - CameFromStub -= isShortStub ? 2 : 6; + // Assert that we are coming from a stub that was created with our + // emitFunctionStub. + assert((*StubCallAddr >> 26) == 19 && "Call in stub is not indirect!"); + StubCallAddr -= 6; // Rewrite the stub with an unconditional branch to the target, for any users // who took the address of the stub. - EmitBranchToAt(CameFromStub, Target, false); + EmitBranchToAt(StubCallAddr, Target, false); - // Change the SP so that we pop two stack frames off when we return. - *CCStackPtr = (intptr_t)OrigStackPtr; - - // Put the address of the stub and the LR value that originally came into the - // stub in a place that is easy to get on the stack after we restore all regs. - CCStackPtr[2] = (intptr_t)Target; - CCStackPtr[1] = (intptr_t)CameFromOrig; + // Change the stored stack pointer so that we pop three stack frames: + // 1. 
PPC32CompilationCallbackC's frame + // 2. _PPC32CompilationCallback's frame + // 3. the stub's frame + *CurStackPtr = (intptr_t)OrigStackPtr; + + // Put the address of the target function to call and the address to return to + // after calling the target function in a place that is easy to get on the + // stack after we restore all regs. + CurStackPtr[2] = (intptr_t)Target; + CurStackPtr[1] = (intptr_t)OrigCallAddrPlus4; // Note, this is not a standard epilog! #if defined(__POWERPC__) || defined (__ppc__) || defined(_POWER) @@ -141,12 +147,12 @@ "lfd f10, 72(%0)\n" "lfd f11, 80(%0)\n" "lfd f12, 88(%0)\n" "lfd f13, 96(%0)\n" "lmw r3, 0(%1)\n" // Load all integer regs - "lwz r0,4(r1)\n" // Get CameFromOrig (LR into stub) + "lwz r0,4(r1)\n" // Get OrigCallAddrPlus4 (LR value when stub was called) "mtlr r0\n" // Put it in the LR register "lwz r0,8(r1)\n" // Get target function pointer "mtctr r0\n" // Put it into the CTR register - "lwz r1,0(r1)\n" // Pop two frames off - "bctr\n" :: // Return to stub! 
+ "lwz r1,0(r1)\n" // Pop three frames off + "bctr\n" :: // Call target function "b" (FRR), "b" (IRR)); #endif } From natebegeman at mac.com Tue Apr 25 12:46:58 2006 From: natebegeman at mac.com (Nate Begeman) Date: Tue, 25 Apr 2006 12:46:58 -0500 Subject: [llvm-commits] CVS: llvm/lib/ExecutionEngine/JIT/JITEmitter.cpp Message-ID: <200604251746.MAA11967@zion.cs.uiuc.edu> Changes in directory llvm/lib/ExecutionEngine/JIT: JITEmitter.cpp updated: 1.83 -> 1.84 --- Log message: Fix a warning --- Diffs of the changes: (+1 -1) JITEmitter.cpp | 2 +- 1 files changed, 1 insertion(+), 1 deletion(-) Index: llvm/lib/ExecutionEngine/JIT/JITEmitter.cpp diff -u llvm/lib/ExecutionEngine/JIT/JITEmitter.cpp:1.83 llvm/lib/ExecutionEngine/JIT/JITEmitter.cpp:1.84 --- llvm/lib/ExecutionEngine/JIT/JITEmitter.cpp:1.83 Sat Apr 22 13:53:45 2006 +++ llvm/lib/ExecutionEngine/JIT/JITEmitter.cpp Tue Apr 25 12:46:32 2006 @@ -699,7 +699,7 @@ for (unsigned i = 0; i < Index; ++i) Offset += JT[i].MBBs.size() * EntrySize; - return (uint64_t)((char *)JumpTableBase + Offset); + return (intptr_t)((char *)JumpTableBase + Offset); } unsigned char* JITEmitter::allocateGlobal(unsigned size, unsigned alignment) From evan.cheng at apple.com Tue Apr 25 12:48:54 2006 From: evan.cheng at apple.com (Evan Cheng) Date: Tue, 25 Apr 2006 12:48:54 -0500 Subject: [llvm-commits] CVS: llvm/lib/Target/X86/X86InstrSSE.td Message-ID: <200604251748.MAA11985@zion.cs.uiuc.edu> Changes in directory llvm/lib/Target/X86: X86InstrSSE.td updated: 1.113 -> 1.114 --- Log message: Fix a typo. 
--- Diffs of the changes: (+1 -1) X86InstrSSE.td | 2 +- 1 files changed, 1 insertion(+), 1 deletion(-) Index: llvm/lib/Target/X86/X86InstrSSE.td diff -u llvm/lib/Target/X86/X86InstrSSE.td:1.113 llvm/lib/Target/X86/X86InstrSSE.td:1.114 --- llvm/lib/Target/X86/X86InstrSSE.td:1.113 Mon Apr 24 19:50:01 2006 +++ llvm/lib/Target/X86/X86InstrSSE.td Tue Apr 25 12:48:41 2006 @@ -1988,7 +1988,7 @@ UNPCKH_shuffle_mask)))]>; def PUNPCKHQDQrr : PDI<0x6D, MRMSrcReg, (ops VR128:$dst, VR128:$src1, VR128:$src2), - "punpckhdq {$src2, $dst|$dst, $src2}", + "punpckhqdq {$src2, $dst|$dst, $src2}", [(set VR128:$dst, (v2i64 (vector_shuffle VR128:$src1, VR128:$src2, UNPCKH_shuffle_mask)))]>; From alenhar2 at cs.uiuc.edu Tue Apr 25 14:28:08 2006 From: alenhar2 at cs.uiuc.edu (Andrew Lenharth) Date: Tue, 25 Apr 2006 14:28:08 -0500 Subject: [llvm-commits] CVS: llvm/test/Regression/Analysis/DSGraph/2006-04-25-ZeroArrayStructUse.ll Message-ID: <200604251928.OAA12950@zion.cs.uiuc.edu> Changes in directory llvm/test/Regression/Analysis/DSGraph: 2006-04-25-ZeroArrayStructUse.ll added (r1.1) --- Log message: another c99 style problem --- Diffs of the changes: (+22 -0) 2006-04-25-ZeroArrayStructUse.ll | 22 ++++++++++++++++++++++ 1 files changed, 22 insertions(+) Index: llvm/test/Regression/Analysis/DSGraph/2006-04-25-ZeroArrayStructUse.ll diff -c /dev/null llvm/test/Regression/Analysis/DSGraph/2006-04-25-ZeroArrayStructUse.ll:1.1 *** /dev/null Tue Apr 25 14:28:06 2006 --- llvm/test/Regression/Analysis/DSGraph/2006-04-25-ZeroArrayStructUse.ll Tue Apr 25 14:27:56 2006 *************** *** 0 **** --- 1,22 ---- + ; RUN: analyze %s -datastructure-gc -dsgc-check-flags=x:IA + + ; ModuleID = 'bug3.bc' + target endian = little + target pointersize = 32 + target triple = "i686-pc-linux-gnu" + + + %struct.c99 = type { + uint, + uint, + [0 x sbyte*] } + + implementation ; Functions: + + + void %foo(%struct.c99* %x) { + entry: + %B1 = getelementptr %struct.c99* %x, long 0, uint 2, uint 1 + %B2 = getelementptr 
%struct.c99* %x, long 0, uint 2, uint 2 + ret void + } From alenhar2 at cs.uiuc.edu Tue Apr 25 14:33:35 2006 From: alenhar2 at cs.uiuc.edu (Andrew Lenharth) Date: Tue, 25 Apr 2006 14:33:35 -0500 Subject: [llvm-commits] CVS: llvm/lib/Analysis/DataStructure/Local.cpp Message-ID: <200604251933.OAA12990@zion.cs.uiuc.edu> Changes in directory llvm/lib/Analysis/DataStructure: Local.cpp updated: 1.150 -> 1.151 --- Log message: better c99 struct handling --- Diffs of the changes: (+1 -2) Local.cpp | 3 +-- 1 files changed, 1 insertion(+), 2 deletions(-) Index: llvm/lib/Analysis/DataStructure/Local.cpp diff -u llvm/lib/Analysis/DataStructure/Local.cpp:1.150 llvm/lib/Analysis/DataStructure/Local.cpp:1.151 --- llvm/lib/Analysis/DataStructure/Local.cpp:1.150 Wed Apr 19 10:34:02 2006 +++ llvm/lib/Analysis/DataStructure/Local.cpp Tue Apr 25 14:33:23 2006 @@ -1122,13 +1122,12 @@ for (unsigned i = 0, e = CS->getNumOperands(); i != e; ++i) { DSNode *NHN = NH.getNode(); //Some programmers think ending a structure with a [0 x sbyte] is cute - //This should be ok as the allocation type should grow this type when - //it is merged in if it is bigger. 
if (SL->MemberOffsets[i] < SL->StructSize) { DSNodeHandle NewNH(NHN, NH.getOffset()+(unsigned)SL->MemberOffsets[i]); MergeConstantInitIntoNode(NewNH, cast<Constant>(CS->getOperand(i))); } else if (SL->MemberOffsets[i] == SL->StructSize) { DEBUG(std::cerr << "Zero size element at end of struct\n"); + NHN->foldNodeCompletely(); } else { assert(0 && "type was smaller than offsets of of struct layout indicate"); } From alenhar2 at cs.uiuc.edu Tue Apr 25 14:33:54 2006 From: alenhar2 at cs.uiuc.edu (Andrew Lenharth) Date: Tue, 25 Apr 2006 14:33:54 -0500 Subject: [llvm-commits] CVS: llvm/lib/Analysis/DataStructure/GraphChecker.cpp Message-ID: <200604251933.OAA13010@zion.cs.uiuc.edu> Changes in directory llvm/lib/Analysis/DataStructure: GraphChecker.cpp updated: 1.19 -> 1.20 --- Log message: slightly more useful error message --- Diffs of the changes: (+3 -2) GraphChecker.cpp | 5 +++-- 1 files changed, 3 insertions(+), 2 deletions(-) Index: llvm/lib/Analysis/DataStructure/GraphChecker.cpp diff -u llvm/lib/Analysis/DataStructure/GraphChecker.cpp:1.19 llvm/lib/Analysis/DataStructure/GraphChecker.cpp:1.20 --- llvm/lib/Analysis/DataStructure/GraphChecker.cpp:1.19 Sun Oct 23 19:38:38 2005 +++ llvm/lib/Analysis/DataStructure/GraphChecker.cpp Tue Apr 25 14:33:41 2006 @@ -183,8 +183,9 @@ } if (CheckFlagsM.count(Name) && CheckFlagsM[Name] != N->getNodeFlags()) { - std::cerr << "Node flags are not as expected for node: " << Name - << "\n"; + std::cerr << "Node flags are not as expected for node: " << Name + << " (" << CheckFlagsM[Name] << ":" << N->getNodeFlags() + << ")\n"; N->print(std::cerr, &G); abort(); } From evan.cheng at apple.com Tue Apr 25 15:14:05 2006 From: evan.cheng at apple.com (Evan Cheng) Date: Tue, 25 Apr 2006 15:14:05 -0500 Subject: [llvm-commits] CVS: llvm/lib/Target/X86/X86ISelLowering.cpp X86ISelLowering.h Message-ID: <200604252014.PAA13267@zion.cs.uiuc.edu> Changes in directory llvm/lib/Target/X86: X86ISelLowering.cpp updated: 1.186 -> 1.187 X86ISelLowering.h updated: 1.58
-> 1.59 --- Log message: Separate LowerOperation() into multiple functions, one per opcode. --- Diffs of the changes: (+1211 -1133) X86ISelLowering.cpp | 2308 ++++++++++++++++++++++++++-------------------------- X86ISelLowering.h | 36 2 files changed, 1211 insertions(+), 1133 deletions(-) Index: llvm/lib/Target/X86/X86ISelLowering.cpp diff -u llvm/lib/Target/X86/X86ISelLowering.cpp:1.186 llvm/lib/Target/X86/X86ISelLowering.cpp:1.187 --- llvm/lib/Target/X86/X86ISelLowering.cpp:1.186 Mon Apr 24 17:58:52 2006 +++ llvm/lib/Target/X86/X86ISelLowering.cpp Tue Apr 25 15:13:52 2006 @@ -2226,14 +2226,534 @@ return V; } -/// LowerOperation - Provide custom lowering hooks for some operations. -/// -SDOperand X86TargetLowering::LowerOperation(SDOperand Op, SelectionDAG &DAG) { - switch (Op.getOpcode()) { - default: assert(0 && "Should not custom lower this!"); - case ISD::SHL_PARTS: - case ISD::SRA_PARTS: - case ISD::SRL_PARTS: { +SDOperand +X86TargetLowering::LowerBUILD_VECTOR(SDOperand Op, SelectionDAG &DAG) { + // All zero's are handled with pxor. + if (ISD::isBuildVectorAllZeros(Op.Val)) + return Op; + + // All one's are handled with pcmpeqd. + if (ISD::isBuildVectorAllOnes(Op.Val)) + return Op; + + MVT::ValueType VT = Op.getValueType(); + MVT::ValueType EVT = MVT::getVectorBaseType(VT); + unsigned EVTBits = MVT::getSizeInBits(EVT); + + unsigned NumElems = Op.getNumOperands(); + unsigned NumZero = 0; + unsigned NumNonZero = 0; + unsigned NonZeros = 0; + std::set<SDOperand> Values; + for (unsigned i = 0; i < NumElems; ++i) { + SDOperand Elt = Op.getOperand(i); + if (Elt.getOpcode() != ISD::UNDEF) { + Values.insert(Elt); + if (isZeroNode(Elt)) + NumZero++; + else { + NonZeros |= (1 << i); + NumNonZero++; + } + } + } + + if (NumNonZero == 0) + // Must be a mix of zero and undef. Return a zero vector. + return getZeroVector(VT, DAG); + + // Splat is obviously ok. Let legalizer expand it to a shuffle.
+ if (Values.size() == 1) + return SDOperand(); + + // Special case for single non-zero element. + if (NumNonZero == 1) { + unsigned Idx = CountTrailingZeros_32(NonZeros); + SDOperand Item = Op.getOperand(Idx); + Item = DAG.getNode(ISD::SCALAR_TO_VECTOR, VT, Item); + if (Idx == 0) + // Turn it into a MOVL (i.e. movss, movsd, or movd) to a zero vector. + return getShuffleVectorZeroOrUndef(Item, VT, NumElems, Idx, + NumZero > 0, DAG); + + if (EVTBits == 32) { + // Turn it into a shuffle of zero and zero-extended scalar to vector. + Item = getShuffleVectorZeroOrUndef(Item, VT, NumElems, 0, NumZero > 0, + DAG); + MVT::ValueType MaskVT = MVT::getIntVectorWithNumElements(NumElems); + MVT::ValueType MaskEVT = MVT::getVectorBaseType(MaskVT); + std::vector<SDOperand> MaskVec; + for (unsigned i = 0; i < NumElems; i++) + MaskVec.push_back(DAG.getConstant((i == Idx) ? 0 : 1, MaskEVT)); + SDOperand Mask = DAG.getNode(ISD::BUILD_VECTOR, MaskVT, MaskVec); + return DAG.getNode(ISD::VECTOR_SHUFFLE, VT, Item, + DAG.getNode(ISD::UNDEF, VT), Mask); + } + } + + // Let legalizer expand 2-widde build_vector's. + if (EVTBits == 64) + return SDOperand(); + + // If element VT is < 32 bits, convert it to inserts into a zero vector. + if (EVTBits == 8) { + SDOperand V = LowerBuildVectorv16i8(Op, NonZeros,NumNonZero,NumZero, DAG); + if (V.Val) return V; + } + + if (EVTBits == 16) { + SDOperand V = LowerBuildVectorv8i16(Op, NonZeros,NumNonZero,NumZero, DAG); + if (V.Val) return V; + } + + // If element VT is == 32 bits, turn it into a number of shuffles. + std::vector<SDOperand> V(NumElems); + if (NumElems == 4 && NumZero > 0) { + for (unsigned i = 0; i < 4; ++i) { + bool isZero = !(NonZeros & (1 << i)); + if (isZero) + V[i] = getZeroVector(VT, DAG); + else + V[i] = DAG.getNode(ISD::SCALAR_TO_VECTOR, VT, Op.getOperand(i)); + } + + for (unsigned i = 0; i < 2; ++i) { + switch ((NonZeros & (0x3 << i*2)) >> (i*2)) { + default: break; + case 0: + V[i] = V[i*2]; // Must be a zero vector.
+ break; + case 1: + V[i] = DAG.getNode(ISD::VECTOR_SHUFFLE, VT, V[i*2+1], V[i*2], + getMOVLMask(NumElems, DAG)); + break; + case 2: + V[i] = DAG.getNode(ISD::VECTOR_SHUFFLE, VT, V[i*2], V[i*2+1], + getMOVLMask(NumElems, DAG)); + break; + case 3: + V[i] = DAG.getNode(ISD::VECTOR_SHUFFLE, VT, V[i*2], V[i*2+1], + getUnpacklMask(NumElems, DAG)); + break; + } + } + + // Take advantage of the fact R32 to VR128 scalar_to_vector (i.e. movd) + // clears the upper bits. + // FIXME: we can do the same for v4f32 case when we know both parts of + // the lower half come from scalar_to_vector (loadf32). We should do + // that in post legalizer dag combiner with target specific hooks. + if (MVT::isInteger(EVT) && (NonZeros & (0x3 << 2)) == 0) + return V[0]; + MVT::ValueType MaskVT = MVT::getIntVectorWithNumElements(NumElems); + MVT::ValueType EVT = MVT::getVectorBaseType(MaskVT); + std::vector<SDOperand> MaskVec; + bool Reverse = (NonZeros & 0x3) == 2; + for (unsigned i = 0; i < 2; ++i) + if (Reverse) + MaskVec.push_back(DAG.getConstant(1-i, EVT)); + else + MaskVec.push_back(DAG.getConstant(i, EVT)); + Reverse = ((NonZeros & (0x3 << 2)) >> 2) == 2; + for (unsigned i = 0; i < 2; ++i) + if (Reverse) + MaskVec.push_back(DAG.getConstant(1-i+NumElems, EVT)); + else + MaskVec.push_back(DAG.getConstant(i+NumElems, EVT)); + SDOperand ShufMask = DAG.getNode(ISD::BUILD_VECTOR, MaskVT, MaskVec); + return DAG.getNode(ISD::VECTOR_SHUFFLE, VT, V[0], V[1], ShufMask); + } + + if (Values.size() > 2) { + // Expand into a number of unpckl*. + // e.g.
for v4f32 + // Step 1: unpcklps 0, 2 ==> X: <?, ?, 2, 0> + // : unpcklps 1, 3 ==> Y: <?, ?, 3, 1> + // Step 2: unpcklps X, Y ==> <3, 2, 1, 0> + SDOperand UnpckMask = getUnpacklMask(NumElems, DAG); + for (unsigned i = 0; i < NumElems; ++i) + V[i] = DAG.getNode(ISD::SCALAR_TO_VECTOR, VT, Op.getOperand(i)); + NumElems >>= 1; + while (NumElems != 0) { + for (unsigned i = 0; i < NumElems; ++i) + V[i] = DAG.getNode(ISD::VECTOR_SHUFFLE, VT, V[i], V[i + NumElems], + UnpckMask); + NumElems >>= 1; + } + return V[0]; + } + + return SDOperand(); +} + +SDOperand +X86TargetLowering::LowerVECTOR_SHUFFLE(SDOperand Op, SelectionDAG &DAG) { + SDOperand V1 = Op.getOperand(0); + SDOperand V2 = Op.getOperand(1); + SDOperand PermMask = Op.getOperand(2); + MVT::ValueType VT = Op.getValueType(); + unsigned NumElems = PermMask.getNumOperands(); + bool V1IsUndef = V1.getOpcode() == ISD::UNDEF; + bool V2IsUndef = V2.getOpcode() == ISD::UNDEF; + + if (isSplatMask(PermMask.Val)) { + if (NumElems <= 4) return Op; + // Promote it to a v4i32 splat. + return PromoteSplat(Op, DAG); + } + + if (X86::isMOVLMask(PermMask.Val)) + return (V1IsUndef) ?
V2 : Op; + + if (X86::isMOVSHDUPMask(PermMask.Val) || + X86::isMOVSLDUPMask(PermMask.Val) || + X86::isMOVHLPSMask(PermMask.Val) || + X86::isMOVHPMask(PermMask.Val) || + X86::isMOVLPMask(PermMask.Val)) + return Op; + + if (ShouldXformToMOVHLPS(PermMask.Val) || + ShouldXformToMOVLP(V1.Val, PermMask.Val)) + return CommuteVectorShuffle(Op, DAG); + + bool V1IsSplat = isSplatVector(V1.Val) || V1.getOpcode() == ISD::UNDEF; + bool V2IsSplat = isSplatVector(V2.Val) || V2.getOpcode() == ISD::UNDEF; + if (V1IsSplat && !V2IsSplat) { + Op = CommuteVectorShuffle(Op, DAG); + V1 = Op.getOperand(0); + V2 = Op.getOperand(1); + PermMask = Op.getOperand(2); + V2IsSplat = true; + } + + if (isCommutedMOVL(PermMask.Val, V2IsSplat)) { + if (V2IsUndef) return V1; + Op = CommuteVectorShuffle(Op, DAG); + V1 = Op.getOperand(0); + V2 = Op.getOperand(1); + PermMask = Op.getOperand(2); + if (V2IsSplat) { + // V2 is a splat, so the mask may be malformed. That is, it may point + // to any V2 element. The instruction selectior won't like this. Get + // a corrected mask and commute to form a proper MOVS{S|D}. + SDOperand NewMask = getMOVLMask(NumElems, DAG); + if (NewMask.Val != PermMask.Val) + Op = DAG.getNode(ISD::VECTOR_SHUFFLE, VT, V1, V2, NewMask); + } + return Op; + } + + if (X86::isUNPCKL_v_undef_Mask(PermMask.Val) || + X86::isUNPCKLMask(PermMask.Val) || + X86::isUNPCKHMask(PermMask.Val)) + return Op; + + if (V2IsSplat) { + // Normalize mask so all entries that point to V2 points to its first + // element then try to match unpck{h|l} again. If match, return a + // new vector_shuffle with the corrected mask. 
+ SDOperand NewMask = NormalizeMask(PermMask, DAG); + if (NewMask.Val != PermMask.Val) { + if (X86::isUNPCKLMask(PermMask.Val, true)) { + SDOperand NewMask = getUnpacklMask(NumElems, DAG); + return DAG.getNode(ISD::VECTOR_SHUFFLE, VT, V1, V2, NewMask); + } else if (X86::isUNPCKHMask(PermMask.Val, true)) { + SDOperand NewMask = getUnpackhMask(NumElems, DAG); + return DAG.getNode(ISD::VECTOR_SHUFFLE, VT, V1, V2, NewMask); + } + } + } + + // Normalize the node to match x86 shuffle ops if needed + if (V2.getOpcode() != ISD::UNDEF) + if (isCommutedSHUFP(PermMask.Val)) { + Op = CommuteVectorShuffle(Op, DAG); + V1 = Op.getOperand(0); + V2 = Op.getOperand(1); + PermMask = Op.getOperand(2); + } + + // If VT is integer, try PSHUF* first, then SHUFP*. + if (MVT::isInteger(VT)) { + if (X86::isPSHUFDMask(PermMask.Val) || + X86::isPSHUFHWMask(PermMask.Val) || + X86::isPSHUFLWMask(PermMask.Val)) { + if (V2.getOpcode() != ISD::UNDEF) + return DAG.getNode(ISD::VECTOR_SHUFFLE, VT, V1, + DAG.getNode(ISD::UNDEF, V1.getValueType()),PermMask); + return Op; + } + + if (X86::isSHUFPMask(PermMask.Val)) + return Op; + + // Handle v8i16 shuffle high / low shuffle node pair. 
+ if (VT == MVT::v8i16 && isPSHUFHW_PSHUFLWMask(PermMask.Val)) { + MVT::ValueType MaskVT = MVT::getIntVectorWithNumElements(NumElems); + MVT::ValueType BaseVT = MVT::getVectorBaseType(MaskVT); + std::vector<SDOperand> MaskVec; + for (unsigned i = 0; i != 4; ++i) + MaskVec.push_back(PermMask.getOperand(i)); + for (unsigned i = 4; i != 8; ++i) + MaskVec.push_back(DAG.getConstant(i, BaseVT)); + SDOperand Mask = DAG.getNode(ISD::BUILD_VECTOR, MaskVT, MaskVec); + V1 = DAG.getNode(ISD::VECTOR_SHUFFLE, VT, V1, V2, Mask); + MaskVec.clear(); + for (unsigned i = 0; i != 4; ++i) + MaskVec.push_back(DAG.getConstant(i, BaseVT)); + for (unsigned i = 4; i != 8; ++i) + MaskVec.push_back(PermMask.getOperand(i)); + Mask = DAG.getNode(ISD::BUILD_VECTOR, MaskVT, MaskVec); + return DAG.getNode(ISD::VECTOR_SHUFFLE, VT, V1, V2, Mask); + } + } else { + // Floating point cases in the other order. + if (X86::isSHUFPMask(PermMask.Val)) + return Op; + if (X86::isPSHUFDMask(PermMask.Val) || + X86::isPSHUFHWMask(PermMask.Val) || + X86::isPSHUFLWMask(PermMask.Val)) { + if (V2.getOpcode() != ISD::UNDEF) + return DAG.getNode(ISD::VECTOR_SHUFFLE, VT, V1, + DAG.getNode(ISD::UNDEF, V1.getValueType()),PermMask); + return Op; + } + } + + if (NumElems == 4) { + // Break it into (shuffle shuffle_hi, shuffle_lo).
+ MVT::ValueType MaskVT = PermMask.getValueType(); + MVT::ValueType MaskEVT = MVT::getVectorBaseType(MaskVT); + std::map<unsigned, std::pair<int, int> > Locs; + std::vector<SDOperand> LoMask(NumElems, DAG.getNode(ISD::UNDEF, MaskEVT)); + std::vector<SDOperand> HiMask(NumElems, DAG.getNode(ISD::UNDEF, MaskEVT)); + std::vector<SDOperand> *MaskPtr = &LoMask; + unsigned MaskIdx = 0; + unsigned LoIdx = 0; + unsigned HiIdx = NumElems/2; + for (unsigned i = 0; i != NumElems; ++i) { + if (i == NumElems/2) { + MaskPtr = &HiMask; + MaskIdx = 1; + LoIdx = 0; + HiIdx = NumElems/2; + } + SDOperand Elt = PermMask.getOperand(i); + if (Elt.getOpcode() == ISD::UNDEF) { + Locs[i] = std::make_pair(-1, -1); + } else if (cast<ConstantSDNode>(Elt)->getValue() < NumElems) { + Locs[i] = std::make_pair(MaskIdx, LoIdx); + (*MaskPtr)[LoIdx] = Elt; + LoIdx++; + } else { + Locs[i] = std::make_pair(MaskIdx, HiIdx); + (*MaskPtr)[HiIdx] = Elt; + HiIdx++; + } + } + + SDOperand LoShuffle = DAG.getNode(ISD::VECTOR_SHUFFLE, VT, V1, V2, + DAG.getNode(ISD::BUILD_VECTOR, MaskVT, LoMask)); + SDOperand HiShuffle = DAG.getNode(ISD::VECTOR_SHUFFLE, VT, V1, V2, + DAG.getNode(ISD::BUILD_VECTOR, MaskVT, HiMask)); + std::vector<SDOperand> MaskOps; + for (unsigned i = 0; i != NumElems; ++i) { + if (Locs[i].first == -1) { + MaskOps.push_back(DAG.getNode(ISD::UNDEF, MaskEVT)); + } else { + unsigned Idx = Locs[i].first * NumElems + Locs[i].second; + MaskOps.push_back(DAG.getConstant(Idx, MaskEVT)); + } + } + return DAG.getNode(ISD::VECTOR_SHUFFLE, VT, LoShuffle, HiShuffle, + DAG.getNode(ISD::BUILD_VECTOR, MaskVT, MaskOps)); + } + + return SDOperand(); +} + +SDOperand +X86TargetLowering::LowerEXTRACT_VECTOR_ELT(SDOperand Op, SelectionDAG &DAG) { + if (!isa<ConstantSDNode>(Op.getOperand(1))) + return SDOperand(); + + MVT::ValueType VT = Op.getValueType(); + // TODO: handle v16i8. + if (MVT::getSizeInBits(VT) == 16) { + // Transform it so it match pextrw which produces a 32-bit result.
+ MVT::ValueType EVT = (MVT::ValueType)(VT+1); + SDOperand Extract = DAG.getNode(X86ISD::PEXTRW, EVT, + Op.getOperand(0), Op.getOperand(1)); + SDOperand Assert = DAG.getNode(ISD::AssertZext, EVT, Extract, + DAG.getValueType(VT)); + return DAG.getNode(ISD::TRUNCATE, VT, Assert); + } else if (MVT::getSizeInBits(VT) == 32) { + SDOperand Vec = Op.getOperand(0); + unsigned Idx = cast<ConstantSDNode>(Op.getOperand(1))->getValue(); + if (Idx == 0) + return Op; + + // SHUFPS the element to the lowest double word, then movss. + MVT::ValueType MaskVT = MVT::getIntVectorWithNumElements(4); + SDOperand IdxNode = DAG.getConstant((Idx < 2) ? Idx : Idx+4, + MVT::getVectorBaseType(MaskVT)); + std::vector<SDOperand> IdxVec; + IdxVec.push_back(DAG.getConstant(Idx, MVT::getVectorBaseType(MaskVT))); + IdxVec.push_back(DAG.getNode(ISD::UNDEF, MVT::getVectorBaseType(MaskVT))); + IdxVec.push_back(DAG.getNode(ISD::UNDEF, MVT::getVectorBaseType(MaskVT))); + IdxVec.push_back(DAG.getNode(ISD::UNDEF, MVT::getVectorBaseType(MaskVT))); + SDOperand Mask = DAG.getNode(ISD::BUILD_VECTOR, MaskVT, IdxVec); + Vec = DAG.getNode(ISD::VECTOR_SHUFFLE, Vec.getValueType(), + Vec, Vec, Mask); + return DAG.getNode(ISD::EXTRACT_VECTOR_ELT, VT, Vec, + DAG.getConstant(0, MVT::i32)); + } else if (MVT::getSizeInBits(VT) == 64) { + SDOperand Vec = Op.getOperand(0); + unsigned Idx = cast<ConstantSDNode>(Op.getOperand(1))->getValue(); + if (Idx == 0) + return Op; + + // UNPCKHPD the element to the lowest double word, then movsd. + // Note if the lower 64 bits of the result of the UNPCKHPD is then stored + // to a f64mem, the whole operation is folded into a single MOVHPDmr.
+ MVT::ValueType MaskVT = MVT::getIntVectorWithNumElements(4); + std::vector<SDOperand> IdxVec; + IdxVec.push_back(DAG.getConstant(1, MVT::getVectorBaseType(MaskVT))); + IdxVec.push_back(DAG.getNode(ISD::UNDEF, MVT::getVectorBaseType(MaskVT))); + SDOperand Mask = DAG.getNode(ISD::BUILD_VECTOR, MaskVT, IdxVec); + Vec = DAG.getNode(ISD::VECTOR_SHUFFLE, Vec.getValueType(), + Vec, DAG.getNode(ISD::UNDEF, Vec.getValueType()), Mask); + return DAG.getNode(ISD::EXTRACT_VECTOR_ELT, VT, Vec, + DAG.getConstant(0, MVT::i32)); + } + + return SDOperand(); +} + +SDOperand +X86TargetLowering::LowerINSERT_VECTOR_ELT(SDOperand Op, SelectionDAG &DAG) { + // Transform it so it match pinsrw which expects a 16-bit value in a R32 + // as its second argument. + MVT::ValueType VT = Op.getValueType(); + MVT::ValueType BaseVT = MVT::getVectorBaseType(VT); + SDOperand N0 = Op.getOperand(0); + SDOperand N1 = Op.getOperand(1); + SDOperand N2 = Op.getOperand(2); + if (MVT::getSizeInBits(BaseVT) == 16) { + if (N1.getValueType() != MVT::i32) + N1 = DAG.getNode(ISD::ANY_EXTEND, MVT::i32, N1); + if (N2.getValueType() != MVT::i32) + N2 = DAG.getConstant(cast<ConstantSDNode>(N2)->getValue(), MVT::i32); + return DAG.getNode(X86ISD::PINSRW, VT, N0, N1, N2); + } else if (MVT::getSizeInBits(BaseVT) == 32) { + unsigned Idx = cast<ConstantSDNode>(N2)->getValue(); + if (Idx == 0) { + // Use a movss. + N1 = DAG.getNode(ISD::SCALAR_TO_VECTOR, VT, N1); + MVT::ValueType MaskVT = MVT::getIntVectorWithNumElements(4); + MVT::ValueType BaseVT = MVT::getVectorBaseType(MaskVT); + std::vector<SDOperand> MaskVec; + MaskVec.push_back(DAG.getConstant(4, BaseVT)); + for (unsigned i = 1; i <= 3; ++i) + MaskVec.push_back(DAG.getConstant(i, BaseVT)); + return DAG.getNode(ISD::VECTOR_SHUFFLE, VT, N0, N1, + DAG.getNode(ISD::BUILD_VECTOR, MaskVT, MaskVec)); + } else { + // Use two pinsrw instructions to insert a 32 bit value. + Idx <<= 1; + if (MVT::isFloatingPoint(N1.getValueType())) { + if (N1.getOpcode() == ISD::LOAD) { + // Just load directly from f32mem to R32.
+          N1 = DAG.getLoad(MVT::i32, N1.getOperand(0), N1.getOperand(1),
+                           N1.getOperand(2));
+        } else {
+          N1 = DAG.getNode(ISD::SCALAR_TO_VECTOR, MVT::v4f32, N1);
+          N1 = DAG.getNode(ISD::BIT_CONVERT, MVT::v4i32, N1);
+          N1 = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, MVT::i32, N1,
+                           DAG.getConstant(0, MVT::i32));
+        }
+      }
+      N0 = DAG.getNode(ISD::BIT_CONVERT, MVT::v8i16, N0);
+      N0 = DAG.getNode(X86ISD::PINSRW, MVT::v8i16, N0, N1,
+                       DAG.getConstant(Idx, MVT::i32));
+      N1 = DAG.getNode(ISD::SRL, MVT::i32, N1, DAG.getConstant(16, MVT::i8));
+      N0 = DAG.getNode(X86ISD::PINSRW, MVT::v8i16, N0, N1,
+                       DAG.getConstant(Idx+1, MVT::i32));
+      return DAG.getNode(ISD::BIT_CONVERT, VT, N0);
+    }
+  }
+
+  return SDOperand();
+}
+
+SDOperand
+X86TargetLowering::LowerSCALAR_TO_VECTOR(SDOperand Op, SelectionDAG &DAG) {
+  SDOperand AnyExt = DAG.getNode(ISD::ANY_EXTEND, MVT::i32, Op.getOperand(0));
+  return DAG.getNode(X86ISD::S2VEC, Op.getValueType(), AnyExt);
+}
+
+// ConstantPool, JumpTable, GlobalAddress, and ExternalSymbol are lowered as
+// their target countpart wrapped in the X86ISD::Wrapper node. Suppose N is
+// one of the above mentioned nodes. It has to be wrapped because otherwise
+// Select(N) returns N. So the raw TargetGlobalAddress nodes, etc. can only
+// be used to form addressing mode. These wrapped nodes will be selected
+// into MOV32ri.
+SDOperand
+X86TargetLowering::LowerConstantPool(SDOperand Op, SelectionDAG &DAG) {
+  ConstantPoolSDNode *CP = cast<ConstantPoolSDNode>(Op);
+  SDOperand Result = DAG.getNode(X86ISD::Wrapper, getPointerTy(),
+                                 DAG.getTargetConstantPool(CP->get(), getPointerTy(),
+                                                           CP->getAlignment()));
+  if (Subtarget->isTargetDarwin()) {
+    // With PIC, the address is actually $g + Offset.
+    if (getTargetMachine().getRelocationModel() == Reloc::PIC)
+      Result = DAG.getNode(ISD::ADD, getPointerTy(),
+                    DAG.getNode(X86ISD::GlobalBaseReg, getPointerTy()), Result);
+  }
+
+  return Result;
+}
+
+SDOperand
+X86TargetLowering::LowerGlobalAddress(SDOperand Op, SelectionDAG &DAG) {
+  GlobalValue *GV = cast<GlobalAddressSDNode>(Op)->getGlobal();
+  SDOperand Result = DAG.getNode(X86ISD::Wrapper, getPointerTy(),
+                                 DAG.getTargetGlobalAddress(GV, getPointerTy()));
+  if (Subtarget->isTargetDarwin()) {
+    // With PIC, the address is actually $g + Offset.
+    if (getTargetMachine().getRelocationModel() == Reloc::PIC)
+      Result = DAG.getNode(ISD::ADD, getPointerTy(),
+                    DAG.getNode(X86ISD::GlobalBaseReg, getPointerTy()), Result);
+
+    // For Darwin, external and weak symbols are indirect, so we want to load
+    // the value at address GV, not the value of GV itself. This means that
+    // the GlobalAddress must be in the base or index register of the address,
+    // not the GV offset field.
+    if (getTargetMachine().getRelocationModel() != Reloc::Static &&
+        DarwinGVRequiresExtraLoad(GV))
+      Result = DAG.getLoad(MVT::i32, DAG.getEntryNode(),
+                           Result, DAG.getSrcValue(NULL));
+  }
+
+  return Result;
+}
+
+SDOperand
+X86TargetLowering::LowerExternalSymbol(SDOperand Op, SelectionDAG &DAG) {
+  const char *Sym = cast<ExternalSymbolSDNode>(Op)->getSymbol();
+  SDOperand Result = DAG.getNode(X86ISD::Wrapper, getPointerTy(),
+                                 DAG.getTargetExternalSymbol(Sym, getPointerTy()));
+  if (Subtarget->isTargetDarwin()) {
+    // With PIC, the address is actually $g + Offset.
+    if (getTargetMachine().getRelocationModel() == Reloc::PIC)
+      Result = DAG.getNode(ISD::ADD, getPointerTy(),
+                    DAG.getNode(X86ISD::GlobalBaseReg, getPointerTy()), Result);
+  }
+
+  return Result;
+}
+
+SDOperand X86TargetLowering::LowerShift(SDOperand Op, SelectionDAG &DAG) {
     assert(Op.getNumOperands() == 3 && Op.getValueType() == MVT::i32 &&
            "Not an i64 shift!");
     bool isSRA = Op.getOpcode() == ISD::SRA_PARTS;
@@ -2300,188 +2820,176 @@
     Ops.push_back(Lo);
     Ops.push_back(Hi);
     return DAG.getNode(ISD::MERGE_VALUES, Tys, Ops);
-  }
-  case ISD::SINT_TO_FP: {
-    assert(Op.getOperand(0).getValueType() <= MVT::i64 &&
-           Op.getOperand(0).getValueType() >= MVT::i16 &&
-           "Unknown SINT_TO_FP to lower!");
-
-    SDOperand Result;
-    MVT::ValueType SrcVT = Op.getOperand(0).getValueType();
-    unsigned Size = MVT::getSizeInBits(SrcVT)/8;
+}
+
+SDOperand X86TargetLowering::LowerSINT_TO_FP(SDOperand Op, SelectionDAG &DAG) {
+  assert(Op.getOperand(0).getValueType() <= MVT::i64 &&
+         Op.getOperand(0).getValueType() >= MVT::i16 &&
+         "Unknown SINT_TO_FP to lower!");
+
+  SDOperand Result;
+  MVT::ValueType SrcVT = Op.getOperand(0).getValueType();
+  unsigned Size = MVT::getSizeInBits(SrcVT)/8;
+  MachineFunction &MF = DAG.getMachineFunction();
+  int SSFI = MF.getFrameInfo()->CreateStackObject(Size, Size);
+  SDOperand StackSlot = DAG.getFrameIndex(SSFI, getPointerTy());
+  SDOperand Chain = DAG.getNode(ISD::STORE, MVT::Other,
+                                DAG.getEntryNode(), Op.getOperand(0),
+                                StackSlot, DAG.getSrcValue(NULL));
+
+  // Build the FILD
+  std::vector<MVT::ValueType> Tys;
+  Tys.push_back(MVT::f64);
+  Tys.push_back(MVT::Other);
+  if (X86ScalarSSE) Tys.push_back(MVT::Flag);
+  std::vector<SDOperand> Ops;
+  Ops.push_back(Chain);
+  Ops.push_back(StackSlot);
+  Ops.push_back(DAG.getValueType(SrcVT));
+  Result = DAG.getNode(X86ScalarSSE ? X86ISD::FILD_FLAG :X86ISD::FILD,
+                       Tys, Ops);
+
+  if (X86ScalarSSE) {
+    Chain = Result.getValue(1);
+    SDOperand InFlag = Result.getValue(2);
+
+    // FIXME: Currently the FST is flagged to the FILD_FLAG. This
+    // shouldn't be necessary except that RFP cannot be live across
+    // multiple blocks. When stackifier is fixed, they can be uncoupled.
     MachineFunction &MF = DAG.getMachineFunction();
-    int SSFI = MF.getFrameInfo()->CreateStackObject(Size, Size);
+    int SSFI = MF.getFrameInfo()->CreateStackObject(8, 8);
     SDOperand StackSlot = DAG.getFrameIndex(SSFI, getPointerTy());
-    SDOperand Chain = DAG.getNode(ISD::STORE, MVT::Other,
-                                  DAG.getEntryNode(), Op.getOperand(0),
-                                  StackSlot, DAG.getSrcValue(NULL));
-
-    // Build the FILD
     std::vector<MVT::ValueType> Tys;
-    Tys.push_back(MVT::f64);
     Tys.push_back(MVT::Other);
-    if (X86ScalarSSE) Tys.push_back(MVT::Flag);
     std::vector<SDOperand> Ops;
     Ops.push_back(Chain);
+    Ops.push_back(Result);
     Ops.push_back(StackSlot);
-    Ops.push_back(DAG.getValueType(SrcVT));
-    Result = DAG.getNode(X86ScalarSSE ? X86ISD::FILD_FLAG :X86ISD::FILD,
-                         Tys, Ops);
-
-    if (X86ScalarSSE) {
-      Chain = Result.getValue(1);
-      SDOperand InFlag = Result.getValue(2);
-
-      // FIXME: Currently the FST is flagged to the FILD_FLAG. This
-      // shouldn't be necessary except that RFP cannot be live across
-      // multiple blocks. When stackifier is fixed, they can be uncoupled.
-      MachineFunction &MF = DAG.getMachineFunction();
-      int SSFI = MF.getFrameInfo()->CreateStackObject(8, 8);
-      SDOperand StackSlot = DAG.getFrameIndex(SSFI, getPointerTy());
-      std::vector<MVT::ValueType> Tys;
-      Tys.push_back(MVT::Other);
-      std::vector<SDOperand> Ops;
-      Ops.push_back(Chain);
-      Ops.push_back(Result);
-      Ops.push_back(StackSlot);
-      Ops.push_back(DAG.getValueType(Op.getValueType()));
-      Ops.push_back(InFlag);
-      Chain = DAG.getNode(X86ISD::FST, Tys, Ops);
-      Result = DAG.getLoad(Op.getValueType(), Chain, StackSlot,
-                           DAG.getSrcValue(NULL));
-    }
+    Ops.push_back(DAG.getValueType(Op.getValueType()));
+    Ops.push_back(InFlag);
+    Chain = DAG.getNode(X86ISD::FST, Tys, Ops);
+    Result = DAG.getLoad(Op.getValueType(), Chain, StackSlot,
+                         DAG.getSrcValue(NULL));
+  }
-    return Result;
-  }
-  case ISD::FP_TO_SINT: {
-    assert(Op.getValueType() <= MVT::i64 && Op.getValueType() >= MVT::i16 &&
-           "Unknown FP_TO_SINT to lower!");
-    // We lower FP->sint64 into FISTP64, followed by a load, all to a temporary
-    // stack slot.
-    MachineFunction &MF = DAG.getMachineFunction();
-    unsigned MemSize = MVT::getSizeInBits(Op.getValueType())/8;
-    int SSFI = MF.getFrameInfo()->CreateStackObject(MemSize, MemSize);
-    SDOperand StackSlot = DAG.getFrameIndex(SSFI, getPointerTy());
+  return Result;
+}
+
+SDOperand X86TargetLowering::LowerFP_TO_SINT(SDOperand Op, SelectionDAG &DAG) {
+  assert(Op.getValueType() <= MVT::i64 && Op.getValueType() >= MVT::i16 &&
+         "Unknown FP_TO_SINT to lower!");
+  // We lower FP->sint64 into FISTP64, followed by a load, all to a temporary
+  // stack slot.
+  MachineFunction &MF = DAG.getMachineFunction();
+  unsigned MemSize = MVT::getSizeInBits(Op.getValueType())/8;
+  int SSFI = MF.getFrameInfo()->CreateStackObject(MemSize, MemSize);
+  SDOperand StackSlot = DAG.getFrameIndex(SSFI, getPointerTy());
-    unsigned Opc;
-    switch (Op.getValueType()) {
+  unsigned Opc;
+  switch (Op.getValueType()) {
     default: assert(0 && "Invalid FP_TO_SINT to lower!");
     case MVT::i16: Opc = X86ISD::FP_TO_INT16_IN_MEM; break;
     case MVT::i32: Opc = X86ISD::FP_TO_INT32_IN_MEM; break;
     case MVT::i64: Opc = X86ISD::FP_TO_INT64_IN_MEM; break;
-    }
-
-    SDOperand Chain = DAG.getEntryNode();
-    SDOperand Value = Op.getOperand(0);
-    if (X86ScalarSSE) {
-      assert(Op.getValueType() == MVT::i64 && "Invalid FP_TO_SINT to lower!");
-      Chain = DAG.getNode(ISD::STORE, MVT::Other, Chain, Value, StackSlot,
-                          DAG.getSrcValue(0));
-      std::vector<MVT::ValueType> Tys;
-      Tys.push_back(MVT::f64);
-      Tys.push_back(MVT::Other);
-      std::vector<SDOperand> Ops;
-      Ops.push_back(Chain);
-      Ops.push_back(StackSlot);
-      Ops.push_back(DAG.getValueType(Op.getOperand(0).getValueType()));
-      Value = DAG.getNode(X86ISD::FLD, Tys, Ops);
-      Chain = Value.getValue(1);
-      SSFI = MF.getFrameInfo()->CreateStackObject(MemSize, MemSize);
-      StackSlot = DAG.getFrameIndex(SSFI, getPointerTy());
-    }
+  }
-    // Build the FP_TO_INT*_IN_MEM
+  SDOperand Chain = DAG.getEntryNode();
+  SDOperand Value = Op.getOperand(0);
+  if (X86ScalarSSE) {
+    assert(Op.getValueType() == MVT::i64 && "Invalid FP_TO_SINT to lower!");
+    Chain = DAG.getNode(ISD::STORE, MVT::Other, Chain, Value, StackSlot,
+                        DAG.getSrcValue(0));
+    std::vector<MVT::ValueType> Tys;
+    Tys.push_back(MVT::f64);
+    Tys.push_back(MVT::Other);
     std::vector<SDOperand> Ops;
     Ops.push_back(Chain);
-    Ops.push_back(Value);
     Ops.push_back(StackSlot);
-    SDOperand FIST = DAG.getNode(Opc, MVT::Other, Ops);
+    Ops.push_back(DAG.getValueType(Op.getOperand(0).getValueType()));
+    Value = DAG.getNode(X86ISD::FLD, Tys, Ops);
+    Chain = Value.getValue(1);
+    SSFI = MF.getFrameInfo()->CreateStackObject(MemSize, MemSize);
+    StackSlot = DAG.getFrameIndex(SSFI, getPointerTy());
+  }
+
+  // Build the FP_TO_INT*_IN_MEM
+  std::vector<SDOperand> Ops;
+  Ops.push_back(Chain);
+  Ops.push_back(Value);
+  Ops.push_back(StackSlot);
+  SDOperand FIST = DAG.getNode(Opc, MVT::Other, Ops);
+
+  // Load the result.
+  return DAG.getLoad(Op.getValueType(), FIST, StackSlot,
+                     DAG.getSrcValue(NULL));
+}
-    // Load the result.
-    return DAG.getLoad(Op.getValueType(), FIST, StackSlot,
-                       DAG.getSrcValue(NULL));
-  }
-  case ISD::READCYCLECOUNTER: {
-    std::vector<MVT::ValueType> Tys;
-    Tys.push_back(MVT::Other);
-    Tys.push_back(MVT::Flag);
-    std::vector<SDOperand> Ops;
-    Ops.push_back(Op.getOperand(0));
-    SDOperand rd = DAG.getNode(X86ISD::RDTSC_DAG, Tys, Ops);
-    Ops.clear();
-    Ops.push_back(DAG.getCopyFromReg(rd, X86::EAX, MVT::i32, rd.getValue(1)));
-    Ops.push_back(DAG.getCopyFromReg(Ops[0].getValue(1), X86::EDX,
-                                     MVT::i32, Ops[0].getValue(2)));
-    Ops.push_back(Ops[1].getValue(1));
-    Tys[0] = Tys[1] = MVT::i32;
-    Tys.push_back(MVT::Other);
-    return DAG.getNode(ISD::MERGE_VALUES, Tys, Ops);
-  }
-  case ISD::FABS: {
-    MVT::ValueType VT = Op.getValueType();
-    const Type *OpNTy = MVT::getTypeForValueType(VT);
-    std::vector<Constant*> CV;
-    if (VT == MVT::f64) {
-      CV.push_back(ConstantFP::get(OpNTy, BitsToDouble(~(1ULL << 63))));
-      CV.push_back(ConstantFP::get(OpNTy, 0.0));
-    } else {
-      CV.push_back(ConstantFP::get(OpNTy, BitsToFloat(~(1U << 31))));
-      CV.push_back(ConstantFP::get(OpNTy, 0.0));
-      CV.push_back(ConstantFP::get(OpNTy, 0.0));
-      CV.push_back(ConstantFP::get(OpNTy, 0.0));
-    }
-    Constant *CS = ConstantStruct::get(CV);
-    SDOperand CPIdx = DAG.getConstantPool(CS, getPointerTy(), 4);
-    SDOperand Mask
-      = DAG.getNode(X86ISD::LOAD_PACK,
-                    VT, DAG.getEntryNode(), CPIdx, DAG.getSrcValue(NULL));
-    return DAG.getNode(X86ISD::FAND, VT, Op.getOperand(0), Mask);
-  }
-  case ISD::FNEG: {
-    MVT::ValueType VT = Op.getValueType();
-    const Type *OpNTy = MVT::getTypeForValueType(VT);
-    std::vector<Constant*> CV;
-    if (VT == MVT::f64) {
-      CV.push_back(ConstantFP::get(OpNTy, BitsToDouble(1ULL << 63)));
-      CV.push_back(ConstantFP::get(OpNTy, 0.0));
-    } else {
-      CV.push_back(ConstantFP::get(OpNTy, BitsToFloat(1U << 31)));
-      CV.push_back(ConstantFP::get(OpNTy, 0.0));
-      CV.push_back(ConstantFP::get(OpNTy, 0.0));
-      CV.push_back(ConstantFP::get(OpNTy, 0.0));
-    }
-    Constant *CS = ConstantStruct::get(CV);
-    SDOperand CPIdx = DAG.getConstantPool(CS, getPointerTy(), 4);
-    SDOperand Mask
-      = DAG.getNode(X86ISD::LOAD_PACK,
-                    VT, DAG.getEntryNode(), CPIdx, DAG.getSrcValue(NULL));
-    return DAG.getNode(X86ISD::FXOR, VT, Op.getOperand(0), Mask);
-  }
-  case ISD::SETCC: {
-    assert(Op.getValueType() == MVT::i8 && "SetCC type must be 8-bit integer");
-    SDOperand Cond;
-    SDOperand CC = Op.getOperand(2);
-    ISD::CondCode SetCCOpcode = cast<CondCodeSDNode>(CC)->get();
-    bool isFP = MVT::isFloatingPoint(Op.getOperand(1).getValueType());
-    bool Flip;
-    unsigned X86CC;
-    if (translateX86CC(CC, isFP, X86CC, Flip)) {
-      if (Flip)
-        Cond = DAG.getNode(X86ISD::CMP, MVT::Flag,
-                           Op.getOperand(1), Op.getOperand(0));
-      else
-        Cond = DAG.getNode(X86ISD::CMP, MVT::Flag,
-                           Op.getOperand(0), Op.getOperand(1));
-      return DAG.getNode(X86ISD::SETCC, MVT::i8,
-                         DAG.getConstant(X86CC, MVT::i8), Cond);
-    } else {
-      assert(isFP && "Illegal integer SetCC!");
+SDOperand X86TargetLowering::LowerFABS(SDOperand Op, SelectionDAG &DAG) {
+  MVT::ValueType VT = Op.getValueType();
+  const Type *OpNTy = MVT::getTypeForValueType(VT);
+  std::vector<Constant*> CV;
+  if (VT == MVT::f64) {
+    CV.push_back(ConstantFP::get(OpNTy, BitsToDouble(~(1ULL << 63))));
+    CV.push_back(ConstantFP::get(OpNTy, 0.0));
+  } else {
+    CV.push_back(ConstantFP::get(OpNTy, BitsToFloat(~(1U << 31))));
+    CV.push_back(ConstantFP::get(OpNTy, 0.0));
+    CV.push_back(ConstantFP::get(OpNTy, 0.0));
+    CV.push_back(ConstantFP::get(OpNTy, 0.0));
+  }
+  Constant *CS = ConstantStruct::get(CV);
+  SDOperand CPIdx = DAG.getConstantPool(CS, getPointerTy(), 4);
+  SDOperand Mask
+    = DAG.getNode(X86ISD::LOAD_PACK,
+                  VT, DAG.getEntryNode(), CPIdx, DAG.getSrcValue(NULL));
+  return DAG.getNode(X86ISD::FAND, VT, Op.getOperand(0), Mask);
+}
+SDOperand X86TargetLowering::LowerFNEG(SDOperand Op, SelectionDAG &DAG) {
+  MVT::ValueType VT = Op.getValueType();
+  const Type *OpNTy = MVT::getTypeForValueType(VT);
+  std::vector<Constant*> CV;
+  if (VT == MVT::f64) {
+    CV.push_back(ConstantFP::get(OpNTy, BitsToDouble(1ULL << 63)));
+    CV.push_back(ConstantFP::get(OpNTy, 0.0));
+  } else {
+    CV.push_back(ConstantFP::get(OpNTy, BitsToFloat(1U << 31)));
+    CV.push_back(ConstantFP::get(OpNTy, 0.0));
+    CV.push_back(ConstantFP::get(OpNTy, 0.0));
+    CV.push_back(ConstantFP::get(OpNTy, 0.0));
+  }
+  Constant *CS = ConstantStruct::get(CV);
+  SDOperand CPIdx = DAG.getConstantPool(CS, getPointerTy(), 4);
+  SDOperand Mask = DAG.getNode(X86ISD::LOAD_PACK,
+                               VT, DAG.getEntryNode(), CPIdx, DAG.getSrcValue(NULL));
+  return DAG.getNode(X86ISD::FXOR, VT, Op.getOperand(0), Mask);
+}
+
+SDOperand X86TargetLowering::LowerSETCC(SDOperand Op, SelectionDAG &DAG) {
+  assert(Op.getValueType() == MVT::i8 && "SetCC type must be 8-bit integer");
+  SDOperand Cond;
+  SDOperand CC = Op.getOperand(2);
+  ISD::CondCode SetCCOpcode = cast<CondCodeSDNode>(CC)->get();
+  bool isFP = MVT::isFloatingPoint(Op.getOperand(1).getValueType());
+  bool Flip;
+  unsigned X86CC;
+  if (translateX86CC(CC, isFP, X86CC, Flip)) {
+    if (Flip)
+      Cond = DAG.getNode(X86ISD::CMP, MVT::Flag,
+                         Op.getOperand(1), Op.getOperand(0));
+    else
       Cond = DAG.getNode(X86ISD::CMP, MVT::Flag,
                          Op.getOperand(0), Op.getOperand(1));
-      std::vector<MVT::ValueType> Tys;
-      std::vector<SDOperand> Ops;
-      switch (SetCCOpcode) {
+    return DAG.getNode(X86ISD::SETCC, MVT::i8,
+                       DAG.getConstant(X86CC, MVT::i8), Cond);
+  } else {
+    assert(isFP && "Illegal integer SetCC!");
+
+    Cond = DAG.getNode(X86ISD::CMP, MVT::Flag,
+                       Op.getOperand(0), Op.getOperand(1));
+    std::vector<MVT::ValueType> Tys;
+    std::vector<SDOperand> Ops;
+    switch (SetCCOpcode) {
       default: assert(false && "Illegal floating point SetCC!");
       case ISD::SETOEQ: { // !PF & ZF
         Tys.push_back(MVT::i8);
@@ -2505,453 +3013,140 @@
                            Tmp1.getValue(1));
         return DAG.getNode(ISD::OR, MVT::i8, Tmp1, Tmp2);
       }
-      }
     }
   }
-  case ISD::SELECT: {
-    MVT::ValueType VT = Op.getValueType();
-    bool isFPStack = MVT::isFloatingPoint(VT) && !X86ScalarSSE;
-    bool addTest = false;
-    SDOperand Op0 = Op.getOperand(0);
-    SDOperand Cond, CC;
-    if (Op0.getOpcode() == ISD::SETCC)
-      Op0 = LowerOperation(Op0, DAG);
-
-    if (Op0.getOpcode() == X86ISD::SETCC) {
-      // If condition flag is set by a X86ISD::CMP, then make a copy of it
-      // (since flag operand cannot be shared). If the X86ISD::SETCC does not
-      // have another use it will be eliminated.
-      // If the X86ISD::SETCC has more than one use, then it's probably better
-      // to use a test instead of duplicating the X86ISD::CMP (for register
-      // pressure reason).
-      unsigned CmpOpc = Op0.getOperand(1).getOpcode();
-      if (CmpOpc == X86ISD::CMP || CmpOpc == X86ISD::COMI ||
-          CmpOpc == X86ISD::UCOMI) {
-        if (!Op0.hasOneUse()) {
-          std::vector<MVT::ValueType> Tys;
-          for (unsigned i = 0; i < Op0.Val->getNumValues(); ++i)
-            Tys.push_back(Op0.Val->getValueType(i));
-          std::vector<SDOperand> Ops;
-          for (unsigned i = 0; i < Op0.getNumOperands(); ++i)
-            Ops.push_back(Op0.getOperand(i));
-          Op0 = DAG.getNode(X86ISD::SETCC, Tys, Ops);
-        }
-
-        CC = Op0.getOperand(0);
-        Cond = Op0.getOperand(1);
-        // Make a copy as flag result cannot be used by more than one.
-        Cond = DAG.getNode(CmpOpc, MVT::Flag,
-                           Cond.getOperand(0), Cond.getOperand(1));
-        addTest =
-          isFPStack && !hasFPCMov(cast<ConstantSDNode>(CC)->getSignExtended());
-      } else
-        addTest = true;
-    } else
-      addTest = true;
-
-    if (addTest) {
-      CC = DAG.getConstant(X86ISD::COND_NE, MVT::i8);
-      Cond = DAG.getNode(X86ISD::TEST, MVT::Flag, Op0, Op0);
-    }
-
-    std::vector<MVT::ValueType> Tys;
-    Tys.push_back(Op.getValueType());
-    Tys.push_back(MVT::Flag);
-    std::vector<SDOperand> Ops;
-    // X86ISD::CMOV means set the result (which is operand 1) to the RHS if
-    // condition is true.
-    Ops.push_back(Op.getOperand(2));
-    Ops.push_back(Op.getOperand(1));
-    Ops.push_back(CC);
-    Ops.push_back(Cond);
-    return DAG.getNode(X86ISD::CMOV, Tys, Ops);
-  }
-  case ISD::BRCOND: {
-    bool addTest = false;
-    SDOperand Cond = Op.getOperand(1);
-    SDOperand Dest = Op.getOperand(2);
-    SDOperand CC;
-    if (Cond.getOpcode() == ISD::SETCC)
-      Cond = LowerOperation(Cond, DAG);
-
-    if (Cond.getOpcode() == X86ISD::SETCC) {
-      // If condition flag is set by a X86ISD::CMP, then make a copy of it
-      // (since flag operand cannot be shared). If the X86ISD::SETCC does not
-      // have another use it will be eliminated.
-      // If the X86ISD::SETCC has more than one use, then it's probably better
-      // to use a test instead of duplicating the X86ISD::CMP (for register
-      // pressure reason).
-      unsigned CmpOpc = Cond.getOperand(1).getOpcode();
-      if (CmpOpc == X86ISD::CMP || CmpOpc == X86ISD::COMI ||
-          CmpOpc == X86ISD::UCOMI) {
-        if (!Cond.hasOneUse()) {
-          std::vector<MVT::ValueType> Tys;
-          for (unsigned i = 0; i < Cond.Val->getNumValues(); ++i)
-            Tys.push_back(Cond.Val->getValueType(i));
-          std::vector<SDOperand> Ops;
-          for (unsigned i = 0; i < Cond.getNumOperands(); ++i)
-            Ops.push_back(Cond.getOperand(i));
-          Cond = DAG.getNode(X86ISD::SETCC, Tys, Ops);
-        }
-
-        CC = Cond.getOperand(0);
-        Cond = Cond.getOperand(1);
-        // Make a copy as flag result cannot be used by more than one.
-        Cond = DAG.getNode(CmpOpc, MVT::Flag,
-                           Cond.getOperand(0), Cond.getOperand(1));
-      } else
-        addTest = true;
-    } else
-      addTest = true;
-
-    if (addTest) {
-      CC = DAG.getConstant(X86ISD::COND_NE, MVT::i8);
-      Cond = DAG.getNode(X86ISD::TEST, MVT::Flag, Cond, Cond);
-    }
-    return DAG.getNode(X86ISD::BRCOND, Op.getValueType(),
-                       Op.getOperand(0), Op.getOperand(2), CC, Cond);
-  }
-  case ISD::MEMSET: {
-    SDOperand InFlag(0, 0);
-    SDOperand Chain = Op.getOperand(0);
-    unsigned Align =
-      (unsigned)cast<ConstantSDNode>(Op.getOperand(4))->getValue();
-    if (Align == 0) Align = 1;
-
-    ConstantSDNode *I = dyn_cast<ConstantSDNode>(Op.getOperand(3));
-    // If not DWORD aligned, call memset if size is less than the threshold.
-    // It knows how to align to the right boundary first.
-    if ((Align & 3) != 0 ||
-        (I && I->getValue() < Subtarget->getMinRepStrSizeThreshold())) {
-      MVT::ValueType IntPtr = getPointerTy();
-      const Type *IntPtrTy = getTargetData().getIntPtrType();
-      std::vector<std::pair<SDOperand, const Type*> > Args;
-      Args.push_back(std::make_pair(Op.getOperand(1), IntPtrTy));
-      // Extend the ubyte argument to be an int value for the call.
-      SDOperand Val = DAG.getNode(ISD::ZERO_EXTEND, MVT::i32, Op.getOperand(2));
-      Args.push_back(std::make_pair(Val, IntPtrTy));
-      Args.push_back(std::make_pair(Op.getOperand(3), IntPtrTy));
-      std::pair<SDOperand,SDOperand> CallResult =
-        LowerCallTo(Chain, Type::VoidTy, false, CallingConv::C, false,
-                    DAG.getExternalSymbol("memset", IntPtr), Args, DAG);
-      return CallResult.second;
-    }
-
-    MVT::ValueType AVT;
-    SDOperand Count;
-    ConstantSDNode *ValC = dyn_cast<ConstantSDNode>(Op.getOperand(2));
-    unsigned BytesLeft = 0;
-    bool TwoRepStos = false;
-    if (ValC) {
-      unsigned ValReg;
-      unsigned Val = ValC->getValue() & 255;
-
-      // If the value is a constant, then we can potentially use larger sets.
-      switch (Align & 3) {
-      case 2: // WORD aligned
-        AVT = MVT::i16;
-        Count = DAG.getConstant(I->getValue() / 2, MVT::i32);
-        BytesLeft = I->getValue() % 2;
-        Val = (Val << 8) | Val;
-        ValReg = X86::AX;
-        break;
-      case 0: // DWORD aligned
-        AVT = MVT::i32;
-        if (I) {
-          Count = DAG.getConstant(I->getValue() / 4, MVT::i32);
-          BytesLeft = I->getValue() % 4;
-        } else {
-          Count = DAG.getNode(ISD::SRL, MVT::i32, Op.getOperand(3),
-                              DAG.getConstant(2, MVT::i8));
-          TwoRepStos = true;
-        }
-        Val = (Val << 8) | Val;
-        Val = (Val << 16) | Val;
-        ValReg = X86::EAX;
-        break;
-      default: // Byte aligned
-        AVT = MVT::i8;
-        Count = Op.getOperand(3);
-        ValReg = X86::AL;
-        break;
-      }
-
-      Chain = DAG.getCopyToReg(Chain, ValReg, DAG.getConstant(Val, AVT),
-                               InFlag);
-      InFlag = Chain.getValue(1);
-    } else {
-      AVT = MVT::i8;
-      Count = Op.getOperand(3);
-      Chain = DAG.getCopyToReg(Chain, X86::AL, Op.getOperand(2), InFlag);
-      InFlag = Chain.getValue(1);
-    }
-
-    Chain = DAG.getCopyToReg(Chain, X86::ECX, Count, InFlag);
-    InFlag = Chain.getValue(1);
-    Chain = DAG.getCopyToReg(Chain, X86::EDI, Op.getOperand(1), InFlag);
-    InFlag = Chain.getValue(1);
-
-    std::vector<MVT::ValueType> Tys;
-    Tys.push_back(MVT::Other);
-    Tys.push_back(MVT::Flag);
-    std::vector<SDOperand> Ops;
-    Ops.push_back(Chain);
-    Ops.push_back(DAG.getValueType(AVT));
-    Ops.push_back(InFlag);
-    Chain = DAG.getNode(X86ISD::REP_STOS, Tys, Ops);
-
-    if (TwoRepStos) {
-      InFlag = Chain.getValue(1);
-      Count = Op.getOperand(3);
-      MVT::ValueType CVT = Count.getValueType();
-      SDOperand Left = DAG.getNode(ISD::AND, CVT, Count,
-                                   DAG.getConstant(3, CVT));
-      Chain = DAG.getCopyToReg(Chain, X86::ECX, Left, InFlag);
-      InFlag = Chain.getValue(1);
-      Tys.clear();
-      Tys.push_back(MVT::Other);
-      Tys.push_back(MVT::Flag);
-      Ops.clear();
-      Ops.push_back(Chain);
-      Ops.push_back(DAG.getValueType(MVT::i8));
-      Ops.push_back(InFlag);
-      Chain = DAG.getNode(X86ISD::REP_STOS, Tys, Ops);
-    } else if (BytesLeft) {
-      // Issue stores for the last 1 - 3 bytes.
-      SDOperand Value;
-      unsigned Val = ValC->getValue() & 255;
-      unsigned Offset = I->getValue() - BytesLeft;
-      SDOperand DstAddr = Op.getOperand(1);
-      MVT::ValueType AddrVT = DstAddr.getValueType();
-      if (BytesLeft >= 2) {
-        Value = DAG.getConstant((Val << 8) | Val, MVT::i16);
-        Chain = DAG.getNode(ISD::STORE, MVT::Other, Chain, Value,
-                            DAG.getNode(ISD::ADD, AddrVT, DstAddr,
-                                        DAG.getConstant(Offset, AddrVT)),
-                            DAG.getSrcValue(NULL));
-        BytesLeft -= 2;
-        Offset += 2;
-      }
-
-      if (BytesLeft == 1) {
-        Value = DAG.getConstant(Val, MVT::i8);
-        Chain = DAG.getNode(ISD::STORE, MVT::Other, Chain, Value,
-                            DAG.getNode(ISD::ADD, AddrVT, DstAddr,
-                                        DAG.getConstant(Offset, AddrVT)),
-                            DAG.getSrcValue(NULL));
-      }
-    }
+}
-    return Chain;
-  }
-  case ISD::MEMCPY: {
-    SDOperand Chain = Op.getOperand(0);
-    unsigned Align =
-      (unsigned)cast<ConstantSDNode>(Op.getOperand(4))->getValue();
-    if (Align == 0) Align = 1;
-
-    ConstantSDNode *I = dyn_cast<ConstantSDNode>(Op.getOperand(3));
-    // If not DWORD aligned, call memcpy if size is less than the threshold.
-    // It knows how to align to the right boundary first.
-    if ((Align & 3) != 0 ||
-        (I && I->getValue() < Subtarget->getMinRepStrSizeThreshold())) {
-      MVT::ValueType IntPtr = getPointerTy();
-      const Type *IntPtrTy = getTargetData().getIntPtrType();
-      std::vector<std::pair<SDOperand, const Type*> > Args;
-      Args.push_back(std::make_pair(Op.getOperand(1), IntPtrTy));
-      Args.push_back(std::make_pair(Op.getOperand(2), IntPtrTy));
-      Args.push_back(std::make_pair(Op.getOperand(3), IntPtrTy));
-      std::pair<SDOperand,SDOperand> CallResult =
-        LowerCallTo(Chain, Type::VoidTy, false, CallingConv::C, false,
-                    DAG.getExternalSymbol("memcpy", IntPtr), Args, DAG);
-      return CallResult.second;
-    }
-
-    MVT::ValueType AVT;
-    SDOperand Count;
-    unsigned BytesLeft = 0;
-    bool TwoRepMovs = false;
-    switch (Align & 3) {
-    case 2: // WORD aligned
-      AVT = MVT::i16;
-      Count = DAG.getConstant(I->getValue() / 2, MVT::i32);
-      BytesLeft = I->getValue() % 2;
-      break;
-    case 0: // DWORD aligned
-      AVT = MVT::i32;
-      if (I) {
-        Count = DAG.getConstant(I->getValue() / 4, MVT::i32);
-        BytesLeft = I->getValue() % 4;
-      } else {
-        Count = DAG.getNode(ISD::SRL, MVT::i32, Op.getOperand(3),
-                            DAG.getConstant(2, MVT::i8));
-        TwoRepMovs = true;
+SDOperand X86TargetLowering::LowerSELECT(SDOperand Op, SelectionDAG &DAG) {
+  MVT::ValueType VT = Op.getValueType();
+  bool isFPStack = MVT::isFloatingPoint(VT) && !X86ScalarSSE;
+  bool addTest = false;
+  SDOperand Op0 = Op.getOperand(0);
+  SDOperand Cond, CC;
+  if (Op0.getOpcode() == ISD::SETCC)
+    Op0 = LowerOperation(Op0, DAG);
+
+  if (Op0.getOpcode() == X86ISD::SETCC) {
+    // If condition flag is set by a X86ISD::CMP, then make a copy of it
+    // (since flag operand cannot be shared). If the X86ISD::SETCC does not
+    // have another use it will be eliminated.
+    // If the X86ISD::SETCC has more than one use, then it's probably better
+    // to use a test instead of duplicating the X86ISD::CMP (for register
+    // pressure reason).
+    unsigned CmpOpc = Op0.getOperand(1).getOpcode();
+    if (CmpOpc == X86ISD::CMP || CmpOpc == X86ISD::COMI ||
+        CmpOpc == X86ISD::UCOMI) {
+      if (!Op0.hasOneUse()) {
+        std::vector<MVT::ValueType> Tys;
+        for (unsigned i = 0; i < Op0.Val->getNumValues(); ++i)
+          Tys.push_back(Op0.Val->getValueType(i));
+        std::vector<SDOperand> Ops;
+        for (unsigned i = 0; i < Op0.getNumOperands(); ++i)
+          Ops.push_back(Op0.getOperand(i));
+        Op0 = DAG.getNode(X86ISD::SETCC, Tys, Ops);
       }
-      break;
-    default: // Byte aligned
-      AVT = MVT::i8;
-      Count = Op.getOperand(3);
-      break;
-    }
-    SDOperand InFlag(0, 0);
-    Chain = DAG.getCopyToReg(Chain, X86::ECX, Count, InFlag);
-    InFlag = Chain.getValue(1);
-    Chain = DAG.getCopyToReg(Chain, X86::EDI, Op.getOperand(1), InFlag);
-    InFlag = Chain.getValue(1);
-    Chain = DAG.getCopyToReg(Chain, X86::ESI, Op.getOperand(2), InFlag);
-    InFlag = Chain.getValue(1);
+      CC = Op0.getOperand(0);
+      Cond = Op0.getOperand(1);
+      // Make a copy as flag result cannot be used by more than one.
+      Cond = DAG.getNode(CmpOpc, MVT::Flag,
+                         Cond.getOperand(0), Cond.getOperand(1));
+      addTest =
+        isFPStack && !hasFPCMov(cast<ConstantSDNode>(CC)->getSignExtended());
+    } else
+      addTest = true;
+  } else
+    addTest = true;
-    std::vector<MVT::ValueType> Tys;
-    Tys.push_back(MVT::Other);
-    Tys.push_back(MVT::Flag);
-    std::vector<SDOperand> Ops;
-    Ops.push_back(Chain);
-    Ops.push_back(DAG.getValueType(AVT));
-    Ops.push_back(InFlag);
-    Chain = DAG.getNode(X86ISD::REP_MOVS, Tys, Ops);
+  if (addTest) {
+    CC = DAG.getConstant(X86ISD::COND_NE, MVT::i8);
+    Cond = DAG.getNode(X86ISD::TEST, MVT::Flag, Op0, Op0);
+  }
+
+  std::vector<MVT::ValueType> Tys;
+  Tys.push_back(Op.getValueType());
+  Tys.push_back(MVT::Flag);
+  std::vector<SDOperand> Ops;
+  // X86ISD::CMOV means set the result (which is operand 1) to the RHS if
+  // condition is true.
+  Ops.push_back(Op.getOperand(2));
+  Ops.push_back(Op.getOperand(1));
+  Ops.push_back(CC);
+  Ops.push_back(Cond);
+  return DAG.getNode(X86ISD::CMOV, Tys, Ops);
+}
-    if (TwoRepMovs) {
-      InFlag = Chain.getValue(1);
-      Count = Op.getOperand(3);
-      MVT::ValueType CVT = Count.getValueType();
-      SDOperand Left = DAG.getNode(ISD::AND, CVT, Count,
-                                   DAG.getConstant(3, CVT));
-      Chain = DAG.getCopyToReg(Chain, X86::ECX, Left, InFlag);
-      InFlag = Chain.getValue(1);
-      Tys.clear();
-      Tys.push_back(MVT::Other);
-      Tys.push_back(MVT::Flag);
-      Ops.clear();
-      Ops.push_back(Chain);
-      Ops.push_back(DAG.getValueType(MVT::i8));
-      Ops.push_back(InFlag);
-      Chain = DAG.getNode(X86ISD::REP_MOVS, Tys, Ops);
-    } else if (BytesLeft) {
-      // Issue loads and stores for the last 1 - 3 bytes.
-      unsigned Offset = I->getValue() - BytesLeft;
-      SDOperand DstAddr = Op.getOperand(1);
-      MVT::ValueType DstVT = DstAddr.getValueType();
-      SDOperand SrcAddr = Op.getOperand(2);
-      MVT::ValueType SrcVT = SrcAddr.getValueType();
-      SDOperand Value;
-      if (BytesLeft >= 2) {
-        Value = DAG.getLoad(MVT::i16, Chain,
-                            DAG.getNode(ISD::ADD, SrcVT, SrcAddr,
-                                        DAG.getConstant(Offset, SrcVT)),
-                            DAG.getSrcValue(NULL));
-        Chain = Value.getValue(1);
-        Chain = DAG.getNode(ISD::STORE, MVT::Other, Chain, Value,
-                            DAG.getNode(ISD::ADD, DstVT, DstAddr,
-                                        DAG.getConstant(Offset, DstVT)),
-                            DAG.getSrcValue(NULL));
-        BytesLeft -= 2;
-        Offset += 2;
+SDOperand X86TargetLowering::LowerBRCOND(SDOperand Op, SelectionDAG &DAG) {
+  bool addTest = false;
+  SDOperand Cond = Op.getOperand(1);
+  SDOperand Dest = Op.getOperand(2);
+  SDOperand CC;
+  if (Cond.getOpcode() == ISD::SETCC)
+    Cond = LowerOperation(Cond, DAG);
+
+  if (Cond.getOpcode() == X86ISD::SETCC) {
+    // If condition flag is set by a X86ISD::CMP, then make a copy of it
+    // (since flag operand cannot be shared). If the X86ISD::SETCC does not
+    // have another use it will be eliminated.
+    // If the X86ISD::SETCC has more than one use, then it's probably better
+    // to use a test instead of duplicating the X86ISD::CMP (for register
+    // pressure reason).
+    unsigned CmpOpc = Cond.getOperand(1).getOpcode();
+    if (CmpOpc == X86ISD::CMP || CmpOpc == X86ISD::COMI ||
+        CmpOpc == X86ISD::UCOMI) {
+      if (!Cond.hasOneUse()) {
+        std::vector<MVT::ValueType> Tys;
+        for (unsigned i = 0; i < Cond.Val->getNumValues(); ++i)
+          Tys.push_back(Cond.Val->getValueType(i));
+        std::vector<SDOperand> Ops;
+        for (unsigned i = 0; i < Cond.getNumOperands(); ++i)
+          Ops.push_back(Cond.getOperand(i));
+        Cond = DAG.getNode(X86ISD::SETCC, Tys, Ops);
       }
-      if (BytesLeft == 1) {
-        Value = DAG.getLoad(MVT::i8, Chain,
-                            DAG.getNode(ISD::ADD, SrcVT, SrcAddr,
-                                        DAG.getConstant(Offset, SrcVT)),
-                            DAG.getSrcValue(NULL));
-        Chain = Value.getValue(1);
-        Chain = DAG.getNode(ISD::STORE, MVT::Other, Chain, Value,
-                            DAG.getNode(ISD::ADD, DstVT, DstAddr,
-                                        DAG.getConstant(Offset, DstVT)),
-                            DAG.getSrcValue(NULL));
-      }
-    }
+      CC = Cond.getOperand(0);
+      Cond = Cond.getOperand(1);
+      // Make a copy as flag result cannot be used by more than one.
+      Cond = DAG.getNode(CmpOpc, MVT::Flag,
+                         Cond.getOperand(0), Cond.getOperand(1));
+    } else
+      addTest = true;
+  } else
+    addTest = true;
-    return Chain;
+  if (addTest) {
+    CC = DAG.getConstant(X86ISD::COND_NE, MVT::i8);
+    Cond = DAG.getNode(X86ISD::TEST, MVT::Flag, Cond, Cond);
   }
+  return DAG.getNode(X86ISD::BRCOND, Op.getValueType(),
+                     Op.getOperand(0), Op.getOperand(2), CC, Cond);
+}
-  // ConstantPool, JumpTable, GlobalAddress, and ExternalSymbol are lowered as
-  // their target countpart wrapped in the X86ISD::Wrapper node. Suppose N is
-  // one of the above mentioned nodes. It has to be wrapped because otherwise
-  // Select(N) returns N. So the raw TargetGlobalAddress nodes, etc. can only
-  // be used to form addressing mode. These wrapped nodes will be selected
-  // into MOV32ri.
-  case ISD::ConstantPool: {
-    ConstantPoolSDNode *CP = cast<ConstantPoolSDNode>(Op);
-    SDOperand Result = DAG.getNode(X86ISD::Wrapper, getPointerTy(),
-                                   DAG.getTargetConstantPool(CP->get(), getPointerTy(),
-                                                             CP->getAlignment()));
-    if (Subtarget->isTargetDarwin()) {
-      // With PIC, the address is actually $g + Offset.
-      if (getTargetMachine().getRelocationModel() == Reloc::PIC)
-        Result = DAG.getNode(ISD::ADD, getPointerTy(),
-                      DAG.getNode(X86ISD::GlobalBaseReg, getPointerTy()), Result);
-    }
-
-    return Result;
-  }
-  case ISD::JumpTable: {
-    JumpTableSDNode *JT = cast<JumpTableSDNode>(Op);
-    SDOperand Result = DAG.getNode(X86ISD::Wrapper, getPointerTy(),
-                                   DAG.getTargetJumpTable(JT->getIndex(),
-                                                          getPointerTy()));
-    if (Subtarget->isTargetDarwin()) {
-      // With PIC, the address is actually $g + Offset.
-      if (getTargetMachine().getRelocationModel() == Reloc::PIC)
-        Result = DAG.getNode(ISD::ADD, getPointerTy(),
-                      DAG.getNode(X86ISD::GlobalBaseReg, getPointerTy()), Result);
-    }
-
-    return Result;
-  }
-  case ISD::GlobalAddress: {
-    GlobalValue *GV = cast<GlobalAddressSDNode>(Op)->getGlobal();
-    SDOperand Result = DAG.getNode(X86ISD::Wrapper, getPointerTy(),
-                                   DAG.getTargetGlobalAddress(GV, getPointerTy()));
-    if (Subtarget->isTargetDarwin()) {
-      // With PIC, the address is actually $g + Offset.
-      if (getTargetMachine().getRelocationModel() == Reloc::PIC)
-        Result = DAG.getNode(ISD::ADD, getPointerTy(),
-                      DAG.getNode(X86ISD::GlobalBaseReg, getPointerTy()), Result);
+SDOperand X86TargetLowering::LowerJumpTable(SDOperand Op, SelectionDAG &DAG) {
+  JumpTableSDNode *JT = cast<JumpTableSDNode>(Op);
+  SDOperand Result = DAG.getNode(X86ISD::Wrapper, getPointerTy(),
+                                 DAG.getTargetJumpTable(JT->getIndex(),
+                                                        getPointerTy()));
+  if (Subtarget->isTargetDarwin()) {
+    // With PIC, the address is actually $g + Offset.
+ if (getTargetMachine().getRelocationModel() == Reloc::PIC) + Result = DAG.getNode(ISD::ADD, getPointerTy(), + DAG.getNode(X86ISD::GlobalBaseReg, getPointerTy()), Result); + } - // For Darwin, external and weak symbols are indirect, so we want to load - // the value at address GV, not the value of GV itself. This means that - // the GlobalAddress must be in the base or index register of the address, - // not the GV offset field. - if (getTargetMachine().getRelocationModel() != Reloc::Static && - DarwinGVRequiresExtraLoad(GV)) - Result = DAG.getLoad(MVT::i32, DAG.getEntryNode(), - Result, DAG.getSrcValue(NULL)); - } - - return Result; - } - case ISD::ExternalSymbol: { - const char *Sym = cast<ExternalSymbolSDNode>(Op)->getSymbol(); - SDOperand Result = DAG.getNode(X86ISD::Wrapper, getPointerTy(), - DAG.getTargetExternalSymbol(Sym, getPointerTy())); - if (Subtarget->isTargetDarwin()) { - // With PIC, the address is actually $g + Offset. - if (getTargetMachine().getRelocationModel() == Reloc::PIC) - Result = DAG.getNode(ISD::ADD, getPointerTy(), - DAG.getNode(X86ISD::GlobalBaseReg, getPointerTy()), Result); - } + return Result; +} - return Result; - } - case ISD::VASTART: { - // vastart just stores the address of the VarArgsFrameIndex slot into the - // memory location argument. - // FIXME: Replace MVT::i32 with PointerTy - SDOperand FR = DAG.getFrameIndex(VarArgsFrameIndex, MVT::i32); - return DAG.getNode(ISD::STORE, MVT::Other, Op.getOperand(0), FR, - Op.getOperand(1), Op.getOperand(2)); - } - case ISD::RET: { - SDOperand Copy; +SDOperand X86TargetLowering::LowerRET(SDOperand Op, SelectionDAG &DAG) { + SDOperand Copy; - switch(Op.getNumOperands()) { + switch(Op.getNumOperands()) { default: assert(0 && "Do not know how to return this many arguments!"); abort(); case 1: // ret void. 
return DAG.getNode(X86ISD::RET_FLAG, MVT::Other, Op.getOperand(0), - DAG.getConstant(getBytesToPopOnReturn(), MVT::i16)); + DAG.getConstant(getBytesToPopOnReturn(), MVT::i16)); case 2: { MVT::ValueType ArgVT = Op.getOperand(1).getValueType(); @@ -3030,575 +3225,434 @@ SDOperand()); Copy = DAG.getCopyToReg(Copy, X86::EAX,Op.getOperand(1),Copy.getValue(1)); break; - } - return DAG.getNode(X86ISD::RET_FLAG, MVT::Other, - Copy, DAG.getConstant(getBytesToPopOnReturn(), MVT::i16), - Copy.getValue(1)); - } - case ISD::SCALAR_TO_VECTOR: { - SDOperand AnyExt = DAG.getNode(ISD::ANY_EXTEND, MVT::i32, Op.getOperand(0)); - return DAG.getNode(X86ISD::S2VEC, Op.getValueType(), AnyExt); - } - case ISD::VECTOR_SHUFFLE: { - SDOperand V1 = Op.getOperand(0); - SDOperand V2 = Op.getOperand(1); - SDOperand PermMask = Op.getOperand(2); - MVT::ValueType VT = Op.getValueType(); - unsigned NumElems = PermMask.getNumOperands(); - bool V1IsUndef = V1.getOpcode() == ISD::UNDEF; - bool V2IsUndef = V2.getOpcode() == ISD::UNDEF; - - if (isSplatMask(PermMask.Val)) { - if (NumElems <= 4) return Op; - // Promote it to a v4i32 splat. - return PromoteSplat(Op, DAG); - } - - if (X86::isMOVLMask(PermMask.Val)) - return (V1IsUndef) ? 
V2 : Op; - - if (X86::isMOVSHDUPMask(PermMask.Val) || - X86::isMOVSLDUPMask(PermMask.Val) || - X86::isMOVHLPSMask(PermMask.Val) || - X86::isMOVHPMask(PermMask.Val) || - X86::isMOVLPMask(PermMask.Val)) - return Op; - - if (ShouldXformToMOVHLPS(PermMask.Val) || - ShouldXformToMOVLP(V1.Val, PermMask.Val)) - return CommuteVectorShuffle(Op, DAG); - - bool V1IsSplat = isSplatVector(V1.Val) || V1.getOpcode() == ISD::UNDEF; - bool V2IsSplat = isSplatVector(V2.Val) || V2.getOpcode() == ISD::UNDEF; - if (V1IsSplat && !V2IsSplat) { - Op = CommuteVectorShuffle(Op, DAG); - V1 = Op.getOperand(0); - V2 = Op.getOperand(1); - PermMask = Op.getOperand(2); - V2IsSplat = true; - } - - if (isCommutedMOVL(PermMask.Val, V2IsSplat)) { - if (V2IsUndef) return V1; - Op = CommuteVectorShuffle(Op, DAG); - V1 = Op.getOperand(0); - V2 = Op.getOperand(1); - PermMask = Op.getOperand(2); - if (V2IsSplat) { - // V2 is a splat, so the mask may be malformed. That is, it may point - // to any V2 element. The instruction selectior won't like this. Get - // a corrected mask and commute to form a proper MOVS{S|D}. - SDOperand NewMask = getMOVLMask(NumElems, DAG); - if (NewMask.Val != PermMask.Val) - Op = DAG.getNode(ISD::VECTOR_SHUFFLE, VT, V1, V2, NewMask); - } - return Op; - } - - if (X86::isUNPCKL_v_undef_Mask(PermMask.Val) || - X86::isUNPCKLMask(PermMask.Val) || - X86::isUNPCKHMask(PermMask.Val)) - return Op; - - if (V2IsSplat) { - // Normalize mask so all entries that point to V2 points to its first - // element then try to match unpck{h|l} again. If match, return a - // new vector_shuffle with the corrected mask. 
- SDOperand NewMask = NormalizeMask(PermMask, DAG); - if (NewMask.Val != PermMask.Val) { - if (X86::isUNPCKLMask(PermMask.Val, true)) { - SDOperand NewMask = getUnpacklMask(NumElems, DAG); - return DAG.getNode(ISD::VECTOR_SHUFFLE, VT, V1, V2, NewMask); - } else if (X86::isUNPCKHMask(PermMask.Val, true)) { - SDOperand NewMask = getUnpackhMask(NumElems, DAG); - return DAG.getNode(ISD::VECTOR_SHUFFLE, VT, V1, V2, NewMask); - } - } - } - - // Normalize the node to match x86 shuffle ops if needed - if (V2.getOpcode() != ISD::UNDEF) - if (isCommutedSHUFP(PermMask.Val)) { - Op = CommuteVectorShuffle(Op, DAG); - V1 = Op.getOperand(0); - V2 = Op.getOperand(1); - PermMask = Op.getOperand(2); - } - - // If VT is integer, try PSHUF* first, then SHUFP*. - if (MVT::isInteger(VT)) { - if (X86::isPSHUFDMask(PermMask.Val) || - X86::isPSHUFHWMask(PermMask.Val) || - X86::isPSHUFLWMask(PermMask.Val)) { - if (V2.getOpcode() != ISD::UNDEF) - return DAG.getNode(ISD::VECTOR_SHUFFLE, VT, V1, - DAG.getNode(ISD::UNDEF, V1.getValueType()),PermMask); - return Op; - } - - if (X86::isSHUFPMask(PermMask.Val)) - return Op; - - // Handle v8i16 shuffle high / low shuffle node pair. 
- if (VT == MVT::v8i16 && isPSHUFHW_PSHUFLWMask(PermMask.Val)) { - MVT::ValueType MaskVT = MVT::getIntVectorWithNumElements(NumElems); - MVT::ValueType BaseVT = MVT::getVectorBaseType(MaskVT); - std::vector MaskVec; - for (unsigned i = 0; i != 4; ++i) - MaskVec.push_back(PermMask.getOperand(i)); - for (unsigned i = 4; i != 8; ++i) - MaskVec.push_back(DAG.getConstant(i, BaseVT)); - SDOperand Mask = DAG.getNode(ISD::BUILD_VECTOR, MaskVT, MaskVec); - V1 = DAG.getNode(ISD::VECTOR_SHUFFLE, VT, V1, V2, Mask); - MaskVec.clear(); - for (unsigned i = 0; i != 4; ++i) - MaskVec.push_back(DAG.getConstant(i, BaseVT)); - for (unsigned i = 4; i != 8; ++i) - MaskVec.push_back(PermMask.getOperand(i)); - Mask = DAG.getNode(ISD::BUILD_VECTOR, MaskVT, MaskVec); - return DAG.getNode(ISD::VECTOR_SHUFFLE, VT, V1, V2, Mask); - } - } else { - // Floating point cases in the other order. - if (X86::isSHUFPMask(PermMask.Val)) - return Op; - if (X86::isPSHUFDMask(PermMask.Val) || - X86::isPSHUFHWMask(PermMask.Val) || - X86::isPSHUFLWMask(PermMask.Val)) { - if (V2.getOpcode() != ISD::UNDEF) - return DAG.getNode(ISD::VECTOR_SHUFFLE, VT, V1, - DAG.getNode(ISD::UNDEF, V1.getValueType()),PermMask); - return Op; - } - } + } + return DAG.getNode(X86ISD::RET_FLAG, MVT::Other, + Copy, DAG.getConstant(getBytesToPopOnReturn(), MVT::i16), + Copy.getValue(1)); +} - if (NumElems == 4) { - // Break it into (shuffle shuffle_hi, shuffle_lo). 
- MVT::ValueType MaskVT = PermMask.getValueType(); - MVT::ValueType MaskEVT = MVT::getVectorBaseType(MaskVT); - std::map > Locs; - std::vector LoMask(NumElems, DAG.getNode(ISD::UNDEF, MaskEVT)); - std::vector HiMask(NumElems, DAG.getNode(ISD::UNDEF, MaskEVT)); - std::vector *MaskPtr = &LoMask; - unsigned MaskIdx = 0; - unsigned LoIdx = 0; - unsigned HiIdx = NumElems/2; - for (unsigned i = 0; i != NumElems; ++i) { - if (i == NumElems/2) { - MaskPtr = &HiMask; - MaskIdx = 1; - LoIdx = 0; - HiIdx = NumElems/2; - } - SDOperand Elt = PermMask.getOperand(i); - if (Elt.getOpcode() == ISD::UNDEF) { - Locs[i] = std::make_pair(-1, -1); - } else if (cast(Elt)->getValue() < NumElems) { - Locs[i] = std::make_pair(MaskIdx, LoIdx); - (*MaskPtr)[LoIdx] = Elt; - LoIdx++; - } else { - Locs[i] = std::make_pair(MaskIdx, HiIdx); - (*MaskPtr)[HiIdx] = Elt; - HiIdx++; - } - } +SDOperand X86TargetLowering::LowerMEMSET(SDOperand Op, SelectionDAG &DAG) { + SDOperand InFlag(0, 0); + SDOperand Chain = Op.getOperand(0); + unsigned Align = + (unsigned)cast(Op.getOperand(4))->getValue(); + if (Align == 0) Align = 1; + + ConstantSDNode *I = dyn_cast(Op.getOperand(3)); + // If not DWORD aligned, call memset if size is less than the threshold. + // It knows how to align to the right boundary first. + if ((Align & 3) != 0 || + (I && I->getValue() < Subtarget->getMinRepStrSizeThreshold())) { + MVT::ValueType IntPtr = getPointerTy(); + const Type *IntPtrTy = getTargetData().getIntPtrType(); + std::vector > Args; + Args.push_back(std::make_pair(Op.getOperand(1), IntPtrTy)); + // Extend the ubyte argument to be an int value for the call. 
+ SDOperand Val = DAG.getNode(ISD::ZERO_EXTEND, MVT::i32, Op.getOperand(2)); + Args.push_back(std::make_pair(Val, IntPtrTy)); + Args.push_back(std::make_pair(Op.getOperand(3), IntPtrTy)); + std::pair CallResult = + LowerCallTo(Chain, Type::VoidTy, false, CallingConv::C, false, + DAG.getExternalSymbol("memset", IntPtr), Args, DAG); + return CallResult.second; + } + + MVT::ValueType AVT; + SDOperand Count; + ConstantSDNode *ValC = dyn_cast(Op.getOperand(2)); + unsigned BytesLeft = 0; + bool TwoRepStos = false; + if (ValC) { + unsigned ValReg; + unsigned Val = ValC->getValue() & 255; - SDOperand LoShuffle = DAG.getNode(ISD::VECTOR_SHUFFLE, VT, V1, V2, - DAG.getNode(ISD::BUILD_VECTOR, MaskVT, LoMask)); - SDOperand HiShuffle = DAG.getNode(ISD::VECTOR_SHUFFLE, VT, V1, V2, - DAG.getNode(ISD::BUILD_VECTOR, MaskVT, HiMask)); - std::vector MaskOps; - for (unsigned i = 0; i != NumElems; ++i) { - if (Locs[i].first == -1) { - MaskOps.push_back(DAG.getNode(ISD::UNDEF, MaskEVT)); + // If the value is a constant, then we can potentially use larger sets. 
+ switch (Align & 3) { + case 2: // WORD aligned + AVT = MVT::i16; + Count = DAG.getConstant(I->getValue() / 2, MVT::i32); + BytesLeft = I->getValue() % 2; + Val = (Val << 8) | Val; + ValReg = X86::AX; + break; + case 0: // DWORD aligned + AVT = MVT::i32; + if (I) { + Count = DAG.getConstant(I->getValue() / 4, MVT::i32); + BytesLeft = I->getValue() % 4; } else { - unsigned Idx = Locs[i].first * NumElems + Locs[i].second; - MaskOps.push_back(DAG.getConstant(Idx, MaskEVT)); + Count = DAG.getNode(ISD::SRL, MVT::i32, Op.getOperand(3), + DAG.getConstant(2, MVT::i8)); + TwoRepStos = true; } - } - return DAG.getNode(ISD::VECTOR_SHUFFLE, VT, LoShuffle, HiShuffle, - DAG.getNode(ISD::BUILD_VECTOR, MaskVT, MaskOps)); + Val = (Val << 8) | Val; + Val = (Val << 16) | Val; + ValReg = X86::EAX; + break; + default: // Byte aligned + AVT = MVT::i8; + Count = Op.getOperand(3); + ValReg = X86::AL; + break; } - return SDOperand(); + Chain = DAG.getCopyToReg(Chain, ValReg, DAG.getConstant(Val, AVT), + InFlag); + InFlag = Chain.getValue(1); + } else { + AVT = MVT::i8; + Count = Op.getOperand(3); + Chain = DAG.getCopyToReg(Chain, X86::AL, Op.getOperand(2), InFlag); + InFlag = Chain.getValue(1); } - case ISD::BUILD_VECTOR: { - // All zero's are handled with pxor. - if (ISD::isBuildVectorAllZeros(Op.Val)) - return Op; - // All one's are handled with pcmpeqd. 
- if (ISD::isBuildVectorAllOnes(Op.Val)) - return Op; + Chain = DAG.getCopyToReg(Chain, X86::ECX, Count, InFlag); + InFlag = Chain.getValue(1); + Chain = DAG.getCopyToReg(Chain, X86::EDI, Op.getOperand(1), InFlag); + InFlag = Chain.getValue(1); + + std::vector Tys; + Tys.push_back(MVT::Other); + Tys.push_back(MVT::Flag); + std::vector Ops; + Ops.push_back(Chain); + Ops.push_back(DAG.getValueType(AVT)); + Ops.push_back(InFlag); + Chain = DAG.getNode(X86ISD::REP_STOS, Tys, Ops); - MVT::ValueType VT = Op.getValueType(); - MVT::ValueType EVT = MVT::getVectorBaseType(VT); - unsigned EVTBits = MVT::getSizeInBits(EVT); - - unsigned NumElems = Op.getNumOperands(); - unsigned NumZero = 0; - unsigned NumNonZero = 0; - unsigned NonZeros = 0; - std::set Values; - for (unsigned i = 0; i < NumElems; ++i) { - SDOperand Elt = Op.getOperand(i); - if (Elt.getOpcode() != ISD::UNDEF) { - Values.insert(Elt); - if (isZeroNode(Elt)) - NumZero++; - else { - NonZeros |= (1 << i); - NumNonZero++; - } - } + if (TwoRepStos) { + InFlag = Chain.getValue(1); + Count = Op.getOperand(3); + MVT::ValueType CVT = Count.getValueType(); + SDOperand Left = DAG.getNode(ISD::AND, CVT, Count, + DAG.getConstant(3, CVT)); + Chain = DAG.getCopyToReg(Chain, X86::ECX, Left, InFlag); + InFlag = Chain.getValue(1); + Tys.clear(); + Tys.push_back(MVT::Other); + Tys.push_back(MVT::Flag); + Ops.clear(); + Ops.push_back(Chain); + Ops.push_back(DAG.getValueType(MVT::i8)); + Ops.push_back(InFlag); + Chain = DAG.getNode(X86ISD::REP_STOS, Tys, Ops); + } else if (BytesLeft) { + // Issue stores for the last 1 - 3 bytes. 
+ SDOperand Value; + unsigned Val = ValC->getValue() & 255; + unsigned Offset = I->getValue() - BytesLeft; + SDOperand DstAddr = Op.getOperand(1); + MVT::ValueType AddrVT = DstAddr.getValueType(); + if (BytesLeft >= 2) { + Value = DAG.getConstant((Val << 8) | Val, MVT::i16); + Chain = DAG.getNode(ISD::STORE, MVT::Other, Chain, Value, + DAG.getNode(ISD::ADD, AddrVT, DstAddr, + DAG.getConstant(Offset, AddrVT)), + DAG.getSrcValue(NULL)); + BytesLeft -= 2; + Offset += 2; + } + + if (BytesLeft == 1) { + Value = DAG.getConstant(Val, MVT::i8); + Chain = DAG.getNode(ISD::STORE, MVT::Other, Chain, Value, + DAG.getNode(ISD::ADD, AddrVT, DstAddr, + DAG.getConstant(Offset, AddrVT)), + DAG.getSrcValue(NULL)); } + } - if (NumNonZero == 0) - // Must be a mix of zero and undef. Return a zero vector. - return getZeroVector(VT, DAG); - - // Splat is obviously ok. Let legalizer expand it to a shuffle. - if (Values.size() == 1) - return SDOperand(); - - // Special case for single non-zero element. - if (NumNonZero == 1) { - unsigned Idx = CountTrailingZeros_32(NonZeros); - SDOperand Item = Op.getOperand(Idx); - Item = DAG.getNode(ISD::SCALAR_TO_VECTOR, VT, Item); - if (Idx == 0) - // Turn it into a MOVL (i.e. movss, movsd, or movd) to a zero vector. - return getShuffleVectorZeroOrUndef(Item, VT, NumElems, Idx, - NumZero > 0, DAG); - - if (EVTBits == 32) { - // Turn it into a shuffle of zero and zero-extended scalar to vector. - Item = getShuffleVectorZeroOrUndef(Item, VT, NumElems, 0, NumZero > 0, - DAG); - MVT::ValueType MaskVT = MVT::getIntVectorWithNumElements(NumElems); - MVT::ValueType MaskEVT = MVT::getVectorBaseType(MaskVT); - std::vector MaskVec; - for (unsigned i = 0; i < NumElems; i++) - MaskVec.push_back(DAG.getConstant((i == Idx) ? 
0 : 1, MaskEVT)); - SDOperand Mask = DAG.getNode(ISD::BUILD_VECTOR, MaskVT, MaskVec); - return DAG.getNode(ISD::VECTOR_SHUFFLE, VT, Item, - DAG.getNode(ISD::UNDEF, VT), Mask); - } - } + return Chain; +} - // Let legalizer expand 2-widde build_vector's. - if (EVTBits == 64) - return SDOperand(); - - // If element VT is < 32 bits, convert it to inserts into a zero vector. - if (EVTBits == 8) { - SDOperand V = LowerBuildVectorv16i8(Op, NonZeros,NumNonZero,NumZero, DAG); - if (V.Val) return V; - } - - if (EVTBits == 16) { - SDOperand V = LowerBuildVectorv8i16(Op, NonZeros,NumNonZero,NumZero, DAG); - if (V.Val) return V; - } - - // If element VT is == 32 bits, turn it into a number of shuffles. - std::vector V(NumElems); - if (NumElems == 4 && NumZero > 0) { - for (unsigned i = 0; i < 4; ++i) { - bool isZero = !(NonZeros & (1 << i)); - if (isZero) - V[i] = getZeroVector(VT, DAG); - else - V[i] = DAG.getNode(ISD::SCALAR_TO_VECTOR, VT, Op.getOperand(i)); +SDOperand X86TargetLowering::LowerMEMCPY(SDOperand Op, SelectionDAG &DAG) { + SDOperand Chain = Op.getOperand(0); + unsigned Align = + (unsigned)cast(Op.getOperand(4))->getValue(); + if (Align == 0) Align = 1; + + ConstantSDNode *I = dyn_cast(Op.getOperand(3)); + // If not DWORD aligned, call memcpy if size is less than the threshold. + // It knows how to align to the right boundary first. 
+ if ((Align & 3) != 0 || + (I && I->getValue() < Subtarget->getMinRepStrSizeThreshold())) { + MVT::ValueType IntPtr = getPointerTy(); + const Type *IntPtrTy = getTargetData().getIntPtrType(); + std::vector > Args; + Args.push_back(std::make_pair(Op.getOperand(1), IntPtrTy)); + Args.push_back(std::make_pair(Op.getOperand(2), IntPtrTy)); + Args.push_back(std::make_pair(Op.getOperand(3), IntPtrTy)); + std::pair CallResult = + LowerCallTo(Chain, Type::VoidTy, false, CallingConv::C, false, + DAG.getExternalSymbol("memcpy", IntPtr), Args, DAG); + return CallResult.second; + } + + MVT::ValueType AVT; + SDOperand Count; + unsigned BytesLeft = 0; + bool TwoRepMovs = false; + switch (Align & 3) { + case 2: // WORD aligned + AVT = MVT::i16; + Count = DAG.getConstant(I->getValue() / 2, MVT::i32); + BytesLeft = I->getValue() % 2; + break; + case 0: // DWORD aligned + AVT = MVT::i32; + if (I) { + Count = DAG.getConstant(I->getValue() / 4, MVT::i32); + BytesLeft = I->getValue() % 4; + } else { + Count = DAG.getNode(ISD::SRL, MVT::i32, Op.getOperand(3), + DAG.getConstant(2, MVT::i8)); + TwoRepMovs = true; } + break; + default: // Byte aligned + AVT = MVT::i8; + Count = Op.getOperand(3); + break; + } - for (unsigned i = 0; i < 2; ++i) { - switch ((NonZeros & (0x3 << i*2)) >> (i*2)) { - default: break; - case 0: - V[i] = V[i*2]; // Must be a zero vector. 
- break; - case 1: - V[i] = DAG.getNode(ISD::VECTOR_SHUFFLE, VT, V[i*2+1], V[i*2], - getMOVLMask(NumElems, DAG)); - break; - case 2: - V[i] = DAG.getNode(ISD::VECTOR_SHUFFLE, VT, V[i*2], V[i*2+1], - getMOVLMask(NumElems, DAG)); - break; - case 3: - V[i] = DAG.getNode(ISD::VECTOR_SHUFFLE, VT, V[i*2], V[i*2+1], - getUnpacklMask(NumElems, DAG)); - break; - } - } + SDOperand InFlag(0, 0); + Chain = DAG.getCopyToReg(Chain, X86::ECX, Count, InFlag); + InFlag = Chain.getValue(1); + Chain = DAG.getCopyToReg(Chain, X86::EDI, Op.getOperand(1), InFlag); + InFlag = Chain.getValue(1); + Chain = DAG.getCopyToReg(Chain, X86::ESI, Op.getOperand(2), InFlag); + InFlag = Chain.getValue(1); + + std::vector<MVT::ValueType> Tys; + Tys.push_back(MVT::Other); + Tys.push_back(MVT::Flag); + std::vector<SDOperand> Ops; + Ops.push_back(Chain); + Ops.push_back(DAG.getValueType(AVT)); + Ops.push_back(InFlag); + Chain = DAG.getNode(X86ISD::REP_MOVS, Tys, Ops); - // Take advantage of the fact R32 to VR128 scalar_to_vector (i.e. movd) - // clears the upper bits. - // FIXME: we can do the same for v4f32 case when we know both parts of - // the lower half come from scalar_to_vector (loadf32). We should do - // that in post legalizer dag combiner with target specific hooks. 
- if (MVT::isInteger(EVT) && (NonZeros & (0x3 << 2)) == 0) - return V[0]; - MVT::ValueType MaskVT = MVT::getIntVectorWithNumElements(NumElems); - MVT::ValueType EVT = MVT::getVectorBaseType(MaskVT); - std::vector MaskVec; - bool Reverse = (NonZeros & 0x3) == 2; - for (unsigned i = 0; i < 2; ++i) - if (Reverse) - MaskVec.push_back(DAG.getConstant(1-i, EVT)); - else - MaskVec.push_back(DAG.getConstant(i, EVT)); - Reverse = ((NonZeros & (0x3 << 2)) >> 2) == 2; - for (unsigned i = 0; i < 2; ++i) - if (Reverse) - MaskVec.push_back(DAG.getConstant(1-i+NumElems, EVT)); - else - MaskVec.push_back(DAG.getConstant(i+NumElems, EVT)); - SDOperand ShufMask = DAG.getNode(ISD::BUILD_VECTOR, MaskVT, MaskVec); - return DAG.getNode(ISD::VECTOR_SHUFFLE, VT, V[0], V[1], ShufMask); - } - - if (Values.size() > 2) { - // Expand into a number of unpckl*. - // e.g. for v4f32 - // Step 1: unpcklps 0, 2 ==> X: - // : unpcklps 1, 3 ==> Y: - // Step 2: unpcklps X, Y ==> <3, 2, 1, 0> - SDOperand UnpckMask = getUnpacklMask(NumElems, DAG); - for (unsigned i = 0; i < NumElems; ++i) - V[i] = DAG.getNode(ISD::SCALAR_TO_VECTOR, VT, Op.getOperand(i)); - NumElems >>= 1; - while (NumElems != 0) { - for (unsigned i = 0; i < NumElems; ++i) - V[i] = DAG.getNode(ISD::VECTOR_SHUFFLE, VT, V[i], V[i + NumElems], - UnpckMask); - NumElems >>= 1; - } - return V[0]; + if (TwoRepMovs) { + InFlag = Chain.getValue(1); + Count = Op.getOperand(3); + MVT::ValueType CVT = Count.getValueType(); + SDOperand Left = DAG.getNode(ISD::AND, CVT, Count, + DAG.getConstant(3, CVT)); + Chain = DAG.getCopyToReg(Chain, X86::ECX, Left, InFlag); + InFlag = Chain.getValue(1); + Tys.clear(); + Tys.push_back(MVT::Other); + Tys.push_back(MVT::Flag); + Ops.clear(); + Ops.push_back(Chain); + Ops.push_back(DAG.getValueType(MVT::i8)); + Ops.push_back(InFlag); + Chain = DAG.getNode(X86ISD::REP_MOVS, Tys, Ops); + } else if (BytesLeft) { + // Issue loads and stores for the last 1 - 3 bytes. 
+ unsigned Offset = I->getValue() - BytesLeft; + SDOperand DstAddr = Op.getOperand(1); + MVT::ValueType DstVT = DstAddr.getValueType(); + SDOperand SrcAddr = Op.getOperand(2); + MVT::ValueType SrcVT = SrcAddr.getValueType(); + SDOperand Value; + if (BytesLeft >= 2) { + Value = DAG.getLoad(MVT::i16, Chain, + DAG.getNode(ISD::ADD, SrcVT, SrcAddr, + DAG.getConstant(Offset, SrcVT)), + DAG.getSrcValue(NULL)); + Chain = Value.getValue(1); + Chain = DAG.getNode(ISD::STORE, MVT::Other, Chain, Value, + DAG.getNode(ISD::ADD, DstVT, DstAddr, + DAG.getConstant(Offset, DstVT)), + DAG.getSrcValue(NULL)); + BytesLeft -= 2; + Offset += 2; } - return SDOperand(); + if (BytesLeft == 1) { + Value = DAG.getLoad(MVT::i8, Chain, + DAG.getNode(ISD::ADD, SrcVT, SrcAddr, + DAG.getConstant(Offset, SrcVT)), + DAG.getSrcValue(NULL)); + Chain = Value.getValue(1); + Chain = DAG.getNode(ISD::STORE, MVT::Other, Chain, Value, + DAG.getNode(ISD::ADD, DstVT, DstAddr, + DAG.getConstant(Offset, DstVT)), + DAG.getSrcValue(NULL)); + } } - case ISD::EXTRACT_VECTOR_ELT: { - if (!isa(Op.getOperand(1))) - return SDOperand(); - - MVT::ValueType VT = Op.getValueType(); - // TODO: handle v16i8. - if (MVT::getSizeInBits(VT) == 16) { - // Transform it so it match pextrw which produces a 32-bit result. - MVT::ValueType EVT = (MVT::ValueType)(VT+1); - SDOperand Extract = DAG.getNode(X86ISD::PEXTRW, EVT, - Op.getOperand(0), Op.getOperand(1)); - SDOperand Assert = DAG.getNode(ISD::AssertZext, EVT, Extract, - DAG.getValueType(VT)); - return DAG.getNode(ISD::TRUNCATE, VT, Assert); - } else if (MVT::getSizeInBits(VT) == 32) { - SDOperand Vec = Op.getOperand(0); - unsigned Idx = cast(Op.getOperand(1))->getValue(); - if (Idx == 0) - return Op; - // SHUFPS the element to the lowest double word, then movss. - MVT::ValueType MaskVT = MVT::getIntVectorWithNumElements(4); - SDOperand IdxNode = DAG.getConstant((Idx < 2) ? 
Idx : Idx+4, - MVT::getVectorBaseType(MaskVT)); - std::vector IdxVec; - IdxVec.push_back(DAG.getConstant(Idx, MVT::getVectorBaseType(MaskVT))); - IdxVec.push_back(DAG.getNode(ISD::UNDEF, MVT::getVectorBaseType(MaskVT))); - IdxVec.push_back(DAG.getNode(ISD::UNDEF, MVT::getVectorBaseType(MaskVT))); - IdxVec.push_back(DAG.getNode(ISD::UNDEF, MVT::getVectorBaseType(MaskVT))); - SDOperand Mask = DAG.getNode(ISD::BUILD_VECTOR, MaskVT, IdxVec); - Vec = DAG.getNode(ISD::VECTOR_SHUFFLE, Vec.getValueType(), - Vec, Vec, Mask); - return DAG.getNode(ISD::EXTRACT_VECTOR_ELT, VT, Vec, - DAG.getConstant(0, MVT::i32)); - } else if (MVT::getSizeInBits(VT) == 64) { - SDOperand Vec = Op.getOperand(0); - unsigned Idx = cast(Op.getOperand(1))->getValue(); - if (Idx == 0) - return Op; - - // UNPCKHPD the element to the lowest double word, then movsd. - // Note if the lower 64 bits of the result of the UNPCKHPD is then stored - // to a f64mem, the whole operation is folded into a single MOVHPDmr. - MVT::ValueType MaskVT = MVT::getIntVectorWithNumElements(4); - std::vector IdxVec; - IdxVec.push_back(DAG.getConstant(1, MVT::getVectorBaseType(MaskVT))); - IdxVec.push_back(DAG.getNode(ISD::UNDEF, MVT::getVectorBaseType(MaskVT))); - SDOperand Mask = DAG.getNode(ISD::BUILD_VECTOR, MaskVT, IdxVec); - Vec = DAG.getNode(ISD::VECTOR_SHUFFLE, Vec.getValueType(), - Vec, DAG.getNode(ISD::UNDEF, Vec.getValueType()), Mask); - return DAG.getNode(ISD::EXTRACT_VECTOR_ELT, VT, Vec, - DAG.getConstant(0, MVT::i32)); - } + return Chain; +} - return SDOperand(); - } - case ISD::INSERT_VECTOR_ELT: { - // Transform it so it match pinsrw which expects a 16-bit value in a R32 - // as its second argument. 
- MVT::ValueType VT = Op.getValueType(); - MVT::ValueType BaseVT = MVT::getVectorBaseType(VT); - SDOperand N0 = Op.getOperand(0); - SDOperand N1 = Op.getOperand(1); - SDOperand N2 = Op.getOperand(2); - if (MVT::getSizeInBits(BaseVT) == 16) { - if (N1.getValueType() != MVT::i32) - N1 = DAG.getNode(ISD::ANY_EXTEND, MVT::i32, N1); - if (N2.getValueType() != MVT::i32) - N2 = DAG.getConstant(cast(N2)->getValue(), MVT::i32); - return DAG.getNode(X86ISD::PINSRW, VT, N0, N1, N2); - } else if (MVT::getSizeInBits(BaseVT) == 32) { - unsigned Idx = cast(N2)->getValue(); - if (Idx == 0) { - // Use a movss. - N1 = DAG.getNode(ISD::SCALAR_TO_VECTOR, VT, N1); - MVT::ValueType MaskVT = MVT::getIntVectorWithNumElements(4); - MVT::ValueType BaseVT = MVT::getVectorBaseType(MaskVT); - std::vector MaskVec; - MaskVec.push_back(DAG.getConstant(4, BaseVT)); - for (unsigned i = 1; i <= 3; ++i) - MaskVec.push_back(DAG.getConstant(i, BaseVT)); - return DAG.getNode(ISD::VECTOR_SHUFFLE, VT, N0, N1, - DAG.getNode(ISD::BUILD_VECTOR, MaskVT, MaskVec)); - } else { - // Use two pinsrw instructions to insert a 32 bit value. - Idx <<= 1; - if (MVT::isFloatingPoint(N1.getValueType())) { - if (N1.getOpcode() == ISD::LOAD) { - // Just load directly from f32mem to R32. 
- N1 = DAG.getLoad(MVT::i32, N1.getOperand(0), N1.getOperand(1), - N1.getOperand(2)); - } else { - N1 = DAG.getNode(ISD::SCALAR_TO_VECTOR, MVT::v4f32, N1); - N1 = DAG.getNode(ISD::BIT_CONVERT, MVT::v4i32, N1); - N1 = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, MVT::i32, N1, - DAG.getConstant(0, MVT::i32)); - } - } - N0 = DAG.getNode(ISD::BIT_CONVERT, MVT::v8i16, N0); - N0 = DAG.getNode(X86ISD::PINSRW, MVT::v8i16, N0, N1, - DAG.getConstant(Idx, MVT::i32)); - N1 = DAG.getNode(ISD::SRL, MVT::i32, N1, DAG.getConstant(16, MVT::i8)); - N0 = DAG.getNode(X86ISD::PINSRW, MVT::v8i16, N0, N1, - DAG.getConstant(Idx+1, MVT::i32)); - return DAG.getNode(ISD::BIT_CONVERT, VT, N0); - } - } +SDOperand +X86TargetLowering::LowerREADCYCLCECOUNTER(SDOperand Op, SelectionDAG &DAG) { + std::vector Tys; + Tys.push_back(MVT::Other); + Tys.push_back(MVT::Flag); + std::vector Ops; + Ops.push_back(Op.getOperand(0)); + SDOperand rd = DAG.getNode(X86ISD::RDTSC_DAG, Tys, Ops); + Ops.clear(); + Ops.push_back(DAG.getCopyFromReg(rd, X86::EAX, MVT::i32, rd.getValue(1))); + Ops.push_back(DAG.getCopyFromReg(Ops[0].getValue(1), X86::EDX, + MVT::i32, Ops[0].getValue(2))); + Ops.push_back(Ops[1].getValue(1)); + Tys[0] = Tys[1] = MVT::i32; + Tys.push_back(MVT::Other); + return DAG.getNode(ISD::MERGE_VALUES, Tys, Ops); +} - return SDOperand(); - } - case ISD::INTRINSIC_WO_CHAIN: { - unsigned IntNo = cast(Op.getOperand(0))->getValue(); - switch (IntNo) { - default: return SDOperand(); // Don't custom lower most intrinsics. +SDOperand X86TargetLowering::LowerVASTART(SDOperand Op, SelectionDAG &DAG) { + // vastart just stores the address of the VarArgsFrameIndex slot into the + // memory location argument. 
+ // FIXME: Replace MVT::i32 with PointerTy + SDOperand FR = DAG.getFrameIndex(VarArgsFrameIndex, MVT::i32); + return DAG.getNode(ISD::STORE, MVT::Other, Op.getOperand(0), FR, + Op.getOperand(1), Op.getOperand(2)); +} + +SDOperand +X86TargetLowering::LowerINTRINSIC_WO_CHAIN(SDOperand Op, SelectionDAG &DAG) { + unsigned IntNo = cast<ConstantSDNode>(Op.getOperand(0))->getValue(); + switch (IntNo) { + default: return SDOperand(); // Don't custom lower most intrinsics. // Comparison intrinsics. - case Intrinsic::x86_sse_comieq_ss: + case Intrinsic::x86_sse_comieq_ss: + case Intrinsic::x86_sse_comilt_ss: + case Intrinsic::x86_sse_comile_ss: + case Intrinsic::x86_sse_comigt_ss: + case Intrinsic::x86_sse_comige_ss: + case Intrinsic::x86_sse_comineq_ss: + case Intrinsic::x86_sse_ucomieq_ss: + case Intrinsic::x86_sse_ucomilt_ss: + case Intrinsic::x86_sse_ucomile_ss: + case Intrinsic::x86_sse_ucomigt_ss: + case Intrinsic::x86_sse_ucomige_ss: + case Intrinsic::x86_sse_ucomineq_ss: + case Intrinsic::x86_sse2_comieq_sd: + case Intrinsic::x86_sse2_comilt_sd: + case Intrinsic::x86_sse2_comile_sd: + case Intrinsic::x86_sse2_comigt_sd: + case Intrinsic::x86_sse2_comige_sd: + case Intrinsic::x86_sse2_comineq_sd: + case Intrinsic::x86_sse2_ucomieq_sd: + case Intrinsic::x86_sse2_ucomilt_sd: + case Intrinsic::x86_sse2_ucomile_sd: + case Intrinsic::x86_sse2_ucomigt_sd: + case Intrinsic::x86_sse2_ucomige_sd: + case Intrinsic::x86_sse2_ucomineq_sd: { + unsigned Opc = 0; + ISD::CondCode CC = ISD::SETCC_INVALID; + switch (IntNo) { + default: break; + case Intrinsic::x86_sse_comieq_ss: + case Intrinsic::x86_sse2_comieq_sd: + Opc = X86ISD::COMI; + CC = ISD::SETEQ; + break; case Intrinsic::x86_sse_comilt_ss: - case Intrinsic::x86_sse_comile_ss: - case Intrinsic::x86_sse_comigt_ss: - case Intrinsic::x86_sse_comige_ss: - case Intrinsic::x86_sse_comineq_ss: - case Intrinsic::x86_sse_ucomieq_ss: - case Intrinsic::x86_sse_ucomilt_ss: - case Intrinsic::x86_sse_ucomile_ss: - case Intrinsic::x86_sse_ucomigt_ss: - 
case Intrinsic::x86_sse_ucomige_ss: - case Intrinsic::x86_sse_ucomineq_ss: - case Intrinsic::x86_sse2_comieq_sd: case Intrinsic::x86_sse2_comilt_sd: + Opc = X86ISD::COMI; + CC = ISD::SETLT; + break; + case Intrinsic::x86_sse_comile_ss: case Intrinsic::x86_sse2_comile_sd: + Opc = X86ISD::COMI; + CC = ISD::SETLE; + break; + case Intrinsic::x86_sse_comigt_ss: case Intrinsic::x86_sse2_comigt_sd: + Opc = X86ISD::COMI; + CC = ISD::SETGT; + break; + case Intrinsic::x86_sse_comige_ss: case Intrinsic::x86_sse2_comige_sd: + Opc = X86ISD::COMI; + CC = ISD::SETGE; + break; + case Intrinsic::x86_sse_comineq_ss: case Intrinsic::x86_sse2_comineq_sd: + Opc = X86ISD::COMI; + CC = ISD::SETNE; + break; + case Intrinsic::x86_sse_ucomieq_ss: case Intrinsic::x86_sse2_ucomieq_sd: + Opc = X86ISD::UCOMI; + CC = ISD::SETEQ; + break; + case Intrinsic::x86_sse_ucomilt_ss: case Intrinsic::x86_sse2_ucomilt_sd: + Opc = X86ISD::UCOMI; + CC = ISD::SETLT; + break; + case Intrinsic::x86_sse_ucomile_ss: case Intrinsic::x86_sse2_ucomile_sd: + Opc = X86ISD::UCOMI; + CC = ISD::SETLE; + break; + case Intrinsic::x86_sse_ucomigt_ss: case Intrinsic::x86_sse2_ucomigt_sd: + Opc = X86ISD::UCOMI; + CC = ISD::SETGT; + break; + case Intrinsic::x86_sse_ucomige_ss: case Intrinsic::x86_sse2_ucomige_sd: - case Intrinsic::x86_sse2_ucomineq_sd: { - unsigned Opc = 0; - ISD::CondCode CC = ISD::SETCC_INVALID; - switch (IntNo) { - default: break; - case Intrinsic::x86_sse_comieq_ss: - case Intrinsic::x86_sse2_comieq_sd: - Opc = X86ISD::COMI; - CC = ISD::SETEQ; - break; - case Intrinsic::x86_sse_comilt_ss: - case Intrinsic::x86_sse2_comilt_sd: - Opc = X86ISD::COMI; - CC = ISD::SETLT; - break; - case Intrinsic::x86_sse_comile_ss: - case Intrinsic::x86_sse2_comile_sd: - Opc = X86ISD::COMI; - CC = ISD::SETLE; - break; - case Intrinsic::x86_sse_comigt_ss: - case Intrinsic::x86_sse2_comigt_sd: - Opc = X86ISD::COMI; - CC = ISD::SETGT; - break; - case Intrinsic::x86_sse_comige_ss: - case Intrinsic::x86_sse2_comige_sd: - Opc = 
X86ISD::COMI; - CC = ISD::SETGE; - break; - case Intrinsic::x86_sse_comineq_ss: - case Intrinsic::x86_sse2_comineq_sd: - Opc = X86ISD::COMI; - CC = ISD::SETNE; - break; - case Intrinsic::x86_sse_ucomieq_ss: - case Intrinsic::x86_sse2_ucomieq_sd: - Opc = X86ISD::UCOMI; - CC = ISD::SETEQ; - break; - case Intrinsic::x86_sse_ucomilt_ss: - case Intrinsic::x86_sse2_ucomilt_sd: - Opc = X86ISD::UCOMI; - CC = ISD::SETLT; - break; - case Intrinsic::x86_sse_ucomile_ss: - case Intrinsic::x86_sse2_ucomile_sd: - Opc = X86ISD::UCOMI; - CC = ISD::SETLE; - break; - case Intrinsic::x86_sse_ucomigt_ss: - case Intrinsic::x86_sse2_ucomigt_sd: - Opc = X86ISD::UCOMI; - CC = ISD::SETGT; - break; - case Intrinsic::x86_sse_ucomige_ss: - case Intrinsic::x86_sse2_ucomige_sd: - Opc = X86ISD::UCOMI; - CC = ISD::SETGE; - break; - case Intrinsic::x86_sse_ucomineq_ss: - case Intrinsic::x86_sse2_ucomineq_sd: - Opc = X86ISD::UCOMI; - CC = ISD::SETNE; - break; - } - bool Flip; - unsigned X86CC; - translateX86CC(CC, true, X86CC, Flip); - SDOperand Cond = DAG.getNode(Opc, MVT::Flag, Op.getOperand(Flip?2:1), - Op.getOperand(Flip?1:2)); - SDOperand SetCC = DAG.getNode(X86ISD::SETCC, MVT::i8, - DAG.getConstant(X86CC, MVT::i8), Cond); - return DAG.getNode(ISD::ANY_EXTEND, MVT::i32, SetCC); - } + Opc = X86ISD::UCOMI; + CC = ISD::SETGE; + break; + case Intrinsic::x86_sse_ucomineq_ss: + case Intrinsic::x86_sse2_ucomineq_sd: + Opc = X86ISD::UCOMI; + CC = ISD::SETNE; + break; } + bool Flip; + unsigned X86CC; + translateX86CC(CC, true, X86CC, Flip); + SDOperand Cond = DAG.getNode(Opc, MVT::Flag, Op.getOperand(Flip?2:1), + Op.getOperand(Flip?1:2)); + SDOperand SetCC = DAG.getNode(X86ISD::SETCC, MVT::i8, + DAG.getConstant(X86CC, MVT::i8), Cond); + return DAG.getNode(ISD::ANY_EXTEND, MVT::i32, SetCC); } } } +/// LowerOperation - Provide custom lowering hooks for some operations. 
+/// +SDOperand X86TargetLowering::LowerOperation(SDOperand Op, SelectionDAG &DAG) { + switch (Op.getOpcode()) { + default: assert(0 && "Should not custom lower this!"); + case ISD::BUILD_VECTOR: return LowerBUILD_VECTOR(Op, DAG); + case ISD::VECTOR_SHUFFLE: return LowerVECTOR_SHUFFLE(Op, DAG); + case ISD::EXTRACT_VECTOR_ELT: return LowerEXTRACT_VECTOR_ELT(Op, DAG); + case ISD::INSERT_VECTOR_ELT: return LowerINSERT_VECTOR_ELT(Op, DAG); + case ISD::SCALAR_TO_VECTOR: return LowerSCALAR_TO_VECTOR(Op, DAG); + case ISD::ConstantPool: return LowerConstantPool(Op, DAG); + case ISD::GlobalAddress: return LowerGlobalAddress(Op, DAG); + case ISD::ExternalSymbol: return LowerExternalSymbol(Op, DAG); + case ISD::SHL_PARTS: + case ISD::SRA_PARTS: + case ISD::SRL_PARTS: return LowerShift(Op, DAG); + case ISD::SINT_TO_FP: return LowerSINT_TO_FP(Op, DAG); + case ISD::FP_TO_SINT: return LowerFP_TO_SINT(Op, DAG); + case ISD::FABS: return LowerFABS(Op, DAG); + case ISD::FNEG: return LowerFNEG(Op, DAG); + case ISD::SETCC: return LowerSETCC(Op, DAG); + case ISD::SELECT: return LowerSELECT(Op, DAG); + case ISD::BRCOND: return LowerBRCOND(Op, DAG); + case ISD::JumpTable: return LowerJumpTable(Op, DAG); + case ISD::RET: return LowerRET(Op, DAG); + case ISD::MEMSET: return LowerMEMSET(Op, DAG); + case ISD::MEMCPY: return LowerMEMCPY(Op, DAG); + case ISD::READCYCLECOUNTER: return LowerREADCYCLCECOUNTER(Op, DAG); + case ISD::VASTART: return LowerVASTART(Op, DAG); + case ISD::INTRINSIC_WO_CHAIN: return LowerINTRINSIC_WO_CHAIN(Op, DAG); + } +} + const char *X86TargetLowering::getTargetNodeName(unsigned Opcode) const { switch (Opcode) { default: return NULL; Index: llvm/lib/Target/X86/X86ISelLowering.h diff -u llvm/lib/Target/X86/X86ISelLowering.h:1.58 llvm/lib/Target/X86/X86ISelLowering.h:1.59 --- llvm/lib/Target/X86/X86ISelLowering.h:1.58 Thu Apr 20 20:05:10 2006 +++ llvm/lib/Target/X86/X86ISelLowering.h Tue Apr 25 15:13:52 2006 @@ -339,6 +339,13 @@ MVT::ValueType EVT, SelectionDAG &DAG) 
const; private: + /// Subtarget - Keep a pointer to the X86Subtarget around so that we can + /// make the right decision when generating code for different targets. + const X86Subtarget *Subtarget; + + /// X86ScalarSSE - Select between SSE2 or x87 floating point ops. + bool X86ScalarSSE; + // C Calling Convention implementation. std::vector LowerCCCArguments(Function &F, SelectionDAG &DAG); std::pair @@ -352,12 +359,29 @@ LowerFastCCCallTo(SDOperand Chain, const Type *RetTy, bool isTailCall, SDOperand Callee, ArgListTy &Args, SelectionDAG &DAG); - /// Subtarget - Keep a pointer to the X86Subtarget around so that we can - /// make the right decision when generating code for different targets. - const X86Subtarget *Subtarget; - - /// X86ScalarSSE - Select between SSE2 or x87 floating point ops. - bool X86ScalarSSE; + SDOperand LowerBUILD_VECTOR(SDOperand Op, SelectionDAG &DAG); + SDOperand LowerVECTOR_SHUFFLE(SDOperand Op, SelectionDAG &DAG); + SDOperand LowerEXTRACT_VECTOR_ELT(SDOperand Op, SelectionDAG &DAG); + SDOperand LowerINSERT_VECTOR_ELT(SDOperand Op, SelectionDAG &DAG); + SDOperand LowerSCALAR_TO_VECTOR(SDOperand Op, SelectionDAG &DAG); + SDOperand LowerConstantPool(SDOperand Op, SelectionDAG &DAG); + SDOperand LowerGlobalAddress(SDOperand Op, SelectionDAG &DAG); + SDOperand LowerExternalSymbol(SDOperand Op, SelectionDAG &DAG); + SDOperand LowerShift(SDOperand Op, SelectionDAG &DAG); + SDOperand LowerSINT_TO_FP(SDOperand Op, SelectionDAG &DAG); + SDOperand LowerFP_TO_SINT(SDOperand Op, SelectionDAG &DAG); + SDOperand LowerFABS(SDOperand Op, SelectionDAG &DAG); + SDOperand LowerFNEG(SDOperand Op, SelectionDAG &DAG); + SDOperand LowerSETCC(SDOperand Op, SelectionDAG &DAG); + SDOperand LowerSELECT(SDOperand Op, SelectionDAG &DAG); + SDOperand LowerBRCOND(SDOperand Op, SelectionDAG &DAG); + SDOperand LowerMEMSET(SDOperand Op, SelectionDAG &DAG); + SDOperand LowerMEMCPY(SDOperand Op, SelectionDAG &DAG); + SDOperand LowerJumpTable(SDOperand Op, SelectionDAG &DAG); 
+ SDOperand LowerRET(SDOperand Op, SelectionDAG &DAG); + SDOperand LowerREADCYCLCECOUNTER(SDOperand Op, SelectionDAG &DAG); + SDOperand LowerVASTART(SDOperand Op, SelectionDAG &DAG); + SDOperand LowerINTRINSIC_WO_CHAIN(SDOperand Op, SelectionDAG &DAG); }; } From natebegeman at mac.com Tue Apr 25 15:54:39 2006 From: natebegeman at mac.com (Nate Begeman) Date: Tue, 25 Apr 2006 15:54:39 -0500 Subject: [llvm-commits] CVS: llvm/lib/Target/X86/X86JITInfo.cpp Message-ID: <200604252054.PAA13566@zion.cs.uiuc.edu> Changes in directory llvm/lib/Target/X86: X86JITInfo.cpp updated: 1.15 -> 1.16 --- Log message: Keep the stack frame on Darwin 16-byte aligned. This fixes many JIT failures. --- Diffs of the changes: (+13 -2) X86JITInfo.cpp | 15 +++++++++++++-- 1 files changed, 13 insertions(+), 2 deletions(-) Index: llvm/lib/Target/X86/X86JITInfo.cpp diff -u llvm/lib/Target/X86/X86JITInfo.cpp:1.15 llvm/lib/Target/X86/X86JITInfo.cpp:1.16 --- llvm/lib/Target/X86/X86JITInfo.cpp:1.15 Thu Jan 26 13:55:20 2006 +++ llvm/lib/Target/X86/X86JITInfo.cpp Tue Apr 25 15:54:26 2006 @@ -59,8 +59,19 @@ "movl %esp, %ebp\n" // Standard prologue "pushl %eax\n" "pushl %edx\n" // save EAX/EDX -#if defined(__CYGWIN__) || defined(__APPLE__) - "call _X86CompilationCallback2\n" +#if defined(__CYGWIN__) + "call _X86CompilationCallback2\n" +#elif defined(__APPLE__) + "movl 4(%ebp), %eax\n" // load the address of return address + "movl $24, %edx\n" // if the opcode of the instruction at the + "cmpb $-51, (%eax)\n" // return address is our 0xCD marker, then + "movl $12, %eax\n" // subtract 24 from %esp to realign it to 16 + "cmovne %eax, %edx\n" // bytes after the push of edx, the amount to. + "subl %edx, %esp\n" // the push of edx to keep it aligned. + "pushl %edx\n" // subtract. 
Otherwise, subtract 12 bytes after + "call _X86CompilationCallback2\n" + "popl %edx\n" + "addl %edx, %esp\n" #else "call X86CompilationCallback2\n" #endif From evan.cheng at apple.com Tue Apr 25 18:03:48 2006 From: evan.cheng at apple.com (Evan Cheng) Date: Tue, 25 Apr 2006 18:03:48 -0500 Subject: [llvm-commits] CVS: llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp Message-ID: <200604252303.SAA14269@zion.cs.uiuc.edu> Changes in directory llvm/lib/CodeGen/SelectionDAG: SelectionDAGISel.cpp updated: 1.226 -> 1.227 --- Log message: Don't forget return void. --- Diffs of the changes: (+3 -0) SelectionDAGISel.cpp | 3 +++ 1 files changed, 3 insertions(+) Index: llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp diff -u llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:1.226 llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:1.227 --- llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:1.226 Sun Apr 23 01:26:20 2006 +++ llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp Tue Apr 25 18:03:35 2006 @@ -2385,6 +2385,9 @@ break; } } + + if (RetVals.size() == 0) + RetVals.push_back(MVT::isVoid); // Create the node. SDNode *Result = DAG.getNode(ISD::FORMAL_ARGUMENTS, RetVals, Ops).Val; From evan.cheng at apple.com Tue Apr 25 20:20:29 2006 From: evan.cheng at apple.com (Evan Cheng) Date: Tue, 25 Apr 2006 20:20:29 -0500 Subject: [llvm-commits] CVS: llvm/lib/Target/X86/X86ISelLowering.cpp X86ISelLowering.h Message-ID: <200604260120.UAA15065@zion.cs.uiuc.edu> Changes in directory llvm/lib/Target/X86: X86ISelLowering.cpp updated: 1.187 -> 1.188 X86ISelLowering.h updated: 1.59 -> 1.60 --- Log message: Switching over FORMAL_ARGUMENTS mechanism to lower call arguments. 
--- Diffs of the changes: (+177 -80) X86ISelLowering.cpp | 245 +++++++++++++++++++++++++++++++++++----------------- X86ISelLowering.h | 12 ++ 2 files changed, 177 insertions(+), 80 deletions(-) Index: llvm/lib/Target/X86/X86ISelLowering.cpp diff -u llvm/lib/Target/X86/X86ISelLowering.cpp:1.187 llvm/lib/Target/X86/X86ISelLowering.cpp:1.188 --- llvm/lib/Target/X86/X86ISelLowering.cpp:1.187 Tue Apr 25 15:13:52 2006 +++ llvm/lib/Target/X86/X86ISelLowering.cpp Tue Apr 25 20:20:17 2006 @@ -363,9 +363,16 @@ std::vector X86TargetLowering::LowerArguments(Function &F, SelectionDAG &DAG) { + std::vector Args = TargetLowering::LowerArguments(F, DAG); + + FormalArgs.clear(); + // This sets BytesToPopOnReturn, BytesCallerReserves, etc. which have to be set + // before the rest of the function can be lowered. if (F.getCallingConv() == CallingConv::Fast && EnableFastCC) - return LowerFastCCArguments(F, DAG); - return LowerCCCArguments(F, DAG); + PreprocessFastCCArguments(Args[0], F, DAG); + else + PreprocessCCCArguments(Args[0], F, DAG); + return Args; } std::pair @@ -393,10 +400,41 @@ // C Calling Convention implementation //===----------------------------------------------------------------------===// -std::vector -X86TargetLowering::LowerCCCArguments(Function &F, SelectionDAG &DAG) { - std::vector ArgValues; +void X86TargetLowering::PreprocessCCCArguments(SDOperand Op, Function &F, + SelectionDAG &DAG) { + unsigned NumArgs = Op.Val->getNumValues(); + MachineFunction &MF = DAG.getMachineFunction(); + MachineFrameInfo *MFI = MF.getFrameInfo(); + + unsigned ArgOffset = 0; // Frame mechanisms handle retaddr slot + for (unsigned i = 0; i < NumArgs; ++i) { + MVT::ValueType ObjectVT = Op.Val->getValueType(i); + unsigned ArgIncrement = 4; + unsigned ObjSize; + switch (ObjectVT) { + default: assert(0 && "Unhandled argument type!"); + case MVT::i1: + case MVT::i8: ObjSize = 1; break; + case MVT::i16: ObjSize = 2; break; + case MVT::i32: ObjSize = 4; break; + case MVT::i64: ObjSize = 
ArgIncrement = 8; break; + case MVT::f32: ObjSize = 4; break; + case MVT::f64: ObjSize = ArgIncrement = 8; break; + } + ArgOffset += ArgIncrement; // Move on to the next argument... + } + // If the function takes variable number of arguments, make a frame index for + // the start of the first vararg value... for expansion of llvm.va_start. + if (F.isVarArg()) + VarArgsFrameIndex = MFI->CreateFixedObject(1, ArgOffset); + ReturnAddrIndex = 0; // No return address slot generated yet. + BytesToPopOnReturn = 0; // Callee pops nothing. + BytesCallerReserves = ArgOffset; +} + +void X86TargetLowering::LowerCCCArguments(SDOperand Op, SelectionDAG &DAG) { + unsigned NumArgs = Op.Val->getNumValues(); MachineFunction &MF = DAG.getMachineFunction(); MachineFrameInfo *MFI = MF.getFrameInfo(); @@ -409,8 +447,8 @@ // ... // unsigned ArgOffset = 0; // Frame mechanisms handle retaddr slot - for (Function::arg_iterator I = F.arg_begin(), E = F.arg_end(); I != E; ++I) { - MVT::ValueType ObjectVT = getValueType(I->getType()); + for (unsigned i = 0; i < NumArgs; ++i) { + MVT::ValueType ObjectVT = Op.Val->getValueType(i); unsigned ArgIncrement = 4; unsigned ObjSize; switch (ObjectVT) { @@ -429,31 +467,11 @@ // Create the SelectionDAG nodes corresponding to a load from this parameter SDOperand FIN = DAG.getFrameIndex(FI, MVT::i32); - // Don't codegen dead arguments. FIXME: remove this check when we can nuke - // dead loads. - SDOperand ArgValue; - if (!I->use_empty()) - ArgValue = DAG.getLoad(ObjectVT, DAG.getEntryNode(), FIN, - DAG.getSrcValue(NULL)); - else { - if (MVT::isInteger(ObjectVT)) - ArgValue = DAG.getConstant(0, ObjectVT); - else - ArgValue = DAG.getConstantFP(0, ObjectVT); - } - ArgValues.push_back(ArgValue); - + SDOperand ArgValue = DAG.getLoad(ObjectVT, DAG.getEntryNode(), FIN, + DAG.getSrcValue(NULL)); + FormalArgs.push_back(ArgValue); ArgOffset += ArgIncrement; // Move on to the next argument... 
} - - // If the function takes variable number of arguments, make a frame index for - // the start of the first vararg value... for expansion of llvm.va_start. - if (F.isVarArg()) - VarArgsFrameIndex = MFI->CreateFixedObject(1, ArgOffset); - ReturnAddrIndex = 0; // No return address slot generated yet. - BytesToPopOnReturn = 0; // Callee pops nothing. - BytesCallerReserves = ArgOffset; - return ArgValues; } std::pair @@ -697,10 +715,103 @@ static unsigned FASTCC_NUM_INT_ARGS_INREGS = 0; -std::vector -X86TargetLowering::LowerFastCCArguments(Function &F, SelectionDAG &DAG) { - std::vector ArgValues; +void +X86TargetLowering::PreprocessFastCCArguments(SDOperand Op, Function &F, + SelectionDAG &DAG) { + unsigned NumArgs = Op.Val->getNumValues(); + MachineFunction &MF = DAG.getMachineFunction(); + MachineFrameInfo *MFI = MF.getFrameInfo(); + + unsigned ArgOffset = 0; // Frame mechanisms handle retaddr slot + + // Keep track of the number of integer regs passed so far. This can be either + // 0 (neither EAX or EDX used), 1 (EAX is used) or 2 (EAX and EDX are both + // used). 
+ unsigned NumIntRegs = 0; + + for (unsigned i = 0; i < NumArgs; ++i) { + MVT::ValueType ObjectVT = Op.Val->getValueType(i); + unsigned ArgIncrement = 4; + unsigned ObjSize = 0; + SDOperand ArgValue; + switch (ObjectVT) { + default: assert(0 && "Unhandled argument type!"); + case MVT::i1: + case MVT::i8: + if (NumIntRegs < FASTCC_NUM_INT_ARGS_INREGS) { + ++NumIntRegs; + break; + } + + ObjSize = 1; + break; + case MVT::i16: + if (NumIntRegs < FASTCC_NUM_INT_ARGS_INREGS) { + ++NumIntRegs; + break; + } + ObjSize = 2; + break; + case MVT::i32: + if (NumIntRegs < FASTCC_NUM_INT_ARGS_INREGS) { + ++NumIntRegs; + break; + } + ObjSize = 4; + break; + case MVT::i64: + if (NumIntRegs+2 <= FASTCC_NUM_INT_ARGS_INREGS) { + NumIntRegs += 2; + break; + } else if (NumIntRegs+1 <= FASTCC_NUM_INT_ARGS_INREGS) { + ArgOffset += 4; + NumIntRegs = FASTCC_NUM_INT_ARGS_INREGS; + break; + } + ObjSize = ArgIncrement = 8; + break; + case MVT::f32: ObjSize = 4; break; + case MVT::f64: ObjSize = ArgIncrement = 8; break; + } + + if (ObjSize) + ArgOffset += ArgIncrement; // Move on to the next argument. + } + + // Make sure the instruction takes 8n+4 bytes to make sure the start of the + // arguments and the arguments after the retaddr has been pushed are aligned. + if ((ArgOffset & 7) == 0) + ArgOffset += 4; + + VarArgsFrameIndex = 0xAAAAAAA; // fastcc functions can't have varargs. + ReturnAddrIndex = 0; // No return address slot generated yet. + BytesToPopOnReturn = ArgOffset; // Callee pops all stack arguments. + BytesCallerReserves = 0; + + // Finally, inform the code generator which regs we return values in. 
+ switch (getValueType(F.getReturnType())) { + default: assert(0 && "Unknown type!"); + case MVT::isVoid: break; + case MVT::i1: + case MVT::i8: + case MVT::i16: + case MVT::i32: + MF.addLiveOut(X86::EAX); + break; + case MVT::i64: + MF.addLiveOut(X86::EAX); + MF.addLiveOut(X86::EDX); + break; + case MVT::f32: + case MVT::f64: + MF.addLiveOut(X86::ST0); + break; + } +} +void +X86TargetLowering::LowerFastCCArguments(SDOperand Op, SelectionDAG &DAG) { + unsigned NumArgs = Op.Val->getNumValues(); MachineFunction &MF = DAG.getMachineFunction(); MachineFrameInfo *MFI = MF.getFrameInfo(); @@ -718,18 +829,19 @@ // used). unsigned NumIntRegs = 0; - for (Function::arg_iterator I = F.arg_begin(), E = F.arg_end(); I != E; ++I) { - MVT::ValueType ObjectVT = getValueType(I->getType()); + for (unsigned i = 0; i < NumArgs; ++i) { + MVT::ValueType ObjectVT = Op.Val->getValueType(i); unsigned ArgIncrement = 4; unsigned ObjSize = 0; SDOperand ArgValue; + bool hasUse = !Op.Val->hasNUsesOfValue(0, i); switch (ObjectVT) { default: assert(0 && "Unhandled argument type!"); case MVT::i1: case MVT::i8: if (NumIntRegs < FASTCC_NUM_INT_ARGS_INREGS) { - if (!I->use_empty()) { + if (hasUse) { unsigned VReg = AddLiveIn(MF, NumIntRegs ? X86::DL : X86::AL, X86::R8RegisterClass); ArgValue = DAG.getCopyFromReg(DAG.getRoot(), VReg, MVT::i8); @@ -746,7 +858,7 @@ break; case MVT::i16: if (NumIntRegs < FASTCC_NUM_INT_ARGS_INREGS) { - if (!I->use_empty()) { + if (hasUse) { unsigned VReg = AddLiveIn(MF, NumIntRegs ? X86::DX : X86::AX, X86::R16RegisterClass); ArgValue = DAG.getCopyFromReg(DAG.getRoot(), VReg, MVT::i16); @@ -759,7 +871,7 @@ break; case MVT::i32: if (NumIntRegs < FASTCC_NUM_INT_ARGS_INREGS) { - if (!I->use_empty()) { + if (hasUse) { unsigned VReg = AddLiveIn(MF, NumIntRegs ? 
X86::EDX : X86::EAX, X86::R32RegisterClass); ArgValue = DAG.getCopyFromReg(DAG.getRoot(), VReg, MVT::i32); @@ -772,7 +884,7 @@ break; case MVT::i64: if (NumIntRegs+2 <= FASTCC_NUM_INT_ARGS_INREGS) { - if (!I->use_empty()) { + if (hasUse) { unsigned BotReg = AddLiveIn(MF, X86::EAX, X86::R32RegisterClass); unsigned TopReg = AddLiveIn(MF, X86::EDX, X86::R32RegisterClass); @@ -785,7 +897,7 @@ NumIntRegs += 2; break; } else if (NumIntRegs+1 <= FASTCC_NUM_INT_ARGS_INREGS) { - if (!I->use_empty()) { + if (hasUse) { unsigned BotReg = AddLiveIn(MF, X86::EDX, X86::R32RegisterClass); SDOperand Low = DAG.getCopyFromReg(DAG.getRoot(), BotReg, MVT::i32); DAG.setRoot(Low.getValue(1)); @@ -808,9 +920,7 @@ case MVT::f64: ObjSize = ArgIncrement = 8; break; } - // Don't codegen dead arguments. FIXME: remove this check when we can nuke - // dead loads. - if (ObjSize && !I->use_empty()) { + if (ObjSize) { // Create the frame index object for this incoming parameter... int FI = MFI->CreateFixedObject(ObjSize, ArgOffset); @@ -826,42 +936,8 @@ else ArgValue = DAG.getConstantFP(0, ObjectVT); } - ArgValues.push_back(ArgValue); - - if (ObjSize) - ArgOffset += ArgIncrement; // Move on to the next argument. - } - - // Make sure the instruction takes 8n+4 bytes to make sure the start of the - // arguments and the arguments after the retaddr has been pushed are aligned. - if ((ArgOffset & 7) == 0) - ArgOffset += 4; - - VarArgsFrameIndex = 0xAAAAAAA; // fastcc functions can't have varargs. - ReturnAddrIndex = 0; // No return address slot generated yet. - BytesToPopOnReturn = ArgOffset; // Callee pops all stack arguments. - BytesCallerReserves = 0; - - // Finally, inform the code generator which regs we return values in. 
- switch (getValueType(F.getReturnType())) { - default: assert(0 && "Unknown type!"); - case MVT::isVoid: break; - case MVT::i1: - case MVT::i8: - case MVT::i16: - case MVT::i32: - MF.addLiveOut(X86::EAX); - break; - case MVT::i64: - MF.addLiveOut(X86::EAX); - MF.addLiveOut(X86::EDX); - break; - case MVT::f32: - case MVT::f64: - MF.addLiveOut(X86::ST0); - break; + FormalArgs.push_back(ArgValue); } - return ArgValues; } std::pair @@ -3231,6 +3307,18 @@ Copy.getValue(1)); } +SDOperand +X86TargetLowering::LowerFORMAL_ARGUMENTS(SDOperand Op, SelectionDAG &DAG) { + if (FormalArgs.size() == 0) { + unsigned CC = cast(Op.getOperand(0))->getValue(); + if (CC == CallingConv::Fast && EnableFastCC) + LowerFastCCArguments(Op, DAG); + else + LowerCCCArguments(Op, DAG); + } + return FormalArgs[Op.ResNo]; +} + SDOperand X86TargetLowering::LowerMEMSET(SDOperand Op, SelectionDAG &DAG) { SDOperand InFlag(0, 0); SDOperand Chain = Op.getOperand(0); @@ -3645,6 +3733,7 @@ case ISD::BRCOND: return LowerBRCOND(Op, DAG); case ISD::JumpTable: return LowerJumpTable(Op, DAG); case ISD::RET: return LowerRET(Op, DAG); + case ISD::FORMAL_ARGUMENTS: return LowerFORMAL_ARGUMENTS(Op, DAG); case ISD::MEMSET: return LowerMEMSET(Op, DAG); case ISD::MEMCPY: return LowerMEMCPY(Op, DAG); case ISD::READCYCLECOUNTER: return LowerREADCYCLCECOUNTER(Op, DAG); Index: llvm/lib/Target/X86/X86ISelLowering.h diff -u llvm/lib/Target/X86/X86ISelLowering.h:1.59 llvm/lib/Target/X86/X86ISelLowering.h:1.60 --- llvm/lib/Target/X86/X86ISelLowering.h:1.59 Tue Apr 25 15:13:52 2006 +++ llvm/lib/Target/X86/X86ISelLowering.h Tue Apr 25 20:20:17 2006 @@ -346,15 +346,22 @@ /// X86ScalarSSE - Select between SSE2 or x87 floating point ops. bool X86ScalarSSE; + /// Formal arguments lowered to load and CopyFromReg ops. + std::vector FormalArgs; + // C Calling Convention implementation. 
- std::vector LowerCCCArguments(Function &F, SelectionDAG &DAG); + void PreprocessCCCArguments(SDOperand Op, Function &F, SelectionDAG &DAG); + void LowerCCCArguments(SDOperand Op, SelectionDAG &DAG); std::pair LowerCCCCallTo(SDOperand Chain, const Type *RetTy, bool isVarArg, bool isTailCall, SDOperand Callee, ArgListTy &Args, SelectionDAG &DAG); // Fast Calling Convention implementation. - std::vector LowerFastCCArguments(Function &F, SelectionDAG &DAG); + void + PreprocessFastCCArguments(SDOperand Op, Function &F, SelectionDAG &DAG); + void + LowerFastCCArguments(SDOperand Op, SelectionDAG &DAG); std::pair LowerFastCCCallTo(SDOperand Chain, const Type *RetTy, bool isTailCall, SDOperand Callee, ArgListTy &Args, SelectionDAG &DAG); @@ -379,6 +386,7 @@ SDOperand LowerMEMCPY(SDOperand Op, SelectionDAG &DAG); SDOperand LowerJumpTable(SDOperand Op, SelectionDAG &DAG); SDOperand LowerRET(SDOperand Op, SelectionDAG &DAG); + SDOperand LowerFORMAL_ARGUMENTS(SDOperand Op, SelectionDAG &DAG); SDOperand LowerREADCYCLCECOUNTER(SDOperand Op, SelectionDAG &DAG); SDOperand LowerVASTART(SDOperand Op, SelectionDAG &DAG); SDOperand LowerINTRINSIC_WO_CHAIN(SDOperand Op, SelectionDAG &DAG); From lattner at cs.uiuc.edu Tue Apr 25 23:46:23 2006 From: lattner at cs.uiuc.edu (Chris Lattner) Date: Tue, 25 Apr 2006 23:46:23 -0500 Subject: [llvm-commits] CVS: llvm-www/pubs/2006-04-25-GelatoLLVMIntro.html 2006-04-25-GelatoLLVMIntro.pdf index.html Message-ID: <200604260446.XAA16246@zion.cs.uiuc.edu> Changes in directory llvm-www/pubs: 2006-04-25-GelatoLLVMIntro.html added (r1.1) 2006-04-25-GelatoLLVMIntro.pdf added (r1.1) index.html updated: 1.36 -> 1.37 --- Log message: Add a new presentation --- Diffs of the changes: (+42 -0) 2006-04-25-GelatoLLVMIntro.html | 38 ++++++++++++++++++++++++++++++++++++++ 2006-04-25-GelatoLLVMIntro.pdf | 0 index.html | 4 ++++ 3 files changed, 42 insertions(+) Index: llvm-www/pubs/2006-04-25-GelatoLLVMIntro.html diff -c /dev/null 
llvm-www/pubs/2006-04-25-GelatoLLVMIntro.html:1.1 *** /dev/null Tue Apr 25 23:46:21 2006 --- llvm-www/pubs/2006-04-25-GelatoLLVMIntro.html Tue Apr 25 23:46:11 2006 *************** *** 0 **** --- 1,38 ---- + + + + + + Introduction to the LLVM Compiler Infrastructure + + + +
+ Introduction to the LLVM Compiler Infrastructure +
+ + +

Abstract:

+
+ This invited talk gives a high-level overview of the LLVM Project, its + capabilities, features, progress, and the direction it is taking. It is + aimed at a GCC-centric audience, specifically to follow up a presentation + on the GCC Link-Time-Optimization proposal at the 2006 Gelato Itanium + Conference and Expo (ICE). +
+ +

Published:

+
+ "Introduction to the LLVM Compiler Infrastructure", Chris Lattner,
+ 2006 Itanium Conference and Expo, San Jose, California, April 2006.
+
+ +

Download Presentation:

+ + + + Index: llvm-www/pubs/2006-04-25-GelatoLLVMIntro.pdf Index: llvm-www/pubs/index.html diff -u llvm-www/pubs/index.html:1.36 llvm-www/pubs/index.html:1.37 --- llvm-www/pubs/index.html:1.36 Tue Apr 4 13:25:09 2006 +++ llvm-www/pubs/index.html Tue Apr 25 23:46:11 2006 @@ -3,6 +3,10 @@
    +
  1. "Introduction to the LLVM Compiler Infrastructure"
    Chris Lattner
    + 2006 Itanium Conference and Expo, San Jose, California, April 2006.
  2. + +
  3. "The LLVM Compiler Framework and Infrastructure Tutorial"
    Chris Lattner and Vikram Adve
    LCPC'04 Mini Workshop on Compiler Research Infrastructures, West Lafayette, Indiana, Sep. 2004.
  4. From reid at x10sys.com Wed Apr 26 09:52:43 2006 From: reid at x10sys.com (Reid Spencer) Date: Wed, 26 Apr 2006 09:52:43 -0500 Subject: [llvm-commits] CVS: llvm/docs/FAQ.html Message-ID: <200604261452.JAA27980@zion.cs.uiuc.edu> Changes in directory llvm/docs: FAQ.html updated: 1.31 -> 1.32 --- Log message: Add some notes about the current state of source (front end) languages so we can point to them on llvm-dev. --- Diffs of the changes: (+44 -1) FAQ.html | 45 ++++++++++++++++++++++++++++++++++++++++++++- 1 files changed, 44 insertions(+), 1 deletion(-) Index: llvm/docs/FAQ.html diff -u llvm/docs/FAQ.html:1.31 llvm/docs/FAQ.html:1.32 --- llvm/docs/FAQ.html:1.31 Mon Mar 13 23:39:39 2006 +++ llvm/docs/FAQ.html Wed Apr 26 09:52:19 2006 @@ -55,6 +55,13 @@ target".
+
  • Source Languages +
      +
    1. What source languages are supported?
    2. +
    3. What support is there for higher level source + language constructs for building a compiler?
    4. +
    +
  • Using the GCC Front End
    1. @@ -385,6 +392,42 @@ + +

      What source languages are supported?

      +
      +

      LLVM currently has full support for C and C++ source languages. These are + available through a special version of GCC that LLVM calls the + C Front End

      +

      There is an incomplete version of a Java front end available in the + llvm-java CVS repository. There is no documentation on this yet so + you'll need to download the code, compile it, and try it.

      +

      In the examples/BFtoLLVM directory is a translator for the + BrainF*** language (2002 Language Specification).

      +

      In the projects/Stacker directory is a compiler and runtime + library for the Stacker language, a "toy" language loosely based on Forth.

      +

      The PyPy developers are working on integrating LLVM into the PyPy backend + so that PyPy language can translate to LLVM.

      +
      +
      +

      What support is there for a higher level source language constructs for + building a compiler?

      +
      +
      +

      Currently, there isn't much. LLVM supports an intermediate representation + which is useful for code representation but will not support the high level + (abstract syntax tree) representation needed by most compilers. There are no + facilities for lexical nor semantica analysis. There is, however, a mostly + implemented configuration-driven + compiler driver which simplifies the task + of running optimizations, linking, and executable generation.

      +

      You might be interested in following the progress of the HLVM Project which is attempting to address these + issues.

      +
      + + @@ -574,7 +617,7 @@ src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"> LLVM Compiler Infrastructure
      - Last modified: $Date: 2006/03/14 05:39:39 $ + Last modified: $Date: 2006/04/26 14:52:19 $ From reid at x10sys.com Wed Apr 26 10:47:06 2006 From: reid at x10sys.com (Reid Spencer) Date: Wed, 26 Apr 2006 10:47:06 -0500 Subject: [llvm-commits] CVS: llvm/docs/FAQ.html Message-ID: <200604261547.KAA28461@zion.cs.uiuc.edu> Changes in directory llvm/docs: FAQ.html updated: 1.32 -> 1.33 --- Log message: Fix some anchors. --- Diffs of the changes: (+8 -8) FAQ.html | 16 ++++++++-------- 1 files changed, 8 insertions(+), 8 deletions(-) Index: llvm/docs/FAQ.html diff -u llvm/docs/FAQ.html:1.32 llvm/docs/FAQ.html:1.33 --- llvm/docs/FAQ.html:1.32 Wed Apr 26 09:52:19 2006 +++ llvm/docs/FAQ.html Wed Apr 26 10:46:53 2006 @@ -391,12 +391,12 @@
  • -
    - Source Languages -
    +
    Source Languages
    -

    What source languages are supported?

    -
    +

    + What source languages are supported?

    +
    +

    LLVM currently has full support for C and C++ source languages. These are available through a special version of GCC that LLVM calls the C Front End

    @@ -410,9 +410,9 @@

    The PyPy developers are working on integrating LLVM into the PyPy backend so that PyPy language can translate to LLVM.

    -
    +

    Currently, there isn't much. LLVM supports an intermediate representation @@ -617,7 +617,7 @@ src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"> LLVM Compiler Infrastructure
    - Last modified: $Date: 2006/04/26 14:52:19 $ + Last modified: $Date: 2006/04/26 15:46:53 $ From jeffc at jolt-lang.org Wed Apr 26 13:05:38 2006 From: jeffc at jolt-lang.org (Jeff Cohen) Date: Wed, 26 Apr 2006 13:05:38 -0500 Subject: [llvm-commits] CVS: llvm/docs/FAQ.html Message-ID: <200604261805.NAA32219@zion.cs.uiuc.edu> Changes in directory llvm/docs: FAQ.html updated: 1.33 -> 1.34 --- Log message: Fix typo. --- Diffs of the changes: (+2 -2) FAQ.html | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) Index: llvm/docs/FAQ.html diff -u llvm/docs/FAQ.html:1.33 llvm/docs/FAQ.html:1.34 --- llvm/docs/FAQ.html:1.33 Wed Apr 26 10:46:53 2006 +++ llvm/docs/FAQ.html Wed Apr 26 13:05:25 2006 @@ -418,7 +418,7 @@

    Currently, there isn't much. LLVM supports an intermediate representation which is useful for code representation but will not support the high level (abstract syntax tree) representation needed by most compilers. There are no - facilities for lexical nor semantica analysis. There is, however, a mostly + facilities for lexical nor semantical analysis. There is, however, a mostly implemented configuration-driven compiler driver which simplifies the task of running optimizations, linking, and executable generation.

    @@ -617,7 +617,7 @@ src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"> LLVM Compiler Infrastructure
    - Last modified: $Date: 2006/04/26 15:46:53 $ + Last modified: $Date: 2006/04/26 18:05:25 $ From lattner at cs.uiuc.edu Wed Apr 26 13:11:12 2006 From: lattner at cs.uiuc.edu (Chris Lattner) Date: Wed, 26 Apr 2006 13:11:12 -0500 Subject: [llvm-commits] CVS: llvm/docs/FAQ.html Message-ID: <200604261811.NAA32558@zion.cs.uiuc.edu> Changes in directory llvm/docs: FAQ.html updated: 1.34 -> 1.35 --- Log message: Unfortunately this really isn't the place for advertisement. --- Diffs of the changes: (+1 -4) FAQ.html | 5 +---- 1 files changed, 1 insertion(+), 4 deletions(-) Index: llvm/docs/FAQ.html diff -u llvm/docs/FAQ.html:1.34 llvm/docs/FAQ.html:1.35 --- llvm/docs/FAQ.html:1.34 Wed Apr 26 13:05:25 2006 +++ llvm/docs/FAQ.html Wed Apr 26 13:10:59 2006 @@ -422,9 +422,6 @@ implemented
    configuration-driven compiler driver which simplifies the task of running optimizations, linking, and executable generation.

    -

    You might be interested in following the progress of the HLVM Project which is attempting to address these - issues.

    @@ -617,7 +614,7 @@ src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"> LLVM Compiler Infrastructure
    - Last modified: $Date: 2006/04/26 18:05:25 $ + Last modified: $Date: 2006/04/26 18:10:59 $ From evan.cheng at apple.com Wed Apr 26 13:21:44 2006 From: evan.cheng at apple.com (Evan Cheng) Date: Wed, 26 Apr 2006 13:21:44 -0500 Subject: [llvm-commits] CVS: llvm/lib/Target/X86/X86ISelLowering.cpp Message-ID: <200604261821.NAA32615@zion.cs.uiuc.edu> Changes in directory llvm/lib/Target/X86: X86ISelLowering.cpp updated: 1.188 -> 1.189 --- Log message: Fix fastcc failures. --- Diffs of the changes: (+3 -0) X86ISelLowering.cpp | 3 +++ 1 files changed, 3 insertions(+) Index: llvm/lib/Target/X86/X86ISelLowering.cpp diff -u llvm/lib/Target/X86/X86ISelLowering.cpp:1.188 llvm/lib/Target/X86/X86ISelLowering.cpp:1.189 --- llvm/lib/Target/X86/X86ISelLowering.cpp:1.188 Tue Apr 25 20:20:17 2006 +++ llvm/lib/Target/X86/X86ISelLowering.cpp Wed Apr 26 13:21:31 2006 @@ -937,6 +937,9 @@ ArgValue = DAG.getConstantFP(0, ObjectVT); } FormalArgs.push_back(ArgValue); + + if (ObjSize) + ArgOffset += ArgIncrement; // Move on to the next argument. 
} } From lattner at cs.uiuc.edu Wed Apr 26 13:33:12 2006 From: lattner at cs.uiuc.edu (Chris Lattner) Date: Wed, 26 Apr 2006 13:33:12 -0500 Subject: [llvm-commits] CVS: llvm/test/Regression/Transforms/IndVarsSimplify/complex-scev.ll Message-ID: <200604261833.NAA32709@zion.cs.uiuc.edu> Changes in directory llvm/test/Regression/Transforms/IndVarsSimplify: complex-scev.ll added (r1.1) --- Log message: new testcase --- Diffs of the changes: (+28 -0) complex-scev.ll | 28 ++++++++++++++++++++++++++++ 1 files changed, 28 insertions(+) Index: llvm/test/Regression/Transforms/IndVarsSimplify/complex-scev.ll diff -c /dev/null llvm/test/Regression/Transforms/IndVarsSimplify/complex-scev.ll:1.1 *** /dev/null Wed Apr 26 13:33:09 2006 --- llvm/test/Regression/Transforms/IndVarsSimplify/complex-scev.ll Wed Apr 26 13:32:59 2006 *************** *** 0 **** --- 1,28 ---- + ; The i induction variable looks like a wrap-around, but it really is just + ; a simple affine IV. Make sure that indvars eliminates it. + + ; RUN: llvm-as < %s | opt -indvars | llvm-dis | grep phi | wc -l | grep 1 + + void %foo() { + entry: + br label %bb6 + + bb6: ; preds = %cond_true, %entry + %j.0 = phi int [ 1, %entry ], [ %tmp5, %cond_true ] ; [#uses=3] + %i.0 = phi int [ 0, %entry ], [ %j.0, %cond_true ] ; [#uses=1] + %tmp7 = call int (...)* %foo2( ) ; [#uses=1] + %tmp = setne int %tmp7, 0 ; [#uses=1] + br bool %tmp, label %cond_true, label %return + + cond_true: ; preds = %bb6 + %tmp2 = call int (...)* %bar( int %i.0, int %j.0 ) ; [#uses=0] + %tmp5 = add int %j.0, 1 ; [#uses=1] + br label %bb6 + + return: ; preds = %bb6 + ret void + } + + declare int %bar(...) + + declare int %foo2(...) 
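For readers following the testcase above, here is a rough C++ sketch of the loop shape it exercises. It is not part of the commit. In source form the loop is approximately `i = 0; for (j = 1; ...; ++j) { ... i = j; }`. The helper name `headerValuesOfI` and the fixed trip count `n` are assumptions made for illustration only; the real test loops on the result of an opaque call. Recording i at the top of each iteration suggests why it is "just a simple affine IV": it starts at 0 and advances by 1, trailing j by exactly one step.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not from the LLVM tree): runs the loop shape from the
// testcase for n iterations and records the value i holds at the top of each
// iteration, which corresponds to the %i.0 phi in the test.
std::vector<int> headerValuesOfI(int n) {
  std::vector<int> values;
  int i = 0;                     // value of i on loop entry
  for (int j = 1; j <= n; ++j) { // j starts at 1 and steps by 1
    values.push_back(i);         // i as seen in the loop header
    i = j;                       // i takes j's value from this iteration
  }
  return values;
}
```

Calling `headerValuesOfI(5)` yields the sequence 0, 1, 2, 3, 4, so i can be described by a start value and a constant step without referring to j at all, and a single phi is enough to carry both variables.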
From lattner at cs.uiuc.edu Wed Apr 26 13:34:20 2006 From: lattner at cs.uiuc.edu (Chris Lattner) Date: Wed, 26 Apr 2006 13:34:20 -0500 Subject: [llvm-commits] CVS: llvm/lib/Analysis/ScalarEvolution.cpp Message-ID: <200604261834.NAA00313@zion.cs.uiuc.edu> Changes in directory llvm/lib/Analysis: ScalarEvolution.cpp updated: 1.47 -> 1.48 --- Log message: Implement Transforms/IndVarsSimplify/complex-scev.ll, a case where we didn't recognize some simple affine IV's. --- Diffs of the changes: (+25 -0) ScalarEvolution.cpp | 25 +++++++++++++++++++++++++ 1 files changed, 25 insertions(+) Index: llvm/lib/Analysis/ScalarEvolution.cpp diff -u llvm/lib/Analysis/ScalarEvolution.cpp:1.47 llvm/lib/Analysis/ScalarEvolution.cpp:1.48 --- llvm/lib/Analysis/ScalarEvolution.cpp:1.47 Tue Apr 4 01:11:42 2006 +++ llvm/lib/Analysis/ScalarEvolution.cpp Wed Apr 26 13:34:07 2006 @@ -1304,6 +1304,31 @@ return PHISCEV; } } + } else if (SCEVAddRecExpr *AddRec = dyn_cast(BEValue)) { + // Otherwise, this could be a loop like this: + // i = 0; for (j = 1; ..; ++j) { .... i = j; } + // In this case, j = {1,+,1} and BEValue is j. + // Because the other in-value of i (0) fits the evolution of BEValue + // i really is an addrec evolution. + if (AddRec->getLoop() == L && AddRec->isAffine()) { + SCEVHandle StartVal = getSCEV(PN->getIncomingValue(IncomingEdge)); + + // If StartVal = j.start - j.stride, we can use StartVal as the + // initial step of the addrec evolution. + if (StartVal == SCEV::getMinusSCEV(AddRec->getOperand(0), + AddRec->getOperand(1))) { + SCEVHandle PHISCEV = + SCEVAddRecExpr::get(StartVal, AddRec->getOperand(1), L); + + // Okay, for the entire analysis of this edge we assumed the PHI + // to be symbolic. We now need to go back and update all of the + // entries for the scalars that use the PHI (except for the PHI + // itself) to use the new analyzed value instead of the "symbolic" + // value. 
+ ReplaceSymbolicValueWithConcrete(PN, SymbolicName, PHISCEV); + return PHISCEV; + } + } } return SymbolicName; From jeffc at jolt-lang.org Wed Apr 26 16:03:30 2006 From: jeffc at jolt-lang.org (Jeff Cohen) Date: Wed, 26 Apr 2006 16:03:30 -0500 Subject: [llvm-commits] CVS: llvm/docs/FAQ.html Message-ID: <200604262103.QAA01504@zion.cs.uiuc.edu> Changes in directory llvm/docs: FAQ.html updated: 1.35 -> 1.36 --- Log message: Actually, semantical doesn't appear to be a word. --- Diffs of the changes: (+2 -2) FAQ.html | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) Index: llvm/docs/FAQ.html diff -u llvm/docs/FAQ.html:1.35 llvm/docs/FAQ.html:1.36 --- llvm/docs/FAQ.html:1.35 Wed Apr 26 13:10:59 2006 +++ llvm/docs/FAQ.html Wed Apr 26 16:03:17 2006 @@ -418,7 +418,7 @@

 Currently, there isn't much. LLVM supports an intermediate representation which is useful for code representation but will not support the high level (abstract syntax tree) representation needed by most compilers. There are no
- facilities for lexical nor semantical analysis. There is, however, a mostly
+ facilities for lexical nor semantic analysis. There is, however, a mostly
 implemented configuration-driven compiler driver which simplifies the task of running optimizations, linking, and executable generation.

    @@ -614,7 +614,7 @@ src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"> LLVM Compiler Infrastructure
    - Last modified: $Date: 2006/04/26 18:10:59 $ + Last modified: $Date: 2006/04/26 21:03:17 $ From lattner at cs.uiuc.edu Wed Apr 26 20:14:58 2006 From: lattner at cs.uiuc.edu (Chris Lattner) Date: Wed, 26 Apr 2006 20:14:58 -0500 Subject: [llvm-commits] CVS: llvm/lib/Transforms/Utils/PromoteMemoryToRegister.cpp Message-ID: <200604270114.UAA20007@zion.cs.uiuc.edu> Changes in directory llvm/lib/Transforms/Utils: PromoteMemoryToRegister.cpp updated: 1.82 -> 1.83 --- Log message: Fix some nondeterministic behavior in the mem2reg pass that (in addition to nondeterminism being bad) could cause some trivial missed optimizations (dead phi nodes being left around for later passes to clean up). With this, llvm-gcc4 now bootstraps and correctly compares. I don't know why I never tried to do it before... :) --- Diffs of the changes: (+38 -20) PromoteMemoryToRegister.cpp | 58 ++++++++++++++++++++++++++++---------------- 1 files changed, 38 insertions(+), 20 deletions(-) Index: llvm/lib/Transforms/Utils/PromoteMemoryToRegister.cpp diff -u llvm/lib/Transforms/Utils/PromoteMemoryToRegister.cpp:1.82 llvm/lib/Transforms/Utils/PromoteMemoryToRegister.cpp:1.83 --- llvm/lib/Transforms/Utils/PromoteMemoryToRegister.cpp:1.82 Fri Nov 18 01:31:42 2005 +++ llvm/lib/Transforms/Utils/PromoteMemoryToRegister.cpp Wed Apr 26 20:14:43 2006 @@ -147,7 +147,7 @@ if (AI->use_empty()) { // If there are no uses of the alloca, just delete it now. if (AST) AST->deleteValue(AI); - AI->getParent()->getInstList().erase(AI); + AI->eraseFromParent(); // Remove the alloca from the Allocas list, since it has been processed Allocas[AllocaNum] = Allocas.back(); @@ -331,7 +331,7 @@ if (AST && isa(PN->getType())) AST->deleteValue(PN); - PN->getParent()->getInstList().erase(PN); + PN->eraseFromParent(); } // Keep the reverse mapping of the 'Allocas' array. @@ -377,7 +377,7 @@ // The renamer uses the Visited set to avoid infinite loops. Clear it now. 
Visited.clear(); - // Remove the allocas themselves from the function... + // Remove the allocas themselves from the function. for (unsigned i = 0, e = Allocas.size(); i != e; ++i) { Instruction *A = Allocas[i]; @@ -388,9 +388,41 @@ if (!A->use_empty()) A->replaceAllUsesWith(UndefValue::get(A->getType())); if (AST) AST->deleteValue(A); - A->getParent()->getInstList().erase(A); + A->eraseFromParent(); } + + // Loop over all of the PHI nodes and see if there are any that we can get + // rid of because they merge all of the same incoming values. This can + // happen due to undef values coming into the PHI nodes. This process is + // iterative, because eliminating one PHI node can cause others to be removed. + bool EliminatedAPHI = true; + while (EliminatedAPHI) { + EliminatedAPHI = false; + + for (std::map >::iterator I = + NewPhiNodes.begin(), E = NewPhiNodes.end(); I != E; ++I) { + std::vector &PNs = I->second; + for (unsigned i = 0, e = PNs.size(); i != e; ++i) { + if (!PNs[i]) continue; + + // If this PHI node merges one value and/or undefs, get the value. + if (Value *V = PNs[i]->hasConstantValue(true)) { + if (!isa(V) || + properlyDominates(cast(V), PNs[i])) { + if (AST && isa(PNs[i]->getType())) + AST->deleteValue(PNs[i]); + PNs[i]->replaceAllUsesWith(V); + PNs[i]->eraseFromParent(); + PNs[i] = 0; + EliminatedAPHI = true; + continue; + } + } + } + } + } + // At this point, the renamer has added entries to PHI nodes for all reachable // code. Unfortunately, there may be blocks which are not reachable, which // the renamer hasn't traversed. If this is the case, the PHI nodes may not @@ -403,25 +435,11 @@ std::vector Preds(pred_begin(I->first), pred_end(I->first)); std::vector &PNs = I->second; assert(!PNs.empty() && "Empty PHI node list??"); - - // Loop over all of the PHI nodes and see if there are any that we can get - // rid of because they merge all of the same incoming values. This can - // happen due to undef values coming into the PHI nodes. 
PHINode *SomePHI = 0; for (unsigned i = 0, e = PNs.size(); i != e; ++i) if (PNs[i]) { - if (Value *V = PNs[i]->hasConstantValue(true)) { - if (!isa(V) || - properlyDominates(cast(V), PNs[i])) { - if (AST && isa(PNs[i]->getType())) - AST->deleteValue(PNs[i]); - PNs[i]->replaceAllUsesWith(V); - PNs[i]->eraseFromParent(); - PNs[i] = 0; - } - } - if (PNs[i]) - SomePHI = PNs[i]; + SomePHI = PNs[i]; + break; } // Only do work here if there the PHI nodes are missing incoming values. We From evan.cheng at apple.com Wed Apr 26 20:32:36 2006 From: evan.cheng at apple.com (Evan Cheng) Date: Wed, 26 Apr 2006 20:32:36 -0500 Subject: [llvm-commits] CVS: llvm/lib/Target/X86/X86ISelLowering.cpp X86ISelLowering.h Message-ID: <200604270132.UAA20811@zion.cs.uiuc.edu> Changes in directory llvm/lib/Target/X86: X86ISelLowering.cpp updated: 1.189 -> 1.190 X86ISelLowering.h updated: 1.60 -> 1.61 --- Log message: - Clean up formal argument lowering code. Prepare for vector pass by value work. - Fixed vararg support. --- Diffs of the changes: (+237 -215) X86ISelLowering.cpp | 427 ++++++++++++++++++++++++++-------------------------- X86ISelLowering.h | 25 ++- 2 files changed, 237 insertions(+), 215 deletions(-) Index: llvm/lib/Target/X86/X86ISelLowering.cpp diff -u llvm/lib/Target/X86/X86ISelLowering.cpp:1.189 llvm/lib/Target/X86/X86ISelLowering.cpp:1.190 --- llvm/lib/Target/X86/X86ISelLowering.cpp:1.189 Wed Apr 26 13:21:31 2006 +++ llvm/lib/Target/X86/X86ISelLowering.cpp Wed Apr 26 20:32:22 2006 @@ -366,12 +366,14 @@ std::vector Args = TargetLowering::LowerArguments(F, DAG); FormalArgs.clear(); + FormalArgLocs.clear(); + // This sets BytesToPopOnReturn, BytesCallerReserves, etc. which have to be set // before the rest of the function can be lowered. 
if (F.getCallingConv() == CallingConv::Fast && EnableFastCC) - PreprocessFastCCArguments(Args[0], F, DAG); + PreprocessFastCCArguments(Args, F, DAG); else - PreprocessCCCArguments(Args[0], F, DAG); + PreprocessCCCArguments(Args, F, DAG); return Args; } @@ -400,28 +402,74 @@ // C Calling Convention implementation //===----------------------------------------------------------------------===// -void X86TargetLowering::PreprocessCCCArguments(SDOperand Op, Function &F, - SelectionDAG &DAG) { - unsigned NumArgs = Op.Val->getNumValues(); +static unsigned getFormalArgSize(MVT::ValueType ObjectVT) { + unsigned ObjSize = 0; + switch (ObjectVT) { + default: assert(0 && "Unhandled argument type!"); + case MVT::i1: + case MVT::i8: ObjSize = 1; break; + case MVT::i16: ObjSize = 2; break; + case MVT::i32: ObjSize = 4; break; + case MVT::i64: ObjSize = 8; break; + case MVT::f32: ObjSize = 4; break; + case MVT::f64: ObjSize = 8; break; + } + return ObjSize; +} + +static std::vector getFormalArgObjects(SDOperand Op) { + unsigned Opc = Op.getOpcode(); + std::vector Objs; + if (Opc == ISD::TRUNCATE) { + Op = Op.getOperand(0); + assert(Op.getOpcode() == ISD::AssertSext || + Op.getOpcode() == ISD::AssertZext); + Objs.push_back(Op.getOperand(0)); + } else if (Opc == ISD::FP_ROUND) { + Objs.push_back(Op.getOperand(0)); + } else if (Opc == ISD::BUILD_PAIR) { + Objs.push_back(Op.getOperand(0)); + Objs.push_back(Op.getOperand(1)); + } else { + Objs.push_back(Op); + } + return Objs; +} + +void X86TargetLowering::PreprocessCCCArguments(std::vectorArgs, + Function &F, SelectionDAG &DAG) { + unsigned NumArgs = Args.size(); MachineFunction &MF = DAG.getMachineFunction(); MachineFrameInfo *MFI = MF.getFrameInfo(); + // Add DAG nodes to load the arguments... 
On entry to a function on the X86, + // the stack frame looks like this: + // + // [ESP] -- return address + // [ESP + 4] -- first argument (leftmost lexically) + // [ESP + 8] -- second argument, if first argument is four bytes in size + // ... + // unsigned ArgOffset = 0; // Frame mechanisms handle retaddr slot for (unsigned i = 0; i < NumArgs; ++i) { - MVT::ValueType ObjectVT = Op.Val->getValueType(i); - unsigned ArgIncrement = 4; - unsigned ObjSize; - switch (ObjectVT) { - default: assert(0 && "Unhandled argument type!"); - case MVT::i1: - case MVT::i8: ObjSize = 1; break; - case MVT::i16: ObjSize = 2; break; - case MVT::i32: ObjSize = 4; break; - case MVT::i64: ObjSize = ArgIncrement = 8; break; - case MVT::f32: ObjSize = 4; break; - case MVT::f64: ObjSize = ArgIncrement = 8; break; + SDOperand Op = Args[i]; + std::vector Objs = getFormalArgObjects(Op); + for (std::vector::iterator I = Objs.begin(), E = Objs.end(); + I != E; ++I) { + SDOperand Obj = *I; + MVT::ValueType ObjectVT = Obj.getValueType(); + unsigned ArgIncrement = 4; + unsigned ObjSize = getFormalArgSize(ObjectVT); + if (ObjSize == 8) + ArgIncrement = 8; + + // Create the frame index object for this incoming parameter... + int FI = MFI->CreateFixedObject(ObjSize, ArgOffset); + std::pair Loc = + std::make_pair(FALocInfo(FALocInfo::StackFrameLoc, FI), FALocInfo()); + FormalArgLocs.push_back(Loc); + ArgOffset += ArgIncrement; // Move on to the next argument... } - ArgOffset += ArgIncrement; // Move on to the next argument... } // If the function takes variable number of arguments, make a frame index for @@ -438,39 +486,13 @@ MachineFunction &MF = DAG.getMachineFunction(); MachineFrameInfo *MFI = MF.getFrameInfo(); - // Add DAG nodes to load the arguments... On entry to a function on the X86, - // the stack frame looks like this: - // - // [ESP] -- return address - // [ESP + 4] -- first argument (leftmost lexically) - // [ESP + 8] -- second argument, if first argument is four bytes in size - // ... 
- // - unsigned ArgOffset = 0; // Frame mechanisms handle retaddr slot for (unsigned i = 0; i < NumArgs; ++i) { - MVT::ValueType ObjectVT = Op.Val->getValueType(i); - unsigned ArgIncrement = 4; - unsigned ObjSize; - switch (ObjectVT) { - default: assert(0 && "Unhandled argument type!"); - case MVT::i1: - case MVT::i8: ObjSize = 1; break; - case MVT::i16: ObjSize = 2; break; - case MVT::i32: ObjSize = 4; break; - case MVT::i64: ObjSize = ArgIncrement = 8; break; - case MVT::f32: ObjSize = 4; break; - case MVT::f64: ObjSize = ArgIncrement = 8; break; - } - // Create the frame index object for this incoming parameter... - int FI = MFI->CreateFixedObject(ObjSize, ArgOffset); - // Create the SelectionDAG nodes corresponding to a load from this parameter + unsigned FI = FormalArgLocs[i].first.Loc; SDOperand FIN = DAG.getFrameIndex(FI, MVT::i32); - - SDOperand ArgValue = DAG.getLoad(ObjectVT, DAG.getEntryNode(), FIN, - DAG.getSrcValue(NULL)); + SDOperand ArgValue = DAG.getLoad(Op.Val->getValueType(i),DAG.getEntryNode(), + FIN, DAG.getSrcValue(NULL)); FormalArgs.push_back(ArgValue); - ArgOffset += ArgIncrement; // Move on to the next argument... 
} } @@ -715,13 +737,64 @@ static unsigned FASTCC_NUM_INT_ARGS_INREGS = 0; +static void +DetermineFastCCFormalArgSizeNumRegs(MVT::ValueType ObjectVT, + unsigned &ObjSize, unsigned &NumIntRegs) { + ObjSize = 0; + NumIntRegs = 0; + + switch (ObjectVT) { + default: assert(0 && "Unhandled argument type!"); + case MVT::i1: + case MVT::i8: + if (NumIntRegs < FASTCC_NUM_INT_ARGS_INREGS) + NumIntRegs = 1; + else + ObjSize = 1; + break; + case MVT::i16: + if (NumIntRegs < FASTCC_NUM_INT_ARGS_INREGS) + NumIntRegs = 1; + else + ObjSize = 2; + break; + case MVT::i32: + if (NumIntRegs < FASTCC_NUM_INT_ARGS_INREGS) + NumIntRegs = 1; + else + ObjSize = 4; + break; + case MVT::i64: + if (NumIntRegs+2 <= FASTCC_NUM_INT_ARGS_INREGS) { + NumIntRegs = 2; + } else if (NumIntRegs+1 <= FASTCC_NUM_INT_ARGS_INREGS) { + NumIntRegs = 1; + ObjSize = 4; + } else + ObjSize = 8; + case MVT::f32: + ObjSize = 4; + break; + case MVT::f64: + ObjSize = 8; + break; + } +} + void -X86TargetLowering::PreprocessFastCCArguments(SDOperand Op, Function &F, - SelectionDAG &DAG) { - unsigned NumArgs = Op.Val->getNumValues(); +X86TargetLowering::PreprocessFastCCArguments(std::vectorArgs, + Function &F, SelectionDAG &DAG) { + unsigned NumArgs = Args.size(); MachineFunction &MF = DAG.getMachineFunction(); MachineFrameInfo *MFI = MF.getFrameInfo(); + // Add DAG nodes to load the arguments... On entry to a function the stack + // frame looks like this: + // + // [ESP] -- return address + // [ESP + 4] -- first nonreg argument (leftmost lexically) + // [ESP + 8] -- second nonreg argument, if first argument is 4 bytes in size + // ... unsigned ArgOffset = 0; // Frame mechanisms handle retaddr slot // Keep track of the number of integer regs passed so far. 
This can be either @@ -730,53 +803,77 @@ unsigned NumIntRegs = 0; for (unsigned i = 0; i < NumArgs; ++i) { - MVT::ValueType ObjectVT = Op.Val->getValueType(i); - unsigned ArgIncrement = 4; - unsigned ObjSize = 0; - SDOperand ArgValue; - - switch (ObjectVT) { - default: assert(0 && "Unhandled argument type!"); - case MVT::i1: - case MVT::i8: - if (NumIntRegs < FASTCC_NUM_INT_ARGS_INREGS) { - ++NumIntRegs; - break; + SDOperand Op = Args[i]; + std::vector Objs = getFormalArgObjects(Op); + for (std::vector::iterator I = Objs.begin(), E = Objs.end(); + I != E; ++I) { + SDOperand Obj = *I; + MVT::ValueType ObjectVT = Obj.getValueType(); + unsigned ArgIncrement = 4; + unsigned ObjSize = 0; + unsigned NumRegs = 0; + + DetermineFastCCFormalArgSizeNumRegs(ObjectVT, ObjSize, NumRegs); + if (ObjSize == 8) + ArgIncrement = 8; + + unsigned Reg; + std::pair Loc = std::make_pair(FALocInfo(), + FALocInfo()); + if (NumRegs) { + switch (ObjectVT) { + default: assert(0 && "Unhandled argument type!"); + case MVT::i1: + case MVT::i8: + Reg = AddLiveIn(MF, NumIntRegs ? X86::DL : X86::AL, + X86::R8RegisterClass); + Loc.first.Kind = FALocInfo::LiveInRegLoc; + Loc.first.Loc = Reg; + Loc.first.Typ = MVT::i8; + break; + case MVT::i16: + Reg = AddLiveIn(MF, NumIntRegs ? X86::DX : X86::AX, + X86::R16RegisterClass); + Loc.first.Kind = FALocInfo::LiveInRegLoc; + Loc.first.Loc = Reg; + Loc.first.Typ = MVT::i16; + break; + case MVT::i32: + Reg = AddLiveIn(MF, NumIntRegs ? X86::EDX : X86::EAX, + X86::R32RegisterClass); + Loc.first.Kind = FALocInfo::LiveInRegLoc; + Loc.first.Loc = Reg; + Loc.first.Typ = MVT::i32; + break; + case MVT::i64: + Reg = AddLiveIn(MF, NumIntRegs ? 
X86::EDX : X86::EAX, + X86::R32RegisterClass); + Loc.first.Kind = FALocInfo::LiveInRegLoc; + Loc.first.Loc = Reg; + Loc.first.Typ = MVT::i32; + if (NumRegs == 2) { + Reg = AddLiveIn(MF, X86::EDX, X86::R32RegisterClass); + Loc.second.Kind = FALocInfo::LiveInRegLoc; + Loc.second.Loc = Reg; + Loc.second.Typ = MVT::i32; + } + break; + } + } + if (ObjSize) { + int FI = MFI->CreateFixedObject(ObjSize, ArgOffset); + if (ObjectVT == MVT::i64 && NumRegs) { + Loc.second.Kind = FALocInfo::StackFrameLoc; + Loc.second.Loc = FI; + } else { + Loc.first.Kind = FALocInfo::StackFrameLoc; + Loc.first.Loc = FI; + } + ArgOffset += ArgIncrement; // Move on to the next argument. } - ObjSize = 1; - break; - case MVT::i16: - if (NumIntRegs < FASTCC_NUM_INT_ARGS_INREGS) { - ++NumIntRegs; - break; - } - ObjSize = 2; - break; - case MVT::i32: - if (NumIntRegs < FASTCC_NUM_INT_ARGS_INREGS) { - ++NumIntRegs; - break; - } - ObjSize = 4; - break; - case MVT::i64: - if (NumIntRegs+2 <= FASTCC_NUM_INT_ARGS_INREGS) { - NumIntRegs += 2; - break; - } else if (NumIntRegs+1 <= FASTCC_NUM_INT_ARGS_INREGS) { - ArgOffset += 4; - NumIntRegs = FASTCC_NUM_INT_ARGS_INREGS; - break; - } - ObjSize = ArgIncrement = 8; - break; - case MVT::f32: ObjSize = 4; break; - case MVT::f64: ObjSize = ArgIncrement = 8; break; + FormalArgLocs.push_back(Loc); } - - if (ObjSize) - ArgOffset += ArgIncrement; // Move on to the next argument. } // Make sure the instruction takes 8n+4 bytes to make sure the start of the @@ -815,131 +912,35 @@ MachineFunction &MF = DAG.getMachineFunction(); MachineFrameInfo *MFI = MF.getFrameInfo(); - // Add DAG nodes to load the arguments... On entry to a function the stack - // frame looks like this: - // - // [ESP] -- return address - // [ESP + 4] -- first nonreg argument (leftmost lexically) - // [ESP + 8] -- second nonreg argument, if first argument is 4 bytes in size - // ... 
- unsigned ArgOffset = 0; // Frame mechanisms handle retaddr slot - - // Keep track of the number of integer regs passed so far. This can be either - // 0 (neither EAX or EDX used), 1 (EAX is used) or 2 (EAX and EDX are both - // used). - unsigned NumIntRegs = 0; - for (unsigned i = 0; i < NumArgs; ++i) { - MVT::ValueType ObjectVT = Op.Val->getValueType(i); - unsigned ArgIncrement = 4; - unsigned ObjSize = 0; + MVT::ValueType VT = Op.Val->getValueType(i); + std::pair Loc = FormalArgLocs[i]; SDOperand ArgValue; - bool hasUse = !Op.Val->hasNUsesOfValue(0, i); - - switch (ObjectVT) { - default: assert(0 && "Unhandled argument type!"); - case MVT::i1: - case MVT::i8: - if (NumIntRegs < FASTCC_NUM_INT_ARGS_INREGS) { - if (hasUse) { - unsigned VReg = AddLiveIn(MF, NumIntRegs ? X86::DL : X86::AL, - X86::R8RegisterClass); - ArgValue = DAG.getCopyFromReg(DAG.getRoot(), VReg, MVT::i8); - DAG.setRoot(ArgValue.getValue(1)); - if (ObjectVT == MVT::i1) - // FIXME: Should insert a assertzext here. - ArgValue = DAG.getNode(ISD::TRUNCATE, MVT::i1, ArgValue); - } - ++NumIntRegs; - break; - } - - ObjSize = 1; - break; - case MVT::i16: - if (NumIntRegs < FASTCC_NUM_INT_ARGS_INREGS) { - if (hasUse) { - unsigned VReg = AddLiveIn(MF, NumIntRegs ? X86::DX : X86::AX, - X86::R16RegisterClass); - ArgValue = DAG.getCopyFromReg(DAG.getRoot(), VReg, MVT::i16); - DAG.setRoot(ArgValue.getValue(1)); - } - ++NumIntRegs; - break; - } - ObjSize = 2; - break; - case MVT::i32: - if (NumIntRegs < FASTCC_NUM_INT_ARGS_INREGS) { - if (hasUse) { - unsigned VReg = AddLiveIn(MF, NumIntRegs ? 
X86::EDX : X86::EAX, - X86::R32RegisterClass); - ArgValue = DAG.getCopyFromReg(DAG.getRoot(), VReg, MVT::i32); - DAG.setRoot(ArgValue.getValue(1)); - } - ++NumIntRegs; - break; - } - ObjSize = 4; - break; - case MVT::i64: - if (NumIntRegs+2 <= FASTCC_NUM_INT_ARGS_INREGS) { - if (hasUse) { - unsigned BotReg = AddLiveIn(MF, X86::EAX, X86::R32RegisterClass); - unsigned TopReg = AddLiveIn(MF, X86::EDX, X86::R32RegisterClass); - - SDOperand Low = DAG.getCopyFromReg(DAG.getRoot(), BotReg, MVT::i32); - SDOperand Hi = DAG.getCopyFromReg(Low.getValue(1), TopReg, MVT::i32); - DAG.setRoot(Hi.getValue(1)); - - ArgValue = DAG.getNode(ISD::BUILD_PAIR, MVT::i64, Low, Hi); - } - NumIntRegs += 2; - break; - } else if (NumIntRegs+1 <= FASTCC_NUM_INT_ARGS_INREGS) { - if (hasUse) { - unsigned BotReg = AddLiveIn(MF, X86::EDX, X86::R32RegisterClass); - SDOperand Low = DAG.getCopyFromReg(DAG.getRoot(), BotReg, MVT::i32); - DAG.setRoot(Low.getValue(1)); - - // Load the high part from memory. - // Create the frame index object for this incoming parameter... - int FI = MFI->CreateFixedObject(4, ArgOffset); - SDOperand FIN = DAG.getFrameIndex(FI, MVT::i32); - SDOperand Hi = DAG.getLoad(MVT::i32, DAG.getEntryNode(), FIN, - DAG.getSrcValue(NULL)); - ArgValue = DAG.getNode(ISD::BUILD_PAIR, MVT::i64, Low, Hi); - } - ArgOffset += 4; - NumIntRegs = FASTCC_NUM_INT_ARGS_INREGS; - break; - } - ObjSize = ArgIncrement = 8; - break; - case MVT::f32: ObjSize = 4; break; - case MVT::f64: ObjSize = ArgIncrement = 8; break; + if (Loc.first.Kind == FALocInfo::StackFrameLoc) { + // Create the SelectionDAG nodes corresponding to a load from this parameter + SDOperand FIN = DAG.getFrameIndex(Loc.first.Loc, MVT::i32); + ArgValue = DAG.getLoad(Op.Val->getValueType(i),DAG.getEntryNode(), FIN, + DAG.getSrcValue(NULL)); + } else { + // Must be a CopyFromReg + ArgValue= DAG.getCopyFromReg(DAG.getRoot(), Loc.first.Loc, Loc.first.Typ); } - if (ObjSize) { - // Create the frame index object for this incoming parameter... 
- int FI = MFI->CreateFixedObject(ObjSize, ArgOffset); - - // Create the SelectionDAG nodes corresponding to a load from this - // parameter. - SDOperand FIN = DAG.getFrameIndex(FI, MVT::i32); - - ArgValue = DAG.getLoad(ObjectVT, DAG.getEntryNode(), FIN, - DAG.getSrcValue(NULL)); - } else if (ArgValue.Val == 0) { - if (MVT::isInteger(ObjectVT)) - ArgValue = DAG.getConstant(0, ObjectVT); - else - ArgValue = DAG.getConstantFP(0, ObjectVT); + if (Loc.second.Kind != FALocInfo::None) { + SDOperand ArgValue2; + if (Loc.second.Kind == FALocInfo::StackFrameLoc) { + // Create the SelectionDAG nodes corresponding to a load from this parameter + SDOperand FIN = DAG.getFrameIndex(Loc.second.Loc, MVT::i32); + ArgValue2 = DAG.getLoad(Op.Val->getValueType(i),DAG.getEntryNode(), FIN, + DAG.getSrcValue(NULL)); + } else { + // Must be a CopyFromReg + ArgValue2 = DAG.getCopyFromReg(DAG.getRoot(), + Loc.second.Loc, Loc.second.Typ); + } + ArgValue = DAG.getNode(ISD::BUILD_PAIR, VT, ArgValue, ArgValue2); } FormalArgs.push_back(ArgValue); - - if (ObjSize) - ArgOffset += ArgIncrement; // Move on to the next argument. } } Index: llvm/lib/Target/X86/X86ISelLowering.h diff -u llvm/lib/Target/X86/X86ISelLowering.h:1.60 llvm/lib/Target/X86/X86ISelLowering.h:1.61 --- llvm/lib/Target/X86/X86ISelLowering.h:1.60 Tue Apr 25 20:20:17 2006 +++ llvm/lib/Target/X86/X86ISelLowering.h Wed Apr 26 20:32:22 2006 @@ -349,8 +349,28 @@ /// Formal arguments lowered to load and CopyFromReg ops. std::vector FormalArgs; + /// Formal arguments locations (frame indices and registers). + struct FALocInfo { + enum FALocKind { + None, + StackFrameLoc, + LiveInRegLoc, + } Kind; + + int Loc; + MVT::ValueType Typ; + + FALocInfo() : Kind(None), Loc(0), Typ(MVT::isVoid) {}; + FALocInfo(enum FALocKind k, int fi) : Kind(k), Loc(fi), Typ(MVT::isVoid) {}; + FALocInfo(enum FALocKind k, int r, MVT::ValueType vt) + : Kind(k), Loc(r), Typ(vt) {}; + }; + + std::vector > FormalArgLocs; + // C Calling Convention implementation. 
- void PreprocessCCCArguments(SDOperand Op, Function &F, SelectionDAG &DAG); + void PreprocessCCCArguments(std::vectorArgs, Function &F, + SelectionDAG &DAG); void LowerCCCArguments(SDOperand Op, SelectionDAG &DAG); std::pair LowerCCCCallTo(SDOperand Chain, const Type *RetTy, bool isVarArg, @@ -359,7 +379,8 @@ // Fast Calling Convention implementation. void - PreprocessFastCCArguments(SDOperand Op, Function &F, SelectionDAG &DAG); + PreprocessFastCCArguments(std::vectorArgs, Function &F, + SelectionDAG &DAG); void LowerFastCCArguments(SDOperand Op, SelectionDAG &DAG); std::pair From lattner at cs.uiuc.edu Thu Apr 27 00:00:56 2006 From: lattner at cs.uiuc.edu (Chris Lattner) Date: Thu, 27 Apr 2006 00:00:56 -0500 Subject: [llvm-commits] CVS: llvm/test/Regression/CodeGen/Generic/2006-04-26-SetCCAnd.ll Message-ID: <200604270500.AAA32042@zion.cs.uiuc.edu> Changes in directory llvm/test/Regression/CodeGen/Generic: 2006-04-26-SetCCAnd.ll added (r1.1) --- Log message: new testcase --- Diffs of the changes: (+43 -0) 2006-04-26-SetCCAnd.ll | 43 +++++++++++++++++++++++++++++++++++++++++++ 1 files changed, 43 insertions(+) Index: llvm/test/Regression/CodeGen/Generic/2006-04-26-SetCCAnd.ll diff -c /dev/null llvm/test/Regression/CodeGen/Generic/2006-04-26-SetCCAnd.ll:1.1 *** /dev/null Thu Apr 27 00:00:53 2006 --- llvm/test/Regression/CodeGen/Generic/2006-04-26-SetCCAnd.ll Thu Apr 27 00:00:43 2006 *************** *** 0 **** --- 1,43 ---- + ; RUN: llvm-as < %s | llc + ; PR748 + + %G = external global ushort ; [#uses=1] + + implementation ; Functions: + + void %OmNewObjHdr() { + entry: + br bool false, label %endif.4, label %then.0 + + then.0: ; preds = %entry + ret void + + endif.4: ; preds = %entry + br bool false, label %else.3, label %shortcirc_next.3 + + shortcirc_next.3: ; preds = %endif.4 + ret void + + else.3: ; preds = %endif.4 + switch int 0, label %endif.10 [ + int 5001, label %then.10 + int -5008, label %then.10 + ] + + then.10: ; preds = %else.3, %else.3 + %tmp.112 = 
load ushort* null ; [#uses=2] + %tmp.113 = load ushort* %G ; [#uses=2] + %tmp.114 = setgt ushort %tmp.112, %tmp.113 ; [#uses=1] + %tmp.120 = setlt ushort %tmp.112, %tmp.113 ; [#uses=1] + %bothcond = and bool %tmp.114, %tmp.120 ; [#uses=1] + br bool %bothcond, label %else.4, label %then.11 + + then.11: ; preds = %then.10 + ret void + + else.4: ; preds = %then.10 + ret void + + endif.10: ; preds = %else.3 + ret void + } From lattner at cs.uiuc.edu Thu Apr 27 00:01:20 2006 From: lattner at cs.uiuc.edu (Chris Lattner) Date: Thu, 27 Apr 2006 00:01:20 -0500 Subject: [llvm-commits] CVS: llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp Message-ID: <200604270501.AAA32058@zion.cs.uiuc.edu> Changes in directory llvm/lib/CodeGen/SelectionDAG: SelectionDAG.cpp updated: 1.302 -> 1.303 --- Log message: Fix Regression/CodeGen/Generic/2006-04-26-SetCCAnd.ll and PR748: http://llvm.cs.uiuc.edu/PR748 . --- Diffs of the changes: (+29 -1) SelectionDAG.cpp | 30 +++++++++++++++++++++++++++++- 1 files changed, 29 insertions(+), 1 deletion(-) Index: llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp diff -u llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp:1.302 llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp:1.303 --- llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp:1.302 Sat Apr 22 13:53:45 2006 +++ llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp Thu Apr 27 00:01:07 2006 @@ -232,7 +232,22 @@ return ISD::SETCC_INVALID; // Combine all of the condition bits. - return ISD::CondCode(Op1 & Op2); + ISD::CondCode Result = ISD::CondCode(Op1 & Op2); + + // Canonicalize illegal integer setcc's. + if (isInteger) { + switch (Result) { + default: break; + case ISD::SETUO: // e.g. SETUGT & SETULT + Result = ISD::SETFALSE; + break; + case ISD::SETUEQ: // e.g. 
SETUGE & SETULE + Result = ISD::SETEQ; + break; + } + } + + return Result; } const TargetMachine &SelectionDAG::getTarget() const { @@ -849,6 +864,19 @@ case ISD::SETFALSE2: return getConstant(0, VT); case ISD::SETTRUE: case ISD::SETTRUE2: return getConstant(1, VT); + + case ISD::SETOEQ: + case ISD::SETOGT: + case ISD::SETOGE: + case ISD::SETOLT: + case ISD::SETOLE: + case ISD::SETONE: + case ISD::SETO: + case ISD::SETUO: + case ISD::SETUEQ: + case ISD::SETUNE: + assert(!MVT::isInteger(N1.getValueType()) && "Illegal setcc for integer!"); + break; } if (ConstantSDNode *N2C = dyn_cast(N2.Val)) { From evan.cheng at apple.com Thu Apr 27 00:35:41 2006 From: evan.cheng at apple.com (Evan Cheng) Date: Thu, 27 Apr 2006 00:35:41 -0500 Subject: [llvm-commits] CVS: llvm/lib/Target/X86/X86ISelLowering.cpp Message-ID: <200604270535.AAA32241@zion.cs.uiuc.edu> Changes in directory llvm/lib/Target/X86: X86ISelLowering.cpp updated: 1.190 -> 1.191 --- Log message: Bug fix: not updating NumIntRegs. --- Diffs of the changes: (+65 -60) X86ISelLowering.cpp | 125 +++++++++++++++++++++++++++------------------------- 1 files changed, 65 insertions(+), 60 deletions(-) Index: llvm/lib/Target/X86/X86ISelLowering.cpp diff -u llvm/lib/Target/X86/X86ISelLowering.cpp:1.190 llvm/lib/Target/X86/X86ISelLowering.cpp:1.191 --- llvm/lib/Target/X86/X86ISelLowering.cpp:1.190 Wed Apr 26 20:32:22 2006 +++ llvm/lib/Target/X86/X86ISelLowering.cpp Thu Apr 27 00:35:28 2006 @@ -402,6 +402,19 @@ // C Calling Convention implementation //===----------------------------------------------------------------------===// +/// AddLiveIn - This helper function adds the specified physical register to the +/// MachineFunction as a live in value. It also creates a corresponding virtual +/// register for it. 
+static unsigned AddLiveIn(MachineFunction &MF, unsigned PReg, + TargetRegisterClass *RC) { + assert(RC->contains(PReg) && "Not the correct regclass!"); + unsigned VReg = MF.getSSARegMap()->createVirtualRegister(RC); + MF.addLiveIn(PReg, VReg); + return VReg; +} + +/// getFormalArgSize - Return the minimum size of the stack frame needed to store +/// an object of the specified type. static unsigned getFormalArgSize(MVT::ValueType ObjectVT) { unsigned ObjSize = 0; switch (ObjectVT) { @@ -417,6 +430,8 @@ return ObjSize; } +/// getFormalArgObjects - Returns itself if Op is a FORMAL_ARGUMENTS, otherwise +/// returns the FORMAL_ARGUMENTS node(s) that made up parts of the node. static std::vector getFormalArgObjects(SDOperand Op) { unsigned Opc = Op.getOpcode(); std::vector Objs; @@ -706,17 +721,6 @@ // (when we have a global fp allocator) and do other tricks. // -/// AddLiveIn - This helper function adds the specified physical register to the -/// MachineFunction as a live in value. It also creates a corresponding virtual -/// register for it. -static unsigned AddLiveIn(MachineFunction &MF, unsigned PReg, - TargetRegisterClass *RC) { - assert(RC->contains(PReg) && "Not the correct regclass!"); - unsigned VReg = MF.getSSARegMap()->createVirtualRegister(RC); - MF.addLiveIn(PReg, VReg); - return VReg; -} - // FASTCC_NUM_INT_ARGS_INREGS - This is the max number of integer arguments // to pass in registers. 0 is none, 1 is is "use EAX", 2 is "use EAX and // EDX". Anything more is illegal. 
@@ -738,8 +742,8 @@ static void -DetermineFastCCFormalArgSizeNumRegs(MVT::ValueType ObjectVT, - unsigned &ObjSize, unsigned &NumIntRegs) { +HowToPassFastCCArgument(MVT::ValueType ObjectVT, unsigned NumIntRegs, + unsigned &ObjSize, unsigned &ObjIntRegs) { ObjSize = 0; NumIntRegs = 0; @@ -748,27 +752,27 @@ case MVT::i1: case MVT::i8: if (NumIntRegs < FASTCC_NUM_INT_ARGS_INREGS) - NumIntRegs = 1; + ObjIntRegs = 1; else ObjSize = 1; break; case MVT::i16: if (NumIntRegs < FASTCC_NUM_INT_ARGS_INREGS) - NumIntRegs = 1; + ObjIntRegs = 1; else ObjSize = 2; break; case MVT::i32: if (NumIntRegs < FASTCC_NUM_INT_ARGS_INREGS) - NumIntRegs = 1; + ObjIntRegs = 1; else ObjSize = 4; break; case MVT::i64: if (NumIntRegs+2 <= FASTCC_NUM_INT_ARGS_INREGS) { - NumIntRegs = 2; + ObjIntRegs = 2; } else if (NumIntRegs+1 <= FASTCC_NUM_INT_ARGS_INREGS) { - NumIntRegs = 1; + ObjIntRegs = 1; ObjSize = 4; } else ObjSize = 8; @@ -811,58 +815,59 @@ MVT::ValueType ObjectVT = Obj.getValueType(); unsigned ArgIncrement = 4; unsigned ObjSize = 0; - unsigned NumRegs = 0; + unsigned ObjIntRegs = 0; - DetermineFastCCFormalArgSizeNumRegs(ObjectVT, ObjSize, NumRegs); + HowToPassFastCCArgument(ObjectVT, NumIntRegs, ObjSize, ObjIntRegs); if (ObjSize == 8) ArgIncrement = 8; unsigned Reg; std::pair Loc = std::make_pair(FALocInfo(), FALocInfo()); - if (NumRegs) { - switch (ObjectVT) { - default: assert(0 && "Unhandled argument type!"); - case MVT::i1: - case MVT::i8: - Reg = AddLiveIn(MF, NumIntRegs ? X86::DL : X86::AL, - X86::R8RegisterClass); - Loc.first.Kind = FALocInfo::LiveInRegLoc; - Loc.first.Loc = Reg; - Loc.first.Typ = MVT::i8; - break; - case MVT::i16: - Reg = AddLiveIn(MF, NumIntRegs ? X86::DX : X86::AX, - X86::R16RegisterClass); - Loc.first.Kind = FALocInfo::LiveInRegLoc; - Loc.first.Loc = Reg; - Loc.first.Typ = MVT::i16; - break; - case MVT::i32: - Reg = AddLiveIn(MF, NumIntRegs ? 
X86::EDX : X86::EAX, - X86::R32RegisterClass); - Loc.first.Kind = FALocInfo::LiveInRegLoc; - Loc.first.Loc = Reg; - Loc.first.Typ = MVT::i32; - break; - case MVT::i64: - Reg = AddLiveIn(MF, NumIntRegs ? X86::EDX : X86::EAX, - X86::R32RegisterClass); - Loc.first.Kind = FALocInfo::LiveInRegLoc; - Loc.first.Loc = Reg; - Loc.first.Typ = MVT::i32; - if (NumRegs == 2) { - Reg = AddLiveIn(MF, X86::EDX, X86::R32RegisterClass); - Loc.second.Kind = FALocInfo::LiveInRegLoc; - Loc.second.Loc = Reg; - Loc.second.Typ = MVT::i32; - } - break; - } + if (ObjIntRegs) { + NumIntRegs += ObjIntRegs; + switch (ObjectVT) { + default: assert(0 && "Unhandled argument type!"); + case MVT::i1: + case MVT::i8: + Reg = AddLiveIn(MF, NumIntRegs ? X86::DL : X86::AL, + X86::R8RegisterClass); + Loc.first.Kind = FALocInfo::LiveInRegLoc; + Loc.first.Loc = Reg; + Loc.first.Typ = MVT::i8; + break; + case MVT::i16: + Reg = AddLiveIn(MF, NumIntRegs ? X86::DX : X86::AX, + X86::R16RegisterClass); + Loc.first.Kind = FALocInfo::LiveInRegLoc; + Loc.first.Loc = Reg; + Loc.first.Typ = MVT::i16; + break; + case MVT::i32: + Reg = AddLiveIn(MF, NumIntRegs ? X86::EDX : X86::EAX, + X86::R32RegisterClass); + Loc.first.Kind = FALocInfo::LiveInRegLoc; + Loc.first.Loc = Reg; + Loc.first.Typ = MVT::i32; + break; + case MVT::i64: + Reg = AddLiveIn(MF, NumIntRegs ? 
X86::EDX : X86::EAX, + X86::R32RegisterClass); + Loc.first.Kind = FALocInfo::LiveInRegLoc; + Loc.first.Loc = Reg; + Loc.first.Typ = MVT::i32; + if (ObjIntRegs == 2) { + Reg = AddLiveIn(MF, X86::EDX, X86::R32RegisterClass); + Loc.second.Kind = FALocInfo::LiveInRegLoc; + Loc.second.Loc = Reg; + Loc.second.Typ = MVT::i32; + } + break; + } } if (ObjSize) { int FI = MFI->CreateFixedObject(ObjSize, ArgOffset); - if (ObjectVT == MVT::i64 && NumRegs) { + if (ObjectVT == MVT::i64 && ObjIntRegs) { Loc.second.Kind = FALocInfo::StackFrameLoc; Loc.second.Loc = FI; } else { From evan.cheng at apple.com Thu Apr 27 00:45:03 2006 From: evan.cheng at apple.com (Evan Cheng) Date: Thu, 27 Apr 2006 00:45:03 -0500 Subject: [llvm-commits] CVS: llvm/lib/Target/X86/X86ISelLowering.cpp Message-ID: <200604270545.AAA32369@zion.cs.uiuc.edu> Changes in directory llvm/lib/Target/X86: X86ISelLowering.cpp updated: 1.191 -> 1.192 --- Log message: Oops --- Diffs of the changes: (+1 -1) X86ISelLowering.cpp | 2 +- 1 files changed, 1 insertion(+), 1 deletion(-) Index: llvm/lib/Target/X86/X86ISelLowering.cpp diff -u llvm/lib/Target/X86/X86ISelLowering.cpp:1.191 llvm/lib/Target/X86/X86ISelLowering.cpp:1.192 --- llvm/lib/Target/X86/X86ISelLowering.cpp:1.191 Thu Apr 27 00:35:28 2006 +++ llvm/lib/Target/X86/X86ISelLowering.cpp Thu Apr 27 00:44:50 2006 @@ -825,7 +825,6 @@ std::pair Loc = std::make_pair(FALocInfo(), FALocInfo()); if (ObjIntRegs) { - NumIntRegs += ObjIntRegs; switch (ObjectVT) { default: assert(0 && "Unhandled argument type!"); case MVT::i1: @@ -864,6 +863,7 @@ } break; } + NumIntRegs += ObjIntRegs; } if (ObjSize) { int FI = MFI->CreateFixedObject(ObjSize, ArgOffset); From reid at x10sys.com Thu Apr 27 02:49:38 2006 From: reid at x10sys.com (Reid Spencer) Date: Thu, 27 Apr 2006 02:49:38 -0500 Subject: [llvm-commits] CVS: llvm/autoconf/configure.ac Message-ID: <200604270749.CAA09865@zion.cs.uiuc.edu> Changes in directory llvm/autoconf: configure.ac updated: 1.222 -> 1.223 --- Log message: For 
PR747: http://llvm.cs.uiuc.edu/PR747 :
If we fail to find a required program, simply set that program to echo out
something that tells the user the situation. That is, instead of just
"true runtest" we now get "echo 'Skipped: runtest not found'".

---
Diffs of the changes:  (+10 -10)

 configure.ac |   20 ++++++++++----------
 1 files changed, 10 insertions(+), 10 deletions(-)


Index: llvm/autoconf/configure.ac
diff -u llvm/autoconf/configure.ac:1.222 llvm/autoconf/configure.ac:1.223
--- llvm/autoconf/configure.ac:1.222	Thu Apr 20 17:15:30 2006
+++ llvm/autoconf/configure.ac	Thu Apr 27 02:49:24 2006
@@ -381,17 +381,17 @@
 dnl are not found then they are set to "true" which always succeeds but does
 dnl nothing. This just lets the build output show that we could have done
 dnl something if the tool was available.
-AC_PATH_PROG(BZIP2,[bzip2],[true bzip2])
-AC_PATH_PROG(DOT,[dot],[true dot])
-AC_PATH_PROG(DOXYGEN,[doxygen],[true doxygen])
-AC_PATH_PROG(ETAGS,[etags],[true etags])
-AC_PATH_PROG(GROFF,[groff],[true groff])
-AC_PATH_PROG(GZIP,[gzip],[true gzip])
-AC_PATH_PROG(POD2HTML,[pod2html],[true pod2html])
-AC_PATH_PROG(POD2MAN,[pod2man],[true pod2man])
-AC_PATH_PROG(RUNTEST,[runtest],[true runtest])
+AC_PATH_PROG(BZIP2,[bzip2],[echo "Skipped: bzip2 not found"])
+AC_PATH_PROG(DOT,[dot],[echo "Skipped: dot not found"])
+AC_PATH_PROG(DOXYGEN,[doxygen],[echo "Skipped: doxygen not found"])
+AC_PATH_PROG(ETAGS,[etags],[echo "Skipped: etags not found"])
+AC_PATH_PROG(GROFF,[groff],[echo "Skipped: groff not found"])
+AC_PATH_PROG(GZIP,[gzip],[echo "Skipped: gzip not found"])
+AC_PATH_PROG(POD2HTML,[pod2html],[echo "Skipped: pod2html not found"])
+AC_PATH_PROG(POD2MAN,[pod2man],[echo "Skipped: pod2man not found"])
+AC_PATH_PROG(RUNTEST,[runtest],[echo "Skipped: runtest not found"])
 DJ_AC_PATH_TCLSH
-AC_PATH_PROG(ZIP,[zip],[true zip])
+AC_PATH_PROG(ZIP,[zip],[echo "Skipped: zip not found"])
 
 dnl Determine if the linker supports the -R option.
AC_LINK_USE_R From reid at x10sys.com Thu Apr 27 02:49:38 2006 From: reid at x10sys.com (Reid Spencer) Date: Thu, 27 Apr 2006 02:49:38 -0500 Subject: [llvm-commits] CVS: llvm/configure Message-ID: <200604270749.CAA09867@zion.cs.uiuc.edu> Changes in directory llvm: configure updated: 1.225 -> 1.226 --- Log message: For PR747: http://llvm.cs.uiuc.edu/PR747 : If we fail to find a required program, simply set that program to echo out something that tells the user the situation. That is, instead of just "true runtest" we now get "echo 'Skipped: runtest not found'". --- Diffs of the changes: (+10 -10) configure | 20 ++++++++++---------- 1 files changed, 10 insertions(+), 10 deletions(-) Index: llvm/configure diff -u llvm/configure:1.225 llvm/configure:1.226 --- llvm/configure:1.225 Thu Apr 20 17:15:30 2006 +++ llvm/configure Thu Apr 27 02:49:18 2006 @@ -5400,7 +5400,7 @@ done done - test -z "$ac_cv_path_BZIP2" && ac_cv_path_BZIP2="true bzip2" + test -z "$ac_cv_path_BZIP2" && ac_cv_path_BZIP2="echo "Skipped: bzip2 not found"" ;; esac fi @@ -5440,7 +5440,7 @@ done done - test -z "$ac_cv_path_DOT" && ac_cv_path_DOT="true dot" + test -z "$ac_cv_path_DOT" && ac_cv_path_DOT="echo "Skipped: dot not found"" ;; esac fi @@ -5480,7 +5480,7 @@ done done - test -z "$ac_cv_path_DOXYGEN" && ac_cv_path_DOXYGEN="true doxygen" + test -z "$ac_cv_path_DOXYGEN" && ac_cv_path_DOXYGEN="echo "Skipped: doxygen not found"" ;; esac fi @@ -5520,7 +5520,7 @@ done done - test -z "$ac_cv_path_ETAGS" && ac_cv_path_ETAGS="true etags" + test -z "$ac_cv_path_ETAGS" && ac_cv_path_ETAGS="echo "Skipped: etags not found"" ;; esac fi @@ -5560,7 +5560,7 @@ done done - test -z "$ac_cv_path_GROFF" && ac_cv_path_GROFF="true groff" + test -z "$ac_cv_path_GROFF" && ac_cv_path_GROFF="echo "Skipped: groff not found"" ;; esac fi @@ -5600,7 +5600,7 @@ done done - test -z "$ac_cv_path_GZIP" && ac_cv_path_GZIP="true gzip" + test -z "$ac_cv_path_GZIP" && ac_cv_path_GZIP="echo "Skipped: gzip not found"" ;; esac fi @@ 
-5640,7 +5640,7 @@ done done - test -z "$ac_cv_path_POD2HTML" && ac_cv_path_POD2HTML="true pod2html" + test -z "$ac_cv_path_POD2HTML" && ac_cv_path_POD2HTML="echo "Skipped: pod2html not found"" ;; esac fi @@ -5680,7 +5680,7 @@ done done - test -z "$ac_cv_path_POD2MAN" && ac_cv_path_POD2MAN="true pod2man" + test -z "$ac_cv_path_POD2MAN" && ac_cv_path_POD2MAN="echo "Skipped: pod2man not found"" ;; esac fi @@ -5720,7 +5720,7 @@ done done - test -z "$ac_cv_path_RUNTEST" && ac_cv_path_RUNTEST="true runtest" + test -z "$ac_cv_path_RUNTEST" && ac_cv_path_RUNTEST="echo "Skipped: runtest not found"" ;; esac fi @@ -5849,7 +5849,7 @@ done done - test -z "$ac_cv_path_ZIP" && ac_cv_path_ZIP="true zip" + test -z "$ac_cv_path_ZIP" && ac_cv_path_ZIP="echo "Skipped: zip not found"" ;; esac fi From evan.cheng at apple.com Thu Apr 27 03:29:55 2006 From: evan.cheng at apple.com (Evan Cheng) Date: Thu, 27 Apr 2006 03:29:55 -0500 Subject: [llvm-commits] CVS: llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp Message-ID: <200604270829.DAA30185@zion.cs.uiuc.edu> Changes in directory llvm/lib/CodeGen/SelectionDAG: SelectionDAGISel.cpp updated: 1.227 -> 1.228 --- Log message: Insert a VBIT_CONVERT between a FORMAL_ARGUMENT node and its vector uses (VAND, VADD, etc.). Legalizer will assert otherwise. --- Diffs of the changes: (+16 -3) SelectionDAGISel.cpp | 19 ++++++++++++++++--- 1 files changed, 16 insertions(+), 3 deletions(-) Index: llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp diff -u llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:1.227 llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:1.228 --- llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:1.227 Tue Apr 25 18:03:35 2006 +++ llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp Thu Apr 27 03:29:42 2006 @@ -2372,7 +2372,7 @@ // right now. 
unsigned NumElems = cast(I->getType())->getNumElements(); const Type *EltTy = cast(I->getType())->getElementType(); - + // Figure out if there is a Packed type corresponding to this Vector // type. If so, convert to the packed type. MVT::ValueType TVT = MVT::getVectorType(getValueType(EltTy), NumElems); @@ -2441,7 +2441,7 @@ // right now. unsigned NumElems = cast(I->getType())->getNumElements(); const Type *EltTy = cast(I->getType())->getElementType(); - + // Figure out if there is a Packed type corresponding to this Vector // type. If so, convert to the packed type. MVT::ValueType TVT = MVT::getVectorType(getValueType(EltTy), NumElems); @@ -2987,7 +2987,20 @@ AI != E; ++AI, ++a) if (!AI->use_empty()) { SDL.setValue(AI, Args[a]); - + + MVT::ValueType VT = TLI.getValueType(AI->getType()); + if (VT == MVT::Vector) { + // Insert a VBIT_CONVERT between the FORMAL_ARGUMENT node and its uses. + // Or else legalizer will balk. + BasicBlock::iterator InsertPt = BB->begin(); + Value *NewVal = new CastInst(AI, AI->getType(), AI->getName(), InsertPt); + for (Value::use_iterator UI = AI->use_begin(), E = AI->use_end(); + UI != E; ++UI) { + Instruction *User = cast(*UI); + if (User != NewVal) + User->replaceUsesOfWith(AI, NewVal); + } + } // If this argument is live outside of the entry block, insert a copy from // whereever we got it to the vreg that other BB's will reference it as. if (FuncInfo.ValueMap.count(AI)) { From evan.cheng at apple.com Thu Apr 27 03:31:23 2006 From: evan.cheng at apple.com (Evan Cheng) Date: Thu, 27 Apr 2006 03:31:23 -0500 Subject: [llvm-commits] CVS: llvm/lib/Target/X86/X86ISelLowering.cpp Message-ID: <200604270831.DAA30272@zion.cs.uiuc.edu> Changes in directory llvm/lib/Target/X86: X86ISelLowering.cpp updated: 1.192 -> 1.193 --- Log message: Support for passing 128-bit vector arguments via XMM registers. 
--- Diffs of the changes: (+97 -27) X86ISelLowering.cpp | 124 ++++++++++++++++++++++++++++++++++++++++------------ 1 files changed, 97 insertions(+), 27 deletions(-) Index: llvm/lib/Target/X86/X86ISelLowering.cpp diff -u llvm/lib/Target/X86/X86ISelLowering.cpp:1.192 llvm/lib/Target/X86/X86ISelLowering.cpp:1.193 --- llvm/lib/Target/X86/X86ISelLowering.cpp:1.192 Thu Apr 27 00:44:50 2006 +++ llvm/lib/Target/X86/X86ISelLowering.cpp Thu Apr 27 03:31:10 2006 @@ -413,10 +413,13 @@ return VReg; } -/// getFormalArgSize - Return the minimum size of the stack frame needed to store -/// an object of the specified type. -static unsigned getFormalArgSize(MVT::ValueType ObjectVT) { - unsigned ObjSize = 0; +/// HowToPassCCCArgument - Returns how an formal argument of the specified type +/// should be passed. If it is through stack, returns the size of the stack +/// frame; if it is through XMM register, returns the number of XMM registers +/// are needed. +static void +HowToPassCCCArgument(MVT::ValueType ObjectVT, unsigned NumXMMRegs, + unsigned &ObjSize, unsigned &ObjXMMRegs) { switch (ObjectVT) { default: assert(0 && "Unhandled argument type!"); case MVT::i1: @@ -426,8 +429,18 @@ case MVT::i64: ObjSize = 8; break; case MVT::f32: ObjSize = 4; break; case MVT::f64: ObjSize = 8; break; + case MVT::v16i8: + case MVT::v8i16: + case MVT::v4i32: + case MVT::v2i64: + case MVT::v4f32: + case MVT::v2f64: + if (NumXMMRegs < 3) + ObjXMMRegs = 1; + else + ObjSize = 16; + break; } - return ObjSize; } /// getFormalArgObjects - Returns itself if Op is a FORMAL_ARGUMENTS, otherwise @@ -466,6 +479,8 @@ // ... // unsigned ArgOffset = 0; // Frame mechanisms handle retaddr slot + unsigned NumXMMRegs = 0; // XMM regs used for parameter passing. 
+ unsigned XMMArgRegs[] = { X86::XMM0, X86::XMM1, X86::XMM2 }; for (unsigned i = 0; i < NumArgs; ++i) { SDOperand Op = Args[i]; std::vector Objs = getFormalArgObjects(Op); @@ -474,16 +489,29 @@ SDOperand Obj = *I; MVT::ValueType ObjectVT = Obj.getValueType(); unsigned ArgIncrement = 4; - unsigned ObjSize = getFormalArgSize(ObjectVT); - if (ObjSize == 8) - ArgIncrement = 8; - - // Create the frame index object for this incoming parameter... - int FI = MFI->CreateFixedObject(ObjSize, ArgOffset); - std::pair Loc = - std::make_pair(FALocInfo(FALocInfo::StackFrameLoc, FI), FALocInfo()); - FormalArgLocs.push_back(Loc); - ArgOffset += ArgIncrement; // Move on to the next argument... + unsigned ObjSize = 0; + unsigned ObjXMMRegs = 0; + HowToPassCCCArgument(ObjectVT, NumXMMRegs, ObjSize, ObjXMMRegs); + if (ObjSize >= 8) + ArgIncrement = ObjSize; + + if (ObjXMMRegs) { + // Passed in a XMM register. + unsigned Reg = AddLiveIn(MF, XMMArgRegs[NumXMMRegs], + X86::VR128RegisterClass); + std::pair Loc = + std::make_pair(FALocInfo(FALocInfo::LiveInRegLoc, Reg, ObjectVT), + FALocInfo()); + FormalArgLocs.push_back(Loc); + NumXMMRegs += ObjXMMRegs; + } else { + // Create the frame index object for this incoming parameter... + int FI = MFI->CreateFixedObject(ObjSize, ArgOffset); + std::pair Loc = + std::make_pair(FALocInfo(FALocInfo::StackFrameLoc, FI), FALocInfo()); + FormalArgLocs.push_back(Loc); + ArgOffset += ArgIncrement; // Move on to the next argument... 
+ } } } @@ -502,11 +530,19 @@ MachineFrameInfo *MFI = MF.getFrameInfo(); for (unsigned i = 0; i < NumArgs; ++i) { - // Create the SelectionDAG nodes corresponding to a load from this parameter - unsigned FI = FormalArgLocs[i].first.Loc; - SDOperand FIN = DAG.getFrameIndex(FI, MVT::i32); - SDOperand ArgValue = DAG.getLoad(Op.Val->getValueType(i),DAG.getEntryNode(), - FIN, DAG.getSrcValue(NULL)); + std::pair Loc = FormalArgLocs[i]; + SDOperand ArgValue; + if (Loc.first.Kind == FALocInfo::StackFrameLoc) { + // Create the SelectionDAG nodes corresponding to a load from this parameter + unsigned FI = FormalArgLocs[i].first.Loc; + SDOperand FIN = DAG.getFrameIndex(FI, MVT::i32); + ArgValue = DAG.getLoad(Op.Val->getValueType(i), + DAG.getEntryNode(), FIN, DAG.getSrcValue(NULL)); + } else { + // Must be a CopyFromReg + ArgValue= DAG.getCopyFromReg(DAG.getEntryNode(), Loc.first.Loc, + Loc.first.Typ); + } FormalArgs.push_back(ArgValue); } } @@ -741,9 +777,15 @@ static unsigned FASTCC_NUM_INT_ARGS_INREGS = 0; +/// HowToPassFastCCArgument - Returns how an formal argument of the specified +/// type should be passed. If it is through stack, returns the size of the stack +/// frame; if it is through integer or XMM register, returns the number of +/// integer or XMM registers are needed. static void -HowToPassFastCCArgument(MVT::ValueType ObjectVT, unsigned NumIntRegs, - unsigned &ObjSize, unsigned &ObjIntRegs) { +HowToPassFastCCArgument(MVT::ValueType ObjectVT, + unsigned NumIntRegs, unsigned NumXMMRegs, + unsigned &ObjSize, unsigned &ObjIntRegs, + unsigned &ObjXMMRegs) { ObjSize = 0; NumIntRegs = 0; @@ -782,6 +824,17 @@ case MVT::f64: ObjSize = 8; break; + case MVT::v16i8: + case MVT::v8i16: + case MVT::v4i32: + case MVT::v2i64: + case MVT::v4f32: + case MVT::v2f64: + if (NumXMMRegs < 3) + ObjXMMRegs = 1; + else + ObjSize = 16; + break; } } @@ -805,6 +858,8 @@ // 0 (neither EAX or EDX used), 1 (EAX is used) or 2 (EAX and EDX are both // used). 
unsigned NumIntRegs = 0; + unsigned NumXMMRegs = 0; // XMM regs used for parameter passing. + unsigned XMMArgRegs[] = { X86::XMM0, X86::XMM1, X86::XMM2 }; for (unsigned i = 0; i < NumArgs; ++i) { SDOperand Op = Args[i]; @@ -816,10 +871,12 @@ unsigned ArgIncrement = 4; unsigned ObjSize = 0; unsigned ObjIntRegs = 0; + unsigned ObjXMMRegs = 0; - HowToPassFastCCArgument(ObjectVT, NumIntRegs, ObjSize, ObjIntRegs); - if (ObjSize == 8) - ArgIncrement = 8; + HowToPassFastCCArgument(ObjectVT, NumIntRegs, NumXMMRegs, + ObjSize, ObjIntRegs, ObjXMMRegs); + if (ObjSize >= 8) + ArgIncrement = ObjSize; unsigned Reg; std::pair Loc = std::make_pair(FALocInfo(), @@ -862,8 +919,20 @@ Loc.second.Typ = MVT::i32; } break; + case MVT::v16i8: + case MVT::v8i16: + case MVT::v4i32: + case MVT::v2i64: + case MVT::v4f32: + case MVT::v2f64: + Reg = AddLiveIn(MF, XMMArgRegs[NumXMMRegs], X86::VR128RegisterClass); + Loc.first.Kind = FALocInfo::LiveInRegLoc; + Loc.first.Loc = Reg; + Loc.first.Typ = ObjectVT; + break; } NumIntRegs += ObjIntRegs; + NumXMMRegs += ObjXMMRegs; } if (ObjSize) { int FI = MFI->CreateFixedObject(ObjSize, ArgOffset); @@ -928,7 +997,8 @@ DAG.getSrcValue(NULL)); } else { // Must be a CopyFromReg - ArgValue= DAG.getCopyFromReg(DAG.getRoot(), Loc.first.Loc, Loc.first.Typ); + ArgValue= DAG.getCopyFromReg(DAG.getEntryNode(), Loc.first.Loc, + Loc.first.Typ); } if (Loc.second.Kind != FALocInfo::None) { @@ -940,7 +1010,7 @@ DAG.getSrcValue(NULL)); } else { // Must be a CopyFromReg - ArgValue2 = DAG.getCopyFromReg(DAG.getRoot(), + ArgValue2 = DAG.getCopyFromReg(DAG.getEntryNode(), Loc.second.Loc, Loc.second.Typ); } ArgValue = DAG.getNode(ISD::BUILD_PAIR, VT, ArgValue, ArgValue2); From evan.cheng at apple.com Thu Apr 27 03:31:49 2006 From: evan.cheng at apple.com (Evan Cheng) Date: Thu, 27 Apr 2006 03:31:49 -0500 Subject: [llvm-commits] CVS: llvm/lib/Target/X86/README.txt Message-ID: <200604270831.DAA30286@zion.cs.uiuc.edu> Changes in directory llvm/lib/Target/X86: README.txt updated: 
1.98 -> 1.99
---
Log message:

A couple of new entries.

---
Diffs of the changes:  (+37 -0)

 README.txt |   37 +++++++++++++++++++++++++++++++++++++
 1 files changed, 37 insertions(+)


Index: llvm/lib/Target/X86/README.txt
diff -u llvm/lib/Target/X86/README.txt:1.98 llvm/lib/Target/X86/README.txt:1.99
--- llvm/lib/Target/X86/README.txt:1.98	Mon Apr 24 18:30:10 2006
+++ llvm/lib/Target/X86/README.txt	Thu Apr 27 03:31:33 2006
@@ -1107,3 +1107,40 @@
 
 So icc is smart enough to know that B is in memory so it doesn't load it and
 store it back to stack.
+
+//===---------------------------------------------------------------------===//
+
+__m128d test1( __m128d A, __m128d B) {
+  return _mm_shuffle_pd(A, B, 0x3);
+}
+
+compiles to
+
+shufpd $3, %xmm1, %xmm0
+
+Perhaps it's better to use unpckhpd instead?
+
+unpckhpd %xmm1, %xmm0
+
+Don't know if unpckhpd is faster. But it is shorter.
+
+//===---------------------------------------------------------------------===//
+
+typedef short v8i16 __attribute__ ((__vector_size__ (16)));
+v8i16 test(v8i16 x, v8i16 y) {
+  return x + y;
+}
+
+compiles to
+
+_test:
+	paddw %xmm0, %xmm1
+	movaps %xmm1, %xmm0
+	ret
+
+It should be
+
+	paddw %xmm1, %xmm0
+	ret
+
+since paddw is commutative.



From evan.cheng at apple.com  Thu Apr 27 03:40:53 2006
From: evan.cheng at apple.com (Evan Cheng)
Date: Thu, 27 Apr 2006 03:40:53 -0500
Subject: [llvm-commits] CVS: llvm/lib/Target/X86/X86ISelLowering.cpp X86InstrInfo.td
Message-ID: <200604270840.DAA30710@zion.cs.uiuc.edu>

Changes in directory llvm/lib/Target/X86:

X86ISelLowering.cpp updated: 1.193 -> 1.194
X86InstrInfo.td updated: 1.262 -> 1.263
---
Log message:

Make x86 isel lowering produce tailcall nodes. They are match to normal calls
for now. Patch contributed by Alexander Friedman.

---
Diffs of the changes:  (+17 -1)

 X86ISelLowering.cpp |    2 +-
 X86InstrInfo.td     |   16 ++++++++++++++++
 2 files changed, 17 insertions(+), 1 deletion(-)


Index: llvm/lib/Target/X86/X86ISelLowering.cpp
diff -u llvm/lib/Target/X86/X86ISelLowering.cpp:1.193 llvm/lib/Target/X86/X86ISelLowering.cpp:1.194
--- llvm/lib/Target/X86/X86ISelLowering.cpp:1.193	Thu Apr 27 03:31:10 2006
+++ llvm/lib/Target/X86/X86ISelLowering.cpp	Thu Apr 27 03:40:39 2006
@@ -1203,7 +1203,7 @@
     Ops.push_back(InFlag);
 
   // FIXME: Do not generate X86ISD::TAILCALL for now.
-  Chain = DAG.getNode(X86ISD::CALL, NodeTys, Ops);
+  Chain = DAG.getNode(isTailCall ? X86ISD::TAILCALL : X86ISD::CALL, NodeTys, Ops);
   InFlag = Chain.getValue(1);
 
   NodeTys.clear();


Index: llvm/lib/Target/X86/X86InstrInfo.td
diff -u llvm/lib/Target/X86/X86InstrInfo.td:1.262 llvm/lib/Target/X86/X86InstrInfo.td:1.263
--- llvm/lib/Target/X86/X86InstrInfo.td:1.262	Sat Apr 22 17:31:45 2006
+++ llvm/lib/Target/X86/X86InstrInfo.td	Thu Apr 27 03:40:39 2006
@@ -75,6 +75,9 @@
 def X86call    : SDNode<"X86ISD::CALL",     SDT_X86Call,
                         [SDNPHasChain, SDNPOutFlag, SDNPOptInFlag]>;
 
+def X86tailcall : SDNode<"X86ISD::TAILCALL", SDT_X86Call,
+                        [SDNPHasChain, SDNPOutFlag, SDNPOptInFlag]>;
+
 def X86rep_stos: SDNode<"X86ISD::REP_STOS", SDTX86RepStr,
                         [SDNPHasChain, SDNPInFlag, SDNPOutFlag]>;
 def X86rep_movs: SDNode<"X86ISD::REP_MOVS", SDTX86RepStr,
@@ -2329,6 +2332,19 @@
           (MOV32mi addr:$dst, texternalsym:$src)>;
 
 // Calls
+def : Pat<(X86tailcall R32:$dst),
+          (CALL32r R32:$dst)>;
+
+def : Pat<(X86tailcall (loadi32 addr:$dst)),
+          (CALL32m addr:$dst)>;
+
+def : Pat<(X86tailcall tglobaladdr:$dst),
+          (CALLpcrel32 tglobaladdr:$dst)>;
+def : Pat<(X86tailcall texternalsym:$dst),
+          (CALLpcrel32 texternalsym:$dst)>;
+
+
+
 def : Pat<(X86call tglobaladdr:$dst),
           (CALLpcrel32 tglobaladdr:$dst)>;
 def : Pat<(X86call texternalsym:$dst),



From alenhar2 at cs.uiuc.edu  Thu Apr 27 10:50:26 2006
From: alenhar2 at cs.uiuc.edu (Andrew Lenharth)
Date: Thu, 27 Apr 2006 10:50:26 -0500
Subject:
[llvm-commits] CVS: llvm-poolalloc/runtime/FL2Allocator/PoolAllocator.cpp PoolAllocator.h Message-ID: <200604271550.KAA16242@zion.cs.uiuc.edu> Changes in directory llvm-poolalloc/runtime/FL2Allocator: PoolAllocator.cpp updated: 1.51 -> 1.52 PoolAllocator.h updated: 1.25 -> 1.26 --- Log message: realloc_pc --- Diffs of the changes: (+16 -7) PoolAllocator.cpp | 20 ++++++++++++++------ PoolAllocator.h | 3 ++- 2 files changed, 16 insertions(+), 7 deletions(-) Index: llvm-poolalloc/runtime/FL2Allocator/PoolAllocator.cpp diff -u llvm-poolalloc/runtime/FL2Allocator/PoolAllocator.cpp:1.51 llvm-poolalloc/runtime/FL2Allocator/PoolAllocator.cpp:1.52 --- llvm-poolalloc/runtime/FL2Allocator/PoolAllocator.cpp:1.51 Thu Feb 16 09:50:09 2006 +++ llvm-poolalloc/runtime/FL2Allocator/PoolAllocator.cpp Thu Apr 27 10:50:02 2006 @@ -538,9 +538,11 @@ } template -static void *poolalloc_internal(PoolTy *Pool, unsigned NumBytes) { +static void *poolalloc_internal(PoolTy *Pool, unsigned NumBytesA) { DO_IF_TRACE(fprintf(stderr, "[%d] poolalloc%s(%d) -> ", - getPoolNumber(Pool), PoolTraits::getSuffix(), NumBytes)); + getPoolNumber(Pool), PoolTraits::getSuffix(), NumBytesA)); + + int NumBytes = NumBytesA; // If a null pool descriptor is passed in, this is not a pool allocated data // structure. Hand off to the system malloc. @@ -783,9 +785,9 @@ DO_IF_TRACE(fprintf(stderr, "0x%X (system realloc)\n", Result)); return Result; } - if (Node == 0) return poolalloc(Pool, NumBytes); + if (Node == 0) return poolalloc_internal(Pool, NumBytes); if (NumBytes == 0) { - poolfree(Pool, Node); + poolfree_internal(Pool, Node); DO_IF_TRACE(fprintf(stderr, "freed\n")); return 0; } @@ -797,12 +799,12 @@ if (Size != ~1U) { // FIXME: This is obviously much worse than it could be. In particular, we // never try to expand something in a pool. This might hurt some programs! 
- void *New = poolalloc(Pool, NumBytes); + void *New = poolalloc_internal(Pool, NumBytes); assert(New != 0 && "Our poolalloc doesn't ever return null for failure!"); // Copy the min of the new and old sizes over. memcpy(New, Node, Size < NumBytes ? Size : NumBytes); - poolfree(Pool, Node); + poolfree_internal(Pool, Node); DO_IF_TRACE(fprintf(stderr, "0x%X (moved)\n", New)); return New; } @@ -972,6 +974,12 @@ poolfree_internal(Pool, (char*)Pool->Slabs+Node); } +unsigned long long poolrealloc_pc(PoolTy *Pool, + unsigned long long Node, unsigned NumBytes) { + void *Result = poolrealloc_internal(Pool, (char*)Pool->Slabs+Node, NumBytes); + return (char*)Result-(char*)Pool->Slabs; +} + //===----------------------------------------------------------------------===// // Access Tracing Runtime Library Support Index: llvm-poolalloc/runtime/FL2Allocator/PoolAllocator.h diff -u llvm-poolalloc/runtime/FL2Allocator/PoolAllocator.h:1.25 llvm-poolalloc/runtime/FL2Allocator/PoolAllocator.h:1.26 --- llvm-poolalloc/runtime/FL2Allocator/PoolAllocator.h:1.25 Wed May 18 14:56:30 2005 +++ llvm-poolalloc/runtime/FL2Allocator/PoolAllocator.h Thu Apr 27 10:50:02 2006 @@ -212,7 +212,8 @@ unsigned NumBytes); void poolfree_pc(PoolTy *Pool, unsigned long long Node); //void *poolmemalign_pc(PoolTy *Pool, unsigned Alignment, unsigned NumBytes); - + unsigned long long poolrealloc_pc(PoolTy *Pool, + unsigned long long Node, unsigned NumBytes); // Access tracing runtime library support. void poolaccesstraceinit(void); From lattner at cs.uiuc.edu Thu Apr 27 16:14:11 2006 From: lattner at cs.uiuc.edu (Chris Lattner) Date: Thu, 27 Apr 2006 16:14:11 -0500 Subject: [llvm-commits] CVS: llvm/test/Regression/Transforms/InstCombine/vec_insert_to_shuffle.ll Message-ID: <200604272114.QAA21495@zion.cs.uiuc.edu> Changes in directory llvm/test/Regression/Transforms/InstCombine: vec_insert_to_shuffle.ll added (r1.1) --- Log message: This should turn into one vector shuffle instruction. 
---
Diffs of the changes:  (+14 -0)

 vec_insert_to_shuffle.ll |   14 ++++++++++++++
 1 files changed, 14 insertions(+)


Index: llvm/test/Regression/Transforms/InstCombine/vec_insert_to_shuffle.ll
diff -c /dev/null llvm/test/Regression/Transforms/InstCombine/vec_insert_to_shuffle.ll:1.1
*** /dev/null	Thu Apr 27 16:14:08 2006
--- llvm/test/Regression/Transforms/InstCombine/vec_insert_to_shuffle.ll	Thu Apr 27 16:13:58 2006
***************
*** 0 ****
--- 1,14 ----
+ ; RUN: llvm-as < %s | opt -instcombine | llvm-dis | grep shufflevec | wc -l | grep 1 &&
+ ; RUN: llvm-as < %s | opt -instcombine | llvm-dis | not grep insertelement &&
+ ; RUN: llvm-as < %s | opt -instcombine | llvm-dis | not grep extractelement
+ 
+ <4 x float> %test(<4 x float> %tmp, <4 x float> %tmp1) {
+ 	%tmp4 = extractelement <4 x float> %tmp, uint 1		; <float> [#uses=1]
+ 	%tmp2 = extractelement <4 x float> %tmp, uint 3		; <float> [#uses=1]
+ 	%tmp1 = extractelement <4 x float> %tmp1, uint 0		; <float> [#uses=1]
+ 	%tmp128 = insertelement <4 x float> undef, float %tmp4, uint 0		; <<4 x float>> [#uses=1]
+ 	%tmp130 = insertelement <4 x float> %tmp128, float undef, uint 1		; <<4 x float>> [#uses=1]
+ 	%tmp132 = insertelement <4 x float> %tmp130, float %tmp2, uint 2		; <<4 x float>> [#uses=1]
+ 	%tmp134 = insertelement <4 x float> %tmp132, float %tmp1, uint 3		; <<4 x float>> [#uses=1]
+ 	ret <4 x float> %tmp134
+ }



From lattner at cs.uiuc.edu  Thu Apr 27 16:14:34 2006
From: lattner at cs.uiuc.edu (Chris Lattner)
Date: Thu, 27 Apr 2006 16:14:34 -0500
Subject: [llvm-commits] CVS: llvm/lib/Transforms/Scalar/InstructionCombining.cpp
Message-ID: <200604272114.QAA21526@zion.cs.uiuc.edu>

Changes in directory llvm/lib/Transforms/Scalar:

InstructionCombining.cpp updated: 1.471 -> 1.472
---
Log message:

Add support for inserting undef into a vector.
This implements Transforms/InstCombine/vec_insert_to_shuffle.ll --- Diffs of the changes: (+14 -3) InstructionCombining.cpp | 17 ++++++++++++++--- 1 files changed, 14 insertions(+), 3 deletions(-) Index: llvm/lib/Transforms/Scalar/InstructionCombining.cpp diff -u llvm/lib/Transforms/Scalar/InstructionCombining.cpp:1.471 llvm/lib/Transforms/Scalar/InstructionCombining.cpp:1.472 --- llvm/lib/Transforms/Scalar/InstructionCombining.cpp:1.471 Thu Apr 20 10:41:37 2006 +++ llvm/lib/Transforms/Scalar/InstructionCombining.cpp Thu Apr 27 16:14:21 2006 @@ -6959,12 +6959,23 @@ Value *ScalarOp = IEI->getOperand(1); Value *IdxOp = IEI->getOperand(2); - if (ExtractElementInst *EI = dyn_cast(ScalarOp)) { - if (isa(EI->getOperand(1)) && isa(IdxOp) && + if (!isa(IdxOp)) + return false; + unsigned InsertedIdx = cast(IdxOp)->getRawValue(); + + if (isa(ScalarOp)) { // inserting undef into vector. + // Okay, we can handle this if the vector we are insertinting into is + // transitively ok. + if (CollectSingleShuffleElements(VecOp, LHS, RHS, Mask)) { + // If so, update the mask to reflect the inserted undef. + Mask[InsertedIdx] = UndefValue::get(Type::UIntTy); + return true; + } + } else if (ExtractElementInst *EI = dyn_cast(ScalarOp)){ + if (isa(EI->getOperand(1)) && EI->getOperand(0)->getType() == V->getType()) { unsigned ExtractedIdx = cast(EI->getOperand(1))->getRawValue(); - unsigned InsertedIdx = cast(IdxOp)->getRawValue(); // This must be extracting from either LHS or RHS. 
         if (EI->getOperand(0) == LHS || EI->getOperand(0) == RHS) {



From lattner at cs.uiuc.edu  Thu Apr 27 16:41:10 2006
From: lattner at cs.uiuc.edu (Chris Lattner)
Date: Thu, 27 Apr 2006 16:41:10 -0500
Subject: [llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Message-ID: <200604272141.QAA22219@zion.cs.uiuc.edu>

Changes in directory llvm/lib/Target/X86:

README.txt updated: 1.99 -> 1.100
---
Log message:

Add a note


---
Diffs of the changes:  (+44 -0)

 README.txt |   44 ++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 44 insertions(+)


Index: llvm/lib/Target/X86/README.txt
diff -u llvm/lib/Target/X86/README.txt:1.99 llvm/lib/Target/X86/README.txt:1.100
--- llvm/lib/Target/X86/README.txt:1.99	Thu Apr 27 03:31:33 2006
+++ llvm/lib/Target/X86/README.txt	Thu Apr 27 16:40:57 2006
@@ -1144,3 +1144,47 @@
 	ret
 
 since paddw is commutative.
+
+//===---------------------------------------------------------------------===//
+
+This testcase:
+
+%G1 = weak global <4 x float> zeroinitializer		; <<4 x float>*> [#uses=1]
+%G2 = weak global <4 x float> zeroinitializer		; <<4 x float>*> [#uses=1]
+%G3 = weak global <4 x float> zeroinitializer		; <<4 x float>*> [#uses=1]
+%G4 = weak global <4 x float> zeroinitializer		; <<4 x float>*> [#uses=1]
+
+implementation   ; Functions:
+
+void %test() {
+	%tmp = load <4 x float>* %G1		; <<4 x float>> [#uses=2]
+	%tmp2 = load <4 x float>* %G2		; <<4 x float>> [#uses=2]
+	%tmp135 = shufflevector <4 x float> %tmp, <4 x float> %tmp2, <4 x uint> < uint 0, uint 4, uint 1, uint 5 >		; <<4 x float>> [#uses=1]
+	store <4 x float> %tmp135, <4 x float>* %G3
+	%tmp293 = shufflevector <4 x float> %tmp, <4 x float> %tmp2, <4 x uint> < uint 1, uint undef, uint 3, uint 4 >		; <<4 x float>> [#uses=1]
+	store <4 x float> %tmp293, <4 x float>* %G4
+	ret void
+}
+
+Compiles (llc -march=x86 -mcpu=yonah -relocation-model=static) to:
+
+_test:
+        movaps _G2, %xmm0
+        movaps _G1, %xmm1
+        movaps %xmm1, %xmm2
+2)      shufps $3, %xmm0, %xmm2
+        movaps %xmm1, %xmm3
+2)      shufps $1, %xmm0, %xmm3
+1)      unpcklps %xmm0, %xmm1
+2)      shufps $128, %xmm2, %xmm3
+1)      movaps %xmm1, _G3
+        movaps %xmm3, _G4
+        ret
+
+The 1) marked instructions could be scheduled better for reduced register
+pressure.  The scheduling issue is more pronounced without -static.
+
+The 2) marked instructions are the lowered form of the 1,undef,3,4
+shufflevector.  It seems that there should be a better way to do it :)
+
+



From lattner at cs.uiuc.edu  Thu Apr 27 18:10:53 2006
From: lattner at cs.uiuc.edu (Chris Lattner)
Date: Thu, 27 Apr 2006 18:10:53 -0500
Subject: [llvm-commits] CVS: llvm-www/www-index.html
Message-ID: <200604272310.SAA22996@zion.cs.uiuc.edu>

Changes in directory llvm-www:

www-index.html updated: 1.127 -> 1.128
---
Log message:

Grammar-o's and updates, thanks to Gabor!


---
Diffs of the changes:  (+2 -2)

 www-index.html |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)


Index: llvm-www/www-index.html
diff -u llvm-www/www-index.html:1.127 llvm-www/www-index.html:1.128
--- llvm-www/www-index.html:1.127	Thu Apr 20 02:21:41 2006
+++ llvm-www/www-index.html	Thu Apr 27 18:10:40 2006
@@ -35,8 +35,8 @@
 GCC-based C & C++ front-end, a link-time optimization framework with a
 growing set of global and interprocedural analyses and transformations,
 static back-ends for the X86,
-PowerPC, IA-64, Alpha, & SPARC architectures, a back-end which emits
-portable C code, and a Just-In-Time compiler for X86, PowerPC, and SPARC V9
+PowerPC, IA-64, Alpha and SPARC architectures, a back-end which emits
+portable C code, and a Just-In-Time compiler for X86 and PowerPC
 processors.

  • LLVM does not imply things that you would expect from a From lattner at cs.uiuc.edu Thu Apr 27 19:04:38 2006 From: lattner at cs.uiuc.edu (Chris Lattner) Date: Thu, 27 Apr 2006 19:04:38 -0500 Subject: [llvm-commits] CVS: llvm/lib/Target/PowerPC/README_ALTIVEC.txt Message-ID: <200604280004.TAA23384@zion.cs.uiuc.edu> Changes in directory llvm/lib/Target/PowerPC: README_ALTIVEC.txt updated: 1.33 -> 1.34 --- Log message: Add a note --- Diffs of the changes: (+8 -0) README_ALTIVEC.txt | 8 ++++++++ 1 files changed, 8 insertions(+) Index: llvm/lib/Target/PowerPC/README_ALTIVEC.txt diff -u llvm/lib/Target/PowerPC/README_ALTIVEC.txt:1.33 llvm/lib/Target/PowerPC/README_ALTIVEC.txt:1.34 --- llvm/lib/Target/PowerPC/README_ALTIVEC.txt:1.33 Wed Apr 19 11:22:38 2006 +++ llvm/lib/Target/PowerPC/README_ALTIVEC.txt Thu Apr 27 19:04:05 2006 @@ -169,3 +169,11 @@ blr //===----------------------------------------------------------------------===// + +CodeGen/PowerPC/vec_constants.ll has an and operation that should be +codegen'd to andc. The issue is that the 'all ones' build vector is +SelectNodeTo'd a VSPLTISB instruction node before the and/xor is selected +which prevents the vnot pattern from matching. 
+ + +//===----------------------------------------------------------------------===// From evan.cheng at apple.com Thu Apr 27 20:21:50 2006 From: evan.cheng at apple.com (Evan Cheng) Date: Thu, 27 Apr 2006 20:21:50 -0500 Subject: [llvm-commits] CVS: llvm/test/Regression/CodeGen/X86/2006-04-27-ISelFoldingBug.ll Message-ID: <200604280121.UAA24455@zion.cs.uiuc.edu> Changes in directory llvm/test/Regression/CodeGen/X86: 2006-04-27-ISelFoldingBug.ll added (r1.1) --- Log message: Test case for PR748: http://llvm.cs.uiuc.edu/PR748 --- Diffs of the changes: (+31 -0) 2006-04-27-ISelFoldingBug.ll | 31 +++++++++++++++++++++++++++++++ 1 files changed, 31 insertions(+) Index: llvm/test/Regression/CodeGen/X86/2006-04-27-ISelFoldingBug.ll diff -c /dev/null llvm/test/Regression/CodeGen/X86/2006-04-27-ISelFoldingBug.ll:1.1 *** /dev/null Thu Apr 27 20:21:47 2006 --- llvm/test/Regression/CodeGen/X86/2006-04-27-ISelFoldingBug.ll Thu Apr 27 20:21:37 2006 *************** *** 0 **** --- 1,31 ---- + ; RUN: llvm-as < %s | llc -march=x86 -relocation-model=static | grep 'movl _last' | wc -l | grep 2 + + %block = external global ubyte* ; [#uses=1] + %last = external global int ; [#uses=3] + + implementation ; Functions: + + bool %loadAndRLEsource_no_exit_2E_1_label_2E_0(int %tmp.21.reload, int %tmp.8) { + newFuncRoot: + br label %label.0 + + label.0.no_exit.1_crit_edge.exitStub: ; preds = %label.0 + ret bool true + + codeRepl5.exitStub: ; preds = %label.0 + ret bool false + + label.0: ; preds = %newFuncRoot + %tmp.35 = load int* %last ; [#uses=1] + %inc.1 = add int %tmp.35, 1 ; [#uses=2] + store int %inc.1, int* %last + %tmp.36 = load ubyte** %block ; [#uses=1] + %tmp.38 = getelementptr ubyte* %tmp.36, int %inc.1 ; [#uses=1] + %tmp.40 = cast int %tmp.21.reload to ubyte ; [#uses=1] + store ubyte %tmp.40, ubyte* %tmp.38 + %tmp.910 = load int* %last ; [#uses=1] + %tmp.1111 = setlt int %tmp.910, %tmp.8 ; [#uses=1] + %tmp.1412 = setne int %tmp.21.reload, 257 ; [#uses=1] + %tmp.1613 = and bool 
%tmp.1111, %tmp.1412 ; [#uses=1] + br bool %tmp.1613, label %label.0.no_exit.1_crit_edge.exitStub, label %codeRepl5.exitStub + } From lattner at cs.uiuc.edu Thu Apr 27 20:47:03 2006 From: lattner at cs.uiuc.edu (Chris Lattner) Date: Thu, 27 Apr 2006 20:47:03 -0500 Subject: [llvm-commits] CVS: llvm/lib/CodeGen/VirtRegMap.cpp Message-ID: <200604280147.UAA24600@zion.cs.uiuc.edu> Changes in directory llvm/lib/CodeGen: VirtRegMap.cpp updated: 1.58 -> 1.59 --- Log message: When we have a two-address instruction where the input cannot be clobbered and is already available, instead of falling back to emitting a load, fall back to emitting a reg-reg copy. This generates significantly better code for some SSE testcases, as SSE has lots of two-address instructions and none of them are read/modify/write. As one example, this change does: pshufd %XMM5, XMMWORD PTR [%ESP + 84], 255 xorps %XMM2, %XMM5 cmpltps %XMM1, %XMM0 - movaps XMMWORD PTR [%ESP + 52], %XMM0 - movapd %XMM6, XMMWORD PTR [%ESP + 52] + movaps %XMM6, %XMM0 cmpltps %XMM6, XMMWORD PTR [%ESP + 68] movapd XMMWORD PTR [%ESP + 52], %XMM6 movaps %XMM6, %XMM0 cmpltps %XMM6, XMMWORD PTR [%ESP + 36] cmpltps %XMM3, %XMM0 - movaps XMMWORD PTR [%ESP + 20], %XMM0 - movapd %XMM7, XMMWORD PTR [%ESP + 20] + movaps %XMM7, %XMM0 cmpltps %XMM7, XMMWORD PTR [%ESP + 4] movapd XMMWORD PTR [%ESP + 20], %XMM7 cmpltps %XMM4, %XMM0 ... which is far better than a store followed by a load! --- Diffs of the changes: (+63 -25) VirtRegMap.cpp | 88 ++++++++++++++++++++++++++++++++++++++++----------------- 1 files changed, 63 insertions(+), 25 deletions(-) Index: llvm/lib/CodeGen/VirtRegMap.cpp diff -u llvm/lib/CodeGen/VirtRegMap.cpp:1.58 llvm/lib/CodeGen/VirtRegMap.cpp:1.59 --- llvm/lib/CodeGen/VirtRegMap.cpp:1.58 Fri Feb 24 20:17:31 2006 +++ llvm/lib/CodeGen/VirtRegMap.cpp Thu Apr 27 20:46:50 2006 @@ -558,33 +558,71 @@ unsigned PhysReg; // Check to see if this stack slot is available. 
- if ((PhysReg = Spills.getSpillSlotPhysReg(StackSlot)) && - // Don't reuse it for a def&use operand if we aren't allowed to change - // the physreg! - (!MO.isDef() || Spills.canClobberPhysReg(StackSlot))) { - // If this stack slot value is already available, reuse it! - DEBUG(std::cerr << "Reusing SS#" << StackSlot << " from physreg " - << MRI->getName(PhysReg) << " for vreg" - << VirtReg <<" instead of reloading into physreg " - << MRI->getName(VRM.getPhys(VirtReg)) << "\n"); - MI.SetMachineOperandReg(i, PhysReg); + if ((PhysReg = Spills.getSpillSlotPhysReg(StackSlot))) { - // The only technical detail we have is that we don't know that - // PhysReg won't be clobbered by a reloaded stack slot that occurs - // later in the instruction. In particular, consider 'op V1, V2'. - // If V1 is available in physreg R0, we would choose to reuse it - // here, instead of reloading it into the register the allocator - // indicated (say R1). However, V2 might have to be reloaded - // later, and it might indicate that it needs to live in R0. When - // this occurs, we need to have information available that - // indicates it is safe to use R1 for the reload instead of R0. + // Don't reuse it for a def&use operand if we aren't allowed to change + // the physreg! + if (!MO.isDef() || Spills.canClobberPhysReg(StackSlot)) { + // If this stack slot value is already available, reuse it! + DEBUG(std::cerr << "Reusing SS#" << StackSlot << " from physreg " + << MRI->getName(PhysReg) << " for vreg" + << VirtReg <<" instead of reloading into physreg " + << MRI->getName(VRM.getPhys(VirtReg)) << "\n"); + MI.SetMachineOperandReg(i, PhysReg); + + // The only technical detail we have is that we don't know that + // PhysReg won't be clobbered by a reloaded stack slot that occurs + // later in the instruction. In particular, consider 'op V1, V2'. 
+ // If V1 is available in physreg R0, we would choose to reuse it + // here, instead of reloading it into the register the allocator + // indicated (say R1). However, V2 might have to be reloaded + // later, and it might indicate that it needs to live in R0. When + // this occurs, we need to have information available that + // indicates it is safe to use R1 for the reload instead of R0. + // + // To further complicate matters, we might conflict with an alias, + // or R0 and R1 might not be compatible with each other. In this + // case, we actually insert a reload for V1 in R1, ensuring that + // we can get at R0 or its alias. + ReusedOperands.addReuse(i, StackSlot, PhysReg, + VRM.getPhys(VirtReg), VirtReg); + ++NumReused; + continue; + } + + // Otherwise we have a situation where we have a two-address instruction + // whose mod/ref operand needs to be reloaded. This reload is already + // available in some register "PhysReg", but if we used PhysReg as the + // operand to our 2-addr instruction, the instruction would modify + // PhysReg. This isn't cool if something later uses PhysReg and expects + // to get its initial value. // - // To further complicate matters, we might conflict with an alias, - // or R0 and R1 might not be compatible with each other. In this - // case, we actually insert a reload for V1 in R1, ensuring that - // we can get at R0 or its alias. - ReusedOperands.addReuse(i, StackSlot, PhysReg, - VRM.getPhys(VirtReg), VirtReg); + // To avoid this problem, and to avoid doing a load right after a store, + // we emit a copy from PhysReg into the designated register for this + // operand. + unsigned DesignatedReg = VRM.getPhys(VirtReg); + assert(DesignatedReg && "Must map virtreg to physreg!"); + + // Note that, if we reused a register for a previous operand, the + // register we want to reload into might not actually be + // available. If this occurs, use the register indicated by the + // reuser. 
+ if (ReusedOperands.hasReuses()) + DesignatedReg = ReusedOperands.GetRegForReload(DesignatedReg, &MI, + Spills, MaybeDeadStores); + + const TargetRegisterClass* RC = + MBB.getParent()->getSSARegMap()->getRegClass(VirtReg); + + PhysRegsUsed[DesignatedReg] = true; + MRI->copyRegToReg(MBB, &MI, DesignatedReg, PhysReg, RC); + + // This invalidates DesignatedReg. + Spills.ClobberPhysReg(DesignatedReg); + + Spills.addAvailable(StackSlot, DesignatedReg); + MI.SetMachineOperandReg(i, DesignatedReg); + DEBUG(std::cerr << '\t' << *prior(MII)); ++NumReused; continue; } From evan.cheng at apple.com Thu Apr 27 21:08:23 2006 From: evan.cheng at apple.com (Evan Cheng) Date: Thu, 27 Apr 2006 21:08:23 -0500 Subject: [llvm-commits] CVS: llvm/utils/TableGen/DAGISelEmitter.cpp Message-ID: <200604280208.VAA24704@zion.cs.uiuc.edu> Changes in directory llvm/utils/TableGen: DAGISelEmitter.cpp updated: 1.197 -> 1.198 --- Log message: When isel'ing a node, mark its operands "InFlight" before selecting them. These nodes should not be folded into other nodes. This fixes the miscompilation of PR 749: http://llvm.cs.uiuc.edu/PR749 . Temporarily under flag control. --- Diffs of the changes: (+34 -2) DAGISelEmitter.cpp | 36 ++++++++++++++++++++++++++++++++++-- 1 files changed, 34 insertions(+), 2 deletions(-) Index: llvm/utils/TableGen/DAGISelEmitter.cpp diff -u llvm/utils/TableGen/DAGISelEmitter.cpp:1.197 llvm/utils/TableGen/DAGISelEmitter.cpp:1.198 --- llvm/utils/TableGen/DAGISelEmitter.cpp:1.197 Sat Apr 22 13:53:45 2006 +++ llvm/utils/TableGen/DAGISelEmitter.cpp Thu Apr 27 21:08:10 2006 @@ -2022,6 +2022,8 @@ // Names of all the folded nodes which produce chains. std::vector > FoldedChains; std::set Duplicates; + /// These nodes are being marked "in-flight" so they cannot be folded. + std::vector InflightNodes; /// GeneratedCode - This is the buffer that we emit code to. 
The first bool /// indicates whether this is an exit predicate (something that should be @@ -2128,6 +2130,9 @@ OpNo = 1; if (!isRoot) { const SDNodeInfo &CInfo = ISE.getSDNodeInfo(N->getOperator()); + // Not in flight? + emitCheck("(FoldNodeInFlight || InFlightSet.count(" + + RootName + ".Val) == 0)"); // Multiple uses of actual result? emitCheck(RootName + ".hasOneUse()"); EmittedUseCheck = true; @@ -2381,6 +2386,10 @@ Code += ", Tmp" + utostr(i + ResNo); emitCheck(Code + ")"); + for (unsigned i = 0; i < NumRes; ++i) { + emitCode("InFlightSet.insert(Tmp" + utostr(i+ResNo) + ".Val);"); + InflightNodes.push_back("Tmp" + utostr(i+ResNo)); + } for (unsigned i = 0; i < NumRes; ++i) emitCode("Select(Tmp" + utostr(i+ResNo) + ", Tmp" + utostr(i+ResNo) + ");"); @@ -2392,8 +2401,9 @@ // node even if it isn't one. Don't select it. if (LikeLeaf) emitCode("Tmp" + utostr(ResNo) + " = " + Val + ";"); - else + else { emitCode("Select(Tmp" + utostr(ResNo) + ", " + Val + ");"); + } if (isRoot && N->isLeaf()) { emitCode("Result = Tmp" + utostr(ResNo) + ";"); @@ -2477,9 +2487,24 @@ } } - // Emit all of the operands. + // Make sure these operands which would be selected won't be folded while + // the isel traverses the DAG upward. std::vector > NumTemps(EmitOrder.size()); for (unsigned i = 0, e = EmitOrder.size(); i != e; ++i) { + TreePatternNode *Child = EmitOrder[i].second; + if (!Child->getName().empty()) { + std::string &Val = VariableMap[Child->getName()]; + assert(!Val.empty() && + "Variable referenced but not defined and not caught earlier!"); + if (Child->isLeaf() && !NodeGetComplexPattern(Child, ISE)) { + emitCode("InFlightSet.insert(" + Val + ".Val);"); + InflightNodes.push_back(Val); + } + } + } + + // Emit all of the operands. 
+ for (unsigned i = 0, e = EmitOrder.size(); i != e; ++i) { unsigned OpOrder = EmitOrder[i].first; TreePatternNode *Child = EmitOrder[i].second; std::pair NumTemp = EmitResultCode(Child); @@ -2500,6 +2525,10 @@ if (HasInFlag || HasOptInFlag || HasImpInputs) EmitInFlagSelectCode(Pattern, "N", ChainEmitted, true); + // The operands have been selected. Remove them from InFlightSet. + for (std::vector::iterator AI = InflightNodes.begin(), + AE = InflightNodes.end(); AI != AE; ++AI) + emitCode("InFlightSet.erase(" + *AI + ".Val);"); unsigned NumResults = Inst.getNumResults(); unsigned ResNo = TmpNo++; if (!isRoot) { @@ -3373,6 +3402,9 @@ OS << "// Instance var to keep track of mapping of place handle nodes\n" << "// and their replacement nodes.\n"; OS << "std::map ReplaceMap;\n"; + OS << "// Keep track of nodes that are currently being selecte and therefore\n" + << "// should not be folded.\n"; + OS << "std::set InFlightSet;\n"; OS << "\n"; OS << "static void findNonImmUse(SDNode* Use, SDNode* Def, bool &found, " From evan.cheng at apple.com Thu Apr 27 21:09:34 2006 From: evan.cheng at apple.com (Evan Cheng) Date: Thu, 27 Apr 2006 21:09:34 -0500 Subject: [llvm-commits] CVS: llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp Message-ID: <200604280209.VAA24727@zion.cs.uiuc.edu> Changes in directory llvm/lib/CodeGen/SelectionDAG: SelectionDAGISel.cpp updated: 1.228 -> 1.229 --- Log message: Added a temporary option -no-isel-fold-inflight to control whether a "inflight" node can be folded. 
--- Diffs of the changes: (+12 -1) SelectionDAGISel.cpp | 13 ++++++++++++- 1 files changed, 12 insertions(+), 1 deletion(-) Index: llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp diff -u llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:1.228 llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:1.229 --- llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:1.228 Thu Apr 27 03:29:42 2006 +++ llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp Thu Apr 27 21:09:19 2006 @@ -58,6 +58,14 @@ static const bool ViewISelDAGs = 0, ViewSchedDAGs = 0; #endif +namespace { +static cl::opt<bool> +NoFoldNodeInFlight( + "no-isel-fold-inflight", + cl::Hidden, + cl::desc("Do not attempt to fold a node even if it is being selected")); +} + // Scheduling heuristics enum SchedHeuristics { defaultScheduling, // Let the target specify its preference. @@ -3170,7 +3178,10 @@ DAG.Combine(true); if (ViewISelDAGs) DAG.viewGraph(); - + + // TEMPORARY. + FoldNodeInFlight = !NoFoldNodeInFlight; + // Third, instruction select all of the operations to machine code, adding the // code to the MachineBasicBlock. InstructionSelectBasicBlock(DAG); From evan.cheng at apple.com Thu Apr 27 21:09:34 2006 From: evan.cheng at apple.com (Evan Cheng) Date: Thu, 27 Apr 2006 21:09:34 -0500 Subject: [llvm-commits] CVS: llvm/include/llvm/CodeGen/SelectionDAGISel.h Message-ID: <200604280209.VAA24723@zion.cs.uiuc.edu> Changes in directory llvm/include/llvm/CodeGen: SelectionDAGISel.h updated: 1.14 -> 1.15 --- Log message: Added a temporary option -no-isel-fold-inflight to control whether a "inflight" node can be folded. 
--- Diffs of the changes: (+1 -0) SelectionDAGISel.h | 1 + 1 files changed, 1 insertion(+) Index: llvm/include/llvm/CodeGen/SelectionDAGISel.h diff -u llvm/include/llvm/CodeGen/SelectionDAGISel.h:1.14 llvm/include/llvm/CodeGen/SelectionDAGISel.h:1.15 --- llvm/include/llvm/CodeGen/SelectionDAGISel.h:1.14 Sun Apr 23 01:26:20 2006 +++ llvm/include/llvm/CodeGen/SelectionDAGISel.h Thu Apr 27 21:09:19 2006 @@ -40,6 +40,7 @@ SSARegMap *RegMap; SelectionDAG *CurDAG; MachineBasicBlock *BB; + bool FoldNodeInFlight; SelectionDAGISel(TargetLowering &tli) : TLI(tli), JT(0,0,0,0) {} From evan.cheng at apple.com Thu Apr 27 21:12:20 2006 From: evan.cheng at apple.com (Evan Cheng) Date: Thu, 27 Apr 2006 21:12:20 -0500 Subject: [llvm-commits] CVS: llvm-test/Makefile.programs Message-ID: <200604280212.VAA24772@zion.cs.uiuc.edu> Changes in directory llvm-test: Makefile.programs updated: 1.203 -> 1.204 --- Log message: Temporarily set -no-isel-fold-inflight to LLCBETAOPTION. --- Diffs of the changes: (+1 -1) Makefile.programs | 2 +- 1 files changed, 1 insertion(+), 1 deletion(-) Index: llvm-test/Makefile.programs diff -u llvm-test/Makefile.programs:1.203 llvm-test/Makefile.programs:1.204 --- llvm-test/Makefile.programs:1.203 Sat Apr 8 16:16:16 2006 +++ llvm-test/Makefile.programs Thu Apr 27 21:12:07 2006 @@ -197,7 +197,7 @@ LLCBETAOPTION := -sched=simple endif ifeq ($(ARCH),x86) -LLCBETAOPTION := -enable-x86-fastcc +LLCBETAOPTION := -no-isel-fold-inflight endif ifeq ($(ARCH),Sparc) LLCBETAOPTION := -enable-sparc-v9-insts From evan.cheng at apple.com Thu Apr 27 21:23:48 2006 From: evan.cheng at apple.com (Evan Cheng) Date: Thu, 27 Apr 2006 21:23:48 -0500 Subject: [llvm-commits] CVS: llvm/lib/Target/X86/X86RegisterInfo.cpp Message-ID: <200604280223.VAA24830@zion.cs.uiuc.edu> Changes in directory llvm/lib/Target/X86: X86RegisterInfo.cpp updated: 1.144 -> 1.145 --- Log message: Use movaps instead of movapd for spill / restore. 
--- Diffs of the changes: (+2 -2) X86RegisterInfo.cpp | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) Index: llvm/lib/Target/X86/X86RegisterInfo.cpp diff -u llvm/lib/Target/X86/X86RegisterInfo.cpp:1.144 llvm/lib/Target/X86/X86RegisterInfo.cpp:1.145 --- llvm/lib/Target/X86/X86RegisterInfo.cpp:1.144 Mon Apr 24 00:37:35 2006 +++ llvm/lib/Target/X86/X86RegisterInfo.cpp Thu Apr 27 21:23:35 2006 @@ -63,7 +63,7 @@ } else if (RC == &X86::FR64RegClass) { Opc = X86::MOVSDmr; } else if (RC == &X86::VR128RegClass) { - Opc = X86::MOVAPDmr; + Opc = X86::MOVAPSmr; } else { assert(0 && "Unknown regclass"); abort(); @@ -89,7 +89,7 @@ } else if (RC == &X86::FR64RegClass) { Opc = X86::MOVSDrm; } else if (RC == &X86::VR128RegClass) { - Opc = X86::MOVAPDrm; + Opc = X86::MOVAPSrm; } else { assert(0 && "Unknown regclass"); abort(); From lattner at cs.uiuc.edu Thu Apr 27 23:14:44 2006 From: lattner at cs.uiuc.edu (Chris Lattner) Date: Thu, 27 Apr 2006 23:14:44 -0500 Subject: [llvm-commits] CVS: llvm/test/Regression/Transforms/Reassociate/2006-04-27-ReassociateVector.ll Message-ID: <200604280414.XAA25421@zion.cs.uiuc.edu> Changes in directory llvm/test/Regression/Transforms/Reassociate: 2006-04-27-ReassociateVector.ll added (r1.1) --- Log message: new testcase --- Diffs of the changes: (+7 -0) 2006-04-27-ReassociateVector.ll | 7 +++++++ 1 files changed, 7 insertions(+) Index: llvm/test/Regression/Transforms/Reassociate/2006-04-27-ReassociateVector.ll diff -c /dev/null llvm/test/Regression/Transforms/Reassociate/2006-04-27-ReassociateVector.ll:1.1 *** /dev/null Thu Apr 27 23:14:39 2006 --- llvm/test/Regression/Transforms/Reassociate/2006-04-27-ReassociateVector.ll Thu Apr 27 23:14:29 2006 *************** *** 0 **** --- 1,7 ---- + ; RUN: llvm-as < %s | opt -reassociate -disable-output + + void %foo() { + %tmp162 = sub <4 x float> zeroinitializer, zeroinitializer + %tmp164 = mul <4 x float> zeroinitializer, %tmp162 + ret void + } From lattner at cs.uiuc.edu Thu Apr 27 23:15:02 2006 
From: lattner at cs.uiuc.edu (Chris Lattner) Date: Thu, 27 Apr 2006 23:15:02 -0500 Subject: [llvm-commits] CVS: llvm/lib/Transforms/Scalar/Reassociate.cpp Message-ID: <200604280415.XAA25437@zion.cs.uiuc.edu> Changes in directory llvm/lib/Transforms/Scalar: Reassociate.cpp updated: 1.60 -> 1.61 --- Log message: Fix Transforms/Reassociate/2006-04-27-ReassociateVector.ll --- Diffs of the changes: (+3 -2) Reassociate.cpp | 5 +++-- 1 files changed, 3 insertions(+), 2 deletions(-) Index: llvm/lib/Transforms/Scalar/Reassociate.cpp diff -u llvm/lib/Transforms/Scalar/Reassociate.cpp:1.60 llvm/lib/Transforms/Scalar/Reassociate.cpp:1.61 --- llvm/lib/Transforms/Scalar/Reassociate.cpp:1.60 Tue Mar 14 10:04:29 2006 +++ llvm/lib/Transforms/Scalar/Reassociate.cpp Thu Apr 27 23:14:49 2006 @@ -23,10 +23,10 @@ #define DEBUG_TYPE "reassociate" #include "llvm/Transforms/Scalar.h" #include "llvm/Constants.h" +#include "llvm/DerivedTypes.h" #include "llvm/Function.h" #include "llvm/Instructions.h" #include "llvm/Pass.h" -#include "llvm/Type.h" #include "llvm/Assembly/Writer.h" #include "llvm/Support/CFG.h" #include "llvm/Support/Debug.h" @@ -754,7 +754,8 @@ } // Reject cases where it is pointless to do this. - if (!isa<BinaryOperator>(BI) || BI->getType()->isFloatingPoint()) + if (!isa<BinaryOperator>(BI) || BI->getType()->isFloatingPoint() || + isa<PackedType>(BI->getType())) continue; // Floating point ops are not associative. // If this is a subtract instruction which is not already in negate form, From lattner at cs.uiuc.edu Thu Apr 27 23:43:31 2006 From: lattner at cs.uiuc.edu (Chris Lattner) Date: Thu, 27 Apr 2006 23:43:31 -0500 Subject: [llvm-commits] CVS: llvm/lib/CodeGen/VirtRegMap.cpp Message-ID: <200604280443.XAA25697@zion.cs.uiuc.edu> Changes in directory llvm/lib/CodeGen: VirtRegMap.cpp updated: 1.59 -> 1.60 --- Log message: Mapping of physregs can make it so that the designated and input physregs are the same. In this case, don't emit a noop copy. 
--- Diffs of the changes: (+13 -0) VirtRegMap.cpp | 13 +++++++++++++ 1 files changed, 13 insertions(+) Index: llvm/lib/CodeGen/VirtRegMap.cpp diff -u llvm/lib/CodeGen/VirtRegMap.cpp:1.59 llvm/lib/CodeGen/VirtRegMap.cpp:1.60 --- llvm/lib/CodeGen/VirtRegMap.cpp:1.59 Thu Apr 27 20:46:50 2006 +++ llvm/lib/CodeGen/VirtRegMap.cpp Thu Apr 27 23:43:18 2006 @@ -611,6 +611,19 @@ DesignatedReg = ReusedOperands.GetRegForReload(DesignatedReg, &MI, Spills, MaybeDeadStores); + // If the mapped designated register is actually the physreg we have + // incoming, we don't need to inserted a dead copy. + if (DesignatedReg == PhysReg) { + // If this stack slot value is already available, reuse it! + DEBUG(std::cerr << "Reusing SS#" << StackSlot << " from physreg " + << MRI->getName(PhysReg) << " for vreg" + << VirtReg + << " instead of reloading into same physreg.\n"); + MI.SetMachineOperandReg(i, PhysReg); + ++NumReused; + continue; + } + const TargetRegisterClass* RC = MBB.getParent()->getSSARegMap()->getRegClass(VirtReg); From evan.cheng at apple.com Fri Apr 28 00:25:30 2006 From: evan.cheng at apple.com (Evan Cheng) Date: Fri, 28 Apr 2006 00:25:30 -0500 Subject: [llvm-commits] CVS: llvm/lib/Target/X86/X86ISelLowering.cpp Message-ID: <200604280525.AAA28414@zion.cs.uiuc.edu> Changes in directory llvm/lib/Target/X86: X86ISelLowering.cpp updated: 1.194 -> 1.195 --- Log message: TargetLowering::LowerArguments should return a VBIT_CONVERT of FORMAL_ARGUMENTS SDOperand in the return result vector. 
--- Diffs of the changes: (+1 -1) X86ISelLowering.cpp | 2 +- 1 files changed, 1 insertion(+), 1 deletion(-) Index: llvm/lib/Target/X86/X86ISelLowering.cpp diff -u llvm/lib/Target/X86/X86ISelLowering.cpp:1.194 llvm/lib/Target/X86/X86ISelLowering.cpp:1.195 --- llvm/lib/Target/X86/X86ISelLowering.cpp:1.194 Thu Apr 27 03:40:39 2006 +++ llvm/lib/Target/X86/X86ISelLowering.cpp Fri Apr 28 00:25:15 2006 @@ -453,7 +453,7 @@ assert(Op.getOpcode() == ISD::AssertSext || Op.getOpcode() == ISD::AssertZext); Objs.push_back(Op.getOperand(0)); - } else if (Opc == ISD::FP_ROUND) { + } else if (Opc == ISD::FP_ROUND || Opc == ISD::VBIT_CONVERT) { Objs.push_back(Op.getOperand(0)); } else if (Opc == ISD::BUILD_PAIR) { Objs.push_back(Op.getOperand(0)); From evan.cheng at apple.com Fri Apr 28 00:25:30 2006 From: evan.cheng at apple.com (Evan Cheng) Date: Fri, 28 Apr 2006 00:25:30 -0500 Subject: [llvm-commits] CVS: llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp Message-ID: <200604280525.AAA28418@zion.cs.uiuc.edu> Changes in directory llvm/lib/CodeGen/SelectionDAG: SelectionDAGISel.cpp updated: 1.229 -> 1.230 --- Log message: TargetLowering::LowerArguments should return a VBIT_CONVERT of FORMAL_ARGUMENTS SDOperand in the return result vector. --- Diffs of the changes: (+14 -16) SelectionDAGISel.cpp | 30 ++++++++++++++---------------- 1 files changed, 14 insertions(+), 16 deletions(-) Index: llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp diff -u llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:1.229 llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:1.230 --- llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:1.229 Thu Apr 27 21:09:19 2006 +++ llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp Fri Apr 28 00:25:15 2006 @@ -2447,14 +2447,25 @@ } else { // Otherwise, this is a vector type. We only support legal vectors // right now. 
- unsigned NumElems = cast<PackedType>(I->getType())->getNumElements(); - const Type *EltTy = cast<PackedType>(I->getType())->getElementType(); + const PackedType *PTy = cast<PackedType>(I->getType()); + unsigned NumElems = PTy->getNumElements(); + const Type *EltTy = PTy->getElementType(); // Figure out if there is a Packed type corresponding to this Vector // type. If so, convert to the packed type. MVT::ValueType TVT = MVT::getVectorType(getValueType(EltTy), NumElems); if (TVT != MVT::Other && isTypeLegal(TVT)) { - Ops.push_back(SDOperand(Result, i++)); + SDOperand N = SDOperand(Result, i++); + // Handle copies from generic vectors to registers. + MVT::ValueType PTyElementVT, PTyLegalElementVT; + unsigned NE = getPackedTypeBreakdown(PTy, PTyElementVT, + PTyLegalElementVT); + // Insert a VBIT_CONVERT of the FORMAL_ARGUMENTS to a + // "N x PTyElementVT" MVT::Vector type. + N = DAG.getNode(ISD::VBIT_CONVERT, MVT::Vector, N, + DAG.getConstant(NE, MVT::i32), + DAG.getValueType(PTyElementVT)); + Ops.push_back(N); } else { assert(0 && "Don't support illegal by-val vector arguments yet!"); } @@ -2996,19 +3007,6 @@ if (!AI->use_empty()) { SDL.setValue(AI, Args[a]); - MVT::ValueType VT = TLI.getValueType(AI->getType()); - if (VT == MVT::Vector) { - // Insert a VBIT_CONVERT between the FORMAL_ARGUMENT node and its uses. - // Or else legalizer will balk. - BasicBlock::iterator InsertPt = BB->begin(); - Value *NewVal = new CastInst(AI, AI->getType(), AI->getName(), InsertPt); - for (Value::use_iterator UI = AI->use_begin(), E = AI->use_end(); - UI != E; ++UI) { - Instruction *User = cast<Instruction>(*UI); - if (User != NewVal) - User->replaceUsesOfWith(AI, NewVal); - } - } // If this argument is live outside of the entry block, insert a copy from // whereever we got it to the vreg that other BB's will reference it as. 
if (FuncInfo.ValueMap.count(AI)) { From lattner at cs.uiuc.edu Fri Apr 28 00:36:38 2006 From: lattner at cs.uiuc.edu (Chris Lattner) Date: Fri, 28 Apr 2006 00:36:38 -0500 Subject: [llvm-commits] CVS: llvm/lib/Support/CommandLine.cpp Message-ID: <200604280536.AAA28528@zion.cs.uiuc.edu> Changes in directory llvm/lib/Support: CommandLine.cpp updated: 1.66 -> 1.67 --- Log message: Fix PR743: http://llvm.cs.uiuc.edu/PR743 : emit -help output of a tool to cout, not cerr. --- Diffs of the changes: (+20 -20) CommandLine.cpp | 40 ++++++++++++++++++++-------------------- 1 files changed, 20 insertions(+), 20 deletions(-) Index: llvm/lib/Support/CommandLine.cpp diff -u llvm/lib/Support/CommandLine.cpp:1.66 llvm/lib/Support/CommandLine.cpp:1.67 --- llvm/lib/Support/CommandLine.cpp:1.66 Mon Jan 16 18:32:28 2006 +++ llvm/lib/Support/CommandLine.cpp Fri Apr 28 00:36:25 2006 @@ -690,10 +690,10 @@ return std::strlen(ArgStr)+6; } -// Print out the option for the alias... +// Print out the option for the alias. 
void alias::printOptionInfo(unsigned GlobalWidth) const { unsigned L = std::strlen(ArgStr); - std::cerr << " -" << ArgStr << std::string(GlobalWidth-L-6, ' ') << " - " + std::cout << " -" << ArgStr << std::string(GlobalWidth-L-6, ' ') << " - " << HelpStr << "\n"; } @@ -720,12 +720,12 @@ // void basic_parser_impl::printOptionInfo(const Option &O, unsigned GlobalWidth) const { - std::cerr << " -" << O.ArgStr; + std::cout << " -" << O.ArgStr; if (const char *ValName = getValueName()) - std::cerr << "=<" << getValueStr(O, ValName) << ">"; + std::cout << "=<" << getValueStr(O, ValName) << ">"; - std::cerr << std::string(GlobalWidth-getOptionWidth(O), ' ') << " - " + std::cout << std::string(GlobalWidth-getOptionWidth(O), ' ') << " - " << O.HelpStr << "\n"; } @@ -842,20 +842,20 @@ unsigned GlobalWidth) const { if (O.hasArgStr()) { unsigned L = std::strlen(O.ArgStr); - std::cerr << " -" << O.ArgStr << std::string(GlobalWidth-L-6, ' ') + std::cout << " -" << O.ArgStr << std::string(GlobalWidth-L-6, ' ') << " - " << O.HelpStr << "\n"; for (unsigned i = 0, e = getNumOptions(); i != e; ++i) { unsigned NumSpaces = GlobalWidth-strlen(getOption(i))-8; - std::cerr << " =" << getOption(i) << std::string(NumSpaces, ' ') + std::cout << " =" << getOption(i) << std::string(NumSpaces, ' ') << " - " << getDescription(i) << "\n"; } } else { if (O.HelpStr[0]) - std::cerr << " " << O.HelpStr << "\n"; + std::cout << " " << O.HelpStr << "\n"; for (unsigned i = 0, e = getNumOptions(); i != e; ++i) { unsigned L = std::strlen(getOption(i)); - std::cerr << " -" << getOption(i) << std::string(GlobalWidth-L-8, ' ') + std::cout << " -" << getOption(i) << std::string(GlobalWidth-L-8, ' ') << " - " << getDescription(i) << "\n"; } } @@ -909,9 +909,9 @@ } if (ProgramOverview) - std::cerr << "OVERVIEW:" << ProgramOverview << "\n"; + std::cout << "OVERVIEW:" << ProgramOverview << "\n"; - std::cerr << "USAGE: " << ProgramName << " [options]"; + std::cout << "USAGE: " << ProgramName << " [options]"; // 
Print out the positional options... std::vector &PosOpts = getPositionalOpts(); @@ -921,28 +921,28 @@ for (unsigned i = CAOpt != 0, e = PosOpts.size(); i != e; ++i) { if (PosOpts[i]->ArgStr[0]) - std::cerr << " --" << PosOpts[i]->ArgStr; - std::cerr << " " << PosOpts[i]->HelpStr; + std::cout << " --" << PosOpts[i]->ArgStr; + std::cout << " " << PosOpts[i]->HelpStr; } // Print the consume after option info if it exists... - if (CAOpt) std::cerr << " " << CAOpt->HelpStr; + if (CAOpt) std::cout << " " << CAOpt->HelpStr; - std::cerr << "\n\n"; + std::cout << "\n\n"; // Compute the maximum argument length... MaxArgLen = 0; for (unsigned i = 0, e = Options.size(); i != e; ++i) MaxArgLen = std::max(MaxArgLen, Options[i].second->getOptionWidth()); - std::cerr << "OPTIONS:\n"; + std::cout << "OPTIONS:\n"; for (unsigned i = 0, e = Options.size(); i != e; ++i) Options[i].second->printOptionInfo(MaxArgLen); // Print any extra help the user has declared. for (std::vector::iterator I = MoreHelp().begin(), E = MoreHelp().end(); I != E; ++I) - std::cerr << *I; + std::cout << *I; MoreHelp().clear(); // Halt the program since help information was printed @@ -955,12 +955,12 @@ public: void operator=(bool OptionWasSpecified) { if (OptionWasSpecified) { - std::cerr << "Low Level Virtual Machine (" << PACKAGE_NAME << ") " + std::cout << "Low Level Virtual Machine (" << PACKAGE_NAME << ") " << PACKAGE_VERSION << " (see http://llvm.org/)"; #ifndef NDEBUG - std::cerr << " DEBUG BUILD\n"; + std::cout << " ASSERTIONS ENABLED\n"; #else - std::cerr << "\n"; + std::cout << "\n"; #endif getOpts().clear(); // Don't bother making option dtors remove from map. 
exit(1); From evan.cheng at apple.com Fri Apr 28 02:03:52 2006 From: evan.cheng at apple.com (Evan Cheng) Date: Fri, 28 Apr 2006 02:03:52 -0500 Subject: [llvm-commits] CVS: llvm/lib/Target/X86/X86ISelLowering.cpp Message-ID: <200604280703.CAA29115@zion.cs.uiuc.edu> Changes in directory llvm/lib/Target/X86: X86ISelLowering.cpp updated: 1.195 -> 1.196 --- Log message: Implement four-wide shuffle with 2 shufps if no more than two elements come from each vector. e.g. shuffle(G1, G2, 7, 1, 5, 2) ==> movaps _G2, %xmm0 shufps $151, _G1, %xmm0 shufps $216, %xmm0, %xmm0 --- Diffs of the changes: (+47 -2) X86ISelLowering.cpp | 49 +++++++++++++++++++++++++++++++++++++++++++++++-- 1 files changed, 47 insertions(+), 2 deletions(-) Index: llvm/lib/Target/X86/X86ISelLowering.cpp diff -u llvm/lib/Target/X86/X86ISelLowering.cpp:1.195 llvm/lib/Target/X86/X86ISelLowering.cpp:1.196 --- llvm/lib/Target/X86/X86ISelLowering.cpp:1.195 Fri Apr 28 00:25:15 2006 +++ llvm/lib/Target/X86/X86ISelLowering.cpp Fri Apr 28 02:03:38 2006 @@ -2676,10 +2676,55 @@ } if (NumElems == 4) { - // Break it into (shuffle shuffle_hi, shuffle_lo). MVT::ValueType MaskVT = PermMask.getValueType(); MVT::ValueType MaskEVT = MVT::getVectorBaseType(MaskVT); - std::map > Locs; + std::vector > Locs; + Locs.reserve(NumElems); + std::vector Mask1(NumElems, DAG.getNode(ISD::UNDEF, MaskEVT)); + std::vector Mask2(NumElems, DAG.getNode(ISD::UNDEF, MaskEVT)); + unsigned NumHi = 0; + unsigned NumLo = 0; + // If no more than two elements come from either vector. This can be + // implemented with two shuffles. First shuffle gather the elements. + // The second shuffle, which takes the first shuffle as both of its + // vector operands, put the elements into the right order. 
+ for (unsigned i = 0; i != NumElems; ++i) { + SDOperand Elt = PermMask.getOperand(i); + if (Elt.getOpcode() == ISD::UNDEF) { + Locs[i] = std::make_pair(-1, -1); + } else { + unsigned Val = cast<ConstantSDNode>(Elt)->getValue(); + if (Val < NumElems) { + Locs[i] = std::make_pair(0, NumLo); + Mask1[NumLo] = Elt; + NumLo++; + } else { + Locs[i] = std::make_pair(1, NumHi); + if (2+NumHi < NumElems) + Mask1[2+NumHi] = Elt; + NumHi++; + } + } + } + if (NumLo <= 2 && NumHi <= 2) { + V1 = DAG.getNode(ISD::VECTOR_SHUFFLE, VT, V1, V2, + DAG.getNode(ISD::BUILD_VECTOR, MaskVT, Mask1)); + for (unsigned i = 0; i != NumElems; ++i) { + if (Locs[i].first == -1) + continue; + else { + unsigned Idx = (i < NumElems/2) ? 0 : NumElems; + Idx += Locs[i].first * (NumElems/2) + Locs[i].second; + Mask2[i] = DAG.getConstant(Idx, MaskEVT); + } + } + + return DAG.getNode(ISD::VECTOR_SHUFFLE, VT, V1, V1, + DAG.getNode(ISD::BUILD_VECTOR, MaskVT, Mask2)); + } + + // Break it into (shuffle shuffle_hi, shuffle_lo). + Locs.clear(); std::vector<SDOperand> LoMask(NumElems, DAG.getNode(ISD::UNDEF, MaskEVT)); std::vector<SDOperand> HiMask(NumElems, DAG.getNode(ISD::UNDEF, MaskEVT)); std::vector<SDOperand> *MaskPtr = &LoMask; From bocchino at persephone.cs.uiuc.edu Fri Apr 28 12:16:54 2006 From: bocchino at persephone.cs.uiuc.edu (Robert L. Bocchino Jr.) Date: Fri, 28 Apr 2006 12:16:54 -0500 (CDT) Subject: [llvm-commits] CVS: llvm-www/pubs/2006-06-15-VEE-VectorLLVA.html 2006-06-15-VEE-VectorLLVA.html~ 2006-06-15-VEE-VectorLLVA.pdf index.html Message-ID: <20060428171654.532A62443B6A@persephone.cs.uiuc.edu> Changes in directory llvm-www/pubs: 2006-06-15-VEE-VectorLLVA.html added (r1.1) 2006-06-15-VEE-VectorLLVA.html~ added (r1.1) 2006-06-15-VEE-VectorLLVA.pdf added (r1.1) index.html updated: 1.37 -> 1.38 --- Log message: Added VEE Vector LLVA paper to the pubs list.
--- Diffs of the changes: (+103 -0) 2006-06-15-VEE-VectorLLVA.html | 50 ++++++++++++++++++++++++++++++++++++++++ 2006-06-15-VEE-VectorLLVA.html~ | 50 ++++++++++++++++++++++++++++++++++++++++ index.html | 3 ++ 3 files changed, 103 insertions(+) Index: llvm-www/pubs/2006-06-15-VEE-VectorLLVA.html diff -c /dev/null llvm-www/pubs/2006-06-15-VEE-VectorLLVA.html:1.1 *** /dev/null Fri Apr 28 12:16:32 2006 --- llvm-www/pubs/2006-06-15-VEE-VectorLLVA.html Fri Apr 28 12:16:21 2006 *************** *** 0 **** --- 1,50 ---- + + + + + + Vector LLVA: A Virtual Vector Instruction Set for Media Processing + + + +

    + Vector LLVA: A Virtual Vector Instruction Set for Media Processing +
    +
    + Robert L. Bocchino Jr. and Vikram S. Adve +
    + +

    Abstract:

    +
    + We present Vector LLVA, a virtual instruction set architecture (V-ISA) + that exposes extensive static information about vector parallelism + while avoiding the use of hardware-specific parameters. We provide + both arbitrary-length vectors (for targets that allow vectors of + arbitrary length, or where the target length is not known) and + fixed-length vectors (for targets that have a fixed vector length, + such as subword SIMD extensions), together with a rich set of + operations on both vector types. We have implemented translators that + compile (1) Vector LLVA written with arbitrary-length vectors to the + Motorola RSVP architecture and (2) Vector LLVA written with + fixed-length vectors to both AltiVec and Intel SSE2. Our + translator-generated code achieves speedups competitive with + handwritten native code versions of several benchmarks on all three + architectures. These experiments show that our V-ISA design captures + vector parallelism for two quite different classes of architectures + and provides virtual object code portability within the class of + subword SIMD architectures. +
    + +

    Published:

    +
    + "Vector LLVA: A Virtual Vector Instruction Set for Media Processing", Robert L. Bocchino Jr. and Vikram S. Adve.
    + Proceedings of the Second International Conference on Virtual Execution Environments (VEE '06), Ottawa, Canada, 2006. +
    + +

    Download:

    + + + + Index: llvm-www/pubs/2006-06-15-VEE-VectorLLVA.html~ diff -c /dev/null llvm-www/pubs/2006-06-15-VEE-VectorLLVA.html~:1.1 *** /dev/null Fri Apr 28 12:16:53 2006 --- llvm-www/pubs/2006-06-15-VEE-VectorLLVA.html~ Fri Apr 28 12:16:21 2006 *************** *** 0 **** --- 1,50 ---- + + + + + + Vector LLVA: A Virtual Vector Instruction Set for Media Processing + + + +
    + Vector LLVA: A Virtual Vector Instruction Set for Media Processing +
    +
    + Robert L. Bocchino Jr. and Vikram S. Adve +
    + +

    Abstract:

    +
    + We present Vector LLVA, a virtual instruction set architecture (V-ISA) + that exposes extensive static information about vector parallelism + while avoiding the use of hardware-specific parameters. We provide + both arbitrary-length vectors (for targets that allow vectors of + arbitrary length, or where the target length is not known) and + fixed-length vectors (for targets that have a fixed vector length, + such as subword SIMD extensions), together with a rich set of + operations on both vector types. We have implemented translators that + compile (1) Vector LLVA written with arbitrary-length vectors to the + Motorola RSVP architecture and (2) Vector LLVA written with + fixed-length vectors to both AltiVec and Intel SSE2. Our + translator-generated code achieves speedups competitive with + handwritten native code versions of several benchmarks on all three + architectures. These experiments show that our V-ISA design captures + vector parallelism for two quite different classes of architectures + and provides virtual object code portability within the class of + subword SIMD architectures. +
    + +

    Published:

    +
    + "Vector LLVA: A Virtual Vector Instruction Set for Media Processing", Robert L. Bocchino Jr. and Vikram S. Adve.
    + Proceedings of the Second International Conference on Virtual Execution Environments (VEE '06), Ottawa, Canada, 2006. +
    + +

    Download:

    + + + + Index: llvm-www/pubs/2006-06-15-VEE-VectorLLVA.pdf Index: llvm-www/pubs/index.html diff -u llvm-www/pubs/index.html:1.37 llvm-www/pubs/index.html:1.38 --- llvm-www/pubs/index.html:1.37 Tue Apr 25 23:46:11 2006 +++ llvm-www/pubs/index.html Fri Apr 28 12:16:22 2006 @@ -38,6 +38,9 @@
      +
    1. "Vector LLVA: A Virtual Vector Instruction Set for Media Processing"
      Robert L. Bocchino Jr. and Vikram S. Adve.
      +Proc. of the Second International Conference on Virtual Execution Environments (VEE'06), Ottawa, Canada, 2006.
    2. +
    3. "Tailoring Graph-coloring Register Allocation For Runtime Compilation"
      Keith D. Cooper and Anshuman Dasgupta
      Proc. of the 2006 International Symposium on Code Generation and Optimization (CGO'06), New York, New York, 2006.
    4. "How Successful is Data Structure Analysis in Isolating and Analyzing From evan.cheng at apple.com Fri Apr 28 13:54:25 2006 From: evan.cheng at apple.com (Evan Cheng) Date: Fri, 28 Apr 2006 13:54:25 -0500 Subject: [llvm-commits] CVS: llvm/include/llvm/CodeGen/SelectionDAGISel.h Message-ID: <200604281854.NAA17434@zion.cs.uiuc.edu> Changes in directory llvm/include/llvm/CodeGen: SelectionDAGISel.h updated: 1.15 -> 1.16 --- Log message: Remove the temporary option: -no-isel-fold-inflight --- Diffs of the changes: (+0 -1) SelectionDAGISel.h | 1 - 1 files changed, 1 deletion(-) Index: llvm/include/llvm/CodeGen/SelectionDAGISel.h diff -u llvm/include/llvm/CodeGen/SelectionDAGISel.h:1.15 llvm/include/llvm/CodeGen/SelectionDAGISel.h:1.16 --- llvm/include/llvm/CodeGen/SelectionDAGISel.h:1.15 Thu Apr 27 21:09:19 2006 +++ llvm/include/llvm/CodeGen/SelectionDAGISel.h Fri Apr 28 13:54:11 2006 @@ -40,7 +40,6 @@ SSARegMap *RegMap; SelectionDAG *CurDAG; MachineBasicBlock *BB; - bool FoldNodeInFlight; SelectionDAGISel(TargetLowering &tli) : TLI(tli), JT(0,0,0,0) {} From evan.cheng at apple.com Fri Apr 28 13:54:26 2006 From: evan.cheng at apple.com (Evan Cheng) Date: Fri, 28 Apr 2006 13:54:26 -0500 Subject: [llvm-commits] CVS: llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp Message-ID: <200604281854.NAA17438@zion.cs.uiuc.edu> Changes in directory llvm/lib/CodeGen/SelectionDAG: SelectionDAGISel.cpp updated: 1.230 -> 1.231 --- Log message: Remove the temporary option: -no-isel-fold-inflight --- Diffs of the changes: (+0 -11) SelectionDAGISel.cpp | 11 ----------- 1 files changed, 11 deletions(-) Index: llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp diff -u llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:1.230 llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:1.231 --- llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:1.230 Fri Apr 28 00:25:15 2006 +++ llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp Fri Apr 28 13:54:11 2006 @@ -58,14 +58,6 @@ static const bool 
ViewISelDAGs = 0, ViewSchedDAGs = 0; #endif -namespace { -static cl::opt -NoFoldNodeInFlight( - "no-isel-fold-inflight", - cl::Hidden, - cl::desc("Do not attempt to fold a node even if it is being selected")); -} - // Scheduling heuristics enum SchedHeuristics { defaultScheduling, // Let the target specify its preference. @@ -3177,9 +3169,6 @@ if (ViewISelDAGs) DAG.viewGraph(); - // TEMPORARY. - FoldNodeInFlight = !NoFoldNodeInFlight; - // Third, instruction select all of the operations to machine code, adding the // code to the MachineBasicBlock. InstructionSelectBasicBlock(DAG); From evan.cheng at apple.com Fri Apr 28 13:54:26 2006 From: evan.cheng at apple.com (Evan Cheng) Date: Fri, 28 Apr 2006 13:54:26 -0500 Subject: [llvm-commits] CVS: llvm/utils/TableGen/DAGISelEmitter.cpp Message-ID: <200604281854.NAA17446@zion.cs.uiuc.edu> Changes in directory llvm/utils/TableGen: DAGISelEmitter.cpp updated: 1.198 -> 1.199 --- Log message: Remove the temporary option: -no-isel-fold-inflight --- Diffs of the changes: (+1 -2) DAGISelEmitter.cpp | 3 +-- 1 files changed, 1 insertion(+), 2 deletions(-) Index: llvm/utils/TableGen/DAGISelEmitter.cpp diff -u llvm/utils/TableGen/DAGISelEmitter.cpp:1.198 llvm/utils/TableGen/DAGISelEmitter.cpp:1.199 --- llvm/utils/TableGen/DAGISelEmitter.cpp:1.198 Thu Apr 27 21:08:10 2006 +++ llvm/utils/TableGen/DAGISelEmitter.cpp Fri Apr 28 13:54:11 2006 @@ -2131,8 +2131,7 @@ if (!isRoot) { const SDNodeInfo &CInfo = ISE.getSDNodeInfo(N->getOperator()); // Not in flight? - emitCheck("(FoldNodeInFlight || InFlightSet.count(" - + RootName + ".Val) == 0)"); + emitCheck("InFlightSet.count(" + RootName + ".Val) == 0"); // Multiple uses of actual result? 
emitCheck(RootName + ".hasOneUse()"); EmittedUseCheck = true; From evan.cheng at apple.com Fri Apr 28 13:54:26 2006 From: evan.cheng at apple.com (Evan Cheng) Date: Fri, 28 Apr 2006 13:54:26 -0500 Subject: [llvm-commits] CVS: llvm-test/Makefile.programs Message-ID: <200604281854.NAA17442@zion.cs.uiuc.edu> Changes in directory llvm-test: Makefile.programs updated: 1.204 -> 1.205 --- Log message: Remove the temporary option: -no-isel-fold-inflight --- Diffs of the changes: (+1 -1) Makefile.programs | 2 +- 1 files changed, 1 insertion(+), 1 deletion(-) Index: llvm-test/Makefile.programs diff -u llvm-test/Makefile.programs:1.204 llvm-test/Makefile.programs:1.205 --- llvm-test/Makefile.programs:1.204 Thu Apr 27 21:12:07 2006 +++ llvm-test/Makefile.programs Fri Apr 28 13:54:11 2006 @@ -197,7 +197,7 @@ LLCBETAOPTION := -sched=simple endif ifeq ($(ARCH),x86) -LLCBETAOPTION := -no-isel-fold-inflight +LLCBETAOPTION := -enable-x86-fastcc endif ifeq ($(ARCH),Sparc) LLCBETAOPTION := -enable-sparc-v9-insts From evan.cheng at apple.com Fri Apr 28 13:55:45 2006 From: evan.cheng at apple.com (Evan Cheng) Date: Fri, 28 Apr 2006 13:55:45 -0500 Subject: [llvm-commits] CVS: llvm/test/Regression/CodeGen/X86/vec_shuffle-4.ll Message-ID: <200604281855.NAA17489@zion.cs.uiuc.edu> Changes in directory llvm/test/Regression/CodeGen/X86: vec_shuffle-4.ll updated: 1.3 -> 1.4 --- Log message: Update. It should use two shufps, not three! 
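For context on why two shufps are now expected: the X86ISelLowering.cpp 1.196 change earlier in this digest builds one mask that gathers the wanted elements (those from the first vector in the low half, those from the second vector in the high half) and a second mask that reorders the gathered result. The following is only a rough standalone sketch of that mask computation on plain integers, not the SelectionDAG code itself. The names Vec4, Mask4, TwoMasks, shuffle and splitShuffle are hypothetical, and undef lanes are modelled as -1 in masks and 0 in values, which is an assumption made for illustration.

```cpp
#include <array>
#include <cassert>
#include <optional>
#include <utility>

using Vec4 = std::array<int, 4>;   // lane values
using Mask4 = std::array<int, 4>;  // 0..3 select from a, 4..7 from b, -1 is undef

// Reference semantics of a 4-wide, two-input shuffle (undef lanes become 0).
Vec4 shuffle(const Vec4 &a, const Vec4 &b, const Mask4 &m) {
  Vec4 r{};
  for (int i = 0; i < 4; ++i) {
    if (m[i] < 0)
      r[i] = 0;
    else
      r[i] = m[i] < 4 ? a[m[i]] : b[m[i] - 4];
  }
  return r;
}

struct TwoMasks {
  Mask4 gather;  // first shuffle: (a, b) -> t
  Mask4 order;   // second shuffle: (t, t) -> result
};

// Returns the two masks, or nullopt when more than two elements come from
// the same input (the case the diff leaves to its hi/lo fallback).
std::optional<TwoMasks> splitShuffle(const Mask4 &perm) {
  Mask4 gather{-1, -1, -1, -1}, order{-1, -1, -1, -1};
  std::array<std::pair<int, int>, 4> locs;  // (which half of t, slot in half)
  int numLo = 0, numHi = 0;
  for (int i = 0; i < 4; ++i) {
    if (perm[i] < 0) {
      locs[i] = {-1, -1};
    } else if (perm[i] < 4) {
      locs[i] = {0, numLo};
      if (numLo < 2) gather[numLo] = perm[i];      // low half of t
      ++numLo;
    } else {
      locs[i] = {1, numHi};
      if (numHi < 2) gather[2 + numHi] = perm[i];  // high half of t
      ++numHi;
    }
  }
  if (numLo > 2 || numHi > 2) return std::nullopt;
  for (int i = 0; i < 4; ++i) {
    if (locs[i].first == -1) continue;
    // Result lanes 0-1 read the first operand, lanes 2-3 the second; both
    // operands are t, so the +4 only marks which operand the lane reads.
    int idx = (i < 2) ? 0 : 4;
    idx += locs[i].first * 2 + locs[i].second;
    order[i] = idx;
  }
  return TwoMasks{gather, order};
}
```

With the mask from the log message, shuffle(G1, G2, 7, 1, 5, 2), this sketch yields the gather mask (1, 2, 7, 5) and the order mask (2, 0, 7, 5); applying shuffle(a, b, gather) and then shuffle(t, t, order) reproduces the direct shuffle.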
--- Diffs of the changes: (+1 -1) vec_shuffle-4.ll | 2 +- 1 files changed, 1 insertion(+), 1 deletion(-) Index: llvm/test/Regression/CodeGen/X86/vec_shuffle-4.ll diff -u llvm/test/Regression/CodeGen/X86/vec_shuffle-4.ll:1.3 llvm/test/Regression/CodeGen/X86/vec_shuffle-4.ll:1.4 --- llvm/test/Regression/CodeGen/X86/vec_shuffle-4.ll:1.3 Thu Apr 20 23:58:23 2006 +++ llvm/test/Regression/CodeGen/X86/vec_shuffle-4.ll Fri Apr 28 13:55:34 2006 @@ -1,5 +1,5 @@ ; RUN: llvm-as < %s | llc -march=x86 -mattr=+sse2 && -; RUN: llvm-as < %s | llc -march=x86 -mattr=+sse2 | grep shuf | wc -l | grep 3 && +; RUN: llvm-as < %s | llc -march=x86 -mattr=+sse2 | grep shuf | wc -l | grep 2 && ; RUN: llvm-as < %s | llc -march=x86 -mattr=+sse2 | not grep unpck void %test(<4 x float>* %res, <4 x float>* %A, <4 x float>* %B, <4 x float>* %C) { %tmp3 = load <4 x float>* %B From evan.cheng at apple.com Fri Apr 28 16:19:18 2006 From: evan.cheng at apple.com (Evan Cheng) Date: Fri, 28 Apr 2006 16:19:18 -0500 Subject: [llvm-commits] CVS: llvm/lib/Target/X86/X86ATTAsmPrinter.cpp X86ATTAsmPrinter.h X86IntelAsmPrinter.cpp X86IntelAsmPrinter.h Message-ID: <200604282119.QAA23406@zion.cs.uiuc.edu> Changes in directory llvm/lib/Target/X86: X86ATTAsmPrinter.cpp updated: 1.34 -> 1.35 X86ATTAsmPrinter.h updated: 1.10 -> 1.11 X86IntelAsmPrinter.cpp updated: 1.27 -> 1.28 X86IntelAsmPrinter.h updated: 1.11 -> 1.12 --- Log message: Bare-bone X86 inline asm printer support. 
--- Diffs of the changes: (+66 -2) X86ATTAsmPrinter.cpp | 28 ++++++++++++++++++++++++++++ X86ATTAsmPrinter.h | 6 +++++- X86IntelAsmPrinter.cpp | 28 ++++++++++++++++++++++++++++ X86IntelAsmPrinter.h | 6 +++++- 4 files changed, 66 insertions(+), 2 deletions(-) Index: llvm/lib/Target/X86/X86ATTAsmPrinter.cpp diff -u llvm/lib/Target/X86/X86ATTAsmPrinter.cpp:1.34 llvm/lib/Target/X86/X86ATTAsmPrinter.cpp:1.35 --- llvm/lib/Target/X86/X86ATTAsmPrinter.cpp:1.34 Sat Apr 22 13:53:45 2006 +++ llvm/lib/Target/X86/X86ATTAsmPrinter.cpp Fri Apr 28 16:19:05 2006 @@ -264,6 +264,34 @@ O << "\"L" << getFunctionNumber() << "$pb\":"; } +/// PrintAsmOperand - Print out an operand for an inline asm expression. +/// +bool X86ATTAsmPrinter::PrintAsmOperand(const MachineInstr *MI, unsigned OpNo, + unsigned AsmVariant, + const char *ExtraCode) { + // Does this asm operand have a single letter operand modifier? + if (ExtraCode && ExtraCode[0]) { + if (ExtraCode[1] != 0) return true; // Unknown modifier. + + switch (ExtraCode[0]) { + default: return true; // Unknown modifier. + } + } + + printOperand(MI, OpNo); + return false; +} + +bool X86ATTAsmPrinter::PrintAsmMemoryOperand(const MachineInstr *MI, + unsigned OpNo, + unsigned AsmVariant, + const char *ExtraCode) { + if (ExtraCode && ExtraCode[0]) + return true; // Unknown modifier. + printMemReference(MI, OpNo); + return false; +} + /// printMachineInstruction -- Print out a single X86 LLVM instruction /// MI in Intel syntax to the current output stream. 
/// Index: llvm/lib/Target/X86/X86ATTAsmPrinter.h diff -u llvm/lib/Target/X86/X86ATTAsmPrinter.h:1.10 llvm/lib/Target/X86/X86ATTAsmPrinter.h:1.11 --- llvm/lib/Target/X86/X86ATTAsmPrinter.h:1.10 Mon Mar 13 17:20:37 2006 +++ llvm/lib/Target/X86/X86ATTAsmPrinter.h Fri Apr 28 16:19:05 2006 @@ -61,7 +61,11 @@ printMemReference(MI, OpNo); } - void printMachineInstruction(const MachineInstr *MI); + bool PrintAsmOperand(const MachineInstr *MI, unsigned OpNo, + unsigned AsmVariant, const char *ExtraCode); + bool PrintAsmMemoryOperand(const MachineInstr *MI, unsigned OpNo, + unsigned AsmVariant, const char *ExtraCode); + void printMachineInstruction(const MachineInstr *MI); void printSSECC(const MachineInstr *MI, unsigned Op); void printMemReference(const MachineInstr *MI, unsigned Op); void printPICLabel(const MachineInstr *MI, unsigned Op); Index: llvm/lib/Target/X86/X86IntelAsmPrinter.cpp diff -u llvm/lib/Target/X86/X86IntelAsmPrinter.cpp:1.27 llvm/lib/Target/X86/X86IntelAsmPrinter.cpp:1.28 --- llvm/lib/Target/X86/X86IntelAsmPrinter.cpp:1.27 Sat Apr 22 13:53:45 2006 +++ llvm/lib/Target/X86/X86IntelAsmPrinter.cpp Fri Apr 28 16:19:05 2006 @@ -242,6 +242,34 @@ O << "\"L" << getFunctionNumber() << "$pb\":"; } +/// PrintAsmOperand - Print out an operand for an inline asm expression. +/// +bool X86IntelAsmPrinter::PrintAsmOperand(const MachineInstr *MI, unsigned OpNo, + unsigned AsmVariant, + const char *ExtraCode) { + // Does this asm operand have a single letter operand modifier? + if (ExtraCode && ExtraCode[0]) { + if (ExtraCode[1] != 0) return true; // Unknown modifier. + + switch (ExtraCode[0]) { + default: return true; // Unknown modifier. + } + } + + printOperand(MI, OpNo); + return false; +} + +bool X86IntelAsmPrinter::PrintAsmMemoryOperand(const MachineInstr *MI, + unsigned OpNo, + unsigned AsmVariant, + const char *ExtraCode) { + if (ExtraCode && ExtraCode[0]) + return true; // Unknown modifier. 
+ printMemReference(MI, OpNo); + return false; +} + /// printMachineInstruction -- Print out a single X86 LLVM instruction /// MI in Intel syntax to the current output stream. /// Index: llvm/lib/Target/X86/X86IntelAsmPrinter.h diff -u llvm/lib/Target/X86/X86IntelAsmPrinter.h:1.11 llvm/lib/Target/X86/X86IntelAsmPrinter.h:1.12 --- llvm/lib/Target/X86/X86IntelAsmPrinter.h:1.11 Mon Mar 13 17:20:37 2006 +++ llvm/lib/Target/X86/X86IntelAsmPrinter.h Fri Apr 28 16:19:05 2006 @@ -80,7 +80,11 @@ printMemReference(MI, OpNo); } - void printMachineInstruction(const MachineInstr *MI); + bool PrintAsmOperand(const MachineInstr *MI, unsigned OpNo, + unsigned AsmVariant, const char *ExtraCode); + bool PrintAsmMemoryOperand(const MachineInstr *MI, unsigned OpNo, + unsigned AsmVariant, const char *ExtraCode); + void printMachineInstruction(const MachineInstr *MI); void printOp(const MachineOperand &MO, const char *Modifier = 0); void printSSECC(const MachineInstr *MI, unsigned Op); void printMemReference(const MachineInstr *MI, unsigned Op); From evan.cheng at apple.com Fri Apr 28 16:29:50 2006 From: evan.cheng at apple.com (Evan Cheng) Date: Fri, 28 Apr 2006 16:29:50 -0500 Subject: [llvm-commits] CVS: llvm/lib/Target/X86/X86ISelLowering.cpp Message-ID: <200604282129.QAA24261@zion.cs.uiuc.edu> Changes in directory llvm/lib/Target/X86: X86ISelLowering.cpp updated: 1.196 -> 1.197 --- Log message: Initial caller side support (for CCC only, not FastCC) of 128-bit vector passing by value. 
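As a reading aid for the diff that follows, here is a minimal sketch of the argument placement it appears to implement on the caller side: 32-bit values take 4 bytes of stack, 64-bit values take 8, and 128-bit vectors use XMM0 through XMM2 before falling back to 16 bytes of stack each. The types ArgKind, ArgLoc and CallLayout and the function layoutArgs are hypothetical names made up for this illustration. Like the diff, the sketch does not align the 16-byte stack slot.

```cpp
#include <cassert>
#include <vector>

enum class ArgKind { I32, F32, I64, F64, Vec128 };

struct ArgLoc {
  bool inXMM;      // true: passed in an XMM register
  unsigned index;  // XMM register number, or byte offset on the stack
};

struct CallLayout {
  std::vector<ArgLoc> locs;
  unsigned stackBytes = 0;
};

// Assign each argument either an XMM register or a stack offset.
CallLayout layoutArgs(const std::vector<ArgKind> &args) {
  const unsigned kMaxXMMArgs = 3;  // XMM0, XMM1, XMM2
  CallLayout out;
  unsigned numXMM = 0;
  for (ArgKind k : args) {
    switch (k) {
    case ArgKind::I32:
    case ArgKind::F32:
      out.locs.push_back({false, out.stackBytes});
      out.stackBytes += 4;
      break;
    case ArgKind::I64:
    case ArgKind::F64:
      out.locs.push_back({false, out.stackBytes});
      out.stackBytes += 8;
      break;
    case ArgKind::Vec128:
      if (numXMM < kMaxXMMArgs) {
        out.locs.push_back({true, numXMM++});
      } else {
        out.locs.push_back({false, out.stackBytes});
        out.stackBytes += 16;
      }
      break;
    }
  }
  return out;
}
```

For the argument list (i32, vector, f64, vector, vector, vector) the sketch places the first three vectors in XMM0 to XMM2, the fourth at stack offset 12, and reports 28 bytes of stack in total.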
--- Diffs of the changes: (+73 -7) X86ISelLowering.cpp | 80 +++++++++++++++++++++++++++++++++++++++++++++++----- 1 files changed, 73 insertions(+), 7 deletions(-) Index: llvm/lib/Target/X86/X86ISelLowering.cpp diff -u llvm/lib/Target/X86/X86ISelLowering.cpp:1.196 llvm/lib/Target/X86/X86ISelLowering.cpp:1.197 --- llvm/lib/Target/X86/X86ISelLowering.cpp:1.196 Fri Apr 28 02:03:38 2006 +++ llvm/lib/Target/X86/X86ISelLowering.cpp Fri Apr 28 16:29:37 2006 @@ -18,6 +18,7 @@ #include "X86TargetMachine.h" #include "llvm/CallingConv.h" #include "llvm/Constants.h" +#include "llvm/DerivedTypes.h" #include "llvm/Function.h" #include "llvm/Intrinsics.h" #include "llvm/ADT/VectorExtras.h" @@ -555,6 +556,11 @@ // Count how many bytes are to be pushed on the stack. unsigned NumBytes = 0; + // Keep track of the number of XMM regs passed so far. + unsigned NumXMMRegs = 0; + unsigned XMMArgRegs[] = { X86::XMM0, X86::XMM1, X86::XMM2 }; + + std::vector RegValuesToPass; if (Args.empty()) { // Save zero bytes. Chain = DAG.getCALLSEQ_START(Chain, DAG.getConstant(0, getPointerTy())); @@ -573,6 +579,12 @@ case MVT::f64: NumBytes += 8; break; + case MVT::Vector: + if (NumXMMRegs < 3) + ++NumXMMRegs; + else + NumBytes += 16; + break; } Chain = DAG.getCALLSEQ_START(Chain, @@ -580,13 +592,10 @@ // Arguments go on the stack in reverse order, as specified by the ABI. 
unsigned ArgOffset = 0; + NumXMMRegs = 0; SDOperand StackPtr = DAG.getRegister(X86::ESP, MVT::i32); std::vector Stores; - for (unsigned i = 0, e = Args.size(); i != e; ++i) { - SDOperand PtrOff = DAG.getConstant(ArgOffset, getPointerTy()); - PtrOff = DAG.getNode(ISD::ADD, MVT::i32, StackPtr, PtrOff); - switch (getValueType(Args[i].second)) { default: assert(0 && "Unexpected ValueType for argument!"); case MVT::i1: @@ -601,21 +610,40 @@ // FALL THROUGH case MVT::i32: - case MVT::f32: + case MVT::f32: { + SDOperand PtrOff = DAG.getConstant(ArgOffset, getPointerTy()); + PtrOff = DAG.getNode(ISD::ADD, MVT::i32, StackPtr, PtrOff); Stores.push_back(DAG.getNode(ISD::STORE, MVT::Other, Chain, Args[i].first, PtrOff, DAG.getSrcValue(NULL))); ArgOffset += 4; break; + } case MVT::i64: - case MVT::f64: + case MVT::f64: { + SDOperand PtrOff = DAG.getConstant(ArgOffset, getPointerTy()); + PtrOff = DAG.getNode(ISD::ADD, MVT::i32, StackPtr, PtrOff); Stores.push_back(DAG.getNode(ISD::STORE, MVT::Other, Chain, Args[i].first, PtrOff, DAG.getSrcValue(NULL))); ArgOffset += 8; break; } + case MVT::Vector: + if (NumXMMRegs < 3) { + RegValuesToPass.push_back(Args[i].first); + NumXMMRegs++; + } else { + SDOperand PtrOff = DAG.getConstant(ArgOffset, getPointerTy()); + PtrOff = DAG.getNode(ISD::ADD, MVT::i32, StackPtr, PtrOff); + Stores.push_back(DAG.getNode(ISD::STORE, MVT::Other, Chain, + Args[i].first, PtrOff, + DAG.getSrcValue(NULL))); + ArgOffset += 16; + } + } } + if (!Stores.empty()) Chain = DAG.getNode(ISD::TokenFactor, MVT::Other, Stores); } @@ -646,16 +674,34 @@ break; } + // Build a sequence of copy-to-reg nodes chained together with token chain + // and flag operands which copy the outgoing args into registers. 
+ SDOperand InFlag; + for (unsigned i = 0, e = RegValuesToPass.size(); i != e; ++i) { + unsigned CCReg = XMMArgRegs[i]; + SDOperand RegToPass = RegValuesToPass[i]; + assert(RegToPass.getValueType() == MVT::Vector); + unsigned NumElems = cast<ConstantSDNode>(*(RegToPass.Val->op_end()-2))->getValue(); + MVT::ValueType EVT = cast<VTSDNode>(*(RegToPass.Val->op_end()-1))->getVT(); + MVT::ValueType PVT = getVectorType(EVT, NumElems); + SDOperand CCRegNode = DAG.getRegister(CCReg, PVT); + RegToPass = DAG.getNode(ISD::VBIT_CONVERT, PVT, RegToPass); + Chain = DAG.getCopyToReg(Chain, CCRegNode, RegToPass, InFlag); + InFlag = Chain.getValue(1); + } + std::vector<MVT::ValueType> NodeTys; NodeTys.push_back(MVT::Other); // Returns a chain NodeTys.push_back(MVT::Flag); // Returns a flag for retval copy to use. std::vector<SDOperand> Ops; Ops.push_back(Chain); Ops.push_back(Callee); + if (InFlag.Val) + Ops.push_back(InFlag); // FIXME: Do not generate X86ISD::TAILCALL for now. Chain = DAG.getNode(X86ISD::CALL, NodeTys, Ops); - SDOperand InFlag = Chain.getValue(1); + InFlag = Chain.getValue(1); NodeTys.clear(); NodeTys.push_back(MVT::Other); // Returns a chain @@ -734,6 +780,16 @@ RetVal = DAG.getNode(ISD::FP_ROUND, MVT::f32, RetVal); break; } + case MVT::Vector: { + const PackedType *PTy = cast<PackedType>(RetTy); + MVT::ValueType EVT; + MVT::ValueType LVT; + unsigned NumRegs = getPackedTypeBreakdown(PTy, EVT, LVT); + assert(NumRegs == 1 && "Unsupported type!"); + RetVal = DAG.getCopyFromReg(Chain, X86::XMM0, EVT, InFlag); + Chain = RetVal.getValue(1); + break; + } } } @@ -978,8 +1034,18 @@ case MVT::f64: MF.addLiveOut(X86::ST0); break; + case MVT::Vector: { + const PackedType *PTy = cast<PackedType>(F.getReturnType()); + MVT::ValueType EVT; + MVT::ValueType LVT; + unsigned NumRegs = getPackedTypeBreakdown(PTy, EVT, LVT); + assert(NumRegs == 1 && "Unsupported type!"); + MF.addLiveOut(X86::XMM0); + break; + } } } + void X86TargetLowering::LowerFastCCArguments(SDOperand Op, SelectionDAG &DAG) { unsigned NumArgs = Op.Val->getNumValues(); From lattner at
cs.uiuc.edu Fri Apr 28 16:52:37 2006 From: lattner at cs.uiuc.edu (Chris Lattner) Date: Fri, 28 Apr 2006 16:52:37 -0500 Subject: [llvm-commits] CVS: llvm/test/Regression/CodeGen/Generic/2006-04-28-Sign-extend-bool.ll Message-ID: <200604282152.QAA28436@zion.cs.uiuc.edu> Changes in directory llvm/test/Regression/CodeGen/Generic: 2006-04-28-Sign-extend-bool.ll added (r1.1) --- Log message: testcase that crashes the ppc backend, which can't sextinreg(i1) --- Diffs of the changes: (+10 -0) 2006-04-28-Sign-extend-bool.ll | 10 ++++++++++ 1 files changed, 10 insertions(+) Index: llvm/test/Regression/CodeGen/Generic/2006-04-28-Sign-extend-bool.ll diff -c /dev/null llvm/test/Regression/CodeGen/Generic/2006-04-28-Sign-extend-bool.ll:1.1 *** /dev/null Fri Apr 28 16:52:34 2006 --- llvm/test/Regression/CodeGen/Generic/2006-04-28-Sign-extend-bool.ll Fri Apr 28 16:52:24 2006 *************** *** 0 **** --- 1,10 ---- + ; RUN: llvm-as < %s | llc + + int %test(int %tmp93) { + %tmp98 = shl int %tmp93, ubyte 31 ; [#uses=1] + %tmp99 = shr int %tmp98, ubyte 31 ; [#uses=1] + %tmp99 = cast int %tmp99 to sbyte ; [#uses=1] + %tmp99100 = cast sbyte %tmp99 to int ; [#uses=1] + ret int %tmp99100 + } + From lattner at cs.uiuc.edu Fri Apr 28 16:56:24 2006 From: lattner at cs.uiuc.edu (Chris Lattner) Date: Fri, 28 Apr 2006 16:56:24 -0500 Subject: [llvm-commits] CVS: llvm/lib/Target/PowerPC/PPCISelLowering.cpp Message-ID: <200604282156.QAA28876@zion.cs.uiuc.edu> Changes in directory llvm/lib/Target/PowerPC: PPCISelLowering.cpp updated: 1.168 -> 1.169 --- Log message: Fix CodeGen/Generic/2006-04-28-Sign-extend-bool.ll --- Diffs of the changes: (+4 -0) PPCISelLowering.cpp | 4 ++++ 1 files changed, 4 insertions(+) Index: llvm/lib/Target/PowerPC/PPCISelLowering.cpp diff -u llvm/lib/Target/PowerPC/PPCISelLowering.cpp:1.168 llvm/lib/Target/PowerPC/PPCISelLowering.cpp:1.169 --- llvm/lib/Target/PowerPC/PPCISelLowering.cpp:1.168 Sat Apr 22 13:53:45 2006 +++ llvm/lib/Target/PowerPC/PPCISelLowering.cpp Fri Apr 
28 16:56:10 2006 @@ -112,6 +112,10 @@ // PowerPC does not have truncstore for i1. setOperationAction(ISD::TRUNCSTORE, MVT::i1, Promote); + // We cannot sextinreg(i1). Expand to shifts. + setOperationAction(ISD::SIGN_EXTEND_INREG, MVT::i1, Expand); + + // Support label based line numbers. setOperationAction(ISD::LOCATION, MVT::Other, Expand); setOperationAction(ISD::DEBUG_LOC, MVT::Other, Expand); From lattner at cs.uiuc.edu Fri Apr 28 17:17:33 2006 From: lattner at cs.uiuc.edu (Chris Lattner) Date: Fri, 28 Apr 2006 17:17:33 -0500 Subject: [llvm-commits] CVS: llvm/test/Regression/Transforms/InstCombine/2006-04-28-ShiftShiftLongLong.ll Message-ID: <200604282217.RAA30873@zion.cs.uiuc.edu> Changes in directory llvm/test/Regression/Transforms/InstCombine: 2006-04-28-ShiftShiftLongLong.ll added (r1.1) --- Log message: new testcase miscompiled by instcombine --- Diffs of the changes: (+10 -0) 2006-04-28-ShiftShiftLongLong.ll | 10 ++++++++++ 1 files changed, 10 insertions(+) Index: llvm/test/Regression/Transforms/InstCombine/2006-04-28-ShiftShiftLongLong.ll diff -c /dev/null llvm/test/Regression/Transforms/InstCombine/2006-04-28-ShiftShiftLongLong.ll:1.1 *** /dev/null Fri Apr 28 17:17:30 2006 --- llvm/test/Regression/Transforms/InstCombine/2006-04-28-ShiftShiftLongLong.ll Fri Apr 28 17:17:20 2006 *************** *** 0 **** --- 1,10 ---- + ; RUN: llvm-as < %s | opt -instcombine | llvm-dis | grep shl && + ; RUN: llvm-as < %s | opt -instcombine | llvm-dis | not grep cast + + ; This cannot be turned into a sign extending cast! 
+ + long %test(long %X) { + %Y = shl long %X, ubyte 16 + %Z = shr long %Y, ubyte 16 + ret long %Z + } From lattner at cs.uiuc.edu Fri Apr 28 17:21:54 2006 From: lattner at cs.uiuc.edu (Chris Lattner) Date: Fri, 28 Apr 2006 17:21:54 -0500 Subject: [llvm-commits] CVS: llvm/lib/Transforms/Scalar/InstructionCombining.cpp Message-ID: <200604282221.RAA32380@zion.cs.uiuc.edu> Changes in directory llvm/lib/Transforms/Scalar: InstructionCombining.cpp updated: 1.472 -> 1.473 --- Log message: Fix InstCombine/2006-04-28-ShiftShiftLongLong.ll --- Diffs of the changes: (+1 -1) InstructionCombining.cpp | 2 +- 1 files changed, 1 insertion(+), 1 deletion(-) Index: llvm/lib/Transforms/Scalar/InstructionCombining.cpp diff -u llvm/lib/Transforms/Scalar/InstructionCombining.cpp:1.472 llvm/lib/Transforms/Scalar/InstructionCombining.cpp:1.473 --- llvm/lib/Transforms/Scalar/InstructionCombining.cpp:1.472 Thu Apr 27 16:14:21 2006 +++ llvm/lib/Transforms/Scalar/InstructionCombining.cpp Fri Apr 28 17:21:41 2006 @@ -4435,7 +4435,7 @@ // this case, C1 == C2 and C1 is 8, 16, or 32. 
if (ShiftAmt1 == ShiftAmt2) { const Type *SExtType = 0; - switch (ShiftAmt1) { + switch (Op0->getType()->getPrimitiveSizeInBits() - ShiftAmt1) { case 8 : SExtType = Type::SByteTy; break; case 16: SExtType = Type::ShortTy; break; case 32: SExtType = Type::IntTy; break; From lattner at cs.uiuc.edu Fri Apr 28 17:50:59 2006 From: lattner at cs.uiuc.edu (Chris Lattner) Date: Fri, 28 Apr 2006 17:50:59 -0500 Subject: [llvm-commits] CVS: llvm-www/pubs/2006-06-15-VEE-VectorLLVA.html~ Message-ID: <200604282250.RAA03503@zion.cs.uiuc.edu> Changes in directory llvm-www/pubs: 2006-06-15-VEE-VectorLLVA.html~ (r1.1) removed --- Log message: Remove this --- Diffs of the changes: (+0 -0) 0 files changed From evan.cheng at apple.com Fri Apr 28 18:11:53 2006 From: evan.cheng at apple.com (Evan Cheng) Date: Fri, 28 Apr 2006 18:11:53 -0500 Subject: [llvm-commits] CVS: llvm/lib/Target/X86/X86ATTAsmPrinter.cpp X86ATTAsmPrinter.h X86IntelAsmPrinter.cpp X86IntelAsmPrinter.h Message-ID: <200604282311.SAA04004@zion.cs.uiuc.edu> Changes in directory llvm/lib/Target/X86: X86ATTAsmPrinter.cpp updated: 1.35 -> 1.36 X86ATTAsmPrinter.h updated: 1.11 -> 1.12 X86IntelAsmPrinter.cpp updated: 1.28 -> 1.29 X86IntelAsmPrinter.h updated: 1.12 -> 1.13 --- Log message: Implemented x86 inline asm b, h, w, k modifiers. 
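To illustrate what the b, h, w and k modifiers select, here is a small table-driven sketch that maps a 32-bit register name to the sub-register named by the modifier, printed AT&T style with a leading '%'. RegFamily, kFamilies and subRegName are hypothetical names and this is not the printer code from the diff below. In particular, the sketch returns the high-byte names (ah, bh, ch, dh) for 'h', which is an assumption based on the modifier being described as the "QImode high register"; the diff itself assigns the same names in its 'h' case as in its 'b' case. As in the diff, an unknown modifier or an unsupported register is reported as a failure.

```cpp
#include <cassert>
#include <optional>
#include <string>

struct RegFamily {
  const char *r32, *r16, *low8, *high8;  // high8 is nullptr where none exists
};

static const RegFamily kFamilies[] = {
    {"eax", "ax", "al", "ah"},     {"ebx", "bx", "bl", "bh"},
    {"ecx", "cx", "cl", "ch"},     {"edx", "dx", "dl", "dh"},
    {"esi", "si", "sil", nullptr}, {"edi", "di", "dil", nullptr},
    {"ebp", "bp", "bpl", nullptr}, {"esp", "sp", "spl", nullptr},
};

// Returns the AT&T-style operand text, or nullopt for an unknown modifier
// or a register that has no sub-register of the requested kind.
std::optional<std::string> subRegName(const std::string &reg32, char mode) {
  for (const RegFamily &f : kFamilies) {
    if (reg32 != f.r32) continue;
    const char *name = nullptr;
    switch (mode) {
    case 'b': name = f.low8; break;   // QImode register
    case 'h': name = f.high8; break;  // QImode high register
    case 'w': name = f.r16; break;    // HImode register
    case 'k': name = f.r32; break;    // SImode register
    default: break;                   // unknown modifier
    }
    if (!name) return std::nullopt;
    return std::string("%") + name;
  }
  return std::nullopt;
}
```

For example, subRegName("ecx", 'w') gives "%cx", while subRegName("esi", 'h') fails because ESI has no high-byte sub-register.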
--- Diffs of the changes: (+249 -1) X86ATTAsmPrinter.cpp | 123 +++++++++++++++++++++++++++++++++++++++++++++++++ X86ATTAsmPrinter.h | 4 + X86IntelAsmPrinter.cpp | 122 ++++++++++++++++++++++++++++++++++++++++++++++++ X86IntelAsmPrinter.h | 1 4 files changed, 249 insertions(+), 1 deletion(-) Index: llvm/lib/Target/X86/X86ATTAsmPrinter.cpp diff -u llvm/lib/Target/X86/X86ATTAsmPrinter.cpp:1.35 llvm/lib/Target/X86/X86ATTAsmPrinter.cpp:1.36 --- llvm/lib/Target/X86/X86ATTAsmPrinter.cpp:1.35 Fri Apr 28 16:19:05 2006 +++ llvm/lib/Target/X86/X86ATTAsmPrinter.cpp Fri Apr 28 18:11:40 2006 @@ -264,6 +264,124 @@ O << "\"L" << getFunctionNumber() << "$pb\":"; } + +bool X86ATTAsmPrinter::printAsmMRegsiter(const MachineOperand &MO, + const char Mode) { + const MRegisterInfo &RI = *TM.getRegisterInfo(); + unsigned Reg = MO.getReg(); + const char *Name = RI.get(Reg).Name; + switch (Mode) { + default: return true; // Unknown mode. + case 'b': // Print QImode register + switch (Reg) { + default: return true; + case X86::AH: case X86::AL: case X86::AX: case X86::EAX: + Name = "al"; + break; + case X86::DH: case X86::DL: case X86::DX: case X86::EDX: + Name = "dl"; + break; + case X86::CH: case X86::CL: case X86::CX: case X86::ECX: + Name = "cl"; + break; + case X86::BH: case X86::BL: case X86::BX: case X86::EBX: + Name = "bl"; + break; + case X86::ESI: + Name = "sil"; + break; + case X86::EDI: + Name = "dil"; + break; + case X86::EBP: + Name = "bpl"; + break; + case X86::ESP: + Name = "spl"; + break; + } + break; + case 'h': // Print QImode high register + switch (Reg) { + default: return true; + case X86::AH: case X86::AL: case X86::AX: case X86::EAX: + Name = "al"; + break; + case X86::DH: case X86::DL: case X86::DX: case X86::EDX: + Name = "dl"; + break; + case X86::CH: case X86::CL: case X86::CX: case X86::ECX: + Name = "cl"; + break; + case X86::BH: case X86::BL: case X86::BX: case X86::EBX: + Name = "bl"; + break; + } + break; + case 'w': // Print HImode register + switch (Reg) { + 
default: return true; + case X86::AH: case X86::AL: case X86::AX: case X86::EAX: + Name = "ax"; + break; + case X86::DH: case X86::DL: case X86::DX: case X86::EDX: + Name = "dx"; + break; + case X86::CH: case X86::CL: case X86::CX: case X86::ECX: + Name = "cx"; + break; + case X86::BH: case X86::BL: case X86::BX: case X86::EBX: + Name = "bx"; + break; + case X86::ESI: + Name = "si"; + break; + case X86::EDI: + Name = "di"; + break; + case X86::EBP: + Name = "bp"; + break; + case X86::ESP: + Name = "sp"; + break; + } + break; + case 'k': // Print SImode register + switch (Reg) { + default: return true; + case X86::AH: case X86::AL: case X86::AX: case X86::EAX: + Name = "eax"; + break; + case X86::DH: case X86::DL: case X86::DX: case X86::EDX: + Name = "edx"; + break; + case X86::CH: case X86::CL: case X86::CX: case X86::ECX: + Name = "ecx"; + break; + case X86::BH: case X86::BL: case X86::BX: case X86::EBX: + Name = "ebx"; + break; + case X86::ESI: + Name = "esi"; + break; + case X86::EDI: + Name = "edi"; + break; + case X86::EBP: + Name = "ebp"; + break; + case X86::ESP: + Name = "esp"; + break; + } + break; + } + + O << '%' << Name; + return false; +} + /// PrintAsmOperand - Print out an operand for an inline asm expression. /// bool X86ATTAsmPrinter::PrintAsmOperand(const MachineInstr *MI, unsigned OpNo, @@ -275,6 +393,11 @@ switch (ExtraCode[0]) { default: return true; // Unknown modifier. 
+ case 'b': // Print QImode register + case 'h': // Print QImode high register + case 'w': // Print HImode register + case 'k': // Print SImode register + return printAsmMRegsiter(MI->getOperand(OpNo), ExtraCode[0]); } } Index: llvm/lib/Target/X86/X86ATTAsmPrinter.h diff -u llvm/lib/Target/X86/X86ATTAsmPrinter.h:1.11 llvm/lib/Target/X86/X86ATTAsmPrinter.h:1.12 --- llvm/lib/Target/X86/X86ATTAsmPrinter.h:1.11 Fri Apr 28 16:19:05 2006 +++ llvm/lib/Target/X86/X86ATTAsmPrinter.h Fri Apr 28 18:11:40 2006 @@ -61,11 +61,13 @@ printMemReference(MI, OpNo); } + bool printAsmMRegsiter(const MachineOperand &MO, const char Mode); bool PrintAsmOperand(const MachineInstr *MI, unsigned OpNo, unsigned AsmVariant, const char *ExtraCode); bool PrintAsmMemoryOperand(const MachineInstr *MI, unsigned OpNo, unsigned AsmVariant, const char *ExtraCode); - void printMachineInstruction(const MachineInstr *MI); + + void printMachineInstruction(const MachineInstr *MI); void printSSECC(const MachineInstr *MI, unsigned Op); void printMemReference(const MachineInstr *MI, unsigned Op); void printPICLabel(const MachineInstr *MI, unsigned Op); Index: llvm/lib/Target/X86/X86IntelAsmPrinter.cpp diff -u llvm/lib/Target/X86/X86IntelAsmPrinter.cpp:1.28 llvm/lib/Target/X86/X86IntelAsmPrinter.cpp:1.29 --- llvm/lib/Target/X86/X86IntelAsmPrinter.cpp:1.28 Fri Apr 28 16:19:05 2006 +++ llvm/lib/Target/X86/X86IntelAsmPrinter.cpp Fri Apr 28 18:11:40 2006 @@ -242,6 +242,123 @@ O << "\"L" << getFunctionNumber() << "$pb\":"; } +bool X86IntelAsmPrinter::printAsmMRegsiter(const MachineOperand &MO, + const char Mode) { + const MRegisterInfo &RI = *TM.getRegisterInfo(); + unsigned Reg = MO.getReg(); + const char *Name = RI.get(Reg).Name; + switch (Mode) { + default: return true; // Unknown mode. 
+ case 'b': // Print QImode register + switch (Reg) { + default: return true; + case X86::AH: case X86::AL: case X86::AX: case X86::EAX: + Name = "AL"; + break; + case X86::DH: case X86::DL: case X86::DX: case X86::EDX: + Name = "DL"; + break; + case X86::CH: case X86::CL: case X86::CX: case X86::ECX: + Name = "CL"; + break; + case X86::BH: case X86::BL: case X86::BX: case X86::EBX: + Name = "BL"; + break; + case X86::ESI: + Name = "SIL"; + break; + case X86::EDI: + Name = "DIL"; + break; + case X86::EBP: + Name = "BPL"; + break; + case X86::ESP: + Name = "SPL"; + break; + } + break; + case 'h': // Print QImode high register + switch (Reg) { + default: return true; + case X86::AH: case X86::AL: case X86::AX: case X86::EAX: + Name = "AL"; + break; + case X86::DH: case X86::DL: case X86::DX: case X86::EDX: + Name = "DL"; + break; + case X86::CH: case X86::CL: case X86::CX: case X86::ECX: + Name = "CL"; + break; + case X86::BH: case X86::BL: case X86::BX: case X86::EBX: + Name = "BL"; + break; + } + break; + case 'w': // Print HImode register + switch (Reg) { + default: return true; + case X86::AH: case X86::AL: case X86::AX: case X86::EAX: + Name = "AX"; + break; + case X86::DH: case X86::DL: case X86::DX: case X86::EDX: + Name = "DX"; + break; + case X86::CH: case X86::CL: case X86::CX: case X86::ECX: + Name = "CX"; + break; + case X86::BH: case X86::BL: case X86::BX: case X86::EBX: + Name = "BX"; + break; + case X86::ESI: + Name = "SI"; + break; + case X86::EDI: + Name = "DI"; + break; + case X86::EBP: + Name = "BP"; + break; + case X86::ESP: + Name = "SP"; + break; + } + break; + case 'k': // Print SImode register + switch (Reg) { + default: return true; + case X86::AH: case X86::AL: case X86::AX: case X86::EAX: + Name = "EAX"; + break; + case X86::DH: case X86::DL: case X86::DX: case X86::EDX: + Name = "EDX"; + break; + case X86::CH: case X86::CL: case X86::CX: case X86::ECX: + Name = "ECX"; + break; + case X86::BH: case X86::BL: case X86::BX: case X86::EBX: + 
Name = "EBX"; + break; + case X86::ESI: + Name = "ESI"; + break; + case X86::EDI: + Name = "EDI"; + break; + case X86::EBP: + Name = "EBP"; + break; + case X86::ESP: + Name = "ESP"; + break; + } + break; + } + + O << '%' << Name; + return false; +} + /// PrintAsmOperand - Print out an operand for an inline asm expression. /// bool X86IntelAsmPrinter::PrintAsmOperand(const MachineInstr *MI, unsigned OpNo, @@ -253,6 +370,11 @@ switch (ExtraCode[0]) { default: return true; // Unknown modifier. + case 'b': // Print QImode register + case 'h': // Print QImode high register + case 'w': // Print HImode register + case 'k': // Print SImode register + return printAsmMRegsiter(MI->getOperand(OpNo), ExtraCode[0]); } } Index: llvm/lib/Target/X86/X86IntelAsmPrinter.h diff -u llvm/lib/Target/X86/X86IntelAsmPrinter.h:1.12 llvm/lib/Target/X86/X86IntelAsmPrinter.h:1.13 --- llvm/lib/Target/X86/X86IntelAsmPrinter.h:1.12 Fri Apr 28 16:19:05 2006 +++ llvm/lib/Target/X86/X86IntelAsmPrinter.h Fri Apr 28 18:11:40 2006 @@ -80,6 +80,7 @@ printMemReference(MI, OpNo); } + bool printAsmMRegsiter(const MachineOperand &MO, const char Mode); bool PrintAsmOperand(const MachineInstr *MI, unsigned OpNo, unsigned AsmVariant, const char *ExtraCode); bool PrintAsmMemoryOperand(const MachineInstr *MI, unsigned OpNo, From evan.cheng at apple.com Fri Apr 28 18:19:51 2006 From: evan.cheng at apple.com (Evan Cheng) Date: Fri, 28 Apr 2006 18:19:51 -0500 Subject: [llvm-commits] CVS: llvm/lib/Target/X86/X86ATTAsmPrinter.cpp X86ATTAsmPrinter.h X86IntelAsmPrinter.cpp X86IntelAsmPrinter.h Message-ID: <200604282319.SAA04084@zion.cs.uiuc.edu> Changes in directory llvm/lib/Target/X86: X86ATTAsmPrinter.cpp updated: 1.36 -> 1.37 X86ATTAsmPrinter.h updated: 1.12 -> 1.13 X86IntelAsmPrinter.cpp updated: 1.29 -> 1.30 X86IntelAsmPrinter.h updated: 1.13 -> 1.14 --- Log message: I can't spell: Register, not Regsiter. 
--- Diffs of the changes: (+6 -6) X86ATTAsmPrinter.cpp | 4 ++-- X86ATTAsmPrinter.h | 2 +- X86IntelAsmPrinter.cpp | 4 ++-- X86IntelAsmPrinter.h | 2 +- 4 files changed, 6 insertions(+), 6 deletions(-) Index: llvm/lib/Target/X86/X86ATTAsmPrinter.cpp diff -u llvm/lib/Target/X86/X86ATTAsmPrinter.cpp:1.36 llvm/lib/Target/X86/X86ATTAsmPrinter.cpp:1.37 --- llvm/lib/Target/X86/X86ATTAsmPrinter.cpp:1.36 Fri Apr 28 18:11:40 2006 +++ llvm/lib/Target/X86/X86ATTAsmPrinter.cpp Fri Apr 28 18:19:39 2006 @@ -265,7 +265,7 @@ } -bool X86ATTAsmPrinter::printAsmMRegsiter(const MachineOperand &MO, +bool X86ATTAsmPrinter::printAsmMRegister(const MachineOperand &MO, const char Mode) { const MRegisterInfo &RI = *TM.getRegisterInfo(); unsigned Reg = MO.getReg(); @@ -397,7 +397,7 @@ case 'h': // Print QImode high register case 'w': // Print HImode register case 'k': // Print SImode register - return printAsmMRegsiter(MI->getOperand(OpNo), ExtraCode[0]); + return printAsmMRegister(MI->getOperand(OpNo), ExtraCode[0]); } } Index: llvm/lib/Target/X86/X86ATTAsmPrinter.h diff -u llvm/lib/Target/X86/X86ATTAsmPrinter.h:1.12 llvm/lib/Target/X86/X86ATTAsmPrinter.h:1.13 --- llvm/lib/Target/X86/X86ATTAsmPrinter.h:1.12 Fri Apr 28 18:11:40 2006 +++ llvm/lib/Target/X86/X86ATTAsmPrinter.h Fri Apr 28 18:19:39 2006 @@ -61,7 +61,7 @@ printMemReference(MI, OpNo); } - bool printAsmMRegsiter(const MachineOperand &MO, const char Mode); + bool printAsmMRegister(const MachineOperand &MO, const char Mode); bool PrintAsmOperand(const MachineInstr *MI, unsigned OpNo, unsigned AsmVariant, const char *ExtraCode); bool PrintAsmMemoryOperand(const MachineInstr *MI, unsigned OpNo, Index: llvm/lib/Target/X86/X86IntelAsmPrinter.cpp diff -u llvm/lib/Target/X86/X86IntelAsmPrinter.cpp:1.29 llvm/lib/Target/X86/X86IntelAsmPrinter.cpp:1.30 --- llvm/lib/Target/X86/X86IntelAsmPrinter.cpp:1.29 Fri Apr 28 18:11:40 2006 +++ llvm/lib/Target/X86/X86IntelAsmPrinter.cpp Fri Apr 28 18:19:39 2006 @@ -242,7 +242,7 @@ O << "\"L" << 
getFunctionNumber() << "$pb\":"; } -bool X86IntelAsmPrinter::printAsmMRegsiter(const MachineOperand &MO, +bool X86IntelAsmPrinter::printAsmMRegister(const MachineOperand &MO, const char Mode) { const MRegisterInfo &RI = *TM.getRegisterInfo(); unsigned Reg = MO.getReg(); @@ -374,7 +374,7 @@ case 'h': // Print QImode high register case 'w': // Print HImode register case 'k': // Print SImode register - return printAsmMRegsiter(MI->getOperand(OpNo), ExtraCode[0]); + return printAsmMRegister(MI->getOperand(OpNo), ExtraCode[0]); } } Index: llvm/lib/Target/X86/X86IntelAsmPrinter.h diff -u llvm/lib/Target/X86/X86IntelAsmPrinter.h:1.13 llvm/lib/Target/X86/X86IntelAsmPrinter.h:1.14 --- llvm/lib/Target/X86/X86IntelAsmPrinter.h:1.13 Fri Apr 28 18:11:40 2006 +++ llvm/lib/Target/X86/X86IntelAsmPrinter.h Fri Apr 28 18:19:39 2006 @@ -80,7 +80,7 @@ printMemReference(MI, OpNo); } - bool printAsmMRegsiter(const MachineOperand &MO, const char Mode); + bool printAsmMRegister(const MachineOperand &MO, const char Mode); bool PrintAsmOperand(const MachineInstr *MI, unsigned OpNo, unsigned AsmVariant, const char *ExtraCode); bool PrintAsmMemoryOperand(const MachineInstr *MI, unsigned OpNo, From lattner at cs.uiuc.edu Fri Apr 28 18:33:32 2006 From: lattner at cs.uiuc.edu (Chris Lattner) Date: Fri, 28 Apr 2006 18:33:32 -0500 Subject: [llvm-commits] CVS: llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp Message-ID: <200604282333.SAA04287@zion.cs.uiuc.edu> Changes in directory llvm/lib/CodeGen/SelectionDAG: DAGCombiner.cpp updated: 1.150 -> 1.151 --- Log message: Remove a bogus transformation. This fixes SingleSource/UnitTests/2006-01-23-InitializedBitField.c with some changes I have to the new CFE. 
--- Diffs of the changes: (+0 -7) DAGCombiner.cpp | 7 ------- 1 files changed, 7 deletions(-) Index: llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp diff -u llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:1.150 llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:1.151 --- llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:1.150 Fri Apr 21 10:32:26 2006 +++ llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp Fri Apr 28 18:33:20 2006 @@ -1899,13 +1899,6 @@ // fold (sext_in_reg x) -> (zext_in_reg x) if the sign bit is zero if (TLI.MaskedValueIsZero(N0, 1ULL << (EVTBits-1))) return DAG.getZeroExtendInReg(N0, EVT); - // fold (sext_in_reg (srl x)) -> sra x - if (N0.getOpcode() == ISD::SRL && - N0.getOperand(1).getOpcode() == ISD::Constant && - cast<ConstantSDNode>(N0.getOperand(1))->getValue() == EVTBits) { - return DAG.getNode(ISD::SRA, N0.getValueType(), N0.getOperand(0), - N0.getOperand(1)); - } // fold (sext_inreg (extload x)) -> (sextload x) if (N0.getOpcode() == ISD::EXTLOAD && EVT == cast<VTSDNode>(N0.getOperand(3))->getVT() && From jeffc at jolt-lang.org Sat Apr 29 13:41:58 2006 From: jeffc at jolt-lang.org (Jeff Cohen) Date: Sat, 29 Apr 2006 13:41:58 -0500 Subject: [llvm-commits] CVS: llvm/lib/System/Win32/Path.inc Message-ID: <200604291841.NAA25319@zion.cs.uiuc.edu> Changes in directory llvm/lib/System/Win32: Path.inc updated: 1.40 -> 1.41 --- Log message: Mingw32 patches supplied by Anton Korobeynikov. --- Diffs of the changes: (+1 -1) Path.inc | 2 +- 1 files changed, 1 insertion(+), 1 deletion(-) Index: llvm/lib/System/Win32/Path.inc diff -u llvm/lib/System/Win32/Path.inc:1.40 llvm/lib/System/Win32/Path.inc:1.41 --- llvm/lib/System/Win32/Path.inc:1.40 Sat Jul 9 13:42:49 2005 +++ llvm/lib/System/Win32/Path.inc Sat Apr 29 13:41:44 2006 @@ -741,7 +741,7 @@ } void -sys::CopyFile(const sys::Path &Dest, const sys::Path &Src) { +CopyFile(const sys::Path &Dest, const sys::Path &Src) { // Can't use CopyFile macro defined in Windows.h because it would mess up the // above line. 
We use the expansion it would have in a non-UNICODE build. if (!::CopyFileA(Src.c_str(), Dest.c_str(), false)) From jeffc at jolt-lang.org Sat Apr 29 13:41:58 2006 From: jeffc at jolt-lang.org (Jeff Cohen) Date: Sat, 29 Apr 2006 13:41:58 -0500 Subject: [llvm-commits] CVS: llvm/lib/Target/X86/X86JITInfo.cpp Message-ID: <200604291841.NAA25323@zion.cs.uiuc.edu> Changes in directory llvm/lib/Target/X86: X86JITInfo.cpp updated: 1.16 -> 1.17 --- Log message: Mingw32 patches supplied by Anton Korobeynikov. --- Diffs of the changes: (+2 -2) X86JITInfo.cpp | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) Index: llvm/lib/Target/X86/X86JITInfo.cpp diff -u llvm/lib/Target/X86/X86JITInfo.cpp:1.16 llvm/lib/Target/X86/X86JITInfo.cpp:1.17 --- llvm/lib/Target/X86/X86JITInfo.cpp:1.16 Tue Apr 25 15:54:26 2006 +++ llvm/lib/Target/X86/X86JITInfo.cpp Sat Apr 29 13:41:44 2006 @@ -48,7 +48,7 @@ asm( ".text\n" ".align 8\n" -#if defined(__CYGWIN__) || defined(__APPLE__) +#if defined(__CYGWIN__) || defined(__APPLE__) || defined(__MINGW32__) ".globl _X86CompilationCallback\n" "_X86CompilationCallback:\n" #else @@ -59,7 +59,7 @@ "movl %esp, %ebp\n" // Standard prologue "pushl %eax\n" "pushl %edx\n" // save EAX/EDX -#if defined(__CYGWIN__) +#if defined(__CYGWIN__) || defined(__MINGW32__) "call _X86CompilationCallback2\n" #elif defined(__APPLE__) "movl 4(%ebp), %eax\n" // load the address of return address From evan.cheng at apple.com Sun Apr 30 03:42:00 2006 From: evan.cheng at apple.com (Evan Cheng) Date: Sun, 30 Apr 2006 03:42:00 -0500 Subject: [llvm-commits] CVS: llvm/lib/CodeGen/LiveIntervalAnalysis.cpp VirtRegMap.cpp VirtRegMap.h Message-ID: <200604300842.DAA26281@zion.cs.uiuc.edu> Changes in directory llvm/lib/CodeGen: LiveIntervalAnalysis.cpp updated: 1.154 -> 1.155 VirtRegMap.cpp updated: 1.60 -> 1.61 VirtRegMap.h updated: 1.18 -> 1.19 --- Log message: Local spiller kills a store if the folded restore is turned into a copy. 
But this is incorrect if the spilled value live range extends beyond the current BB. It is currently controlled by a temporary option -spiller-check-liveout. --- Diffs of the changes: (+21 -10) LiveIntervalAnalysis.cpp | 7 ++++--- VirtRegMap.cpp | 20 +++++++++++++++----- VirtRegMap.h | 4 ++-- 3 files changed, 21 insertions(+), 10 deletions(-) Index: llvm/lib/CodeGen/LiveIntervalAnalysis.cpp diff -u llvm/lib/CodeGen/LiveIntervalAnalysis.cpp:1.154 llvm/lib/CodeGen/LiveIntervalAnalysis.cpp:1.155 --- llvm/lib/CodeGen/LiveIntervalAnalysis.cpp:1.154 Sun Jan 22 17:41:00 2006 +++ llvm/lib/CodeGen/LiveIntervalAnalysis.cpp Sun Apr 30 03:41:47 2006 @@ -271,14 +271,15 @@ // can do this, we don't need to insert spill code. if (lv_) lv_->instructionChanged(MI, fmi); - vrm.virtFolded(li.reg, MI, i, fmi); + MachineBasicBlock &MBB = *MI->getParent(); + bool LiveOut = li.liveAt(getInstructionIndex(&MBB.back()) + + InstrSlots::NUM); + vrm.virtFolded(li.reg, MI, i, fmi, LiveOut); mi2iMap_.erase(MI); i2miMap_[index/InstrSlots::NUM] = fmi; mi2iMap_[fmi] = index; - MachineBasicBlock &MBB = *MI->getParent(); MI = MBB.insert(MBB.erase(MI), fmi); ++numFolded; - // Folding the load/store can completely change the instruction in // unpredictable ways, rescan it from the beginning. goto for_operand; Index: llvm/lib/CodeGen/VirtRegMap.cpp diff -u llvm/lib/CodeGen/VirtRegMap.cpp:1.60 llvm/lib/CodeGen/VirtRegMap.cpp:1.61 --- llvm/lib/CodeGen/VirtRegMap.cpp:1.60 Thu Apr 27 23:43:18 2006 +++ llvm/lib/CodeGen/VirtRegMap.cpp Sun Apr 30 03:41:47 2006 @@ -50,6 +50,10 @@ clEnumVal(local, " local spiller"), clEnumValEnd), cl::init(local)); + + // TEMPORARY option to test a fix. 
+ cl::opt<bool> + SpillerCheckLiveOut("spiller-check-liveout", cl::Hidden); } //===----------------------------------------------------------------------===// @@ -81,7 +85,8 @@ } void VirtRegMap::virtFolded(unsigned VirtReg, MachineInstr *OldMI, - unsigned OpNo, MachineInstr *NewMI) { + unsigned OpNo, MachineInstr *NewMI, + bool LiveOut) { // Move previous memory references folded to new instruction. MI2VirtMapTy::iterator IP = MI2VirtMap.lower_bound(NewMI); for (MI2VirtMapTy::iterator I = MI2VirtMap.lower_bound(OldMI), @@ -96,6 +101,7 @@ MRInfo = isRef; } else { MRInfo = OldMI->getOperand(OpNo).isUse() ? isModRef : isMod; + if (LiveOut) MRInfo = (ModRef)(MRInfo | isLiveOut); } // add new memory reference @@ -727,10 +733,14 @@ MaybeDeadStores.erase(MDSI); else { // If we get here, the store is dead, nuke it now. - assert(MR == VirtRegMap::isMod && "Can't be modref!"); - MBB.erase(MDSI->second); - MaybeDeadStores.erase(MDSI); - ++NumDSE; + assert(!(MR & VirtRegMap::isRef) && "Can't be modref!"); + // Don't nuke it if the value is needed in another block. + if (!SpillerCheckLiveOut || !(MR & VirtRegMap::isLiveOut)) { + DEBUG(std::cerr << " Killed store:\t" << *MDSI->second); + MBB.erase(MDSI->second); + MaybeDeadStores.erase(MDSI); + ++NumDSE; + } } } Index: llvm/lib/CodeGen/VirtRegMap.h diff -u llvm/lib/CodeGen/VirtRegMap.h:1.18 llvm/lib/CodeGen/VirtRegMap.h:1.19 --- llvm/lib/CodeGen/VirtRegMap.h:1.18 Thu Apr 21 17:33:49 2005 +++ llvm/lib/CodeGen/VirtRegMap.h Sun Apr 30 03:41:47 2006 @@ -26,7 +26,7 @@ class VirtRegMap { public: - enum ModRef { isRef = 1, isMod = 2, isModRef = 3 }; + enum ModRef { isRef = 1, isMod = 2, isModRef = 3, isLiveOut = 4 }; typedef std::multimap<MachineInstr*, std::pair<unsigned, ModRef> > MI2VirtMapTy; @@ -128,7 +128,7 @@ /// folded into newMI machine instruction. The OpNum argument indicates the /// operand number of OldMI that is folded. 
void virtFolded(unsigned VirtReg, MachineInstr *OldMI, unsigned OpNum, - MachineInstr *NewMI); + MachineInstr *NewMI, bool LiveOut); /// @brief returns the virtual registers' values folded in memory /// operands of this instruction From evan.cheng at apple.com Sun Apr 30 03:42:48 2006 From: evan.cheng at apple.com (Evan Cheng) Date: Sun, 30 Apr 2006 03:42:48 -0500 Subject: [llvm-commits] CVS: llvm-test/Makefile.programs Message-ID: <200604300842.DAA26381@zion.cs.uiuc.edu> Changes in directory llvm-test: Makefile.programs updated: 1.205 -> 1.206 --- Log message: Temporarily set x86 llc-beta to -spiller-check-liveout. --- Diffs of the changes: (+1 -1) Makefile.programs | 2 +- 1 files changed, 1 insertion(+), 1 deletion(-) Index: llvm-test/Makefile.programs diff -u llvm-test/Makefile.programs:1.205 llvm-test/Makefile.programs:1.206 --- llvm-test/Makefile.programs:1.205 Fri Apr 28 13:54:11 2006 +++ llvm-test/Makefile.programs Sun Apr 30 03:42:35 2006 @@ -197,7 +197,7 @@ LLCBETAOPTION := -sched=simple endif ifeq ($(ARCH),x86) -LLCBETAOPTION := -enable-x86-fastcc +LLCBETAOPTION := -spiller-check-liveout endif ifeq ($(ARCH),Sparc) LLCBETAOPTION := -enable-sparc-v9-insts From lattner at cs.uiuc.edu Sun Apr 30 23:11:41 2006 From: lattner at cs.uiuc.edu (Chris Lattner) Date: Sun, 30 Apr 2006 23:11:41 -0500 Subject: [llvm-commits] CVS: llvm/lib/CodeGen/AsmPrinter.cpp Message-ID: <200605010411.XAA00549@zion.cs.uiuc.edu> Changes in directory llvm/lib/CodeGen: AsmPrinter.cpp updated: 1.59 -> 1.60 --- Log message: Format #APP lines a bit nicer --- Diffs of the changes: (+2 -2) AsmPrinter.cpp | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) Index: llvm/lib/CodeGen/AsmPrinter.cpp diff -u llvm/lib/CodeGen/AsmPrinter.cpp:1.59 llvm/lib/CodeGen/AsmPrinter.cpp:1.60 --- llvm/lib/CodeGen/AsmPrinter.cpp:1.59 Sat Apr 22 13:53:45 2006 +++ llvm/lib/CodeGen/AsmPrinter.cpp Sun Apr 30 23:11:03 2006 @@ -34,8 +34,8 @@ GlobalVarAddrSuffix(""), FunctionAddrPrefix(""), 
FunctionAddrSuffix(""), - InlineAsmStart("#APP\n"), - InlineAsmEnd("#NO_APP\n"), + InlineAsmStart("#APP\n\t"), + InlineAsmEnd("\t#NO_APP\n"), ZeroDirective("\t.zero\t"), AsciiDirective("\t.ascii\t"), AscizDirective("\t.asciz\t"),