From ahmedcharles at gmail.com Thu Mar 1 00:03:03 2012
From: ahmedcharles at gmail.com (Ahmed Charles)
Date: Wed, 29 Feb 2012 22:03:03 -0800
Subject: [LLVMdev] Proposed implementation of N3333 hashing interfaces for LLVM (and possible libc++)
In-Reply-To:
References:
Message-ID:

> +  // Helper for test code to print hash codes.
> +  void PrintTo(const hash_code &code, ::std::ostream *os) {
>
> What's with the extra leading :: before std::?

Have you ever tried:

namespace foo { class std {}; }
using namespace foo;
#include

Well, I'm not sure that Chandler is guarding against this possibility,
but most library implementations of the standard use ::std:: everywhere
to avoid this potential for ambiguity.

From baldrick at free.fr Thu Mar 1 02:05:02 2012
From: baldrick at free.fr (Duncan Sands)
Date: Thu, 01 Mar 2012 09:05:02 +0100
Subject: [LLVMdev] Recovering variable names from bitcode
In-Reply-To:
References: <6F919A8A-78BE-43F0-B111-DDC0AC3D372B@gmail.com>
Message-ID: <4F4F2DAE.40401@free.fr>

Hi Ashay,

> Thanks, I thought about clang but I need to work with Fortran programs too.
> Hence using dragonegg.

the debug info generated by dragonegg is pretty rotten. Between gcc-4.2 and
gcc-4.5, internal changes in gcc meant that the technique that llvm-gcc used
to generate debug info for local variables (llvm-gcc was based on gcc-4.2) no
longer worked with gcc-4.5 (dragonegg was created by porting llvm-gcc to
gcc-4.5). I never found time to fix this, but of course it needs to be fixed.
Please open a bug report about it so that I don't forget.

Thanks, Duncan.

From baldrick at free.fr Thu Mar 1 02:29:10 2012
From: baldrick at free.fr (Duncan Sands)
Date: Thu, 01 Mar 2012 09:29:10 +0100
Subject: [LLVMdev] problem with inlining pass
In-Reply-To: <4F4EB091.1070503@googlemail.com>
References: <4F4EB091.1070503@googlemail.com>
Message-ID: <4F4F3356.7030106@free.fr>

Hi Jochen,

> My llvm version is 3.0 release.
> I have a module generated by clang.
> When I optimize it, I first add an
> inlining pass (llvm::createFunctionInliningPass), then these passes:
> - own FunctionPass
> - llvm::createPromoteMemoryToRegisterPass
> - llvm::createInstructionCombiningPass
> - llvm::createDeadInstEliminationPass
> - llvm::createDeadStoreEliminationPass
> - new llvm::DominatorTree()
> - new llvm::LoopInfo()
> - llvm::createLoopSimplifyPass()
> - own FunctionPass
>
> The problem is that the last function pass (and maybe the others too)
> gets called with functions that the inlining pass is supposed to delete.

I think this is a normal consequence of how the pass manager schedules passes.
Suppose you have the following call graph: A --> B, i.e. function A calls
function B. Passes will be scheduled as follows:

First, the inliner will be run on function B. This won't do anything since B
doesn't call anything.
Then each function pass will be run on B in turn. That means that your
function pass will be run on B.
Then the inliner will be run on function A. This may inline function B into
function A then delete B.
Then each function pass will be run on A.

In short the inliner works its way up the callgraph, and after it runs on each
function, all other function passes are run on that function. That's why I
think it is normal for your pass to be run on function B even if the inliner
is going to delete B after inlining it into A.

Ciao, Duncan.

From lostfreeman at gmail.com Thu Mar 1 02:57:28 2012
From: lostfreeman at gmail.com (lost)
Date: Thu, 1 Mar 2012 12:57:28 +0400
Subject: [LLVMdev] Is there any way to print assembly code of a function compiled by ExecutionEngine?
Message-ID:

Hello!

I'm using LLVM's JIT in my project. Is there any way to view assembly
code of functions it generates without disassembling them from the
memory?
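
A minimal sketch of the scheduling order Duncan describes in the "problem with inlining pass" reply above, for the A --> B call graph. It is a toy simulation in plain C++ and does not use LLVM's pass manager; the names CallGraph, visit and schedule are hypothetical, and it assumes one function per call-graph node, whereas LLVM actually walks strongly connected components.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy call graph: caller name -> callee names. This is NOT llvm::CallGraph.
using CallGraph = std::map<std::string, std::vector<std::string>>;

// Visit callees before callers (post-order). For each function, record the
// inliner first and then the function passes, as described in the thread.
static void visit(const CallGraph &cg, const std::string &fn,
                  std::set<std::string> &seen, std::vector<std::string> &log) {
  if (!seen.insert(fn).second)
    return; // already scheduled (also stops on recursive calls)
  auto it = cg.find(fn);
  if (it != cg.end())
    for (const std::string &callee : it->second)
      visit(cg, callee, seen, log);
  log.push_back("inliner on " + fn);
  log.push_back("function passes on " + fn);
}

// Returns the order in which the toy "passes" would run over the whole graph.
std::vector<std::string> schedule(const CallGraph &cg) {
  std::set<std::string> seen;
  std::vector<std::string> log;
  for (const auto &entry : cg)
    visit(cg, entry.first, seen, log);
  return log;
}
```

With the graph {A calls B}, schedule() lists B's entries before A's, which is why a function pass still sees B even though the inliner may delete B later while processing A.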
From jfonseca at vmware.com Thu Mar 1 05:11:43 2012
From: jfonseca at vmware.com (Jose Fonseca)
Date: Thu, 1 Mar 2012 03:11:43 -0800 (PST)
Subject: [LLVMdev] Is there any way to print assembly code of a function compiled by ExecutionEngine?
In-Reply-To:
Message-ID: <1309949408.1784814.1330600303087.JavaMail.root@zimbra-prod-mbox-2.vmware.com>

I'm not sure what you mean by "without disassembling them from the memory".
If one can't read the JIT functions from memory, then from where would we
read them?

I wrote a function that can disassemble an arbitrary function using MC, at
http://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/auxiliary/gallivm/lp_bld_debug.cpp

Is this what you're looking for? If so, then perhaps this would be a useful
thing to have in the LLVM tree itself.

Jose

----- Original Message -----
> Hello!
>
> I'm using LLVM's JIT in my project. Is there any way to view assembly
> code of functions it generates without disassembling them from the
> memory?
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>

From chandlerc at google.com Thu Mar 1 07:01:39 2012
From: chandlerc at google.com (Chandler Carruth)
Date: Thu, 1 Mar 2012 05:01:39 -0800
Subject: [LLVMdev] Proposed implementation of N3333 hashing interfaces for LLVM (and possible libc++)
In-Reply-To:
References:
Message-ID:

Thanks for all the comments! I think I've addressed all of them, as well as
Duncan's comments from IRC.

Based on your OK, Nick, I'm planning to commit this tomorrow. If anyone has
objections or serious concerns, please let me know to hold off.

Updated patch is attached, as well as the latest version of the header.

For the record, the only concern that has come up thus far is one of
performance.
The explanation there is that while some algorithms are slightly faster
(5-10 cycles max), they are significantly lower quality, and don't currently
show up on profiles. I'd like to get the quality up to remove collisions, and
then continue working to improve the actual performance. Clearly, very
sensitive and hot routines like the Clang lexer's StringMap aren't about to
change without careful benchmarking and numbers on those specific
components. =] I think StringMap will be the very last bit of hashing to
change considering its requirements.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120301/8d390d1e/attachment.html
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Hashing.h
Type: text/x-chdr
Size: 28937 bytes
Desc: not available
Url : http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120301/8d390d1e/attachment.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: hashing2.diff.gz
Type: application/x-gzip
Size: 13939 bytes
Desc: not available
Url : http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120301/8d390d1e/attachment.gz

From Chuck.Caldarale at unisys.com Thu Mar 1 07:38:15 2012
From: Chuck.Caldarale at unisys.com (Caldarale, Charles R)
Date: Thu, 1 Mar 2012 07:38:15 -0600
Subject: [LLVMdev] Is there any way to print assembly code of a function compiled by ExecutionEngine?
In-Reply-To:
References:
Message-ID: <99C8B2929B39C24493377AC7A121E21FB013D75D6B@USEA-EXCH8.na.uis.unisys.com>

> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of lost
> Subject: [LLVMdev] Is there any way to print assembly code of a function compiled by ExecutionEngine?

> I'm using LLVM's JIT in my project. Is there any way to view assembly
> code of functions it generates without disassembling them from the
> memory?
We use the following:

    Context = getGlobalContext();
    m3log = new dt_ostream(sysLogger);
    fm3log = new formatted_raw_ostream(*m3log);
    pMod = new Module("label", Context);
    TD = new TargetData(pMod->getDataLayout());
    pMPasses = new PassManager();
    pMPasses->add(new TargetData(*TD));
    tmc->addPassesToEmitFile(*pMPasses, *fm3log, TargetMachine::CGFT_AssemblyFile);
    pMPasses->run(*pMod);

(Lots of other bits omitted from the above for simplicity.)

Reference:
http://llvm.org/docs/doxygen/html/classllvm_1_1LLVMTargetMachine.html#a5c437ac9b4d158e38ad4e826dac63914

 - Chuck

THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY
MATERIAL and is thus for use only by the intended recipient. If you received
this in error, please contact the sender and delete the e-mail and its
attachments from all computers.

From joel.gouly at gmail.com Thu Mar 1 07:49:25 2012
From: joel.gouly at gmail.com (Joey Gouly)
Date: Thu, 1 Mar 2012 13:49:25 +0000
Subject: [LLVMdev] llvm-stress for fuzzing llvm
In-Reply-To:
References: <7DE70FDACDE4CD4887C4278C12A2E30509282D@HASMSX104.ger.corp.intel.com> <20120226182952.451b3ad8@sapling2> <7DE70FDACDE4CD4887C4278C12A2E305092CAD@HASMSX104.ger.corp.intel.com>
Message-ID:

Attached is a simple bash script I wrote to run llvm-stress several times,
might be useful!

Thanks
Joey

On 27 February 2012 19:14, Sean Silva wrote:

> Here is that patch.
>
> Btw, I've just been using bugpoint, and it's really nifty!
>
> --Sean Silva
>
> 2012/2/27 Rotem, Nadav
>
> Sean,
>>
>> Thanks for looking at this. Knowing that the last instruction triggered
>> the bug is often not enough. I use bugpoint to reduce the failing test.
>> The reason is that some of the bugs may be caused by the interaction
>> between several instruction. Having said that, I think that the change
>> that you proposed is a good one. Can you send a patch ?
>> >> Thanks, >> Nadav >> >> >> From: Sean Silva [mailto:silvas at purdue.edu] >> Sent: Monday, February 27, 2012 05:45 >> To: Hal Finkel >> Cc: Rotem, Nadav; llvmdev at cs.uiuc.edu >> Subject: Re: [LLVMdev] llvm-stress for fuzzing llvm >> >> I'm finding it useful to replace the main loop with: >> for (unsigned i = 0, n = SizeCL/Modifiers.size(); i < n; ++i) { >> Modifiers[i%Modifiers.size()]->Act(); >> } >> >> That way, changing the size by 1 adds exactly one instruction, which >> makes delta debugging MUCH easier. Maybe it would be worth changing? >> >> --Sean Silva >> >> On Sun, Feb 26, 2012 at 9:23 PM, Sean Silva wrote: >> Wow, nifty tool! I've already found a couple crashes! >> >> It is also really easy to pinpoint what is causing the error. Whenever >> you trigger a bug, run llvm-stress with the same seed but a really small >> size that doesn't trigger the bug (e.g. like 10). Then do binary search on >> the size. Eventually you find exactly the cutoff of size that triggers the >> bug (e.g. 539 runs fine, but 540 crashes), and then you can diff the >> crashing and non-crashing .ll files and there should only be a tiny >> difference. >> >> --Sean Silva >> >> On Sun, Feb 26, 2012 at 7:29 PM, Hal Finkel wrote: >> Nadav, >> >> Thanks, this is neat! Here is a patch which optionally enables >> generation of the other floating-point types. Please review. >> >> -Hal >> >> On Sun, 26 Feb 2012 11:51:04 +0000 >> "Rotem, Nadav" wrote: >> >> > Hi, >> > >> > Compiling lots of 'junk' helps in catching bugs. I added a new tool >> > (located under llvm/tools/llvm-stress) for generating random LL >> > files. The tool can be used to test different llvm components using >> > various compilation flags. Until now, I only found bugs in the >> > codegen, and not in general llvm optimizations. This probably means >> > that the generated tests are currently too simple for the >> > higher-level optimizations. 
>> > >> > The command line below generates a random ll file, and llc compiles >> > this file. It often crashes. >> > >> > ./llvm-stress -seed $RANDOM -o tmp.ll -size 1000 ; ./llc tmp.ll >> > -mcpu=corei7-avx -mattr=+avx -o /dev/null >> > >> > The "-seed" flag sets the initial seed to be used by the random >> > function. I implemented a simple portable 'random' function so that >> > the result should be identical on all platforms. The initial seed >> > also appears in the name of the generated function. The "-size" >> > parameter sets the size of the generated random file. >> > >> > Nadav >> > --------------------------------------------------------------------- >> > Intel Israel (74) Limited >> > >> > This e-mail and any attachments may contain confidential material for >> > the sole use of the intended recipient(s). Any review or distribution >> > by others is strictly prohibited. If you are not the intended >> > recipient, please contact the sender and delete all copies. >> > >> > >> > _______________________________________________ >> > LLVM Developers mailing list >> > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >> >> >> -- >> Hal Finkel >> Postdoctoral Appointee >> Leadership Computing Facility >> Argonne National Laboratory >> >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >> >> >> --------------------------------------------------------------------- >> Intel Israel (74) Limited >> >> This e-mail and any attachments may contain confidential material for >> the sole use of the intended recipient(s). Any review or distribution >> by others is strictly prohibited. If you are not the intended >> recipient, please contact the sender and delete all copies. 
>> > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120301/c70a0268/attachment-0001.html -------------- next part -------------- A non-text attachment was scrubbed... Name: run_stress.sh Type: application/x-sh Size: 305 bytes Desc: not available Url : http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120301/c70a0268/attachment-0001.sh From criswell at illinois.edu Thu Mar 1 08:45:10 2012 From: criswell at illinois.edu (John Criswell) Date: Thu, 1 Mar 2012 08:45:10 -0600 Subject: [LLVMdev] Linking problem in a pass In-Reply-To: References: Message-ID: <4F4F8B76.20403@illinois.edu> On 2/29/12 11:03 PM, Welson Sun wrote: > My pass uses another class which is defined in a separate .h file, > which sits in the same folder as the pass .cpp file. The pass compiles > fine, but when using the pass "opt -load ...", there is an error: opt > symbol lookup error .... undefined symbol xxx, where xxx is the > class name. It looks like that class file's object file is not linked > into the pass.so file. How should I change the Makefile to make the > linking happen? If your .cpp file isn't including the .h file, then the class in the .h file isn't being compiled and, therefore, isn't being included in the final library file. Classes should be declared in header files and have their methods defined in .cpp files. That's the easiest way to fix the problem. -- John T. > > Thanks, > Welson > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120301/8c50422f/attachment.html

From welson.sun at gmail.com Thu Mar 1 10:36:27 2012
From: welson.sun at gmail.com (Welson Sun)
Date: Thu, 1 Mar 2012 08:36:27 -0800
Subject: [LLVMdev] Linking problem in a pass
In-Reply-To: <4F4F8B76.20403@illinois.edu>
References: <4F4F8B76.20403@illinois.edu>
Message-ID:

Originally, the problem is the lack of .cpp file. Then I noticed the
compilation log is not showing the .o file being generated for the non-pass
classes.

Then I added the .cpp files for each .h file, then the .o files are being
generated, shown in the Debug directory. Actually, if I add "VERBOSE = 1" in
the Makefile, it shows the linking command is actually linking all the .o
files into the pass.so file.

Any help will be appreciated.

Thanks!
Welson

On Thu, Mar 1, 2012 at 6:45 AM, John Criswell wrote:

> On 2/29/12 11:03 PM, Welson Sun wrote:
>
> My pass uses another class which is defined in a separate .h file, which
> sits in the same folder as the pass .cpp file. The pass compiles fine, but
> when using the pass "opt -load ...", there is an error: opt symbol lookup
> error .... undefined symbol xxx, where xxx is the class name. It looks
> like that class file's object file is not linked into the pass.so file. How
> should I change the Makefile to make the linking happen?
>
>
> If your .cpp file isn't including the .h file, then the class in the .h
> file isn't being compiled and, therefore, isn't being included in the final
> library file.
>
> Classes should be declared in header files and have their methods defined
> in .cpp files. That's the easiest way to fix the problem.
>
> -- John T.
>
>
>
> Thanks,
> Welson
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>

--
Welson

Phone: (408) 418-8385
Email: welson.sun at gmail.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120301/8a858e1a/attachment.html

From baldrick at free.fr Thu Mar 1 12:00:42 2012
From: baldrick at free.fr (Duncan Sands)
Date: Thu, 01 Mar 2012 19:00:42 +0100
Subject: [LLVMdev] Linking problem in a pass
In-Reply-To:
References: <4F4F8B76.20403@illinois.edu>
Message-ID: <4F4FB94A.1020907@free.fr>

Hi Welson, are you saying that the .o file containing the class is being
linked into the .so file, but nonetheless you get "undefined symbol XYZ"
errors where XYZ is that class? Maybe you defined the class inside an
anonymous name space? Alternatively, if XYZ refers to a method of the class,
maybe you forgot to define that method.

Ciao, Duncan.

On 01/03/12 17:36, Welson Sun wrote:
> Originally, the problem is the lack of .cpp file. Then I noticed the compilation
> log is not showing the .o file being generated for the non-pass classes.
>
> Then I added the .cpp files for each .h file, then the .o files are being
> generated, shown in the Debug directory. Actually, if I add "VERBOSE = 1" in the
> Makefile, it shows the linking command is actually linking all the .o files into
> the pass.so file.
>
> Any help will be appreciated.
>
> Thanks!
> Welson
>
>
> On Thu, Mar 1, 2012 at 6:45 AM, John Criswell
> wrote:
>
> On 2/29/12 11:03 PM, Welson Sun wrote:
>> My pass uses another class which is defined in a separate .h file, which
>> sits in the same folder as the pass .cpp file.
>> The pass compiles fine, but
>> when using the pass "opt -load ...", there is an error: opt symbol lookup
>> error .... undefined symbol xxx, where xxx is the class name. It looks
>> like that class file's object file is not linked into the pass.so file.
>> How should I change the Makefile to make the linking happen?
>
> If your .cpp file isn't including the .h file, then the class in the .h file
> isn't being compiled and, therefore, isn't being included in the final
> library file.
>
> Classes should be declared in header files and have their methods defined in
> .cpp files. That's the easiest way to fix the problem.
>
> -- John T.
>
>
>>
>> Thanks,
>> Welson
>>
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>
>
>
> --
> Welson
>
> Phone: (408) 418-8385
> Email: welson.sun at gmail.com
>
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

From elena.demikhovsky at intel.com Thu Mar 1 12:16:46 2012
From: elena.demikhovsky at intel.com (Demikhovsky, Elena)
Date: Thu, 1 Mar 2012 18:16:46 +0000
Subject: [LLVMdev] Stack alignment on X86 AVX seems incorrect
In-Reply-To:
References:
Message-ID:

./llc -mattr=+avx -stack-alignment=16 < basic.ll | grep movaps | grep ymm | grep rbp

vmovaps -176(%rbp), %ymm14
vmovaps -144(%rbp), %ymm11
vmovaps -240(%rbp), %ymm13
vmovaps -208(%rbp), %ymm9
vmovaps -272(%rbp), %ymm7
vmovaps -304(%rbp), %ymm0
vmovaps -112(%rbp), %ymm0
vmovaps -80(%rbp), %ymm1
vmovaps -112(%rbp), %ymm0
vmovaps -80(%rbp), %ymm0
vmovaps -176(%rbp), %ymm15
vmovaps -144(%rbp), %ymm0
vmovaps -240(%rbp), %ymm0
vmovaps -208(%rbp), %ymm0
vmovaps -272(%rbp), %ymm0
vmovaps -304(%rbp), %ymm0

vmovaps should not access stack if it is not aligned to 32

- Elena

-----Original Message-----
From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Demikhovsky, Elena
Sent: Thursday, March 01, 2012 02:59
To: llvmdev at cs.uiuc.edu
Subject: [LLVMdev] Stack alignment in kernel

I'm running in AVX mode, but the stack before call to kernel is aligned to
16 bytes. Could you, please, tell me where it should be specified?

Thank you.

- Elena
---------------------------------------------------------------------
Intel Israel (74) Limited

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.

_______________________________________________
LLVM Developers mailing list
LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
---------------------------------------------------------------------
Intel Israel (74) Limited

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: basic.ll Type: application/octet-stream Size: 38821 bytes Desc: basic.ll Url : http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120301/a99e9bc7/attachment.obj From joerg at britannica.bec.de Thu Mar 1 12:31:17 2012 From: joerg at britannica.bec.de (Joerg Sonnenberger) Date: Thu, 1 Mar 2012 19:31:17 +0100 Subject: [LLVMdev] Stack alignment on X86 AVX seems incorrect In-Reply-To: References: Message-ID: <20120301183117.GA9912@britannica.bec.de> On Thu, Mar 01, 2012 at 06:16:46PM +0000, Demikhovsky, Elena wrote: > vmovaps should not access stack if it is not aligned to 32 I'm not completely sure I understand your problem. Are you saying that the generated code assumes 256bit alignment, your default stack alignment is 128bit and LLVM doesn't adjust it automatically? Joerg From elena.demikhovsky at intel.com Thu Mar 1 13:44:42 2012 From: elena.demikhovsky at intel.com (Demikhovsky, Elena) Date: Thu, 1 Mar 2012 19:44:42 +0000 Subject: [LLVMdev] Stack alignment on X86 AVX seems incorrect In-Reply-To: <20120301183117.GA9912@britannica.bec.de> References: <20120301183117.GA9912@britannica.bec.de> Message-ID: When stack is unaligned, LLVM should generate vmovups instead of vmovaps. - Elena -----Original Message----- From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Joerg Sonnenberger Sent: Thursday, March 01, 2012 20:31 To: llvmdev at cs.uiuc.edu Subject: Re: [LLVMdev] Stack alignment on X86 AVX seems incorrect On Thu, Mar 01, 2012 at 06:16:46PM +0000, Demikhovsky, Elena wrote: > vmovaps should not access stack if it is not aligned to 32 I'm not completely sure I understand your problem. Are you saying that the generated code assumes 256bit alignment, your default stack alignment is 128bit and LLVM doesn't adjust it automatically? 
Joerg
_______________________________________________
LLVM Developers mailing list
LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
---------------------------------------------------------------------
Intel Israel (74) Limited

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.

From joerg at britannica.bec.de Thu Mar 1 13:53:53 2012
From: joerg at britannica.bec.de (Joerg Sonnenberger)
Date: Thu, 1 Mar 2012 20:53:53 +0100
Subject: [LLVMdev] Stack alignment on X86 AVX seems incorrect
In-Reply-To:
References: <20120301183117.GA9912@britannica.bec.de>
Message-ID: <20120301195353.GA12694@britannica.bec.de>

On Thu, Mar 01, 2012 at 07:44:42PM +0000, Demikhovsky, Elena wrote:
> When stack is unaligned, LLVM should generate vmovups instead of vmovaps.

That's the question I have asked. If LLVM believes the stack is aligned,
the choice is correct.

Joerg

>
> - Elena
>
> -----Original Message-----
> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Joerg Sonnenberger
> Sent: Thursday, March 01, 2012 20:31
> To: llvmdev at cs.uiuc.edu
> Subject: Re: [LLVMdev] Stack alignment on X86 AVX seems incorrect
>
> On Thu, Mar 01, 2012 at 06:16:46PM +0000, Demikhovsky, Elena wrote:
> > vmovaps should not access stack if it is not aligned to 32
>
> I'm not completely sure I understand your problem. Are you saying that
> the generated code assumes 256bit alignment, your default stack
> alignment is 128bit and LLVM doesn't adjust it automatically?
>
> Joerg
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> ---------------------------------------------------------------------
> Intel Israel (74) Limited
>
> This e-mail and any attachments may contain confidential material for
> the sole use of the intended recipient(s). Any review or distribution
> by others is strictly prohibited. If you are not the intended
> recipient, please contact the sender and delete all copies.
>

From eli.friedman at gmail.com Thu Mar 1 14:21:31 2012
From: eli.friedman at gmail.com (Eli Friedman)
Date: Thu, 1 Mar 2012 12:21:31 -0800
Subject: [LLVMdev] Predicate registers/condition codes question
In-Reply-To: <4F4D2836.9000806@codeaurora.org>
References: <4F4D2836.9000806@codeaurora.org>
Message-ID:

On Tue, Feb 28, 2012 at 11:17 AM, Tony Linthicum wrote:
> Hey folks,
>
> We are having some difficulty with how we have been representing our
> predicate registers, and wanted some advice from the list. First, we
> had been representing our predicate registers as 1 bit (i1). The truth,
> however, is that they are 8 bits. The reason for this is that they
> serve as predicates for conditional execution of instructions, branch
> condition codes, and also as vector mask registers for conditional
> selection of vector elements.
>
> We have run into problems with type mismatches with intrinsics for some
> of our vector operations. We decided to try to solve it by representing
> the predicate registers as what they really are, namely i8. We changed
> our intrinsic and instruction definitions accordingly, changed the data
> type of the predicate registers to be i8, and changed
> getSetCCResultType() to return i8. After doing this, the compiler
> builds just fine but dies at runtime trying to match some target
> independent operations (e.g. setcc/brcond) that appear to want an i1 for
> the condition code.
>
> So, my question is this: is it even possible to represent our predicate
> registers (and our condition codes) as i8, and if so, what hook are we
> missing?

Making getSetCCResultType return i8 is definitely supported, and
brcond should be okay with that. It's not obvious what is going
wrong; are you sure there isn't anything in your target still
expecting an i1?

-Eli

From eli.friedman at gmail.com Thu Mar 1 14:28:13 2012
From: eli.friedman at gmail.com (Eli Friedman)
Date: Thu, 1 Mar 2012 12:28:13 -0800
Subject: [LLVMdev] How to vectorize a vector type cast?
In-Reply-To:
References:
Message-ID:

On Tue, Feb 28, 2012 at 2:11 PM, Gurd, Preston wrote:
> Since Clang does not seem to allow type casts, such as uchar4 to float4,
> between vector types, it seems it is necessary to write them as element by
> element conversions, such as
>
> typedef float float4 __attribute__((ext_vector_type(4)));
> typedef unsigned char uchar4 __attribute__((ext_vector_type(4)));
>
> float4 to_float4(uchar4 in)
> {
>   float4 out = {in.x, in.y, in.z, in.w};
>   return out;
> }

I think that's right... we can represent them in IR, but I don't think
clang has a generic way to write them outside OpenCL mode. Granted,
you can use platform-specific intrinsics (_mm_cvttps_epi32 etc.).

> Running this code through "clang -c -emit-llvm" and then through "opt -O2
> -S", produces the following IR:
>
> define <4 x float> @to_float4(i32 %in.coerce) nounwind uwtable readnone {
> entry:
>   %0 = bitcast i32 %in.coerce to <4 x i8>
>   %1 = extractelement <4 x i8> %0, i32 0
>   %conv = uitofp i8 %1 to float
>   %vecinit = insertelement <4 x float> undef, float %conv, i32 0
>   %2 = extractelement <4 x i8> %0, i32 1
>   %conv2 = uitofp i8 %2 to float
>   %vecinit3 = insertelement <4 x float> %vecinit, float %conv2, i32 1
>   %3 = extractelement <4 x i8> %0, i32 2
>   %conv4 = uitofp i8 %3 to float
>   %vecinit5 = insertelement <4 x float> %vecinit3, float %conv4, i32 2
>   %4 = extractelement <4 x i8> %0, i32 3
>   %conv6 = uitofp i8 %4 to float
>   %vecinit7 = insertelement <4 x float> %vecinit5, float %conv6, i32 3
>   ret <4 x float> %vecinit7
>
> Which does the cast as a sequence of scalar operations, whereas it could be
> done as
>
>    %1 = uitofp <4 x i8> %0 to <4 x float>
>    ret <4 x float> %1
>
> It seemed to me that the recently committed basic block vectorizer might be
> able to do this kind of optimization, but the current version does not do
> so.

Yes, that seems reasonable.

-Eli

From slarin at codeaurora.org Thu Mar 1 15:04:17 2012
From: slarin at codeaurora.org (Sergei Larin)
Date: Thu, 1 Mar 2012 15:04:17 -0600
Subject: [LLVMdev] Aliasing bug or feature?
In-Reply-To: <4F4F3356.7030106@free.fr>
References: <4F4EB091.1070503@googlemail.com> <4F4F3356.7030106@free.fr>
Message-ID: <03ab01ccf7ee$dda384e0$98ea8ea0$@org>

Hello everyone,

I am working on some changes to the Hexagon VLIW PreRA scheduler, and as a
part of it need to test aliasing properties of two instructions.
What it boils down to is the following code:

    char a[20];
    char s;
    char *p, *q; // p == &a[0]; q == &s;

    void test()
    {
      register char reg;

      s = 0;
      reg = p[0] + p[1];
      s = q[0] + reg;

      return;
    }

When I ask the question whether "&s" and "&q[0]" may potentially alias, I get
a negative answer. In the full test (not presented) they indeed may and do in
fact alias, resulting in an incorrect VLIW schedule.
My question - is it a feature or a bug :)

Here is somewhat more info:

Before lowering begins:

*** IR Dump After Remove sign extends ***
define void @test() nounwind {
entry:
  store i8 0, i8* @s, align 1, !tbaa !0
  %0 = load i8** @p, align 4, !tbaa !2
  %1 = load i8* %0, align 1, !tbaa !0
  %conv = zext i8 %1 to i32
  %arrayidx1 = getelementptr inbounds i8* %0, i32 1
  %2 = load i8* %arrayidx1, align 1, !tbaa !0
  %conv2 = zext i8 %2 to i32
  %3 = load i8** @q, align 4, !tbaa !2 <<< Can this load be bypassed by the store below?
  %4 = load i8* %3, align 1, !tbaa !0
  %conv5 = zext i8 %4 to i32
  %add = add i32 %conv2, %conv
  %add7 = add i32 %add, %conv5
  %conv8 = trunc i32 %add7 to i8
  store i8 %conv8, i8* @s, align 1, !tbaa !0 <<< Can this store bypass the above load?
  ret void
}

At the point of enquiry I have the following (lowered) instructions:

0x3df7900: i32,ch = LDw_GP_V4 0x3df4c70, 0x3df5470 [ORD=8] [ID=6] // This is Load from q[0]
0x3df5470: ch = STb_GP_V4 0x3df5170, 0x3df4e70, 0x3d9c130 [ID=4] // This is a store to s

Underlying Values:
@q = common global i8* null, align 4
@s = common global i8 0, align 1

The way the inquiry is made is similar to DAGCombiner::isAlias():

    SDNode *SDN1;
    SDNode *SDN2;
    MachineMemOperand *MMOa;
    MachineMemOperand *MMOb;
    ...
    const MachineSDNode *MNb = dyn_cast<MachineSDNode>(SDN2);
    const MachineSDNode *MNa = dyn_cast<MachineSDNode>(SDN1);
    ...
    MMOa = !MNa->memoperands_empty() ? (*MNa->memoperands_begin()) : NULL;
    MMOb = !MNb->memoperands_empty() ? (*MNb->memoperands_begin()) : NULL;

    if (MMOa && MMOa->getValue() && MMOb && MMOb->getValue()) {
    ...
int64_t MinOffset = std::min(MMOa->getOffset(), MMOb->getOffset()); int64_t Overlapa = MMOa->getSize() + MMOa->getOffset() - MinOffset; int64_t Overlapb = MMOb->getSize() + MMOb->getOffset() - MinOffset; AliasAnalysis::AliasResult AAResult = AA->alias( AliasAnalysis::Location(MMOa->getValue(), Overlapa, MMOa->getTBAAInfo()), AliasAnalysis::Location(MMOb->getValue(), Overlapb, MMOb->getTBAAInfo())); Quick debug of BasicAliasAnalysis::aliasCheck() points to this code: if (isIdentifiedObject(O1) && isIdentifiedObject(O2)) return NoAlias; And in llvm::isIdentifiedObject() this is true: if (isa<GlobalValue>(V) && !isa<GlobalAlias>(V)) Any clues/suggestions are welcome. Thanks. Sergei Larin -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum. From welson.sun at gmail.com Thu Mar 1 15:09:35 2012 From: welson.sun at gmail.com (Welson Sun) Date: Thu, 1 Mar 2012 13:09:35 -0800 Subject: [LLVMdev] Linking problem in a pass In-Reply-To: <4F4FB94A.1020907@free.fr> References: <4F4F8B76.20403@illinois.edu> <4F4FB94A.1020907@free.fr> Message-ID: Hi Duncan, Your understanding of the problem is correct. However, the XYZ class is not inside an anonymous name space. Also, all the XYZ methods are defined in the .cpp file. Looking at the error message: opt: symbol lookup error: path/to/pass.so: undefined symbol: _ZN12DataTransferD1Ev Where DataTransfer is the class name. Maybe I am missing a certain type of constructor? Thanks, Welson On Thu, Mar 1, 2012 at 10:00 AM, Duncan Sands wrote: > Hi Welson, are you saying that the .o file containing the class is being > linked > into the .so file, but nonetheless you get "undefined symbol XYZ" errors > where > XYZ is that class? Maybe you defined the class inside an anonymous name > space? > Alternatively, if XYZ refers to a method of the class, maybe you forgot to > define that method. > > Ciao, Duncan. > > On 01/03/12 17:36, Welson Sun wrote: > > Originally, the problem is the lack of .cpp file.
Then I noticed the > compilation > > log is not showing the .o file being generated for the non-pass classes. > > > > Then I added the .cpp files for each .h file, then the .o files are being > > generated, shown in the Debug directory. Actually, if I add "VERBOSE = > 1" in the > > Makefile, it shows the linking command is actually linking all the .o > files into > > the pass.so file. > > > > Any help will be appreciated. > > > > Thanks! > > Welson > > > > > > On Thu, Mar 1, 2012 at 6:45 AM, John Criswell > > wrote: > > > > On 2/29/12 11:03 PM, Welson Sun wrote: > >> My pass uses another class which is defined in a separate .h file, > which > >> sits in the same folder as the pass .cpp file. The pass compiles > fine, but > >> when using the pass "opt -load ...", there is an error: opt symbol > lookup > >> error .... undefined symbol xxx, where xxx is the class name. It > looks > >> like that class file's object file is not linked into the pass.so > file. > >> How should I change the Makefile to make the linking happen? > > > > If your .cpp file isn't including the .h file, then the class in the > .h file > > isn't being compiled and, therefore, isn't being included in the > final > > library file. > > > > Classes should be declared in header files and have their methods > defined in > > .cpp files. That's the easiest way to fix the problem. > > > > -- John T. 
> > > > > >> > >> Thanks, > >> Welson > >> > >> > >> _______________________________________________ > >> LLVM Developers mailing list > >> LLVMdev at cs.uiuc.edu > http://llvm.cs.uiuc.edu > >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > -- Welson Phone: (408) 418-8385 Email: welson.sun at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120301/f231064d/attachment-0001.html From eli.friedman at gmail.com Thu Mar 1 15:14:41 2012 From: eli.friedman at gmail.com (Eli Friedman) Date: Thu, 1 Mar 2012 13:14:41 -0800 Subject: [LLVMdev] Aliasing bug or feature? In-Reply-To: <03ab01ccf7ee$dda384e0$98ea8ea0$@org> References: <4F4EB091.1070503@googlemail.com> <4F4F3356.7030106@free.fr> <03ab01ccf7ee$dda384e0$98ea8ea0$@org> Message-ID: On Thu, Mar 1, 2012 at 1:04 PM, Sergei Larin wrote: > Hello everyone, > > I am working on some changes to the Hexagon VLIW PreRA scheduler, and as a > part of it need to test aliasing properties of two instruction.
> What it boils down to is the following code: > > char a[20]; > char s; > char *p, *q; // p == &a[0]; q == &s; > > void test() > { > register char reg; > > s = 0; > reg = p[0] + p[1]; > s = q[0] + reg; > > return; > } > > When I ask the question whether "&s" and "&q[0]" may potentially alias, I > got negative affirmation. > In the full test (not presented) they indeed may and do in fact alias, > resulting in incorrect VLIW schedule. > > My question - is it a feature or a bug :) > > Here is somewhat more info: > > Before lowering begins: > *** IR Dump After Remove sign extends *** > define void @test() nounwind { > entry: > store i8 0, i8* @s, align 1, !tbaa !0 > %0 = load i8** @p, align 4, !tbaa !2 > %1 = load i8* %0, align 1, !tbaa !0 > %conv = zext i8 %1 to i32 > %arrayidx1 = getelementptr inbounds i8* %0, i32 1 > %2 = load i8* %arrayidx1, align 1, !tbaa !0 > %conv2 = zext i8 %2 to i32 > %3 = load i8** @q, align 4, !tbaa !2 <<< Can this load be bypassed by the > store below? > %4 = load i8* %3, align 1, !tbaa !0 > %conv5 = zext i8 %4 to i32 > %add = add i32 %conv2, %conv > %add7 = add i32 %add, %conv5 > %conv8 = trunc i32 %add7 to i8 > store i8 %conv8, i8* @s, align 1, !tbaa !0 <<< Can this store bypass the > above load? Err, are you sure you're asking the right question? Given the loads you're pointing at, you're asking whether &s and &q alias. -Eli From welson.sun at gmail.com Thu Mar 1 15:18:03 2012 From: welson.sun at gmail.com (Welson Sun) Date: Thu, 1 Mar 2012 13:18:03 -0800 Subject: [LLVMdev] Linking problem in a pass In-Reply-To: References: <4F4F8B76.20403@illinois.edu> <4F4FB94A.1020907@free.fr> Message-ID: OK, problem found! Inspired by Duncan's message, I double checked the XYZ class, and what I was missing is the definition of the destructor! What a stupid mistake! Thanks, Welson -- Welson Phone: (408) 418-8385 Email: welson.sun at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120301/a35226dc/attachment.html From cameron.mcinally at nyu.edu Thu Mar 1 15:18:40 2012 From: cameron.mcinally at nyu.edu (Cameron McInally) Date: Thu, 1 Mar 2012 16:18:40 -0500 Subject: [LLVMdev] Stack alignment on X86 AVX seems incorrect Message-ID: Hi Elena, You're correct.
LLVM does not align the stack to 32-bytes for AVX and unaligned moves should be used for YMM spills. I wrote some code to align the stack to 32-bytes when AVX spills are present; it does break the x86-64 ABI though. If upstream would be interested in this code, I can arrange with my employer to send a patch to the mailing list. -Cameron On Mar 1, 2012, at 4:09 PM, wrote: Message: 2 Date: Thu, 1 Mar 2012 18:16:46 +0000 From: "Demikhovsky, Elena" Subject: [LLVMdev] Stack alignment on X86 AVX seems incorrect To: "llvmdev at cs.uiuc.edu" Message-ID: Content-Type: text/plain; charset="windows-1252" ./llc -mattr=+avx -stack-alignment=16 < basic.ll | grep movaps | grep ymm | grep rbp vmovaps -176(%rbp), %ymm14 vmovaps -144(%rbp), %ymm11 vmovaps -240(%rbp), %ymm13 vmovaps -208(%rbp), %ymm9 vmovaps -272(%rbp), %ymm7 vmovaps -304(%rbp), %ymm0 vmovaps -112(%rbp), %ymm0 vmovaps -80(%rbp), %ymm1 vmovaps -112(%rbp), %ymm0 vmovaps -80(%rbp), %ymm0 vmovaps -176(%rbp), %ymm15 vmovaps -144(%rbp), %ymm0 vmovaps -240(%rbp), %ymm0 vmovaps -208(%rbp), %ymm0 vmovaps -272(%rbp), %ymm0 vmovaps -304(%rbp), %ymm0 vmovaps should not access stack if it is not aligned to 32 - Elena -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120301/91d47a80/attachment.html From slarin at codeaurora.org Thu Mar 1 15:19:31 2012 From: slarin at codeaurora.org (Sergei Larin) Date: Thu, 1 Mar 2012 15:19:31 -0600 Subject: [LLVMdev] Aliasing bug or feature? In-Reply-To: References: <4F4EB091.1070503@googlemail.com> <4F4F3356.7030106@free.fr> <03ab01ccf7ee$dda384e0$98ea8ea0$@org> Message-ID: <03ff01ccf7f0$fe24d460$fa6e7d20$@org> > Err, are you sure you're asking the right question? Given the loads > you're pointing at, you're asking whether &s and &q alias. Yes. And I am pretty sure this enquiry works fine for 99.9999% of cases, but has some issue with this one... Scheduling is notorious for exposing latent bugs way later after they have been introduced :( Sergei -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum. > -----Original Message----- > From: Eli Friedman [mailto:eli.friedman at gmail.com] > Sent: Thursday, March 01, 2012 3:15 PM > To: Sergei Larin > Cc: llvmdev at cs.uiuc.edu > Subject: Re: [LLVMdev] Aliasing bug or feature?
> > Err, are you sure you're asking the right question? Given the loads > you're pointing at, you're asking whether &s and &q alias. > > -Eli From emenezes at codeaurora.org Thu Mar 1 15:29:33 2012 From: emenezes at codeaurora.org (Evandro Menezes) Date: Thu, 01 Mar 2012 15:29:33 -0600 Subject: [LLVMdev] Stack alignment on X86 AVX seems incorrect In-Reply-To: References: Message-ID: <4F4FEA3D.1000205@codeaurora.org> Cameron, Aligning the stack to 32 bytes when there are auto AVX vector variables present shouldn't necessarily break the x86-64 ABI, as long as smaller auto variables remain properly aligned. A similar approach was taken for i386 in GCC in order to support SSE vectors. Perhaps you could elaborate where the ABI was violated when your patch is applied. HTH -- Evandro Menezes Austin, TX emenezes at codeaurora.org Qualcomm Innovation Center, Inc is a member of the Code Aurora Forum From eli.friedman at gmail.com Thu Mar 1 15:33:14 2012 From: eli.friedman at gmail.com (Eli Friedman) Date: Thu, 1 Mar 2012 13:33:14 -0800 Subject: [LLVMdev] Stack alignment on X86 AVX seems incorrect In-Reply-To: References: Message-ID: On Thu, Mar 1, 2012 at 1:18 PM, Cameron McInally wrote: > Hi Elena, > > You're correct. LLVM does not align the stack to 32-bytes for AVX and > unaligned moves should be used for YMM spills. Really? There's supposed to be code to realign the stack when necessary... it's possible that code is broken, though. -Eli From cameron.mcinally at nyu.edu Thu Mar 1 15:37:41 2012 From: cameron.mcinally at nyu.edu (Cameron McInally) Date: Thu, 1 Mar 2012 16:37:41 -0500 Subject: [LLVMdev] LLVMdev Digest, Vol 93, Issue 3 In-Reply-To: References: Message-ID: > > > -----Original Message----- > ...
> Subject: Re: [LLVMdev] Stack alignment on X86 AVX seems incorrect > > On Thu, Mar 01, 2012 at 06:16:46PM +0000, Demikhovsky, Elena wrote: > > vmovaps should not access stack if it is not aligned to 32 > > I'm not completely sure I understand your problem. Are you saying that > the generated code assumes 256bit alignment, your default stack > alignment is 128bit and LLVM doesn't adjust it automatically? > > Joerg > Hey Joerg, The faulty code can be found in function X86InstrInfo::storeRegToStackSlot(...) from /lib/Target/X86/X86InstrInfo.cpp. > bool isAligned = (RI.getStackAlignment() >= 16) || RI.canRealignStack(MF); When creating the spill's machine instruction, the spill slot is assumed to be aligned if the alignment is >= 16 bytes, which is not the case for AVX. AVX spills require 32 byte alignment to make use of aligned moves. The stack is not adjusted automatically. For performance, the best fix is to align the frame to a 32-byte boundary, ensuring that the YMM spill slots are also on 32-byte boundaries. This, of course, breaks the ABI. -Cameron -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120301/68ebf9d3/attachment.html From eli.friedman at gmail.com Thu Mar 1 16:09:23 2012 From: eli.friedman at gmail.com (Eli Friedman) Date: Thu, 1 Mar 2012 14:09:23 -0800 Subject: [LLVMdev] Aliasing bug or feature? In-Reply-To: <041801ccf7f6$ba758060$2f608120$@org> References: <4F4EB091.1070503@googlemail.com> <4F4F3356.7030106@free.fr> <03ab01ccf7ee$dda384e0$98ea8ea0$@org> <041801ccf7f6$ba758060$2f608120$@org> Message-ID: On Thu, Mar 1, 2012 at 2:00 PM, Sergei Larin wrote: > Eli, > > I might not have answered your question fully/accurately...
> > On my architecture, these _two_ loads are lowered to a single instruction: > > %3 = load i8** @q, align 4, !tbaa !2 > %4 = load i8* %3, align 1, !tbaa !0 > > Becomes > > i32,ch = LDw_GP_V4 0x3df4c70, 0x3df5470 > I guess what is happening, the alias properties of combined instruction are > not updated properly, and I am not sure if this is something I need to do, > or it is getting done "automatically" somewhere at DAG combine... Sorry. > Still learning :) Ah, that explains it: you've only attached one memory operand to an instruction that reads from two memory locations. You probably just need to add the second memory operand. -Eli From cameron.mcinally at nyu.edu Thu Mar 1 16:30:40 2012 From: cameron.mcinally at nyu.edu (Cameron McInally) Date: Thu, 1 Mar 2012 17:30:40 -0500 Subject: [LLVMdev] Stack alignment on X86 AVX seems incorrect In-Reply-To: <4F4FEA3D.1000205@codeaurora.org> References: <4F4FEA3D.1000205@codeaurora.org> Message-ID: On Thu, Mar 1, 2012 at 4:29 PM, Evandro Menezes wrote: ... > Aligning the stack to 32 bytes when there are auto AVX vector variables > present shouldn't necessarily break the x86-64 ABI, as long as smaller auto > variables remain properly aligned. A similar approach was taken for i386 > in GCC in order to support SSE vectors. > > Perhaps you could elaborate where the ABI was violated when your patch is > applied. > Sorry, I confused myself. This was worked out about a year ago, so it's not in my cache. You're right. In my last email I wrote "align the stack", when I should have written "align the frame when variable sized objects are in play". Take main(...) for example, with a few alloca's. If one would like to spill AVX regs with aligned moves, one must align the frame to 32 bytes to ensure that the spill slots are aligned correctly, since spill slots are based off of the frame pointer. The x86-64 ABI lays out the stack frame as... ... 
16(%rbp) mem arg[0] 8(%rbp) return address 0(%rbp) previous %rbp -8(%rbp) stack My patch breaks the ABI since the ABI requires the return address to be found at 8(%rbp) and mem arg[0] at 16(%rbp). Unfortunately, to align the frame at runtime, it's sometimes required to insert padding in between the two. I'll ask for my company's permission to share my implementation. Until then, I'll have to bite my tongue. -Cameron -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120301/e9a0be25/attachment.html From joerg at britannica.bec.de Thu Mar 1 16:50:00 2012 From: joerg at britannica.bec.de (Joerg Sonnenberger) Date: Thu, 1 Mar 2012 23:50:00 +0100 Subject: [LLVMdev] Stack alignment on X86 AVX seems incorrect In-Reply-To: References: <4F4FEA3D.1000205@codeaurora.org> Message-ID: <20120301225000.GB16119@britannica.bec.de> On Thu, Mar 01, 2012 at 05:30:40PM -0500, Cameron McInally wrote: > The x86-64 ABI lays out the stack frame as... > > ... > 16(%rbp) mem arg[0] > 8(%rbp) return address > 0(%rbp) previous %rbp > -8(%rbp) stack > > My patch breaks the ABI since the ABI requires the return address to be > found at 8(%rbp) and mem arg[0] at 16(%rbp). Unfortunately, to align the > frame at runtime, it's sometimes required to insert padding in between the > two. Eh, no. The X86-64 ABI doesn't require the use of a frame pointer, so it obviously can't require anything relative to %rbp. Please note that stack realignment is implemented unless the code requires dynamic allocas. The conditional seems to be wrong, though, as mentioned in this thread.
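The conditional under discussion can be illustrated with a minimal sketch. This is not LLVM code: `quotedCheck` only mirrors the expression quoted earlier in the thread from `X86InstrInfo::storeRegToStackSlot`, and `slotAwareCheck` is a hypothetical variant that compares against the alignment the spill slot actually needs. Both names, and the idea of passing the slot alignment as a parameter, are assumptions made for illustration, not the fix that was eventually applied.

```cpp
#include <cassert>

// Mirrors the expression quoted in the thread:
//   (RI.getStackAlignment() >= 16) || RI.canRealignStack(MF)
// The constant 16 matches 128-bit XMM spills but not 256-bit YMM spills.
constexpr bool quotedCheck(unsigned stackAlign, bool canRealign) {
  return stackAlign >= 16 || canRealign;
}

// Hypothetical variant (an assumption, not the actual LLVM change):
// compare against the alignment required by the spilled register class.
constexpr bool slotAwareCheck(unsigned stackAlign, bool canRealign,
                              unsigned slotAlign) {
  return stackAlign >= slotAlign || canRealign;
}

// With a 16-byte stack and no realignment, the quoted check still reports
// "aligned", which is how an aligned 32-byte vmovaps ends up being selected.
static_assert(quotedCheck(16, false), "quoted check accepts a 16-byte stack");
static_assert(!slotAwareCheck(16, false, 32), "a YMM slot needs 32 bytes");
static_assert(slotAwareCheck(16, true, 32), "realignment makes it safe");
static_assert(slotAwareCheck(16, false, 16), "an XMM slot is fine at 16");
```

The sketch only shows why a hard-coded 16 gives the wrong answer for 32-byte slots; whether realignment is actually possible for a given function is a separate question that this thread leaves open.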
Joerg From cameron.mcinally at nyu.edu Thu Mar 1 17:04:22 2012 From: cameron.mcinally at nyu.edu (Cameron McInally) Date: Thu, 1 Mar 2012 18:04:22 -0500 Subject: [LLVMdev] Stack alignment on X86 AVX seems incorrect In-Reply-To: References: <4F4FEA3D.1000205@codeaurora.org> Message-ID: On Thu, Mar 1, 2012 at 5:30 PM, Cameron McInally wrote: > Aligning the stack to 32 bytes when there are auto AVX vector variables >> present shouldn't necessarily break the x86-64 ABI, as long as smaller auto >> variables remain properly aligned. A similar approach was taken for i386 >> in GCC in order to support SSE vectors. >> >> This topic is starting to come back to me now. The reason the GCC solution above did not work for us is that we do not build all of the libraries used with our compiler. For example, some are proprietary compiled object files and some are GCC compiled object files from other sources. If our object files called another library, and in turn that library called a function in our object code, it's not possible to ensure that the frame of the current function is still aligned to 32 bytes. That was the determining factor in my implementation. That is, unless you know something that I don't. I'm pretty new to compiler development. :) -Cameron -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120301/02183e07/attachment.html From echristo at apple.com Thu Mar 1 17:07:44 2012 From: echristo at apple.com (Eric Christopher) Date: Thu, 01 Mar 2012 15:07:44 -0800 Subject: [LLVMdev] Stack alignment on X86 AVX seems incorrect In-Reply-To: References: <4F4FEA3D.1000205@codeaurora.org> Message-ID: On Mar 1, 2012, at 3:04 PM, Cameron McInally wrote: > On Thu, Mar 1, 2012 at 5:30 PM, Cameron McInally wrote: > Aligning the stack to 32 bytes when there are auto AVX vector variables present shouldn't necessarily break the x86-64 ABI, as long as smaller auto variables remain properly aligned. 
A similar approach was taken for i386 in GCC in order to support SSE vectors. > > > This topic is starting to come back to me now. The reason the GCC solution above did not work for us is that we do not build all of the libraries used with our compiler. For example, some are proprietary compiled object files and some are GCC compiled object files from other sources. If our object files called another library, and in turn that library called a function in our object code, it's not possible to ensure that the frame of the current function is still aligned to 32 bytes. That was the determining factor in my implementation. You can only ever guarantee that code is aligned to the ABI alignment. -eric From grosbach at apple.com Thu Mar 1 17:25:06 2012 From: grosbach at apple.com (Jim Grosbach) Date: Thu, 01 Mar 2012 15:25:06 -0800 Subject: [LLVMdev] Stack alignment on X86 AVX seems incorrect In-Reply-To: References: <4F4FEA3D.1000205@codeaurora.org> Message-ID: <6EE31BBE-C649-4299-B2F4-CF9CAAD71915@apple.com> On Mar 1, 2012, at 3:07 PM, Eric Christopher wrote: > > On Mar 1, 2012, at 3:04 PM, Cameron McInally wrote: > >> On Thu, Mar 1, 2012 at 5:30 PM, Cameron McInally wrote: >> Aligning the stack to 32 bytes when there are auto AVX vector variables present shouldn't necessarily break the x86-64 ABI, as long as smaller auto variables remain properly aligned. A similar approach was taken for i386 in GCC in order to support SSE vectors. >> >> >> This topic is starting to come back to me now. The reason the GCC solution above did not work for us is that we do not build all of the libraries used with our compiler. For example, some are proprietary compiled object files and some are GCC compiled object files from other sources. If our object files called another library, and in turn that library called a function in our object code, it's not possible to ensure that the frame of the current function is still aligned to 32 bytes. 
That was the determining factor in my implementation. > > You can only ever guarantee that code is aligned to the ABI alignment. On entry to the function, right. The function can do dynamic stack realignment, however, to whatever it wants. ARM, especially on Darwin, does that quite a lot. -Jim From elena.demikhovsky at intel.com Thu Mar 1 17:28:43 2012 From: elena.demikhovsky at intel.com (Demikhovsky, Elena) Date: Thu, 1 Mar 2012 23:28:43 +0000 Subject: [LLVMdev] Stack alignment on X86 AVX seems incorrect In-Reply-To: References: <4F4FEA3D.1000205@codeaurora.org> Message-ID: Even if you explicitly specify -stack-alignment=16 the aligned movs are still generated. It is not an issue related to ABI. See my original mail: ./llc -mattr=+avx -stack-alignment=16 < basic.ll | grep movaps | grep ymm | grep rbp vmovaps -176(%rbp), %ymm14 vmovaps -144(%rbp), %ymm11 vmovaps -240(%rbp), %ymm13 - Elena From: Cameron McInally [mailto:cameron.mcinally at nyu.edu] Sent: Friday, March 02, 2012 01:04 To: Evandro Menezes Cc: Demikhovsky, Elena; llvmdev at cs.uiuc.edu Subject: Re: [LLVMdev] Stack alignment on X86 AVX seems incorrect On Thu, Mar 1, 2012 at 5:30 PM, Cameron McInally > wrote: Aligning the stack to 32 bytes when there are auto AVX vector variables present shouldn't necessarily break the x86-64 ABI, as long as smaller auto variables remain properly aligned. A similar approach was taken for i386 in GCC in order to support SSE vectors. This topic is starting to come back to me now. The reason the GCC solution above did not work for us is that we do not build all of the libraries used with our compiler. For example, some are proprietary compiled object files and some are GCC compiled object files from other sources. If our object files called another library, and in turn that library called a function in our object code, it's not possible to ensure that the frame of the current function is still aligned to 32 bytes. That was the determining factor in my implementation.
That is, unless you know something that I don't. I'm pretty new to compiler development. :) -Cameron --------------------------------------------------------------------- Intel Israel (74) Limited This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). Any review or distribution by others is strictly prohibited. If you are not the intended recipient, please contact the sender and delete all copies. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120301/250477a4/attachment.html From cameron.mcinally at nyu.edu Thu Mar 1 17:59:26 2012 From: cameron.mcinally at nyu.edu (Cameron McInally) Date: Thu, 1 Mar 2012 18:59:26 -0500 Subject: [LLVMdev] Stack alignment on X86 AVX seems incorrect In-Reply-To: References: <4F4FEA3D.1000205@codeaurora.org> Message-ID: On Thu, Mar 1, 2012 at 6:28 PM, Demikhovsky, Elena < elena.demikhovsky at intel.com> wrote: > Even if you explicitly specify -stack-alignment=16 the aligned movs are > still generated. > > It is not an issue related to ABI. > > > Right, your issue is triggered by the code I sent out earlier: > The faulty code can be found in function X86InstrInfo::storeRegToStackSlot(...) from > /lib/Target/X86/X86InstrInfo.cpp. > >> bool isAligned = (RI.getStackAlignment() >= 16) || RI.canRealignStack(MF); > In some cases, the stack is assumed to be aligned if it's on a 16 byte or greater boundary. Your desired alignment is 32 bytes, so aligned 256b moves are selected which is not correct. At runtime, your stack slots could still be aligned on a 16 byte boundary. You'll have to either: 1) Always use unaligned moves; 2) Update the code above to handle 32 byte alignment (Is this even possible at compile time?
I wouldn't think so.); 3) Align the frame and stack to 32 bytes, so that AVX spill slots are always on 32 byte boundaries (This is what I'm proposing.); -Cameron -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120301/3b944ee3/attachment.html From bruno.cardoso at gmail.com Thu Mar 1 21:18:19 2012 From: bruno.cardoso at gmail.com (Bruno Cardoso Lopes) Date: Fri, 2 Mar 2012 00:18:19 -0300 Subject: [LLVMdev] Stack alignment on X86 AVX seems incorrect In-Reply-To: References: <4F4FEA3D.1000205@codeaurora.org> Message-ID: Hi Elena, On Thu, Mar 1, 2012 at 8:28 PM, Demikhovsky, Elena wrote: > Even if you explicitly specify -stack-alignment=16 the aligned movs are > still generated. > > It is not an issue related to ABI. This looks like PR10841, explanation and the way to solve it: http://llvm.org/bugs/show_bug.cgi?id=10841 Cheers, -- Bruno Cardoso Lopes http://www.brunocardoso.cc From chrisjones.lambda at gmail.com Thu Mar 1 21:24:09 2012 From: chrisjones.lambda at gmail.com (Christopher Jones) Date: Thu, 1 Mar 2012 22:24:09 -0500 Subject: [LLVMdev] (Newbie) Using lli with clang++? Message-ID: Hello all, I'm brand new to using LLVM and am having trouble using lli with a C++ program. I tried to compile the following: #include <iostream> using namespace std; int main() { cout << "Hello, world!" << endl; return 0; } When I compile directly to an executable with the following command, all is well: $ clang++ -O3 hello.cpp -o hello But when I try to produce a bitcode file, I get an error: $ clang++ -O3 -emit-llvm hello.cpp -c -o hello.bc $ lli hello.bc LLVM ERROR: Program used external function '_ZSt16__ostream_insertIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_PKS3_l' which could not be resolved! I'm running this on x86_64. I'd appreciate any help about what I'm doing wrong. Thanks! Chris -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120301/1f689dc3/attachment-0001.html From joerg at britannica.bec.de Thu Mar 1 21:37:55 2012 From: joerg at britannica.bec.de (Joerg Sonnenberger) Date: Fri, 2 Mar 2012 04:37:55 +0100 Subject: [LLVMdev] Stack alignment on X86 AVX seems incorrect In-Reply-To: References: <4F4FEA3D.1000205@codeaurora.org> Message-ID: <20120302033755.GA23103@britannica.bec.de> On Fri, Mar 02, 2012 at 12:18:19AM -0300, Bruno Cardoso Lopes wrote: > Hi Elena, > > On Thu, Mar 1, 2012 at 8:28 PM, Demikhovsky, Elena > wrote: > > Even if you explicitly specify -stack-alignment=16 the aligned movs are > > still generated. > > > > It is not an issue related to ABI. > > This looks like PR10841, explanation and the way to solve it: > http://llvm.org/bugs/show_bug.cgi?id=10841 I was looking at this again today. What about the following approach: (1) Change AllocaInst to compute isStaticAlloca once and remember it. (2) Check all functions for (a) static allocations with an alignment larger than the default stack alignment and (b) dynamic allocas. (3) If (a) is present and not (b), use the frame pointer to address arguments and the stack pointer to address local variables. If (b) is present and not (a), use the frame pointer to address arguments and local variables. Realign the stack pointer to the largest alignment needed for dynamic alloca. If (a) and (b) are both present, adjust the isStatic attribute of all allocas with alignment larger than the default stack alignment. Deal with the rest like the case before. At least for 32-bit x86, reserving another register as an alternative frame pointer is very heavy. The above would allow normal spill logic to decide when to keep a reference in a register and when not. It also reuses existing functionality as much as possible.
Joerg From chenwj at iis.sinica.edu.tw Thu Mar 1 21:50:48 2012 From: chenwj at iis.sinica.edu.tw (陳韋任) Date: Fri, 2 Mar 2012 11:50:48 +0800 Subject: [LLVMdev] (Newbie) Using lli with clang++? In-Reply-To: References: Message-ID: <20120302035048.GA22927@cs.nctu.edu.tw> > $ clang++ -O3 -emit-llvm hello.cpp -c -o hello.bc > $ lli hello.bc > LLVM ERROR: Program used external function > '_ZSt16__ostream_insertIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_PKS3_l' > which could not be resolved! What version of LLVM and Clang are you using? I have no such problem on my machine. Regards, chenwj -- Wei-Ren Chen (陳韋任) Computer Systems Lab, Institute of Information Science, Academia Sinica, Taiwan (R.O.C.) Tel:886-2-2788-3799 #1667 Homepage: http://people.cs.nctu.edu.tw/~chenwj From chrisjones.lambda at gmail.com Thu Mar 1 23:38:44 2012 From: chrisjones.lambda at gmail.com (Christopher Jones) Date: Fri, 2 Mar 2012 00:38:44 -0500 Subject: [LLVMdev] (Newbie) Using lli with clang++? In-Reply-To: <20120302035048.GA22927@cs.nctu.edu.tw> References: <20120302035048.GA22927@cs.nctu.edu.tw> Message-ID: <2D975A09-0640-4E94-BC83-66E7152CBAAD@gmail.com> I'm using 3.1 for both Clang and LLVM: $ lli -version LLVM version 3.1svn DEBUG build with assertions. Built Feb 29 2012 (17:54:38). Default target: x86_64-unknown-linux-gnu $ clang -v clang version 3.1 (3edf02f66d339a3ae6d06aeb96c78d9089b53bc1) Target: x86_64-unknown-linux-gnu Thread model: posix Thanks, Chris On Mar 1, 2012, at 10:50 PM, 陳韋任 wrote: >> $ clang++ -O3 -emit-llvm hello.cpp -c -o hello.bc >> $ lli hello.bc >> LLVM ERROR: Program used external function >> '_ZSt16__ostream_insertIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_PKS3_l' >> which could not be resolved! > > What version of LLVM and Clang are you using? I have no such problem > on my machine. > > Regards, > chenwj > > -- > Wei-Ren Chen (陳韋任) > Computer Systems Lab, Institute of Information Science, > Academia Sinica, Taiwan (R.O.C.)
> Tel:886-2-2788-3799 #1667 > Homepage: http://people.cs.nctu.edu.tw/~chenwj From chrisjones.lambda at gmail.com Fri Mar 2 00:06:21 2012 From: chrisjones.lambda at gmail.com (Christopher Jones) Date: Fri, 2 Mar 2012 01:06:21 -0500 Subject: [LLVMdev] (Newbie) Using lli with clang++? In-Reply-To: <20120302035048.GA22927@cs.nctu.edu.tw> References: <20120302035048.GA22927@cs.nctu.edu.tw> Message-ID: Something else that may help: when I try to use llc to generate native assembly, I see that I have a linking problem: $ llc hello.bc -o hello.s $ g++ hello.s -o hello.native In function `main': hello.bc:(.text+0x11): undefined reference to `std::basic_ostream<char, std::char_traits<char> >& std::__ostream_insert<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*, long)' hello.bc:(.text+0x3b): undefined reference to `std::ctype<char>::_M_widen_init() const' collect2: ld returned 1 exit status What should I do to modify the following line to link to the standard library? $ clang++ -O3 -emit-llvm hello.cpp -c -o hello.bc I'll also mention that when I try this exercise using a C program and clang instead of clang++, lli works fine. Thanks, Chris On Mar 1, 2012, at 10:50 PM, 陳韋任 wrote: >> $ clang++ -O3 -emit-llvm hello.cpp -c -o hello.bc >> $ lli hello.bc >> LLVM ERROR: Program used external function >> '_ZSt16__ostream_insertIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_PKS3_l' >> which could not be resolved! > > What version of LLVM and Clang you are using? I have no such problem > on my machine. > > Regards, > chenwj > > -- > Wei-Ren Chen (陳韋任) > Computer Systems Lab, Institute of Information Science, > Academia Sinica, Taiwan (R.O.C.) > Tel:886-2-2788-3799 #1667 > Homepage: http://people.cs.nctu.edu.tw/~chenwj From johnso87 at crhc.illinois.edu Fri Mar 2 00:14:08 2012 From: johnso87 at crhc.illinois.edu (Matt Johnson) Date: Fri, 2 Mar 2012 00:14:08 -0600 Subject: [LLVMdev] (Newbie) Using lli with clang++?
In-Reply-To: References: Message-ID: <4F506530.9090806@crhc.illinois.edu> On 03/01/2012 09:24 PM, Christopher Jones wrote: > Hello all, > > I'm brand new to using LLVM and am having trouble using lli with a C++ > program. I tried to compile the following: > > #include <iostream> > using namespace std; > int main() > { > cout << "Hello, world!" << endl; > return 0; > } > > When I compile directly to an executable with the following command, > all is well: > $ clang++ -O3 hello.cpp -o hello > > But when I try to produce a bitcode file, I get an error: > > $ clang++ -O3 -emit-llvm hello.cpp -c -o hello.bc hello.bc doesn't contain the libstdc++ bits your program needs (iostream and its (many) dependencies). When you produce an executable, clang tells the linker to link your binary with libsupc++, libstdc++, and others, so the dynamic linker can satisfy your iostream dependencies at runtime. When running under lli, the interpreter will provide *a few* basic functions for you (see lib/ExecutionEngine/Interpreter/ExternalFunctions.cpp), but only things like exit(), abort(), printf(), and scanf(), and nothing as complicated as libstdc++. If the function you need is not in the short list provided by the interpreter itself, it will try to find your function using libffi (if you compiled it in). If that doesn't work, you'll get errors like the one below. One solution would be to try to generate a single big .bc file that is "statically linked" with all your dependencies (for some clues as to what these are, try "ldd ./hello" on your clang++-generated binary). Unfortunately, I'm no expert on this or any other methods of informing lli about your .bc file's dependencies and where they can be found when your interpreted program calls out to them. -Matt > $ lli hello.bc > LLVM ERROR: Program used external function > '_ZSt16__ostream_insertIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_PKS3_l' > which could not be resolved! > > I'm running this on x86_64.
I'd appreciate any help about what I'm > doing wrong. > Thanks! > > Chris > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120302/3512bb0e/attachment.html From baldrick at free.fr Fri Mar 2 01:13:46 2012 From: baldrick at free.fr (Duncan Sands) Date: Fri, 02 Mar 2012 08:13:46 +0100 Subject: [LLVMdev] (Newbie) Using lli with clang++? In-Reply-To: References: Message-ID: <4F50732A.8040800@free.fr> Hi, ... > When I compile directly to an executable with the following command, all is well: > $ clang++ -O3 hello.cpp -o hello > > But when I try to produce a bitcode file, I get an error: > > $ clang++ -O3 -emit-llvm hello.cpp -c -o hello.bc > $ lli hello.bc > LLVM ERROR: Program used external function > '_ZSt16__ostream_insertIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_PKS3_l' > which could not be resolved! > > I'm running this on x86_64. I'd appreciate any help about what I'm doing wrong. first off you need to build with FFI support (configure with --enable-libffi). Then you doubtless need to pass libstdc++ to lli, like this (IIRC): -load=libstdc++.so When you compile with clang++ it automagically adds the C++ standard library to the list of things to link with, which is why you don't notice that the linker is getting passed libstdc++.so. As lli is doing linking too, it also needs libstdc++.so. Ciao, Duncan. From pdox at google.com Fri Mar 2 02:04:51 2012 From: pdox at google.com (David Meyer) Date: Fri, 2 Mar 2012 00:04:51 -0800 Subject: [LLVMdev] "-march" trashing ARM triple Message-ID: ARM subtarget features are determined by parsing the target tuple string TT. (ParseARMTriple(StringRef TT) in ARMMCTargetDesc.cpp) In llc, the -march setting overrides the architecture specified in -mtriple. 
So when you invoke: $ llc -march arm -mtriple armv7-none-linux ... ParseARMTriple() will see TT == "arm-none-linux" instead of "armv7-none-linux". As a result, the target features will be set generically. (Note that using "-march armv7" is not valid.) This is clearly wrong, but I'm not clear on where/how this should be fixed. Does the -march substitution need to happen at all? Could it be disabled only for ARM? Should TargetTriple or -march be made more precise? Thanks, - pdox -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120302/ca064760/attachment.html From konstantin.vladimirov at gmail.com Fri Mar 2 02:20:11 2012 From: konstantin.vladimirov at gmail.com (Konstantin Vladimirov) Date: Fri, 2 Mar 2012 12:20:11 +0400 Subject: [LLVMdev] how to annotate assembler Message-ID: Hi, In GCC there is one useful option -dp (or -dP for more verbose output) to annotate assembler with instruction patterns, that was used when assembler was generated. For example: double test(long long s) { return s; } gcc -S -dp -O0 test.c test: .LFB0: .cfi_startproc pushq %rbp # 18 *pushdi2_rex64/1 [length = 1] .cfi_def_cfa_offset 16 movq %rsp, %rbp # 19 *movdi_1_rex64/2 [length = 3] .cfi_offset 6, -16 .cfi_def_cfa_register 6 movq %rdi, -8(%rbp) # 2 *movdi_1_rex64/4 [length = 4] cvtsi2sdq -8(%rbp), %xmm0 # 6 *floatdidf2_sse_interunit/2 [length = 6] leave # 24 leave_rex64 [length = 1] .cfi_def_cfa 7, 8 ret # 25 return_internal [length = 1] .cfi_endproc Now I may look into config/i386/i386.md, and look for mentioned *pushdi2_rex64, movdi_1_rex64, floatdidf2_sse_interunit and other patterns and study how they work. How to make the same annotation for clang output assembler code? test: .Leh_func_begin1: pushq %rbp # ??? what insn in X86InstrInfo.td? .Llabel1: movq %rsp, %rbp # ??? what insn in X86InstrInfo.td? .Llabel2: movq %rdi, -16(%rbp) # ??? what insn in X86InstrInfo.td? movq -16(%rbp), %rax # ??? 
what insn in X86InstrInfo.td? cvtsi2sdq %rax, %xmm0 # ??? what insn in X86InstrInfo.td? movsd %xmm0, -8(%rbp) # ??? what insn in X86InstrInfo.td? movsd -8(%rbp), %xmm0 # ??? what insn in X86InstrInfo.td? popq %rbp # ??? what insn in X86InstrInfo.td? ret --- With best regards, Konstantin From hkultala at iki.fi Fri Mar 2 02:50:38 2012 From: hkultala at iki.fi (Heikki Kultala) Date: Fri, 02 Mar 2012 10:50:38 +0200 Subject: [LLVMdev] vector shuffle emulation/expand in backend? Message-ID: <4F5089DE.8000606@iki.fi> I'm having some troubles implementing vector support to our custom backend It seems that llvm cannot emulate shuffle with extracts, inserts and builds? I've enabled vector registers with addRegisterClass(MVT::v2i32, TCE::V2I32RegsRegisterClass); addRegisterClass(MVT::v2f32, TCE::V2F32RegsRegisterClass); and created patterns for most vector instructions, including insert, extract and build. I've tried to say setOperationAction(ISD::VECTOR_SHUFFLE, MVT::v2i32, Expand); setOperationAction(ISD::VECTOR_SHUFFLE, MVT::v2f32, Expand); but this does not seem to do anything, I still get LLVM ERROR: Cannot select: 0x1fde870: v2i32 = vector_shuffle 0x1fdda70, 0x1fdea80<1,0> [ID=38] 0x1fdda70: v2i32 = add 0x1fddf70, 0x20540e0 [ORD=2811] [ID=37] ... 0x1fdea80: v2i32 = undef [ID=16] Is there some solution to this? Is this a bug or a feature? From ivanllopard at gmail.com Fri Mar 2 03:19:47 2012 From: ivanllopard at gmail.com (Ivan Llopard) Date: Fri, 02 Mar 2012 10:19:47 +0100 Subject: [LLVMdev] vector shuffle emulation/expand in backend? In-Reply-To: <4F5089DE.8000606@iki.fi> References: <4F5089DE.8000606@iki.fi> Message-ID: <4F5090B3.4070806@gmail.com> Hi Heikki, You can look at SelectionDAGLegalize::ExpandNode() what the default expand implementation does for vector_shuffle nodes. If it does not fit your needs, you may try to custom lower them. Ivan. 
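The emulation asked about above (a shuffle expressed as per-element extracts followed by a build) can be sketched in plain C++, independent of LLVM. This is only an illustration of the vector_shuffle semantics, not the code in SelectionDAGLegalize::ExpandNode(); the names `Vec2` and `shuffle2` are made up for the example, and filling an undef lane with 0 is an arbitrary assumption.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a v2i32 value; this is not an LLVM type.
using Vec2 = std::array<std::int32_t, 2>;

// Emulate a two-input shuffle with scalar "extracts" and a final "build".
// Mask entries 0..1 select from a, entries 2..3 select from b, and a
// negative entry marks an undef lane (any value is allowed; we pick 0).
Vec2 shuffle2(const Vec2 &a, const Vec2 &b, const std::array<int, 2> &mask) {
  Vec2 result{};
  for (std::size_t i = 0; i < mask.size(); ++i) {
    const int m = mask[i];
    assert(m < 4 && "mask index out of range");
    if (m < 0)
      result[i] = 0;                                   // undef lane
    else if (m < 2)
      result[i] = a[static_cast<std::size_t>(m)];      // extract from a
    else
      result[i] = b[static_cast<std::size_t>(m - 2)];  // extract from b
  }
  return result;  // the "build" step: lanes assembled into one vector
}
```

With the mask <1,0> from the error message, `shuffle2` applied to a = {10, 20} and any b yields {20, 10}, which is the lane swap a backend without a native shuffle would have to produce from extract, insert and build patterns.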
On 02/03/2012 09:50, Heikki Kultala wrote: > I'm having some troubles implementing vector support to our custom backend > > It seems that llvm cannot emulate shuffle with extracts, inserts and builds? > > I've enabled vector registers with > > addRegisterClass(MVT::v2i32, TCE::V2I32RegsRegisterClass); > addRegisterClass(MVT::v2f32, TCE::V2F32RegsRegisterClass); > > and created patterns for most vector instructions, including insert, > extract and build. > > I've tried to say > > setOperationAction(ISD::VECTOR_SHUFFLE, MVT::v2i32, Expand); > setOperationAction(ISD::VECTOR_SHUFFLE, MVT::v2f32, Expand); > > but this does not seem to do anything, I still get > > LLVM ERROR: Cannot select: 0x1fde870: v2i32 = vector_shuffle 0x1fdda70, > 0x1fdea80<1,0> [ID=38] > 0x1fdda70: v2i32 = add 0x1fddf70, 0x20540e0 [ORD=2811] [ID=37] > ... > > 0x1fdea80: v2i32 = undef [ID=16] > > > Is there some solution to this? Is this a bug or a feature? > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From anton at korobeynikov.info Fri Mar 2 05:57:58 2012 From: anton at korobeynikov.info (Anton Korobeynikov) Date: Fri, 2 Mar 2012 15:57:58 +0400 Subject: [LLVMdev] Stack alignment on X86 AVX seems incorrect In-Reply-To: <20120302033755.GA23103@britannica.bec.de> References: <4F4FEA3D.1000205@codeaurora.org> <20120302033755.GA23103@britannica.bec.de> Message-ID: Joerg, > At least for 32bit x86 reserving another register as alternative frame > pointer is very heavy. The above would allow normal spill logic to > decide when to keep a reference in register and when not. It also reuses > existing functionality as much as possible. It does not seem to be enough. Even if there are *no* allocas in the function, stack realignment might still be necessary, for example due to the spill of a vector register.
So, we'll need to decide very late whether we'll need realignment or not. -- With best regards, Anton Korobeynikov Faculty of Mathematics and Mechanics, Saint Petersburg State University From anton at korobeynikov.info Fri Mar 2 06:00:36 2012 From: anton at korobeynikov.info (Anton Korobeynikov) Date: Fri, 2 Mar 2012 16:00:36 +0400 Subject: [LLVMdev] how to annotate assembler In-Reply-To: References: Message-ID: Hello > How to make the same annotation for clang output assembler code? There is no such functionality. -- With best regards, Anton Korobeynikov Faculty of Mathematics and Mechanics, Saint Petersburg State University From benny.kra at googlemail.com Fri Mar 2 06:32:55 2012 From: benny.kra at googlemail.com (Benjamin Kramer) Date: Fri, 2 Mar 2012 13:32:55 +0100 Subject: [LLVMdev] how to annotate assembler In-Reply-To: References: Message-ID: On 02.03.2012, at 09:20, Konstantin Vladimirov wrote: > Hi, > > In GCC there is one useful option -dp (or -dP for more verbose output) > to annotate assembler with instruction patterns, that was used when > assembler was generated. For example: The internal "-mllvm -show-mc-inst" option is probably as close as you can get. 
$ clang -S -O0 test.c -mllvm -show-mc-inst -o - _test: ## @test .cfi_startproc ## BB#0: ## %entry pushq %rbp ## > Ltmp2: .cfi_def_cfa_offset 16 Ltmp3: .cfi_offset %rbp, -16 movq %rsp, %rbp ## ## > Ltmp4: .cfi_def_cfa_register %rbp movq %rdi, -8(%rbp) ## ## ## ## ## ## > cvtsi2sdq -8(%rbp), %xmm0 ## ## ## ## ## ## > popq %rbp ## > ret ## .cfi_endproc - Ben > > double > test(long long s) > { > return s; > } > > gcc -S -dp -O0 test.c > > test: > .LFB0: > .cfi_startproc > pushq %rbp # 18 *pushdi2_rex64/1 [length = 1] > .cfi_def_cfa_offset 16 > movq %rsp, %rbp # 19 *movdi_1_rex64/2 [length = 3] > .cfi_offset 6, -16 > .cfi_def_cfa_register 6 > movq %rdi, -8(%rbp) # 2 *movdi_1_rex64/4 [length = 4] > cvtsi2sdq -8(%rbp), %xmm0 # 6 *floatdidf2_sse_interunit/2 [length = 6] > leave # 24 leave_rex64 [length = 1] > .cfi_def_cfa 7, 8 > ret # 25 return_internal [length = 1] > .cfi_endproc > > Now I may look into config/i386/i386.md, and look for mentioned > *pushdi2_rex64, movdi_1_rex64, floatdidf2_sse_interunit and other > patterns and study how they work. > > How to make the same annotation for clang output assembler code? > > test: > .Leh_func_begin1: > pushq %rbp # ??? what insn in X86InstrInfo.td? > .Llabel1: > movq %rsp, %rbp # ??? what insn in X86InstrInfo.td? > .Llabel2: > movq %rdi, -16(%rbp) # ??? what insn in X86InstrInfo.td? > movq -16(%rbp), %rax # ??? what insn in X86InstrInfo.td? > cvtsi2sdq %rax, %xmm0 # ??? what insn in X86InstrInfo.td? > movsd %xmm0, -8(%rbp) # ??? what insn in X86InstrInfo.td? > movsd -8(%rbp), %xmm0 # ??? what insn in X86InstrInfo.td? > popq %rbp # ??? what insn in X86InstrInfo.td? 
> ret > > > --- > With best regards, Konstantin > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From konstantin.vladimirov at gmail.com Fri Mar 2 06:51:36 2012 From: konstantin.vladimirov at gmail.com (Konstantin Vladimirov) Date: Fri, 2 Mar 2012 16:51:36 +0400 Subject: [LLVMdev] how to annotate assembler In-Reply-To: References: Message-ID: Hi, Thank you, it is just what I need. But... it doesn't work for me: $ clang -S -O0 test.c -mllvm -show-mc-inst error: unknown argument: '-show-mc-inst' $ clang --version clang version 1.1 (branches/release_27) Target: x86_64-pc-linux-gnu Thread model: posix May be I need LLVM with higher version, or mention something in configure options? On Fri, Mar 2, 2012 at 4:32 PM, Benjamin Kramer wrote: > > On 02.03.2012, at 09:20, Konstantin Vladimirov wrote: > >> Hi, >> >> In GCC there is one useful option -dp (or -dP for more verbose output) >> to annotate assembler with instruction patterns, that was used when >> assembler was generated. For example: > > The internal "-mllvm -show-mc-inst" option is probably as close as you can get. > > $ clang -S -O0 test.c -mllvm -show-mc-inst -o - > _test: ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?## @test > ? ? ? ?.cfi_startproc > ## BB#0: ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?## %entry > ? ? ? ?pushq ? %rbp ? ? ? ? ? ? ? ? ? ?## ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?## ?> > Ltmp2: > ? ? ? ?.cfi_def_cfa_offset 16 > Ltmp3: > ? ? ? ?.cfi_offset %rbp, -16 > ? ? ? ?movq ? ?%rsp, %rbp ? ? ? ? ? ? ?## ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?## ? > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?## ?> > Ltmp4: > ? ? ? ?.cfi_def_cfa_register %rbp > ? ? ? ?movq ? ?%rdi, -8(%rbp) ? ? ? ? ?## ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?## ? > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?## ? > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?## ? > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?## ? > ? ? ? ? ? ? ? ? ? ? ? ? ? 
## > ## > > cvtsi2sdq -8(%rbp), %xmm0 ## ## > ## > ## > ## > ## > ## > > popq %rbp ## ## > > ret ## > .cfi_endproc > > - Ben > >> >> double >> test(long long s) >> { >> return s; >> } >> >> gcc -S -dp -O0 test.c >> >> test: >> .LFB0: >> .cfi_startproc >> pushq %rbp # 18 *pushdi2_rex64/1 [length = 1] >> .cfi_def_cfa_offset 16 >> movq %rsp, %rbp # 19 *movdi_1_rex64/2 [length = 3] >> .cfi_offset 6, -16 >> .cfi_def_cfa_register 6 >> movq %rdi, -8(%rbp) # 2 *movdi_1_rex64/4 [length = 4] >> cvtsi2sdq -8(%rbp), %xmm0 # 6 *floatdidf2_sse_interunit/2 [length = 6] >> leave # 24 leave_rex64 [length = 1] >> .cfi_def_cfa 7, 8 >> ret # 25 return_internal [length = 1] >> .cfi_endproc >> >> Now I may look into config/i386/i386.md, and look for mentioned >> *pushdi2_rex64, movdi_1_rex64, floatdidf2_sse_interunit and other >> patterns and study how they work. >> >> How to make the same annotation for clang output assembler code? >> >> test: >> .Leh_func_begin1: >> pushq %rbp # ??? what insn in X86InstrInfo.td? >> .Llabel1: >> movq %rsp, %rbp # ??? what insn in X86InstrInfo.td? >> .Llabel2: >> movq %rdi, -16(%rbp) # ??? what insn in X86InstrInfo.td? >> movq -16(%rbp), %rax # ??? what insn in X86InstrInfo.td? >> cvtsi2sdq %rax, %xmm0 # ??? what insn in X86InstrInfo.td? >> movsd %xmm0, -8(%rbp) # ??? what insn in X86InstrInfo.td? >> movsd -8(%rbp), %xmm0 # ??? what insn in X86InstrInfo.td? >> popq %rbp # ??? what insn in X86InstrInfo.td?
>> ?ret >> >> >> --- >> With best regards, Konstantin >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu ? ? ? ? http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > From jochen.wilhelmy at googlemail.com Fri Mar 2 06:55:18 2012 From: jochen.wilhelmy at googlemail.com (Jochen Wilhelmy) Date: Fri, 02 Mar 2012 13:55:18 +0100 Subject: [LLVMdev] replace hardcoded function names by intrinsics Message-ID: <4F50C336.9020200@googlemail.com> Hi! in the llvm code there are several places with hardcoded function names for e.g. sin, sinf, sqrt, sqrtf etc., namely ConstantFolding.cpp InlineCost.cpp SelectionDAGBuilder.cpp IntrinsicLowering.cpp TargetLowering.cpp my question is: wouldn't it be beneficial to use intrinsics for this? for example a c/c++ frontend (clang) could translate the function calls to intrinsics and then in a very late step (IntrinsicLowering.cpp?) translate it back to function calls. an opencl frontend then could use the intrinsics on vector types and ConstantFolding.cpp would work on sin/cos of vector types. currently the intrinsics for sin/cos are missing in ConstantFolding. To summarize, using only intrinsics would reduce complexity and increase flexibility as vector types are supported. -Jochen From chenwj at iis.sinica.edu.tw Fri Mar 2 07:06:37 2012 From: chenwj at iis.sinica.edu.tw (=?utf-8?B?6Zmz6Z+L5Lu7?=) Date: Fri, 2 Mar 2012 21:06:37 +0800 Subject: [LLVMdev] how to annotate assembler In-Reply-To: References: Message-ID: <20120302130637.GA23093@cs.nctu.edu.tw> > > $ clang --version > clang version 1.1 (branches/release_27) ^^^^^^^^^^ Looks suspicious. I am not sure MC related thing appears at that time. Try 3.0 instead. > Target: x86_64-pc-linux-gnu > Thread model: posix Regards, chenwj -- Wei-Ren Chen (???) Computer Systems Lab, Institute of Information Science, Academia Sinica, Taiwan (R.O.C.) 
Tel:886-2-2788-3799 #1667 Homepage: http://people.cs.nctu.edu.tw/~chenwj From Prafulla.Thakare at kpitcummins.com Fri Mar 2 02:13:16 2012 From: Prafulla.Thakare at kpitcummins.com (Prafulla Thakare) Date: Fri, 2 Mar 2012 08:13:16 +0000 Subject: [LLVMdev] LLVM for 8 bit controller Message-ID: <52692704CDDDD34E8E52CB2059AF9D2F32EEA0C0@KCHJEXMB01.kpit.com> Hi, Can I get information on which all 8 bit controllers are currently supported by LLVM project and which are planned (development in progress)? Moreover, I would also like to know, how popular is LLVM for 8 bit controller and what are the advantages of using LLVM over GCC (specifically for 8 bit controllers). We are trying to explore LLVM and would like to gather as much information as possible. Thank you. Regards, Prafulla This message contains information that may be privileged or confidential and is the property of the KPIT Cummins Infosystems Ltd. It is intended only for the person to whom it is addressed. If you are not the intended recipient, you are not authorized to read, print, retain copy, disseminate, distribute, or use this message or any part thereof. If you receive this message in error, please notify the sender immediately and delete all copies of this message. KPIT Cummins Infosystems Ltd. does not accept any liability for virus infected mails. From pbarrio at die.upm.es Fri Mar 2 08:25:37 2012 From: pbarrio at die.upm.es (Pablo Barrio) Date: Fri, 02 Mar 2012 15:25:37 +0100 Subject: [LLVMdev] Interactions between module and loop passes Message-ID: <4F50D861.2080507@die.upm.es> Hi all, I have a code with three passes (one loop pass and two module passes) and my own pass manager. If I schedule the loop pass between the others, my code segfaults. Is there any explanation why loop passes cannot be scheduled between two module passes? Perhaps I misunderstood the behaviour of pass managers. I paste here my "usage" information: int main(...){ Module m = ... 
//Read module PassManager pm; pm.add(new ModPass1); pm.add(new LoopPass); pm.add(new ModPass2); pm.run(m); } class ModPass1 : public ModulePass{ virtual void getAnalysisUsage(AnalysisUsage&AU) const{ AU.setPreservesAll(); } }; class LoopPass : public LoopPass{ virtual void getAnalysisUsage(AnalysisUsage&AU) const{ AU.setRequires(); AU.setPreservesAll(); } }; class ModPass2 : public ModulePass{ virtual void getAnalysisUsage(AnalysisUsage&AU) const{ AU.setRequires(); AU.setPreservesAll(); } }; If I remove any of the passes (updating the usage information), it's OK. If I transform the loop pass into a module pass, it also works. Thanks ahead, -- Pablo Barrio Dpt. Electrical Engineering - Technical University of Madrid Office C-203 Avda. Complutense s/n, 28040 Madrid Tel. (+34) 915495700 ext. 4234 @: pbarrio at die.upm.es From baldrick at free.fr Fri Mar 2 08:36:08 2012 From: baldrick at free.fr (Duncan Sands) Date: Fri, 02 Mar 2012 15:36:08 +0100 Subject: [LLVMdev] Interactions between module and loop passes In-Reply-To: <4F50D861.2080507@die.upm.es> References: <4F50D861.2080507@die.upm.es> Message-ID: <4F50DAD8.8060203@free.fr> Hi Pablo, > I have a code with three passes (one loop pass and two module passes) > and my own pass manager. If I schedule the loop pass between the others, > my code segfaults. when developing with LLVM you should configure with --enable-assertions. That way you should get an assert failure with a helpful message rather than a crash. Is there any explanation why loop passes cannot be > scheduled between two module passes? Perhaps I misunderstood the > behaviour of pass managers. > > I paste here my "usage" information: > > int main(...){ > > Module m = ... 
//Read module > PassManager pm; > > pm.add(new ModPass1); > pm.add(new LoopPass); > pm.add(new ModPass2); > pm.run(m); > > } > > class ModPass1 : public ModulePass{ > > virtual void getAnalysisUsage(AnalysisUsage&AU) const{ > AU.setPreservesAll(); > } > }; > > class LoopPass : public LoopPass{ > > virtual void getAnalysisUsage(AnalysisUsage&AU) const{ > AU.setRequires(); I'm pretty sure a LoopPass cannot require a ModulePass. Ciao, Duncan. > AU.setPreservesAll(); > } > }; > > class ModPass2 : public ModulePass{ > > virtual void getAnalysisUsage(AnalysisUsage&AU) const{ > > AU.setRequires(); > AU.setPreservesAll(); > } > }; > > > If I remove any of the passes (updating the usage information), it's OK. > If I transform the loop pass into a module pass, it also works. > > Thanks ahead, > From hfinkel at anl.gov Fri Mar 2 08:41:48 2012 From: hfinkel at anl.gov (Hal Finkel) Date: Fri, 2 Mar 2012 08:41:48 -0600 Subject: [LLVMdev] replace hardcoded function names by intrinsics In-Reply-To: <4F50C336.9020200@googlemail.com> References: <4F50C336.9020200@googlemail.com> Message-ID: <20120302084148.098e6641@sapling2> On Fri, 02 Mar 2012 13:55:18 +0100 Jochen Wilhelmy wrote: > Hi! > > in the llvm code there are several places with hardcoded function > names for e.g. sin, sinf, sqrt, sqrtf etc., namely > ConstantFolding.cpp > InlineCost.cpp > SelectionDAGBuilder.cpp > IntrinsicLowering.cpp > TargetLowering.cpp > > my question is: wouldn't it be beneficial to use intrinsics for this? > for example a c/c++ > frontend (clang) could translate the function calls to intrinsics and > then in a very late > step (IntrinsicLowering.cpp?) translate it back to function calls. > an opencl frontend then could use the intrinsics on vector types and > ConstantFolding.cpp > would work on sin/cos of vector types. currently the intrinsics for > sin/cos are missing in > ConstantFolding. 
> To summarize, using only intrinsics would reduce complexity and > increase flexibility as > vector types are supported. I also think that this is a good idea. -Hal > > -Jochen > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev -- Hal Finkel Postdoctoral Appointee Leadership Computing Facility Argonne National Laboratory From sarathcse19 at gmail.com Fri Mar 2 08:44:11 2012 From: sarathcse19 at gmail.com (Sarath Chandra) Date: Fri, 2 Mar 2012 20:14:11 +0530 Subject: [LLVMdev] How to use 'opt' command? Message-ID: Hi all, How can I print the analysis results using the 'opt' command? I tried the command below on my module.ll file: opt -analyze -memdep module.ll But it prints: Printing analysis 'Memory Dependence Analysis' for function 'main': Pass::print not implemented for pass: 'Memory Dependence Analysis'! Printing analysis 'Memory Dependence Analysis' for function 'CustomMalloc': Pass::print not implemented for pass: 'Memory Dependence Analysis'! Printing analysis 'Memory Dependence Analysis' for function 'wrapDouble': Pass::print not implemented for pass: 'Memory Dependence Analysis'! Printing analysis 'Memory Dependence Analysis' for function 'wrapString': Pass::print not implemented for pass: 'Memory Dependence Analysis'! Printing analysis 'Memory Dependence Analysis' for function 'isDoubleType': Pass::print not implemented for pass: 'Memory Dependence Analysis'! Printing analysis 'Memory Dependence Analysis' for function 'unwrapValue': Pass::print not implemented for pass: 'Memory Dependence Analysis'! Printing analysis 'Memory Dependence Analysis' for function 'ADD': Pass::print not implemented for pass: 'Memory Dependence Analysis'! -- Sarath -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120302/851d88ef/attachment.html From baldrick at free.fr Fri Mar 2 09:05:17 2012 From: baldrick at free.fr (Duncan Sands) Date: Fri, 02 Mar 2012 16:05:17 +0100 Subject: [LLVMdev] replace hardcoded function names by intrinsics In-Reply-To: <20120302084148.098e6641@sapling2> References: <4F50C336.9020200@googlemail.com> <20120302084148.098e6641@sapling2> Message-ID: <4F50E1AD.8050604@free.fr> Hi, >> in the llvm code there are several places with hardcoded function >> names for e.g. sin, sinf, sqrt, sqrtf etc., namely >> ConstantFolding.cpp >> InlineCost.cpp >> SelectionDAGBuilder.cpp >> IntrinsicLowering.cpp >> TargetLowering.cpp >> >> my question is: wouldn't it be beneficial to use intrinsics for this? >> for example a c/c++ >> frontend (clang) could translate the function calls to intrinsics and >> then in a very late >> step (IntrinsicLowering.cpp?) translate it back to function calls. >> an opencl frontend then could use the intrinsics on vector types and >> ConstantFolding.cpp >> would work on sin/cos of vector types. currently the intrinsics for >> sin/cos are missing in >> ConstantFolding. >> To summarize, using only intrinsics would reduce complexity and >> increase flexibility as >> vector types are supported. > > I also think that this is a good idea. intrinsics don't have the same semantics as the library functions. For example they don't set errno and in general they are less accurate. Thus you can't turn every use of eg sqrt into an intrinsic. However you will still want to constant fold instances of sqrt that weren't turned into an intrinsic, and thus all those names will still need to exist in constant fold etc, so this change wouldn't buy you much. Ciao, Duncan. 
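The difference in semantics described above can be observed from ordinary C++. The sketch below calls the library sqrt on a negative input and records errno right after the call. Whether errno is actually set depends on the platform and on compiler flags (for example -fno-math-errno), so the example only reports what it saw. `SqrtResult` and `checked_sqrt` are names made up for this illustration.

```cpp
#include <cerrno>
#include <cmath>

struct SqrtResult {
  double value;  // result returned by the library call
  int err;       // errno observed right after the call (0 if untouched)
};

// Call the library sqrt and capture the errno side effect, if any.
SqrtResult checked_sqrt(double x) {
  volatile double input = x;  // keep the compiler from folding the call away
  errno = 0;                  // clear any stale error code first
  double r = std::sqrt(input);
  return SqrtResult{r, errno};
}
```

On a libm that reports math errors through errno, `checked_sqrt(-1.0)` returns NaN together with EDOM. A call that has been turned into an intrinsic carries no such side effect, which is why the two forms cannot be exchanged freely.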
From lostfreeman at gmail.com Fri Mar 2 09:08:18 2012 From: lostfreeman at gmail.com (lost) Date: Fri, 2 Mar 2012 19:08:18 +0400 Subject: [LLVMdev] Access Violation using ExecutionEngine on 64-bit Windows 8 Consumer Preview Message-ID: Hi everyone! I've faced a strange problem after updating to Windows 8 Consumer Preview recently. It seems that LLVM inserts 4 calls to the same function at the start of generated code. The function's disassembly (taken from nearby computer with Windows 7) is: 00000000773A0DD0 sub rsp,10h 00000000773A0DD4 mov qword ptr [rsp],r10 00000000773A0DD8 mov qword ptr [rsp+8],r11 00000000773A0DDD xor r11,r11 00000000773A0DE0 lea r10,[rsp+18h] 00000000773A0DE5 sub r10,rax 00000000773A0DE8 cmovb r10,r11 00000000773A0DEC mov r11,qword ptr gs:[10h] 00000000773A0DF5 cmp r10,r11 00000000773A0DF8 jae 00000000773A0E10 00000000773A0DFA and r10w,0F000h 00000000773A0E00 lea r11,[r11-1000h] 00000000773A0E07 mov byte ptr [r11],0 00000000773A0E0B cmp r10,r11 00000000773A0E0E jne 00000000773A0E00 00000000773A0E10 mov r10,qword ptr [rsp] 00000000773A0E14 mov r11,qword ptr [rsp+8] 00000000773A0E19 add rsp,10h 00000000773A0E1D ret That function is called 3 or 4 times from my function like this: 0000000000C700A5 push rax 0000000000C700A6 mov esi,ecx 0000000000C700A8 sub rsp,20h 0000000000C700AC mov rax,76CBC490h 0000000000C700B6 call rax ; this is my call to DebugBreak() which goes first 0000000000C700B8 add rsp,20h 0000000000C700BC mov eax,10h 0000000000C700C1 call 00000000773A0DD0 0000000000C700C6 sub rsp,rax 0000000000C700C9 mov r8,rsp 0000000000C700CC mov dword ptr [r8],0 0000000000C700D3 mov eax,10h 0000000000C700D8 call 00000000773A0DD0 0000000000C700DD sub rsp,rax 0000000000C700E0 mov rdx,rsp 0000000000C700E3 mov dword ptr [rdx],0 0000000000C700E9 mov eax,10h 0000000000C700EE call 00000000773A0DD0 0000000000C700F3 sub rsp,rax 0000000000C700F6 mov rcx,rsp 0000000000C700F9 mov dword ptr [rcx],0 0000000000C700FF mov eax,10h 0000000000C70104 call 
00000000773A0DD0 ; 4 calls to the above function 0000000000C70109 sub rsp,rax 0000000000C7010C mov dword ptr [rsp],0 ; here goes the remaining code of my function 0000000000C70113 mov dword ptr [r8],1 0000000000C7011A mov dword ptr [rdx],2 .... The problem is that on Windows 8 CP the 4 calls to the first function lead nowhere, i.e. to an address in memory that is either not allocated or improperly protected (either the NX bit is set, or Read is not set). Where should I start to debug this behavior? Best regards, Victor Milovanov From S.Parker3 at lboro.ac.uk Fri Mar 2 09:25:40 2012 From: S.Parker3 at lboro.ac.uk (sam parker) Date: Fri, 2 Mar 2012 07:25:40 -0800 (PST) Subject: [LLVMdev] How to use 'opt' command? In-Reply-To: References: Message-ID: <33429234.post@talk.nabble.com> Hi Sarath, I believe opt takes the bitcode file (.bc), not the human-readable form, but I am just a learner... Sam. sarath chandra-5 wrote: > > Hi all, > > How to print the analysis results using 'opt' command? > > I tried using the below command for my *module.ll* file > > *opt -analyze -memdep module.ll* > * > * > But it's printing > > Printing analysis 'Memory Dependence Analysis' for function 'main': > Pass::print not implemented for pass: 'Memory Dependence Analysis'! > Printing analysis 'Memory Dependence Analysis' for function > 'CustomMalloc': > Pass::print not implemented for pass: 'Memory Dependence Analysis'! > Printing analysis 'Memory Dependence Analysis' for function 'wrapDouble': > Pass::print not implemented for pass: 'Memory Dependence Analysis'! > Printing analysis 'Memory Dependence Analysis' for function 'wrapString': > Pass::print not implemented for pass: 'Memory Dependence Analysis'! > Printing analysis 'Memory Dependence Analysis' for function > 'isDoubleType': > Pass::print not implemented for pass: 'Memory Dependence Analysis'! 
> Printing analysis 'Memory Dependence Analysis' for function 'unwrapValue': > Pass::print not implemented for pass: 'Memory Dependence Analysis'! > Printing analysis 'Memory Dependence Analysis' for function 'ADD': > Pass::print not implemented for pass: 'Memory Dependence Analysis'! > > -- > > (?`?.???) > `?.?(?`?.???) > (?`?.???)?.?? Sarath!!! > `?.?.?? > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > -- View this message in context: http://old.nabble.com/How-to-use-%27opt%27-command--tp33428944p33429234.html Sent from the LLVM - Dev mailing list archive at Nabble.com. From hfinkel at anl.gov Fri Mar 2 09:31:32 2012 From: hfinkel at anl.gov (Hal Finkel) Date: Fri, 2 Mar 2012 09:31:32 -0600 Subject: [LLVMdev] replace hardcoded function names by intrinsics In-Reply-To: <4F50E1AD.8050604@free.fr> References: <4F50C336.9020200@googlemail.com> <20120302084148.098e6641@sapling2> <4F50E1AD.8050604@free.fr> Message-ID: <20120302093132.7fdcdf2f@sapling2> On Fri, 02 Mar 2012 16:05:17 +0100 Duncan Sands wrote: > Hi, > > >> in the llvm code there are several places with hardcoded function > >> names for e.g. sin, sinf, sqrt, sqrtf etc., namely > >> ConstantFolding.cpp > >> InlineCost.cpp > >> SelectionDAGBuilder.cpp > >> IntrinsicLowering.cpp > >> TargetLowering.cpp > >> > >> my question is: wouldn't it be beneficial to use intrinsics for > >> this? for example a c/c++ > >> frontend (clang) could translate the function calls to intrinsics > >> and then in a very late > >> step (IntrinsicLowering.cpp?) translate it back to function calls. > >> an opencl frontend then could use the intrinsics on vector types > >> and ConstantFolding.cpp > >> would work on sin/cos of vector types. currently the intrinsics for > >> sin/cos are missing in > >> ConstantFolding. 
> >> To summarize, using only intrinsics would reduce complexity and > >> increase flexibility as > >> vector types are supported. > > > > I also think that this is a good idea. > > intrinsics don't have the same semantics as the library functions. > For example they don't set errno and in general they are less > accurate. Thus you can't turn every use of eg sqrt into an > intrinsic. However you will still want to constant fold instances of > sqrt that weren't turned into an intrinsic, and thus all those names > will still need to exist in constant fold etc, so this change > wouldn't buy you much. In some cases, this will depend on how these things are lowered, if bounds can be put on the input ranges, etc. Otherwise, I think this is a "fast math" kind of optimization. Do you disagree? Would it be useful, for this purpose, to have an (inter-procedural) analysis pass, or some annotation-driven mechanism, or both, to mark errno as "dead" so we don't have to worry about this kind of thing if it is not necessary? -Hal > > Ciao, Duncan. > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev -- Hal Finkel Postdoctoral Appointee Leadership Computing Facility Argonne National Laboratory From chenwj at iis.sinica.edu.tw Fri Mar 2 09:42:08 2012 From: chenwj at iis.sinica.edu.tw (=?utf-8?B?6Zmz6Z+L5Lu7?=) Date: Fri, 2 Mar 2012 23:42:08 +0800 Subject: [LLVMdev] How to use 'opt' command? In-Reply-To: <33429234.post@talk.nabble.com> References: <33429234.post@talk.nabble.com> Message-ID: <20120302154208.GA33084@cs.nctu.edu.tw> > I believe opt takes the bytecode file (.bc) not the human readable form, but > i am just a learner... `opt` can eat *.ll file. You can give it a try. :) Regards, chenwj -- Wei-Ren Chen (???) Computer Systems Lab, Institute of Information Science, Academia Sinica, Taiwan (R.O.C.) 
Tel:886-2-2788-3799 #1667 Homepage: http://people.cs.nctu.edu.tw/~chenwj From baldrick at free.fr Fri Mar 2 09:47:16 2012 From: baldrick at free.fr (Duncan Sands) Date: Fri, 02 Mar 2012 16:47:16 +0100 Subject: [LLVMdev] How to use 'opt' command? In-Reply-To: <33429234.post@talk.nabble.com> References: <33429234.post@talk.nabble.com> Message-ID: <4F50EB84.1070501@free.fr> > I believe opt takes the bytecode file (.bc) not the human readable form, but > i am just a learner... It takes both. Ciao, Duncan. From baldrick at free.fr Fri Mar 2 09:49:56 2012 From: baldrick at free.fr (Duncan Sands) Date: Fri, 02 Mar 2012 16:49:56 +0100 Subject: [LLVMdev] How to use 'opt' command? In-Reply-To: References: Message-ID: <4F50EC24.6060601@free.fr> Hi Sarath, > How to print the analysis results using 'opt' command? > > I tried using the below command for my *module.ll* file > > *opt -analyze -memdep module.ll * > * > * > But it's printing > > Printing analysis 'Memory Dependence Analysis' for function 'main': > Pass::print not implemented for pass: 'Memory Dependence Analysis'! I guess it's not implemented for this analysis! That makes sense because memdep is a lazy analysis, i.e. it only does something when a user of it asks for some specific information. As it doesn't do any analysis spontaneously, there is no analysis result for it to print in your context. Ciao, Duncan. From boran.car at gmail.com Fri Mar 2 10:02:56 2012 From: boran.car at gmail.com (Boran Car) Date: Fri, 02 Mar 2012 17:02:56 +0100 Subject: [LLVMdev] General modular and multiprecision arithmetic Message-ID: <4F50EF30.2060509@gmail.com> Hi, I know there's been some talk about bignums already, this is similar to it, but not exactly the same. I'm currently using LLVM for my master thesis. The goal is to make a compiler for zero-knowledge proofs of knowledge protocols. This compiler should target embedded devices. 
There's a language called the protocol implementation language in which these protocols should be implemented. Here's an excerpt from a simple sample of it: Common ( Z SZKParameter = 80; Prime(1024) p = 17; Prime(160) q = 1; Zmod*(p) y = 1, g=3 ) { } Prover(Zmod+(q) x) { Zmod+(q) _s_1=1, _r_1=4; Def (Void): Round0(Void) { } Def (Zmod*(p) _t_1): Round1(Void) { _r_1 := Random(Zmod+(q)); _t_1 := (g^_r_1); } Def (_s_1): Round2(_C=Int(80) _c) { _s_1 := (_r_1+(x*_c)); } } I have already written a parser and an LLVM front-end for it. The approach I've used so far was to have external functions modexp1024 (and cast all inputs to 1024 - the maximal bitsize allowed for my simple cases), modmul1024 and modadd1024, doing exponentiation, multiplication and addition, respectively. So far I've tried to avoid extending LLVM with group types by using group types only during code generation and backing them with LLVM's arbitrary ints. This required inferring the resulting type (and its modulus) and just plugging it in to modexp1024, modmul... Here's an excerpt of the generated LLVM IR from the frontend: ; ModuleID = 'Prover' @_s_1 = external global i1024 @_r_1 = external global i1024 declare i1024 @Random() declare i1024 @modadd1024(i1024, i1024, i1024) declare i1024 @modsub1024(i1024, i1024, i1024) declare i1024 @modmul1024(i1024, i1024, i1024) declare i1024 @modexp1024(i1024, i1024, i1024) declare i1 @Verify(i1) define void @Round0() { entry: ret void } define i1024 @Round1() { entry: %calltmp = call i1024 @Random() store i1024 %calltmp, i1024* @_r_1 %_r_1 = load i1024* @_r_1 %_t_1 = call i1024 @modexp1024(i1024 3, i1024 %_r_1, i1024 17) ret i1024 %_t_1 } I want this IR to be transformable to multiple architectures (x86, ARM, even 8051 so C++ target code generation is a no-no) with or without a support for a custom coprocessor for modular arithmetic. There are multiple problems I'm facing in the end: 1. C backend will not work with anything bigger than 128 bits 2. 
x86 backend just pushes everything on the stack (naturally) even if I use i1024* -> I want references 3. all the expressions in the parser are assumed to use mod... for doing the operations -> constant folding is made hard, uniformity is lost (you really can't pass non-modular things) -> can't have constant expressions What I need is: 1. Uniform treatment of whatever I plug in via LLVM IR's add (to reuse existing constant folding and just define my own for modular arithmetic) What I think I should do: 1. Add support from group types in LLVM (mostly done) 2. Combine it with LLVM's add, sub, maybe add exp 3. Then edit existing codegens and write maybe a custom one for 8051 What should I actually do? Thanks Boran Car From cameron.mcinally at nyu.edu Fri Mar 2 10:17:13 2012 From: cameron.mcinally at nyu.edu (Cameron McInally) Date: Fri, 2 Mar 2012 11:17:13 -0500 Subject: [LLVMdev] Stack alignment on X86 AVX seems incorrect Message-ID: > > At least for 32bit x86 reserving another register as alternative frame > pointer is very heavy. The above would allow normal spill logic to > decide when to keep a reference in register and when not. It also reuses > existing functionality as much as possible. > Hi Joerg, Yes, this was a problem in my implementation also. Empirically, for the chips I work on, reserving the extra frame register was shown to be a win. But, of course, I am sure this win is not universal. I did receive permission to share my work with the community. Although, without discovering a creative solution to the extra frame register problem, I doubt my patch would be wanted. If anyone is motivated to work out this issue, I would be happy to help. My current thinking is that an emergency spill slot could be set aside to hold the original, ABI conforming, frame pointer. Not an ideal solution, but in my situation where I must cover any code a user throws at me, breaking the ABI and playing with the stack is preferred. 
Thanks, Cameron -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120302/27160242/attachment.html From grosbach at apple.com Fri Mar 2 10:16:57 2012 From: grosbach at apple.com (Jim Grosbach) Date: Fri, 02 Mar 2012 08:16:57 -0800 Subject: [LLVMdev] Stack alignment on X86 AVX seems incorrect In-Reply-To: References: <4F4FEA3D.1000205@codeaurora.org> <20120302033755.GA23103@britannica.bec.de> Message-ID: On Mar 2, 2012, at 3:57 AM, Anton Korobeynikov wrote: > Joerg, > >> At least for 32bit x86 reserving another register as alternative frame >> pointer is very heavy. The above would allow normal spill logic to >> decide when to keep a reference in register and when not. It also reuses >> existing functionality as much as possible. > It does not seem to be enough. Even is there are *no* allocas in the > function the stack realignment might still be necessary, for example > due to spill of vector register. > So, we'll need to decide very late whether we'll need realignment or not. > Absolutely right. Which means it will need to be a heuristic, because regalloc needs to know early whether to reserve the register or not. There's already some of this sort of thing for whether the frame pointer is reserved or not. The ARM backend already does a fair bit of this (see hasBasePointer() and friends) and may be useful as an example of what's involved. 
-Jim > -- > With best regards, Anton Korobeynikov > Faculty of Mathematics and Mechanics, Saint Petersburg State University > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From grosbach at apple.com Fri Mar 2 10:17:57 2012 From: grosbach at apple.com (Jim Grosbach) Date: Fri, 02 Mar 2012 08:17:57 -0800 Subject: [LLVMdev] how to annotate assembler In-Reply-To: <20120302130637.GA23093@cs.nctu.edu.tw> References: <20120302130637.GA23093@cs.nctu.edu.tw> Message-ID: On Mar 2, 2012, at 5:06 AM, ??? wrote: >> >> $ clang --version >> clang version 1.1 (branches/release_27) > ^^^^^^^^^^ > > Looks suspicious. I am not sure MC related thing appears at that time. > Try 3.0 instead. Right. 2.7 is ancient and predates lots of the MC related stuff. -Jim > >> Target: x86_64-pc-linux-gnu >> Thread model: posix > > Regards, > chenwj > > -- > Wei-Ren Chen (???) > Computer Systems Lab, Institute of Information Science, > Academia Sinica, Taiwan (R.O.C.) > Tel:886-2-2788-3799 #1667 > Homepage: http://people.cs.nctu.edu.tw/~chenwj > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From emenezes at codeaurora.org Fri Mar 2 10:32:32 2012 From: emenezes at codeaurora.org (Evandro Menezes) Date: Fri, 02 Mar 2012 10:32:32 -0600 Subject: [LLVMdev] Stack alignment on X86 AVX seems incorrect In-Reply-To: References: Message-ID: <4F50F620.30601@codeaurora.org> Cameron, Figure 3.3 on page 16 of www.x86-64.org/documentation/abi.pdf is not normative. See footnote 7 on the same page. Figure 3.4 on page 21 confirms that the use of a frame pointer is optional. So, if one doesn't use ENTER in the prologue and uses RSP to access local variables, RBP may be used as a callee-saved GPR. 
-- Evandro Menezes Austin, TX emenezes at codeaurora.org Qualcomm Innovation Center, Inc is a member of the Code Aurora Forum On 03/02/12 10:17, Cameron McInally wrote: > At least for 32bit x86 reserving another register as alternative frame > pointer is very heavy. The above would allow normal spill logic to > decide when to keep a reference in register and when not. It also reuses > existing functionality as much as possible. > > > Hi Joerg, > > Yes, this was a problem in my implementation also. Empirically, for the > chips I work on, reserving the extra frame register was shown to be a > win. But, of course, I am sure this win is not universal. > > I did receive permission to share my work with the community. Although, > without discovering a creative solution to the extra frame register > problem, I doubt my patch would be wanted. If anyone is motivated to > work out this issue, I would be happy to help. > > My current thinking is that an emergency spill slot could be set aside > to hold the original, ABI conforming, frame pointer. Not an ideal > solution, but in my situation where I must cover any code a user throws > at me, breaking the ABI and playing with the stack is preferred. > > Thanks, > Cameron > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From cameron.mcinally at nyu.edu Fri Mar 2 10:58:29 2012 From: cameron.mcinally at nyu.edu (Cameron McInally) Date: Fri, 2 Mar 2012 11:58:29 -0500 Subject: [LLVMdev] Stack alignment on X86 AVX seems incorrect In-Reply-To: <4F50F620.30601@codeaurora.org> References: <4F50F620.30601@codeaurora.org> Message-ID: On Fri, Mar 2, 2012 at 11:32 AM, Evandro Menezes wrote: ... > Figure 3.3 on page 16 of www.x86-64.org/documentation/abi.pdf is not > normative. See foot note 7 in the same page. Figure 3.4 on page 21 > confirms that the use of a frame-pointer is optional. 
> > So, if one doesn't use ENTER in the prologue and uses RSP to access local > variables, RBP may be used as a calee-saved GPR. I am not sure if I am completely following. The issue that required aligning the frame to 32 bytes is when there are variable sized objects on the stack (e.g. alloca). In that case, the RBP frame pointer is required to access the spill slots. If I'm not mistaken, calculating the address of spill slots off of RSP would be costly in this case. Are you suggesting that there is a way to base spill slots off of RSP when the stack size is unknown at compile time? This does bring up an interesting idea though. If we wanted to punt, it would be possible to check for variable sized objects on the stack and then only issue unaligned moves for 256b spills/reloads. Not ideal for performance, but it would work as a stopgap. -Cameron -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120302/b604f045/attachment.html From hfinkel at anl.gov Fri Mar 2 11:01:55 2012 From: hfinkel at anl.gov (Hal Finkel) Date: Fri, 2 Mar 2012 11:01:55 -0600 Subject: [LLVMdev] Adjusting Load Latencies Message-ID: <20120302110155.18e9001d@sapling2> Hello, I am interested in writing an analysis pass that looks at the stride used for loads in a loop and passes that information down so that it can be used by the instruction scheduler. The reason is that if the load stride is greater than the cache line size, then I would expect the load to always miss the cache, and, as a result, the scheduler should use a much larger effective latency when scheduling the load and its dependencies. Cache-miss metadata might also be a good supplemental option. I can add methods to TLI that can convert the access stride information into effective latency information, but what is the best way to annotate the loads so that the information will be available to the SDNodes? 
Has anyone tried something like this before? A related issue is automatically adding prefetching to loops. The trick here is to accurately estimate the number of cycles the loop body will take to execute (so that you prefetch the correct amount ahead). This information is not really available until instruction scheduling, and so adding prefetches cannot really complete until just before MC generation (the prefetch instructions can be scheduled, but their constant offset needs to be held free for a while). In addition, estimating the number of cycles also requires relatively accurate load/store latencies, and this, in turn, requires cache-miss latencies to be accounted for (which must then account for the prefetches). If anyone has thoughts on these ideas, I would like to hear them. Thanks again, Hal -- Hal Finkel Postdoctoral Appointee Leadership Computing Facility Argonne National Laboratory From geek4civic at gmail.com Fri Mar 2 11:07:44 2012 From: geek4civic at gmail.com (NAKAMURA Takumi) Date: Sat, 3 Mar 2012 02:07:44 +0900 Subject: [LLVMdev] Access Violation using ExecutionEngine on 64-bit Windows 8 Consumer Preview In-Reply-To: References: Message-ID: Viktor, could you try my patch? I guess they are __chkstk. http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20120220/137577.html ...Takumi From grosbach at apple.com Fri Mar 2 11:11:06 2012 From: grosbach at apple.com (Jim Grosbach) Date: Fri, 02 Mar 2012 09:11:06 -0800 Subject: [LLVMdev] "-march" trashing ARM triple In-Reply-To: References: Message-ID: <6FE2499E-DA0F-4D06-AC2E-E880015ABC5A@apple.com> On Mar 2, 2012, at 12:04 AM, David Meyer wrote: > ARM subtarget features are determined by parsing the target triple string TT. (ParseARMTriple(StringRef TT) in ARMMCTargetDesc.cpp) > > In llc, the -march setting overrides the architecture specified in -mtriple. So when you invoke: > > $ llc -march arm -mtriple armv7-none-linux ... 
> > ParseARMTriple() will see TT == "arm-none-linux" instead of "armv7-none-linux". As a result, the target features will be set generically. (Note that using "-march armv7" is not valid.) > > This is clearly wrong, but I'm not clear on where/how this should be fixed. Does the -march substitution need to happen at all? Could it be disabled only for ARM? Should TargetTriple or -march be made more precise? > When using a triple, -march doesn't add any additional information. The idea is that -march is a shorthand for a generic triple (e.g., -march=arm implies -mtriple=arm-unknown-unknown or something similar). It seems to me that using both on the llc command line should issue a diagnostic. -Jim > Thanks, > - pdox > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From anton at korobeynikov.info Fri Mar 2 11:14:05 2012 From: anton at korobeynikov.info (Anton Korobeynikov) Date: Fri, 2 Mar 2012 21:14:05 +0400 Subject: [LLVMdev] Access Violation using ExecutionEngine on 64-bit Windows 8 Consumer Preview In-Reply-To: References: Message-ID: Takumi, > Viktor, could you try my patch? I guess they are __chkstk. 4 are definitely too much :) If they are emitted - this is a bug. -- With best regards, Anton Korobeynikov Faculty of Mathematics and Mechanics, Saint Petersburg State University From pbarrio at die.upm.es Fri Mar 2 11:14:30 2012 From: pbarrio at die.upm.es (Pablo Barrio) Date: Fri, 02 Mar 2012 18:14:30 +0100 Subject: [LLVMdev] Interactions between module and loop passes In-Reply-To: <4F50DAD8.8060203@free.fr> References: <4F50D861.2080507@die.upm.es> <4F50DAD8.8060203@free.fr> Message-ID: <4F50FFF6.4060000@die.upm.es> Hi Duncan, > Hi Pablo, > >> I have a code with three passes (one loop pass and two module passes) >> and my own pass manager. If I schedule the loop pass between the others, >> my code segfaults. 
> when developing with LLVM you should configure with --enable-assertions. > That way you should get an assert failure with a helpful message rather > than a crash. Sorry, I forgot to add the assertion failure: PassManager.cpp:540: void llvm::PMTopLevelManager::setLastUser(const llvm::SmallVectorImpl&, llvm::Pass*): Assertion `AnalysisPass && "Expected analysis pass to exist."' failed. > class ModPass1 : public ModulePass{ > > virtual void getAnalysisUsage(AnalysisUsage&AU) const{ > AU.setPreservesAll(); > } > }; > > class LoopPass : public LoopPass{ > > virtual void getAnalysisUsage(AnalysisUsage&AU) const{ > AU.setRequires(); > I'm pretty sure a LoopPass cannot require a ModulePass. Is it possible to overcome this limitation? I need to access and modify the loops in a function. Is it possible to do that from the function itself, or is the loop pass the only way to get Loop objects? If I can do it from a Module (or another) pass, I don't mind. Loop passes just sound to me like the most straightforward way. Thanks for your time, -- Pablo Barrio Dpt. Electrical Engineering - Technical University of Madrid Office C-203 Avda. Complutense s/n, 28040 Madrid Tel. (+34) 915495700 ext. 4234 @: pbarrio at die.upm.es From joerg at britannica.bec.de Fri Mar 2 11:16:10 2012 From: joerg at britannica.bec.de (Joerg Sonnenberger) Date: Fri, 2 Mar 2012 18:16:10 +0100 Subject: [LLVMdev] Stack alignment on X86 AVX seems incorrect In-Reply-To: References: <4F50F620.30601@codeaurora.org> Message-ID: <20120302171610.GA1696@britannica.bec.de> On Fri, Mar 02, 2012 at 11:58:29AM -0500, Cameron McInally wrote: > On Fri, Mar 2, 2012 at 11:32 AM, Evandro Menezes > wrote: > ... > > Figure 3.3 on page 16 of www.x86-64.org/documentation/abi.pdf is not > > normative. See foot note 7 in the same page. Figure 3.4 on page 21 > > confirms that the use of a frame-pointer is optional. 
> > > > So, if one doesn't use ENTER in the prologue and uses RSP to access local > > variables, RBP may be used as a callee-saved GPR. > > I am not sure if I am completely following. The issue that required > aligning the frame to 32 bytes is when there are variable sized objects on > the stack (e.g. alloca). In that case, the RBP frame pointer is required to > access the spill slots. If I'm not mistaken, calculating the address of > spill slots off of RSP would be costly in this case. No, stack realignment needs to happen if there are auto variables on the stack of types that need a larger alignment than the default. This currently means AVX vectors for x86-64 and SSE/AVX vectors for x86-32 following the original sysv ABI. In that case %rbp/%ebp is used to reference the original arguments on the stack and %rsp/%esp is used to reference the auto variables. This doesn't work though if dynamic allocas exist, so either stack variables with larger alignment need to be turned into / remain as dynamic allocas OR another register is needed to replace %rsp/%esp in the above. > This does bring up an interesting idea though. If we wanted to punt, it > would be possible to check for variable sized objects on the stack and then > only issue unaligned moves for 256b spills/reloads. Not ideal for > performance, but it would work as a stopgap. The problem is worse on x86-32 following the original sysv ABI. In that case both GCC and LLVM currently just create broken code if a function uses both SSE instructions and alloca. 
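The frame shape discussed above (an automatic variable that needs more than the default stack alignment, sitting next to a dynamically sized allocation) can be reproduced with a short C++ sketch. This is only an illustration under stated assumptions: it relies on the GCC/Clang builtin __builtin_alloca and on the compiler accepting alignas(32) for automatic variables, and the function names is_aligned and overaligned_local_next_to_alloca are hypothetical. It does not claim to reproduce the miscompilation mentioned in this thread; it just builds a function whose frame needs both realignment and a dynamic stack adjustment, and checks the address the compiler actually produced.

```cpp
#include <cstddef>
#include <cstdint>

// True if p is aligned to `alignment` bytes (alignment must be a power of two).
bool is_aligned(const void* p, std::size_t alignment) {
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}

// Hypothetical test function: one over-aligned local plus one runtime-sized
// stack allocation in the same frame.
bool overaligned_local_next_to_alloca(std::size_t n) {
    alignas(32) float vec[8] = {0};  // wants 32-byte alignment, like an AVX value
    // GCC/Clang builtin (assumption: not available on every compiler).
    char* dynamic = static_cast<char*>(__builtin_alloca(n + 1));
    for (std::size_t i = 0; i <= n; ++i) {
        dynamic[i] = static_cast<char>(i);  // touch the memory so it is kept
    }
    vec[0] = dynamic[n];
    volatile float sink = vec[0];  // keep vec live in memory
    (void)sink;
    return is_aligned(vec, 32);
}
```

With a compiler that realigns the frame correctly, overaligned_local_next_to_alloca(n) should return true for any reasonable n; a false result would point at the kind of frame-layout problem described in this thread.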
Joerg From grosbach at apple.com Fri Mar 2 11:19:55 2012 From: grosbach at apple.com (Jim Grosbach) Date: Fri, 02 Mar 2012 09:19:55 -0800 Subject: [LLVMdev] Stack alignment on X86 AVX seems incorrect In-Reply-To: <20120302171610.GA1696@britannica.bec.de> References: <4F50F620.30601@codeaurora.org> <20120302171610.GA1696@britannica.bec.de> Message-ID: <8BB04D72-98A5-4A0B-9440-2010B79ECCD3@apple.com> On Mar 2, 2012, at 9:16 AM, Joerg Sonnenberger wrote: > On Fri, Mar 02, 2012 at 11:58:29AM -0500, Cameron McInally wrote: >> On Fri, Mar 2, 2012 at 11:32 AM, Evandro Menezes >> wrote: >> ... >>> Figure 3.3 on page 16 of www.x86-64.org/documentation/abi.pdf is not >>> normative. See foot note 7 in the same page. Figure 3.4 on page 21 >>> confirms that the use of a frame-pointer is optional. >>> >>> So, if one doesn't use ENTER in the prologue and uses RSP to access local >>> variables, RBP may be used as a calee-saved GPR. >> >> I am not sure if I am completely following. The issue that required >> aligning the frame to 32 bytes is when there are variable sized objects on >> the stack (e.g. alloca). In that case, the RBP frame pointer is required to >> access the spill slots. If I'm not mistaken, calculating the address of >> spill slots off of RSP would be costly in this case. > > No, stack realignment needs to happen if there are auto variables on the > stack of types that need a larger alignment than the default. This > currently means AVX vectors for x86-64 and SSE/AVX vectors for x86-32 > folloing the original sysv ABI. In that case %rbp/%ebp is used to > reference the original arguments on the stack and %rsp/%esp is used to > reference the auto variables. > > This doesn't work though if dynamic allocas exist, so either stack > variables with larger alignment need to be turned into / remain as > dynamic allocas OR another register is needed to replace %rsp/%esp > in the above. > Exactly right. >> This does bring up an interesting idea though. 
If we wanted to punt, it >> would be possible to check for variable sized objects on the stack and then >> only issue unaligned moves for 256b spills/reloads. Not ideal for >> performance, but it would work as a stopgap. > > The problem is worse on x86-32 following the original sysv ABI. In that > case both GCC and LLVM currently just create broken code if a function > uses both SSE instructions and alloca. > > Joerg From neonomaly.x at gmail.com Fri Mar 2 11:30:42 2012 From: neonomaly.x at gmail.com (Mikhail) Date: Fri, 2 Mar 2012 21:30:42 +0400 Subject: [LLVMdev] IR + Module Pass Message-ID: Hi. Can I know that in Module *M Function* F is a method of StructType *st? Also can I know that in this StructType the F has mark "virtual"? Thanks! Yours sincerely, Kadysev Mikhail From babslachem at gmail.com Fri Mar 2 11:30:46 2012 From: babslachem at gmail.com (Seb) Date: Fri, 2 Mar 2012 18:30:46 +0100 Subject: [LLVMdev] Question on debug information Message-ID: Hi all, I'm using my own front-end to generate the following .ll file targeting 32-bit x86:

; ModuleID = 'check.c'
target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:64:64-v128:128:128-a0:0:64-f80:32:32-n8:16:32"
target triple = "i386-pc-linux-gnu"

@.str581 = internal constant [52 x i8] c"---- test number %d failed. result %d expected %d\0a\00"
@.str584 = internal constant [61 x i8] c"---- %3d tests completed. %d tests PASSED. %d tests failed.\0a\00"
@.str587 = internal constant [61 x i8] c"---- %3d tests completed. %d tests passed. %d tests FAILED.\0a\00"

define void @check(i32* %result, i32* %expect, i32 %n) {
L.entry:
  %tests_passed = alloca i32
  %tests_failed = alloca i32
  %i = alloca i32
  call void @llvm.dbg.value (metadata !{i32* %result}, i64 0, metadata !9), !dbg !4
  call void @llvm.dbg.value (metadata !{i32* %expect}, i64 0, metadata !10), !dbg !4
  call void @llvm.dbg.value (metadata !{i32 %n}, i64 0, metadata !11), !dbg !4
  call void @llvm.dbg.declare (metadata !{i32* %tests_passed}, metadata !13), !dbg !4
  store i32 0, i32* %tests_passed, !dbg !12
  call void @llvm.dbg.declare (metadata !{i32* %tests_failed}, metadata !15), !dbg !4
  store i32 0, i32* %tests_failed, !dbg !14
  call void @llvm.dbg.declare (metadata !{i32* %i}, metadata !17), !dbg !4
  store i32 0, i32* %i, !dbg !16
  br label %L.B0000
L.B0000:
  %0 = load i32* %i, !dbg !16
  %1 = icmp sge i32 %0, %n, !dbg !16
  br i1 %1, label %L.B0001, label %L.B0008, !dbg !16
L.B0008:
  %2 = bitcast i32* %expect to i8*, !dbg !18
  %3 = load i32* %i, !dbg !18
  %4 = mul i32 %3, 4, !dbg !18
  %5 = getelementptr i8* %2, i32 %4, !dbg !18
  %6 = bitcast i8* %5 to i32*, !dbg !18
  %7 = load i32* %6, !dbg !18
  %8 = bitcast i32* %result to i8*, !dbg !18
  %9 = load i32* %i, !dbg !18
  %10 = mul i32 %9, 4, !dbg !18
  %11 = getelementptr i8* %8, i32 %10, !dbg !18
  %12 = bitcast i8* %11 to i32*, !dbg !18
  %13 = load i32* %12, !dbg !18
  %14 = icmp ne i32 %7, %13, !dbg !18
  br i1 %14, label %L.B0003, label %L.B0009, !dbg !18
L.B0009:
  %15 = load i32* %tests_passed, !dbg !18
  %16 = add i32 %15, 1, !dbg !18
  store i32 %16, i32* %tests_passed, !dbg !18
  br label %L.B0004, !dbg !19
L.B0003:
  %17 = load i32* %tests_failed, !dbg !20
  %18 = add i32 %17, 1, !dbg !20
  store i32 %18, i32* %tests_failed, !dbg !20
  %19 = bitcast [52 x i8]* @.str581 to i8*, !dbg !21
  %20 = load i32* %i, !dbg !21
  %21 = bitcast i32* %result to i8*, !dbg !21
  %22 = load i32* %i, !dbg !21
  %23 = mul i32 %22, 4, !dbg !21
  %24 = getelementptr i8* %21, i32 %23, !dbg !21
  %25 = bitcast i8* %24 to i32*, !dbg !21
  %26 = load i32* %25, !dbg !21
  %27 = bitcast i32* %expect to i8*, !dbg !21
  %28 = load i32* %i, !dbg !21
  %29 = mul i32 %28, 4, !dbg !21
  %30 = getelementptr i8* %27, i32 %29, !dbg !21
  %31 = bitcast i8* %30 to i32*, !dbg !21
  %32 = load i32* %31, !dbg !21
  %33 = call i32 (i8*, ...)* @printf (i8* %19, i32 %20, i32 %26, i32 %32), !dbg !21
  br label %L.B0004
L.B0004:
  %34 = load i32* %i, !dbg !22
  %35 = add i32 %34, 1, !dbg !22
  store i32 %35, i32* %i, !dbg !22
  br label %L.B0000, !dbg !22
L.B0001:
  %36 = load i32* %tests_failed, !dbg !23
  %37 = icmp ne i32 %36, 0, !dbg !23
  br i1 %37, label %L.B0006, label %L.B0010, !dbg !23
L.B0010:
  %38 = bitcast [61 x i8]* @.str584 to i8*, !dbg !24
  %39 = load i32* %tests_passed, !dbg !24
  %40 = load i32* %tests_failed, !dbg !24
  %41 = call i32 (i8*, ...)* @printf (i8* %38, i32 %n, i32 %39, i32 %40), !dbg !24
  br label %L.B0007, !dbg !25
L.B0006:
  %42 = bitcast [61 x i8]* @.str587 to i8*, !dbg !26
  %43 = load i32* %tests_passed, !dbg !26
  %44 = load i32* %tests_failed, !dbg !26
  %45 = call i32 (i8*, ...)* @printf (i8* %42, i32 %n, i32 %43, i32 %44), !dbg !26
  br label %L.B0007
L.B0007:
  ret void, !dbg !27
}

declare void @llvm.dbg.value(metadata, i64, metadata)
declare void @llvm.dbg.declare(metadata, metadata)
declare i32 @printf(i8*,...)

!llvm.dbg.sp = !{!3}
!llvm.dbg.lv.check = !{!9, !10, !11}
!0 = metadata !{i32 589841, i32 0, i32 2, metadata !"check.c", metadata !".", metadata !" Seb Rel Dev-r02.27", i1 1, i1 0, metadata !"", i32 0} ; DW_TAG_compile_unit
!1 = metadata !{i32 589865, metadata !"check.c", metadata !".", metadata !0} ; DW_TAG_file_type
!2 = metadata !{i32 589845, metadata !1, metadata !"", metadata !1, i32 0, i64 0, i64 0, i32 0, i32 0, i32 0, null, i32 0, i32 0} ; DW_TAG_subroutine_type
!3 = metadata !{i32 589870, i32 0, metadata !1, metadata !"check", metadata !"check", metadata !"", metadata !1, i32 7, metadata !2, i1 0, i1 1, i32 0, i32 0, i32 0, i32 0, i1 0, void (i32*, i32*, i32)* @check} ; DW_TAG_subprogram
!4 = metadata !{i32 0, i32 0, metadata !3, null}
!5 = metadata !{i32 589835, metadata !3, i32 7, i32 0, metadata !1, i32 0} ; DW_TAG_lexical_block
!6 = metadata !{i32 0, i32 0, metadata !5, null}
!7 = metadata !{i32 589860, metadata !0, metadata !"int", null, i32 0, i64 32, i64 32, i64 0, i32 0, i32 5} ; DW_TAG_base_type
!8 = metadata !{i32 589839, metadata !0, metadata !"", null, i32 0, i64 32, i64 32, i64 0, i32 0, metadata !7} ; DW_TAG_pointer_type
!9 = metadata !{i32 590081, metadata !3, metadata !"result", metadata !1, i32 16777216, metadata !8, i32 0} ; DW_TAG_arg_variable
!10 = metadata !{i32 590081, metadata !3, metadata !"expect", metadata !1, i32 33554432, metadata !8, i32 0} ; DW_TAG_arg_variable
!11 = metadata !{i32 590081, metadata !3, metadata !"n", metadata !1, i32 50331648, metadata !7, i32 0} ; DW_TAG_arg_variable
!12 = metadata !{i32 9, i32 0, metadata !5, null}
!13 = metadata !{i32 590080, metadata !5, metadata !"tests_passed", metadata !1, i32 0, metadata !7, i32 0} ; DW_TAG_auto_variable
!14 = metadata !{i32 10, i32 0, metadata !5, null}
!15 = metadata !{i32 590080, metadata !5, metadata !"tests_failed", metadata !1, i32 0, metadata !7, i32 0} ; DW_TAG_auto_variable
!16 = metadata !{i32 12, i32 0, metadata !5, null}
!17 = metadata !{i32 590080, metadata !5, metadata !"i", metadata !1, i32 0, metadata !7, i32 0} ; DW_TAG_auto_variable
!18 = metadata !{i32 13, i32 0, metadata !5, null}
!19 = metadata !{i32 14, i32 0, metadata !5, null}
!20 = metadata !{i32 15, i32 0, metadata !5, null}
!21 = metadata !{i32 17, i32 0, metadata !5, null}
!22 = metadata !{i32 19, i32 0, metadata !5, null}
!23 = metadata !{i32 20, i32 0, metadata !5, null}
!24 = metadata !{i32 22, i32 0, metadata !5, null}
!25 = metadata !{i32 23, i32 0, metadata !5, null}
!26 = metadata !{i32 25, i32 0, metadata !5, null}
!27 = metadata !{i32 26, i32 0, metadata !5, null}

When I run llc 2.9 as follows:
  llc check.ll -march=x86 -o check.s
and then
  gcc -m32 -c check.s
I get a check.o file that targets 32-bit x86. Reading the DWARF info with readelf --debug-dump check.o, I get this for the 'n' parameter:
  <2><71>: Abbrev Number: 3 (DW_TAG_formal_parameter)
     <72> DW_AT_name : n
     <74> DW_AT_type : <0xb3>
     <78> DW_AT_location : 0x0 (location list)
I would have expected a DW_AT_location that is FP-relative rather than 0x0. Is my .ll file incorrect? Is there something I can put in the metadata to force an FP-relative DW_AT_location to be generated? Thanks for your answers. Best regards, Seb From cameron.mcinally at nyu.edu Fri Mar 2 12:24:59 2012 From: cameron.mcinally at nyu.edu (Cameron McInally) Date: Fri, 2 Mar 2012 13:24:59 -0500 Subject: [LLVMdev] Stack alignment on X86 AVX seems incorrect Message-ID: Two responses inline... > My current thinking is that an emergency spill slot could be set aside to > hold the original, ABI conforming, frame pointer. Not an ideal solution, > but in my situation where I must cover any code a user throws at me, > breaking the ABI and playing with the stack is preferred. > > Ah, this is not a good idea. I examined this a while back. The issue is that spilling the base pointer causes two levels of indirection to access arguments.
> > > Figure 3.3 on page 16 of www.x86-64.org/documentation/abi.pdf is not > > > normative. See footnote 7 on the same page. Figure 3.4 on page 21 > > > confirms that the use of a frame-pointer is optional. > > > > > > So, if one doesn't use ENTER in the prologue and uses RSP to access local > > > variables, RBP may be used as a callee-saved GPR. > > > > I am not sure if I am completely following. The issue that required > > aligning the frame to 32 bytes is when there are variable sized objects on > > the stack (e.g. alloca). In that case, the RBP frame pointer is required to > > access the spill slots. If I'm not mistaken, calculating the address of > > spill slots off of RSP would be costly in this case. > > No, stack realignment needs to happen if there are auto variables on the > stack of types that need a larger alignment than the default. This > currently means AVX vectors for x86-64 and SSE/AVX vectors for x86-32 > following the original sysv ABI. In that case %rbp/%ebp is used to > reference the original arguments on the stack and %rsp/%esp is used to > reference the auto variables. > That sounds about right; my mistake. When I realign the frame in the presence of variable sized objects and AVX spills, I have three pointers sitting around: the real, unaligned frame pointer (let's say RBX and used as the 'base pointer'); the aligned frame pointer (RBP); and the stack pointer (RSP). The arguments are based off of the unaligned frame pointer. Besides the change to make RBX the base pointer in the Emit[Prologue|Epilogue] routines, everything else stayed the same. -Cameron
From baldrick at free.fr Fri Mar 2 12:59:24 2012 From: baldrick at free.fr (Duncan Sands) Date: Fri, 02 Mar 2012 19:59:24 +0100 Subject: [LLVMdev] IR + Module Pass In-Reply-To: References: Message-ID: <4F51188C.3080603@free.fr> Hi Mikhail, > Can I know that in Module *M Function* F is a method of StructType *st? Also can > I know that in this StructType the F has mark "virtual"? no, LLVM IR is too low level for this. However you may be able to work it out from debug information. Perhaps you should be working at the clang AST level? What do you want to do with this information? Ciao, Duncan. From jochen.wilhelmy at googlemail.com Fri Mar 2 13:35:00 2012 From: jochen.wilhelmy at googlemail.com (Jochen Wilhelmy) Date: Fri, 02 Mar 2012 20:35:00 +0100 Subject: [LLVMdev] replace hardcoded function names by intrinsics In-Reply-To: <20120302084148.098e6641@sapling2> References: <4F50C336.9020200@googlemail.com> <20120302084148.098e6641@sapling2> Message-ID: <4F5120E4.5010702@googlemail.com> >> >> To summarize, using only intrinsics would reduce complexity and >> increase flexibility as >> vector types are supported. > I also think that this is a good idea. The first step could be doing it for sin, cos and sqrt, for which intrinsics already exist. -Jochen From afylot at gmail.com Fri Mar 2 11:13:39 2012 From: afylot at gmail.com (simona bellavista) Date: Fri, 2 Mar 2012 18:13:39 +0100 Subject: [LLVMdev] make check-all : errors in clang and llvm Message-ID: I downloaded via svn the release_30 and current version code.
I am on x86_64 GNU/Linux, I am compiling with gcc 4.4.6 I compiled release_30 with make ENABLE_OPTIMIZED=0 OPTIMIZE_OPTION=-O0 and current release with make In both cases, when I make check-all I get : FAIL: Clang :: Preprocessor/macro_paste_c_block_comment.c (2562 of 9598) ******************** TEST 'Clang :: Preprocessor/macro_paste_c_block_comment.c' FAILED ******************** Script: -- /scratch/user/download/release_30/build/Debug/bin/clang -cc1 -internal-isystem /scratch/user/download/release_30/build/Debug/bin/../lib/clang/3.0/include /scratch/user/download/release_30/llvm/tools/clang/test/Preprocessor/macro_paste_c_block_comment.c -Eonly 2>&1 | grep error /scratch/user/download/release_30/build/Debug/bin/clang -cc1 -internal-isystem /scratch/user/download/release_30/build/Debug/bin/../lib/clang/3.0/include /scratch/user/download/release_30/llvm/tools/clang/test/Preprocessor/macro_paste_c_block_comment.c -Eonly 2>&1 | not grep unterminated /scratch/user/download/release_30/build/Debug/bin/clang -cc1 -internal-isystem /scratch/user/download/release_30/build/Debug/bin/../lib/clang/3.0/include /scratch/user/download/release_30/llvm/tools/clang/test/Preprocessor/macro_paste_c_block_comment.c -Eonly 2>&1 | not grep scratch -- Exit Code: 1 Command Output (stdout): -- /scratch/user/download/release_30/llvm/tools/clang/test/Preprocessor/macro_paste_c_block_comment.c:6:1: error: pasting formed '/*', an invalid preprocessing token 1 error generated. 
/scratch/user/download/release_30/llvm/tools/clang/test/Preprocessor/macro_paste_c_block_comment.c:6:1: error: pasting formed '/*', an invalid preprocessing token /scratch/user/download/release_30/llvm/tools/clang/test/Preprocessor/macro_paste_c_block_comment.c:5:16: note: expanded from: -- ******************** FAIL: LLVM :: Transforms/GVN/null-aliases-nothing.ll (8045 of 9598) ******************** TEST 'LLVM :: Transforms/GVN/null-aliases-nothing.ll' FAILED ******************** Script: -- /scratch/user/download/release_30/build/Debug/bin/opt /scratch/user/download/release_30/llvm/test/Transforms/GVN/null-aliases-nothing.ll -basicaa -gvn -S | /scratch/user/download/release_30/build/Debug/bin/FileCheck /scratch/user/download/release_30/llvm/test/Transforms/GVN/null-aliases-nothing.ll -- Exit Code: 1 Command Output (stderr): -- :9:12: error: CHECK-NOT: string occurred! %before = load i32* %p ^ /scratch/user/download/release_30/llvm/test/Transforms/GVN/null-aliases-nothing.ll:18:18: note: CHECK-NOT: pattern specified here ; CHECK-NOT: load ^ -- From baldrick at free.fr Fri Mar 2 14:29:05 2012 From: baldrick at free.fr (Duncan Sands) Date: Fri, 02 Mar 2012 21:29:05 +0100 Subject: [LLVMdev] how to remove inlined function In-Reply-To: <4F4DF889.4060901@googlemail.com> References: <4F4DF889.4060901@googlemail.com> Message-ID: <4F512D91.6010808@free.fr> Hi Jochen, > I'm using clang/llvm 3.0 release and I have a module that is generated > by clang > with some functions declared as inline. after inlining > (llvm::createFunctionInliningPass) > I'd like to remove the functions that were inlined. how can this be done? > surprisingly they are removed if a print pass > (llvm::createPrintModulePass) is > present. is there an explanation for this? why do you think they are not removed?
Ciao, Duncan. From lostfreeman at gmail.com Fri Mar 2 14:52:48 2012 From: lostfreeman at gmail.com (lost) Date: Sat, 3 Mar 2012 00:52:48 +0400 Subject: [LLVMdev] Access Violation using ExecutionEngine on 64-bit Windows 8 Consumer Preview In-Reply-To: References: Message-ID: Hi, Takumi! I tried your patch, and it did not help. Moreover, I tried to compile under Windows 7 and copy files to Windows 8, and received the same exception. So the problem seems to be in Windows 8 itself or some non-portable code inside LLVM. Could anyone tell me what LLVM code in ExecutionEngine is responsible for allocating and protecting memory for generated native functions? Best regards, Victor Milovanov. 2012/3/2 NAKAMURA Takumi : > Viktor, could you try my patch? I guess they are __chkstk. > > http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20120220/137577.html > > ...Takumi From nadav.rotem at intel.com Fri Mar 2 15:04:15 2012 From: nadav.rotem at intel.com (Rotem, Nadav) Date: Fri, 2 Mar 2012 21:04:15 +0000 Subject: [LLVMdev] Access Violation using ExecutionEngine on 64-bit Windows 8 Consumer Preview In-Reply-To: References: Message-ID: <7DE70FDACDE4CD4887C4278C12A2E3050A1EF2@HASMSX104.ger.corp.intel.com> Hi Victor, Try this fix by Marina Yatsina: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20120220/137532.html Nadav -----Original Message----- From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of lost Sent: Friday, March 02, 2012 22:53 To: NAKAMURA Takumi; LLVM Subject: Re: [LLVMdev] Access Violation using ExecutionEngine on 64-bit Windows 8 Consumer Preview Hi, Takumi! I tried your patch, and it did not help. Moreover, I tried to compile under Windows 7 and copy files to Windows 8, and received the same exception. So the problem seems to be in Windows 8 itself or some non-portable code inside LLVM. 
Could anyone tell me what LLVM code in ExecutionEngine is responsible for allocating and protecting memory for generated native functions? Best regards, Victor Milovanov. 2012/3/2 NAKAMURA Takumi : > Viktor, could you try my patch? I guess they are __chkstk. > > http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20120220/137577.html > > ...Takumi From emenezes at codeaurora.org Fri Mar 2 15:38:03 2012 From: emenezes at codeaurora.org (Evandro Menezes) Date: Fri, 02 Mar 2012 15:38:03 -0600 Subject: [LLVMdev] Stack alignment on X86 AVX seems incorrect In-Reply-To: References: <4F50F620.30601@codeaurora.org> Message-ID: <4F513DBB.6030304@codeaurora.org> Cameron, I was the one not completely following you. I missed the detail about variable-sized variables on the stack. -- Evandro Menezes Austin, TX emenezes at codeaurora.org Qualcomm Innovation Center, Inc is a member of the Code Aurora Forum On 03/02/12 10:58, Cameron McInally wrote: > On Fri, Mar 2, 2012 at 11:32 AM, Evandro Menezes > > wrote: > ... > > Figure 3.3 on page 16 of www.x86-64.org/documentation/abi.pdf is not > > normative. See footnote 7 on the same page. Figure 3.4 on page 21 > > confirms that the use of a frame-pointer is optional. > > > > So, if one doesn't use ENTER in the prologue and uses RSP to access local > > variables, RBP may be used as a callee-saved GPR. > > I am not sure if I am completely following.
The issue that required > aligning the frame to 32 bytes is when there are variable sized objects > on the stack (e.g. alloca). In that case, the RBP frame pointer is > required to access the spill slots. If I'm not mistaken, calculating the > address of spill slots off of RSP would be costly in this case. > > Are you suggesting that there is a way to base spill slots off of RSP > when the stack size is unknown at compile time? > > This does bring up an interesting idea though. If we wanted to punt, it > would be possible to check for variable sized objects on the stack and > then only issue unaligned moves for 256b spills/reloads. Not ideal for > performance, but it would work as a stopgap. > > -Cameron From atrick at apple.com Fri Mar 2 15:49:48 2012 From: atrick at apple.com (Andrew Trick) Date: Fri, 02 Mar 2012 13:49:48 -0800 Subject: [LLVMdev] Adjusting Load Latencies In-Reply-To: <20120302110155.18e9001d@sapling2> References: <20120302110155.18e9001d@sapling2> Message-ID: <92A5B4F9-A9EC-4CA1-BC9E-23094A6B38DD@apple.com> On Mar 2, 2012, at 9:01 AM, Hal Finkel wrote: > Hello, > > I am interested in writing an analysis pass that looks at the stride > used for loads in a loop and passes that information down so that it > can be used by the instruction scheduler. The reason is that if the > load stride is greater than the cache line size, then I would expect > the load to always miss the cache, and, as a result, the scheduler > should use a much larger effective latency when scheduling the load and > its dependencies. Cache-miss metadata might also be a good supplemental > option. I can add methods to TLI that can convert the access stride > information into effective latency information, but what is the best > way to annotate the loads so that the information will be available to > the SDNodes? > > Has anyone tried something like this before? > > A related issue is automatically adding prefetching to loops. 
The > trick here is to accurately estimate the number of cycles the loop > body will take to execute (so that you prefetch the correct amount > ahead). This information is not really available until instruction > scheduling, and so prefetch adding cannot really complete until just > before MC generation (the prefetch instructions can be scheduled, but > their constant offset needs to be held free for a while). In addition, > estimating the number of cycles also requires relatively accurate > load/store latencies, and this, in turn, requires cache-miss latencies > to be accounted for (which must then account for the prefetches). > > If anyone has thoughts on these ideas, I would like to hear them. If you annotate loads with their expected latency, the upcoming MachineScheduler will be able to use the information. In the short term (next couple months), you're free to hack the SDScheduler as well. Although the scheduler can use the information, I don't think it can do much good with it when scheduling for mainstream targets. It would be more interesting when scheduling for an in-order machine without a hardware prefetch unit. An acyclic instruction scheduler can schedule for L1 and L2 latency at most. But the out-of-order engine should be able to compensate for these latencies. L2 misses within a high trip count loop will benefit greatly from stride prefetching. But regular strides should already be handled in hardware. So my suggestions are: 1) If you have an in-order machine, your workload actually fits in L2, and you care deeply about every stall cycle, it may be useful for the scheduler to distinguish between expected L1 vs L2 latency. Try to issue multiple L1 misses in parallel or cover their latency. You can consider offsets from aligned objects in addition to induction variable strides. 2) If you have a machine without hardware prefetching, you really need to insert prefetches. This is a much bigger bang for the buck than scheduling for L1/L2 latency.
To cover the latency of an L2 miss, which is what really matters, you need to prefetch many iterations ahead. Rather than trying to predict the number of cycles each iteration takes, you're better off prefetching as many iterations ahead as possible up to the hardware's limit on outstanding loads. If the loop has a constant trip count, you can probably do something clever. Otherwise I think branch profiling could help by telling you which loops have a very high trip count. I think prefetch insertion is more closely tied to loop unrolling than instruction scheduling. -Andy From hfinkel at anl.gov Fri Mar 2 16:59:13 2012 From: hfinkel at anl.gov (Hal Finkel) Date: Fri, 2 Mar 2012 16:59:13 -0600 Subject: [LLVMdev] Adjusting Load Latencies In-Reply-To: <92A5B4F9-A9EC-4CA1-BC9E-23094A6B38DD@apple.com> References: <20120302110155.18e9001d@sapling2> <92A5B4F9-A9EC-4CA1-BC9E-23094A6B38DD@apple.com> Message-ID: <20120302165913.6cf4901c@sapling2> On Fri, 02 Mar 2012 13:49:48 -0800 Andrew Trick wrote: > On Mar 2, 2012, at 9:01 AM, Hal Finkel wrote: > > > Hello, > > > > I am interested in writing an analysis pass that looks at the stride > > used for loads in a loop and passes that information down so that it > > can be used by the instruction scheduler. The reason is that if the > > load stride is greater than the cache line size, then I would expect > > the load to always miss the cache, and, as a result, the scheduler > > should use a much larger effective latency when scheduling the load > > and its dependencies. Cache-miss metadata might also be a good > > supplemental option. I can add methods to TLI that can convert the > > access stride information into effective latency information, but > > what is the best way to annotate the loads so that the information > > will be available to the SDNodes? > > > > Has anyone tried something like this before? > > > > A related issue is automatically adding prefetching to loops.
The > > trick here is to accurately estimate the number of cycles the loop > > body will take to execute (so that you prefetch the correct amount > > ahead). This information is not really available until instruction > > scheduling, and so prefetch adding cannot really complete until just > > before MC generation (the prefetch instructions can be scheduled, > > but their constant offset needs to be held free for a while). In > > addition, estimating the number of cycles also requires relatively > > accurate load/store latencies, and this, in turn, requires > > cache-miss latencies to be accounted for (which must then account > > for the prefetches). > > > > If anyone has thoughts on these ideas, I would like to hear them. > > Andy, Thank you for writing such a detailed response. > If you annotate loads with their expected latency, the upcoming > MachineScheduler will be able to use the information. In the short > term (next couple months), you're free to hack the SDScheduler as > well. Alright, sounds good. If I add metadata to the load, can I get to it through the Value * in the associated MachineMemOperand object? > > Although the scheduler can use the information, I don't think it can > do much good with it when scheduling for mainstream targets. It would be > more interesting when scheduling for an in-order machine without a > hardware prefetch unit. > > An acyclic instruction scheduler can schedule for L1 and L2 latency > at most. But the out-of-order engine should be able to compensate for > these latencies. > > L2 misses within a high trip count loop will benefit greatly from > stride prefetching. But regular strides should already be handled in > hardware. I agree. For the machine with which I'm working, the hardware prefetch unit only works if you access N consecutive cache lines. Any pattern that does not do that will need explicit prefetch instructions.
Also, I need to be careful not to prefetch too much because the request buffer is fairly small (it handles fewer than 10 outstanding requests). > > So my suggestions are: > > 1) If you have an in-order machine, your workload actually fits in > L2, and you care deeply about every stall cycle, it may be useful for > the scheduler to distinguish between expected L1 vs L2 latency. Try to > issue multiple L1 misses in parallel or cover their latency. You can > consider offsets from aligned objects in addition to induction > variable strides. Issuing multiple L1 misses in parallel is exactly what I would like to do. Considering offsets from aligned objects is also a good idea. > > 2) If you have a machine without hardware prefetching, you really > need to insert prefetches. This is a much bigger bang for the buck > than scheduling for L1/L2 latency. To cover the latency of an L2 > miss, which is what really matters, you need to prefetch many > iterations ahead. Rather than trying to predict the number of cycles > each iteration takes, you're better off prefetching as many > iterations ahead as possible up to the hardware's limit on > outstanding loads. If the loop has a constant trip count, you can > probably do something clever. Otherwise I think branch profiling > could help by telling you which loops have a very high trip count. I > think prefetch insertion is more closely tied to loop unrolling than > instruction scheduling. I'm afraid that "as many iterations ahead as possible" may turn out to be too few if I start at the very next iteration because the request buffer is small. Nevertheless, it is certainly worth a try.
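A minimal sketch of the prefetch-ahead idea discussed above, written in C and assuming the GCC/Clang __builtin_prefetch builtin. The names strided_sum and PREFETCH_DISTANCE are hypothetical and do not come from the thread, and the distance of 8 is an arbitrary placeholder that would have to be tuned against the hardware's limit on outstanding requests:

```c
#include <stddef.h>

/* Hypothetical tuning knob: how many loop iterations ahead to prefetch. */
enum { PREFETCH_DISTANCE = 8 };

/* Sum every stride-th element of data[0..n), asking the hardware to start
   loading the element that will be needed PREFETCH_DISTANCE iterations from
   now. Assumes stride is small enough that the index arithmetic cannot
   overflow size_t. */
long strided_sum(const long *data, size_t n, size_t stride)
{
    long sum = 0;

    if (stride == 0)
        return 0; /* a zero stride would never advance the loop */

    for (size_t i = 0; i < n; i += stride) {
        size_t ahead = i + (size_t)PREFETCH_DISTANCE * stride;

        /* Only prefetch addresses that the loop will really touch. */
        if (ahead < n)
            __builtin_prefetch(&data[ahead], 0 /* read */, 3 /* high locality */);

        sum += data[i];
    }
    return sum;
}
```

With a request buffer as small as the one described above, PREFETCH_DISTANCE would presumably have to stay below the number of outstanding requests the hardware can track; the bounds check also avoids issuing prefetches past the end of the array, which would only waste buffer slots.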
Thanks again, Hal > > -Andy -- Hal Finkel Postdoctoral Appointee Leadership Computing Facility Argonne National Laboratory From atrick at apple.com Fri Mar 2 17:27:10 2012 From: atrick at apple.com (Andrew Trick) Date: Fri, 02 Mar 2012 15:27:10 -0800 Subject: [LLVMdev] Adjusting Load Latencies In-Reply-To: <20120302165913.6cf4901c@sapling2> References: <20120302110155.18e9001d@sapling2> <92A5B4F9-A9EC-4CA1-BC9E-23094A6B38DD@apple.com> <20120302165913.6cf4901c@sapling2> Message-ID: <97400B06-E5E8-43FC-A7CA-44AEC2FD501F@apple.com> On Mar 2, 2012, at 2:59 PM, Hal Finkel wrote: > >> If you annotate loads with their expected latency, the upcoming >> MachineScheduler will be able to use the information. In the short >> term (next couple months), you're free to hack the SDScheduler as >> well. > > Alright, sounds good. If I add metadata to the load, can I get to it > through the Value * in the associated MachineMemOperand object? AFAIK. I certainly don't have a problem with that approach. I've heard there's a preference for lowering information into self-contained machine code. But referring back to IR makes a lot of sense to me personally. If we ever want to serialize MIs I think we should serialize the IR with it. > I'm afraid that "as many iterations ahead as possible" may turn out to > be too few if I start at the very next iteration because the request > buffer is small. Nevertheless, it is certainly worth a try. It sounds like it's really important for you to avoid useless prefetches. This can be tricky. Other than that I don't see a way around your problem. Does it help to give your prefetches a head start? If your loop eats cache lines faster than you can feed it, eventually it will catch up. -Andy
From lostfreeman at gmail.com Fri Mar 2 17:31:44 2012 From: lostfreeman at gmail.com (lost) Date: Sat, 3 Mar 2012 03:31:44 +0400 Subject: [LLVMdev] Access Violation using ExecutionEngine on 64-bit Windows 8 Consumer Preview In-Reply-To: <7DE70FDACDE4CD4887C4278C12A2E3050A1EF2@HASMSX104.ger.corp.intel.com> References: <7DE70FDACDE4CD4887C4278C12A2E3050A1EF2@HASMSX104.ger.corp.intel.com> Message-ID: Hi Rotem, Thanks to you, and especially to Marina! The problem is gone. I'm curious why it worked on Win7 but not on Win8. I recently used Process Explorer and discovered that the call was to ntdll.dll, which on Win8 is loaded at a totally different address. Best regards, Victor Milovanov Moscow State University graduate student 2012/3/3 Rotem, Nadav : > Hi Victor, > > Try this fix by Marina Yatsina: > > http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20120220/137532.html > > Nadav > > -----Original Message----- > From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of lost > Sent: Friday, March 02, 2012 22:53 > To: NAKAMURA Takumi; LLVM > Subject: Re: [LLVMdev] Access Violation using ExecutionEngine on 64-bit Windows 8 Consumer Preview > > Hi, Takumi! > > I tried your patch, and it did not help. Moreover, I tried to compile under Windows 7 and copy files to Windows 8, and received the same exception. So the problem seems to be in Windows 8 itself or some non-portable code inside LLVM. > > Could anyone tell me what LLVM code in ExecutionEngine is responsible for allocating and protecting memory for generated native functions? > > Best regards, > Victor Milovanov. > > 2012/3/2 NAKAMURA Takumi : >> Viktor, could you try my patch? I guess they are __chkstk.
>> >> http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20120220/137577.html >> >> ...Takumi From chrisjones.lambda at gmail.com Fri Mar 2 18:15:52 2012 From: chrisjones.lambda at gmail.com (Christopher Jones) Date: Fri, 2 Mar 2012 19:15:52 -0500 Subject: [LLVMdev] LLVMdev Digest, Vol 93, Issue 5 In-Reply-To: References: Message-ID: Duncan, thanks! I needed libffi. Everything is fine now. Matt, thanks for the explanation of why clang worked with a simple C example and clang++ didn't seem to with a simple C++ example. Chris On Mar 2, 2012, at 10:05 AM, llvmdev-request at cs.uiuc.edu wrote: > first off you need to build with FFI support (configure with --enable-libffi). > Then you doubtless need to pass libstdc++ to lli, like this (IIRC): > -load=libstdc++.so > When you compile with clang++ it automagically adds the C++ standard library > to the list of things to link with, which is why you don't notice that the > linker is getting passed libstdc++.so. As lli is doing linking too, it also > needs libstdc++.so. > > Ciao, Duncan. > On Mar 2, 2012, at 10:05 AM, llvmdev-request at cs.uiuc.edu wrote: > hello.bc doesn't contain the libstdc++ bits your program needs (iostream > and its (many) dependencies). When you produce an executable, clang > tells the linker to link your binary with libsupc++, libstdc++, and > others, so the dynamic linker can satisfy your iostream dependencies at > runtime. When running under lli, the interpreter will provide *a few*
When running under lli, the interpreter will provide *a few* > basic functions for you (see > lib/ExecutionEngine/Interpreter/ExternalFunctions.cpp), but things like > exit(), abort(), printf(), and scanf(), nothing as complicated as > libstdc++. So if the function you need is not in the short list > provided by the interpreter itself, it will try to find your function > using libffi (if you compiled it in). If that doesn't work, you'll get > errors like the below. > > One solution would be to try to generate a single big .bc file that is > "statically linked" with all your dependencies (for some clues as to > what these are, try "ldd ./hello" on your clang++-generated binary. > Unfortunately, I'm no expert on this or any other methods of informing > lli about your .bc file's dependencies and where they can be found when > your interpreted program calls out to them. > > -Matt From nadav.rotem at intel.com Fri Mar 2 23:54:51 2012 From: nadav.rotem at intel.com (Rotem, Nadav) Date: Sat, 3 Mar 2012 05:54:51 +0000 Subject: [LLVMdev] Access Violation using ExecutionEngine on 64-bit Windows 8 Consumer Preview In-Reply-To: References: <7DE70FDACDE4CD4887C4278C12A2E3050A1EF2@HASMSX104.ger.corp.intel.com> Message-ID: <7DE70FDACDE4CD4887C4278C12A2E3050A2157@HASMSX104.ger.corp.intel.com> On Windows, the LLVM JIT runner looks for the '_chkstk' symbol by enumerating all of the loaded DLLs. On Win8, NTDLL.DLL (where _chkstk is defined) is found in a location that is more than 32bits bytes away from the jitted code. Marina's patch changes the code that generates a call to '_chkstk' from PCREL32 (which uses a 32bit offset) to an indirect call (which uses a 64bit address from a register). -----Original Message----- From: lost [mailto:lostfreeman at gmail.com] Sent: Saturday, March 03, 2012 01:32 To: Rotem, Nadav; Yatsina, Marina; LLVM Subject: Re: [LLVMdev] Access Violation using ExecutionEngine on 64-bit Windows 8 Consumer Preview Hi Rotem, Thank to you, and especially to Marina! 
The problem gone. I'm a bit interested, what is the reason it worked in Win7, and not in Win8. I've recently used Process Explorer to discover, that the call was to ntdll.dll, which in Win8 is loaded to the totally different address. Best regards, Victor Milovanov Moscow State University graduate student 2012/3/3 Rotem, Nadav : > Hi Victor, > > Try this fix by Marina Yatsina: > > http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20120220/1 > 37532.html > > Nadav > > -----Original Message----- > From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] > On Behalf Of lost > Sent: Friday, March 02, 2012 22:53 > To: NAKAMURA Takumi; LLVM > Subject: Re: [LLVMdev] Access Violation using ExecutionEngine on > 64-bit Windows 8 Consumer Preview > > Hi, Takumi! > > I tried your patch, and it did not help. Moreover, I tried to compile under Windows 7 and copy files to Windows 8, and received the same exception. So the problem seems to be in Windows 8 itself or some non-portable code inside LLVM. > > Could anyone tell me what LLVM code in ExecutionEngine is responsible for allocating and protecting memory for generated native functions? > > Best regards, > Victor Milovanov. > > 2012/3/2 NAKAMURA Takumi : >> Viktor, could you try my patch? I guess they are __chkstk. >> >> http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20120220/ >> 1 >> 37577.html >> >> ...Takumi > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu ? ? ? ? http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > --------------------------------------------------------------------- > Intel Israel (74) Limited > > This e-mail and any attachments may contain confidential material for > the sole use of the intended recipient(s). Any review or distribution > by others is strictly prohibited. If you are not the intended > recipient, please contact the sender and delete all copies. 
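A rough illustration of the PCREL32 limitation described at the top of this message. This is only a sketch under assumptions, not the LLVM JIT code: `fitsRel32` is a hypothetical helper, and it assumes the 5-byte x86-64 `call rel32` encoding, whose displacement is measured from the end of the instruction.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an LLVM API): can a 5-byte x86-64 `call rel32`
// placed at `callSite` reach `target`?
bool fitsRel32(std::uint64_t callSite, std::uint64_t target) {
  // The instruction itself must not wrap around the address space.
  assert(callSite <= UINT64_MAX - 5);
  // The displacement is relative to the address of the next instruction.
  std::uint64_t next = callSite + 5;
  // Unsigned subtraction wraps; reinterpreting it as signed gives the
  // forward or backward distance.
  std::int64_t delta = static_cast<std::int64_t>(target - next);
  // rel32 is a signed 32-bit field, so the reach is roughly +/- 2GB.
  return delta >= INT32_MIN && delta <= INT32_MAX;
}
```

If JIT-allocated code and the DLL that exports the symbol end up further apart than this, the callee address has to be materialized in a register and called indirectly, which is the approach the quoted fix is said to take.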
> ---------------------------------------------------------------------
Intel Israel (74) Limited

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.

From geek4civic at gmail.com Sat Mar 3 00:05:49 2012
From: geek4civic at gmail.com (NAKAMURA Takumi)
Date: Sat, 3 Mar 2012 15:05:49 +0900
Subject: [LLVMdev] Access Violation using ExecutionEngine on 64-bit Windows 8 Consumer Preview
In-Reply-To: <7DE70FDACDE4CD4887C4278C12A2E3050A2157@HASMSX104.ger.corp.intel.com>
References: <7DE70FDACDE4CD4887C4278C12A2E3050A1EF2@HASMSX104.ger.corp.intel.com> <7DE70FDACDE4CD4887C4278C12A2E3050A2157@HASMSX104.ger.corp.intel.com>
Message-ID: 

2012/3/3 Rotem, Nadav :
> On Windows, the LLVM JIT runner looks for the '_chkstk' symbol by
> enumerating all of the loaded DLLs. On Win8, NTDLL.DLL (where _chkstk
> is defined) is found in a location that is more than 32bits bytes away
> from the jitted code. Marina's patch changes the code that generates a
> call to '_chkstk' from PCREL32 (which uses a 32bit offset) to an
> indirect call (which uses a 64bit address from a register).

This issue was not only due to ntdll.dll. Potentially it could also
occur with "large-address-aware" and the JIT. I had failed to consider
the case where the JIT memory pool is not within a 2GB area.

Marina's patch makes sense; chkstk in prologue insertion should be the
special case in codegen.

...Takumi

From ivanllopard at gmail.com Sat Mar 3 06:48:04 2012
From: ivanllopard at gmail.com (Ivan Llopard)
Date: Sat, 03 Mar 2012 13:48:04 +0100
Subject: [LLVMdev] Data/Address registers
Message-ID: <4F521304.1030900@gmail.com>

Hi,

I'm facing a problem in llvm while porting it to a new target and I
need some support. We have two kinds of registers, one for general
purposes (i.e. arithmetic, comparisons, etc.) and the other for memory
addressing. Cross copies are not allowed (no data path). We use clang
3.0 to produce assembler code.

Because both kinds of registers have the same size and type (i16), I
don't know what the best solution would be to distinguish them in order
to match the right instructions. Moreover, the standard pointer
arithmetic is not enough for us (we also need to support modulo
operations).

I thought that I could manually match every arithmetic operation while
matching the addressing mode, but it doesn't work because intermediate
results are sometimes reused for other purposes (e.g. comparisons).

Do I need to add another type to clang/llvm?

Thanks in advance,

Ivan

From j.wilhelmy at arcor.de Sat Mar 3 14:28:17 2012
From: j.wilhelmy at arcor.de (Jochen Wilhelmy)
Date: Sat, 03 Mar 2012 21:28:17 +0100
Subject: [LLVMdev] replace hardcoded function names by intrinsics
References: 20120302084148.098e6641@sapling2
Message-ID: <4F527EE1.4030500@arcor.de>

Hi!

The main problem I currently see is that frontend/language specific
assumptions are hardcoded inside the constant folding, namely that a
function named sin calculates the sine. Languages with some kind of
name mangling don't benefit from this.

So another solution would be making the constant folding extensible,
i.e. a table of function names and evaluators can be passed in from the
outside. This way the C-specific stuff is removed from LLVM, and other
functions, e.g. convert_int_rte() of OpenCL, could be constant folded
too. Maybe this table can be per LLVMContext so that InlineCost.cpp can
access it too.

-Jochen

From xerox.time.tech at gmail.com Sat Mar 3 15:35:59 2012
From: xerox.time.tech at gmail.com (Xin Tong)
Date: Sat, 3 Mar 2012 16:35:59 -0500
Subject: [LLVMdev] LLVM Value Tracking Analysis
Message-ID: 

It seems to me that LLVM does not do much value range analysis, i.e.
determining what the value constraints on a variable are at a given
point in the program.
The closest thing i can find is the ValueTracking API, which can do some simple analysis on the value of a variables. Am I missing something/Is there a plan on the implementation of a more powerful value range analysis ? Thanks Xin From baldrick at free.fr Sat Mar 3 16:26:27 2012 From: baldrick at free.fr (Duncan Sands) Date: Sat, 03 Mar 2012 23:26:27 +0100 Subject: [LLVMdev] LLVM Value Tracking Analysis In-Reply-To: References: Message-ID: <4F529A93.5040109@free.fr> Hi Xin, > It seems to me that LLVM does not do too much on value range analysis. > i.e. what are the value constraints on a variable at a given point in > the program. The closest thing i can find is the ValueTracking API, > which can do some simple analysis on the value of a variables. Am I > missing something/Is there a plan on the implementation of a more > powerful value range analysis ? as far as I know there have been two implementations of this kind of thing in the past, but they were each removed in turn. IIRC, this was due to them significantly increasing compilation time without a sufficient improvement in the quality of code to justify the compile time cost. Currently the closest thing is the correlated value propagation pass, but I doubt it will be useful for you. Ciao, Duncan. From pdox at google.com Sat Mar 3 16:27:07 2012 From: pdox at google.com (David Meyer) Date: Sat, 3 Mar 2012 14:27:07 -0800 Subject: [LLVMdev] "-march" trashing ARM triple In-Reply-To: <6FE2499E-DA0F-4D06-AC2E-E880015ABC5A@apple.com> References: <6FE2499E-DA0F-4D06-AC2E-E880015ABC5A@apple.com> Message-ID: Jim, There's a comment in llc.cpp: // Allocate target machine. First, check whether the user has explicitly // specified an architecture to compile for. If so we have to look it up by // name, because it might be a backend that has no mapping to a target triple. const Target *TheTarget = 0; if (!MArch.empty()) { It explicitly uses MArch over the triple for the Target lookup. 
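As a rough illustration of the effect described in the quoted text below (this is not the actual llc code), here is a minimal sketch, assuming that the -march name simply replaces the architecture component of the triple string. `overrideArch` is a hypothetical helper introduced only for this example.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not an LLVM API): models an -march value replacing
// the architecture (first) component of a target triple string.
std::string overrideArch(const std::string &triple, const std::string &march) {
  assert(!triple.empty() && "expected a non-empty triple");
  if (march.empty())
    return triple;                      // no -march given: triple is untouched
  std::string::size_type dash = triple.find('-');
  if (dash == std::string::npos)
    return march;                       // the triple had only an arch component
  return march + triple.substr(dash);   // keep the "-vendor-os" part as-is
}
```

With this model, `overrideArch("armv7-none-linux", "arm")` yields `"arm-none-linux"`: the "v7" hint is gone, so anything that later derives subtarget features from the triple string only sees a generic ARM.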
Daniel, it looks like you wrote this comment. Could you explain what you mean by a backend with no mapping to a target triple? Thanks, - pdox On Fri, Mar 2, 2012 at 9:11 AM, Jim Grosbach wrote: > > On Mar 2, 2012, at 12:04 AM, David Meyer wrote: > > > ARM subtarget features are determined by parsing the target tuple string > TT. (ParseARMTriple(StringRef TT) in ARMMCTargetDesc.cpp) > > > > In llc, the -march setting overrides the architecture specified in > -mtriple. So when you invoke: > > > > $ llc -march arm -mtriple armv7-none-linux ... > > > > ParseARMTriple() will see TT == "arm-none-linux" instead of > "armv7-none-linux". As a result, the target features will be set > generically. (Note that using "-march armv7" is not valid.) > > > > This is clearly wrong, but I'm not clear on where/how this should be > fixed. Does the -march substitution need to happen at all? Could it be > disabled only for ARM? Should TargetTriple or -march be made more precise? > > > > When using a triple, -march doesn't add any additional information. The > idea is that -march is a shorthand for a generic triple (e.g., -march=arm > implies -mtriple=arm-unknown-unknown or something similar). > > It seems to me that using both on the llc command line should issue a > diagnostic. > > -Jim > > > Thanks, > > - pdox > > > > _______________________________________________ > > LLVM Developers mailing list > > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120303/b64054a5/attachment.html From xerox.time.tech at gmail.com Sat Mar 3 17:57:12 2012 From: xerox.time.tech at gmail.com (Xin Tong) Date: Sat, 3 Mar 2012 18:57:12 -0500 Subject: [LLVMdev] LLVM Value Tracking Analysis In-Reply-To: <4F529A93.5040109@free.fr> References: <4F529A93.5040109@free.fr> Message-ID: On Sat, Mar 3, 2012 at 5:26 PM, Duncan Sands wrote: > Hi Xin, > >> It seems to me that LLVM does not do too much on value range analysis. >> ? i.e. what are the value constraints on a variable at a given point in >> the program. The closest thing i can find is the ValueTracking API, >> which can do some simple analysis on the value of a variables. Am I >> missing something/Is there a plan on the implementation of a more >> powerful value range analysis ? > > as far as I know there have been two implementations of this kind of thing > in the past, but they were each removed in turn. ?IIRC, this was due to them > significantly increasing compilation time without a sufficient improvement in > the quality of code to justify the compile time cost. ?Currently the closest > thing is the correlated value propagation pass, but I doubt it will be useful > for > you. The correlated value propagation pass is what is currently in the ValueTrack.cpp file ? do you know when the two implementations are removed ? and where i can get them ? and how difficult is it to bring them up to the current src tree. Thanks Xin > > Ciao, Duncan. > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu ? ? ? ? http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From harel.cain at gmail.com Sun Mar 4 06:42:54 2012 From: harel.cain at gmail.com (Harel Cain) Date: Sun, 4 Mar 2012 14:42:54 +0200 Subject: [LLVMdev] Passing arguments to opt via clang Message-ID: Hi all, In the good old llvmc, the -Wo flag could be used to pass arguments to the optimizer. 
Is there a similar mechanism anywhere for clang? Is there also a similar mechanism to -Wllc? Thanks! Harel Cain -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120304/0e5c7267/attachment.html From anton at korobeynikov.info Sun Mar 4 07:03:49 2012 From: anton at korobeynikov.info (Anton Korobeynikov) Date: Sun, 4 Mar 2012 17:03:49 +0400 Subject: [LLVMdev] Passing arguments to opt via clang In-Reply-To: References: Message-ID: > In the good old llvmc, the -Wo flag could be used to pass arguments to the > optimizer. Is there a similar mechanism anywhere for clang? Is there also a > similar mechanism to -Wllc? -mlvm will handle all of them -- With best regards, Anton Korobeynikov Faculty of Mathematics and Mechanics, Saint Petersburg State University From harel.cain at gmail.com Sun Mar 4 07:32:53 2012 From: harel.cain at gmail.com (Harel Cain) Date: Sun, 4 Mar 2012 15:32:53 +0200 Subject: [LLVMdev] Passing arguments to opt via clang In-Reply-To: References: Message-ID: Thanks, but I'm not sure I understand. I see no such flag in clang 2.9 nor couldn't I find any mention of it. What does it do? Harel Cain On Sun, Mar 4, 2012 at 15:03, Anton Korobeynikov wrote: > > In the good old llvmc, the -Wo flag could be used to pass arguments to > the > > optimizer. Is there a similar mechanism anywhere for clang? Is there > also a > > similar mechanism to -Wllc? > -mlvm will handle all of them > > -- > With best regards, Anton Korobeynikov > Faculty of Mathematics and Mechanics, Saint Petersburg State University > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120304/1a539789/attachment.html From baldrick at free.fr Sun Mar 4 07:42:46 2012 From: baldrick at free.fr (Duncan Sands) Date: Sun, 04 Mar 2012 14:42:46 +0100 Subject: [LLVMdev] Passing arguments to opt via clang In-Reply-To: References: Message-ID: <4F537156.8080602@free.fr> On 04/03/12 14:32, Harel Cain wrote: > Thanks, but I'm not sure I understand. I see no such flag in clang 2.9 nor > couldn't I find any mention of it. What does it do? I think he meant -mllvm not -mlvm. The next thing that follows is passed to LLVM, for example -mllvm -disable-llvm-optzns Ciao, Duncan. > > > Harel Cain > > > On Sun, Mar 4, 2012 at 15:03, Anton Korobeynikov > wrote: > > > In the good old llvmc, the -Wo flag could be used to pass arguments to the > > optimizer. Is there a similar mechanism anywhere for clang? Is there also a > > similar mechanism to -Wllc? > -mlvm will handle all of them > > -- > With best regards, Anton Korobeynikov > Faculty of Mathematics and Mechanics, Saint Petersburg State University > > > > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From tobias at grosser.es Sun Mar 4 07:48:20 2012 From: tobias at grosser.es (Tobias Grosser) Date: Sun, 04 Mar 2012 14:48:20 +0100 Subject: [LLVMdev] Passing arguments to opt via clang In-Reply-To: References: Message-ID: <4F5372A4.2050204@grosser.es> On 03/04/2012 02:32 PM, Harel Cain wrote: > Thanks, but I'm not sure I understand. I see no such flag in clang 2.9 > nor couldn't I find any mention of it. What does it do? It's called -mllvm. You can use it like this. clang -mllvm -vectorize ... 
Cheers Tobi From pmon.mail at gmail.com Sun Mar 4 09:24:13 2012 From: pmon.mail at gmail.com (pmon mail) Date: Sun, 4 Mar 2012 17:24:13 +0200 Subject: [LLVMdev] Passing arguments to opt via clang In-Reply-To: <4F5372A4.2050204@grosser.es> References: <4F5372A4.2050204@grosser.es> Message-ID: I have tried to invoke a transformation/optimization pass using -mllvm, without success. I might be missing something. For example I have a shared/dynamic library which contains LLVM passes. I used to invoke them with llvmc like this: >>*llvmc mycode.c -o mycode.o -c -opt -Wo,=-load,libFoo.dylib,-Foo* Can this style of optimization be executed using -mllvm argument? Thx, PMon On Sun, Mar 4, 2012 at 3:48 PM, Tobias Grosser wrote: > On 03/04/2012 02:32 PM, Harel Cain wrote: > > Thanks, but I'm not sure I understand. I see no such flag in clang 2.9 > > nor couldn't I find any mention of it. What does it do? > > It's called -mllvm. > > You can use it like this. > > clang -mllvm -vectorize ... > > Cheers > Tobi > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120304/4480a4f2/attachment.html From jon at ffconsultancy.com Sun Mar 4 09:38:31 2012 From: jon at ffconsultancy.com (Jon Harrop) Date: Sun, 4 Mar 2012 15:38:31 -0000 Subject: [LLVMdev] LLVM from .NET Message-ID: <006301ccfa1c$db465670$91d30350$@ffconsultancy.com> I've been struggling to get LLVM to work from .NET using the llvm-fs bindings for the past few weeks. I finally found an installation procedure that works and documented it here: http://fsharpnews.blogspot.com/2012/03/using-llvm-from-f-under-windows.html The good news is that I have that program compiling the Fibonacci function and executing it from F# all via LLVM. 
However, I still have a couple of problems. Firstly, I was getting "stack unbalanced" warnings from the managed debugging assistant when I run a debug build. I managed to fix them on my (x86 Vista) desktop by specifying the Cdecl calling convention but the warnings persist on my (x86 Win 7) netbook. Secondly, the performance is awful which (IIRC) is probably because the native target has not been initialized correctly and LLVM is falling back to the IR interpreter. Keith Sheppard used a hack to call LLVMInitializeX86Target but that hasn't done the trick and I cannot figure out how to call the correct LLVMInitializeNativeTarget function from .NET. Does anyone know the solutions to these problems or, better yet, have a pre-existing .NET binding to LLVM where everything just works effortlessly? FWIW, I tried running Microsoft's PInvoke Interop Assistant on LLVM-3.0.dll in an attempt to generate .NET bindings automatically but it chokes on the DLL. -- Dr Jon Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com From anton at korobeynikov.info Sun Mar 4 09:57:06 2012 From: anton at korobeynikov.info (Anton Korobeynikov) Date: Sun, 4 Mar 2012 19:57:06 +0400 Subject: [LLVMdev] Passing arguments to opt via clang In-Reply-To: References: <4F5372A4.2050204@grosser.es> Message-ID: Hello > For example I have a shared/dynamic library which contains LLVM passes. I > used to invoke them with llvmc like this: >>>llvmc mycode.c -o mycode.o -c -opt -Wo,=-load,libFoo.dylib,-Foo > Can this style of optimization be executed using -mllvm argument? No. You cannot add additional passes this way, only pass 'ordinary' arguments. 
-- With best regards, Anton Korobeynikov Faculty of Mathematics and Mechanics, Saint Petersburg State University From jonathandeanharrop at googlemail.com Sun Mar 4 11:04:19 2012 From: jonathandeanharrop at googlemail.com (Jon Harrop) Date: Sun, 4 Mar 2012 17:04:19 -0000 Subject: [LLVMdev] LLVM from .NET In-Reply-To: <006301ccfa1c$db465670$91d30350$@ffconsultancy.com> References: <006301ccfa1c$db465670$91d30350$@ffconsultancy.com> Message-ID: <006801ccfa28$d75305c0$85f91140$@ffconsultancy.com> I've fixed one problem. LLVM was falling back to its IR interpreter because the native target was not initialized correctly. The solution is to call the following three functions in turn: LLVMInitializeX86TargetInfo LLVMInitializeX86Target LLVMInitializeX86TargetMC Cheers, Jon. > -----Original Message----- > From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On > Behalf Of Jon Harrop > Sent: 04 March 2012 15:39 > To: llvmdev at cs.uiuc.edu > Subject: [LLVMdev] LLVM from .NET > > I've been struggling to get LLVM to work from .NET using the llvm-fs bindings for > the past few weeks. I finally found an installation procedure that works and > documented it here: > > http://fsharpnews.blogspot.com/2012/03/using-llvm-from-f-under- > windows.html > > The good news is that I have that program compiling the Fibonacci function and > executing it from F# all via LLVM. > > However, I still have a couple of problems. Firstly, I was getting "stack > unbalanced" warnings from the managed debugging assistant when I run a > debug build. I managed to fix them on my (x86 Vista) desktop by specifying the > Cdecl calling convention but the warnings persist on my (x86 Win 7) netbook. > Secondly, the performance is awful which (IIRC) is probably because the native > target has not been initialized correctly and LLVM is falling back to the IR > interpreter. 
Keith Sheppard used a hack to call LLVMInitializeX86Target but that
> hasn't done the trick and I cannot figure out how to call the correct
> LLVMInitializeNativeTarget function from .NET.
>
> Does anyone know the solutions to these problems or, better yet, have a
> pre-existing .NET binding to LLVM where everything just works effortlessly?
>
> FWIW, I tried running Microsoft's PInvoke Interop Assistant on LLVM-3.0.dll in
> an attempt to generate .NET bindings automatically but it chokes on the DLL.
>
> --
> Dr Jon Harrop, Flying Frog Consultancy Ltd.
> http://www.ffconsultancy.com

From anders at 0x63.nu Sun Mar 4 14:58:42 2012
From: anders at 0x63.nu (Anders Waldenborg)
Date: Sun, 04 Mar 2012 21:58:42 +0100
Subject: [LLVMdev] Debug info compileunit metadata strangeness..
Message-ID: <87pqcsf5h9.wl%anders@0x63.nu>

Hi,

I have a question regarding the metadata for compileunit debug info. I
find a few things in it a bit strange, but maybe there is a reason for
it to be that way that I just don't understand (but if that is the case
I guess the documentation needs to be clearer).

Consider this C program: "int X;"

Compiled with "clang -g" it produces debug metadata along these lines:

!llvm.dbg.cu = !{!0}
!0 = metadata !{i32 786449,
       .....
       metadata !1, ;; List of enums types
       metadata !1, ;; List of retained types
       metadata !1, ;; List of subprograms
       metadata !3  ;; List of global variables
     } ; [ DW_TAG_compile_unit ]
!1 = metadata !{metadata !2}
!2 = metadata !{i32 0}
!3 = metadata !{metadata !4}
!4 = metadata !{metadata !5}
!5 = metadata !{i32 786484, ...} ; [ DW_TAG_variable ]

Documentation says "List of global variables", but it is built as a
"One element list containing the list of global variables". Is there a
reason it is built that way? Looking at DIBuilder::createCompileUnit it
does indeed build it with an extra indirection.

Another thing I find strange is the empty lists, such as "List of enums
types" in my example above. They apparently end up being one-element
lists containing "i32 0". I would expect that the reference to the
empty lists would be "null".

In other words, I'd expect it to look like this:

!llvm.dbg.cu = !{!0}
!0 = metadata !{i32 786449,
       ...
       null,       ;; List of enums types
       null,       ;; List of retained types
       null,       ;; List of subprograms
       metadata !1 ;; List of global variables
     } ; [ DW_TAG_compile_unit ]
!1 = metadata !{metadata !2}
!2 = metadata !{i32 786484, ...} ; [ DW_TAG_variable ]

I attached two patches that change the compileunit metadata to have the
layout I think makes sense. I do not expect these patches to be applied
(the second one in particular is just a gross hack); I'm attaching them
because they may help show what I'm trying to say (a patch says more
than 1000 words?).

 anders

-------------- next part --------------
From 8c6193bc46573858741be624893ff7745020ec48 Mon Sep 17 00:00:00 2001
From: Anders Waldenborg
Date: Sun, 4 Mar 2012 13:56:21 +0100
Subject: [PATCH 1/2] No indirection of in compileunit debuginfo

---
 lib/Analysis/DIBuilder.cpp |   20 +++++---------------
 1 files changed, 5 insertions(+), 15 deletions(-)

diff --git a/lib/Analysis/DIBuilder.cpp b/lib/Analysis/DIBuilder.cpp
index f0bdc48..5d37123 100644
--- a/lib/Analysis/DIBuilder.cpp
+++ b/lib/Analysis/DIBuilder.cpp
@@ -82,21 +82,11 @@ void DIBuilder::createCompileUnit(unsigned Lang, StringRef Filename,
   assert(!Filename.empty() &&
          "Unable to create compile unit without filename");
   Value *TElts[] = { GetTagConstant(VMContext, DW_TAG_base_type) };
-  TempEnumTypes = MDNode::getTemporary(VMContext, TElts);
-  Value *THElts[] = { TempEnumTypes };
-  MDNode *EnumHolder = MDNode::get(VMContext, THElts);
 
+  TempEnumTypes = MDNode::getTemporary(VMContext, TElts);
   TempRetainTypes = MDNode::getTemporary(VMContext, TElts);
-  Value *TRElts[] = { TempRetainTypes };
-  MDNode *RetainHolder = MDNode::get(VMContext, TRElts);
-
   TempSubprograms = MDNode::getTemporary(VMContext, TElts);
-  Value *TSElts[] = { TempSubprograms };
-  MDNode *SPHolder = MDNode::get(VMContext, TSElts);
-
   TempGVs = MDNode::getTemporary(VMContext, TElts);
-  Value *TVElts[] = { TempGVs };
-  MDNode *GVHolder = MDNode::get(VMContext, TVElts);
 
   Value *Elts[] = {
     GetTagConstant(VMContext, dwarf::DW_TAG_compile_unit),
@@ -110,10 +100,10 @@ void DIBuilder::createCompileUnit(unsigned Lang, StringRef Filename,
     ConstantInt::get(Type::getInt1Ty(VMContext), isOptimized),
     MDString::get(VMContext, Flags),
     ConstantInt::get(Type::getInt32Ty(VMContext), RunTimeVer),
-    EnumHolder,
-    RetainHolder,
-    SPHolder,
-    GVHolder
+    TempEnumTypes,
+    TempRetainTypes,
+    TempSubprograms,
+    TempGVs
   };
   TheCU = DICompileUnit(MDNode::get(VMContext, Elts));
 
-- 
1.7.9.1

-------------- next part --------------
From 92c9c2646f767d7b532da5b5f548936a52a453fa Mon Sep 17 00:00:00 2001
From: Anders Waldenborg
Date: Sun, 4 Mar 2012 20:42:24 +0100
Subject: [PATCH 2/2] hack: make empty lists null in compileunit debuginfo

---
 lib/Analysis/DIBuilder.cpp |   56 ++++++++++++++++++++++++++++---------------
 1 files changed, 36 insertions(+), 20 deletions(-)

diff --git a/lib/Analysis/DIBuilder.cpp b/lib/Analysis/DIBuilder.cpp
index 5d37123..b18a60e 100644
--- a/lib/Analysis/DIBuilder.cpp
+++ b/lib/Analysis/DIBuilder.cpp
@@ -36,30 +36,46 @@ DIBuilder::DIBuilder(Module &m)
 
 /// finalize - Construct any deferred debug info descriptors.
 void DIBuilder::finalize() {
-  DIArray Enums = getOrCreateArray(AllEnumTypes);
-  DIType(TempEnumTypes).replaceAllUsesWith(Enums);
-
-  DIArray RetainTypes = getOrCreateArray(AllRetainTypes);
-  DIType(TempRetainTypes).replaceAllUsesWith(RetainTypes);
-
-  DIArray SPs = getOrCreateArray(AllSubprograms);
-  DIType(TempSubprograms).replaceAllUsesWith(SPs);
-  for (unsigned i = 0, e = SPs.getNumElements(); i != e; ++i) {
-    DISubprogram SP(SPs.getElement(i));
-    if (NamedMDNode *NMD = getFnSpecificMDNode(M, SP)) {
-      SmallVector Variables;
-      for (unsigned ii = 0, ee = NMD->getNumOperands(); ii != ee; ++ii)
-        Variables.push_back(NMD->getOperand(ii));
-      if (MDNode *Temp = SP.getVariablesNodes()) {
-        DIArray AV = getOrCreateArray(Variables);
-        DIType(Temp).replaceAllUsesWith(AV);
+  if (AllEnumTypes.empty()) {
+    TheCU->replaceOperandWith(10, NULL);
+  } else {
+    DIArray Enums = getOrCreateArray(AllEnumTypes);
+    DIType(TempEnumTypes).replaceAllUsesWith(Enums);
+  }
+
+  if (AllRetainTypes.empty()) {
+    TheCU->replaceOperandWith(11, NULL);
+  } else {
+    DIArray RetainTypes = getOrCreateArray(AllRetainTypes);
+    DIType(TempRetainTypes).replaceAllUsesWith(RetainTypes);
+  }
+
+  if (AllSubprograms.empty()) {
+    TheCU->replaceOperandWith(12, NULL);
+  } else {
+    DIArray SPs = getOrCreateArray(AllSubprograms);
+    DIType(TempSubprograms).replaceAllUsesWith(SPs);
+    for (unsigned i = 0, e = SPs.getNumElements(); i != e; ++i) {
+      DISubprogram SP(SPs.getElement(i));
+      if (NamedMDNode *NMD = getFnSpecificMDNode(M, SP)) {
+        SmallVector Variables;
+        for (unsigned ii = 0, ee = NMD->getNumOperands(); ii != ee; ++ii)
+          Variables.push_back(NMD->getOperand(ii));
+        if (MDNode *Temp = SP.getVariablesNodes()) {
+          DIArray AV = getOrCreateArray(Variables);
+          DIType(Temp).replaceAllUsesWith(AV);
+        }
+        NMD->eraseFromParent();
       }
-      NMD->eraseFromParent();
     }
   }
 
-  DIArray GVs = getOrCreateArray(AllGVs);
-  DIType(TempGVs).replaceAllUsesWith(GVs);
+  if (AllGVs.empty()) {
+    TheCU->replaceOperandWith(13, NULL);
+  } else {
+    DIArray GVs = getOrCreateArray(AllGVs);
+    DIType(TempGVs).replaceAllUsesWith(GVs);
+  }
 }
 
 /// getNonCompileUnitScope - If N is compile unit return NULL otherwise return
-- 
1.7.9.1

From borja.ferav at gmail.com Sun Mar 4 15:07:50 2012
From: borja.ferav at gmail.com (Borja Ferrer)
Date: Sun, 4 Mar 2012 22:07:50 +0100
Subject: [LLVMdev] Adding a new function attribute
Message-ID: 

Hello,

I'm adding a new function attribute in clang and llvm for a backend I'm
writing that treats prologue and epilogue code in a special way inside
interrupt handlers, similar to what naked does. One way I've seen to do
this is to add a new attribute type in Attributes.h; however, it feels
wrong to me to add a target-dependent attribute in a place that is
otherwise target independent. So what's the best way to do this, or is
there an API to handle this kind of issue?

Thanks for the help.

From rich at pennware.com Sun Mar 4 16:19:34 2012
From: rich at pennware.com (Richard Pennington)
Date: Sun, 4 Mar 2012 16:19:34 -0600
Subject: [LLVMdev] I stole the demo.
Message-ID: <201203041619.34401.rich@pennware.com>

I had a little time on my hands this afternoon, so I stole the
Clang/LLVM demo and modified it to allow compiling for several other
targets: http://ellcc.org/demo

I did notice one flaw in the LLVM demo: there doesn't seem to be a way
to upload a file to compile, at least with Firefox. I modified the cgi
script slightly to clear the $source variable if an upload file has
been selected.
-Rich From pmon.mail at gmail.com Sun Mar 4 02:08:16 2012 From: pmon.mail at gmail.com (PMon) Date: Sun, 4 Mar 2012 00:08:16 -0800 (PST) Subject: [LLVMdev] Invalid relocation types for Thumb in LLVM version 2.9 In-Reply-To: References: Message-ID: <33437362.post@talk.nabble.com> Hi Nick I have the same problem when compiling code with LLVM 2.9 and linking the objects with the linker which comes with iOS SDK 4.2 (which can be found in "/Developer/Platforms/iPhoneOS.platform/Developer/usr/bin/"). When you say "newer linker" can you elaborate which linker to use? Thanks PMon Nick Kledzik wrote: > > The llvm compiler can now generated movt/movw instructions to create > 32-bit constants. Those new instructions use new relocations. Mach-o > uses different numbering for relocations than ELF does. For mach-o, > ARM_RELOC_PAIR=1 and ARM_RELOC_HALF=8. You need a newer linker that > understands the new relocations. > > -Nick > > On Feb 20, 2012, at 5:20 AM, Harel Cain wrote: >> Hi all, >> >> I'm trying to figure out a problem with relocation types 1 and 8 (as >> observed using otool -r on ARM/Thumb object files). Earlier, when I used >> LLVM 2.8 with llc to generate thumb (-march=thumb -mattr=+thumb2) >> assembly listings, then assemble those using the gcc of iPhone 4.2 SDK, >> there wasn't any problem. >> >> However starting with LLVM 2.9, the same toolchain emits slightly >> different assembly listings that after assembly into object files have >> relocation entries of type 1 and 8 which the iPhonsOS 4.2 SDK linker >> doesn't like (they produce warnings, and the linked binary crashes). >> >> According to http://simplemachines.it/doc/aaelf.pdf, these relocation >> types are called R_ARM_PC24 and R_ARM_ABS8. They simply weren't created >> with the assembly listings generated with LLVM 2.8. >> >> Anyone has any suggestion has to solve this? Is there any other toolchain >> combination you can suggest in order to build Thumb object files for the >> iOS/ARM platform? 
I'm not sure I'm even using LLVM the way I should here.
>>
>>
>> Many thanks!
>>
>>
>> Harel Cain
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>

--
View this message in context: http://old.nabble.com/Invalid-relocation-types-for-Thumb-in-LLVM-version-2.9-tp33356625p33437362.html
Sent from the LLVM - Dev mailing list archive at Nabble.com.

From cristiannomartins at gmail.com Sun Mar 4 23:37:59 2012 From: cristiannomartins at gmail.com (Cristianno Martins) Date: Mon, 5 Mar 2012 02:37:59 -0300 Subject: [LLVMdev] Problem using march=c Message-ID:

Hello everyone,

I've been trying to generate a C file using the llc tool, but I'm having a problem. I'm using a single Hello World program in C, and executing the following steps:

    clang -emit-llvm -c -o hello.bc hello.c   # getting the bitcode of hello.c
    llc -march=c hello.bc                     # generating the hello.cbe.c file using the LLVM C backend

So far, nothing weird happened. The problem occurs once I try to compile hello.cbe.c: gcc, and even clang, show me some warning and error messages. Checking hello.cbe.c, I could see that the problem was related to the struct declarations and initialization.
There, they appear as something like

    /* Global Variable Declarations */
    static _OC_str { unsigned char array[13]; };

    /* Global Variable Definitions and Initialization */
    static _OC_str { unsigned char array[13]; } = { "Hello World\n" };

when they should be like

    /* Global Variable Declarations */
    static struct type_OC_str { unsigned char array[13]; };

    /* Global Variable Definitions and Initialization */
    static struct type_OC_str _OC_str = { "Hello World\n" };

Is it a known problem, or am I doing something wrong?

Thanks in advance,

--
Cristianno Martins
PhD Student of Computer Science
University of Campinas
cmartins at ic.unicamp.br

From anton at korobeynikov.info Mon Mar 5 00:09:43 2012 From: anton at korobeynikov.info (Anton Korobeynikov) Date: Mon, 5 Mar 2012 10:09:43 +0400 Subject: [LLVMdev] Problem using march=c In-Reply-To: References: Message-ID:

Hello

> It is a known problem, or am I doing something wrong?

The C backend is known to be buggy.

--
With best regards, Anton Korobeynikov
Faculty of Mathematics and Mechanics, Saint Petersburg State University

From spande at codeaurora.org Mon Mar 5 01:56:29 2012 From: spande at codeaurora.org (Sirish Pande) Date: Mon, 05 Mar 2012 01:56:29 -0600 Subject: [LLVMdev] printing hex format for floating point number Message-ID: <4F5471AD.1050205@codeaurora.org>

Hi,

I am trying to print the hex value (4111999A for 9.1) corresponding to a floating point number. The routine convertToHexString in the APFloat class only prints a C99 hexadecimal floating-point constant (e.g. 1.e00000p3). Without writing my own routine, how do I print the hexadecimal representation of a floating point value?

Sirish

--
Qualcomm Innovation Center, Inc is a member of Code Aurora Forum
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120305/6dae85a2/attachment.html

From maemarcus at gmail.com Mon Mar 5 03:19:07 2012 From: maemarcus at gmail.com (Dmitry N.
Mikushin) Date: Mon, 05 Mar 2012 12:19:07 +0300 Subject: [LLVMdev] Problem using march=c In-Reply-To: References: Message-ID: <1330939147.5965.5.camel@Nokia-N900>

Hi Cristianno,

This problem has been around for a while. We work around it ourselves with the following patches:

https://hpcforge.org/scm/viewvc.php/trunk/patches/llvm.gpu.patch?root=kernelgen&view=markup
https://hpcforge.org/scm/viewvc.php/trunk/patches/llvm.patch?revision=591&root=kernelgen&view=markup

Please feel free to apply them; they *should* work for you even with the latest llvm 3.1. Also, please feel free to ping me if you run into any trouble.

- D.

----- Original message -----
> Hello everyone,
>
> I've been trying to generate a C file using the llc tool, but I'm
> having a problem. I'm using a single Hello World program in C, and
> executing the following passes:
>
> clang -emit-llvm -c -o hello.bc hello.c   # getting the bit code of hello.c
> llc -march=c hello.bc                     # generating the hello.cbe.c file using the llvm C backend
>
> So far, nothing weird happened. The problem occurs once I try to
> compile the hello.cbe.c: gcc, and even clang, show me some warning and
> error messages.
> Cheking in hello.cbe.c, I could see that the problem was related with
> the structs declarations and initialization. There, they appears to be
> something like
>
>     /* Global Variable Declarations */
>     static _OC_str { unsigned char array[13]; };
>
>     /* Global Variable Definitions and Initialization */
>     static _OC_str { unsigned char array[13]; } = { "Hello World\n" };
>
> when they should be like
>
>     /* Global Variable Declarations */
>     static struct type_OC_str { unsigned char array[13]; };
>
>     /* Global Variable Definitions and Initialization */
>     static struct type_OC_str _OC_str = { "Hello World\n" };
>
> It is a known problem, or am I doing something wrong?
>
> Thks in advance,
>
> --
> Cristianno Martins
> PhD Student of Computer Science
> University of Campinas
> cmartins at ic.unicamp.br
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120305/7ba3d7ce/attachment.html

From baldrick at free.fr Mon Mar 5 02:29:56 2012 From: baldrick at free.fr (Duncan Sands) Date: Mon, 05 Mar 2012 09:29:56 +0100 Subject: [LLVMdev] I stole the demo. In-Reply-To: <201203041619.34401.rich@pennware.com> References: <201203041619.34401.rich@pennware.com> Message-ID: <4F547984.4090306@free.fr>

Hi Richard,

> I had a little time on my hands this afternoon, so I stole the Clang/LLVM demo
> and modified it to allow compiling for several other targets:
> http://ellcc.org/demo

Does it use the correct header files for the target etc.?

Ciao, Duncan.

From r4start at gmail.com Mon Mar 5 02:40:06 2012 From: r4start at gmail.com (r4start) Date: Mon, 05 Mar 2012 12:40:06 +0400 Subject: [LLVMdev] Microsoft constructors implementation problem. In-Reply-To: References: <4F4B6C08.1080104@gmail.com> Message-ID: <4F547BE6.1060006@gmail.com>

Hi!

I have another question. If a ctor is called from another ctor, then the additional parameter must be equal to 0; otherwise it is equal to 1. How can I determine who called the constructor?

- Dmitry.
From james.molloy at arm.com Mon Mar 5 05:06:26 2012 From: james.molloy at arm.com (James Molloy) Date: Mon, 5 Mar 2012 11:06:26 -0000 Subject: [LLVMdev] replace hardcoded function names by intrinsics In-Reply-To: <20120302093132.7fdcdf2f@sapling2> References: <4F50C336.9020200@googlemail.com> <20120302084148.098e6641@sapling2> <4F50E1AD.8050604@free.fr> <20120302093132.7fdcdf2f@sapling2> Message-ID: <006701ccfac0$024198e0$06c4caa0$@molloy@arm.com> Hi, > Would it be useful, for this purpose, to have an > (inter-procedural) analysis pass, or some annotation-driven mechanism, > or both, to mark errno as "dead" so we don't have to worry about this > kind of thing if it is not necessary? This is currently done by marking the declaration of @sinf and friends to be "readnone". I recently fixed a bug where log2 and exp2 did not have that readnone check, so it can be quite temperamental.. Cheers, James -----Original Message----- From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Hal Finkel Sent: 02 March 2012 15:32 To: Duncan Sands Cc: llvmdev at cs.uiuc.edu Subject: Re: [LLVMdev] replace hardcoded function names by intrinsics On Fri, 02 Mar 2012 16:05:17 +0100 Duncan Sands wrote: > Hi, > > >> in the llvm code there are several places with hardcoded function > >> names for e.g. sin, sinf, sqrt, sqrtf etc., namely > >> ConstantFolding.cpp > >> InlineCost.cpp > >> SelectionDAGBuilder.cpp > >> IntrinsicLowering.cpp > >> TargetLowering.cpp > >> > >> my question is: wouldn't it be beneficial to use intrinsics for > >> this? for example a c/c++ > >> frontend (clang) could translate the function calls to intrinsics > >> and then in a very late > >> step (IntrinsicLowering.cpp?) translate it back to function calls. > >> an opencl frontend then could use the intrinsics on vector types > >> and ConstantFolding.cpp > >> would work on sin/cos of vector types. currently the intrinsics for > >> sin/cos are missing in > >> ConstantFolding. 
> >> To summarize, using only intrinsics would reduce complexity and > >> increase flexibility as > >> vector types are supported. > > > > I also think that this is a good idea. > > intrinsics don't have the same semantics as the library functions. > For example they don't set errno and in general they are less > accurate. Thus you can't turn every use of eg sqrt into an > intrinsic. However you will still want to constant fold instances of > sqrt that weren't turned into an intrinsic, and thus all those names > will still need to exist in constant fold etc, so this change > wouldn't buy you much. In some cases, this will depend on how these things are lowered, if bounds can be put on the input ranges, etc. Otherwise, I think this is a "fast math" kind of optimization. Do you disagree? Would it be useful, for this purpose, to have an (inter-procedural) analysis pass, or some annotation-driven mechanism, or both, to mark errno as "dead" so we don't have to worry about this kind of thing if it is not necessary? -Hal > > Ciao, Duncan. 
> _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev -- Hal Finkel Postdoctoral Appointee Leadership Computing Facility Argonne National Laboratory _______________________________________________ LLVM Developers mailing list LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From cristiannomartins at gmail.com Mon Mar 5 05:20:35 2012 From: cristiannomartins at gmail.com (Cristianno Martins) Date: Mon, 5 Mar 2012 08:20:35 -0300 Subject: [LLVMdev] Problem using march=c In-Reply-To: <1330939147.5965.5.camel@Nokia-N900> References: <1330939147.5965.5.camel@Nokia-N900> Message-ID: Hello again, Thanks for the responses =) Dmitry, I have two points to comment: - First, I applied these two patches, and the .cbe.c file came out ok, except for one little thing -- the global variable was created with both modifiers: static and extern. Then, I just added a single guard to prevent this to happen (in a case of a variable having local linkage, the "extern" part was not printed to file). - Second, I created the patch appended in this email, but instead of representing only the "guard-adding" part, this patch is a union of this change, and the others two patches you sent to me. Thanks again, -- Cristianno Martins PhD Student of Computer Science University of Campinas cmartins at ic.unicamp.br On Mon, Mar 5, 2012 at 6:19 AM, Dmitry N. Mikushin wrote: > Hi Cristianno, > > This problem has been around for a while, ourselves we solve it with the > following patches: > > https://hpcforge.org/scm/viewvc.php/trunk/patches/llvm.gpu.patch?root=kernelgen&view=markup > > https://hpcforge.org/scm/viewvc.php/trunk/patches/llvm.patch?revision=591&root=kernelgen&view=markup > > Please feel free to apply them, they *should* work for you even with the > latest llvm 3.1. Also please feel free to ping me in case of any troubles. 
>
> - D.
>
>
>
> ----- Original message -----
>> Hello everyone,
>>
>> I've been trying to generate a C file using the llc tool, but I'm
>> having a problem. I'm using a single Hello World program in C, and
>> executing the following passes:
>>
>> clang -emit-llvm -c -o hello.bc hello.c   # getting the bit code of hello.c
>> llc -march=c hello.bc                     # generating the hello.cbe.c file using the llvm C backend
>>
>> So far, nothing weird happened. The problem occurs once I try to
>> compile the hello.cbe.c: gcc, and even clang, show me some warning and
>> error messages.
>> Cheking in hello.cbe.c, I could see that the problem was related with
>> the structs declarations and initialization. There, they appears to be
>> something like
>>
>>     /* Global Variable Declarations */
>>     static _OC_str { unsigned char array[13]; };
>>
>>     /* Global Variable Definitions and Initialization */
>>     static _OC_str { unsigned char array[13]; } = { "Hello World\n" };
>>
>> when they should be like
>>
>>     /* Global Variable Declarations */
>>     static struct type_OC_str { unsigned char array[13]; };
>>
>>     /* Global Variable Definitions and Initialization */
>>     static struct type_OC_str _OC_str = { "Hello World\n" };
>>
>> It is a known problem, or am I doing something wrong?
>>
>> Thks in advance,
>>
>> --
>> Cristianno Martins
>> PhD Student of Computer Science
>> University of Campinas
>> cmartins at ic.unicamp.br
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: CBackend.patch Type: application/octet-stream Size: 20144 bytes Desc: not available Url : http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120305/64ab34a2/attachment.obj

From shanmuk.rao008 at gmail.com Mon Mar 5 05:42:05 2012 From: shanmuk.rao008 at gmail.com (shanmuk rao) Date: Mon, 5 Mar 2012 17:12:05 +0530 Subject: [LLVMdev] problem in implementing loop fission using ModulePass Message-ID:

Hi,

I am trying to implement my own loop fission transformation in LLVM. To find circular dependencies, I think I have to use LoopDependenceAnalysis. I am using a ModulePass. In this pass I can get LoopInfo and the loops, but when I try to use LoopDependenceAnalysis it crashes with a segmentation fault.

This example shows what I want to do:

    for(int i = 0; i< n ; i++){
      s1 : a[i] = a[i] + x[i];
      s2 : x[i] = x[i+1] + i*2 ;
    }
    /* I have to find here that from s2 to s1 there is no dependency
     * but there is dependency from s1 to s2
     * I wont consider the RAR dependency */

After distribution it should be:

    for(int i = 0; i< n ; i++)
      s1: a[i] = a[i] + x[i];
    for(int i = 0; i< n ; i++)
      s2: x[i] = x[i+1] + i*2 ;

I think there is a function isDependendent() in LoopDependenceAnalysis.

Thank you,
shanmuk
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120305/dbf61896/attachment.html

From rich at pennware.com Mon Mar 5 05:52:14 2012 From: rich at pennware.com (Richard Pennington) Date: Mon, 5 Mar 2012 05:52:14 -0600 Subject: [LLVMdev] I stole the demo.
In-Reply-To: <4F547984.4090306@free.fr> References: <201203041619.34401.rich@pennware.com> <4F547984.4090306@free.fr> Message-ID: <201203050552.15084.rich@pennware.com> On Monday, March 05, 2012 02:29:56 AM Duncan Sands wrote: > Hi Richard, > > > I had a little time on my hands this afternoon, so I stole the Clang/LLVM > > demo and modified it to allow compiling for several other targets: > > http://ellcc.org/demo > > does it use the correct header files for the target etc? > > Ciao, Duncan. Yes, it does. The header files are from my port of the NetBSD C library. I'm tempted to add an option to execute the result under QEMU, but I shudder to think about the security holes that would open. ;-) -Rich From maemarcus at gmail.com Mon Mar 5 07:16:08 2012 From: maemarcus at gmail.com (Dmitry N. Mikushin) Date: Mon, 5 Mar 2012 16:16:08 +0300 Subject: [LLVMdev] Problem using march=c In-Reply-To: References: <1330939147.5965.5.camel@Nokia-N900> Message-ID: Hi Cristianno, Great to know it works for you! Btw it would be very nice if someone could help cleaning these fixes from CUDA/OpenCL stuff and contribute patch to trunk, finally, with some tests. > the global variable was created with both modifiers: static and extern. Oh, right, this is wrong in general, was there just to unify behavior with the cudafe source preprocessor, which does the same thing, since on GPU global variables could not be shared between compilation units. - D. 2012/3/5 Cristianno Martins > Hello again, > > Thanks for the responses =) > > Dmitry, I have two points to comment: > - First, I applied these two patches, and the .cbe.c file came out ok, > except for one little thing -- the global variable was created with > both modifiers: static and extern. Then, I just added a single guard > to prevent this to happen (in a case of a variable having local > linkage, the "extern" part was not printed to file). 
> - Second, I created the patch appended in this email, but instead of > representing only the "guard-adding" part, this patch is a union of > this change, and the others two patches you sent to me. > > Thanks again, > > -- > Cristianno Martins > PhD Student of Computer Science > University of Campinas > cmartins at ic.unicamp.br > > > > On Mon, Mar 5, 2012 at 6:19 AM, Dmitry N. Mikushin > wrote: > > Hi Cristianno, > > > > This problem has been around for a while, ourselves we solve it with the > > following patches: > > > > > https://hpcforge.org/scm/viewvc.php/trunk/patches/llvm.gpu.patch?root=kernelgen&view=markup > > > > > https://hpcforge.org/scm/viewvc.php/trunk/patches/llvm.patch?revision=591&root=kernelgen&view=markup > > > > Please feel free to apply them, they *should* work for you even with the > > latest llvm 3.1. Also please feel free to ping me in case of any > troubles. > > > > - D. > > > > > > > > ----- Original message ----- > >> Hello everyone, > >> > >> I've been trying to generate a C file using the llc tool, but I'm > >> having a problem. I'm using a single Hello World program in C, and > >> executing the following passes: > >> > >> clang -emit-llvm -c -o hello.bc hello.c # getting the bit code of > >> hello.c llc -march=c hello.bc # generating the > >> hello.cbe.c file using the llvm C backend > >> > >> So far, nothing weird happened. The problem occurs once I try to > >> compile the hello.cbe.c: gcc, and even clang, show me some warning and > >> error messages. > >> Cheking in hello.cbe.c, I could see that the problem was related with > >> the structs declarations and initialization. 
There, they appears to be > >> something like > >> > >> /* Global Variable Declarations */ > >> static _OC_str { unsigned char array[13]; }; > >> > >> /* Global Variable Definitions and Initialization */ > >> static _OC_str { unsigned char array[13]; } = { "Hello > >> World\n" > >> }; > >> > >> when they should be like > >> > >> /* Global Variable Declarations */ > >> static struct type_OC_str { unsigned char array[13]; }; > >> > >> /* Global Variable Definitions and Initialization */ > >> static struct type_OC_str _OC_str = { "Hello World\n" }; > >> > >> It is a known problem, or am I doing something wrong? > >> > >> Thks in advance, > >> > >> -- > >> Cristianno Martins > >> PhD Student of Computer Science > >> University of Campinas > >> cmartins at ic.unicamp.br > >> > >> _______________________________________________ > >> LLVM Developers mailing list > >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120305/4bae2888/attachment-0001.html From hkultala at cs.tut.fi Mon Mar 5 07:39:16 2012 From: hkultala at cs.tut.fi (Heikki Kultala) Date: Mon, 05 Mar 2012 15:39:16 +0200 Subject: [LLVMdev] commit r152019 broke architectures with more than 255 registers Message-ID: <4F54C204.1070108@cs.tut.fi> Our architecture(TCE) can have LOTS of registers. It seems r152019 changed some register bookkeeping data structures to 8-bit. This broke support for architectures with >255 registers. Please revert this change or make those register-related values at least 16 bits wide. 
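The truncation described in the report above can be illustrated with a small sketch. It is not taken from r152019 or from any LLVM data structure: `roundTripRegister` is a hypothetical helper, and the only assumption is that a register number is stored in a fixed-width integer field and read back later.

```cpp
#include <cstdint>

// Hypothetical helper (not an LLVM API): stores a register number in a
// field of type FieldT and reads it back, the way a table entry would.
template <typename FieldT>
unsigned roundTripRegister(unsigned RegNo) {
  // Narrowing to an unsigned field keeps only the low bits of RegNo.
  FieldT Field = static_cast<FieldT>(RegNo);
  return static_cast<unsigned>(Field);
}
```

With an 8-bit field, `roundTripRegister<std::uint8_t>(300)` comes back as 44 (300 modulo 256), so register 300 would silently alias register 44. With a 16-bit field, `roundTripRegister<std::uint16_t>(300)` comes back as 300, which is why a 16-bit width is enough for targets with up to 65535 registers.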
From rafael.espindola at gmail.com Mon Mar 5 09:13:36 2012 From: rafael.espindola at gmail.com (=?UTF-8?Q?Rafael_Esp=C3=ADndola?=) Date: Mon, 5 Mar 2012 12:13:36 -0300 Subject: [LLVMdev] [patch] Enhance of asm macros In-Reply-To: References: <082C3546-45AE-4800-870C-254C040C1B96@apple.com> Message-ID: > For compability this problems requiring some compiler switch flag. Can you > give me description/example how it's can be done? grep for DwarfRequiresRelocationForSectionOffset. Something like that might do what you want. Cheers, Rafael From afylot at gmail.com Mon Mar 5 09:33:48 2012 From: afylot at gmail.com (simona bellavista) Date: Mon, 5 Mar 2012 16:33:48 +0100 Subject: [LLVMdev] installing llvm from source, make check-all fails on llvm::transforms and clang:preprocessor Message-ID: I downloaded via svn the release_30 and current version code. I am on x86_64 GNU/Linux, I am compiling with gcc 4.4.6 I compiled release_30 with make ENABLE_OPTIMIZED=0 OPTIMIZE_OPTION=-O0 and current release with make In both cases, when I make check-all I get : FAIL: Clang :: Preprocessor/macro_paste_c_block_comment.c (2562 of 9598) ******************** TEST 'Clang :: Preprocessor/macro_paste_c_block_comment.c' FAILED ******************** Script: -- /scratch/user/download/release_30/build/Debug/bin/clang -cc1 -internal-isystem /scratch/user/download/release_30/build/Debug/bin/../lib/clang/3.0/include /scratch/user/download/release_30/llvm/tools/clang/test/Preprocessor/macro_paste_c_block_comment.c -Eonly 2>&1 | grep error /scratch/user/download/release_30/build/Debug/bin/clang -cc1 -internal-isystem /scratch/user/download/release_30/build/Debug/bin/../lib/clang/3.0/include /scratch/user/download/release_30/llvm/tools/clang/test/Preprocessor/macro_paste_c_block_comment.c -Eonly 2>&1 | not grep unterminated /scratch/user/download/release_30/build/Debug/bin/clang -cc1 -internal-isystem /scratch/user/download/release_30/build/Debug/bin/../lib/clang/3.0/include 
/scratch/user/download/release_30/llvm/tools/clang/test/Preprocessor/macro_paste_c_block_comment.c -Eonly 2>&1 | not grep scratch -- Exit Code: 1 Command Output (stdout): -- /scratch/user/download/release_30/llvm/tools/clang/test/Preprocessor/macro_paste_c_block_comment.c:6:1: error: pasting formed '/*', an invalid preprocessing token 1 error generated. /scratch/user/download/release_30/llvm/tools/clang/test/Preprocessor/macro_paste_c_block_comment.c:6:1: error: pasting formed '/*', an invalid preprocessing token /scratch/user/download/release_30/llvm/tools/clang/test/Preprocessor/macro_paste_c_block_comment.c:5:16: note: expanded from: -- ******************** FAIL: LLVM :: Transforms/GVN/null-aliases-nothing.ll (8045 of 9598) ******************** TEST 'LLVM :: Transforms/GVN/null-aliases-nothing.ll' FAILED ******************** Script: -- /scratch/user/download/release_30/build/Debug/bin/opt /scratch/user/download/release_30/llvm/test/Transforms/GVN/null-aliases-nothing.ll -basicaa -gvn -S | /scratch/user/download/release_30/build/Debug/bin/FileCheck /scratch/user/download/release_30/llvm/test/Transforms/GVN/null-aliases-nothing.ll -- Exit Code: 1 Command Output (stderr): -- :9:12: error: CHECK-NOT: string occurred! %before = load i32* %p ^ /scratch/user/download/release_30/llvm/test/Transforms/GVN/null-aliases-nothing.ll:18:18: note: CHECK-NOT: pattern specified here ; CHECK-NOT: load ^ -- -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120305/063925ac/attachment.html From javier.e.martinez at intel.com Mon Mar 5 11:10:18 2012 From: javier.e.martinez at intel.com (Martinez, Javier E) Date: Mon, 5 Mar 2012 17:10:18 +0000 Subject: [LLVMdev] Expand vector type In-Reply-To: References: <004a01ccf6cd$ce071140$6a1533c0$@molloy@arm.com> Message-ID: I still haven't received any feedback on me adding support for widening of stores. Is there interest? 
Thanks,
Javier

From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Martinez, Javier E Sent: Wednesday, February 29, 2012 11:35 AM To: James Molloy; llvmdev at cs.uiuc.edu Subject: Re: [LLVMdev] Expand vector type

James,

Thanks for your response. I'm working in LLVM 2.7 (I know, it's old) and the default behavior is already promote. This means that, for example, a call to DAGTypeLegalizer::getTypeAction(v3i32) in my case, and I presume in ARM NEON, returns TypeWidenVector. From here legalization calls WidenVectorOperand() to process the STORE node and follows the call chain from my original email down to FindMemType(). If my analysis is correct then your v3i16 STOREs are being broken into multiple ones depending on ARM NEON support. Can you please confirm?

Thanks,
Javier

From: James Molloy [mailto:james.molloy at arm.com] Sent: Wednesday, February 29, 2012 2:35 AM To: Martinez, Javier E; llvmdev at cs.uiuc.edu Subject: RE: Expand vector type

Hi,

* Is there a way to setup LLVM to automatically convert vec3s to vec4s?

Yes, if you specify v3i16 and friends as "promote" instead of "legal", llvm will promote it to a v4i16. The ARM NEON backend does this already. I'm surprised you haven't got this happening already as you mention that LLVM widens your loads to 4-element vectors... (this should happen during DAG type legalization, by the way).

Cheers,
James

From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Martinez, Javier E Sent: 29 February 2012 00:27 To: llvmdev at cs.uiuc.edu Subject: [LLVMdev] Expand vector type

Hello,

My input language has support for 3 and 4 element vectors but my target only has support for the latter. The language defines vec3 with the same storage space as vec4 so from a backend perspective they are both the same. I'd really like it if I could have LLVM treat vec3 as vec4, but I haven't found out how. Currently the target has emulated support for vec3 through LLVM.
Loads are already widened by LLVM to a vec4. Stores are kind of funny. By default LLVM sets the action to 'widen' but in GenWidenVectorStores what ends up happening is an 2:1 split of the vector that's less efficient in this case than actually widening the vector. The reason is that at this point the call to FindMemType assumes that stores can never be widened to a bigger type and so those types are not considered. The call sequence I'm looking at is WidenVectorOperand() -> WidenVecOp_STORE() -> GenWidenVectorStores() -> FindMemType(). I've made a very small modification to enable support for widening stores to a larger type. Before spending more time on working on a generic solution I have a couple of questions: * Is there a way to setup LLVM to automatically convert vec3s to vec4s? * Is there interest in adding support for widened vector stores to a larger type? Thanks, Javier -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120305/f2f43f37/attachment.html From Micah.Villmow at amd.com Mon Mar 5 11:36:35 2012 From: Micah.Villmow at amd.com (Villmow, Micah) Date: Mon, 5 Mar 2012 17:36:35 +0000 Subject: [LLVMdev] commit r152019 broke architectures with more than 255 registers In-Reply-To: <4F54C204.1070108@cs.tut.fi> References: <4F54C204.1070108@cs.tut.fi> Message-ID: <88EE5EEF64BDB14686BA3D45C5C30BA3185110A8@sausexdag03.amd.com> Ughh... yeah I would have to agree here. The AMDIL backend uses more than 256 registers to model its register file correctly. > -----Original Message----- > From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] > On Behalf Of Heikki Kultala > Sent: Monday, March 05, 2012 5:39 AM > To: LLVM Dev > Subject: [LLVMdev] commit r152019 broke architectures with more than > 255 registers > > Our architecture(TCE) can have LOTS of registers. > > It seems r152019 changed some register bookkeeping data structures to > 8-bit. 
This broke support for architectures with >255 registers. > > Please revert this change or make those register-related values at > least > 16 bits wide. > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From joel.andersson at esat.kuleuven.be Mon Mar 5 12:04:57 2012 From: joel.andersson at esat.kuleuven.be (Joel Andersson) Date: Mon, 5 Mar 2012 19:04:57 +0100 Subject: [LLVMdev] LLVM for automatic differentiation or linear algebra? Message-ID: Dear all, I am the author of an open-source package for mathematical optimization and automatic differentiation called CasADi (www.casadi.org) and have recently started realize the potential of the LLVM project. At the core of CasADi are two fast interpretors for mathematical expressions and I'm now planning to complement these with JIT-compilation using LLVM. Does anyone know if there is someone using LLVM for either automatic differentiation or (sparse) linear algebra (two of the things CasADi is capable of doing)? One thing particularly interesting would be direct sparse linear algebra. The only thing I could find was a blog entry from 2009: http://justindomke.wordpress.com/2009/11/30/automatic-differentiation-without-compromises/ Kind regards, Joel -- Joel Andersson, PhD Student Electrical Engineering Department (ESAT-SCD), Room 05.11, K.U.Leuven, Kasteelpark Arenberg 10 - bus 2446, 3001 Heverlee, Belgium Phone: +32-16-321819 Mobile: +32-486-672874 (Belgium) / +34-63-4408800 (Spain) / +46-727-365878(Sweden) Private address: Weidestraat 5, 3000 Leuven, Belgium -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120305/29be24a6/attachment.html From stoklund at 2pi.dk Mon Mar 5 12:40:07 2012 From: stoklund at 2pi.dk (Jakob Stoklund Olesen) Date: Mon, 05 Mar 2012 10:40:07 -0800 Subject: [LLVMdev] commit r152019 broke architectures with more than 255 registers In-Reply-To: <4F54C204.1070108@cs.tut.fi> References: <4F54C204.1070108@cs.tut.fi> Message-ID: <94C47797-91E0-4BB0-856E-782CC5F3507F@2pi.dk> On Mar 5, 2012, at 5:39 AM, Heikki Kultala wrote: > Our architecture(TCE) can have LOTS of registers. > > It seems r152019 changed some register bookkeeping data structures to > 8-bit. This broke support for architectures with >255 registers. > > Please revert this change or make those register-related values at least > 16 bits wide. I agree. We can limit the number of physregs to 64k, but no more. /jakob From ryta1203 at gmail.com Mon Mar 5 12:53:48 2012 From: ryta1203 at gmail.com (Ryan Taylor) Date: Mon, 5 Mar 2012 10:53:48 -0800 Subject: [LLVMdev] Clang question Message-ID: Clang is inserting an llvm.memcpy function call into my program where it does not exist (the code never calls memcpy), is there a particular reason for this? It also looks like it's inserting two other artificial function calls, something to do with llvm.lifetime.start and llvm.lifetime.end, what are these functions and why are they being inserted artificially? Thanks. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120305/f67c449e/attachment.html From ryta1203 at gmail.com Mon Mar 5 13:04:25 2012 From: ryta1203 at gmail.com (Ryan Taylor) Date: Mon, 5 Mar 2012 11:04:25 -0800 Subject: [LLVMdev] Clang question In-Reply-To: References: Message-ID: So, let me rephrase, I understand what these functions are, I just want to know why and when they are inserted so that I can make an attempt to remove them, as they are not produced in llvm-gcc, only in clang? 
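As a hedged illustration of the question above: one common way for an `llvm.memcpy` call to show up without any `memcpy` in the source is a plain aggregate copy. The sketch below uses a made-up `Sample` struct and two hypothetical helper functions; whether a given compiler version actually emits the intrinsic for the first one depends on the struct size and optimization level, so the IR should be inspected (for example with `clang -S -emit-llvm`) rather than assumed.

```cpp
#include <cstring>

// Hypothetical aggregate, large enough that a member-by-member copy
// is unattractive for the compiler.
struct Sample {
  int values[64];
};

// No memcpy in the source: the whole struct is copied by assignment.
// A front end may lower this copy to a memcpy-style intrinsic.
void copyByAssignment(Sample &dst, const Sample &src) {
  dst = src;
}

// The explicit equivalent, shown only for comparison.
void copyByMemcpy(Sample &dst, const Sample &src) {
  std::memcpy(&dst, &src, sizeof(Sample));
}
```

Both functions leave `dst` with the same bytes as `src`; the difference is only in what the source spells out, which is why the call can look "artificial" in the generated IR.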
On Mon, Mar 5, 2012 at 10:53 AM, Ryan Taylor wrote: > Clang is inserting an llvm.memcpy function call into my program where it > does not exist (the code never calls memcpy), is there a particular reason > for this? It also looks like it's inserting two other artificial function > calls, something to do with llvm.lifetime.start and llvm.lifetime.end, what > are these functions and why are they being inserted artificially? > > Thanks. > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120305/6d72cec1/attachment.html From Micah.Villmow at amd.com Mon Mar 5 13:07:03 2012 From: Micah.Villmow at amd.com (Villmow, Micah) Date: Mon, 5 Mar 2012 19:07:03 +0000 Subject: [LLVMdev] Clang question In-Reply-To: References: Message-ID: <88EE5EEF64BDB14686BA3D45C5C30BA3185111B5@sausexdag03.amd.com> Memcpy in my experience has been inserted when a struct copy is generated. From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Ryan Taylor Sent: Monday, March 05, 2012 11:04 AM To: llvmdev at cs.uiuc.edu Subject: Re: [LLVMdev] Clang question So, let me rephrase, I understand what these functions are, I just want to know why and when they are inserted so that I can make an attempt to remove them, as they are not produced in llvm-gcc, only in clang? On Mon, Mar 5, 2012 at 10:53 AM, Ryan Taylor > wrote: Clang is inserting an llvm.memcpy function call into my program where it does not exist (the code never calls memcpy), is there a particular reason for this? It also looks like it's inserting two other artificial function calls, something to do with llvm.lifetime.start and llvm.lifetime.end, what are these functions and why are they being inserted artificially? Thanks. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120305/5478ecbc/attachment.html From ryta1203 at gmail.com Mon Mar 5 13:15:01 2012 From: ryta1203 at gmail.com (Ryan Taylor) Date: Mon, 5 Mar 2012 11:15:01 -0800 Subject: [LLVMdev] Clang question In-Reply-To: <88EE5EEF64BDB14686BA3D45C5C30BA3185111B5@sausexdag03.amd.com> References: <88EE5EEF64BDB14686BA3D45C5C30BA3185111B5@sausexdag03.amd.com> Message-ID: Ok, thanks. Is this an automatic optimization or is there some other way (possibly some other opt I am calling that does this) to get around the memcpy, such as llvm-gcc does? (since it does not use it) So it appears that the external node calls these three functions along with my "real" function. On Mon, Mar 5, 2012 at 11:07 AM, Villmow, Micah wrote: > Memcpy in my experience has been inserted when a struct copy is > generated. > > *From:* llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] *On > Behalf Of *Ryan Taylor > *Sent:* Monday, March 05, 2012 11:04 AM > *To:* llvmdev at cs.uiuc.edu > *Subject:* Re: [LLVMdev] Clang question > > So, let me rephrase, I understand what these functions are, I just want to > know why and when they are inserted so that I can make an attempt to remove > them, as they are not produced in llvm-gcc, only in clang? > > On Mon, Mar 5, 2012 at 10:53 AM, Ryan Taylor wrote: > > Clang is inserting an llvm.memcpy function call into my program where it > does not exist (the code never calls memcpy), is there a particular reason > for this? It also looks like it's inserting two other artificial function > calls, something to do with llvm.lifetime.start and llvm.lifetime.end, what > are these functions and why are they being inserted artificially? > > Thanks. > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120305/8cac0375/attachment.html From baldrick at free.fr Mon Mar 5 13:16:21 2012 From: baldrick at free.fr (Duncan Sands) Date: Mon, 05 Mar 2012 20:16:21 +0100 Subject: [LLVMdev] Expand vector type In-Reply-To: References: <004a01ccf6cd$ce071140$6a1533c0$@molloy@arm.com> Message-ID: <4F551105.9060007@free.fr> Hi Javier, On 05/03/12 18:10, Martinez, Javier E wrote: > I still haven't received any feedback on me adding support for widening of > stores. Is there interest? did you try LLVM 3.0? Ciao, Duncan. > > Thanks, > > Javier > > *From:*llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] *On > Behalf Of *Martinez, Javier E > *Sent:* Wednesday, February 29, 2012 11:35 AM > *To:* James Molloy; llvmdev at cs.uiuc.edu > *Subject:* Re: [LLVMdev] Expand vector type > > James, > > Thanks for your response. I'm working in LLVM 2.7 (I know, it's old) and the > default behavior is already promote. This means that for example a call to > DAGTypeLegalizer::getTypeAction(v3i32) in my case and I presume in ARM NEON > returns TypeWidenVector. From here legalization calls WidenVectorOperand() to > process the STORE node and follows the call chain I have on my original email to > FindMemType(). > > If my analysis is correct then your v3i16 STOREs are being broken into multiple > ones depending on ARM NEON support. Can you please confirm? > > Thanks, > > Javier > > *From:*James Molloy [mailto:james.molloy at arm.com] > > *Sent:* Wednesday, February 29, 2012 2:35 AM > *To:* Martinez, Javier E; llvmdev at cs.uiuc.edu > *Subject:* RE: Expand vector type > > Hi, > > * *Is there a way to setup LLVM to automatically convert vec3s to vec4s? * > > Yes, if you specify v3i16 and friends as "promote" instead of "legal", llvm will > promote it to a v4i16. The ARM NEON backend does this already. 
I'm surprised you > haven't got this happening already as you mention that LLVM widens your loads to > 4-element vectors? (this should happen during DAG type legalization, by the way). > > Cheers, > > James > > *From:*llvmdev-bounces at cs.uiuc.edu > [mailto:llvmdev-bounces at cs.uiuc.edu] > *On Behalf Of *Martinez, Javier E > *Sent:* 29 February 2012 00:27 > *To:* llvmdev at cs.uiuc.edu > *Subject:* [LLVMdev] Expand vector type > > Hello, > > My input language has support for 3 and 4 element vectors but my target only has > support for the latter. The language defines vec3 with the same storage space as > vec4 so from a backend perspective they are both the same. I'd really like it if I > could have LLVM treat vec3 as vec4 but I haven't found out how. > > Currently the target has emulated support for vec3 through LLVM. Loads are > already widened by LLVM to a vec4. Stores are kind of funny. By default LLVM > sets the action to "widen" but in GenWidenVectorStores what ends up happening is > a 2:1 split of the vector that's less efficient in this case than actually > widening the vector. The reason is that at this point the call to FindMemType > assumes that stores can never be widened to a bigger type and so those types are > not considered. The call sequence I'm looking at is WidenVectorOperand() -> > WidenVecOp_STORE() -> GenWidenVectorStores() -> FindMemType(). I've made a very > small modification to enable support for widening stores to a larger type. 
> > Before spending more time on working on a generic solution I have a couple of > questions: > > * *Is there a way to setup LLVM to automatically convert vec3s to vec4s?* > > * *Is there interest in adding support for widened vector stores to a larger type?* > > Thanks, > > Javier > > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From sebastian.redl at getdesigned.at Mon Mar 5 13:52:01 2012 From: sebastian.redl at getdesigned.at (Sebastian Redl) Date: Mon, 5 Mar 2012 20:52:01 +0100 Subject: [LLVMdev] Clang question In-Reply-To: References: Message-ID: <0DE12174-D7A4-4F1C-B88E-275A92DC11F4@getdesigned.at> On 05.03.2012, at 19:53, Ryan Taylor wrote: > Clang is inserting an llvm.memcpy function call into my program where it does not exist (the code never calls memcpy), is there a particular reason for this? It also looks like it's inserting two other artificial function calls, something to do with llvm.lifetime.start and llvm.lifetime.end, what are these functions and why are they being inserted artificially? llvm.lifetime.* are just markers that are used by the optimizer to reason about the code. http://llvm.org/docs/LangRef.html#int_memorymarkers They disappear without a trace when lowering to machine code. The memcpy is just the way Clang does POD copying. It's up to the optimizers to decide whether to lower this to custom code or actually emit a call to memcpy. http://llvm.org/docs/LangRef.html#int_memcpy Sebastian From christoph at sicherha.de Mon Mar 5 13:51:40 2012 From: christoph at sicherha.de (Christoph Erhardt) Date: Mon, 05 Mar 2012 20:51:40 +0100 Subject: [LLVMdev] Clang question In-Reply-To: References: Message-ID: <4F55194C.1070406@sicherha.de> Hi Ryan, the compiler is free to insert implicit calls to memcpy(), for instance for assignments from one struct/class variable to another. 
The same goes for memset(), which may be inserted implicitly for the initialization of local structs or arrays. The good news is that the backend normally optimizes these calls away where possible, replacing them with simple moves - at least as long as the number of bytes to copy does not exceed a certain threshold. As for the llvm.lifetime intrinsics, take a look at the documentation: http://llvm.org/docs/LangRef.html#int_memorymarkers If I'm not mistaken, these calls seem to be used to mark the lifespan of a stack-allocated object. Regards, Christoph From ryta1203 at gmail.com Mon Mar 5 13:56:27 2012 From: ryta1203 at gmail.com (Ryan Taylor) Date: Mon, 5 Mar 2012 11:56:27 -0800 Subject: [LLVMdev] Clang question In-Reply-To: <4F55194C.1070406@sicherha.de> References: <4F55194C.1070406@sicherha.de> Message-ID: Christoph, Yes, you are correct on the lifetime calls, they are just markers for liveness. However, the backend is not optimizing these calls away. I could try to deal with them outside of llvm but I was hoping for a cleaner solution using llvm? On Mon, Mar 5, 2012 at 11:51 AM, Christoph Erhardt wrote: > Hi Ryan, > > the compiler is free to insert implicit calls to memcpy(), for instance > for assignments from one struct/class variable to another. The same goes > for memset(), which may be inserted implicitly for the initialization of > local structs or arrays. > > The good news is that the backend normally optimizes these calls away > where possible, replacing them with simple moves - at least as long as > the number of bytes to copy does not exceed a certain threshold. > > As for the llvm.lifetime intrinsics, take a look at the documentation: > http://llvm.org/docs/LangRef.html#int_memorymarkers > If I'm not mistaken, these calls seem to be used to mark the lifespan of > a stack-allocated object. > > Regards, > Christoph > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120305/b24b9740/attachment.html From echristo at apple.com Mon Mar 5 14:27:54 2012 From: echristo at apple.com (Eric Christopher) Date: Mon, 05 Mar 2012 12:27:54 -0800 Subject: [LLVMdev] Clang question In-Reply-To: References: <4F55194C.1070406@sicherha.de> Message-ID: <8F2F8D01-7E9B-44CA-BE67-AFE640671274@apple.com> You don't have memcpy or want it to always lower it? -eric On Mar 5, 2012, at 11:56 AM, Ryan Taylor wrote: > Christoph, > > Yes, you are correct on the lifetime calls, they are just markers for liveness. > > However, the backend is not optimizing these calls away. I could try to deal with them outside of llvm but I was hoping for a cleaner solution using llvm? > > On Mon, Mar 5, 2012 at 11:51 AM, Christoph Erhardt wrote: > Hi Ryan, > > the compiler is free to insert implicit calls to memcpy(), for instance > for assignments from one struct/class variable to another. The same goes > for memset(), which may be inserted implicitly for the initialization of > local structs or arrays. > > The good news is that the backend normally optimizes these calls away > where possible, replacing them with simple moves - at least as long as > the number of bytes to copy does not exceed a certain threshold. > > As for the llvm.lifetime intrinsics, take a look at the documentation: > http://llvm.org/docs/LangRef.html#int_memorymarkers > If I'm not mistaken, these calls seem to be used to mark the lifespan of > a stack-allocated object. 
> > Regards, > Christoph > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From ryta1203 at gmail.com Mon Mar 5 14:28:54 2012 From: ryta1203 at gmail.com (Ryan Taylor) Date: Mon, 5 Mar 2012 12:28:54 -0800 Subject: [LLVMdev] Clang question In-Reply-To: <8F2F8D01-7E9B-44CA-BE67-AFE640671274@apple.com> References: <4F55194C.1070406@sicherha.de> <8F2F8D01-7E9B-44CA-BE67-AFE640671274@apple.com> Message-ID: I would like it to always be lowered, I don't want it. On Mon, Mar 5, 2012 at 12:27 PM, Eric Christopher wrote: > You don't have memcpy or want it to always lower it? > > -eric > > On Mar 5, 2012, at 11:56 AM, Ryan Taylor wrote: > > > Christoph, > > > > Yes, you are correct on the lifetime calls, they are just markers for > liveness. > > > > However, the backend is not optimizing these calls away. I could try to > deal with them outside of llvm but I was hoping for a cleaner solution > using llvm? > > > > On Mon, Mar 5, 2012 at 11:51 AM, Christoph Erhardt < > christoph at sicherha.de> wrote: > > Hi Ryan, > > > > the compiler is free to insert implicit calls to memcpy(), for instance > > for assignments from one struct/class variable to another. The same goes > > for memset(), which may be inserted implicitly for the initialization of > > local structs or arrays. > > > > The good news is that the backend normally optimizes these calls away > > where possible, replacing them with simple moves - at least as long as > > the number of bytes to copy does not exceed a certain threshold. > > > > As for the llvm.lifetime intrinsics, take a look at the documentation: > > http://llvm.org/docs/LangRef.html#int_memorymarkers > > If I'm not mistaken, these calls seem to be used to mark the lifespan of > > a stack-allocated object. 
> > > > Regards, > > Christoph > > > > _______________________________________________ > > LLVM Developers mailing list > > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120305/c0963455/attachment.html From echristo at apple.com Mon Mar 5 14:35:23 2012 From: echristo at apple.com (Eric Christopher) Date: Mon, 05 Mar 2012 12:35:23 -0800 Subject: [LLVMdev] Clang question In-Reply-To: References: <4F55194C.1070406@sicherha.de> <8F2F8D01-7E9B-44CA-BE67-AFE640671274@apple.com> Message-ID: You'll need to do the work then. I'd also question why? On most platforms a decent memcpy exists. -eric On Mar 5, 2012, at 12:28 PM, Ryan Taylor wrote: > I would like it to always be lowered, I don't want it. > > On Mon, Mar 5, 2012 at 12:27 PM, Eric Christopher wrote: > You don't have memcpy or want it to always lower it? > > -eric > > On Mar 5, 2012, at 11:56 AM, Ryan Taylor wrote: > > > Christoph, > > > > Yes, you are correct on the lifetime calls, they are just markers for liveness. > > > > However, the backend is not optimizing these calls away. I could try to deal with them outside of llvm but I was hoping for a cleaner solution using llvm? > > > > On Mon, Mar 5, 2012 at 11:51 AM, Christoph Erhardt wrote: > > Hi Ryan, > > > > the compiler is free to insert implicit calls to memcpy(), for instance > > for assignments from one struct/class variable to another. The same goes > > for memset(), which may be inserted implicitly for the initialization of > > local structs or arrays. > > > > The good news is that the backend normally optimizes these calls away > > where possible, replacing them with simple moves - at least as long as > > the number of bytes to copy does not exceed a certain threshold. 
> > > > As for the llvm.lifetime intrinsics, take a look at the documentation: > > http://llvm.org/docs/LangRef.html#int_memorymarkers > > If I'm not mistaken, these calls seem to be used to mark the lifespan of > > a stack-allocated object. > > > > Regards, > > Christoph > > > > _______________________________________________ > > LLVM Developers mailing list > > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From ryta1203 at gmail.com Mon Mar 5 14:38:09 2012 From: ryta1203 at gmail.com (Ryan Taylor) Date: Mon, 5 Mar 2012 12:38:09 -0800 Subject: [LLVMdev] Fwd: Clang question In-Reply-To: References: <4F55194C.1070406@sicherha.de> <8F2F8D01-7E9B-44CA-BE67-AFE640671274@apple.com> Message-ID: Eric, Ok, thanks, looks like I'll need to figure something out. I was hoping scalarrepl would take care of this for me, but it's not lowering the structure (I haven't looked at the opt code to see why, I'm sure there's some valid reason I'm unaware of atm). On Mon, Mar 5, 2012 at 12:35 PM, Eric Christopher wrote: > You'll need to do the work then. I'd also question why? On most platforms > a decent memcpy exists. > > -eric > > On Mar 5, 2012, at 12:28 PM, Ryan Taylor wrote: > > > I would like it to always be lowered, I don't want it. > > > > On Mon, Mar 5, 2012 at 12:27 PM, Eric Christopher > wrote: > > You don't have memcpy or want it to always lower it? > > > > -eric > > > > On Mar 5, 2012, at 11:56 AM, Ryan Taylor wrote: > > > > > Christoph, > > > > > > Yes, you are correct on the lifetime calls, they are just markers for > liveness. > > > > > > However, the backend is not optimizing these calls away. I could try > to deal with them outside of llvm but I was hoping for a cleaner solution > using llvm? 
> > > > > > On Mon, Mar 5, 2012 at 11:51 AM, Christoph Erhardt < > christoph at sicherha.de> wrote: > > > Hi Ryan, > > > > > > the compiler is free to insert implicit calls to memcpy(), for instance > > > for assignments from one struct/class variable to another. The same > goes > > > for memset(), which may be inserted implicitly for the initialization > of > > > local structs or arrays. > > > > > > The good news is that the backend normally optimizes these calls away > > > where possible, replacing them with simple moves - at least as long as > > > the number of bytes to copy does not exceed a certain threshold. > > > > > > As for the llvm.lifetime intrinsics, take a look at the documentation: > > > http://llvm.org/docs/LangRef.html#int_memorymarkers > > > If I'm not mistaken, these calls seem to be used to mark the lifespan > of > > > a stack-allocated object. > > > > > > Regards, > > > Christoph > > > > > > _______________________________________________ > > > LLVM Developers mailing list > > > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > > > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > > > > > _______________________________________________ > > LLVM Developers mailing list > > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120305/bd844e25/attachment.html From simon.m.moll at googlemail.com Mon Mar 5 15:00:32 2012 From: simon.m.moll at googlemail.com (Simon Moll) Date: Mon, 05 Mar 2012 21:00:32 +0000 Subject: [LLVMdev] OpenCL backend for LLVM Message-ID: <1330981232.1849.7.camel@gnarf-laptop> Hi, this is a follow-up on my email from august (http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-August/042737.html). i have, finally, released my OpenCL backend and control-flow restructuring framework for LLVM (AST-Extractor, or short axtor). 
The framework restructures function CFGs such that they can be expressed entirely without GOTOs or switch/loop-trickery. This makes it possible to emit source code for strictly control-flow structured languages (OpenCL, GLSL). The code includes a drop-in OpenCL driver that allows source-to-source OpenCL code transformations on existing OpenCL applications. The OpenCL backend has been under development for a while now and was tested against the NVIDIA, AMD and Rodinia demo/benchmark suites with recent NVIDIA/AMD drivers. Results for NVIDIA and AMD show, with few exceptions, that the source-to-source-loop does not introduce any performance penalty on the generated kernels (known exception: AES on recent AMD drivers). However, kernels with sampler types are currently unsupported and the source-to-source-loop may introduce slight imprecisions to floating point operations. The project builds against the current SVN version of LLVM and Clang. The GLSL backend has been lacking some attention (still at 2.9) and will be ported later to LLVM-svn. To have a look at the source, go to https://bitbucket.org/gnarf/axtor/ where it is hosted under the GPL license. Please get back to me if you have any questions or want to work on the code (however, I won't be able to regularly check my emails before April, but you will get your reply sooner or later). Kind regards, Simon Moll From Micah.Villmow at amd.com Mon Mar 5 15:07:50 2012 From: Micah.Villmow at amd.com (Villmow, Micah) Date: Mon, 5 Mar 2012 21:07:50 +0000 Subject: [LLVMdev] OpenCL backend for LLVM In-Reply-To: <1330981232.1849.7.camel@gnarf-laptop> References: <1330981232.1849.7.camel@gnarf-laptop> Message-ID: <88EE5EEF64BDB14686BA3D45C5C30BA31851132D@sausexdag03.amd.com> Simon, Have you looked at the control flow structizer that we have in the Open Source AMDIL backend? 
> -----Original Message----- > From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] > On Behalf Of Simon Moll > Sent: Monday, March 05, 2012 1:01 PM > To: llvmdev at cs.uiuc.edu > Subject: [LLVMdev] OpenCL backend for LLVM > > Hi, > > this is a follow-up on my email from august > (http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-August/042737.html). > > i have, finally, released my OpenCL backend and control-flow > restructuring framework for LLVM (AST-Extractor, or short axtor). The > framework restructures function CFGs such that they can be expressed > entirely without GOTOs or switch/loop-trickery. Hence, making it > possible to emit source-code for strictly control-flow structured > languages (OpenCL, GLSL). The code includes a drop-in OpenCL driver > that > allows source-to-source OpenCL code transformations on existing OpenCL > applications. > The OpenCL backend has been under development for a while now and was > tested against the NVIDIA, AMD and Rodinia demo/benchmark suites with > recent NVIDIA/AMD drivers. Results for NVIDIA and AMD show, with few > exceptions, that the source-to-source-loop does not introduce any > performance penalty on the generated kernels (known exception: AES on > recent AMD drivers), > > However, kernels with sampler types are currently unsupported and the > source-to-source-loop may introduce slight imprecisions to floating > point operations. > > The project builds against the current SVN version of LLVM and Clang. > The GLSL backend has been lacking some attention (still at 2.9) and > will > be ported later to LLVM-svn. > > To have a look at the source, go to https://bitbucket.org/gnarf/axtor/ > where it is hosted under the GPL license. > > Please get back to me, if you have any questions or want to work on the > code (however, i won't be able to regulary check on my emails before > April but you will get your reply sooner or later). 
> > Kind regards, > Simon Moll > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From resistor at mac.com Mon Mar 5 15:44:23 2012 From: resistor at mac.com (Owen Anderson) Date: Mon, 05 Mar 2012 13:44:23 -0800 Subject: [LLVMdev] Clang question In-Reply-To: References: <4F55194C.1070406@sicherha.de> <8F2F8D01-7E9B-44CA-BE67-AFE640671274@apple.com> Message-ID: Does -fno-builtin[-memcpy] handle this? --Owen On Mar 5, 2012, at 12:35 PM, Eric Christopher wrote: > You'll need to do the work then. I'd also question why? On most platforms a decent memcpy exists. > > -eric > > On Mar 5, 2012, at 12:28 PM, Ryan Taylor wrote: > >> I would like it to always be lowered, I don't want it. >> >> On Mon, Mar 5, 2012 at 12:27 PM, Eric Christopher wrote: >> You don't have memcpy or want it to always lower it? >> >> -eric >> >> On Mar 5, 2012, at 11:56 AM, Ryan Taylor wrote: >> >>> Christoph, >>> >>> Yes, you are correct on the lifetime calls, they are just markers for liveness. >>> >>> However, the backend is not optimizing these calls away. I could try to deal with them outside of llvm but I was hoping for a cleaner solution using llvm? >>> >>> On Mon, Mar 5, 2012 at 11:51 AM, Christoph Erhardt wrote: >>> Hi Ryan, >>> >>> the compiler is free to insert implicit calls to memcpy(), for instance >>> for assignments from one struct/class variable to another. The same goes >>> for memset(), which may be inserted implicitly for the initialization of >>> local structs or arrays. >>> >>> The good news is that the backend normally optimizes these calls away >>> where possible, replacing them with simple moves - at least as long as >>> the number of bytes to copy does not exceed a certain threshold. 
>>> >>> As for the llvm.lifetime intrinsics, take a look at the documentation: >>> http://llvm.org/docs/LangRef.html#int_memorymarkers >>> If I'm not mistaken, these calls seem to be used to mark the lifespan of >>> a stack-allocated object. >>> >>> Regards, >>> Christoph >>> >>> _______________________________________________ >>> LLVM Developers mailing list >>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >> >> >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From Micah.Villmow at amd.com Mon Mar 5 16:18:52 2012 From: Micah.Villmow at amd.com (Villmow, Micah) Date: Mon, 5 Mar 2012 22:18:52 +0000 Subject: [LLVMdev] Clang question In-Reply-To: References: <4F55194C.1070406@sicherha.de> <8F2F8D01-7E9B-44CA-BE67-AFE640671274@apple.com> Message-ID: <88EE5EEF64BDB14686BA3D45C5C30BA3185113EF@sausexdag03.amd.com> Ryan, In a backend, set this in the TargetLowering: maxStoresPerMemcpy = 4096; maxStoresPerMemmove = 4096; maxStoresPerMemset = 4096; We don't have memcpy in our backend so we have to expand it to a sequence of stores. Micah > -----Original Message----- > From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] > On Behalf Of Eric Christopher > Sent: Monday, March 05, 2012 12:35 PM > To: Ryan Taylor > Cc: llvmdev at cs.uiuc.edu > Subject: Re: [LLVMdev] Clang question > > You'll need to do the work then. I'd also question why? On most > platforms a decent memcpy exists. > > -eric > > On Mar 5, 2012, at 12:28 PM, Ryan Taylor wrote: > > > I would like it to always be lowered, I don't want it. 
> > > > On Mon, Mar 5, 2012 at 12:27 PM, Eric Christopher > wrote: > > You don't have memcpy or want it to always lower it? > > > > -eric > > > > On Mar 5, 2012, at 11:56 AM, Ryan Taylor wrote: > > > > > Christoph, > > > > > > Yes, you are correct on the lifetime calls, they are just markers > for liveness. > > > > > > However, the backend is not optimizing these calls away. I could > try to deal with them outside of llvm but I was hoping for a cleaner > solution using llvm? > > > > > > On Mon, Mar 5, 2012 at 11:51 AM, Christoph Erhardt > wrote: > > > Hi Ryan, > > > > > > the compiler is free to insert implicit calls to memcpy(), for > instance > > > for assignments from one struct/class variable to another. The same > goes > > > for memset(), which may be inserted implicitly for the > initialization of > > > local structs or arrays. > > > > > > The good news is that the backend normally optimizes these calls > away > > > where possible, replacing them with simple moves - at least as long > as > > > the number of bytes to copy does not exceed a certain threshold. > > > > > > As for the llvm.lifetime intrinsics, take a look at the > documentation: > > > http://llvm.org/docs/LangRef.html#int_memorymarkers > > > If I'm not mistaken, these calls seem to be used to mark the > lifespan of > > > a stack-allocated object. 
> > > > > > Regards, > > > Christoph > > > > > > _______________________________________________ > > > LLVM Developers mailing list > > > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > > > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > > > > > _______________________________________________ > > LLVM Developers mailing list > > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From ryta1203 at gmail.com Mon Mar 5 16:40:30 2012 From: ryta1203 at gmail.com (Ryan Taylor) Date: Mon, 5 Mar 2012 14:40:30 -0800 Subject: [LLVMdev] Clang question In-Reply-To: References: <4F55194C.1070406@sicherha.de> <8F2F8D01-7E9B-44CA-BE67-AFE640671274@apple.com> Message-ID: As a command line argument for clang? Or an opt? Says, "argument unused during compilation", but I think that is basically what I'm looking for right? On Mon, Mar 5, 2012 at 1:44 PM, Owen Anderson wrote: > Does -fno-builtin[-memcpy] handle this? > > --Owen > > On Mar 5, 2012, at 12:35 PM, Eric Christopher wrote: > > > You'll need to do the work then. I'd also question why? On most > platforms a decent memcpy exists. > > > > -eric > > > > On Mar 5, 2012, at 12:28 PM, Ryan Taylor wrote: > > > >> I would like it to always be lowered, I don't want it. > >> > >> On Mon, Mar 5, 2012 at 12:27 PM, Eric Christopher > wrote: > >> You don't have memcpy or want it to always lower it? > >> > >> -eric > >> > >> On Mar 5, 2012, at 11:56 AM, Ryan Taylor wrote: > >> > >>> Christoph, > >>> > >>> Yes, you are correct on the lifetime calls, they are just markers for > liveness. > >>> > >>> However, the backend is not optimizing these calls away. I could try > to deal with them outside of llvm but I was hoping for a cleaner solution > using llvm? 
> >>> > >>> On Mon, Mar 5, 2012 at 11:51 AM, Christoph Erhardt < > christoph at sicherha.de> wrote: > >>> Hi Ryan, > >>> > >>> the compiler is free to insert implicit calls to memcpy(), for instance > >>> for assignments from one struct/class variable to another. The same > goes > >>> for memset(), which may be inserted implicitly for the initialization > of > >>> local structs or arrays. > >>> > >>> The good news is that the backend normally optimizes these calls away > >>> where possible, replacing them with simple moves - at least as long as > >>> the number of bytes to copy does not exceed a certain threshold. > >>> > >>> As for the llvm.lifetime intrinsics, take a look at the documentation: > >>> http://llvm.org/docs/LangRef.html#int_memorymarkers > >>> If I'm not mistaken, these calls seem to be used to mark the lifespan > of > >>> a stack-allocated object. > >>> > >>> Regards, > >>> Christoph > >>> > >>> _______________________________________________ > >>> LLVM Developers mailing list > >>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > >> > >> > >> _______________________________________________ > >> LLVM Developers mailing list > >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > > > _______________________________________________ > > LLVM Developers mailing list > > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120305/4a9c4d12/attachment.html From ryta1203 at gmail.com Mon Mar 5 17:00:44 2012 From: ryta1203 at gmail.com (Ryan Taylor) Date: Mon, 5 Mar 2012 15:00:44 -0800 Subject: [LLVMdev] Fwd: Clang question In-Reply-To: References: <4F55194C.1070406@sicherha.de> <8F2F8D01-7E9B-44CA-BE67-AFE640671274@apple.com> Message-ID: Owen, Clang doesn't accept this as an option; however, it did accept -fno-builtin (the more general for all usage) and this has seemed to work. Thank you. My other question would then be how to lower vector instructions, such as extractelement, insertelement and shufflevector. These should be solved by ld/st/address calculation, correct? This is somewhat of the same problem it seems to me, or not? On Mon, Mar 5, 2012 at 1:44 PM, Owen Anderson wrote: > Does -fno-builtin[-memcpy] handle this? > > --Owen > > On Mar 5, 2012, at 12:35 PM, Eric Christopher wrote: > > > You'll need to do the work then. I'd also question why? On most > platforms a decent memcpy exists. > > > > -eric > > > > On Mar 5, 2012, at 12:28 PM, Ryan Taylor wrote: > > > >> I would like it to always be lowered, I don't want it. > >> > >> On Mon, Mar 5, 2012 at 12:27 PM, Eric Christopher > wrote: > >> You don't have memcpy or want it to always lower it? > >> > >> -eric > >> > >> On Mar 5, 2012, at 11:56 AM, Ryan Taylor wrote: > >> > >>> Christoph, > >>> > >>> Yes, you are correct on the lifetime calls, they are just markers for > liveness. > >>> > >>> However, the backend is not optimizing these calls away. I could try > to deal with them outside of llvm but I was hoping for a cleaner solution > using llvm? > >>> > >>> On Mon, Mar 5, 2012 at 11:51 AM, Christoph Erhardt < > christoph at sicherha.de> wrote: > >>> Hi Ryan, > >>> > >>> the compiler is free to insert implicit calls to memcpy(), for instance > >>> for assignments from one struct/class variable to another. 
The same > goes > >>> for memset(), which may be inserted implicitly for the initialization > of > >>> local structs or arrays. > >>> > >>> The good news is that the backend normally optimizes these calls away > >>> where possible, replacing them with simple moves - at least as long as > >>> the number of bytes to copy does not exceed a certain threshold. > >>> > >>> As for the llvm.lifetime intrinsics, take a look at the documentation: > >>> http://llvm.org/docs/LangRef.html#int_memorymarkers > >>> If I'm not mistaken, these calls seem to be used to mark the lifespan > of > >>> a stack-allocated object. > >>> > >>> Regards, > >>> Christoph > >>> > >>> _______________________________________________ > >>> LLVM Developers mailing list > >>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > >> > >> > >> _______________________________________________ > >> LLVM Developers mailing list > >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > > > _______________________________________________ > > LLVM Developers mailing list > > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120305/42ded319/attachment.html From ryta1203 at gmail.com Mon Mar 5 17:09:38 2012 From: ryta1203 at gmail.com (Ryan Taylor) Date: Mon, 5 Mar 2012 15:09:38 -0800 Subject: [LLVMdev] Clang question In-Reply-To: References: <4F55194C.1070406@sicherha.de> <8F2F8D01-7E9B-44CA-BE67-AFE640671274@apple.com> Message-ID: Owen, Nevermind. bb-vectorize causes this optimization I see, I have disabled it. I am still curious though, what is the syntactically correct way to just remove the -memcpy using -fno-builtin, I have tried both -fno-builtin[-memcpy] and the "gcc" version -fno-builtin-memcpy? 
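For context on why memcpy() calls can show up in code that never names memcpy, here is a minimal sketch. `Packet`, `copy_by_assignment`, and `copy_by_loop` are hypothetical names, not taken from Ryan's code. Whether either function ends up calling memcpy() depends on the compiler version, target, and flags, none of which is verified here.

```cpp
#include <cstddef>

// Hypothetical aggregate, large enough that a compiler may prefer a
// library call over inline moves when copying it.
struct Packet {
  int id;
  char payload[256];
};

// Plain aggregate assignment. A compiler is free to implement this as a
// call to memcpy(), even though the source never mentions memcpy.
void copy_by_assignment(Packet &dst, const Packet &src) {
  dst = src;
}

// Explicit element-wise copy. An optimizer may still recognize the loop
// as a copy idiom and replace it with memcpy(); the thread reports that
// building with -fno-builtin seemed to avoid such calls with clang.
void copy_by_loop(Packet &dst, const Packet &src) {
  dst.id = src.id;
  for (std::size_t i = 0; i < sizeof(src.payload); ++i)
    dst.payload[i] = src.payload[i];
}
```

Compiling a file like this to LLVM IR or assembly and searching the output for memcpy is one way to check what a particular toolchain does with each variant.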
On Mon, Mar 5, 2012 at 3:00 PM, Ryan Taylor wrote: > > Owen, > > Clang doesn't accept this as an option; however, it did accept > -fno-builtin (the more general for all usage) and this has seemed to work. > Thank you. > > My other question would then be how to lower vector instructions, such > as extractelement, insertelement and shufflevector. These should be solved > by ld/st/address calculation, correct? This is somewhat of the same problem > it seems to me, or not? > > > On Mon, Mar 5, 2012 at 1:44 PM, Owen Anderson wrote: > >> Does -fno-builtin[-memcpy] handle this? >> >> --Owen >> >> On Mar 5, 2012, at 12:35 PM, Eric Christopher wrote: >> >> > You'll need to do the work then. I'd also question why? On most >> platforms a decent memcpy exists. >> > >> > -eric >> > >> > On Mar 5, 2012, at 12:28 PM, Ryan Taylor wrote: >> > >> >> I would like it to always be lowered, I don't want it. >> >> >> >> On Mon, Mar 5, 2012 at 12:27 PM, Eric Christopher >> wrote: >> >> You don't have memcpy or want it to always lower it? >> >> >> >> -eric >> >> >> >> On Mar 5, 2012, at 11:56 AM, Ryan Taylor wrote: >> >> >> >>> Christoph, >> >>> >> >>> Yes, you are correct on the lifetime calls, they are just markers for >> liveness. >> >>> >> >>> However, the backend is not optimizing these calls away. I could try >> to deal with them outside of llvm but I was hoping for a cleaner solution >> using llvm? >> >>> >> >>> On Mon, Mar 5, 2012 at 11:51 AM, Christoph Erhardt < >> christoph at sicherha.de> wrote: >> >>> Hi Ryan, >> >>> >> >>> the compiler is free to insert implicit calls to memcpy(), for >> instance >> >>> for assignments from one struct/class variable to another. The same >> goes >> >>> for memset(), which may be inserted implicitly for the initialization >> of >> >>> local structs or arrays. 
>> >>> >> >>> The good news is that the backend normally optimizes these calls away >> >>> where possible, replacing them with simple moves - at least as long as >> >>> the number of bytes to copy does not exceed a certain threshold. >> >>> >> >>> As for the llvm.lifetime intrinsics, take a look at the documentation: >> >>> http://llvm.org/docs/LangRef.html#int_memorymarkers >> >>> If I'm not mistaken, these calls seem to be used to mark the lifespan >> of >> >>> a stack-allocated object. >> >>> >> >>> Regards, >> >>> Christoph >> >>> >> >>> _______________________________________________ >> >>> LLVM Developers mailing list >> >>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >> >> >> >> >> >> _______________________________________________ >> >> LLVM Developers mailing list >> >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >> > >> > _______________________________________________ >> > LLVM Developers mailing list >> > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >> >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120305/92779e76/attachment.html From eli.friedman at gmail.com Mon Mar 5 17:16:01 2012 From: eli.friedman at gmail.com (Eli Friedman) Date: Mon, 5 Mar 2012 15:16:01 -0800 Subject: [LLVMdev] Clang question In-Reply-To: References: <4F55194C.1070406@sicherha.de> <8F2F8D01-7E9B-44CA-BE67-AFE640671274@apple.com> Message-ID: On Mon, Mar 5, 2012 at 3:09 PM, Ryan Taylor wrote: > Owen, > > ? Nevermind. bb-vectorize causes this optimization I see, I have disabled > it. > > ? I am still curious though, what is the syntactically correct way to just > remove the -memcpy using -fno-builtin, I have tried both > -fno-builtin[-memcpy] and the "gcc" version -fno-builtin-memcpy? 
It's a known issue that we don't support -fno-builtin-memcpy etc. As far as I know, nobody really considers it a priority.

-Eli

From ryta1203 at gmail.com Mon Mar 5 17:21:27 2012
From: ryta1203 at gmail.com (Ryan Taylor)
Date: Mon, 5 Mar 2012 15:21:27 -0800
Subject: [LLVMdev] Clang question
In-Reply-To:
References: <4F55194C.1070406@sicherha.de> <8F2F8D01-7E9B-44CA-BE67-AFE640671274@apple.com>
Message-ID:

Thanks for the reply.

On Mon, Mar 5, 2012 at 3:16 PM, Eli Friedman wrote:
> On Mon, Mar 5, 2012 at 3:09 PM, Ryan Taylor wrote:
> > Owen,
> >
> > Nevermind. bb-vectorize causes this optimization I see, I have disabled
> > it.
> >
> > I am still curious though, what is the syntactically correct way to
> just
> > remove the -memcpy using -fno-builtin, I have tried both
> > -fno-builtin[-memcpy] and the "gcc" version -fno-builtin-memcpy?
>
> It's a known issue that we don't support -fno-builtin-memcpy etc. As
> far as I know, nobody really considers it a priority.
>
> -Eli
>

From bjorn.desutter at elis.ugent.be Tue Mar 6 02:24:51 2012
From: bjorn.desutter at elis.ugent.be (Bjorn De Sutter)
Date: Tue, 6 Mar 2012 09:24:51 +0100
Subject: [LLVMdev] Recent changes to MCRegisterClass fields: uint8_t is too narrow
Message-ID: <69B78617-EC86-4C1A-8B3B-EECA3E163BC3@elis.ugent.be>

Hi all,

In r152019 (from ctopper), the number of available registers of any type in a machine description was reduced to 256 because it now has to be encoded in a uint8_t. I'm trying to support an experimental embedded architecture with more registers (out of tree), but that has now become impossible. Does anyone know a solution?
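To make the width limit concrete, here is a minimal sketch of what happens when a register count is stored in a field that is too narrow. It is not the MCRegisterClass code; `fits_in_field` and `store_in_field` are hypothetical helpers written only for this illustration.

```cpp
#include <cstdint>
#include <limits>

// True if `value` can be stored in an unsigned field of type FieldT
// without losing information.
template <typename FieldT>
bool fits_in_field(unsigned value) {
  return value <= static_cast<unsigned>(std::numeric_limits<FieldT>::max());
}

// What actually ends up in the field if the value is stored anyway:
// unsigned narrowing keeps only the low bits (value modulo 2^N).
template <typename FieldT>
FieldT store_in_field(unsigned value) {
  return static_cast<FieldT>(value);
}
```

With these helpers, `fits_in_field<std::uint8_t>(300)` is false, and storing 300 anyway leaves 44 (300 modulo 256) in the field. A `std::uint16_t` field holds values up to 65535.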
Thanks, Bjorn De Sutter Computer Systems Lab Ghent University From craig.topper at gmail.com Tue Mar 6 02:32:43 2012 From: craig.topper at gmail.com (Craig Topper) Date: Tue, 6 Mar 2012 00:32:43 -0800 Subject: [LLVMdev] Recent changes to MCRegisterClass fields: uint8_t is too narrow In-Reply-To: <69B78617-EC86-4C1A-8B3B-EECA3E163BC3@elis.ugent.be> References: <69B78617-EC86-4C1A-8B3B-EECA3E163BC3@elis.ugent.be> Message-ID: I changed it to uint16_t in r152100. Is that enough for your architecture? On Tue, Mar 6, 2012 at 12:24 AM, Bjorn De Sutter < bjorn.desutter at elis.ugent.be> wrote: > Hi all, > > in r152019 (from ctopper), the number of available registers of any type > in a machine description is decreased to 256 because it needs to be encoded > in uint8_t now. I'm trying to support an experimental embedded architecture > with more registers (out of tree), but now that becomes impossible. Anyone > knows a solution? > > Thanks, > > Bjorn De Sutter > Computer Systems Lab > Ghent University > > > > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > -- ~Craig -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120306/93e0567c/attachment.html From craig.topper at gmail.com Tue Mar 6 02:33:20 2012 From: craig.topper at gmail.com (Craig Topper) Date: Tue, 6 Mar 2012 00:33:20 -0800 Subject: [LLVMdev] commit r152019 broke architectures with more than 255 registers In-Reply-To: <94C47797-91E0-4BB0-856E-782CC5F3507F@2pi.dk> References: <4F54C204.1070108@cs.tut.fi> <94C47797-91E0-4BB0-856E-782CC5F3507F@2pi.dk> Message-ID: This has been changed to uint16_t in r152100. On Mon, Mar 5, 2012 at 10:40 AM, Jakob Stoklund Olesen wrote: > > On Mar 5, 2012, at 5:39 AM, Heikki Kultala wrote: > > > Our architecture(TCE) can have LOTS of registers. 
> >
> > It seems r152019 changed some register bookkeeping data structures to
> > 8-bit. This broke support for architectures with >255 registers.
> >
> > Please revert this change or make those register-related values at least
> > 16 bits wide.
>
> I agree. We can limit the number of physregs to 64k, but no more.
>
> /jakob
>

--
~Craig

From bjorn.desutter at elis.ugent.be Tue Mar 6 02:53:24 2012
From: bjorn.desutter at elis.ugent.be (Bjorn De Sutter)
Date: Tue, 6 Mar 2012 09:53:24 +0100
Subject: [LLVMdev] Recent changes to MCRegisterClass fields: uint8_t is too narrow
In-Reply-To:
References: <69B78617-EC86-4C1A-8B3B-EECA3E163BC3@elis.ugent.be>
Message-ID: <4E512D75-077D-472C-B592-889FF5A71178@elis.ugent.be>

Yep, that should work fine.

Thanks (and sorry for the duplicate posting),

Bjorn

On 06 Mar 2012, at 09:32, Craig Topper wrote:
> I changed it to uint16_t in r152100. Is that enough for your architecture?
>
> On Tue, Mar 6, 2012 at 12:24 AM, Bjorn De Sutter wrote:
> Hi all,
>
> in r152019 (from ctopper), the number of available registers of any type in a machine description is decreased to 256 because it needs to be encoded in uint8_t now. I'm trying to support an experimental embedded architecture with more registers (out of tree), but now that becomes impossible. Anyone knows a solution?
>
> Thanks,
>
> Bjorn De Sutter
> Computer Systems Lab
> Ghent University
>
> --
> ~Craig

From baldrick at free.fr Tue Mar 6 04:05:12 2012
From: baldrick at free.fr (Duncan Sands)
Date: Tue, 06 Mar 2012 11:05:12 +0100
Subject: [LLVMdev] installing llvm from source, make check-all fails on llvm::transforms and clang:preprocessor
In-Reply-To:
References:
Message-ID: <4F55E158.60906@free.fr>

Hi Simona, these failures are due to the name of the path to LLVM/clang, see below.

> /scratch/user/download/release_30/build/Debug/bin/clang -cc1 -internal-isystem
> /scratch/user/download/release_30/build/Debug/bin/../lib/clang/3.0/include
> /scratch/user/download/release_30/llvm/tools/clang/test/Preprocessor/macro_paste_c_block_comment.c
>
> -Eonly 2>&1 | not grep scratch

This test checks that the word "scratch" doesn't occur in the clang output. But because it lives under your "/scratch" directory, the word scratch does occur, in the path.

> /scratch/user/download/release_30/llvm/test/Transforms/GVN/null-aliases-nothing.ll:18:18:
> note: CHECK-NOT: pattern specified here
> ; CHECK-NOT: load

Here it is checked that the word "load" does not occur. But because it lives under "download" which contains "load", the word load is wrongly thought to occur.

Yes, this is all very silly and shows a weakness in the testing infrastructure.

Ciao, Duncan.

From simon.m.moll at googlemail.com Tue Mar 6 04:49:11 2012
From: simon.m.moll at googlemail.com (Simon Moll)
Date: Tue, 06 Mar 2012 10:49:11 +0000
Subject: [LLVMdev] OpenCL backend for LLVM
In-Reply-To: <88EE5EEF64BDB14686BA3D45C5C30BA31851132D@sausexdag03.amd.com>
References: <1330981232.1849.7.camel@gnarf-laptop> <88EE5EEF64BDB14686BA3D45C5C30BA31851132D@sausexdag03.amd.com>
Message-ID: <1331030951.1776.32.camel@gnarf-laptop>

Hi Micah,

I just had a quick look at your structurizer. Here is what I found (correct me if I am mistaken):

* Our approaches for handling loops with multiple exits are identical ("Loop-Exit Enumeration").
* Axtor implements Controlled-Node Splitting and can cope with irreducible control flow (http://cardit.et.tudelft.nl/MOVE/papers/cc96.ps).
* Axtor translates switches to cascading IF-instructions.
* You are cloning nodes for predecessors to restructure IF-structures.

In addition to that, I implemented another method in Axtor for dealing with unstructured IFs. That method basically does the same as the loop-exit structurizer. When parsing an IF, it collects all branches to conflicting blocks that you would otherwise clone and puts all those blocks behind a landing block. That block then forms the exit of the IF. I favour that approach for several reasons. Firstly, it is not safe to clone blocks that contain memory barriers/fences (at least not with respect to the OpenCL specification, because the paths of the threads leading to a barrier might not all be governed by uniform state). Secondly, I assumed that it is easier for the "receiving" OpenCL compiler to recover the original CFG with the landing block approach. It seems much harder to identify duplicate blocks than to trace a successor through the landing block.

The idea behind Axtor was to make functions with arbitrary CFGs work on GPUs (usual exceptions apply: no function or block pointers), so that a reliable OpenCL backend becomes feasible.

On Mon, 2012-03-05 at 21:07 +0000, Villmow, Micah wrote:
> Simon,
> Have you looked at the control flow structizer that we have in the Open Source AMDIL backend?
>
> > -----Original Message-----
> > From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu]
> > On Behalf Of Simon Moll
> > Sent: Monday, March 05, 2012 1:01 PM
> > To: llvmdev at cs.uiuc.edu
> > Subject: [LLVMdev] OpenCL backend for LLVM
> >
> > Hi,
> >
> > this is a follow-up on my email from august
> > (http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-August/042737.html).
> > > > i have, finally, released my OpenCL backend and control-flow > > restructuring framework for LLVM (AST-Extractor, or short axtor). The > > framework restructures function CFGs such that they can be expressed > > entirely without GOTOs or switch/loop-trickery. Hence, making it > > possible to emit source-code for strictly control-flow structured > > languages (OpenCL, GLSL). The code includes a drop-in OpenCL driver > > that > > allows source-to-source OpenCL code transformations on existing OpenCL > > applications. > > The OpenCL backend has been under development for a while now and was > > tested against the NVIDIA, AMD and Rodinia demo/benchmark suites with > > recent NVIDIA/AMD drivers. Results for NVIDIA and AMD show, with few > > exceptions, that the source-to-source-loop does not introduce any > > performance penalty on the generated kernels (known exception: AES on > > recent AMD drivers), > > > > However, kernels with sampler types are currently unsupported and the > > source-to-source-loop may introduce slight imprecisions to floating > > point operations. > > > > The project builds against the current SVN version of LLVM and Clang. > > The GLSL backend has been lacking some attention (still at 2.9) and > > will > > be ported later to LLVM-svn. > > > > To have a look at the source, go to https://bitbucket.org/gnarf/axtor/ > > where it is hosted under the GPL license. > > > > Please get back to me, if you have any questions or want to work on the > > code (however, i won't be able to regulary check on my emails before > > April but you will get your reply sooner or later). 
> > > > Kind regards, > > Simon Moll > > > > _______________________________________________ > > LLVM Developers mailing list > > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > From James.Molloy at arm.com Tue Mar 6 04:54:55 2012 From: James.Molloy at arm.com (James Molloy) Date: Tue, 6 Mar 2012 10:54:55 +0000 Subject: [LLVMdev] installing llvm from source, make check-all fails on llvm::transforms and clang:preprocessor In-Reply-To: <4F55E158.60906@free.fr> References: <4F55E158.60906@free.fr> Message-ID: > -Eonly 2>&1 | not grep scratch I'm pretty sure this one was fixed after 3.0 branched. -----Original Message----- From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Duncan Sands Sent: 06 March 2012 10:05 To: llvmdev at cs.uiuc.edu Subject: Re: [LLVMdev] installing llvm from source, make check-all fails on llvm::transforms and clang:preprocessor Hi Simona, these failures are due to the name of the path to LLVM/clang, see below. > /scratch/user/download/release_30/build/Debug/bin/clang -cc1 -internal-isystem > /scratch/user/download/release_30/build/Debug/bin/../lib/clang/3.0/include > /scratch/user/download/release_30/llvm/tools/clang/test/Preprocessor/macro_paste_c_block_comment.c > > > -Eonly 2>&1 | not grep scratch This test checks that the word "scratch" doesn't occur in the clang output. But because it lives under your "/scratch" directory, the word scratch does occur, in the path. > /scratch/user/download/release_30/llvm/test/Transforms/GVN/null-aliases-nothing.ll:18:18: > note: CHECK-NOT: pattern specified here > ; CHECK-NOT: load Here it is checked that the word "load" does not occur. But because it lives under "download" which contains "load", the word load is wrongly thought to occur. Yes, this is all very silly and shows a weakness in the testing infrastructure. Ciao, Duncan. 
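The false positives Duncan describes come from plain substring matching. The sketch below reproduces the effect and shows one stricter variant. It is only an illustration, not how grep, FileCheck, or lit are implemented; `contains` and `contains_word` are hypothetical helpers.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Plain substring search, comparable in spirit to `grep PATTERN`.
bool contains(const std::string &text, const std::string &pattern) {
  return text.find(pattern) != std::string::npos;
}

// Stricter variant: `word` must not be glued to letters, digits, or '_'
// on either side, so "load" inside "download" does not count.
bool contains_word(const std::string &text, const std::string &word) {
  auto is_ident = [](char c) {
    return std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '_';
  };
  for (std::size_t pos = text.find(word); pos != std::string::npos;
       pos = text.find(word, pos + 1)) {
    const std::size_t end = pos + word.size();
    const bool left_ok = pos == 0 || !is_ident(text[pos - 1]);
    const bool right_ok = end >= text.size() || !is_ident(text[end]);
    if (left_ok && right_ok)
      return true;
  }
  return false;
}
```

For a hypothetical output line that embeds the checkout path, such as `; ModuleID = '/scratch/user/download/x.ll'`, `contains(line, "load")` is true while `contains_word(line, "load")` is false. Word matching does not help with the other failure, though: `contains_word(line, "scratch")` is still true, because `/scratch/` is a complete path component.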
From parizi.computacao at gmail.com Tue Mar 6 06:25:43 2012
From: parizi.computacao at gmail.com (Rafael Parizi)
Date: Tue, 6 Mar 2012 09:25:43 -0300
Subject: [LLVMdev] Assembly Mips from bitecode llvm
Message-ID:

Hi,

I'm trying to compile the benchmarks from the MiBench suite while applying each of LLVM's transformation passes. I also want to generate assembly code for the MIPS architecture in order to extract energy and performance metrics.

To do this, I first compile the sources and link them into a single bitcode file. Then I apply each optimization using the opt tool:

opt -$i file.bc -o file.opt.bc

where $i is each one of LLVM's transformation passes. Then, to generate MIPS assembly, I run:

llc -march=mipsel file.opt.bc -o file.opt.s

I configured LLVM with --build=mipsel, mips and --enable-targets=mipsel, mips

For some benchmarks from MiBench this process worked, but for others it did not. For example:

BITCOUNT, BLOWFISH, QSORT, DIJKSTRA, PATRICIA - (OK)
SUSAN and BASICMATH (Not OK)

The output generated by llc for BASICMATH and SUSAN CORNERS was:

llc 0x08a35738
Stack dump:
0. Program arguments: llc -march=mipsel basicmath.strip.bc -o basicmath.strip.s
1. Running pass 'Function Pass Manager' on module 'basicmath.strip.bc'.
2. Running pass 'MIPS DAG->DAG Pattern Instruction Selection' on function '@SolveCubic'

How can I resolve this problem? Am I doing something wrong in this process?

Thanks!
--
Rafael Parizi

From anton at korobeynikov.info Tue Mar 6 07:11:43 2012
From: anton at korobeynikov.info (Anton Korobeynikov)
Date: Tue, 6 Mar 2012 17:11:43 +0400
Subject: [LLVMdev] Assembly Mips from bitecode llvm
In-Reply-To:
References:
Message-ID:

Hello

> For this, for example, initially I compile the sources
How have you made this step?

--
With best regards, Anton Korobeynikov
Faculty of Mathematics and Mechanics, Saint Petersburg State University

From parizi.computacao at gmail.com Tue Mar 6 07:22:12 2012
From: parizi.computacao at gmail.com (Rafael Parizi)
Date: Tue, 6 Mar 2012 10:22:12 -0300
Subject: [LLVMdev] Assembly Mips from bitecode llvm
In-Reply-To:
References:
Message-ID:

To compile and link the Basicmath files (using a shell script):

llvm-gcc -emit-llvm basicmath_small.c -c -o basicmath_small.bc
llvm-gcc -emit-llvm cubic.c -c -o cubic.bc
llvm-gcc -emit-llvm isqrt.c -c -o isqrt.bc
llvm-gcc -emit-llvm rad2deg.c -c -o rad2deg.bc
llvm-link basicmath_small.bc cubic.bc isqrt.bc rad2deg.bc -o basicmath.bc

otms="disable-opt#adce#always-inline#argpromotion#block-placement#....."
IFS=#
for i in $otms
do
    printf "\n$i ::::::: "
    opt -$i basicmath.bc -o basicmath.$i.bc
    llc -march=mipsel basicmath.$i.bc -o basicmath.$i.s
done

On Tue, Mar 6, 2012 at 10:11 AM, Anton Korobeynikov wrote:
> Hello
>
> > For this, for example, initially I compile the sources
> How have you made this step?
>
> --
> With best regards, Anton Korobeynikov
> Faculty of Mathematics and Mechanics, Saint Petersburg State University
>

--
Rafael Parizi

From babslachem at gmail.com Tue Mar 6 07:31:20 2012
From: babslachem at gmail.com (Seb)
Date: Tue, 6 Mar 2012 14:31:20 +0100
Subject: [LLVMdev] Question on debug information
In-Reply-To:
References:
Message-ID:

Hi all,

Does anyone have ideas or info on this topic?

Thanks
Seb

2012/3/2 Seb
> Hi all,
>
> I'm using my own front-end to generate following code .ll file targeting
> x86 32-bit:
>
> ; ModuleID = 'check.c'
> target datalayout =
> "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:64:64-v128:128:128-a0:0:64-f80:32:32-n8:16:32"
> target triple = "i386-pc-linux-gnu"
> @.str581 = internal constant [52 x i8] c"---- test number %d failed.
> result %d expected %d\0a\00"
> @.str584 = internal constant [61 x i8] c"---- %3d tests completed. %d
> tests PASSED. %d tests failed.\0a\00"
> @.str587 = internal constant [61 x i8] c"---- %3d tests completed. %d
> tests passed. %d tests FAILED.\0a\00"
> define void @check(i32* %result, i32* %expect, i32 %n) {
> L.entry:
>     %tests_passed = alloca i32
>     %tests_failed = alloca i32
>     %i = alloca i32
>     call void @llvm.dbg.value (metadata !{i32* %result}, i64 0,
> metadata !9), !dbg !4
>     call void @llvm.dbg.value (metadata !{i32* %expect}, i64 0,
> metadata !10), !dbg !4
>     call void @llvm.dbg.value (metadata !{i32 %n}, i64 0, metadata
> !11), !dbg !4
>     call void @llvm.dbg.declare (metadata !{i32* %tests_passed},
> metadata !13), !dbg !4
>     store i32 0, i32* %tests_passed, !dbg !12
>     call void @llvm.dbg.declare (metadata !{i32* %tests_failed},
> metadata !15), !dbg !4
>     store i32 0, i32* %tests_failed, !dbg !14
>     call void @llvm.dbg.declare (metadata !{i32* %i}, metadata !17),
> !dbg !4
>     store i32 0, i32* %i, !dbg !16
>     br label %L.B0000
> L.B0000:
>     %0 = load i32* %i, !dbg !16
>     %1 = icmp sge i32 %0, %n, !dbg !16
>     br i1 %1, label %L.B0001, label %L.B0008, !dbg !16
> L.B0008:
>     %2 = bitcast i32* %expect to i8*, !dbg !18
>     %3 = load i32* %i, !dbg !18
>     %4 = mul i32 %3, 4, !dbg !18
>     %5 = getelementptr i8* %2, i32 %4, !dbg !18
>     %6 = bitcast i8* %5 to i32*, !dbg !18
>     %7 = load i32* %6, !dbg !18
>     %8 = bitcast i32* %result to i8*, !dbg !18
>     %9 = load i32* %i, !dbg !18
>     %10 = mul i32 %9, 4, !dbg !18
>     %11 = getelementptr i8* %8, i32 %10, !dbg !18
>     %12 = bitcast i8* %11 to i32*, !dbg !18
>     %13 = load i32* %12, !dbg !18
>     %14 = icmp ne i32 %7, %13, !dbg !18
>     br i1 %14, label %L.B0003, label %L.B0009, !dbg !18
> L.B0009:
>     %15 = load i32* %tests_passed, !dbg !18
>
>     %16 = add i32 %15, 1, !dbg !18
>     store i32 %16, i32* %tests_passed, !dbg !18
>     br label %L.B0004, !dbg !19
> L.B0003:
>     %17 = load i32* %tests_failed, !dbg !20
>
>     %18 = add i32 %17, 1, !dbg !20
>     store i32 %18, i32* %tests_failed, !dbg !20
>     %19 = bitcast [52 x i8]* @.str581 to i8*, !dbg !21
>     %20 = load i32* %i, !dbg !21
>     %21 = bitcast i32* %result to i8*, !dbg !21
>     %22 = load i32* %i, !dbg !21
>     %23 = mul i32 %22, 4, !dbg !21
>     %24 = getelementptr i8* %21, i32 %23, !dbg !21
>     %25 = bitcast i8* %24 to i32*, !dbg !21
>     %26 = load i32* %25, !dbg !21
>     %27 = bitcast i32* %expect to i8*, !dbg !21
>     %28 = load i32* %i, !dbg !21
>     %29 = mul i32 %28, 4, !dbg !21
>     %30 = getelementptr i8* %27, i32 %29, !dbg !21
>     %31 = bitcast i8* %30 to i32*, !dbg !21
>     %32 = load i32* %31, !dbg !21
>     %33 = call i32 (i8*, ...)* @printf (i8* %19, i32 %20, i32 %26,
> i32 %32), !dbg !21
>     br label %L.B0004
> L.B0004:
>     %34 = load i32* %i, !dbg !22
>
>     %35 = add i32 %34, 1, !dbg !22
>     store i32 %35, i32* %i, !dbg !22
>     br label %L.B0000, !dbg !22
> L.B0001:
>     %36 = load i32* %tests_failed, !dbg !23
>     %37 = icmp ne i32 %36, 0, !dbg !23
>     br i1 %37, label %L.B0006, label %L.B0010, !dbg !23
> L.B0010:
>     %38 = bitcast [61 x i8]* @.str584 to i8*, !dbg !24
>     %39 = load i32* %tests_passed, !dbg !24
>     %40 = load i32* %tests_failed, !dbg !24
>     %41 = call i32 (i8*, ...)* @printf (i8* %38, i32 %n, i32 %39, i32
> %40), !dbg !24
>     br label %L.B0007, !dbg !25
> L.B0006:
>     %42 = bitcast [61 x i8]* @.str587 to i8*, !dbg !26
>     %43 = load i32* %tests_passed, !dbg !26
>     %44 = load i32* %tests_failed, !dbg !26
>     %45 = call i32 (i8*, ...)* @printf (i8* %42, i32 %n, i32 %43, i32
> %44), !dbg !26
>     br label %L.B0007
> L.B0007:
>     ret void, !dbg !27
> }
>
> declare void @llvm.dbg.value(metadata, i64, metadata)
> declare void @llvm.dbg.declare(metadata, metadata)
> declare i32 @printf(i8*,...)
>
> !llvm.dbg.sp = !{!3}
>
> !llvm.dbg.lv.check = !{!9, !10, !11}
>
> !0 = metadata !{i32 589841, i32 0, i32 2, metadata !"check.c", metadata
> !".", metadata !" Seb Rel Dev-r02.27", i1 1, i1 0, metadata !"", i32 0} ;
> DW_TAG_compile_unit
> !1 = metadata !{i32 589865, metadata !"check.c", metadata !".", metadata
> !0} ; DW_TAG_file_type
> !2 = metadata !{i32 589845, metadata !1, metadata !"", metadata !1, i32
> 0, i64 0, i64 0, i32 0, i32 0, i32 0, null, i32 0, i32 0} ;
> DW_TAG_subroutine_type
> !3 = metadata !{i32 589870, i32 0, metadata !1, metadata !"check",
> metadata !"check", metadata !"", metadata !1, i32 7, metadata !2, i1 0, i1
> 1, i32 0, i32 0, i32 0, i32 0, i1 0, void (i32*, i32*, i32)* @check} ;
> DW_TAG_subprogram
> !4 = metadata !{i32 0, i32 0, metadata !3, null}
> !5 = metadata !{i32 589835, metadata !3, i32 7, i32 0, metadata !1, i32
> 0} ; DW_TAG_lexical_block
> !6 = metadata !{i32 0, i32 0, metadata !5, null}
> !7 = metadata !{i32 589860, metadata !0, metadata !"int", null, i32 0,
> i64 32, i64 32, i64 0, i32 0, i32 5} ; DW_TAG_base_type
> !8 = metadata !{i32 589839, metadata !0, metadata !"", null, i32 0, i64
> 32, i64 32, i64 0, i32 0, metadata !7} ; DW_TAG_pointer_type
> !9 = metadata !{i32 590081, metadata !3, metadata !"result", metadata !1,
> i32 16777216, metadata !8, i32 0} ; DW_TAG_arg_variable
> !10 = metadata !{i32 590081, metadata !3, metadata !"expect", metadata
> !1, i32 33554432, metadata !8, i32 0} ; DW_TAG_arg_variable
> !11 = metadata !{i32 590081, metadata !3, metadata !"n", metadata !1, i32
> 50331648, metadata !7, i32 0} ; DW_TAG_arg_variable
> !12 = metadata !{i32 9, i32 0, metadata !5, null}
> !13 = metadata !{i32 590080, metadata !5, metadata !"tests_passed",
> metadata !1, i32 0, metadata !7, i32 0} ; DW_TAG_auto_variable
> !14 = metadata !{i32 10, i32 0, metadata !5, null}
> !15 = metadata !{i32 590080, metadata !5, metadata !"tests_failed",
> metadata !1, i32 0, metadata !7, i32 0} ; DW_TAG_auto_variable
> !16 = metadata !{i32 12, i32 0, metadata !5, null}
> !17 = metadata !{i32 590080, metadata !5, metadata !"i", metadata !1, i32
> 0, metadata !7, i32 0} ; DW_TAG_auto_variable
> !18 = metadata !{i32 13, i32 0, metadata !5, null}
> !19 = metadata !{i32 14, i32 0, metadata !5, null}
> !20 = metadata !{i32 15, i32 0, metadata !5, null}
> !21 = metadata !{i32 17, i32 0, metadata !5, null}
> !22 = metadata !{i32 19, i32 0, metadata !5, null}
> !23 = metadata !{i32 20, i32 0, metadata !5, null}
> !24 = metadata !{i32 22, i32 0, metadata !5, null}
> !25 = metadata !{i32 23, i32 0, metadata !5, null}
> !26 = metadata !{i32 25, i32 0, metadata !5, null}
> !27 = metadata !{i32 26, i32 0, metadata !5, null}
>
> When I use llc 2.9 as follows:
> llc check.ll -march=x86 -o check.s
> and
> gcc -m32 -c check.s
>
> I've got a check.o file generated that targets x86 32-bit.
> Reading dwarf symbol using
> readelf --debug-dump check.o
>
> I've got for 'n' parameter:
>
> <2><71>: Abbrev Number: 3 (DW_TAG_formal_parameter)
> <72> DW_AT_name : n
> <74> DW_AT_type : <0xb3>
> <78> DW_AT_location : 0x0 (location list)
>
> I would have expected a DW_AT_location that is FP related and not 0x0.
> Is my LL file incorrect ?
> Is there something I can use in metadata to enforce a FP relative
> DW_AT_location to be generated ?
>
> Thanks for your answers
> Best Regards
> Seb

From anton at korobeynikov.info Tue Mar 6 07:36:19 2012
From: anton at korobeynikov.info (Anton Korobeynikov)
Date: Tue, 6 Mar 2012 17:36:19 +0400
Subject: [LLVMdev] Assembly Mips from bitecode llvm
In-Reply-To:
References:
Message-ID:

Ok. And what does llvm-gcc --version show?

---
With best regards,
Anton Korobeynikov

On Mar 6, 2012 5:22 PM, "Rafael Parizi" wrote:
>
> For compile and link Basicmath files (using shell script):
>
> llvm-gcc -emit-llvm basicmath_small.c -c -o basicmath_small.bc
> llvm-gcc -emit-llvm cubic.c -c -o cubic.bc
> llvm-gcc -emit-llvm isqrt.c -c -o isqrt.bc
> llvm-gcc -emit-llvm rad2deg.c -c -o rad2deg.bc
> llvm-link basicmath_small.bc cubic.bc isqrt.bc rad2deg.bc -o basicmath.bc
>
> otms="disable-opt#adce#always-inline#argpromotion#block-placement#....."
> IFS=#
> for i in $otms
> do
> printf "\n$i ::::::: "
> opt -$i basicmath.bc -o basicmath.$i.bc
> llc -march=mipsel basicmath.$i.bc -o basicmath.$i.s
> done
>
> On Tue, Mar 6, 2012 at 10:11 AM, Anton Korobeynikov <anton at korobeynikov.info> wrote:
>>
>> Hello
>>
>> > For this, for example, initially I compile the sources
>> How have you made this step?
>>
>> --
>> With best regards, Anton Korobeynikov
>> Faculty of Mathematics and Mechanics, Saint Petersburg State University
>
> --
> Rafael Parizi
>

From parizi.computacao at gmail.com Tue Mar 6 07:57:33 2012
From: parizi.computacao at gmail.com (Rafael Parizi)
Date: Tue, 6 Mar 2012 10:57:33 -0300
Subject: [LLVMdev] Assembly Mips from bitecode llvm
In-Reply-To:
References:
Message-ID:

llvm-gcc (GCC) 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2.9)
Copyright (C) 2007 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. llc --version: Low Level Virtual Machine (http://llvm.org/): llvm version 2.9 Optimized build. Built Mar 5 2012 (20:21:19). Host: x86_64-unknown-linux-gnu Host CPU: i686 Registered Targets: alpha - Alpha [experimental] arm - ARM bfin - Analog Devices Blackfin [experimental] c - C backend cellspu - STI CBEA Cell SPU [experimental] cpp - C++ backend mblaze - MBlaze mips - Mips mipsel - Mipsel msp430 - MSP430 [experimental] ppc32 - PowerPC 32 ppc64 - PowerPC 64 ptx - PTX sparc - Sparc sparcv9 - Sparc V9 systemz - SystemZ thumb - Thumb x86 - 32-bit X86: Pentium-Pro and above x86-64 - 64-bit X86: EM64T and AMD64 xcore - XCore On Tue, Mar 6, 2012 at 10:36 AM, Anton Korobeynikov wrote: > Ok. And what does llvm-gcc --version show? > > --- > > With best regards, > Anton Korobeynikov > On Mar 6, 2012 5:22 PM, "Rafael Parizi" > wrote: > > > > For compile and link Basicmath files (using shell script): > > > > llvm-gcc -emit-llvm basicmath_small.c -c -o basicmath_small.bc > > llvm-gcc -emit-llvm cubic.c -c -o cubic.bc > > llvm-gcc -emit-llvm isqrt.c -c -o isqrt.bc > > llvm-gcc -emit-llvm rad2deg.c -c -o rad2deg.bc > > llvm-link basicmath_small.bc cubic.bc isqrt.bc rad2deg.bc -o > basicmath.bc > > > > otms="disable-opt#adce#always-inline#argpromotion#block-placement#....." > > IFS=# > > for i in $otms > > do > > printf "\n$i ::::::: " > > opt -$i basicmath.bc -o basicmath.$i.bc > > llc -march=mipsel basicmath.$i.bc -o basicmath.$i.s > > done > > > > On Tue, Mar 6, 2012 at 10:11 AM, Anton Korobeynikov < > anton at korobeynikov.info> wrote: > >> > >> Hello > >> > >> > For this, for example, initially I compile the sources > >> How have you made this step? 
> >> > >> -- > >> With best regards, Anton Korobeynikov > >> Faculty of Mathematics and Mechanics, Saint Petersburg State University > > > > > > > > > > -- > > Rafael Parizi > > > > > > > > -- *Rafael Parizi* -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120306/7b7c1df9/attachment.html From jochen.wilhelmy at googlemail.com Tue Mar 6 08:08:57 2012 From: jochen.wilhelmy at googlemail.com (Jochen Wilhelmy) Date: Tue, 06 Mar 2012 15:08:57 +0100 Subject: [LLVMdev] Is there a workaround for the JIT on macos 32 bit bug? Message-ID: <4F561A79.4090905@googlemail.com> Hi! I compiled llvm 3.0 for 32 bit macos and found out that it currently does not work. a bug is in bugzilla: Bug 11178 - Mac JIT code fails on 32-bit compile of LLVM - LLVM 2.9 OK, 64-bit OK which is not fixed yet. is there already a workaround or fix? there is a comment that it works if llvm is compiled without cmake. is this the only solution? -Jochen From neonomaly.x at gmail.com Tue Mar 6 08:17:00 2012 From: neonomaly.x at gmail.com (=?koi8-r?B?7cnIwcnM?=) Date: Tue, 6 Mar 2012 18:17:00 +0400 Subject: [LLVMdev] Work with CallSites Message-ID: <91153D78-6E09-4000-8E8B-7C7ABB589298@gmail.com> Hi. I have a test program: class A { int A; public: virtual void test ( int x ) = 0; }; class B : public A { int B; public: void test ( int x ) {}; }; int main() { A *a = new B(); a->test(1); } We have call site CS: "a->test(1);". CS.getCalledFunction() - return NULL, so we can say that this call site is virtual. My optimization determines, that in this call site B::test() should be called. I've tried to use CSn.setCalledFunction(F) (where F is B::test()), but I've got this: Check function __cxa_pure_virtual Call parameter type does not match function signature! %6 = load %class.A** %a, align 4 %class.B* call void @_ZN1B4testEi(%class.A* %6, i32 1) Ok! 
I've tried to change the first %class.A %6 to %class.B %6 by "FirstArgumentOfCS->get()->mutateType(FunctionFirstArgement->getType());", but in this case I've got this: Check function __cxa_pure_virtual Load result type does not match pointer operand type! %6 = load %class.A** %a, align 4 %class.A*Instruction does not dominate all uses! %6 = load %class.A** %a, align 4 %7 = bitcast %class.B* %6 to void (%class.A*, i32)*** Instruction does not dominate all uses! %7 = bitcast %class.B* %6 to void (%class.A*, i32)*** %8 = load void (%class.A*, i32)*** %7 Instruction does not dominate all uses! %8 = load void (%class.A*, i32)*** %7 %9 = getelementptr inbounds void (%class.A*, i32)** %8, i64 0 Instruction does not dominate all uses! %9 = getelementptr inbounds void (%class.A*, i32)** %8, i64 0 %10 = load void (%class.A*, i32)** %9 Instruction does not dominate all uses! %6 = load %class.A** %a, align 4 call void @_ZN1B4testEi(%class.B* %6, i32 1) Can you tell me how correctly I can use CSn.setCalledFunction(F) in my case. Thanks! Yours sincerely, Kadysev Mikhail -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120306/62b1d497/attachment.html From james.molloy at arm.com Tue Mar 6 08:30:47 2012 From: james.molloy at arm.com (James Molloy) Date: Tue, 6 Mar 2012 14:30:47 -0000 Subject: [LLVMdev] Work with CallSites In-Reply-To: <91153D78-6E09-4000-8E8B-7C7ABB589298@gmail.com> References: <91153D78-6E09-4000-8E8B-7C7ABB589298@gmail.com> Message-ID: <006801ccfba5$b903ee80$2b0bcb80$@molloy@arm.com> Hi Mikhail, You probably want to send this to cfe-dev - this mailing list is just for the LLVM mid/backend and some of the frontend guys don't subscribe here. Cheers, James From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of ?????? Sent: 06 March 2012 14:17 To: llvmdev Subject: [LLVMdev] Work with CallSites Hi. 
I have a test program: class A { int A; public: virtual void test ( int x ) = 0; }; class B : public A { int B; public: void test ( int x ) {}; }; int main() { A *a = new B(); a->test(1); } We have call site CS: "a->test(1);". CS.getCalledFunction() - return NULL, so we can say that this call site is virtual. My optimization determines, that in this call site B::test() should be called. I've tried to use CSn.setCalledFunction(F) (where F is B::test()), but I've got this: Check function __cxa_pure_virtual Call parameter type does not match function signature! %6 = load %class.A** %a, align 4 %class.B* call void @_ZN1B4testEi(%class.A* %6, i32 1) Ok! I've tried to change the first %class.A %6 to %class.B %6 by "FirstArgumentOfCS->get()->mutateType(FunctionFirstArgement->getType());", but in this case I've got this: Check function __cxa_pure_virtual Load result type does not match pointer operand type! %6 = load %class.A** %a, align 4 %class.A*Instruction does not dominate all uses! %6 = load %class.A** %a, align 4 %7 = bitcast %class.B* %6 to void (%class.A*, i32)*** Instruction does not dominate all uses! %7 = bitcast %class.B* %6 to void (%class.A*, i32)*** %8 = load void (%class.A*, i32)*** %7 Instruction does not dominate all uses! %8 = load void (%class.A*, i32)*** %7 %9 = getelementptr inbounds void (%class.A*, i32)** %8, i64 0 Instruction does not dominate all uses! %9 = getelementptr inbounds void (%class.A*, i32)** %8, i64 0 %10 = load void (%class.A*, i32)** %9 Instruction does not dominate all uses! %6 = load %class.A** %a, align 4 call void @_ZN1B4testEi(%class.B* %6, i32 1) Can you tell me how correctly I can use CSn.setCalledFunction(F) in my case. Thanks! Yours sincerely, Kadysev Mikhail -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120306/16dbdf9b/attachment.html

From baldrick at free.fr Tue Mar 6 08:39:29 2012
From: baldrick at free.fr (Duncan Sands)
Date: Tue, 06 Mar 2012 15:39:29 +0100
Subject: [LLVMdev] Work with CallSites
In-Reply-To: <91153D78-6E09-4000-8E8B-7C7ABB589298@gmail.com>
References: <91153D78-6E09-4000-8E8B-7C7ABB589298@gmail.com>
Message-ID: <4F5621A1.4070308@free.fr>

Hi Mikhail,

> I have a test program:
>
> class A {
>   int A;
> public:
>   virtual void test ( int x ) = 0;
> };
>
> class B : public A {
>   int B;
> public:
>   void test ( int x ) {};
> };
>
> int main() {
>   A *a = new B();
>   a->test(1);
> }
>
>
> We have call site CS: "a->test(1);". CS.getCalledFunction() - return NULL,

LLVM is already capable of devirtualizing this. For example, I added

  extern void foo(int);

to your testcase, and changed

  void test ( int x ) {};

to

  void test ( int x ) { foo(x); };

Compiling with "clang -S -O4 -o -" gives:

  define i32 @main() uwtable {
  entry:
    tail call void @_Z3fooi(i32 1)
    ret i32 0
  }

If you want to enhance LLVM's devirtualization, I suggest you start by
studying how the optimizers manage to work things like this out, and build
on that.

Ciao, Duncan.

From buse.yilmaz at ozu.edu.tr Tue Mar 6 08:42:13 2012
From: buse.yilmaz at ozu.edu.tr (Buse Yilmaz)
Date: Tue, 6 Mar 2012 16:42:13 +0200
Subject: [LLVMdev] problem with llvm pass for call graph & CFG
Message-ID:

Hi all,
I wrote a pass (attached).
But I get a runtime error: buse at buse-VB:~/Desktop/llvm-2.7/lib/Transforms/CG_CFG$ opt-2.7 -load ../../../Release/lib/_CG_CFGGen.so -CG_CFGGen < ../../../buse/simpleCode.bc > /dev/null 0xa7bae0: 0 libLLVM-2.7.so.1 0x00007f4a8b6add1f 1 libLLVM-2.7.so.1 0x00007f4a8b6ae36d 2 libpthread.so.0 0x00007f4a8aa98b40 3 _CG_CFGGen.so 0x00007f4a8995891f 4 libLLVM-2.7.so.1 0x00007f4a8b329608 llvm::MPPassManager::runOnModule(llvm::Module&) + 376 5 libLLVM-2.7.so.1 0x00007f4a8b32972b llvm::PassManagerImpl::run(llvm::Module&) + 107 6 opt-2.7 0x0000000000415df3 main + 2035 7 libc.so.6 0x00007f4a89b79d8e __libc_start_main + 254 8 opt-2.7 0x0000000000409569 Stack dump: 0. Program arguments: opt-2.7 -load ../../../Release/lib/_CG_CFGGen.so -CG_CFGGen 1. Running pass 'Unnamed pass: implement Pass::getPassName()' on module ''. Segmentation fault buse at buse-VB:~/Desktop/llvm-2.7/lib/Transforms/CG_CFG$ I couldn't diagnose the problem. Any ideas? -- Buse -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120306/0843fb58/attachment.html -------------- next part -------------- A non-text attachment was scrubbed... Name: CG_CFGGen.cpp Type: text/x-c++src Size: 2042 bytes Desc: not available Url : http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120306/0843fb58/attachment.bin From hammacher at cs.uni-saarland.de Tue Mar 6 09:29:28 2012 From: hammacher at cs.uni-saarland.de (Clemens Hammacher) Date: Tue, 06 Mar 2012 16:29:28 +0100 Subject: [LLVMdev] Performance degradation when repeatedly exchanging JITted functions Message-ID: <4F562D58.1010200@cs.uni-saarland.de> Hi all, for a research project we need to repeatedly exchange functions in a program running in the JIT compiler. We currently do this by calling recompileAndRelinkFunction(), after changing the body of the function. 
Of course we synchronize enough to ensure that the JIT doesn't concurrently
compile the function (which should only happen if lazy compilation is
enabled).

Now recompileAndRelinkFunction saves the old function pointer, then runs the
JIT, and writes a jump to the new function pointer at the memory of the old
function. The problem with this implementation is (and I verified that this
really happens) that this builds chains of jumps that are traversed each
time the function is called. This is because the callsites are never
updated. There is actually a FIXME in the JITEmitter saying "FIXME: We could
rewrite all references to this stub if we knew them.", but of course it
would be hard to catch them all, given the variety of call instructions.
Another drawback is that the memory of old function versions can never be
freed, since it is still used in the jump chain.

To measure the performance impact of this, I wrote a small example program,
where each second the function is recompiled and the number of method calls
is printed (Mcalls = million calls). The performance degradation is quite
impressive:

After   0 replacements: 335.724 Mcalls/sec
After   1 replacements: 274.735 Mcalls/sec ( 82.010% of initial)
After   2 replacements: 232.640 Mcalls/sec ( 69.445% of initial)
After   3 replacements: 201.898 Mcalls/sec ( 60.268% of initial)
After   4 replacements: 177.727 Mcalls/sec ( 53.053% of initial)
After   5 replacements: 158.765 Mcalls/sec ( 47.393% of initial)
After  10 replacements: 102.098 Mcalls/sec ( 30.477% of initial)
After  20 replacements:  60.197 Mcalls/sec ( 17.969% of initial)
After  50 replacements:  27.049 Mcalls/sec (  8.074% of initial)
After 200 replacements:   7.438 Mcalls/sec (  2.220% of initial)
After 460 replacements:   3.273 Mcalls/sec (  0.977% of initial)

I think a solution would be to always call a function through its stub, so
that there is a single location to update when the function is exchanged.
This would mean that there is always exactly one level of indirection, which is worse for programs that don't exchange functions at runtime, but is much better in our scenario. I tried to add a flag to the JIT to implement that (always return the address of the stub and never update the global mapping), but I gave up since there are too many classes relying on the update of the global map (including the JIT itself). An alternative approach that won't require patching llvm would be to manage an array of all function pointers in the "VM" we are implementing, and then to replace in the bitcode each direct function call by a load from that array, and a call to that address. Then the VM could just update the array after recompiling a function, and all call sites will use the new pointer. The overhead should be comparable to the "always go through stub" method. Some more logic would be required to handle indirect calls, but this could be handled by callbacks into the VM. But before implementing that I wanted to ask if anybody already has a working solution for the problem. Or whether the problem is important enough to address it directly in LLVM. Cheers, Clemens -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: RepeatedMethodExchange.cpp Url: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120306/5bae9bf9/attachment.pl -------------- next part -------------- A non-text attachment was scrubbed... 
Name: smime.p7s Type: application/pkcs7-signature Size: 6392 bytes Desc: S/MIME Cryptographic Signature Url : http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120306/5bae9bf9/attachment.bin From criswell at illinois.edu Tue Mar 6 09:33:42 2012 From: criswell at illinois.edu (John Criswell) Date: Tue, 6 Mar 2012 09:33:42 -0600 Subject: [LLVMdev] problem with llvm pass for call graph & CFG In-Reply-To: References: Message-ID: <4F562E56.7050706@illinois.edu> On 3/6/12 8:42 AM, Buse Yilmaz wrote: > Hi all, > I wrote a pass (attached). But I get a runtime error: It looks like you're using a Release build of LLVM. I recommend compiling LLVM and your pass in a Debug build; that will generate asserts and provide source line information so that you can more easily debug the problem. I don't have time to look at your code in detail (and chances are good that nobody else on the list does, either), but if I had to guess, you're probably assuming that a value is always non-NULL when it can be NULL. I'd double check that F and other critical values are non-NULL by adding assert statements. -- John T. > > buse at buse-VB:~/Desktop/llvm-2.7/lib/Transforms/CG_CFG$ opt-2.7 -load > ../../../Release/lib/_CG_CFGGen.so -CG_CFGGen < > ../../../buse/simpleCode.bc > /dev/null > 0xa7bae0: > 0 libLLVM-2.7.so.1 0x00007f4a8b6add1f > 1 libLLVM-2.7.so.1 0x00007f4a8b6ae36d > 2 libpthread.so.0 0x00007f4a8aa98b40 > 3 _CG_CFGGen.so 0x00007f4a8995891f > 4 libLLVM-2.7.so.1 0x00007f4a8b329608 > llvm::MPPassManager::runOnModule(llvm::Module&) + 376 > 5 libLLVM-2.7.so.1 0x00007f4a8b32972b > llvm::PassManagerImpl::run(llvm::Module&) + 107 > 6 opt-2.7 0x0000000000415df3 main + 2035 > 7 libc.so.6 0x00007f4a89b79d8e __libc_start_main + 254 > 8 opt-2.7 0x0000000000409569 > Stack dump: > 0. Program arguments: opt-2.7 -load > ../../../Release/lib/_CG_CFGGen.so -CG_CFGGen > 1. Running pass 'Unnamed pass: implement Pass::getPassName()' on > module ''. 
> Segmentation fault > buse at buse-VB:~/Desktop/llvm-2.7/lib/Transforms/CG_CFG$ > > I couldn't diagnose the problem. Any ideas? > -- > Buse > > > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120306/2190d6d6/attachment.html From joerg at britannica.bec.de Tue Mar 6 09:44:45 2012 From: joerg at britannica.bec.de (Joerg Sonnenberger) Date: Tue, 6 Mar 2012 16:44:45 +0100 Subject: [LLVMdev] Performance degradation when repeatedly exchanging JITted functions In-Reply-To: <4F562D58.1010200@cs.uni-saarland.de> References: <4F562D58.1010200@cs.uni-saarland.de> Message-ID: <20120306154445.GA30390@britannica.bec.de> On Tue, Mar 06, 2012 at 04:29:28PM +0100, Clemens Hammacher wrote: > I think a solution would be to always call a function through it's > stub, so that there is a single location to update when the function > is exchanged. This would mean that there is always exactly one level > of indirection, which is worse for programs that don't exchange > functions at runtime, but is much better in our scenario. Actually, you just have to make sure that you always patch the initial function. You don't have to force it to be a stub. Joerg From James.Molloy at arm.com Tue Mar 6 10:09:36 2012 From: James.Molloy at arm.com (James Molloy) Date: Tue, 6 Mar 2012 16:09:36 +0000 Subject: [LLVMdev] Performance degradation when repeatedly exchanging JITted functions In-Reply-To: <20120306154445.GA30390@britannica.bec.de> References: <4F562D58.1010200@cs.uni-saarland.de> <20120306154445.GA30390@britannica.bec.de> Message-ID: Surely you need to patch *all* functions, not just the initial? 
The point is that with the current solution, no matter which version of the
function another function is linked to, it will hit a sled of JMPs and
eventually end up at the newest. If you only patched the first, that sled
wouldn't work. So you'd have to patch all instances.

That still shouldn't be too hard.

Cheers,

James

-----Original Message-----
From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Joerg Sonnenberger
Sent: 06 March 2012 15:45
To: llvmdev at cs.uiuc.edu
Subject: Re: [LLVMdev] Performance degradation when repeatedly exchanging JITted functions

On Tue, Mar 06, 2012 at 04:29:28PM +0100, Clemens Hammacher wrote:
> I think a solution would be to always call a function through it's
> stub, so that there is a single location to update when the function
> is exchanged. This would mean that there is always exactly one level
> of indirection, which is worse for programs that don't exchange
> functions at runtime, but is much better in our scenario.

Actually, you just have to make sure that you always patch the initial
function. You don't have to force it to be a stub.

Joerg
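
The patching strategies debated in this thread can be compared with a small
toy model. This is only a sketch in plain C++ and does not use LLVM's JIT
API: `Version`, `ToyFunction`, `Policy`, `hopsFrom` and `hopsAfter` are
hypothetical names introduced for illustration, and a `forward` pointer
stands in for the jump written over a stale function body. It assumes that
callers stay linked against the first compiled address.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Toy model only: one Version stands for one compiled body of a function.
// `forward` plays the role of the jump written over a stale body.
struct Version {
    Version *forward = nullptr;
};

// Number of jumps a call entering at `entry` takes before reaching live code.
std::size_t hopsFrom(const Version *entry) {
    std::size_t hops = 0;
    for (; entry->forward; entry = entry->forward)
        ++hops;
    return hops;
}

enum class Policy {
    ChainToNext,      // only the previous body is patched (the reported behaviour)
    PatchAllToLatest, // every stale body is re-patched to the newest one
    PatchInitialOnly  // callers always link to the first body; only it is patched
};

class ToyFunction {
public:
    ToyFunction() { versions.push_back(std::unique_ptr<Version>(new Version())); }

    // Model one recompilation. Returns how many stale bodies were rewritten.
    std::size_t replace(Policy policy) {
        assert(!versions.empty());
        Version *previous = versions.back().get();
        versions.push_back(std::unique_ptr<Version>(new Version()));
        Version *latest = versions.back().get();
        switch (policy) {
        case Policy::ChainToNext:
            previous->forward = latest;
            return 1;
        case Policy::PatchAllToLatest:
            for (std::size_t i = 0; i + 1 < versions.size(); ++i)
                versions[i]->forward = latest;
            return versions.size() - 1;
        case Policy::PatchInitialOnly:
            versions.front()->forward = latest;
            return 1;
        }
        return 0;
    }

    const Version *initial() const { return versions.front().get(); }

private:
    std::vector<std::unique_ptr<Version>> versions;
};

// Hops seen by a caller linked against the initial address after n swaps.
std::size_t hopsAfter(Policy policy, std::size_t n) {
    ToyFunction f;
    for (std::size_t i = 0; i < n; ++i)
        f.replace(policy);
    return hopsFrom(f.initial());
}
```

In this model `hopsAfter(Policy::ChainToNext, n)` grows linearly with `n`,
which matches the direction of the measured slowdown, while the other two
policies stay at one hop. The difference between those two is the cost per
replacement: `PatchAllToLatest` rewrites every stale body, whereas
`PatchInitialOnly` rewrites one and leaves intermediate bodies unreferenced.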
From joerg at britannica.bec.de Tue Mar 6 10:28:06 2012
From: joerg at britannica.bec.de (Joerg Sonnenberger)
Date: Tue, 6 Mar 2012 17:28:06 +0100
Subject: [LLVMdev] Performance degradation when repeatedly exchanging JITted functions
In-Reply-To:
References: <4F562D58.1010200@cs.uni-saarland.de> <20120306154445.GA30390@britannica.bec.de>
Message-ID: <20120306162806.GA31282@britannica.bec.de>

On Tue, Mar 06, 2012 at 04:09:36PM +0000, James Molloy wrote:
> Surely you need to patch *all* functions, not just the initial?

Depends on whether you always link to the original address or not.
If you link with the latest address, you have to patch all versions
to point to the latest; otherwise you can just patch the first.

Advantage of using the latest address: one saved jmp per call.
Advantage of using the initial address: easier G/C of intermediate
versions, fewer things to keep track of.

Joerg

From Micah.Villmow at amd.com Tue Mar 6 10:35:36 2012
From: Micah.Villmow at amd.com (Villmow, Micah)
Date: Tue, 6 Mar 2012 16:35:36 +0000
Subject: [LLVMdev] OpenCL backend for LLVM
In-Reply-To: <1331030951.1776.32.camel@gnarf-laptop>
References: <1330981232.1849.7.camel@gnarf-laptop> <88EE5EEF64BDB14686BA3D45C5C30BA31851132D@sausexdag03.amd.com> <1331030951.1776.32.camel@gnarf-laptop>
Message-ID: <88EE5EEF64BDB14686BA3D45C5C30BA3185116D6@sausexdag03.amd.com>

The person that wrote our structurizer agrees with your analysis. Too bad
the licenses are incompatible; it would be nice to merge similar efforts.

> -----Original Message-----
> From: Simon Moll [mailto:simon.m.moll at googlemail.com]
> Sent: Tuesday, March 06, 2012 2:49 AM
> To: Villmow, Micah
> Cc: llvmdev at cs.uiuc.edu
> Subject: RE: [LLVMdev] OpenCL backend for LLVM
>
> Hi Micah,
>
> i just had a quick look at your structurizer. Here is what if found
> (correct me, if i am mistaken):
> * Our approaches for handling Loops with multiple exits are identical.
> ("Loop-Exit Enumeration") > * Axtor implements Controlled-Node Splitting and can cope with > irreducible control-flow. > (http://cardit.et.tudelft.nl/MOVE/papers/cc96.ps) > * Axtor translates switches to cascading IF-instructions > * You are cloning nodes for predecessors to restructure IF-structures. > In Axtor, additionally to that, i implemented another method of dealing > with unstructured IFs. That method basically does the same as the loop- > exit structurizer. > When parsing an IF, it collects all branches to conflicting blocks you > would otherwise clone and puts all those blocks behind a landing block. > That block than makes the exit for the IF. > I favour that approach for several reasons: > Firstly, it is not safe to clone blocks that contain memory > barriers/fences (at least not wrt the OpenCL specification, because the > pathes of the threads leading to a barrier might not all be governed by > uniform state). > Secondly, i assumed that it is easier for the "receiving" OpenCL > compiler to recover the original CFG with the landing block approach. It > seems much harder to identify duplicate blocks than to trace a successor > through the landing block. > > The idea behind axtor was to make functions with arbitrary CFGs work on > GPUs (usual exceptions apply: no fnc/block ptrs), such that a reliable > OpenCL backend becomes feasible. > > On Mon, 2012-03-05 at 21:07 +0000, Villmow, Micah wrote: > > Simon, > > Have you looked at the control flow structizer that we have in the > Open Source AMDIL backend? > > > > > -----Original Message----- > > > From: llvmdev-bounces at cs.uiuc.edu > > > [mailto:llvmdev-bounces at cs.uiuc.edu] > > > On Behalf Of Simon Moll > > > Sent: Monday, March 05, 2012 1:01 PM > > > To: llvmdev at cs.uiuc.edu > > > Subject: [LLVMdev] OpenCL backend for LLVM > > > > > > Hi, > > > > > > this is a follow-up on my email from august > > > (http://lists.cs.uiuc.edu/pipermail/llvmdev/2011- > August/042737.html). 
> > > > > > i have, finally, released my OpenCL backend and control-flow > > > restructuring framework for LLVM (AST-Extractor, or short axtor). > > > The framework restructures function CFGs such that they can be > > > expressed entirely without GOTOs or switch/loop-trickery. Hence, > > > making it possible to emit source-code for strictly control-flow > > > structured languages (OpenCL, GLSL). The code includes a drop-in > > > OpenCL driver that allows source-to-source OpenCL code > > > transformations on existing OpenCL applications. > > > The OpenCL backend has been under development for a while now and > > > was tested against the NVIDIA, AMD and Rodinia demo/benchmark suites > > > with recent NVIDIA/AMD drivers. Results for NVIDIA and AMD show, > > > with few exceptions, that the source-to-source-loop does not > > > introduce any performance penalty on the generated kernels (known > > > exception: AES on recent AMD drivers), > > > > > > However, kernels with sampler types are currently unsupported and > > > the source-to-source-loop may introduce slight imprecisions to > > > floating point operations. > > > > > > The project builds against the current SVN version of LLVM and > Clang. > > > The GLSL backend has been lacking some attention (still at 2.9) and > > > will be ported later to LLVM-svn. > > > > > > To have a look at the source, go to > > > https://bitbucket.org/gnarf/axtor/ > > > where it is hosted under the GPL license. > > > > > > Please get back to me, if you have any questions or want to work on > > > the code (however, i won't be able to regulary check on my emails > > > before April but you will get your reply sooner or later). 
> > > > > > Kind regards, > > > Simon Moll > > > > > > _______________________________________________ > > > LLVM Developers mailing list > > > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > > > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > > > > > From hammacher at cs.uni-saarland.de Tue Mar 6 10:39:27 2012 From: hammacher at cs.uni-saarland.de (Clemens Hammacher) Date: Tue, 06 Mar 2012 17:39:27 +0100 Subject: [LLVMdev] Performance degradation when repeatedly exchanging JITted functions In-Reply-To: <20120306162806.GA31282@britannica.bec.de> References: <4F562D58.1010200@cs.uni-saarland.de> <20120306154445.GA30390@britannica.bec.de> <20120306162806.GA31282@britannica.bec.de> Message-ID: <4F563DBF.50602@cs.uni-saarland.de> On 3/6/12 5:28 PM, Joerg Sonnenberger wrote: > Advantage of using the latest address: one saved jmp per call. Per newly JITted call ;) > Advantage of using the initial address: easier G/C of intermediate > versions, less things to keep track of. I still think both versions require larger changes. When using the latest address, you have to keep track of all JITted functions per Function in order to update them. And their number increases linearly, so the time needed for exchanging a function increases as well. When using the initial address, you also have to patch all places in LLVM that rely on the global mapping being updated, which are more than I initially thought. That's why I stopped working on that. I don't think that a patch implementing any of those approaches would be accepted, that's why I am tending towards implementing it outside of LLVM. Cheers, Clemens -------------- next part -------------- A non-text attachment was scrubbed... 
Name: smime.p7s Type: application/pkcs7-signature Size: 6392 bytes Desc: S/MIME Cryptographic Signature Url : http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120306/7ba10b6c/attachment.bin From bigcheesegs at gmail.com Tue Mar 6 10:50:24 2012 From: bigcheesegs at gmail.com (Michael Spencer) Date: Tue, 6 Mar 2012 08:50:24 -0800 Subject: [LLVMdev] I stole the demo. In-Reply-To: <201203050552.15084.rich@pennware.com> References: <201203041619.34401.rich@pennware.com> <4F547984.4090306@free.fr> <201203050552.15084.rich@pennware.com> Message-ID: On Mon, Mar 5, 2012 at 3:52 AM, Richard Pennington wrote: > On Monday, March 05, 2012 02:29:56 AM Duncan Sands wrote: >> Hi Richard, >> >> > I had a little time on my hands this afternoon, so I stole the Clang/LLVM >> > demo and modified it to allow compiling for several other targets: >> > http://ellcc.org/demo >> >> does it use the correct header files for the target etc? >> >> Ciao, Duncan. > > Yes, it does. The header files are from my port of the NetBSD C library. > I'm tempted to add an option to execute the result under QEMU, but I shudder > to think about the security holes that would open. ;-) > > -Rich We actually do have a safe way to run arbitrary native code. See http://weegen.home.xs4all.nl/eelis/geordi/ . This is what http://ideone.com/ and http://codepad.org/ use for safely running code. Along with clang-bot in the #llvm irc channel. 
- Michael Spencer From James.Molloy at arm.com Tue Mar 6 11:48:10 2012 From: James.Molloy at arm.com (James Molloy) Date: Tue, 6 Mar 2012 17:48:10 +0000 Subject: [LLVMdev] Performance degradation when repeatedly exchanging JITted functions In-Reply-To: <4F563DBF.50602@cs.uni-saarland.de> References: <4F562D58.1010200@cs.uni-saarland.de> <20120306154445.GA30390@britannica.bec.de> <20120306162806.GA31282@britannica.bec.de>, <4F563DBF.50602@cs.uni-saarland.de> Message-ID: > I don't think that a patch implementing any of those approaches would be > accepted, that's why I am tending towards implementing it outside of LLVM. Why not? If they make LLVM better and aren't hacks, why wouldn't they be accepted? ________________________________________ From: llvmdev-bounces at cs.uiuc.edu [llvmdev-bounces at cs.uiuc.edu] On Behalf Of Clemens Hammacher [hammacher at cs.uni-saarland.de] Sent: 06 March 2012 16:39 To: llvmdev at cs.uiuc.edu Subject: Re: [LLVMdev] Performance degradation when repeatedly exchanging JITted functions On 3/6/12 5:28 PM, Joerg Sonnenberger wrote: > Advantage of using the latest address: one saved jmp per call. Per newly JITted call ;) > Advantage of using the initial address: easier G/C of intermediate > versions, less things to keep track of. I still think both versions require larger changes. When using the latest address, you have to keep track of all JITted functions per Function in order to update them. And their number increases linearly, so the time needed for exchanging a function increases as well. When using the initial address, you also have to patch all places in LLVM that rely on the global mapping being updated, which are more than I initially thought. That's why I stopped working on that. I don't think that a patch implementing any of those approaches would be accepted, that's why I am tending towards implementing it outside of LLVM. 
Cheers, Clemens -- IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you. From anton at korobeynikov.info Tue Mar 6 12:55:05 2012 From: anton at korobeynikov.info (Anton Korobeynikov) Date: Tue, 6 Mar 2012 22:55:05 +0400 Subject: [LLVMdev] Assembly Mips from bitecode llvm In-Reply-To: References: Message-ID: > llvm-gcc (GCC) 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2.9) > Copyright (C) 2007 Free Software Foundation, Inc. Here I assume that your llvm-gcc is for x86-64-linux, since there was no MIPS release. So, you're feeding x86-oriented IR to MIPS backend. This won't work, you will need to provide MIPS-aware IR. Also, in 2.9 the MIPS was definitely not so mature. Hopefully MIPS folks can comment on this. -- With best regards, Anton Korobeynikov Faculty of Mathematics and Mechanics, Saint Petersburg State University From simon.m.moll at googlemail.com Tue Mar 6 13:43:27 2012 From: simon.m.moll at googlemail.com (Simon Moll) Date: Tue, 06 Mar 2012 19:43:27 +0000 Subject: [LLVMdev] OpenCL backend for LLVM In-Reply-To: <88EE5EEF64BDB14686BA3D45C5C30BA3185116D6@sausexdag03.amd.com> References: <1330981232.1849.7.camel@gnarf-laptop> <88EE5EEF64BDB14686BA3D45C5C30BA31851132D@sausexdag03.amd.com> <1331030951.1776.32.camel@gnarf-laptop> <88EE5EEF64BDB14686BA3D45C5C30BA3185116D6@sausexdag03.amd.com> Message-ID: <1331063007.1868.2.camel@gnarf-laptop> I am currently looking into the options in re/multi-licensing it under a more permissive license. On Tue, 2012-03-06 at 16:35 +0000, Villmow, Micah wrote: > The person that wrote our structurizer agrees with your analysis. Too bad the licenses are incompatible, it would be nice to merge similar efforts. 
>
> > -----Original Message-----
> > From: Simon Moll [mailto:simon.m.moll at googlemail.com]
> > Sent: Tuesday, March 06, 2012 2:49 AM
> > To: Villmow, Micah
> > Cc: llvmdev at cs.uiuc.edu
> > Subject: RE: [LLVMdev] OpenCL backend for LLVM
> >
> > Hi Micah,
> >
> > I just had a quick look at your structurizer. Here is what I found
> > (correct me if I am mistaken):
> > * Our approaches for handling loops with multiple exits are identical.
> >   ("Loop-Exit Enumeration")
> > * Axtor implements Controlled-Node Splitting and can cope with
> >   irreducible control flow.
> >   (http://cardit.et.tudelft.nl/MOVE/papers/cc96.ps)
> > * Axtor translates switches to cascading IF instructions.
> > * You are cloning nodes for predecessors to restructure IF structures.
> >   In Axtor, in addition to that, I implemented another method of
> >   dealing with unstructured IFs. That method basically does the same
> >   as the loop-exit structurizer.
> >   When parsing an IF, it collects all branches to conflicting blocks
> >   you would otherwise clone and puts all those blocks behind a landing
> >   block. That block then makes the exit for the IF.
> >   I favour that approach for several reasons:
> >   Firstly, it is not safe to clone blocks that contain memory
> >   barriers/fences (at least not with respect to the OpenCL
> >   specification, because the paths of the threads leading to a barrier
> >   might not all be governed by uniform state).
> >   Secondly, I assumed that it is easier for the "receiving" OpenCL
> >   compiler to recover the original CFG with the landing block approach.
> >   It seems much harder to identify duplicate blocks than to trace a
> >   successor through the landing block.
> >
> > The idea behind axtor was to make functions with arbitrary CFGs work on
> > GPUs (usual exceptions apply: no function/block pointers), such that a
> > reliable OpenCL backend becomes feasible.
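As a rough illustration of what enumerating loop exits can look like at the source level, here is a minimal sketch. It is one reading of the term "Loop-Exit Enumeration" and not taken from axtor or the AMDIL structurizer; the function names and the `exitId` variable are hypothetical. The first function has a loop with two exits that jump to different continuations; the second records which exit was taken, leaves through a single exit, and dispatches afterwards, so no goto is needed.

```cpp
#include <cassert>

// Unstructured form: the loop has two exits leading to different blocks.
int findWithGotos(const int *data, int n, int key) {
    int i = 0;
loop:
    if (i >= n) goto exhausted;      // exit 1
    if (data[i] == key) goto found;  // exit 2
    ++i;
    goto loop;
found:
    return i;
exhausted:
    return -1;
}

// Structured form: each exit gets a number, the loop has one exit, and a
// dispatch after the loop selects the continuation.
int findWithExitId(const int *data, int n, int key) {
    int exitId = 0;  // 0 means "still looping"
    int i = 0;
    while (exitId == 0) {
        if (i >= n) {
            exitId = 1;
        } else if (data[i] == key) {
            exitId = 2;
        } else {
            ++i;
        }
    }
    if (exitId == 2) {
        return i;   // continuation of exit 2
    }
    return -1;      // continuation of exit 1
}
```

The landing-block method for unstructured IFs described above appears to follow the same pattern: branches to the conflicting blocks are funnelled through one block that then dispatches, rather than cloning the targets.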
> >
> > On Mon, 2012-03-05 at 21:07 +0000, Villmow, Micah wrote:
> > > Simon,
> > > Have you looked at the control flow structurizer that we have in the
> > > Open Source AMDIL backend?
> > >
> > > > -----Original Message-----
> > > > From: llvmdev-bounces at cs.uiuc.edu
> > > > [mailto:llvmdev-bounces at cs.uiuc.edu]
> > > > On Behalf Of Simon Moll
> > > > Sent: Monday, March 05, 2012 1:01 PM
> > > > To: llvmdev at cs.uiuc.edu
> > > > Subject: [LLVMdev] OpenCL backend for LLVM
> > > >
> > > > Hi,
> > > >
> > > > this is a follow-up on my email from August
> > > > (http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-August/042737.html).
> > > >
> > > > I have finally released my OpenCL backend and control-flow
> > > > restructuring framework for LLVM (AST-Extractor, or axtor for short).
> > > > The framework restructures function CFGs such that they can be
> > > > expressed entirely without GOTOs or switch/loop trickery. This
> > > > makes it possible to emit source code for strictly control-flow
> > > > structured languages (OpenCL, GLSL). The code includes a drop-in
> > > > OpenCL driver that allows source-to-source OpenCL code
> > > > transformations on existing OpenCL applications.
> > > > The OpenCL backend has been under development for a while now and
> > > > was tested against the NVIDIA, AMD and Rodinia demo/benchmark suites
> > > > with recent NVIDIA/AMD drivers. Results for NVIDIA and AMD show,
> > > > with few exceptions, that the source-to-source loop does not
> > > > introduce any performance penalty on the generated kernels (known
> > > > exception: AES on recent AMD drivers).
> > > >
> > > > However, kernels with sampler types are currently unsupported and
> > > > the source-to-source loop may introduce slight imprecision in
> > > > floating point operations.
> > > >
> > > > The project builds against the current SVN version of LLVM and
> > > > Clang.
> > > > The GLSL backend has been lacking some attention (still at 2.9) and
> > > > will be ported later to LLVM-svn.
> > > >
> > > > To have a look at the source, go to
> > > > https://bitbucket.org/gnarf/axtor/
> > > > where it is hosted under the GPL license.
> > > >
> > > > Please get back to me if you have any questions or want to work on
> > > > the code (however, I won't be able to check my email regularly
> > > > before April, but you will get your reply sooner or later).
> > > >
> > > > Kind regards,
> > > > Simon Moll
> > > >
> > > > _______________________________________________
> > > > LLVM Developers mailing list
> > > > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> > > > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> > > >
> > >
> >
>
>

From ahatanak at gmail.com Tue Mar 6 14:11:40 2012
From: ahatanak at gmail.com (Akira Hatanaka)
Date: Tue, 6 Mar 2012 12:11:40 -0800
Subject: [LLVMdev] Assembly Mips from bitecode llvm
In-Reply-To:
References:
Message-ID:

Can you use clang? llvm-gcc has been deprecated and it generates incorrect bitcode for mips.

The benchmark programs you are running, including the ones that failed, are in the llvm test-suite, and they are all passing on my local machine. I think they should compile if you switch to using ToT clang & llvm.

On Tue, Mar 6, 2012 at 10:55 AM, Anton Korobeynikov wrote:
>> llvm-gcc (GCC) 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2.9)
>> Copyright (C) 2007 Free Software Foundation, Inc.
> Here I assume that your llvm-gcc is for x86-64-linux, since there was
> no MIPS release.
> So, you're feeding x86-oriented IR to MIPS backend. This won't work,
> you will need to provide MIPS-aware IR.
>
> Also, in 2.9 the MIPS was definitely not so mature. Hopefully MIPS
> folks can comment on this.
>
> --
> With best regards, Anton Korobeynikov
> Faculty of Mathematics and Mechanics, Saint Petersburg State University

From grosbach at apple.com Tue Mar 6 15:48:43 2012
From: grosbach at apple.com (Jim Grosbach)
Date: Tue, 06 Mar 2012 13:48:43 -0800
Subject: [LLVMdev] printing hex format for floating point number
In-Reply-To: <4F5471AD.1050205@codeaurora.org>
References: <4F5471AD.1050205@codeaurora.org>
Message-ID:

You can bitcast it to an integer of the same size and print that, I believe. APFloat::bitcastToAPInt().

-Jim

On Mar 4, 2012, at 11:56 PM, Sirish Pande wrote:

> Hi,
>
> I am trying to print a hex value (4111999A for 9.1) for a corresponding floating point number. The routine convertToHexString in the APFloat class only prints a C99 hexadecimal floating-point constant (e.g. 1.e00000p3).
>
> Without writing my own routine, how do I print the hexadecimal representation of a floating point value?
>
> Sirish
> --
> Qualcomm Innovation Center, Inc is a member of Code Aurora Forum
-------------- next part --------------
An HTML attachment was scrubbed...
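For readers who want to see the bit-level idea behind the APFloat::bitcastToAPInt() suggestion without an LLVM build, here is a minimal standalone sketch. It does not use APFloat at all: it assumes a 32-bit IEEE-754 `float` and copies its bits into an integer with `std::memcpy`. The helper names `floatBits` and `floatBitsHex` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>

// Reinterpret the 32 bits of a single-precision value as an unsigned integer.
std::uint32_t floatBits(float value) {
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "this sketch assumes a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);  // well-defined bit copy
    return bits;
}

// Format those bits as eight uppercase hexadecimal digits.
std::string floatBitsHex(float value) {
    char buffer[9];
    std::snprintf(buffer, sizeof buffer, "%08X",
                  static_cast<unsigned>(floatBits(value)));
    return std::string(buffer);
}
```

With these helpers, `floatBitsHex(9.1f)` yields `4111999A`, the value quoted in the question. Inside LLVM the equivalent step would be to take the APInt returned by bitcastToAPInt() and print it in base 16.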
URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120306/0b03155e/attachment.html

From grosbach at apple.com Tue Mar 6 15:54:55 2012
From: grosbach at apple.com (Jim Grosbach)
Date: Tue, 06 Mar 2012 13:54:55 -0800
Subject: [LLVMdev] Data/Address registers
In-Reply-To: <4F521304.1030900@gmail.com>
References: <4F521304.1030900@gmail.com>
Message-ID: <46738C23-06A8-4F28-8068-FEBC0B39798E@apple.com>

Hi Ivan,

On Mar 3, 2012, at 4:48 AM, Ivan Llopard wrote:

> Hi,
>
> I'm facing a problem in llvm while porting it to a new target and I'll
> need some support.
> We have two kinds of registers, one for general purposes (arithmetic,
> comparisons, etc.) and the other for memory addressing.

OK. Separate register classes should be able to handle this.

> Cross copies are not allowed (no data path).

You mean you can't copy directly from a general purpose register to an address register? That's an unfortunate architectural quirk. You may have to write some interesting and potentially ugly code in copyPhysReg() to handle that.

> We use clang 3.0 to produce assembler code.
> Because both registers have the same size and type (i16), I don't know
> what would be the best solution to distinguish them in order to match
> the right instructions.

The register classes should take care of this.

> Moreover, the standard pointer arithmetic is not
> enough for us (we need to support modulo operations also).
> I thought that I could manually match every arithmetic operation while
> matching the addressing mode but it doesn't work because intermediate
> results are sometimes reused for other purposes (e.g. comparisons).

I suggest getting things working correctly first and then coming back to things like this as an optimization.

> Do I need to add another type to clang/llvm ?
>

Unlikely.
Regards, Jim > Thanks in advance, > > Ivan > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From grosbach at apple.com Tue Mar 6 16:02:58 2012 From: grosbach at apple.com (Jim Grosbach) Date: Tue, 06 Mar 2012 14:02:58 -0800 Subject: [LLVMdev] Question on debug information In-Reply-To: References: Message-ID: <241D34DD-8883-4A42-8670-0A48B36DD185@apple.com> On Mar 6, 2012, at 5:31 AM, Seb wrote: > Hi all, > > Anyone have ideas/info on this topic ? > Thanks > Seb > > 2012/3/2 Seb > Hi all, > > I'm using my own front-end to generate following code .ll file targeting x86 32-bit: > > ; ModuleID = 'check.c' > target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:64:64-v128:128:128-a0:0:64-f80:32:32-n8:16:32" > target triple = "i386-pc-linux-gnu" > @.str581 = internal constant [52 x i8] c"---- test number %d failed. result %d expected %d\0a\00" > @.str584 = internal constant [61 x i8] c"---- %3d tests completed. %d tests PASSED. %d tests failed.\0a\00" > @.str587 = internal constant [61 x i8] c"---- %3d tests completed. %d tests passed. 
%d tests FAILED.\0a\00" > define void @check(i32* %result, i32* %expect, i32 %n) { > L.entry: > %tests_passed = alloca i32 > %tests_failed = alloca i32 > %i = alloca i32 > call void @llvm.dbg.value (metadata !{i32* %result}, i64 0, metadata !9), !dbg !4 > call void @llvm.dbg.value (metadata !{i32* %expect}, i64 0, metadata !10), !dbg !4 > call void @llvm.dbg.value (metadata !{i32 %n}, i64 0, metadata !11), !dbg !4 > call void @llvm.dbg.declare (metadata !{i32* %tests_passed}, metadata !13), !dbg !4 > store i32 0, i32* %tests_passed, !dbg !12 > call void @llvm.dbg.declare (metadata !{i32* %tests_failed}, metadata !15), !dbg !4 > store i32 0, i32* %tests_failed, !dbg !14 > call void @llvm.dbg.declare (metadata !{i32* %i}, metadata !17), !dbg !4 > store i32 0, i32* %i, !dbg !16 > br label %L.B0000 > L.B0000: > %0 = load i32* %i, !dbg !16 > %1 = icmp sge i32 %0, %n, !dbg !16 > br i1 %1, label %L.B0001, label %L.B0008, !dbg !16 > L.B0008: > %2 = bitcast i32* %expect to i8*, !dbg !18 > %3 = load i32* %i, !dbg !18 > %4 = mul i32 %3, 4, !dbg !18 > %5 = getelementptr i8* %2, i32 %4, !dbg !18 > %6 = bitcast i8* %5 to i32*, !dbg !18 > %7 = load i32* %6, !dbg !18 > %8 = bitcast i32* %result to i8*, !dbg !18 > %9 = load i32* %i, !dbg !18 > %10 = mul i32 %9, 4, !dbg !18 > %11 = getelementptr i8* %8, i32 %10, !dbg !18 > %12 = bitcast i8* %11 to i32*, !dbg !18 > %13 = load i32* %12, !dbg !18 > %14 = icmp ne i32 %7, %13, !dbg !18 > br i1 %14, label %L.B0003, label %L.B0009, !dbg !18 > L.B0009: > %15 = load i32* %tests_passed, !dbg !18 > > %16 = add i32 %15, 1, !dbg !18 > store i32 %16, i32* %tests_passed, !dbg !18 > br label %L.B0004, !dbg !19 > L.B0003: > %17 = load i32* %tests_failed, !dbg !20 > > %18 = add i32 %17, 1, !dbg !20 > store i32 %18, i32* %tests_failed, !dbg !20 > %19 = bitcast [52 x i8]* @.str581 to i8*, !dbg !21 > %20 = load i32* %i, !dbg !21 > %21 = bitcast i32* %result to i8*, !dbg !21 > %22 = load i32* %i, !dbg !21 > %23 = mul i32 %22, 4, !dbg !21 > %24 = 
getelementptr i8* %21, i32 %23, !dbg !21 > %25 = bitcast i8* %24 to i32*, !dbg !21 > %26 = load i32* %25, !dbg !21 > %27 = bitcast i32* %expect to i8*, !dbg !21 > %28 = load i32* %i, !dbg !21 > %29 = mul i32 %28, 4, !dbg !21 > %30 = getelementptr i8* %27, i32 %29, !dbg !21 > %31 = bitcast i8* %30 to i32*, !dbg !21 > %32 = load i32* %31, !dbg !21 > %33 = call i32 (i8*, ...)* @printf (i8* %19, i32 %20, i32 %26, i32 %32), !dbg !21 > br label %L.B0004 > L.B0004: > %34 = load i32* %i, !dbg !22 > > %35 = add i32 %34, 1, !dbg !22 > store i32 %35, i32* %i, !dbg !22 > br label %L.B0000, !dbg !22 > L.B0001: > %36 = load i32* %tests_failed, !dbg !23 > %37 = icmp ne i32 %36, 0, !dbg !23 > br i1 %37, label %L.B0006, label %L.B0010, !dbg !23 > L.B0010: > %38 = bitcast [61 x i8]* @.str584 to i8*, !dbg !24 > %39 = load i32* %tests_passed, !dbg !24 > %40 = load i32* %tests_failed, !dbg !24 > %41 = call i32 (i8*, ...)* @printf (i8* %38, i32 %n, i32 %39, i32 %40), !dbg !24 > br label %L.B0007, !dbg !25 > L.B0006: > %42 = bitcast [61 x i8]* @.str587 to i8*, !dbg !26 > %43 = load i32* %tests_passed, !dbg !26 > %44 = load i32* %tests_failed, !dbg !26 > %45 = call i32 (i8*, ...)* @printf (i8* %42, i32 %n, i32 %43, i32 %44), !dbg !26 > br label %L.B0007 > L.B0007: > ret void, !dbg !27 > } > > declare void @llvm.dbg.value(metadata, i64, metadata) > declare void @llvm.dbg.declare(metadata, metadata) > declare i32 @printf(i8*,...) > > !llvm.dbg.sp = !{!3} > > !llvm.dbg.lv.check = !{!9, !10, !11} > > !0 = metadata !{i32 589841, i32 0, i32 2, metadata !"check.c", metadata !".", metadata !" 
Seb Rel Dev-r02.27", i1 1, i1 0, metadata !"", i32 0} ; DW_TAG_compile_unit > !1 = metadata !{i32 589865, metadata !"check.c", metadata !".", metadata !0} ; DW_TAG_file_type > !2 = metadata !{i32 589845, metadata !1, metadata !"", metadata !1, i32 0, i64 0, i64 0, i32 0, i32 0, i32 0, null, i32 0, i32 0} ; DW_TAG_subroutine_type > !3 = metadata !{i32 589870, i32 0, metadata !1, metadata !"check", metadata !"check", metadata !"", metadata !1, i32 7, metadata !2, i1 0, i1 1, i32 0, i32 0, i32 0, i32 0, i1 0, void (i32*, i32*, i32)* @check} ; DW_TAG_subprogram > !4 = metadata !{i32 0, i32 0, metadata !3, null} > !5 = metadata !{i32 589835, metadata !3, i32 7, i32 0, metadata !1, i32 0} ; DW_TAG_lexical_block > !6 = metadata !{i32 0, i32 0, metadata !5, null} > !7 = metadata !{i32 589860, metadata !0, metadata !"int", null, i32 0, i64 32, i64 32, i64 0, i32 0, i32 5} ; DW_TAG_base_type > !8 = metadata !{i32 589839, metadata !0, metadata !"", null, i32 0, i64 32, i64 32, i64 0, i32 0, metadata !7} ; DW_TAG_pointer_type > !9 = metadata !{i32 590081, metadata !3, metadata !"result", metadata !1, i32 16777216, metadata !8, i32 0} ; DW_TAG_arg_variable > !10 = metadata !{i32 590081, metadata !3, metadata !"expect", metadata !1, i32 33554432, metadata !8, i32 0} ; DW_TAG_arg_variable > !11 = metadata !{i32 590081, metadata !3, metadata !"n", metadata !1, i32 50331648, metadata !7, i32 0} ; DW_TAG_arg_variable > !12 = metadata !{i32 9, i32 0, metadata !5, null} > !13 = metadata !{i32 590080, metadata !5, metadata !"tests_passed", metadata !1, i32 0, metadata !7, i32 0} ; DW_TAG_auto_variable > !14 = metadata !{i32 10, i32 0, metadata !5, null} > !15 = metadata !{i32 590080, metadata !5, metadata !"tests_failed", metadata !1, i32 0, metadata !7, i32 0} ; DW_TAG_auto_variable > !16 = metadata !{i32 12, i32 0, metadata !5, null} > !17 = metadata !{i32 590080, metadata !5, metadata !"i", metadata !1, i32 0, metadata !7, i32 0} ; DW_TAG_auto_variable > !18 = metadata !{i32 13, i32 
0, metadata !5, null} > !19 = metadata !{i32 14, i32 0, metadata !5, null} > !20 = metadata !{i32 15, i32 0, metadata !5, null} > !21 = metadata !{i32 17, i32 0, metadata !5, null} > !22 = metadata !{i32 19, i32 0, metadata !5, null} > !23 = metadata !{i32 20, i32 0, metadata !5, null} > !24 = metadata !{i32 22, i32 0, metadata !5, null} > !25 = metadata !{i32 23, i32 0, metadata !5, null} > !26 = metadata !{i32 25, i32 0, metadata !5, null} > !27 = metadata !{i32 26, i32 0, metadata !5, null} > > When I use llc 2.9 as follows: Try using current trunk LLVM. There have been a *lot* of debug info improvements since 2.9. -Jim > llc check.ll -march=x86 -o check.s > and > gcc -m32 -c check.s > > I've got a check.o file generated that targets x86 32-bit. > Reading dwarf symbol using > readelf --debug-dump check.o > > I've got for 'n' parameter: > > <2><71>: Abbrev Number: 3 (DW_TAG_formal_parameter) > <72> DW_AT_name : n > <74> DW_AT_type : <0xb3> > <78> DW_AT_location : 0x0 (location list) > > I would have expected a DW_AT_location that is FP related and not 0x0. > Is my LL file incorrect ? > Is there something I can use in metadata to enforce a FP relative DW_AT_location to be generated ? > > Thanks for your answers > Best Regards > Seb > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120306/88c41301/attachment.html From tlinth at codeaurora.org Tue Mar 6 16:43:01 2012 From: tlinth at codeaurora.org (Tony Linthicum) Date: Tue, 06 Mar 2012 16:43:01 -0600 Subject: [LLVMdev] Predicate registers/condition codes question In-Reply-To: References: <4F4D2836.9000806@codeaurora.org> Message-ID: <4F5692F5.10205@codeaurora.org> On 3/1/2012 2:21 PM, Eli Friedman wrote: > On Tue, Feb 28, 2012 at 11:17 AM, Tony Linthicum wrote: >> Hey folks, >> >> We are having some difficulty with how we have been representing our >> predicate registers, and wanted some advice from the list. First, we >> had been representing our predicate registers as 1 bit (i1). The truth, >> however, is that they are 8 bits. The reason for this is that they >> serve as predicates for conditional execution of instructions, branch >> condition codes, and also as vector mask registers for conditional >> selection of vector elements. >> >> We have run into problems with type mismatches with intrinsics for some >> of our vector operations. We decided to try to solve it by representing >> the predicate registers as what they really are, namely i8. We changed >> our intrinsic and instruction definitions accordingly, changed the data >> type of the predicate registers to be i8, and changed >> getSetCCResultType() to return i8. After doing this, the compiler >> builds just fine but dies at runtime trying to match some target >> independent operations (e.g. setcc/brcond) that appear to want an i1 for >> the condition code. >> >> So, my question is this: is it even possible to represent our predicate >> registers (and our condition codes) as i8, and if so, what hook are we >> missing? > > Making getSetCCResultType return i8 is definitely supported, and > brcond should be okay with that. It's not obvious what is going > wrong; are you sure there isn't anything in your target still > expecting an i1? > > -Eli Thanks, Eli. 
We'll take another look at our target dependent information to see if some i1's are still lurking about. It's good to know that this should work. Tony -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum. From ahatanak at gmail.com Tue Mar 6 19:05:03 2012 From: ahatanak at gmail.com (Akira Hatanaka) Date: Tue, 6 Mar 2012 17:05:03 -0800 Subject: [LLVMdev] Question about post RA scheduler Message-ID: I am having trouble trying to enable post RA scheduler for the Mips backend. This is the bit code of the function I am compiling: (gdb) p MF.Fn->dump() define void @PointToHPoint(%struct.HPointStruct* noalias sret %agg.result, %struct.ObjPointStruct* byval %P) nounwind { entry: %res = alloca %struct.HPointStruct, align 8 %x2 = bitcast %struct.ObjPointStruct* %P to double* %0 = load double* %x2, align 8 The third instruction is loading the first floating point double of structure %P which is being passed by value. This is the machine function right after completion of isel: (gdb) p MF->dump() # Machine code for function PointToHPoint: Frame Objects: fi#-1: size=48, align=8, fixed, at location [SP+8] fi#0: size=32, align=8, at location [SP] Function Live Ins: %A0 in %vreg0, %A2 in %vreg1, %A3 in %vreg2 BB#0: derived from LLVM BB %entry SW %vreg2, , 4; mem:ST4[FixedStack-1+4] CPURegs:%vreg2 SW %vreg1, , 0; mem:ST4[FixedStack-1](align=8) CPURegs:%vreg1 %vreg3 = COPY %vreg0; CPURegs:%vreg3,%vreg0 %vreg4 = LDC1 , 0; mem:LD8[%x2] AFGR64:%vreg4 The first two stores write the values in argument registers $6 and $7 to frame object -1 (Mips stores byval arguments passed in registers to the stack). The fourth instruction LDC1 loads the value written by the first two stores as a floating point double. 
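The relationship between the three accesses can be written down as a plain byte-range overlap check. This is only a sketch: it assumes that FixedStack-1 and %x2 name the same frame object with the load starting at offset 0, which is what the description above implies, and the `MemAccess` type and helper names are hypothetical rather than LLVM's scheduler API.

```cpp
#include <cassert>
#include <cstdint>

// One memory access, described relative to the start of frame object fi#-1.
struct MemAccess {
    std::int64_t offset;  // byte offset into the object
    std::int64_t size;    // number of bytes touched
    bool isStore;
};

// Two half-open byte ranges [offset, offset + size) intersect.
bool overlaps(const MemAccess &a, const MemAccess &b) {
    return a.offset < b.offset + b.size && b.offset < a.offset + a.size;
}

// An ordering edge is needed when the ranges intersect and at least one
// of the two accesses writes.
bool needsOrdering(const MemAccess &a, const MemAccess &b) {
    return (a.isStore || b.isStore) && overlaps(a, b);
}
```

Under that assumption, SW at offset 4 (4 bytes) and SW at offset 0 (4 bytes) each overlap the 8-byte LDC1 at offset 0, so the load has to stay after both stores, while the two stores do not overlap each other.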
This is the machine function just before post RA scheduling: (gdb) p MF.dump() # Machine code for function PointToHPoint: Frame Objects: fi#-1: size=48, align=8, fixed, at location [SP+8] fi#0: size=32, align=8, at location [SP-32] Function Live Ins: %A0 in %vreg0, %A2 in %vreg1, %A3 in %vreg2 BB#0: derived from LLVM BB %entry Live Ins: %A0 %A2 %A3 %SP = ADDiu %SP, -32 PROLOG_LABEL SW %A3, %SP, 44; mem:ST4[FixedStack-1+4] SW %A2, %SP, 40; mem:ST4[FixedStack-1](align=8) %D0 = LDC1 %SP, 40; mem:LD8[%x2] The frame index operands of the first two stores and the fourth load have been lowered to real addresses. Since the first two SWs store to ($sp + 44) and ($sp + 40), and instruction LDC1 loads from ($sp + 40), there should be a dependency between these instructions. However, when ScheduleDAGInstrs::BuildSchedGraph(AliasAnalysis *AA) builds the schedule graph, there are no dependency edges added between the two SWs and LDC1 because getUnderlyingObjectForInstr returns different objects for these instructions: underlying object of SWs: FixedStack-1 underlying object of LDC1: struct.ObjPointStruct* %P Is this a bug? Or are there ways to tell BuildSchedGraph it should add dependency edges? From atrick at apple.com Tue Mar 6 20:01:44 2012 From: atrick at apple.com (Andrew Trick) Date: Tue, 06 Mar 2012 18:01:44 -0800 Subject: [LLVMdev] Question about post RA scheduler In-Reply-To: References: Message-ID: <8EABC991-2BE1-45E5-A388-ABCCE5E826FB@apple.com> On Mar 6, 2012, at 5:05 PM, Akira Hatanaka wrote: > I am having trouble trying to enable post RA scheduler for the Mips backend. 
> > This is the bit code of the function I am compiling: > > (gdb) p MF.Fn->dump() > > define void @PointToHPoint(%struct.HPointStruct* noalias sret > %agg.result, %struct.ObjPointStruct* byval %P) nounwind { > entry: > %res = alloca %struct.HPointStruct, align 8 > %x2 = bitcast %struct.ObjPointStruct* %P to double* > %0 = load double* %x2, align 8 > > The third instruction is loading the first floating point double of > structure %P which is being passed by value. > > This is the machine function right after completion of isel: > (gdb) p MF->dump() > # Machine code for function PointToHPoint: > Frame Objects: > fi#-1: size=48, align=8, fixed, at location [SP+8] > fi#0: size=32, align=8, at location [SP] > Function Live Ins: %A0 in %vreg0, %A2 in %vreg1, %A3 in %vreg2 > > BB#0: derived from LLVM BB %entry > SW %vreg2, , 4; mem:ST4[FixedStack-1+4] CPURegs:%vreg2 > SW %vreg1, , 0; mem:ST4[FixedStack-1](align=8) CPURegs:%vreg1 > %vreg3 = COPY %vreg0; CPURegs:%vreg3,%vreg0 > %vreg4 = LDC1 , 0; mem:LD8[%x2] AFGR64:%vreg4 > > > The first two stores write the values in argument registers $6 and $7 > to frame object -1 > (Mips stores byval arguments passed in registers to the stack). > The fourth instruction LDC1 loads the value written by the first two > stores as a floating point double. > > This is the machine function just before post RA scheduling: > (gdb) p MF.dump() > # Machine code for function PointToHPoint: > Frame Objects: > fi#-1: size=48, align=8, fixed, at location [SP+8] > fi#0: size=32, align=8, at location [SP-32] > Function Live Ins: %A0 in %vreg0, %A2 in %vreg1, %A3 in %vreg2 > > BB#0: derived from LLVM BB %entry > Live Ins: %A0 %A2 %A3 > %SP = ADDiu %SP, -32 > PROLOG_LABEL > SW %A3, %SP, 44; mem:ST4[FixedStack-1+4] > SW %A2, %SP, 40; mem:ST4[FixedStack-1](align=8) > %D0 = LDC1 %SP, 40; mem:LD8[%x2] > > > The frame index operands of the first two stores and the fourth load > have been lowered to real addresses. 
>
> Since the first two SWs store to ($sp + 44) and ($sp + 40), and
> instruction LDC1 loads from ($sp + 40),
> there should be a dependency between these instructions.
>
> However, when ScheduleDAGInstrs::BuildSchedGraph(AliasAnalysis *AA)
> builds the schedule graph,
> there are no dependency edges added between the two SWs and LDC1 because
> getUnderlyingObjectForInstr returns different objects for these instructions:
>
> underlying object of SWs: FixedStack-1
> underlying object of LDC1: struct.ObjPointStruct* %P
>
>
> Is this a bug?
> Or are there ways to tell BuildSchedGraph it should add dependency edges?

This is a wild guess, but it looks to me like your load's MachineMemOperand should have been converted to refer to the stack frame. I would call that an ISEL bug. I can't say where the bug is without stepping through a test case. Maybe someone who's worked in this area of ISEL can give you a better hint. In the meantime, I would file a PR.

-Andy

From STPWORLD at narod.ru Wed Mar 7 01:54:45 2012
From: STPWORLD at narod.ru (Stepan Dyatkovskiy)
Date: Wed, 07 Mar 2012 11:54:45 +0400
Subject: [LLVMdev] How to unroll loop with non-constant boundary
In-Reply-To:
References: <4247593E-2A36-48CA-8FF1-31EB511282DB@googlemail.com> <4F4BD6C1.2080303@free.fr>
Message-ID: <439111331106885@web46.yandex.ru>

Hi guys,

I attached the modified patch that handles cases with low==end and stride!=1. Please review it.

-Stepan

28.02.2012, 17:41, "Benjamin Kramer" :
> On 27.02.2012, at 20:17, Duncan Sands wrote:
>
>> Hi Benjamin,
>>> LLVM misses this optimization because ScalarEvolution's ComputeExitLimitFromICmp doesn't handle signed <= (SLE) and thus can't compute the number of times the loop is executed. I wonder if there's a reason for this; it seems like something simple to add.
>> instsimplify could also be enhanced to clean it up in this particular case, but
>> it would be better to make scev smarter.
>
> I filed http://llvm.org/bugs/show_bug.cgi?id=12110 to track this.
>
> - Ben
>
>> Ciao, Duncan.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pr12110.patch
Type: application/octet-stream
Size: 5358 bytes
Desc: not available
Url : http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120307/61ebd16b/attachment.obj

From babslachem at gmail.com Wed Mar 7 02:00:15 2012
From: babslachem at gmail.com (Seb)
Date: Wed, 7 Mar 2012 09:00:15 +0100
Subject: [LLVMdev] Question on debug information
In-Reply-To: <241D34DD-8883-4A42-8670-0A48B36DD185@apple.com>
References: <241D34DD-8883-4A42-8670-0A48B36DD185@apple.com>
Message-ID:

Hi Jim,

Thanks for the advice. I'm currently using the LLVM 2.9 style of debug information. Will this code benefit from those improvements, or should I generate the LLVM 3.0 style of debug information?

Best Regards
Seb

2012/3/6 Jim Grosbach

>
> On Mar 6, 2012, at 5:31 AM, Seb wrote:
>
> Hi all,
>
> Anyone have ideas/info on this topic ?
> Thanks
> Seb
>
> 2012/3/2 Seb
>
>> Hi all,
>>
>> I'm using my own front-end to generate following code .ll file targeting
>> x86 32-bit:
>>
>> ; ModuleID = 'check.c'
>> target datalayout =
>> "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:64:64-v128:128:128-a0:0:64-f80:32:32-n8:16:32"
>> target triple = "i386-pc-linux-gnu"
>> @.str581 = internal constant [52 x i8] c"---- test number %d failed.
>> result %d expected %d\0a\00"
>> @.str584 = internal constant [61 x i8] c"---- %3d tests completed. %d
>> tests PASSED.
%d tests failed.\0a\00" >> @.str587 = internal constant [61 x i8] c"---- %3d tests completed. %d >> tests passed. %d tests FAILED.\0a\00" >> define void @check(i32* %result, i32* %expect, i32 %n) { >> L.entry: >> %tests_passed = alloca i32 >> %tests_failed = alloca i32 >> %i = alloca i32 >> call void @llvm.dbg.value (metadata !{i32* %result}, i64 0, >> metadata !9), !dbg !4 >> call void @llvm.dbg.value (metadata !{i32* %expect}, i64 0, >> metadata !10), !dbg !4 >> call void @llvm.dbg.value (metadata !{i32 %n}, i64 0, metadata >> !11), !dbg !4 >> call void @llvm.dbg.declare (metadata !{i32* %tests_passed}, >> metadata !13), !dbg !4 >> store i32 0, i32* %tests_passed, !dbg !12 >> call void @llvm.dbg.declare (metadata !{i32* %tests_failed}, >> metadata !15), !dbg !4 >> store i32 0, i32* %tests_failed, !dbg !14 >> call void @llvm.dbg.declare (metadata !{i32* %i}, metadata !17), >> !dbg !4 >> store i32 0, i32* %i, !dbg !16 >> br label %L.B0000 >> L.B0000: >> %0 = load i32* %i, !dbg !16 >> %1 = icmp sge i32 %0, %n, !dbg !16 >> br i1 %1, label %L.B0001, label %L.B0008, !dbg !16 >> L.B0008: >> %2 = bitcast i32* %expect to i8*, !dbg !18 >> %3 = load i32* %i, !dbg !18 >> %4 = mul i32 %3, 4, !dbg !18 >> %5 = getelementptr i8* %2, i32 %4, !dbg !18 >> %6 = bitcast i8* %5 to i32*, !dbg !18 >> %7 = load i32* %6, !dbg !18 >> %8 = bitcast i32* %result to i8*, !dbg !18 >> %9 = load i32* %i, !dbg !18 >> %10 = mul i32 %9, 4, !dbg !18 >> %11 = getelementptr i8* %8, i32 %10, !dbg !18 >> %12 = bitcast i8* %11 to i32*, !dbg !18 >> %13 = load i32* %12, !dbg !18 >> %14 = icmp ne i32 %7, %13, !dbg !18 >> br i1 %14, label %L.B0003, label %L.B0009, !dbg !18 >> L.B0009: >> %15 = load i32* %tests_passed, !dbg !18 >> >> %16 = add i32 %15, 1, !dbg !18 >> store i32 %16, i32* %tests_passed, !dbg !18 >> br label %L.B0004, !dbg !19 >> L.B0003: >> %17 = load i32* %tests_failed, !dbg !20 >> >> %18 = add i32 %17, 1, !dbg !20 >> store i32 %18, i32* %tests_failed, !dbg !20 >> %19 = bitcast [52 x i8]* 
@.str581 to i8*, !dbg !21 >> %20 = load i32* %i, !dbg !21 >> %21 = bitcast i32* %result to i8*, !dbg !21 >> %22 = load i32* %i, !dbg !21 >> %23 = mul i32 %22, 4, !dbg !21 >> %24 = getelementptr i8* %21, i32 %23, !dbg !21 >> %25 = bitcast i8* %24 to i32*, !dbg !21 >> %26 = load i32* %25, !dbg !21 >> %27 = bitcast i32* %expect to i8*, !dbg !21 >> %28 = load i32* %i, !dbg !21 >> %29 = mul i32 %28, 4, !dbg !21 >> %30 = getelementptr i8* %27, i32 %29, !dbg !21 >> %31 = bitcast i8* %30 to i32*, !dbg !21 >> %32 = load i32* %31, !dbg !21 >> %33 = call i32 (i8*, ...)* @printf (i8* %19, i32 %20, i32 %26, >> i32 %32), !dbg !21 >> br label %L.B0004 >> L.B0004: >> %34 = load i32* %i, !dbg !22 >> >> %35 = add i32 %34, 1, !dbg !22 >> store i32 %35, i32* %i, !dbg !22 >> br label %L.B0000, !dbg !22 >> L.B0001: >> %36 = load i32* %tests_failed, !dbg !23 >> %37 = icmp ne i32 %36, 0, !dbg !23 >> br i1 %37, label %L.B0006, label %L.B0010, !dbg !23 >> L.B0010: >> %38 = bitcast [61 x i8]* @.str584 to i8*, !dbg !24 >> %39 = load i32* %tests_passed, !dbg !24 >> %40 = load i32* %tests_failed, !dbg !24 >> %41 = call i32 (i8*, ...)* @printf (i8* %38, i32 %n, i32 %39, >> i32 %40), !dbg !24 >> br label %L.B0007, !dbg !25 >> L.B0006: >> %42 = bitcast [61 x i8]* @.str587 to i8*, !dbg !26 >> %43 = load i32* %tests_passed, !dbg !26 >> %44 = load i32* %tests_failed, !dbg !26 >> %45 = call i32 (i8*, ...)* @printf (i8* %42, i32 %n, i32 %43, >> i32 %44), !dbg !26 >> br label %L.B0007 >> L.B0007: >> ret void, !dbg !27 >> } >> >> declare void @llvm.dbg.value(metadata, i64, metadata) >> declare void @llvm.dbg.declare(metadata, metadata) >> declare i32 @printf(i8*,...) >> >> !llvm.dbg.sp = !{!3} >> >> !llvm.dbg.lv.check = !{!9, !10, !11} >> >> !0 = metadata !{i32 589841, i32 0, i32 2, metadata !"check.c", metadata >> !".", metadata !" 
Seb Rel Dev-r02.27", i1 1, i1 0, metadata !"", i32 0} ; >> DW_TAG_compile_unit >> !1 = metadata !{i32 589865, metadata !"check.c", metadata !".", metadata >> !0} ; DW_TAG_file_type >> !2 = metadata !{i32 589845, metadata !1, metadata !"", metadata !1, i32 >> 0, i64 0, i64 0, i32 0, i32 0, i32 0, null, i32 0, i32 0} ; >> DW_TAG_subroutine_type >> !3 = metadata !{i32 589870, i32 0, metadata !1, metadata !"check", >> metadata !"check", metadata !"", metadata !1, i32 7, metadata !2, i1 0, i1 >> 1, i32 0, i32 0, i32 0, i32 0, i1 0, void (i32*, i32*, i32)* @check} ; >> DW_TAG_subprogram >> !4 = metadata !{i32 0, i32 0, metadata !3, null} >> !5 = metadata !{i32 589835, metadata !3, i32 7, i32 0, metadata !1, i32 >> 0} ; DW_TAG_lexical_block >> !6 = metadata !{i32 0, i32 0, metadata !5, null} >> !7 = metadata !{i32 589860, metadata !0, metadata !"int", null, i32 0, >> i64 32, i64 32, i64 0, i32 0, i32 5} ; DW_TAG_base_type >> !8 = metadata !{i32 589839, metadata !0, metadata !"", null, i32 0, i64 >> 32, i64 32, i64 0, i32 0, metadata !7} ; DW_TAG_pointer_type >> !9 = metadata !{i32 590081, metadata !3, metadata !"result", metadata >> !1, i32 16777216, metadata !8, i32 0} ; DW_TAG_arg_variable >> !10 = metadata !{i32 590081, metadata !3, metadata !"expect", metadata >> !1, i32 33554432, metadata !8, i32 0} ; DW_TAG_arg_variable >> !11 = metadata !{i32 590081, metadata !3, metadata !"n", metadata !1, >> i32 50331648, metadata !7, i32 0} ; DW_TAG_arg_variable >> !12 = metadata !{i32 9, i32 0, metadata !5, null} >> !13 = metadata !{i32 590080, metadata !5, metadata !"tests_passed", >> metadata !1, i32 0, metadata !7, i32 0} ; DW_TAG_auto_variable >> !14 = metadata !{i32 10, i32 0, metadata !5, null} >> !15 = metadata !{i32 590080, metadata !5, metadata !"tests_failed", >> metadata !1, i32 0, metadata !7, i32 0} ; DW_TAG_auto_variable >> !16 = metadata !{i32 12, i32 0, metadata !5, null} >> !17 = metadata !{i32 590080, metadata !5, metadata !"i", metadata !1, >> i32 0, metadata 
!7, i32 0} ; DW_TAG_auto_variable >> !18 = metadata !{i32 13, i32 0, metadata !5, null} >> !19 = metadata !{i32 14, i32 0, metadata !5, null} >> !20 = metadata !{i32 15, i32 0, metadata !5, null} >> !21 = metadata !{i32 17, i32 0, metadata !5, null} >> !22 = metadata !{i32 19, i32 0, metadata !5, null} >> !23 = metadata !{i32 20, i32 0, metadata !5, null} >> !24 = metadata !{i32 22, i32 0, metadata !5, null} >> !25 = metadata !{i32 23, i32 0, metadata !5, null} >> !26 = metadata !{i32 25, i32 0, metadata !5, null} >> !27 = metadata !{i32 26, i32 0, metadata !5, null} >> >> When I use llc 2.9 as follows: >> > > Try using current trunk LLVM. There have been a *lot* of debug info > improvements since 2.9. > > -Jim > > llc check.ll -march=x86 -o check.s >> and >> gcc -m32 -c check.s >> >> I've got a check.o file generated that targets x86 32-bit. >> Reading dwarf symbol using >> readelf --debug-dump check.o >> >> I've got for 'n' parameter: >> >> <2><71>: Abbrev Number: 3 (DW_TAG_formal_parameter) >> <72> DW_AT_name : n >> <74> DW_AT_type : <0xb3> >> <78> DW_AT_location : 0x0 (location list) >> >> I would have expected a DW_AT_location that is FP related and not 0x0. >> Is my LL file incorrect ? >> Is there something I can use in metadata to enforce a FP relative >> DW_AT_location to be generated ? >> >> Thanks for your answers >> Best Regards >> Seb > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120307/0cb8fbd3/attachment-0001.html From echristo at apple.com Wed Mar 7 02:15:36 2012 From: echristo at apple.com (Eric Christopher) Date: Wed, 07 Mar 2012 00:15:36 -0800 Subject: [LLVMdev] Question on debug information In-Reply-To: References: <241D34DD-8883-4A42-8670-0A48B36DD185@apple.com> Message-ID: <17C9D80F-7A15-4644-8FA5-C86A5BB062BF@apple.com> On Mar 7, 2012, at 12:00 AM, Seb wrote: > Hi Jim, > > Thanks for the advice. Since I'm using LLVM 2.9 style of debug information. Will this code benefit from those improvement or should I generate LLVM 3.0 style of debug information ? Development isn't going on the 2.9 code base any more and a bunch of changes and fixes have gone in from then until now. Generating current debug information is your likely best direction. -eric From nicholas at mxc.ca Wed Mar 7 02:20:01 2012 From: nicholas at mxc.ca (Nick Lewycky) Date: Wed, 07 Mar 2012 00:20:01 -0800 Subject: [LLVMdev] How to unroll loop with non-constant boundary In-Reply-To: <439111331106885@web46.yandex.ru> References: <4247593E-2A36-48CA-8FF1-31EB511282DB@googlemail.com> <4F4BD6C1.2080303@free.fr> <439111331106885@web46.yandex.ru> Message-ID: <4F571A31.5080309@mxc.ca> Stepan Dyatkovskiy wrote: > Hi guys, > I attached the modified patch that handles cases with low==end and stride!=1. I don't see how this could be correct. Your patch treats 'X s<= Y' as 'X s< Y+1', which is incorrect when Y is INT_MAX. Wouldn't that turn an infinite loop into a zero-trip loop? To be clear, this is the flaw in Benjamin's patch which you appear to have extended that makes it unsuitable for committing to the compiler. It would cause miscompiles. That said, I haven't had the chance to look into fixing this, but wouldn't testing AddRec for nsw/nuw be an easy fix that would salvage the rest of the logic? Nick > Please find it for review. 
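Nick's objection above can be checked with a small standalone sketch. This is not ScalarEvolution code: the helper names (wrapping_inc, sle, slt_plus_one) are made up for illustration, and 32-bit wrapping arithmetic is assumed as a stand-in for an IR add that carries no nsw flag.

```cpp
#include <cstdint>

// Hypothetical helper: "Y + 1" with 32-bit wraparound, i.e. what an add
// without nsw may produce. The unsigned-to-signed conversion is modular on
// mainstream compilers (and guaranteed to be from C++20 on).
inline int32_t wrapping_inc(int32_t y) {
    return static_cast<int32_t>(static_cast<uint32_t>(y) + 1u);
}

// The original loop predicate: X s<= Y.
inline bool sle(int32_t x, int32_t y) { return x <= y; }

// The rewritten predicate: X s< Y+1, with Y+1 allowed to wrap.
inline bool slt_plus_one(int32_t x, int32_t y) {
    return x < wrapping_inc(y);
}
```

For ordinary bounds the two predicates agree. With y equal to INT32_MAX the original predicate holds for every x, while the rewritten one never does, because y + 1 wraps to INT32_MIN. That is the infinite-loop-to-zero-trip change Nick describes, and it is why a no-overflow guarantee on the bound would be needed before the rewrite is safe.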
> > -Stepan > > 28.02.2012, 17:41, "Benjamin Kramer": >> On 27.02.2012, at 20:17, Duncan Sands wrote: >> >>> Hi Benjamin, >>>> LLVM misses this optimization because ScalarEvolution's ComputeExitLimitFromICmp doesn't handle signed<= (SLE) and thus can't compute the number of times the loop is executed. I wonder if there's a reason for this, it seems like something simple to add. >>> instsimplify could also be enhanced to clean it up in this particular case, but >>> it would be better to make scev smarter. >> >> I filed http://llvm.org/bugs/show_bug.cgi?id=12110 to track this. >> >> - Ben >> >>> Ciao, Duncan. >>> _______________________________________________ >>> LLVM Developers mailing list >>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From babslachem at gmail.com Wed Mar 7 03:49:12 2012 From: babslachem at gmail.com (Seb) Date: Wed, 7 Mar 2012 10:49:12 +0100 Subject: [LLVMdev] Can't check out LLVM trunk ? Message-ID: Hi all, The following command used to work for me: svn co http://llvm.org/svn/llvm-project/llvm/trunk llvm Now it fails as follows: svn: Server sent unexpected return value (500 Internal Server Error) in response to OPTIONS request for 'http://llvm.org/svn/llvm-project/llvm/trunk' Any idea? Best Regards Seb -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120307/7925aa0e/attachment.html From babslachem at gmail.com Wed Mar 7 04:17:16 2012 From: babslachem at gmail.com (Seb) Date: Wed, 7 Mar 2012 11:17:16 +0100 Subject: [LLVMdev] Can't check out LLVM trunk ? In-Reply-To: References: Message-ID: OK, pilot error on my side, it's working now. 2012/3/7 Seb > Hi all, > > Following command was working for me before: > svn co http://llvm.org/svn/llvm-project/llvm/trunk llvm > Now it fails as follows: > svn: Server sent unexpected return value (500 Internal Server Error) in > response to OPTIONS request for ' > http://llvm.org/svn/llvm-project/llvm/trunk' > Any idea ? > Best Regards > Seb > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120307/2639254f/attachment.html From babslachem at gmail.com Wed Mar 7 07:17:08 2012 From: babslachem at gmail.com (Seb) Date: Wed, 7 Mar 2012 14:17:08 +0100 Subject: [LLVMdev] Problem with x86 32-bit debug information ? Message-ID: Hi all, I'm using the trunk version of LLVM/Clang. I compile the attached files on my 64-bit Ubuntu 10.04 LTS system as follows: clang -O2 -g check.c main.c -o check64 When I run gdb check64, set a breakpoint on the check routine and run to the breakpoint, I get: Breakpoint 1, check (result=0x601110, expect=0x601020, n=53) at check.c:7 7 { As you can see, I can inspect the value of 'n'. Now if I compile for x86 32-bit as follows: clang -m32 -O2 -g check.c main.c -o check32 When I run gdb check32, set a breakpoint on the check routine and run to the breakpoint, I get: Breakpoint 1, check (result=, expect=, n=0) at check.c:7 7 { As you can see, I can NOT inspect the value of 'n'. Is there a way to force clang, even at -O2, to generate debug information so that I can inspect the value of 'n'? Or is it a bug in clang for x86 32-bit? Thanks for your answers. 
Best Regards Seb -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120307/7b8e4233/attachment.html -------------- next part -------------- A non-text attachment was scrubbed... Name: check.c Type: text/x-csrc Size: 723 bytes Desc: not available Url : http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120307/7b8e4233/attachment.bin -------------- next part -------------- A non-text attachment was scrubbed... Name: main.c Type: text/x-csrc Size: 898 bytes Desc: not available Url : http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120307/7b8e4233/attachment-0001.bin From joel.gouly at gmail.com Wed Mar 7 07:23:55 2012 From: joel.gouly at gmail.com (Joey Gouly) Date: Wed, 7 Mar 2012 13:23:55 +0000 Subject: [LLVMdev] [PATCH] Add -version to llvm-mc In-Reply-To: References: Message-ID: Hi all, llvm-mc has a -version option, but it doesn't print all registered targets like llc. Attached is the patch to fix that! Thanks, Joey -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120307/2c9e281d/attachment.html -------------- next part -------------- A non-text attachment was scrubbed... Name: add_version_llvmmc.diff Type: text/x-patch Size: 583 bytes Desc: not available Url : http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120307/2c9e281d/attachment.bin From james.molloy at arm.com Wed Mar 7 07:23:47 2012 From: james.molloy at arm.com (James Molloy) Date: Wed, 7 Mar 2012 13:23:47 -0000 Subject: [LLVMdev] Problem with x86 32-bit debug information ? In-Reply-To: References: Message-ID: <006e01ccfc65$87261c60$95725520$@molloy@arm.com> Hi Seb, Clang cannot generate debug information for something that it has optimised away. You should reduce the optimisation level. In general debug information is only really accurate at -O0. 
Cheers, James From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Seb Sent: 07 March 2012 13:17 To: llvmdev at cs.uiuc.edu Subject: [LLVMdev] Problem with x86 32-bit debug information ? Hi all, I'm using trunk version of LLVM/CLANG. When I compile attached files on my 64-bit Ubuntu 10.04 LTS system as follows: clang -O2 -g check.c main.c -o check64 When I do gdb check64 and set a breakpoint to the check routine and executes to the breakpoint, I've got: Breakpoint 1, check (result=0x601110, expect=0x601020, n=53) at check.c:7 7 { As you can see I can inspect 'n' value. Now if I compile for x86 32-bit as follows: clang -m32 -O2 -g check.c main.c -o check32 When I do gdb check32 and set a breakpoint to the check routine and executes to the breakpoint, I've got: Breakpoint 1, check (result=, expect=, n=0) at check.c:7 7 { As you can see I can NOT inspect 'n' value. Is there a way to inforce even at -O2 clang to generate debug informations so that I can inspect 'n' value ? Or is it a BUG from clang for x86 32-bit ? Thanks for your answers. Best Regards Seb -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120307/6bb03ae5/attachment.html From babslachem at gmail.com Wed Mar 7 07:37:16 2012 From: babslachem at gmail.com (Seb) Date: Wed, 7 Mar 2012 14:37:16 +0100 Subject: [LLVMdev] Problem with x86 32-bit debug information ? In-Reply-To: <4f576184.04e7d80a.7225.194cSMTPIN_ADDED@mx.google.com> References: <4f576184.04e7d80a.7225.194cSMTPIN_ADDED@mx.google.com> Message-ID: Hi James, clang is able to generate correct debug informations for 64-bit target at -O2. My feeling, given some other experiments I've done, is that debug information generated for x86 32-bit might be broken for parameters as long as they are not 'homed' in the code (local copy to an automatic variable). 
It seems that when llvm.dbg.declare is turned into llvm.dbg.value for a parameter, something is incorrect in the debug information that clang/llvm generates for that parameter. I would just like confirmation of this. Thanks for your answer Best Regards Seb 2012/3/7 James Molloy > Hi Seb, > > Clang cannot generate debug information for something that it has > optimised away. You should reduce the optimisation level. > > In general debug information is only really accurate at -O0. > > Cheers, > > James > > From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On > Behalf Of Seb > Sent: 07 March 2012 13:17 > To: llvmdev at cs.uiuc.edu > Subject: [LLVMdev] Problem with x86 32-bit debug information ? > > Hi all, > > I'm using trunk version of LLVM/CLANG. > When I compile attached files on my 64-bit Ubuntu 10.04 LTS system as > follows: > > clang -O2 -g check.c main.c -o check64 > > When I do gdb check64 and set a breakpoint to the check routine and > executes to the breakpoint, I've got: > > Breakpoint 1, check (result=0x601110, expect=0x601020, n=53) at check.c:7 > 7 { > > As you can see I can inspect 'n' value. > > Now if I compile for x86 32-bit as follows: > > clang -m32 -O2 -g check.c main.c -o check32 > > When I do gdb check32 and set a breakpoint to the check routine and > executes to the breakpoint, I've got: > > Breakpoint 1, check (result=, > expect=, n=0) at check.c:7 > 7 { > > As you can see I can NOT inspect 'n' value. Is there a way to inforce even > at -O2 clang to generate debug informations so that I can inspect 'n' value > ? > Or is it a BUG from clang for x86 32-bit ? > Thanks for your answers. > Best Regards > Seb -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120307/ecf0d074/attachment.html From ivanllopard at gmail.com Wed Mar 7 08:23:00 2012 From: ivanllopard at gmail.com (Ivan Llopard) Date: Wed, 07 Mar 2012 15:23:00 +0100 Subject: [LLVMdev] Data/Address registers In-Reply-To: <46738C23-06A8-4F28-8068-FEBC0B39798E@apple.com> References: <4F521304.1030900@gmail.com> <46738C23-06A8-4F28-8068-FEBC0B39798E@apple.com> Message-ID: <4F576F44.6000801@gmail.com> Hi Jim, Thanks for your response. On 06/03/2012 22:54, Jim Grosbach wrote: > Hi Ivan, > On Mar 3, 2012, at 4:48 AM, Ivan Llopard wrote: > >> Hi, >> >> I'm facing a problem in llvm while porting it to a new target and I'll >> need some support. >> We have 2 kind of register, one for general purposes (i.e. arithmetic, >> comparisons, etc.) and the other for memory addressing. > OK. Separate register classes should be able to handle this. > >> Cross copies are not allowed (no data path). > You mean you can't copy directly from a general purpose register to an address register? That's an unfortunate architectural quirk. You may have to write some interesting and potentially ugly code in copyPhysReg() to handle that. > Actually, I can't copy them in any way, it's just impossible :-/. >> We use clang 3.0 to produce assembler code. >> Because both registers have the same size and type (i16), I don't know >> what would be the best solution to distinguish them in order to match >> the right instructions. > The register classes should take care of this. I tried, but IMO the matching rule should be context-dependent, i.e. an i16 addition should match machine additions whose operands are either data registers or address registers, depending on how the result is used. Even if I look at the index operands of loads/stores (in the DAG) to match the target's addressing modes, I can't assume that some operations are not being used for something other than basic arithmetic (like comparisons, which are not supported for address regs). 
Is it still possible to get rid of this with register classes? I could add a pass before ISel that annotates the code, identifying the registers that are only used for addressing (by doing a simple data-flow analysis). Could that help the instruction selector later? I could not find how to get metadata from the DAG to drive matching rules or lowering phases. Is it possible? How is metadata transferred to the DAG, and where should I look for it? Ivan >> Moreover, the standard pointer arithmetic is not >> enough for us (we need to support modulo operations also). >> I thought that I could manually match every arithmetic operation while >> matching the addressing mode but it doesn't work because intermediate >> results are sometimes reused for other purposes (e.g. comparisons). > I suggest getting things working correctly first and then coming back to things like this as an optimization. > >> Do I need to add another type to clang/llvm ? >> > Unlikely. > > Regards, > Jim > > >> Thanks in advance, >> >> Ivan From james.molloy at arm.com Wed Mar 7 08:24:57 2012 From: james.molloy at arm.com (James Molloy) Date: Wed, 7 Mar 2012 14:24:57 -0000 Subject: [LLVMdev] Problem with x86 32-bit debug information ? In-Reply-To: References: <4f576184.04e7d80a.7225.194cSMTPIN_ADDED@mx.google.com> Message-ID: <007301ccfc6e$12b7f6b0$3827e410$@molloy@arm.com> Hi Seb, I'm going to reiterate - Clang can decide when it wants to optimise away a variable. You asked for that behaviour when you specified -O2. You can't expect deterministically the same behaviour on both x86 and x86-64 platforms - the procedure call standards are different and different decisions go into deciding how to optimise. 
You can't expect debug information for an optimised build to fully track that of the source because by definition the source is being modified to optimise. Cheers, James From: Seb [mailto:babslachem at gmail.com] Sent: 07 March 2012 13:37 To: James Molloy Cc: llvmdev at cs.uiuc.edu Subject: Re: [LLVMdev] Problem with x86 32-bit debug information ? Hi James, clang is able to generate correct debug informations for 64-bit target at -O2. My feeling, given some other experiments I've done, is that debug information generated for x86 32-bit might be broken for parameters as long as they are not 'homed' in the code (local copy to an automatic variable). It seems that when llvm.declare is turned into a llvm.value for parameter there is something incorrect with respect to parameters debug informations that is generated by clang/llvm. I just would like confirmation of this. Thanks for your answer Best Regards Seb 2012/3/7 James Molloy Hi Seb, Clang cannot generate debug information for something that it has optimised away. You should reduce the optimisation level. In general debug information is only really accurate at -O0. Cheers, James From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Seb Sent: 07 March 2012 13:17 To: llvmdev at cs.uiuc.edu Subject: [LLVMdev] Problem with x86 32-bit debug information ? Hi all, I'm using trunk version of LLVM/CLANG. When I compile attached files on my 64-bit Ubuntu 10.04 LTS system as follows: clang -O2 -g check.c main.c -o check64 When I do gdb check64 and set a breakpoint to the check routine and executes to the breakpoint, I've got: Breakpoint 1, check (result=0x601110, expect=0x601020, n=53) at check.c:7 7 { As you can see I can inspect 'n' value. 
Now if I compile for x86 32-bit as follows: clang -m32 -O2 -g check.c main.c -o check32 When I do gdb check32 and set a breakpoint to the check routine and executes to the breakpoint, I've got: Breakpoint 1, check (result=, expect=, n=0) at check.c:7 7 { As you can see I can NOT inspect 'n' value. Is there a way to inforce even at -O2 clang to generate debug informations so that I can inspect 'n' value ? Or is it a BUG from clang for x86 32-bit ? Thanks for your answers. Best Regards Seb -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120307/ddccedab/attachment.html From borya.043 at gmail.com Wed Mar 7 07:50:16 2012 From: borya.043 at gmail.com (Borya Egorov) Date: Wed, 7 Mar 2012 19:20:16 +0530 Subject: [LLVMdev] Alias analysis result Message-ID: Hello everyone, I am trying to find the alias between a store instruction's pointer operand and function arguments. This is the code:

    virtual void getAnalysisUsage(AnalysisUsage &AU) const {
      AU.addRequiredTransitive<AliasAnalysis>();
      AU.addPreserved<AliasAnalysis>();
    }

    virtual bool runOnFunction(Function &F) {
      AliasAnalysis &AA = getAnalysis<AliasAnalysis>();
      for (Function::iterator i = F.begin(); i != F.end(); ++i) {
        for (BasicBlock::iterator j = i->begin(); j != i->end(); ++j) {
          if (dyn_cast<StoreInst>(j)) {
            const StoreInst *SI = dyn_cast<StoreInst>(j);
            AliasAnalysis::Location LocA = AA.getLocation(SI);
            const Value *si_v = SI->getPointerOperand();
            for (Function::arg_iterator k = F.arg_begin(); k != F.arg_end(); ++k) {
              Value *v = dyn_cast<Value>(k);
              AliasAnalysis::Location loc = AliasAnalysis::Location(v);
              AliasAnalysis::AliasResult ar = AA.alias(LocA, loc);
              switch (ar) {
              case 0: errs() << "NoAlias\n"; break;      ///< No dependencies.
              case 1: errs() << "MayAlias\n"; break;     ///< Anything goes
              case 2: errs() << "PartialAlias\n"; break; ///< Pointers differ, but pointees overlap.
              case 3: errs() << "MustAlias\n";
              }
            }
          }
        }
      }
      return true;
    }
  };
}

But I get a MayAlias result even if the store instruction's pointer operand does not reference the function argument. 
Is there something wrong with the logic? Are there any files in the LLVM source code that contain code to do something similar? Thanks :) -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120307/4b1b3d3c/attachment.html From babslachem at gmail.com Wed Mar 7 08:50:59 2012 From: babslachem at gmail.com (Seb) Date: Wed, 7 Mar 2012 15:50:59 +0100 Subject: [LLVMdev] Problem with x86 32-bit debug information ? In-Reply-To: <4f576fad.a705b40a.44bd.ffffcc63SMTPIN_ADDED@mx.google.com> References: <4f576184.04e7d80a.7225.194cSMTPIN_ADDED@mx.google.com> <4f576fad.a705b40a.44bd.ffffcc63SMTPIN_ADDED@mx.google.com> Message-ID: Hi James, I fully agree with you and understand your statement about -O2. Now some questions for you: Did you try to reproduce the experiments described in my previous e-mail? Did you look at the debug information generated for the 'n' parameter on x86 32-bit and x86 64-bit? I'm working on my own front-end for LLVM and I have had difficulties with debug information on x86 32-bit. So far there are two possibilities: 1) the metadata that I generate is incorrect. 2) LLVM does not handle that metadata correctly for the x86 32-bit target. I've already posted about a problem with the metadata that I generate, which is in LLVM 2.9 format. I've been advised to move to the most recent format. Before starting to move in that direction, I would like to be sure that LLVM trunk can solve the problem. Using clang at -O2 -g gives me some indication that it won't solve my problem and that we are falling into case (2). So to summarize, it would be nice if someone could confirm that the debug information generated in this specific case is correct for x86 32-bit and that I will have to deal with that. Thanks Best Regards Seb 2012/3/7 James Molloy > Hi Seb, > > I'm going to reiterate - Clang can decide when it wants to optimise away a > variable. 
You asked for that behaviour when you specified -O2. You can't > expect deterministically the same behaviour on both x86 and x86-64 > platforms - the procedure call standards are different and different > decisions go in to deciding how to optimise. > > You can't expect debug information for an optimised build to fully track > that of the source because by definition the source is being modified to > optimise. > > Cheers, > > James > > From: Seb [mailto:babslachem at gmail.com] > Sent: 07 March 2012 13:37 > To: James Molloy > Cc: llvmdev at cs.uiuc.edu > Subject: Re: [LLVMdev] Problem with x86 32-bit debug information ? > > Hi James, > > clang is able to generate correct debug informations for 64-bit target at > -O2. My feeling, given some other experiments I've done, is that debug > information generated for x86 32-bit might be broken for parameters as long > as they are not 'homed' in the code (local copy to an automatic variable). > It seems that when llvm.declare is turned into a llvm.value for parameter > there is something incorrect with respect to parameters debug informations > that is generated by clang/llvm. I just would like confirmation of this. > > Thanks for your answer > Best Regards > Seb > > 2012/3/7 James Molloy > > Hi Seb, > > Clang cannot generate debug information for something that it has > optimised away. You should reduce the optimisation level. > > In general debug information is only really accurate at -O0. > > Cheers, > > James > > From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On > Behalf Of Seb > Sent: 07 March 2012 13:17 > To: llvmdev at cs.uiuc.edu > Subject: [LLVMdev] Problem with x86 32-bit debug information ? > > Hi all, > > I'm using trunk version of LLVM/CLANG. 
> When I compile attached files on my 64-bit Ubuntu 10.04 LTS system as > follows: > > clang -O2 -g check.c main.c -o check64 > > When I do gdb check64 and set a breakpoint to the check routine and > executes to the breakpoint, I've got: > > Breakpoint 1, check (result=0x601110, expect=0x601020, n=53) at check.c:7 > 7 { > > As you can see I can inspect 'n' value. > > Now if I compile for x86 32-bit as follows: > > clang -m32 -O2 -g check.c main.c -o check32 > > When I do gdb check32 and set a breakpoint to the check routine and > executes to the breakpoint, I've got: > > Breakpoint 1, check (result=, > expect=, n=0) at check.c:7 > 7 { > > As you can see I can NOT inspect 'n' value. Is there a way to inforce even > at -O2 clang to generate debug informations so that I can inspect 'n' value > ? > Or is it a BUG from clang for x86 32-bit ? > Thanks for your answers. > Best Regards > Seb -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120307/1be04d2a/attachment.html From baldrick at free.fr Wed Mar 7 08:54:46 2012 From: baldrick at free.fr (Duncan Sands) Date: Wed, 07 Mar 2012 15:54:46 +0100 Subject: [LLVMdev] Alias analysis result In-Reply-To: References: Message-ID: <4F5776B6.3020006@free.fr> Hi Borya, > But I get MayAlias result even if the store instruction's pointer operand is not > referencing the function argument. Is there something wrong with the logic? Are > there any files in the LLVM source code that contain code to do something similar. The default alias analysis is no-aa, which does what the name suggests :) Did you schedule basic-aa (for example) before your pass? Ciao, Duncan. 
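Duncan's point can be pictured with a toy model. The sketch below is not LLVM's AliasAnalysis API: ToyPtr, conservativeAlias and toyBasicAlias are hypothetical names, and the rules are a deliberately simplified guess at the kind of reasoning a real analysis adds. It only shows why a default that knows nothing answers MayAlias for every query, including the store-versus-argument query from the original question.

```cpp
// Same four outcomes as in the question above; values are illustrative.
enum AliasResult { NoAlias, MayAlias, PartialAlias, MustAlias };

// Hypothetical, simplified description of a pointer (not an LLVM type).
enum PtrKind { LocalAlloca, Argument, Unknown };
struct ToyPtr {
    PtrKind kind;
    int id;  // distinguishes different allocas / different arguments
};

// Conservative default: no reasoning at all, so every pair "may alias".
inline AliasResult conservativeAlias(const ToyPtr &, const ToyPtr &) {
    return MayAlias;
}

// Slightly smarter toy analysis.
inline AliasResult toyBasicAlias(const ToyPtr &a, const ToyPtr &b) {
    if (a.kind != Unknown && a.kind == b.kind && a.id == b.id)
        return MustAlias;  // literally the same pointer
    if (a.kind == LocalAlloca && b.kind == LocalAlloca)
        return NoAlias;    // two distinct stack slots
    if ((a.kind == LocalAlloca && b.kind == Argument) ||
        (a.kind == Argument && b.kind == LocalAlloca))
        return NoAlias;    // a fresh stack slot cannot be what the caller passed in
    return MayAlias;       // e.g. two different arguments, or unknown pointers
}
```

In this model a store to a local stack slot compared against a function argument yields MayAlias under the conservative default and NoAlias under the smarter rules, which mirrors the difference Duncan attributes to scheduling a real analysis ahead of the pass.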
From grosbach at apple.com Wed Mar 7 10:36:44 2012 From: grosbach at apple.com (Jim Grosbach) Date: Wed, 07 Mar 2012 08:36:44 -0800 Subject: [LLVMdev] Data/Address registers In-Reply-To: <4F576F44.6000801@gmail.com> References: <4F521304.1030900@gmail.com> <46738C23-06A8-4F28-8068-FEBC0B39798E@apple.com> <4F576F44.6000801@gmail.com> Message-ID: <4AE96120-FE98-4DC2-B963-A60C043B33E2@apple.com> On Mar 7, 2012, at 6:23 AM, Ivan Llopard wrote: > Hi Jim, > > Thanks for your response. > > On 06/03/2012 22:54, Jim Grosbach wrote: >> Hi Ivan, >> On Mar 3, 2012, at 4:48 AM, Ivan Llopard wrote: >> >>> Hi, >>> >>> I'm facing a problem in llvm while porting it to a new target and I'll >>> need some support. >>> We have 2 kind of register, one for general purposes (i.e. arithmetic, >>> comparisons, etc.) and the other for memory addressing. >> OK. Separate register classes should be able to handle this. >> >>> Cross copies are not allowed (no data path). >> You mean you can't copy directly from a general purpose register to an address register? That's an unfortunate architectural quirk. You may have to write some interesting and potentially ugly code in copyPhysReg() to handle that. >> > > Actually, I can't copy them in any way, it's just impossible :-/. Do you have load/store instructions for each register class? Worst case you could do a push/pop pair on the stack. It's really, really important that there be a way, even a very expensive way, to do this. > >>> We use clang 3.0 to produce assembler code. >>> Because both registers have the same size and type (i16), I don't know >>> what would be the best solution to distinguish them in order to match >>> the right instructions. >> The register classes should take care of this. > > I tried but IMO the matching rule should be context-dependent, i.e. an i16 addition should match machine additions with operands being either data registers or address registers depending on its usage. 
Even if I look at index operands of load/stores (into the DAG) to match target's addressing modes, I can't assume that some operations are not being used for something else than basic arithmetics (like comparisons which are not supported for address regs). Is it still possible to get rid of this with register classes ? It should be, yes. For a contrived example of a simple add-immediate instruction for each:

  def ADD_address_reg: myBaseInstrClass<(outs ADDR_REG:$dst), (ins ADDR_REG:$src, i32imm:$imm),
      [(set ADDR_REG:$dst, (add ADDR_REG:$src, i32imm:$imm))]>;
  def ADD_general_reg: myBaseInstrClass<(outs GPR:$dst), (ins GPR:$src, i32imm:$imm),
      [(set GPR:$dst, (add GPR:$src, i32imm:$imm))]>;

Likewise, other operations that can target either register class should have a variant for each. ISel will choose the appropriate one based on the rest of the operands. > I can make a pass before ISel to annotate the code identifying those registers which are only used for addressing (by doing a simple data-flow analysis), can it help ISelector later ? > Because I could not find how to get metadata from the DAG to drive matching rules or lowering phases, is it possible ? How is metadata transferred to the DAG, where should I look for it ? > Metadata should not be necessary for this. In general, metadata should never be used for anything that's required information, only for optional information. I.e., if it's stripped out of the IR, the backend should still generate correct code. -Jim > Ivan > >>> Moreover, the standard pointer arithmetic is not >>> enough for us (we need to support modulo operations also). >>> I thought that I could manually match every arithmetic operation while >>> matching the addressing mode but it doesn't work because intermediate >>> results are sometimes reused for other purposes (e.g. comparisons). >> I suggest getting things working correctly first and then coming back to things like this as an optimization. 
>> >>> Do I need to add another type to clang/llvm ? >>> >> Unlikely. >> >> Regards, >> Jim >> >> >>> Thanks in advance, >>> >>> Ivan >>> _______________________________________________ >>> LLVM Developers mailing list >>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From hammacher at cs.uni-saarland.de Wed Mar 7 10:42:34 2012 From: hammacher at cs.uni-saarland.de (Clemens Hammacher) Date: Wed, 07 Mar 2012 17:42:34 +0100 Subject: [LLVMdev] [PATCH] Performance degradation when repeatedly exchanging JITted functions In-Reply-To: References: <4F562D58.1010200@cs.uni-saarland.de> <20120306154445.GA30390@britannica.bec.de> <20120306162806.GA31282@britannica.bec.de>, <4F563DBF.50602@cs.uni-saarland.de> Message-ID: <4F578FFA.7070002@cs.uni-saarland.de> On 3/6/12 6:48 PM, James Molloy wrote: >> I don't think that a patch implementing any of those approaches would be >> accepted, that's why I am tending towards implementing it outside of LLVM. > > Why not? If they make LLVM better and aren't hacks, why wouldn't they be accepted? Okay, that motivated me to work on the patch again. I think I found a compromise of the discussed approaches. The original stub (which is being hold by the JITResolver anyway) is updated to point to the new version in any case. Additionally you can set a flag in the ExecutionEngine to always use the stub when calling a function. If this flag is set, a recompileAndRelinkFunction does *not* patch the old function pointer to jump to the new function, since all calls use the stub anyway. Since - as I wrote - several places in the JIT rely on the global mapping being updated to the start of the newly jitted function, I didn't change that. Instead, after jitting a function, the mapping is changend back to the stub, if the KeepStubs flag is set. The only drawback of this is that *directly* recursive calls still bypass the stub and jump back directly to the function pointer. 
But since exchanging a function while another thread is executing it is unsafe anyway, this shouldn't matter. Even exchanging a function running in the same thread (e.g. from a callback into the VM) is unsafe in the current implementation, since you would overwrite the original function code at the start of the method. So I think this should be fine. I attached a patch implementing this, and a test case for the new flag. Both apply to trunk. Should I send them to the commits list, or does anyone with commit rights find them here? If so, that person can also apply the fix and testcase for bug 12197, which I stumbled across and is slightly related to this one. http://llvm.org/bugs/show_bug.cgi?id=12197 Cheers, Clemens -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: implement_KeepStubs.patch Url: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120307/d709bfdb/attachment.pl -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: testcase_KeepStubs.patch Url: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120307/d709bfdb/attachment-0001.pl -------------- next part -------------- A non-text attachment was scrubbed... 
Name: smime.p7s Type: application/pkcs7-signature Size: 6392 bytes Desc: S/MIME Cryptographic Signature Url : http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120307/d709bfdb/attachment.bin From afylot at gmail.com Wed Mar 7 10:45:45 2012 From: afylot at gmail.com (simona bellavista) Date: Wed, 7 Mar 2012 17:45:45 +0100 Subject: [LLVMdev] compiling llvm 3.0 with gcc 4.4.6 on linux x86_64: test-suite fails on two tests Message-ID: Hi, I compiled llvm v3.0 on linux x86_64 with gcc 4.4.6 and test-suite fails on two tests: SingleSource/UnitTests/Vector/SSE/sse.expandfft SingleSource/UnitTests/Vector/SSE/sse.stepfft I configured and compiled with with ../llvm/configure CFLAGS=-O3 --prefix=/scratch/user/local/llvm-release_30_opt --enable-optimized make -j2 make check-all and tested with gmake TEST=simple report report.html I got two failures, which flags should I use to compile with optimization but avoiding these errors? ========= '/scratch/user/download/release_30/build_optimized/projects/test-suite/SingleSource/UnitTests/Vector/SSE/sse.expandfft' Program TEST-PASS: compile /scratch/user/download/release_30/build_optimized/projects/test-suite/SingleSource/UnitTests/Vector/SSE/sse.expandfft TEST-RESULT-compile-success: pass TEST-RESULT-compile-time: program 0.160000 TEST-FAIL: exec /scratch/user/download/release_30/build_optimized/projects/test-suite/SingleSource/UnitTests/Vector/SSE/sse.expandfft TEST-RESULT-exec-time: program 0.690000 ========= '/scratch/user/download/release_30/build_optimized/projects/test-suite/SingleSource/UnitTests/Vector/SSE/sse.stepfft' Program TEST-PASS: compile /scratch/user/download/release_30/build_optimized/projects/test-suite/SingleSource/UnitTests/Vector/SSE/sse.stepfft TEST-RESULT-compile-success: pass TEST-RESULT-compile-time: program 0.170000 TEST-FAIL: exec /scratch/user/download/release_30/build_optimized/projects/test-suite/SingleSource/UnitTests/Vector/SSE/sse.stepfft TEST-RESULT-exec-time: program 0.840000 -------------- 
next part -------------- An HTML attachment was scrubbed... URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120307/bc095ad2/attachment.html From James.Molloy at arm.com Wed Mar 7 10:57:27 2012 From: James.Molloy at arm.com (James Molloy) Date: Wed, 7 Mar 2012 16:57:27 +0000 Subject: [LLVMdev] [PATCH] Performance degradation when repeatedly exchanging JITted functions In-Reply-To: <4F578FFA.7070002@cs.uni-saarland.de> References: <4F562D58.1010200@cs.uni-saarland.de> <20120306154445.GA30390@britannica.bec.de> <20120306162806.GA31282@britannica.bec.de>, <4F563DBF.50602@cs.uni-saarland.de> <4F578FFA.7070002@cs.uni-saarland.de> Message-ID: Hi Clemens, You should send to the commits list, as you suggest :) Cheers, James -----Original Message----- From: Clemens Hammacher [mailto:hammacher at cs.uni-saarland.de] Sent: 07 March 2012 16:43 To: James Molloy; llvmdev at cs.uiuc.edu Subject: [PATCH] Performance degradation when repeatedly exchanging JITted functions On 3/6/12 6:48 PM, James Molloy wrote: >> I don't think that a patch implementing any of those approaches would be >> accepted, that's why I am tending towards implementing it outside of LLVM. > > Why not? If they make LLVM better and aren't hacks, why wouldn't they be accepted? Okay, that motivated me to work on the patch again. I think I found a compromise of the discussed approaches. The original stub (which is being hold by the JITResolver anyway) is updated to point to the new version in any case. Additionally you can set a flag in the ExecutionEngine to always use the stub when calling a function. If this flag is set, a recompileAndRelinkFunction does *not* patch the old function pointer to jump to the new function, since all calls use the stub anyway. Since - as I wrote - several places in the JIT rely on the global mapping being updated to the start of the newly jitted function, I didn't change that. 
Instead, after jitting a function, the mapping is changend back to the stub, if the KeepStubs flag is set. The only drawback of this is that *directly* recursive calls still bypass the stub and jump back directly to the function pointer. But since exchanging a function while another thread is executing it is unsafe anyway, this shouldn't matter. Even exchanging a function running in the same thread (e.g. from a callback into the VM) is unsafe in the current implementation, since you would overwrite the original function code at the start of the method. So I think this should be fine. I attached a patch implementing this, and a test case for the new flag. Both apply to trunk. Should I send them to the commits list, or does anyone with commit rights find them here? If so, that person can also apply the fix and testcase for bug 12197, which I stumbled across and is slightly related to this one. http://llvm.org/bugs/show_bug.cgi?id=12197 Cheers, Clemens -- IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you. From johnso87 at crhc.illinois.edu Wed Mar 7 12:38:01 2012 From: johnso87 at crhc.illinois.edu (Matt Johnson) Date: Wed, 7 Mar 2012 12:38:01 -0600 Subject: [LLVMdev] "Machine LICM" for Constants? Message-ID: <4F57AB09.2000101@crhc.illinois.edu> Hi All, I work on a backend for a target similar to Mips, where large immediates are loaded into registers with 2 instructions, 1 to load the MSBits and 1 to load the LSBits. I've noticed a recurring pattern where, despite low register pressure, these constants will be rematerialized in every iteration of a loop, rather than being hoisted. Here's an example using the mips-unknown-unknown target and Clang/LLVM HEAD. 
From newlib's implementation of strncat:

#define DETECTNULL(X) (((X) - 0x01010101) & ~(X) & 0x80808080)
  while (!DETECTNULL (*aligned_s1))
    aligned_s1++;

This loop gets lowered under -O3 to:

$BB0_5:
    lui $3, 32896
    lui $7, 65278
    ori $3, $3, 32896    ###### Materialize 0x80808080
    lw $8, 4($2)
    nop
    and $9, $8, $3
    ori $7, $7, 65279    ###### Materialize -(0x01010101)
    addiu $2, $2, 4
    xor $3, $9, $3
    addu $7, $8, $7
    and $3, $3, $7
    beq $3, $zero, $BB0_5

There are a ton of unused caller-saved registers in this small function, so I expected the constant materialization to be hoisted out of the tight loop. I'm still learning about the new register allocator and am not immediately able to make sense of its debug output (and the 'problem' may be elsewhere in any case). I'm happy to post the results of -debug-only regalloc if they're useful.

Is my desire to hoist the constants out of the loop reasonable? Is there something I can do (hints or passes in my backend, clang/opt flag, etc.) to make this happen today? If not, what is the root cause? Maybe there's no way to hoist things out of a loop once IR is lowered into a SelectionDAG?

Thanks,
Matt

From ahatanak at gmail.com Wed Mar 7 13:34:10 2012
From: ahatanak at gmail.com (Akira Hatanaka)
Date: Wed, 7 Mar 2012 11:34:10 -0800
Subject: [LLVMdev] Question about post RA scheduler
In-Reply-To: <8EABC991-2BE1-45E5-A388-ABCCE5E826FB@apple.com>
References: <8EABC991-2BE1-45E5-A388-ABCCE5E826FB@apple.com>
Message-ID:

I filed a bug report (Bug 12205). Please take a look when you have time. Per your suggestion, I also attached a patch which attaches to load or store nodes a machinepointerinfo that points to a stack frame object when it can infer they are actually reading from or writing to the stack.
The test that was failing passes if I apply this patch, but I doubt this is the right approach, because this will fail if InferPointerInfo in SelectionDAG.cpp cannot discover a load or store is accessing a stack object (it can only infer the information if the expression for the pointer is simple, for example add FI + const).

An alternative approach might be to make the machinepointerinfo of the stores refer to %struct.ObjPointStruct* byval %P or refer to nothing, but that currently doesn't seem to be possible.

On Tue, Mar 6, 2012 at 6:01 PM, Andrew Trick wrote:
> On Mar 6, 2012, at 5:05 PM, Akira Hatanaka wrote:
>> I am having trouble trying to enable post RA scheduler for the Mips backend.
>>
>> This is the bit code of the function I am compiling:
>>
>> (gdb) p MF.Fn->dump()
>>
>> define void @PointToHPoint(%struct.HPointStruct* noalias sret
>> %agg.result, %struct.ObjPointStruct* byval %P) nounwind {
>> entry:
>>   %res = alloca %struct.HPointStruct, align 8
>>   %x2 = bitcast %struct.ObjPointStruct* %P to double*
>>   %0 = load double* %x2, align 8
>>
>> The third instruction is loading the first floating point double of
>> structure %P which is being passed by value.
>>
>> This is the machine function right after completion of isel:
>> (gdb) p MF->dump()
>> # Machine code for function PointToHPoint:
>> Frame Objects:
>>   fi#-1: size=48, align=8, fixed, at location [SP+8]
>>   fi#0: size=32, align=8, at location [SP]
>> Function Live Ins: %A0 in %vreg0, %A2 in %vreg1, %A3 in %vreg2
>>
>> BB#0: derived from LLVM BB %entry
>>       SW %vreg2, , 4; mem:ST4[FixedStack-1+4] CPURegs:%vreg2
>>       SW %vreg1, , 0; mem:ST4[FixedStack-1](align=8) CPURegs:%vreg1
>>       %vreg3 = COPY %vreg0; CPURegs:%vreg3,%vreg0
>>       %vreg4 = LDC1 , 0; mem:LD8[%x2] AFGR64:%vreg4
>>
>>
>> The first two stores write the values in argument registers $6 and $7
>> to frame object -1
>> (Mips stores byval arguments passed in registers to the stack).
>> The fourth instruction LDC1 loads the value written by the first two
>> stores as a floating point double.
>>
>> This is the machine function just before post RA scheduling:
>> (gdb) p MF.dump()
>> # Machine code for function PointToHPoint:
>> Frame Objects:
>>   fi#-1: size=48, align=8, fixed, at location [SP+8]
>>   fi#0: size=32, align=8, at location [SP-32]
>> Function Live Ins: %A0 in %vreg0, %A2 in %vreg1, %A3 in %vreg2
>>
>> BB#0: derived from LLVM BB %entry
>>     Live Ins: %A0 %A2 %A3
>>       %SP = ADDiu %SP, -32
>>       PROLOG_LABEL
>>       SW %A3, %SP, 44; mem:ST4[FixedStack-1+4]
>>       SW %A2, %SP, 40; mem:ST4[FixedStack-1](align=8)
>>       %D0 = LDC1 %SP, 40; mem:LD8[%x2]
>>
>>
>> The frame index operands of the first two stores and the fourth load
>> have been lowered to real addresses.
>> Since the first two SWs store to ($sp + 44) and ($sp + 40), and
>> instruction LDC1 loads from ($sp + 40),
>> there should be a dependency between these instructions.
>>
>> However, when ScheduleDAGInstrs::BuildSchedGraph(AliasAnalysis *AA)
>> builds the schedule graph,
>> there are no dependency edges added between the two SWs and LDC1 because
>> getUnderlyingObjectForInstr returns different objects for these instructions:
>>
>> underlying object of SWs: FixedStack-1
>> underlying object of LDC1: struct.ObjPointStruct* %P
>>
>>
>> Is this a bug?
>> Or are there ways to tell BuildSchedGraph it should add dependency edges?
>
> This is a wild guess. But it looks to me like your load's machineMemOperand should have been converted to refer to the stack frame. I would call that an ISEL bug. I can't say where the bug is without stepping through a test case.
>
> Maybe someone who's worked in this area of ISEL can give you a better hint. In the meantime, I would file a PR.
>
> -Andy
-------------- next part -------------- A non-text attachment was scrubbed...
Name: machineptrinfo.patch Type: text/x-patch Size: 1122 bytes Desc: not available Url : http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120307/20204c6d/attachment.bin From ryta1203 at gmail.com Wed Mar 7 14:03:02 2012 From: ryta1203 at gmail.com (Ryan Taylor) Date: Wed, 7 Mar 2012 12:03:02 -0800 Subject: [LLVMdev] Updating value from PHI Message-ID: I am splitting a one BB loop into two BB. Basically, the one loop BB has 3 incoming values, one from the back edge and two from other edges. I want to extract the PHIs from the other two edges out into their own BB and delete that from the loop, then redirect the backedge to the loopbody (non extracted portion) and create a new PHI coming from the extracted BB and the backedge. I can do this; however, the PHIs following in all the other BBs are not getting updated, neither are the statements in the loopbody. What is the easiest way to propagate these changes downward? -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120307/bc52aa4e/attachment.html From evan.cheng at apple.com Wed Mar 7 14:45:49 2012 From: evan.cheng at apple.com (Evan Cheng) Date: Wed, 07 Mar 2012 12:45:49 -0800 Subject: [LLVMdev] "Machine LICM" for Constants? In-Reply-To: <4F57AB09.2000101@crhc.illinois.edu> References: <4F57AB09.2000101@crhc.illinois.edu> Message-ID: <0799047A-66AF-4002-9935-B864E257B5C7@apple.com> Yes, machine-licm can and should hoist constant materialization instructions out of the loop. If it's not doing that, it's probably because the target is not modeling the instruction correctly. I would walk through MachineLICM::IsLoopInvariantInst() in the debugger to figure it out. You can also try compiling the same bitcode for a target like ARM or X86 as a comparison.
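For reference, the loop being discussed in this thread can be sketched at the source level as below. This is only an illustration, assuming 32-bit words: the names detect_null and first_word_with_null are made up here and do not come from newlib or LLVM. Naming the two constants once before the loop mirrors the shape the hoisted machine code would have, but it does not by itself control what MachineLICM decides to do.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Same test as the DETECTNULL macro quoted in this thread:
// the result is nonzero if and only if some byte of x is zero.
constexpr std::uint32_t detect_null(std::uint32_t x) {
    return (x - 0x01010101u) & ~x & 0x80808080u;
}

// Hypothetical helper (illustration only): returns the index of the first
// word that contains a zero byte, or n if no such word exists.
std::size_t first_word_with_null(const std::uint32_t *words, std::size_t n) {
    assert(words != nullptr || n == 0);

    // The two 32-bit immediates are loop invariant. On a Mips-like target
    // each one costs a lui/ori pair, so ideally that pair runs once here
    // rather than once per iteration.
    const std::uint32_t low_ones = 0x01010101u;
    const std::uint32_t high_bits = 0x80808080u;

    std::size_t i = 0;
    while (i < n && ((words[i] - low_ones) & ~words[i] & high_bits) == 0) {
        ++i;
    }
    return i;
}
```

Comparing the assembly generated for a loop like this across targets, as suggested above, is one way to see whether the constant materialization stays inside the loop body.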
Evan

On Mar 7, 2012, at 10:38 AM, Matt Johnson wrote:
> Hi All,
> I work on a backend for a target similar to Mips, where large
> immediates are loaded into registers with 2 instructions, 1 to load the
> MSBits and 1 to load the LSBits. I've noticed a recurring pattern
> where, despite low register pressure, these constants will be
> rematerialized in every iteration of a loop, rather than being hoisted.
> Here's an example using the mips-unknown-unknown target and Clang/LLVM
> HEAD. From newlib's implementation of strncat:
>
> #define DETECTNULL(X) (((X) - 0x01010101) & ~(X) & 0x80808080)
>   while (!DETECTNULL (*aligned_s1))
>     aligned_s1++;
>
> This loop gets lowered under -O3 to:
>
> $BB0_5:
>     lui $3, 32896
>     lui $7, 65278
>     ori $3, $3, 32896    ###### Materialize 0x80808080
>     lw $8, 4($2)
>     nop
>     and $9, $8, $3
>     ori $7, $7, 65279    ###### Materialize -(0x01010101)
>     addiu $2, $2, 4
>     xor $3, $9, $3
>     addu $7, $8, $7
>     and $3, $3, $7
>     beq $3, $zero, $BB0_5
>
>
> There are a ton of unused caller-saved registers in this small function,
> so I expected the constant materialization to be hoisted out of the
> tight loop. I'm still learning about the new register allocator and am
> not immediately able to make sense of its debug output (and the
> 'problem' may be elsewhere in any case). I'm happy to post the results
> of -debug-only regalloc if they're useful.
>
> Is my desire to hoist the constants out of the loop reasonable? Is
> there something I can do (hints or passes in my backend, clang/opt flag,
> etc.) to make this happen today? If not, what is the root cause? Maybe
> there's no way to hoist things out of a loop once IR is lowered into a
> SelectionDAG?
>
> Thanks,
> Matt
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

From nicolas.capens at gmail.com Wed Mar 7 14:47:47 2012
From: nicolas.capens at gmail.com (Nicolas Capens)
Date: Wed, 07 Mar 2012 15:47:47 -0500
Subject: [LLVMdev] Scalar replacement of arrays
Message-ID: <4F57C973.4000805@gmail.com>

Hi all,

I'm implementing a virtual processor which features dynamic register indexing, and I'm struggling to make LLVM 3.0 produce good code for it. The register set is implemented as an LLVM array so it can be dynamically indexed using GEP. However, most of the time the virtual processor's registers are just statically indexed, and so I expected/hoped the code would be as optimal as when the virtual registers are implemented using individual scalars, which are allocated to the target machine's physical registers as much as possible. But that turns out not to be the case and I end up with code which constantly reads and writes memory to access my virtual registers.

The "Scalar Replacement of Aggregates" pass (scalarrepl) seems to be capable of splitting structures into separate fields so that mem2reg can produce efficient code which avoids redundant memory operations. But it skips my array entirely. Here's a small piece of C code which illustrates the problem:

int foo(int x, int y)
{
    int r[2];
    r[0] = x;
    r[1] = y;
    r[0] = r[0] + r[1];
    return r[0];
}

This gives me the following (x86) assembly code:

    sub esp,8
    mov eax,dword ptr [esp+0Ch]
    mov dword ptr [esp],eax
    mov eax,dword ptr [esp+10h]
    mov dword ptr [esp+4],eax
    add eax,dword ptr [esp]
    mov dword ptr [esp],eax
    add esp,8
    ret

If I replace the array with two individual scalars, I get the following perfect result instead:

    mov eax,dword ptr [esp+8]
    add eax,dword ptr [esp+4]
    ret

Unfortunately, I don't think that having scalarrepl handle arrays will do the trick.
It will work for the above trivial example, but my array of registers does get indexed dynamically from time to time, and this would completely prevent scalarrepl from doing anything, right? Ideally LLVM should keep things in physical registers as long as possible, and when the virtual register array is being dynamically indexed it should write the physical registers back to the array... So does anyone know if this can already be achieved using some other passes or settings? If not, what would be the best approach to implement it? Thanks for any help, Nicolas From fabian.scheler at gmail.com Wed Mar 7 14:48:37 2012 From: fabian.scheler at gmail.com (Fabian Scheler) Date: Wed, 7 Mar 2012 21:48:37 +0100 Subject: [LLVMdev] 2.9 segfault when requesting for both LoopInfo and DominatorTree analyses. In-Reply-To: References: Message-ID: Hi, folks! Are there any news regarding the findAnalysis-issue. This week I started to port my project from LLVM 2.7 to LLVM 3.0 and ran into it yesterday. I have no clue how to resolve it. How did you "make it go away"? BTW: Is there a helpful documentation how all these INITIALIZE_PASS-macros shall be used? Ciao, Fabian From eli.friedman at gmail.com Wed Mar 7 15:00:11 2012 From: eli.friedman at gmail.com (Eli Friedman) Date: Wed, 7 Mar 2012 13:00:11 -0800 Subject: [LLVMdev] Scalar replacement of arrays In-Reply-To: <4F57C973.4000805@gmail.com> References: <4F57C973.4000805@gmail.com> Message-ID: On Wed, Mar 7, 2012 at 12:47 PM, Nicolas Capens wrote: > Hi all, > > I'm implementing a virtual processor which features dynamic register > indexing, and I'm struggling to make LLVM 3.0 produce good code for it. > The register set is implemented as an LLVM array so it can be > dynamically indexed using GEP. 
However, most of the time the virtual
> processor's registers are just statically indexed, and so I
> expected/hoped the code would be as optimal as when the virtual
> registers are implemented using individual scalars, which are allocated
> to the target machine's physical registers as much as possible. But that
> turns out not to be the case and I end up with code which constantly
> reads and writes memory to access my virtual registers.
>
> The "Scalar Replacement of Aggregates" pass (scalarrepl) seems to be
> capable of splitting structures into separate fields so that mem2reg can
> produce efficient code which avoids redundant memory operations. But it
> skips my array entirely. Here's a small piece of C code which
> illustrates the problem:
>
> int foo(int x, int y)
> {
>     int r[2];
>     r[0] = x;
>     r[1] = y;
>     r[0] = r[0] + r[1];
>     return r[0];
> }

clang -O2 for that C code gives:

    pushl %ebp
    movl %esp, %ebp
    movl 12(%ebp), %eax
    addl 8(%ebp), %eax
    popl %ebp
    ret

> If I replace the array with two individual scalars, I get the following
> perfect result instead:
>
>     mov eax,dword ptr [esp+8]
>     add eax,dword ptr [esp+4]
>     ret
>
> Unfortunately, I don't think that having scalarrepl handle arrays will
> do the trick. It will work for the above trivial example, but my array
> of registers does get indexed dynamically from time to time, and this
> would completely prevent scalarrepl from doing anything, right?

Yes; you wouldn't really want it to try.

> Ideally LLVM should keep things in physical registers as long as
> possible, and when the virtual register array is being dynamically
> indexed it should write the physical registers back to the array...
>
> So does anyone know if this can already be achieved using some other
> passes or settings? If not, what would be the best approach to implement it?

Conceptually, we ought to be able to handle that sort of issue with a combination of GVN and dead store elimination (DSE).
Unfortunately, LLVM's DSE pass is rather weak. so that approach might not be so effective in practice. -Eli From evan.cheng at apple.com Wed Mar 7 17:51:50 2012 From: evan.cheng at apple.com (Evan Cheng) Date: Wed, 07 Mar 2012 15:51:50 -0800 Subject: [LLVMdev] x86-64 sign extension for parameters and return values In-Reply-To: References: Message-ID: <9B07AD09-5165-459A-843C-2713D469D9EB@apple.com> Hi Meador, Have you filed a bugzilla report? What's the PR number? Evan On Feb 23, 2012, at 3:11 PM, Meador Inge wrote: > On Thu, Feb 23, 2012 at 3:54 PM, Eli Friedman wrote: > >> LLVM has traditionally assumed that all integer argument and return >> types narrower than int are promoted to int on all architectures. >> Nobody has actually noticed any issues with this before now, as far as >> I know. > > The only reason that I noticed was that Python ctypes started misbehaving > when we went to build/test it on OS X Lion (http://bugs.python.org/issue13370). > After investigating the failure I found this. Python uses libffi and > libffi implements > the GCC ABI. So I would expect any project using libffi with clang to > have problems. > >> If gcc has decided to assume no sign/zero-extension on x86-64, we need >> to follow their lead, at least on Linux. Please file at >> http://llvm.org/bugs/ ; an executable testcase to go with this would >> be nice, so we can compare various compilers and different platforms. > > Will do. Thanks. > > -- Meador > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From ryta1203 at gmail.com Wed Mar 7 18:04:47 2012 From: ryta1203 at gmail.com (Ryan Taylor) Date: Wed, 7 Mar 2012 16:04:47 -0800 Subject: [LLVMdev] Updating value from PHI In-Reply-To: References: Message-ID: I have attached a case of what I am trying to do, I'm pretty sure I'm just missing some simple API call. 
In the cfg you can see that although I'm setting "lsr.iv441" as "lsr.iv44" from for.body.387.i it's not propagating that through the block or graph. On Wed, Mar 7, 2012 at 12:03 PM, Ryan Taylor wrote: > I am splitting a one BB loop into two BB. > > Basically, the one loop BB has 3 incoming values, one from the back edge and two > from other edges. I want to extract the PHIs from the other two edges out > into their own BB and delete that from the loop, then redirect the backedge > to the loopbody (non extracted portion) and create a new PHI coming from > the extracted BB and the backedge. > > I can do this; however, the PHIs following in all the other BBs are not > getting updated, neither are the statements in the loopbody. > > What is the easiest way to propagate these changes downward? > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120307/78118dad/attachment.html -------------- next part -------------- A non-text attachment was scrubbed... Name: BasicBlock Type: application/octet-stream Size: 42180 bytes Desc: not available Url : http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120307/78118dad/attachment.obj From ryta1203 at gmail.com Wed Mar 7 18:08:17 2012 From: ryta1203 at gmail.com (Ryan Taylor) Date: Wed, 7 Mar 2012 16:08:17 -0800 Subject: [LLVMdev] Updating value from PHI In-Reply-To: References: Message-ID: Here is the code snippet that I am using to create the PHIs in the loop according to the PHIs in the new preheader. At this point I have already redirected the loop backedge and removed the preheader from the loop.
for (BasicBlock::iterator II = loopHeaderBB->begin(); (PN = dyn_cast<PHINode>(II)); ++II) {
    // remove loop back PHI and add it to split BB
    errs() << *II << "\n";
    PHINode *newPHIvalue = PHINode::Create(PN->getType(), 2, PN->getName().str(), splitBB->getFirstInsertionPt());
    int IDX = PN->getBasicBlockIndex(splitBB);
    while (IDX != -1) {
        Value *oldValue = PN->getIncomingValue((unsigned(IDX)));
        PN->removeIncomingValue(IDX, false);
        newPHIvalue->addIncoming(oldValue, loopLatchBB);
        newPHIvalue->addIncoming(PN, loopHeaderBB);
        IDX = PN->getBasicBlockIndex(splitBB);
    }
}

On Wed, Mar 7, 2012 at 4:04 PM, Ryan Taylor wrote:
> I have attached a case of what I am trying to do, I'm pretty sure I'm just
> missing some simple API call. In the cfg you can see that although I'm
> setting "lsr.iv441" as "lsr.iv44" from for.body.387.i it's not propagating
> that through the block or graph.
>
>
> On Wed, Mar 7, 2012 at 12:03 PM, Ryan Taylor wrote:
>
>> I am splitting a one BB loop into two BB.
>>
>> Basically, the one loop BB has 3 incoming values, one from the back edge and two
>> from other edges. I want to extract the PHIs from the other two edges out
>> into their own BB and delete that from the loop, then redirect the backedge
>> to the loopbody (non extracted portion) and create a new PHI coming from
>> the extracted BB and the backedge.
>>
>> I can do this; however, the PHIs following in all the other BBs are not
>> getting updated, neither are the statements in the loopbody.
>>
>> What is the easiest way to propagate these changes downward?
>>
>
>
-------------- next part -------------- An HTML attachment was scrubbed...
URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120307/33f31947/attachment-0001.html From meadori at gmail.com Wed Mar 7 18:32:35 2012 From: meadori at gmail.com (Meador Inge) Date: Wed, 7 Mar 2012 18:32:35 -0600 Subject: [LLVMdev] x86-64 sign extension for parameters and return values In-Reply-To: <9B07AD09-5165-459A-843C-2713D469D9EB@apple.com> References: <9B07AD09-5165-459A-843C-2713D469D9EB@apple.com> Message-ID: On Wed, Mar 7, 2012 at 5:51 PM, Evan Cheng wrote: > Hi Meador, > > Have you filed a bugzilla report? What's the PR number? http://llvm.org/bugs/show_bug.cgi?id=12207 -- Meador From pranavb at codeaurora.org Wed Mar 7 19:19:16 2012 From: pranavb at codeaurora.org (Pranav Bhandarkar) Date: Wed, 7 Mar 2012 19:19:16 -0600 Subject: [LLVMdev] A question about DBG_VALUE and Frame Index Message-ID: <000001ccfcc9$7b495010$71dbf030$@org> Hi, I have a case that is causing me grief in the form of an assert. The prolog Epilog inserter tries to remove Frame Index references. I have a DBG_VALUE instruction that looks like this (alongwith the Frame Index). This is for the Hexagon backend. ************************** fi#2: size=4, align=4, at location [SP-84] DBG_VALUE , 0, !"fooBar"; line no:299 ************************** Clearly, the FI in question is at an offset of -84 from the SP at entry to the function i.e. FP - 84. So I remove the FI by changing the instruction to. ************************** DBG_VALUE %R30, -84, !"fooBar"; line no:299 ************************** (R30 is the frame pointer register in Hexagon.) So, logically we have moved from frame indices to actually base + offset representation. However the assembly printer, while trying to emit debug info, sticks to the frame index representation and looks for a base+offset reference for -84 !! 
This is at DwarfCompileUnit.cpp:1334 ************************** int Offset = TFI->getFrameIndexReference(*Asm->MF, DVInsn->getOperand(1).getImm(), FrameReg); ************************** In my view we have lost information that (R30-84) is . The above statement is asking the Frame Lowering Information to give it a base+offset pair for the frame index -84. I do not think this is correct or am I missing something here ? For the sake of completeness, I must mention that Hexagon uses the base class version of getFrameIndexReference and does not provide its own. Pranav Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum. From rjmccall at apple.com Wed Mar 7 20:25:14 2012 From: rjmccall at apple.com (John McCall) Date: Wed, 07 Mar 2012 18:25:14 -0800 Subject: [LLVMdev] [cfe-dev] Microsoft constructors implementation problem. In-Reply-To: <4F547BE6.1060006@gmail.com> References: <4F4B6C08.1080104@gmail.com> <4F547BE6.1060006@gmail.com> Message-ID: <74333B5B-4B2F-42A8-8EE3-B88556ACAEFD@apple.com> On Mar 5, 2012, at 12:40 AM, r4start wrote: > I have another question. > If ctor was called from other ctor then additional parameter must be > equal 0 otherwise it`s equal 1. The rule isn't "Is this constructor being called from another constructor?", it's "Is this constructor being used to initialize a base subobject?". That's equivalent to the Itanium ABI's concept of a constructor variant. EmitCXXConstructorCall gets this information already. John. From johnso87 at crhc.illinois.edu Wed Mar 7 20:28:08 2012 From: johnso87 at crhc.illinois.edu (Matt Johnson) Date: Wed, 7 Mar 2012 20:28:08 -0600 Subject: [LLVMdev] "Machine LICM" for Constants? In-Reply-To: <0799047A-66AF-4002-9935-B864E257B5C7@apple.com> References: <4F57AB09.2000101@crhc.illinois.edu> <0799047A-66AF-4002-9935-B864E257B5C7@apple.com> Message-ID: <4F581938.3030704@crhc.illinois.edu> Thanks for the tip! 
I looked into it and it looks like the problem as of SVN HEAD is that the lui and ori instructions in Mips are considered cheap (1-cycle def-use latency) by MachineLICM::IsCheapInstruction(), but are not trivially materializable because their register operands are not always available. This makes MachineLICM::IsProfitableToHoist() return false, preventing the hoist even though MachineLICM::IsLoopInvariantInst() returns true. The comment in IsProfitableToHoist() is: // If the instruction is cheap, only hoist if it is re-materilizable [sic]. LICM // will increase register pressure. It's probably not worth it if the // instruction is cheap. The function then proceeds to actually *estimate* register pressure for non-cheap instructions to determine whether or not to hoist them. This heuristic seems reasonable, but doesn't seem to do the right thing in this case. Hacking the instruction itineraries to make the instructions not seem cheap doesn't seem like the right answer either. I'm guessing the motivation for this heuristic is that, in a loop with many possible hoists, some cheap and some expensive, we would prefer to hoist the expensive ones rather than wasting all our register slack on the cheap ones. Is there another way to accomplish this goal while still performing the hoist in situations where register pressure is low enough? Say, considering the instructions in a loop for hoisting in descending order of cost, rather than in program order? Note that ARM gets around this by creating a pseudo-instruction for 32-bit immediate loads (MOVi32imm) , rather than putting a pattern directly in ARMInstrInfo.td. This fused instruction *is* rematerializable (since it defines the entire register), even though either of the two half-register instructions by themselves cannot be. 
This is one way my target and Mips could hack around the problem, but for my target at least it has the disadvantage of having to add an ExpandPseudo pass to my backend and put logic in C++ that seems (IMO) to belong in TableGen. -Matt On 03/07/2012 02:45 PM, Evan Cheng wrote: > Yes machine-licm can and should hoist constant materialization instructions out of the loop. If it's not doing that, it's probably because the target is not modeling the instruction correctly. I would walk through MachineLICM::IsLoopInvariantInst() in the debugger to figure it out. You can also try compiling the same bitcode for a target like ARM or X86 as a comparison. > > Evan > > On Mar 7, 2012, at 10:38 AM, Matt Johnson wrote: > >> Hi All, >> I work on a backend for a target similar to Mips, where large >> immediates are loaded into registers with 2 instructions, 1 to load the >> MSBits and 1 to load the LSBits. I've noticed a recurring pattern >> where, despite low register pressure, these constants will be >> rematerialized in every iteration of a loop, rather than being hoisted. >> Here's an example using the mips-unknown-unknown target and Clang/LLVM >> HEAD. From newlib's implementation of strncat: >> >> #define DETECTNULL(X) (((X) - 0x01010101)& ~(X)& 0x80808080) >> while (!DETECTNULL (*aligned_s1)) >> aligned_s1++; >> >> This loop gets lowered under -O3 to: >> >> $BB0_5: >> lui $3, 32896 >> lui $7, 65278 >> ori $3, $3, 32896 ###### Materialize 0x80808080 >> lw $8, 4($2) >> nop >> and $9, $8, $3 >> ori $7, $7, 65279 ###### Materialize -(0x01010101) >> addiu $2, $2, 4 >> xor $3, $9, $3 >> addu $7, $8, $7 >> and $3, $3, $7 >> beq $3, $zero, $BB0_5 >> >> >> There are a ton of unused caller-saved registers in this small function, >> so I expected the constant materialization to be hoisted out of the >> tight loop. I'm still learning about the new register allocator and am >> not immediately able to make sense of its debug output (and the >> 'problem' may be elsewhere in any case). 
I'm happy to post the results >> of -debug-only regalloc if they're useful. >> >> Is my desire to hoist the constants out of the loop reasonable? Is >> there something I can do (hints or passes in my backend, clang/opt flag, >> etc.) to make this happen today? If not, what is the root cause? Maybe >> there's no way to hoist things out of a loop once IR is lowered into a >> SelectionDAG? >> >> Thanks, >> Matt >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From zarzycki at apple.com Wed Mar 7 23:20:59 2012 From: zarzycki at apple.com (Dave Zarzycki) Date: Wed, 07 Mar 2012 21:20:59 -0800 Subject: [LLVMdev] Job Opening: Runtime Engineer Message-ID: <6466196B-F0EC-4F03-8417-3B27B53D74EE@apple.com> Low-Level Runtime Engineer The Apple compiler organization is seeking an engineer who is strongly motivated to build fundamental runtime libraries for both iOS and OS X. Our team defines and evolves the C, C++, and Objective-C standard libraries at Apple; as well as common shared resources such as the dynamic linker and various ABI support libraries. We directly contribute to the LLVM project with the libc++, libc++abi, and lld projects. As a key member of the Apple Runtime Team, you will apply your strong background and experience towards the deployment of ground breaking new language and runtime features as well as lead the continued evolution of existing features in ways that surprise and delight our developers. You will join a small team of highly motivated senior engineers who build first-class runtime libraries and apply them in new and innovative ways. 
Required experience: * Ideal candidate will have experience with similar fundamental software * Very strong assembly and C++ skills * Strong background in OS design, algorithms, and data structures * Strong communication and teamwork skills * Experience with library/compiler interactions and platform ABIs * Knowledge of object file formats and semantics * Knowledge of Objective-C specific features is a plus * Knowledge of low-level Apple specific runtime features is a plus Job #13014171 -- Dave Zarzycki (zarzycki at apple.com) Low-Level Runtime Manager Apple Inc. From STPWORLD at narod.ru Wed Mar 7 23:40:48 2012 From: STPWORLD at narod.ru (Stepan Dyatkovskiy) Date: Thu, 08 Mar 2012 09:40:48 +0400 Subject: [LLVMdev] How to unroll loop with non-constant boundary In-Reply-To: <4F571A31.5080309@mxc.ca> References: <4247593E-2A36-48CA-8FF1-31EB511282DB@googlemail.com> <4F4BD6C1.2080303@free.fr> <439111331106885@web46.yandex.ru> <4F571A31.5080309@mxc.ca> Message-ID: <415361331185248@web3.yandex.ru> I treat "X s<= Y + Step" as "X s<= Y + 1". If X == Y (even if Y == INT_MAX) only two cases are possible: 1. Step == 0. This means an infinite loop, which cannot be unrolled. 2. Step != 0. The loop has only one iteration and may be unrolled. I forgot to replace "X s<= Y+1" with "X s<= Y+Step" in the comments. Sorry. Fixed. -Stepan. 07.03.2012, 12:20, "Nick Lewycky" : > Stepan Dyatkovskiy wrote: > >> Hi guys, >> I attached the modified patch that handles cases with low==end and stride!=1. > > I don't see how this could be correct. Your patch treats 'X s<= Y' as 'X > s< Y+1', which is incorrect when Y is INT_MAX. Wouldn't that turn an > infinite loop into a zero-trip loop? > > To be clear, this is the flaw in Benjamin's patch which you appear to > have extended that makes it unsuitable for committing to the compiler. > It would cause miscompiles.
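The INT_MAX case Nick describes can be checked with a few lines of C++. This is only a sketch: the helper names below (wrapping_add, sle, slt_plus_one) are made up for illustration and are not taken from the patch or from ScalarEvolution, and the wrapping add is assumed to stand in for an LLVM add that carries no nsw flag.

```cpp
#include <cstdint>

// Two's-complement wrapping add, modelling an LLVM `add` with no nsw flag.
// Done in unsigned arithmetic so the C++ itself has no signed-overflow UB
// (the conversion back to int32_t wraps on mainstream compilers and is
// guaranteed to do so since C++20).
inline std::int32_t wrapping_add(std::int32_t a, std::int32_t b) {
    return static_cast<std::int32_t>(static_cast<std::uint32_t>(a) +
                                     static_cast<std::uint32_t>(b));
}

// The original exit test: X s<= Y.
inline bool sle(std::int32_t x, std::int32_t y) { return x <= y; }

// The rewritten exit test: X s< Y + 1, where Y + 1 is allowed to wrap.
inline bool slt_plus_one(std::int32_t x, std::int32_t y) {
    return x < wrapping_add(y, 1);
}
```

For any Y below INT32_MAX the two predicates agree. At Y == INT32_MAX the first is true for every X while the second is false for every X, which is the "infinite loop becomes a zero-trip loop" situation.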
> > That said, I haven't had the chance to look into fixing this, but > wouldn't testing AddRec for nsw/nuw be an easy fix that would salvage > the rest of the logic? > > Nick > >> Please find it for review. >> >> -Stepan >> >> 28.02.2012, 17:41, "Benjamin Kramer": >>> On 27.02.2012, at 20:17, Duncan Sands wrote: >>>> Hi Benjamin, >>>>> LLVM misses this optimization because ScalarEvolution's ComputeExitLimitFromICmp doesn't handle signed<= (SLE) and thus can't compute the number of times the loop is executed. I wonder if there's a reason for this, it seems like something simple to add. >>>> instsimplify could also be enhanced to clean it up in this particular case, but >>>> it would be better to make scev smarter. >>> I filed http://llvm.org/bugs/show_bug.cgi?id=12110 to track this. >>> >>> - Ben >>>> Ciao, Duncan. >>>> _______________________________________________ >>>> LLVM Developers mailing list >>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev -------------- next part -------------- A non-text attachment was scrubbed... Name: pr12110.patch Type: application/octet-stream Size: 5369 bytes Desc: not available Url : http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120308/e0fa5cc5/attachment.obj From evan.cheng at apple.com Thu Mar 8 00:57:40 2012 From: evan.cheng at apple.com (Evan Cheng) Date: Wed, 07 Mar 2012 22:57:40 -0800 Subject: [LLVMdev] "Machine LICM" for Constants?
In-Reply-To: <4F581938.3030704@crhc.illinois.edu> References: <4F57AB09.2000101@crhc.illinois.edu> <0799047A-66AF-4002-9935-B864E257B5C7@apple.com> <4F581938.3030704@crhc.illinois.edu> Message-ID: On Mar 7, 2012, at 6:28 PM, Matt Johnson wrote: > Thanks for the tip! I looked into it and it looks like the problem as of SVN HEAD is that the lui and ori instructions in Mips are considered cheap (1-cycle def-use latency) by MachineLICM::IsCheapInstruction(), but are not trivially materializable because their register operands are not always available. This makes MachineLICM::IsProfitableToHoist() return false, preventing the hoist even though MachineLICM::IsLoopInvariantInst() returns true. > > The comment in IsProfitableToHoist() is: > > // If the instruction is cheap, only hoist if it is re-materilizable [sic]. LICM > // will increase register pressure. It's probably not worth it if the > // instruction is cheap. > > The function then proceeds to actually *estimate* register pressure for non-cheap instructions to determine whether or not to hoist them. > This heuristic seems reasonable, but doesn't seem to do the right thing in this case. Hacking the instruction itineraries to make the instructions not seem cheap doesn't seem like the right answer either. I'm guessing the motivation for this heuristic is that, in a loop with many possible hoists, some cheap and some expensive, we would prefer to hoist the expensive ones rather than wasting all our register slack on the cheap ones. > > Is there another way to accomplish this goal while still performing the hoist in situations where register pressure is low enough? Say, considering the instructions in a loop for hoisting in descending order of cost, rather than in program order? > > Note that ARM gets around this by creating a pseudo-instruction for 32-bit immediate loads (MOVi32imm) , rather than putting a pattern directly in ARMInstrInfo.td. 
This fused instruction *is* rematerializable (since it defines the entire register), even though either of the two half-register instructions by themselves cannot be. This is one way my target and Mips could hack around the problem, but for my target at least it has the disadvantage of having to add an ExpandPseudo pass to my backend and put logic in C++ that seems (IMO) to belong in TableGen. The pseudo instruction approach is the only quick solution I can think of. MachineLICM is designed to be conservative so I don't think we should change the register pressure heuristics. Evan > > -Matt > > On 03/07/2012 02:45 PM, Evan Cheng wrote: >> Yes machine-licm can and should hoist constant materialization instructions out of the loop. If it's not doing that, it's probably because the target is not modeling the instruction correctly. I would walk through MachineLICM::IsLoopInvariantInst() in the debugger to figure it out. You can also try compiling the same bitcode for a target like ARM or X86 as a comparison. >> >> Evan >> >> On Mar 7, 2012, at 10:38 AM, Matt Johnson wrote: >> >>> Hi All, >>> I work on a backend for a target similar to Mips, where large >>> immediates are loaded into registers with 2 instructions, 1 to load the >>> MSBits and 1 to load the LSBits. I've noticed a recurring pattern >>> where, despite low register pressure, these constants will be >>> rematerialized in every iteration of a loop, rather than being hoisted. >>> Here's an example using the mips-unknown-unknown target and Clang/LLVM >>> HEAD. 
From newlib's implementation of strncat: >>> >>> #define DETECTNULL(X) (((X) - 0x01010101)& ~(X)& 0x80808080) >>> while (!DETECTNULL (*aligned_s1)) >>> aligned_s1++; >>> >>> This loop gets lowered under -O3 to: >>> >>> $BB0_5: >>> lui $3, 32896 >>> lui $7, 65278 >>> ori $3, $3, 32896 ###### Materialize 0x80808080 >>> lw $8, 4($2) >>> nop >>> and $9, $8, $3 >>> ori $7, $7, 65279 ###### Materialize -(0x01010101) >>> addiu $2, $2, 4 >>> xor $3, $9, $3 >>> addu $7, $8, $7 >>> and $3, $3, $7 >>> beq $3, $zero, $BB0_5 >>> >>> >>> There are a ton of unused caller-saved registers in this small function, >>> so I expected the constant materialization to be hoisted out of the >>> tight loop. I'm still learning about the new register allocator and am >>> not immediately able to make sense of its debug output (and the >>> 'problem' may be elsewhere in any case). I'm happy to post the results >>> of -debug-only regalloc if they're useful. >>> >>> Is my desire to hoist the constants out of the loop reasonable? Is >>> there something I can do (hints or passes in my backend, clang/opt flag, >>> etc.) to make this happen today? If not, what is the root cause? Maybe >>> there's no way to hoist things out of a loop once IR is lowered into a >>> SelectionDAG? >>> >>> Thanks, >>> Matt >>> _______________________________________________ >>> LLVM Developers mailing list >>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From viral at mayin.org Thu Mar 8 07:04:42 2012 From: viral at mayin.org (Viral Shah) Date: Thu, 8 Mar 2012 18:34:42 +0530 Subject: [LLVMdev] Introducing julia, and gauging interest in a julia BOF session at the upcoming LLVM conference in London Message-ID: <23793385-DB60-4E87-94C1-3045266E19FD@mayin.org> Folks, We are contemplating holding a Birds of a Feather session titled "Julia and LLVM: Implementing a fast dynamic language for technical computing" at the LLVM 2012 European Conference on April 12-13 in London. 
http://llvm.org/devmtg/2012-04-12/ Would this be of interest to the LLVM developer and user community? It would be great if you could drop me a line. It will help us gauge the interest and decide if we should hold the session or not. A little bit about julia: Julia is an open source language for technical computing that strives to be in the same class of productivity as Matlab, R, python+numpy, etc., but targets the performance of C and Fortran. It is due to LLVM that julia has been able to achieve such good performance (in my opinion), with relatively little effort in a short amount of time. While we have not been active on the LLVM mailing list, we look forward to every new release, and the goodies it brings. We are really looking forward to integrating polly, GPU, autovectorization capabilities, etc. Julia has been in development by a core group consisting of our compiler writer Jeff Bezanson, with Stefan Karpinski and myself contributing the runtime and libraries, and Prof. Alan Edelman focussing on numerical accuracy. We do look forward to meeting with all the folks behind LLVM and the community at large. Julia was quietly announced in the release notes of LLVM 3.0. It was officially announced only recently in a blog post on Feb 14, 2012. 
Our "Why Julia" blog post links to various blogs and discussion forums: http://julialang.org/blog/2012/02/why-we-created-julia/ The github site for julia development has also attracted a number of developers and we are hoping that this is the start of a great community: https://github.com/juliaLang/julia/ Do hop on to our website, where we show some simple micro-benchmarks comparing julia to other languages: http://www.julialang.org/ Thanks, -viral From borya.043 at gmail.com Wed Mar 7 22:17:54 2012 From: borya.043 at gmail.com (borya043) Date: Wed, 7 Mar 2012 20:17:54 -0800 (PST) Subject: [LLVMdev] Alias analysis result In-Reply-To: References: Message-ID: <33462761.post@talk.nabble.com> That's the reason I have defined getAnalysisUsage method. Isn't that the right way to do it? borya043 wrote: > > Hello everyone, > I am trying to find the alias between a store instruction's pointer > operand > and function arguments. This is the code, > virtual void getAnalysisUsage(AnalysisUsage &AU) const { > > AU.addRequiredTransitive<AliasAnalysis>(); > AU.addPreserved<AliasAnalysis>(); > } > virtual bool runOnFunction(Function &F) { > > > AliasAnalysis &AA = getAnalysis<AliasAnalysis>(); > > for(Function::iterator i=F.begin();i!=F.end();++i){ > for(BasicBlock::iterator j=i->begin();j!=i->end();++j) > { > > > if(dyn_cast<StoreInst>(j)){ > > const StoreInst *SI=dyn_cast<StoreInst>(j); > > AliasAnalysis::Location LocA = > AA.getLocation(SI); > > > const Value *si_v= SI->getPointerOperand(); > > for(Function::arg_iterator k=F.arg_begin(); > k!=F.arg_end();++k) > { > > > Value *v=dyn_cast<Value>(k); > > AliasAnalysis::Location > loc=AliasAnalysis::Location(v); > AliasAnalysis::AliasResult ar=AA.alias(LocA,loc); > > switch(ar) > { > case 0:errs()<< "NoAlias\n"; > break; > ///< No dependencies. > case 1:errs()<<"MayAlias\n"; ///< Anything goes > break; > case 2: errs()<<"PartialAlias\n";///< Pointers > differ, but pointees overlap.
> break; > > case 3: errs()<<"MustAlias\n"; > } > > > > > > > } > } > > > > return true; > } > }; > } > > But I get MayAlias result even if the store instruction's pointer operand > is not referencing the function argument. Is there something wrong with > the logic? Are there any files in the LLVM source code that contain code > to > do something similar. > Thanks:) > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > -- View this message in context: http://old.nabble.com/Alias-analysis-result-tp33458451p33462761.html Sent from the LLVM - Dev mailing list archive at Nabble.com. From info at swort.eu Thu Mar 8 02:34:04 2012 From: info at swort.eu (derk jochems) Date: Thu, 8 Mar 2012 09:34:04 +0100 Subject: [LLVMdev] Question about the license of LLVM Message-ID: <5DC56B66-307D-4BED-AE76-30D783B197B5@swort.eu> Dear LLVM Team. (this is actually dedicated to commercial IDE's and Compilers using LLVM, which restrict some open-source or commercial usage) IDE (with compiler using LLVM) ==> Software (wants to create IDE with Compiler using LLVM) Agreements state that: "" To create any work that is to be distributed, sold, or provided to a product user that has a primary function of interpreting code using a work created by [Name Removed]. "Primary function" is defined to mean that the work is primarily directed to accepting, creating, or manipulating source code as a primary purpose or critical component of the work. Works that compile, verify, or interpret source code as part of a tangential function of the work or as a feature that assist in the primary function of the software are not prohibited by this section. "" The license of LLVM does not state anything about: - Restrictions (if any) which may not be given by the "using" company/open-source project. 
A restriction given, for example, in an End-User License Agreement for a compiler (possibly including an IDE), stating that its end users may not develop or deploy projects that do the same thing. This means that an IDE whose compiler uses LLVM can impose a restriction that I (as an end user of that IDE) may not create an IDE with a compiler using LLVM (for example). Will there be a way to ensure that these restrictions can't be laid upon the "End User" of that IDE? If LLVM had a license stating: "Using LLVM, software restrictions to the End Users of software that creates software are not allowed in any way" That would open the world to the end users of products made with LLVM (or using LLVM), which would bring back open-source projects held back by these kinds of licenses. It would be helpful for the LLVM license to start opening this up, so that better use is made of its resources. Thanks in advance! Kind Regards, Met vriendelijke groet, Derk Jochems - SWORT The Netherlands From liuqingrui422 at gmail.com Thu Mar 8 08:22:21 2012 From: liuqingrui422 at gmail.com (Qingrui Liu) Date: Thu, 8 Mar 2012 22:22:21 +0800 Subject: [LLVMdev] fix a "does not name a type" bug in VASTContext.h Message-ID: Hi all, I found a bug in ASTContext.h of the latest clang. I fixed it and have made a patch for it.
As follows: >From 447d31176b513a03b253eb25ef314c2a3c0e428a Mon Sep 17 00:00:00 2001 From: Tsingray Date: Thu, 8 Mar 2012 22:11:54 +0800 Subject: [PATCH] fix a 'does not name a type' bug in VASTContext.h --- include/clang/AST/ASTContext.h | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/include/clang/AST/ASTContext.h b/include/clang/AST/ASTContext.h index 3bdac2d..530f957 100644 --- a/include/clang/AST/ASTContext.h +++ b/include/clang/AST/ASTContext.h @@ -480,7 +480,7 @@ public: const FieldDecl *LastFD) const; // Access to the set of methods overridden by the given C++ method. - typedef CXXMethodVector::const_iterator overridden_cxx_method_iterator; + typedef CXXMethodVector::iterator overridden_cxx_method_iterator; overridden_cxx_method_iterator overridden_methods_begin(const CXXMethodDecl *Method) const; -- 1.7.0.4 -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120308/7ee1a1f8/attachment.html -------------- next part -------------- A non-text attachment was scrubbed... Name: 0001-fix-a-does-not-name-a-type-bug-in-VASTContext.h.patch Type: application/octet-stream Size: 934 bytes Desc: not available Url : http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120308/7ee1a1f8/attachment.obj From babslachem at gmail.com Thu Mar 8 10:19:37 2012 From: babslachem at gmail.com (Seb) Date: Thu, 8 Mar 2012 17:19:37 +0100 Subject: [LLVMdev] attribute for disabling fp elim Message-ID: Hi all, Is there a way to specify an attribute in a .ll file so that it will disable fp elim as if llc has been invoked with -disable-fp-elim on command line ? Thanks for your answers Best Regards Seb -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120308/1d4875f1/attachment.html From joerg at britannica.bec.de Thu Mar 8 10:31:42 2012 From: joerg at britannica.bec.de (Joerg Sonnenberger) Date: Thu, 8 Mar 2012 17:31:42 +0100 Subject: [LLVMdev] Question about the license of LLVM In-Reply-To: <5DC56B66-307D-4BED-AE76-30D783B197B5@swort.eu> References: <5DC56B66-307D-4BED-AE76-30D783B197B5@swort.eu> Message-ID: <20120308163142.GA22775@britannica.bec.de> On Thu, Mar 08, 2012 at 09:34:04AM +0100, derk jochems wrote: > Will there be a way so that these restrictions can't be laid upon the "End User" of that IDE? > If LLVM would have a license stating: "Using LLVM, software restrictions to the End Users of software that creates software are not allowed in any way" No. The license is BSD-like and not the GPL. If you want to create closed source code based on LLVM, you can. Consider Apple's Xcode for example. On the other hand, if you do create a closed source product based on LLVM and you don't push changes back upstream, you will have quite some maintenance overhead. LLVM is quite a fast moving target after all. Joerg From andrew at xmos.com Thu Mar 8 10:35:20 2012 From: andrew at xmos.com (Andrew Stanford-Jason) Date: Thu, 8 Mar 2012 16:35:20 +0000 Subject: [LLVMdev] MCInsrAnalysis extansion Message-ID: <4F58DFC8.3030709@xmos.com> Hello, I'm using MCInstrAnalysis and would like to extend it to have methods like: * bool mayWritePC(MCInstr * Instr); returns true if Instr might write to the PC, i.e. might change the program flow * uint64_t evaluateLoadAddress(MCInstr * Instr, uint64_t Addr, uint64_t Size); returns the address that Instr will load from, if it can be calculated Does anyone have any thoughts on this, or mind me doing it?
Thanks From baldrick at free.fr Thu Mar 8 10:52:51 2012 From: baldrick at free.fr (Duncan Sands) Date: Thu, 08 Mar 2012 17:52:51 +0100 Subject: [LLVMdev] attribute for disabling fp elim In-Reply-To: References: Message-ID: <4F58E3E3.90204@free.fr> Hi Seb, > Is there a way to specify an attribute in a .ll file so that it will disable fp > elim as if llc has been invoked with -disable-fp-elim on command line ? no. Ciao, Duncan. From baldrick at free.fr Thu Mar 8 10:55:40 2012 From: baldrick at free.fr (Duncan Sands) Date: Thu, 08 Mar 2012 17:55:40 +0100 Subject: [LLVMdev] Alias analysis result In-Reply-To: <33462761.post@talk.nabble.com> References: <33462761.post@talk.nabble.com> Message-ID: <4F58E48C.3000405@free.fr> Hi, > That's the reason I have defined getAnalysisUsage method. Isn't that the > right way to do it? no, that gives you access to whatever alias analysis has been computed, but it doesn't specify what kind of alias analysis should be computed (there are several). Try something like this: opt -load=my_pass.so -basic-aa -run_my_pass ... Ciao, Duncan. From stoklund at 2pi.dk Thu Mar 8 10:58:10 2012 From: stoklund at 2pi.dk (Jakob Stoklund Olesen) Date: Thu, 08 Mar 2012 08:58:10 -0800 Subject: [LLVMdev] A question about DBG_VALUE and Frame Index In-Reply-To: <000001ccfcc9$7b495010$71dbf030$@org> References: <000001ccfcc9$7b495010$71dbf030$@org> Message-ID: <914AACD8-F87E-4319-AB85-DB6B21123428@2pi.dk> On Mar 7, 2012, at 5:19 PM, Pranav Bhandarkar wrote: > Hi, > > I have a case that is causing me grief in the form of an assert. The prolog > Epilog inserter tries to remove Frame Index references. I have a DBG_VALUE > instruction that looks like this (alongwith the Frame Index). This is for > the Hexagon backend. > ************************** > fi#2: size=4, align=4, at location [SP-84] > DBG_VALUE , 0, !"fooBar"; line no:299 > ************************** > > Clearly, the FI in question is at an offset of -84 from the SP at entry to > the function i.e. FP - 84. 
So I remove the FI by changing the instruction > to. > ************************** > DBG_VALUE %R30, -84, !"fooBar"; line no:299 > ************************** > (R30 is the frame pointer register in Hexagon.) The offset field on a DBG_VALUE instruction refers to the user variable, not the first register argument. Your DBG_VALUE above is saying that fooBar[-84] can be found in %R30. You want something like: DBG_VALUE %R30, -84, 0, !"fooBar" That is a target-dependent DBG_VALUE, you will need to implement the target hooks to create and parse it. Target-dependent DBG_VALUE instrs are recognized by having more than 3 operands. /jakob From gavin.har at gmail.com Thu Mar 8 11:23:34 2012 From: gavin.har at gmail.com (Gavin Harrison) Date: Thu, 8 Mar 2012 12:23:34 -0500 Subject: [LLVMdev] -indvars issues? Message-ID: <85F219B4-294E-4C6E-AB22-5DDDAD1CDE1B@gmail.com> Hi, Is the -indvars pass functional? I've done some small test to check it, but this fails to canonicalize: > int *x; > int *y; > int i; > ... > for (i = 1; i < 100; i+=2) { > x[i] = y[i] + 3; > } The IR produced after -indvars: > br label %for.cond > > for.cond: ; preds = %for.inc, %entry > %indvars.iv = phi i64 [ %indvars.iv.next, %for.inc ], [ 1, %entry ] > %0 = trunc i64 %indvars.iv to i32 > %cmp = icmp slt i32 %0, 100 > br i1 %cmp, label %for.body, label %for.end > > for.body: ; preds = %for.cond > %arrayidx = getelementptr inbounds i32* %y, i64 %indvars.iv > %1 = load i32* %arrayidx, align 4 > %add = add nsw i32 %1, 3 > %arrayidx2 = getelementptr inbounds i32* %x, i64 %indvars.iv > store i32 %add, i32* %arrayidx2, align 4 > br label %for.inc > > for.inc: ; preds = %for.body > %indvars.iv.next = add i64 %indvars.iv, 2 > br label %for.cond > > for.end: ; preds = %for.cond Which isn't in canonical form. Is there some trick to getting this pass to work? I've tried adding various other passes ahead of it, like -aa-eval, -scalar-evolution, -mem2reg, -lcssa, -loop-simplify, etc but to no avail. 
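To make the question concrete, here is a hand-written C++ sketch of what "canonical form" is assumed to mean for the loop above: a single induction variable that starts at 0 and steps by 1, with the original index i recomputed from it. The function names are hypothetical, and this shows the source-level idea only, not what -indvars actually emits.

```cpp
// The loop as posted: i runs over 1, 3, 5, ..., 99.
inline void loop_as_written(int *x, const int *y) {
    for (int i = 1; i < 100; i += 2) {
        x[i] = y[i] + 3;
    }
}

// The same loop driven by a canonical induction variable j = 0, 1, ..., 49.
// The trip count is 50 because i = 2*j + 1 stays below 100 exactly for j < 50.
inline void loop_canonical(int *x, const int *y) {
    for (int j = 0; j < 50; ++j) {
        int i = 2 * j + 1;  // recover the original index from j
        x[i] = y[i] + 3;
    }
}
```

Both functions write the same 50 odd-indexed elements of x. In the IR quoted above, %indvars.iv still starts at 1 and steps by 2, which is why it does not match this shape.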
Thank you, Gavin -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120308/0aaeb898/attachment.html From ryta1203 at gmail.com Thu Mar 8 12:04:33 2012 From: ryta1203 at gmail.com (Ryan Taylor) Date: Thu, 8 Mar 2012 10:04:33 -0800 Subject: [LLVMdev] Updating value from PHI In-Reply-To: References: Message-ID: I guess I thought that once I redirected the branches and created new PHIs that LLVM would correct the variable usage when I return true (changed CFG) from the pass. Is this not the case? On Wed, Mar 7, 2012 at 4:08 PM, Ryan Taylor wrote: > Here is the code snippet that I am using to create the PHIs in the loop > according to the PHIs in the new preheader. At this point I have already > redirected the loop backedge and removed the preheader from the loop. > > for (BasicBlock::iterator II = loopHeaderBB->begin(); > (PN=dyn_cast(II)); ++II) { > // remove loop back PHI and add it to split BB > errs()<<*II<<"\n"; > PHINode *newPHIvalue = PHINode::Create(PN->getType(), 2, > PN->getName().str(), splitBB->getFirstInsertionPt()); > int IDX = PN->getBasicBlockIndex(splitBB); > while (IDX != -1) { > Value *oldValue = PN->getIncomingValue((unsigned(IDX))); > PN->removeIncomingValue(IDX, false); > newPHIvalue->addIncoming(oldValue, loopLatchBB); > newPHIvalue->addIncoming(PN, loopHeaderBB); > IDX = PN->getBasicBlockIndex(splitBB); > > } > } > > On Wed, Mar 7, 2012 at 4:04 PM, Ryan Taylor wrote: > >> I have attached a case of what I am trying to do, I'm pretty sure I'm >> just missing some simple API call. In the cfg you can see that although Im >> setting "lsr.iv441" as "lsr.iv44" from for.body.387.i it's not propagating >> that through the block or graph. >> >> >> On Wed, Mar 7, 2012 at 12:03 PM, Ryan Taylor wrote: >> >>> I am splitting a one BB loop into two BB. >>> >>> Basically, the one loop BB has 3 incoming values, one form back edge two >>> from other edges. 
I want to extract the PHIs from the other two edges out >>> into it's own BB and delete that from the loop, then redirect the backedge >>> to the loopbody (non extracted portion) and create a new PHI coming from >>> the extracted BB and the backedge. >>> >>> I can do this; however, the PHIs following in all the other BBs are not >>> getting updated, neither are the statements in the loopbody. >>> >>> What is the easieset way to propagate these changes downward? >>> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120308/2b1adad9/attachment.html From nicolas.capens at gmail.com Thu Mar 8 12:14:09 2012 From: nicolas.capens at gmail.com (Nicolas Capens) Date: Thu, 08 Mar 2012 13:14:09 -0500 Subject: [LLVMdev] Scalar replacement of arrays In-Reply-To: References: <4F57C973.4000805@gmail.com> Message-ID: <4F58F6F1.10401@gmail.com> Hi Eli, I was surprised to see that you're getting more optimal code. But I'm using the JIT so I realized there must be some extra optimization passes in -O2 which I wasn't using. It turns out that a second scalarrepl followed by ADCE did the trick for that small sample. Initially it didn't work for my larger project though. So I dug a little deeper and found out that scalarrepl uses a default threshold of 128 bytes for arrays. Increasing that to the size of the virtual processor's register array made it work as expected! That is, as long as the array isn't being dynamically indexed. In any case this is a little more promising than I thought since scalarrepl does handle arrays. However, I'm not sure if that's going to help achieve optimal code for when the array is sometimes being dynamically indexed. Essentially there should be some kind of store to load copy propagation. As far as I know that's exactly what mem2reg does, except that it only considers scalars and not elements of arrays. 
So would it be hard to extend mem2reg to also consider elements of arrays for promotion? It should obviously not perform the promotion when in between the store and load there's a dynamically indexed access to the array. Correct me if I'm wrong, but that seems it would be superior to scalarrepl itself (for arrays). Is there anyone experienced with mem2reg who wants to implement this? If not, any advice on how to best approach this? Thanks, Nicolas On 07/03/2012 4:00 PM, Eli Friedman wrote: > On Wed, Mar 7, 2012 at 12:47 PM, Nicolas Capens > wrote: >> Hi all, >> >> I'm implementing a virtual processor which features dynamic register >> indexing, and I'm struggling to make LLVM 3.0 produce good code for it. >> The register set is implemented as an LLVM array so it can be >> dynamically indexed using GEP. However, most of the time the virtual >> processor's registers are just statically indexed, and so I >> expected/hoped the code would be as optimal as when the virtual >> registers are implemented using individual scalars, which are allocated >> to the target machine's physical registers as much as possible. But that >> turns out not to be the case and I end up with code which constantly >> reads and writes memory to access my virtual registers. >> >> The "Scalar Replacement of Aggregates" pass (scalarrepl) seems to be >> capable of splitting structures into separate fields so that mem2reg can >> produce efficient code which avoids redundant memory operations. But it >> skips my array entirely. 
Here's a small piece of C code which >> illustrates the problem: >> >> int foo(int x, int y) >> { >> int r[2]; >> r[0] = x; >> r[1] = y; >> r[0] = r[0] + r[1]; >> return r[0]; >> } > clang -O2 for that C code gives: > > pushl %ebp > movl %esp, %ebp > movl 12(%ebp), %eax > addl 8(%ebp), %eax > popl %ebp > ret > > >> If I replace the array with two individual scalars, I get the following >> perfect result instead: >> >> mov eax,dword ptr [esp+8] >> add eax,dword ptr [esp+4] >> ret >> >> Unfortunately, I don't think that having scalarrepl handle arrays will >> do the trick. It will work for the above trivial example, but my array >> of registers does get indexed dynamically from time to time, and this >> would completely prevent scalarrepl from doing anything, right? > Yes; you wouldn't really want it to try. > >> Ideally LLVM should keep things in physical registers as long as >> possible, and when the virtual register array is being dynamically >> indexed it should write the physical registers back to the array... >> >> So does anyone know if this can already be achieved using some other >> passes or settings? If not, what would be the best approach to implement it? > Conceptually, we ought to be able to handle that sort of issue with a > combination of GVN and dead store elimination (DSE). Unfortunately, > LLVM's DSE pass is rather weak, so that approach might not be so > effective in practice. > > -Eli From canarbekmatay at yahoo.com Thu Mar 8 12:53:07 2012 From: canarbekmatay at yahoo.com (janarbek) Date: Thu, 8 Mar 2012 10:53:07 -0800 (PST) Subject: [LLVMdev] CDFG (Controil Data Flow Graph) with LLVM Message-ID: <1331232787.35801.YahooMailClassic@web110204.mail.gq1.yahoo.com> Hi All, I am wondering if anyone is working on generating a Control/Data Flow Graph (CDFG) from C code? Can anyone point me to where I should look in order to generate a CDFG? I am planning to develop one myself.
-------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120308/29cf0563/attachment.html From pranavb at codeaurora.org Thu Mar 8 14:08:02 2012 From: pranavb at codeaurora.org (Pranav Bhandarkar) Date: Thu, 8 Mar 2012 14:08:02 -0600 Subject: [LLVMdev] A question about DBG_VALUE and Frame Index In-Reply-To: <914AACD8-F87E-4319-AB85-DB6B21123428@2pi.dk> References: <000001ccfcc9$7b495010$71dbf030$@org> <914AACD8-F87E-4319-AB85-DB6B21123428@2pi.dk> Message-ID: <000601ccfd67$2ae8f230$80bad690$@org> > The offset field on a DBG_VALUE instruction refers to the user > variable, not the first register argument. > > Your DBG_VALUE above is saying that fooBar[-84] can be found in %R30. > > You want something like: > > DBG_VALUE %R30, -84, 0, !"fooBar" > > That is a target-dependent DBG_VALUE, you will need to implement the > target hooks to create and parse it. Target-dependent DBG_VALUE instrs > are recognized by having more than 3 operands. > > /jakob Right, thanks for the information Jakob. I'll dig through the source code / documentation, looking for the necessary target hooks and bother folks on this list, if I have questions regarding this. Thanks again, Pranav From vhscampos at gmail.com Thu Mar 8 14:59:38 2012 From: vhscampos at gmail.com (Victor Campos) Date: Thu, 8 Mar 2012 17:59:38 -0300 Subject: [LLVMdev] Updating value from PHI In-Reply-To: References: Message-ID: From my experience, this is not really the case. What you can do is call 'replaceUsesOfWith' on every user of the PHI'ed variable (lsr.iv44) that is inside the dominator tree of the new basic block (for.body.387.i.split). LLVM has a dominator tree analysis that you can use to do it. 2012/3/8 Ryan Taylor > I guess I thought that once I redirected the branches and created new PHIs > that LLVM would correct the variable usage when I return true (changed CFG) > from the pass. Is this not the case?
> > > On Wed, Mar 7, 2012 at 4:08 PM, Ryan Taylor wrote: > >> Here is the code snippet that I am using to create the PHIs in the loop >> according to the PHIs in the new preheader. At this point I have already >> redirected the loop backedge and removed the preheader from the loop. >> >> for (BasicBlock::iterator II = loopHeaderBB->begin(); >> (PN=dyn_cast<PHINode>(II)); ++II) { >> // remove loop back PHI and add it to split BB >> errs()<<*II<<"\n"; >> PHINode *newPHIvalue = PHINode::Create(PN->getType(), 2, >> PN->getName().str(), splitBB->getFirstInsertionPt()); >> int IDX = PN->getBasicBlockIndex(splitBB); >> while (IDX != -1) { >> Value *oldValue = PN->getIncomingValue((unsigned(IDX))); >> PN->removeIncomingValue(IDX, false); >> newPHIvalue->addIncoming(oldValue, loopLatchBB); >> newPHIvalue->addIncoming(PN, loopHeaderBB); >> IDX = PN->getBasicBlockIndex(splitBB); >> >> } >> } >> >> On Wed, Mar 7, 2012 at 4:04 PM, Ryan Taylor wrote: >> >>> I have attached a case of what I am trying to do, I'm pretty sure I'm >>> just missing some simple API call. In the cfg you can see that although I'm >>> setting "lsr.iv441" as "lsr.iv44" from for.body.387.i it's not propagating >>> that through the block or graph. >>> >>> >>> On Wed, Mar 7, 2012 at 12:03 PM, Ryan Taylor wrote: >>> >>>> I am splitting a one-BB loop into two BBs. >>>> >>>> Basically, the one loop BB has 3 incoming values, one from the back edge and >>>> two from other edges. I want to extract the PHIs from the other two edges >>>> out into their own BB and delete that from the loop, then redirect the >>>> backedge to the loopbody (non extracted portion) and create a new PHI >>>> coming from the extracted BB and the backedge. >>>> >>>> I can do this; however, the PHIs following in all the other BBs are not >>>> getting updated, neither are the statements in the loopbody. >>>> >>>> What is the easiest way to propagate these changes downward?
>>>> >>> >>> >> > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120308/953ea5f2/attachment.html From dekruijf at cs.wisc.edu Thu Mar 8 15:28:57 2012 From: dekruijf at cs.wisc.edu (Marc de Kruijf) Date: Thu, 8 Mar 2012 15:28:57 -0600 Subject: [LLVMdev] Updating value from PHI In-Reply-To: References: Message-ID: It sounds like Transforms/Utils/SSAUpdater may be what you are looking for. A good example of how to use it -- one that sounds very similar to what you're doing -- can be found in Transforms/Scalar/LoopRotation.cpp. On Wed, Mar 7, 2012 at 2:03 PM, Ryan Taylor wrote: > I am splitting a one-BB loop into two BBs. > > Basically, the one loop BB has 3 incoming values, one from the back edge and two > from other edges. I want to extract the PHIs from the other two edges out > into their own BB and delete that from the loop, then redirect the backedge > to the loopbody (non extracted portion) and create a new PHI coming from > the extracted BB and the backedge. > > I can do this; however, the PHIs following in all the other BBs are not > getting updated, neither are the statements in the loopbody. > > What is the easiest way to propagate these changes downward? > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120308/3532603d/attachment.html From jhw at conjury.org Thu Mar 8 16:17:26 2012 From: jhw at conjury.org (james woodyatt) Date: Thu, 8 Mar 2012 14:17:26 -0800 Subject: [LLVMdev] Tail Call Optimization In-Reply-To: References: Message-ID: On Feb 29, 2012, at 00:21 , David Rogers wrote: > Compiling 3.0 (but not 2.8 or earlier) I also had to run: > touch llvm-3.0.src/bindings/ocaml/llvm/Release/META.llvm > to pacify make install, since it tried to install metadata, but > didn't have any. I found the following patch made my life easier... diff -r d9c523bb68e7 -r 4052755f0c5b bindings/ocaml/llvm/Makefile --- a/bindings/ocaml/llvm/Makefile Wed Mar 07 15:25:01 2012 -0800 +++ b/bindings/ocaml/llvm/Makefile Wed Mar 07 21:38:56 2012 -0800 @@ -27,7 +27,7 @@ # Easy way of generating META in the objdir copy-meta: $(OcamlDir)/META.llvm -$(OcamlDir)/META.llvm: META.llvm +$(ObjDir)/META.llvm: META.llvm $(Verb) $(CP) -f $< $@ install-meta:: $(ObjDir)/META.llvm -- j h woodyatt From pranavb at codeaurora.org Thu Mar 8 16:29:40 2012 From: pranavb at codeaurora.org (Pranav Bhandarkar) Date: Thu, 8 Mar 2012 16:29:40 -0600 Subject: [LLVMdev] Dwarf info and .debug_pubnames section Message-ID: <000701ccfd7a$f43b5bb0$dcb21310$@org> Hi All, I just enabled the generation of dwarf debugging information for Hexagon. It did not require much save for the setting of a flag in MCAsmInfo. However, now I see that the ".debug_pubnames" section is not generated. I did read the discussion about the section not really being useful for debuggers in terms of accelerated access, but I have code that uses libdwarf to check for global variables. The particular libdwarf API queries the ".debug_pubnames" section. This code is no longer functional because my LLVM generated executables do not seem to have the .debug_pubnames section. From the discussion I did not gather that the section would be removed completely. Is that the case?
TIA, Pranav Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum. From pogo.work at gmail.com Thu Mar 8 16:59:57 2012 From: pogo.work at gmail.com (Paul Robinson) Date: Thu, 8 Mar 2012 14:59:57 -0800 Subject: [LLVMdev] Problem with x86 32-bit debug information ? In-Reply-To: References: <4f576184.04e7d80a.7225.194cSMTPIN_ADDED@mx.google.com> <4f576fad.a705b40a.44bd.ffffcc63SMTPIN_ADDED@mx.google.com> Message-ID: On Wed, Mar 7, 2012 at 6:50 AM, Seb wrote: > Hi James, > > I fully agree with you and understand your statement about -O2. > > Now some questions for you: > Did you try to reproduce the experiments described in my previous e-mail? > Did you look at the debug information generated for the 'n' parameter on x86 32-bit > & x86 64-bit? > I'm working on my own front-end for LLVM and I had difficulties with debug > information when it is related to x86 32-bit. So far there are two > options: > 1) metadata that I generate are incorrect. > 2) LLVM is not handling in a correct manner those metadata for x86 32-bit > target. > I've already posted a problem related to the metadata that I generate, and they are > in LLVM 2.9 format. I've been advised to move to the most recent format. Before > starting any move in that direction, I would like to be sure that LLVM > trunk could solve the problem. Using clang at -O2 -g is giving me some > indication that it won't solve my problem and that we are falling into > option (2). > So to summarize, it would be nice if someone could confirm that the debug > information generated in this specific case is correct for x86 32-bit and > that I would have to deal with that. > > Thanks > Best Regards > Seb > > 2012/3/7 James Molloy >> >> Hi Seb, >> >> >> >> I'm going to reiterate - Clang can decide when it wants to optimise away a >> variable. You asked for that behaviour when you specified -O2. You can't >> expect deterministically the same behaviour on both x86 and x86-64 platforms >> -
the procedure call standards are different and different decisions go in >> to deciding how to optimise. >> >> >> >> You can't expect debug information for an optimised build to fully track >> that of the source because by definition the source is being modified to >> optimise. >> >> >> >> Cheers, >> >> >> >> James >> >> >> >> From: Seb [mailto:babslachem at gmail.com] >> >> Sent: 07 March 2012 13:37 >> To: James Molloy >> Cc: llvmdev at cs.uiuc.edu >> Subject: Re: [LLVMdev] Problem with x86 32-bit debug information ? >> >> >> >> Hi James, >> >> clang is able to generate correct debug information for the 64-bit target at >> -O2. My feeling, given some other experiments I've done, is that debug >> information generated for x86 32-bit might be broken for parameters as long >> as they are not 'homed' in the code (local copy to an automatic variable). >> It seems that when llvm.declare is turned into a llvm.value for a parameter >> there is something incorrect with respect to the parameter's debug information >> that is generated by clang/llvm. I would just like confirmation of this. >> >> Thanks for your answer >> Best Regards >> Seb >> >> 2012/3/7 James Molloy >> >> Hi Seb, >> >> >> >> Clang cannot generate debug information for something that it has >> optimised away. You should reduce the optimisation level. >> >> >> >> In general debug information is only really accurate at -O0. >> >> >> >> Cheers, >> >> >> >> James >> >> >> >> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On >> Behalf Of Seb >> Sent: 07 March 2012 13:17 >> To: llvmdev at cs.uiuc.edu >> Subject: [LLVMdev] Problem with x86 32-bit debug information ? >> >> >> >> Hi all, >> >> I'm using the trunk version of LLVM/CLANG.
>> When I compile the attached files on my 64-bit Ubuntu 10.04 LTS system as >> follows: >> >> clang -O2 -g check.c main.c -o check64 >> >> When I run gdb check64, set a breakpoint on the check routine and >> execute to the breakpoint, I get: >> >> Breakpoint 1, check (result=0x601110, expect=0x601020, n=53) at check.c:7 >> 7    { >> >> As you can see I can inspect the 'n' value. >> >> Now if I compile for x86 32-bit as follows: >> >> clang -m32 -O2 -g check.c main.c -o check32 >> >> When I run gdb check32, set a breakpoint on the check routine and >> execute to the breakpoint, I get: >> >> Breakpoint 1, check (result=, >>     expect=, n=0) at check.c:7 >> 7    { >> >> As you can see I can NOT inspect the 'n' value. Is there a way to force clang, even >> at -O2, to generate debug information so that I can inspect the 'n' value? >> Or is it a BUG in clang for x86 32-bit? >> Thanks for your answers. >> Best Regards >> Seb >> >> > > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > I do have to take exception to James Molloy's assessment of the variable "n" as "optimized away" because the debug info clearly thought it wasn't. (The "n=0" shows that the debug info described some kind of location; if "n" was indeed optimized away, it should have said so. Either the debug info does have a bug--giving a location for a nonexistent variable--or something else is interfering with our expectations.) It is very easy for people to dismiss problems with optimized-code debugging with a "well, what did you expect??" kind of attitude. Actually there are three broad categories that can apply to any such issue. (a) The compiler is trying to keep the debug info in sync with the generated code, but got it wrong. This is a correctness bug.
(b) The compiler isn't bothering to keep the debug info in sync with the generated code, even though it reasonably could do so. This is a quality-of-implementation issue. (c) Optimization has made the situation too complicated for the debug info to reasonably keep track of things. This happens. There is some slop between the categories, because there are judgement calls involved in whether something is bad enough to be a bug or is more of a heuristic that isn't as good as we'd like; equally, there are judgement calls in what's a "reasonable" degree of effort to keep track of complicated cases. Blithely tossing every problem into the third category is inappropriate. I spent nontrivial time (on a previous compiler project) tracking down optimized-code debugging issues, and probably half of the time I could do something easy to address it. Sometimes the compiler was attaching the wrong source location (bug), sometimes it wasn't bothering to keep track at all even though it would be easy (quality of implementation). In a few cases, generating accurate debug info required extra analysis, and once or twice we went to that extra trouble; the rest of the time, it didn't seem worthwhile (judgement calls). Getting down to the specifics of this case, I downloaded the example programs and tried them as described. I got the same behavior as Seb described. As for our friend "n=0", after I hit the breakpoint I tried stepping once. At that point, "print n" showed "53" just as we would want. While it would be ideal to see "n=53" at the breakpoint, sometimes debug info and function prologs don't line up exactly, and stepping sometimes causes things to become more sensible. So, I think the 32-bit debug info is doing something reasonable, if not exactly what you would want. Pogo From tobias at grosser.es Thu Mar 8 17:23:09 2012 From: tobias at grosser.es (Tobias Grosser) Date: Fri, 09 Mar 2012 00:23:09 +0100 Subject: [LLVMdev] -indvars issues? 
In-Reply-To: <85F219B4-294E-4C6E-AB22-5DDDAD1CDE1B@gmail.com> References: <85F219B4-294E-4C6E-AB22-5DDDAD1CDE1B@gmail.com> Message-ID: <4F593F5D.9010805@grosser.es> On 03/08/2012 06:23 PM, Gavin Harrison wrote: > Hi, > > Is the -indvars pass functional? I've done some small test to check it, > but this fails to canonicalize: > >> int *x; >> int *y; >> int i; >> ... >> for (i = 1; i < 100; i+=2) { >> x[i] = y[i] + 3; >> } > > The IR produced after -indvars: > >> br label %for.cond >> >> for.cond: ; preds = %for.inc, %entry >> %indvars.iv = phi i64 [ %indvars.iv.next, %for.inc ], [ 1, %entry ] >> %0 = trunc i64 %indvars.iv to i32 >> %cmp = icmp slt i32 %0, 100 >> br i1 %cmp, label %for.body, label %for.end >> >> for.body: ; preds = %for.cond >> %arrayidx = getelementptr inbounds i32* %y, i64 %indvars.iv >> %1 = load i32* %arrayidx, align 4 >> %add = add nsw i32 %1, 3 >> %arrayidx2 = getelementptr inbounds i32* %x, i64 %indvars.iv >> store i32 %add, i32* %arrayidx2, align 4 >> br label %for.inc >> >> for.inc: ; preds = %for.body >> %indvars.iv.next = add i64 %indvars.iv, 2 >> br label %for.cond >> >> for.end: ; preds = %for.cond > > Which isn't in canonical form. Is there some trick to getting this pass > to work? I've tried adding various other passes ahead of it, like > -aa-eval, -scalar-evolution, -mem2reg, -lcssa, -loop-simplify, etc but > to no avail. -indvars does not canonicalize as much any more, as more passes can handle non canonical loops. To get the old canonicalization add -enable-iv-rewrite on the command line. Though I would not rely on this, as this flag is about to be removed. Cheers Tobi From joe.matarazzo at gmail.com Thu Mar 8 17:52:19 2012 From: joe.matarazzo at gmail.com (Joe Matarazzo) Date: Thu, 8 Mar 2012 15:52:19 -0800 Subject: [LLVMdev] Register coalescing Message-ID: Need some guidance about the right way to model this -- how would you model a backend with a handful of read-only physical registers that are passed as arguments to a function? 
I was emitting copyFromReg nodes in the LowerFormalArgument() routine, but then the register allocator and coalescer are resisting coalescing the COPY MI's for various reasons - for example, the read-only register class contains too few registers and the live range threshold cancels the coalescing. A simple example (post-ISEL): %vreg2 = COPY %C1; GPReg:%vreg2 ... %vreg11 = MUL %vreg7, %vreg2; GPreg:%vreg11,%vreg7,%vreg2 I'd want it to propagate %C1 into the MUL, replacing %vreg2. How is this supposed to work? Is there a DAG operation or MF pass that should handle this before regalloc, or some other means? Thanks, Joe -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120308/0f85703b/attachment.html From daniel at zuster.org Thu Mar 8 18:57:49 2012 From: daniel at zuster.org (Daniel Dunbar) Date: Thu, 8 Mar 2012 16:57:49 -0800 Subject: [LLVMdev] RFE: Rename LLVM_ATTRIBUTE_{READONLY, READNONE} to LLVM_{READONLY, READNONE} Message-ID: Hi all, I would like to rename LLVM_ATTRIBUTE_READONLY to LLVM_READONLY and LLVM_ATTRIBUTE_READNONE to LLVM_READNONE Any objections? - Daniel From stoklund at 2pi.dk Thu Mar 8 19:11:25 2012 From: stoklund at 2pi.dk (Jakob Stoklund Olesen) Date: Thu, 08 Mar 2012 17:11:25 -0800 Subject: [LLVMdev] Register coalescing In-Reply-To: References: Message-ID: On Mar 8, 2012, at 3:52 PM, Joe Matarazzo wrote: > Need some guidance about the right way to model this -- how would you model a backend with a handful of read-only physical registers that are passed as arguments to a function? I was emitting copyFromReg nodes in the LowerFormalArgument() routine, but then the register allocator and coalescer are resisting coalescing the COPY MI's for various reasons - for example, the read-only register class contains too few registers and the live range threshold cancels the coalescing. > > A simple example (post-ISEL): > > %vreg2 = COPY %C1; GPReg:%vreg2 > ... 
> %vreg11 = MUL %vreg7, %vreg2; GPreg:%vreg11,%vreg7,%vreg2 > > I'd want it to propagate %C1 into the MUL, replacing %vreg2. How is this supposed to work? Is there a DAG operation or MF pass that should handle this before regalloc, or some other means? You should model the live-in registers like other targets do, let MachineRegisterInfo::EmitLiveInCopies() produce the copies. Make sure the constant registers belong to the register class you are using (GPReg), otherwise coalescing is impossible. Mark the constant registers as reserved, and RegisterCoalescer should take care of the rest. /jakob From wendling at apple.com Thu Mar 8 20:31:23 2012 From: wendling at apple.com (Bill Wendling) Date: Thu, 08 Mar 2012 18:31:23 -0800 Subject: [LLVMdev] RFE: Rename LLVM_ATTRIBUTE_{READONLY, READNONE} to LLVM_{READONLY, READNONE} In-Reply-To: References: Message-ID: <83758F5D-01C0-473F-ABA3-C7CA8CA904B8@apple.com> What's the reason for renaming them? -bw On Mar 8, 2012, at 4:57 PM, Daniel Dunbar wrote: > Hi all, > > I would like to rename > LLVM_ATTRIBUTE_READONLY to LLVM_READONLY > and > LLVM_ATTRIBUTE_READNONE to LLVM_READNONE > > Any objections? > > - Daniel > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From clattner at apple.com Thu Mar 8 21:03:07 2012 From: clattner at apple.com (Chris Lattner) Date: Thu, 08 Mar 2012 19:03:07 -0800 Subject: [LLVMdev] RFE: Rename LLVM_ATTRIBUTE_{READONLY, READNONE} to LLVM_{READONLY, READNONE} In-Reply-To: <83758F5D-01C0-473F-ABA3-C7CA8CA904B8@apple.com> References: <83758F5D-01C0-473F-ABA3-C7CA8CA904B8@apple.com> Message-ID: <2BD90B2D-9597-49F7-A766-3C62D36367B8@apple.com> On Mar 8, 2012, at 6:31 PM, Bill Wendling wrote: > What's the reason for renaming them? 
I can think of two things: 1) it being an attribute is an implementation detail 2) it being long makes it annoying to use :) -Chris > > -bw > > On Mar 8, 2012, at 4:57 PM, Daniel Dunbar wrote: > >> Hi all, >> >> I would like to rename >> LLVM_ATTRIBUTE_READONLY to LLVM_READONLY >> and >> LLVM_ATTRIBUTE_READNONE to LLVM_READNONE >> >> Any objections? >> >> - Daniel >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From echristo at apple.com Thu Mar 8 21:39:20 2012 From: echristo at apple.com (Eric Christopher) Date: Thu, 08 Mar 2012 19:39:20 -0800 Subject: [LLVMdev] Dwarf info and .debug_pubnames section In-Reply-To: <000701ccfd7a$f43b5bb0$dcb21310$@org> References: <000701ccfd7a$f43b5bb0$dcb21310$@org> Message-ID: <74111B68-3F58-4C04-A448-A3436519D788@apple.com> On Mar 8, 2012, at 2:29 PM, Pranav Bhandarkar wrote: > Hi All, > > I just enabled the generation of dwarf debugging information for Hexagon. It > did not require much save for the setting of a flag in MCAsmInfo. > > However, now I see that the ".debug_pubnames" sections is not generated. I > did read discussion about the section not really being useful for debuggers > in terms of accelerated access, but I have code that uses libdwarf to check > for global variables. The particular libdwarf API queries the > ".debug_pubnames" This code is no longer functional because my LLVM > generated executables do not seem to have the .debug_pubnames sections. From > the discussion I did not gather that the sections would be removed > completely. Is that the case ? Nope, it's removed completely. No debugger that I know of uses it at all and it's useless for many reasons. 
Adding the code back in to generate it is possible (it's a fairly small commit that's easily reverted), but I see no reason to have it generated by default. What are you doing that involves looking for global variables in the pubnames section? -eric From daniel at zuster.org Thu Mar 8 22:08:05 2012 From: daniel at zuster.org (Daniel Dunbar) Date: Thu, 8 Mar 2012 20:08:05 -0800 Subject: [LLVMdev] RFE: Rename LLVM_ATTRIBUTE_{READONLY, READNONE} to LLVM_{READONLY, READNONE} In-Reply-To: <2BD90B2D-9597-49F7-A766-3C62D36367B8@apple.com> References: <83758F5D-01C0-473F-ABA3-C7CA8CA904B8@apple.com> <2BD90B2D-9597-49F7-A766-3C62D36367B8@apple.com> Message-ID: On Thu, Mar 8, 2012 at 7:03 PM, Chris Lattner wrote: > > On Mar 8, 2012, at 6:31 PM, Bill Wendling wrote: > >> What's the reason for renaming them? > > I can think of two things: > > 1) it being an attribute is an implementation detail > 2) it being long makes it annoying to use :) Exactly. - Daniel > -Chris > >> >> -bw >> >> On Mar 8, 2012, at 4:57 PM, Daniel Dunbar wrote: >> >>> Hi all, >>> >>> I would like to rename >>> LLVM_ATTRIBUTE_READONLY to LLVM_READONLY >>> and >>> LLVM_ATTRIBUTE_READNONE to LLVM_READNONE >>> >>> Any objections? >>> >>> - Daniel >>> _______________________________________________ >>> LLVM Developers mailing list >>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >> >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu
http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > From pranavb at codeaurora.org Thu Mar 8 22:37:22 2012 From: pranavb at codeaurora.org (Pranav Bhandarkar) Date: Thu, 8 Mar 2012 22:37:22 -0600 Subject: [LLVMdev] Dwarf info and .debug_pubnames section In-Reply-To: <74111B68-3F58-4C04-A448-A3436519D788@apple.com> References: <000701ccfd7a$f43b5bb0$dcb21310$@org> <74111B68-3F58-4C04-A448-A3436519D788@apple.com> Message-ID: <000e01ccfdae$52425f00$f6c71d00$@org> Hi Eric, > Nope, it's removed completely. No debugger that I know of uses it at > all and it's > useless for many reasons. Adding the code back in to generate it is > possible (it's > a fairly small commit that's easily reverted), but I see no reason to > have it generated > by default. What are you doing that involves looking for global > variables in the pubnames > section? > Thanks for the information. After I sent out the email, I saw the patch you reverted on viewVC. I have some proprietary code that looks at the debug info in an executable using libdwarf; It queries the debug info on demand with a variable name and uses dwarf_get_globals which, I believe, uses .debug_pubnames. I believe you made only one change to take out support for this section, right ? I will revert that change and give it a go. Thanks, Pranav Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum. From echristo at apple.com Fri Mar 9 00:09:58 2012 From: echristo at apple.com (Eric Christopher) Date: Thu, 08 Mar 2012 22:09:58 -0800 Subject: [LLVMdev] Dwarf info and .debug_pubnames section In-Reply-To: <000e01ccfdae$52425f00$f6c71d00$@org> References: <000701ccfd7a$f43b5bb0$dcb21310$@org> <74111B68-3F58-4C04-A448-A3436519D788@apple.com> <000e01ccfdae$52425f00$f6c71d00$@org> Message-ID: <131872BC-8596-43C2-8C0D-D01AA5928518@apple.com> On Mar 8, 2012, at 8:37 PM, Pranav Bhandarkar wrote: > Hi Eric, > >> Nope, it's removed completely. 
No debugger that I know of uses it at >> all and it's >> useless for many reasons. Adding the code back in to generate it is >> possible (it's >> a fairly small commit that's easily reverted), but I see no reason to >> have it generated >> by default. What are you doing that involves looking for global >> variables in the pubnames >> section? >> > > Thanks for the information. After I sent out the email, I saw the patch you > reverted on viewVC. I have some proprietary code that looks at the debug > info in an executable using libdwarf; It queries the debug info on demand > with a variable name and uses dwarf_get_globals which, I believe, uses > .debug_pubnames. > That's odd and likely problematic. The code would be better off iterating through anything with a DW_TAG_variable from whichever DIE you want to iterate and looking for DW_AT_external. Slightly longer, less likely to be buggy. (However, probably the only possible use of the pubnames section that isn't totally useless) > I believe you made only one change to take out support for this section, > right ? I will revert that change and give it a go. Correct. Re-enabling the pubnames section will increase the size of your dwarf debug info quite significantly since it requires a full copy of every string even if you're using DW_FORM_strp for strings in the rest of the debug info. -eric From chenwj at iis.sinica.edu.tw Fri Mar 9 00:52:07 2012 From: chenwj at iis.sinica.edu.tw (陳韋任) Date: Fri, 9 Mar 2012 14:52:07 +0800 Subject: [LLVMdev] Introducing julia, and gauging interest in a julia BOF session at the upcoming LLVM conference in London In-Reply-To: <23793385-DB60-4E87-94C1-3045266E19FD@mayin.org> References: <23793385-DB60-4E87-94C1-3045266E19FD@mayin.org> Message-ID: <20120309065207.GA34998@cs.nctu.edu.tw> Hi Viral, I skimmed through the article about why you created Julia [1]; a very ambitious goal, I must say. :) Anyway, I noticed there is a link to a Chinese translation on [1].
Actually, it's a Simplified Chinese translation. I have written a Traditional Chinese translation for Julia. Would you mind putting the link [2] on the page for me? Thanks! :) Regards, chenwj [1] http://julialang.org/blog/2012/02/why-we-created-julia/ [2] http://www.hellogcc.org/archives/666 -- Wei-Ren Chen (陳韋任) Computer Systems Lab, Institute of Information Science, Academia Sinica, Taiwan (R.O.C.) Tel:886-2-2788-3799 #1667 Homepage: http://people.cs.nctu.edu.tw/~chenwj From james.molloy at arm.com Fri Mar 9 02:10:24 2012 From: james.molloy at arm.com (James Molloy) Date: Fri, 9 Mar 2012 08:10:24 -0000 Subject: [LLVMdev] Problem with x86 32-bit debug information ? In-Reply-To: References: <4f576184.04e7d80a.7225.194cSMTPIN_ADDED@mx.google.com> <4f576fad.a705b40a.44bd.ffffcc63SMTPIN_ADDED@mx.google.com> Message-ID: <008101ccfdcc$149422b0$3dbc6810$@molloy@arm.com> Hi, > I do have to take exception to James Molloy's assessment of the variable "n" > as "optimized away" because the debug info clearly thought it wasn't. Mea culpa here, I misread the original email and saw "" for the wrong parameter. Then didn't recheck the original email when follow-ups happened. I apologise. Cheers, James -----Original Message----- From: Paul Robinson [mailto:pogo.work at gmail.com] Sent: 08 March 2012 23:00 To: Seb Cc: llvmdev at cs.uiuc.edu; James Molloy Subject: Re: [LLVMdev] Problem with x86 32-bit debug information ? On Wed, Mar 7, 2012 at 6:50 AM, Seb wrote: > Hi James, > > I fully agree with you and understand your statement about -O2. > > Now some questions for you: > Did you try to reproduce the experiments described in my previous e-mail? > Did you look at the debug information generated for the 'n' parameter on x86 32-bit > & x86 64-bit? > I'm working on my own front-end for LLVM and I had difficulties with debug > information when it is related to x86 32-bit. So far there are two > options: > 1) metadata that I generate are incorrect.
> 2) LLVM is not handling in a correct manner those metadata for x86 32-bit > target. > I've already posted problem related to metadata that I generate and they are > in LLVM 2.9 format. I've been advised to move to most recent format. Before > starting any move into that direction, I would like to be sure that LLVM > trunk could solve the problem. Using clang at -O2 -g is giving me some > indication that it won't solve my problem and that we are falling into > option (2). > So to summarize, it would be nice if someone can confirm that debug > informations generated on this specific case are correct for x86 32-bit and > that I would have to deal with that. > > Thanks > Best Regards > Seb > > 2012/3/7 James Molloy >> >> Hi Seb, >> >> >> >> I'm going to reiterate - Clang can decide when it wants to optimise away a >> variable. You asked for that behaviour when you specified -O2. You can't >> expect deterministically the same behaviour on both x86 and x86-64 platforms >> - the procedure call standards are different and different decisions go in >> to deciding how to optimise. >> >> >> >> You can't expect debug information for an optimised build to fully track >> that of the source because by definition the source is being modified to >> optimise. >> >> >> >> Cheers, >> >> >> >> James >> >> >> >> From: Seb [mailto:babslachem at gmail.com] >> >> Sent: 07 March 2012 13:37 >> To: James Molloy >> Cc: llvmdev at cs.uiuc.edu >> Subject: Re: [LLVMdev] Problem with x86 32-bit debug information ? >> >> >> >> Hi James, >> >> clang is able to generate correct debug informations for 64-bit target at >> -O2. My feeling, given some other experiments I've done, is that debug >> information generated for x86 32-bit might be broken for parameters as long >> as they are not 'homed' in the code (local copy to an automatic variable).
>> It seems that when llvm.declare is turned into a llvm.value for parameter >> there is something incorrect with respect to parameters debug informations >> that is generated by clang/llvm. I just would like confirmation of this. >> >> Thanks for your answer >> Best Regards >> Seb >> >> 2012/3/7 James Molloy >> >> Hi Seb, >> >> >> >> Clang cannot generate debug information for something that it has >> optimised away. You should reduce the optimisation level. >> >> >> >> In general debug information is only really accurate at -O0. >> >> >> >> Cheers, >> >> >> >> James >> >> >> >> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On >> Behalf Of Seb >> Sent: 07 March 2012 13:17 >> To: llvmdev at cs.uiuc.edu >> Subject: [LLVMdev] Problem with x86 32-bit debug information ? >> >> >> >> Hi all, >> >> I'm using trunk version of LLVM/CLANG. >> When I compile attached files on my 64-bit Ubuntu 10.04 LTS system as >> follows: >> >> clang -O2 -g check.c main.c -o check64 >> >> When I do gdb check64 and set a breakpoint to the check routine and >> executes to the breakpoint, I've got: >> >> Breakpoint 1, check (result=0x601110, expect=0x601020, n=53) at check.c:7 >> 7 { >> >> As you can see I can inspect 'n' value. >> >> Now if I compile for x86 32-bit as follows: >> >> clang -m32 -O2 -g check.c main.c -o check32 >> >> When I do gdb check32 and set a breakpoint to the check routine and >> executes to the breakpoint, I've got: >> >> Breakpoint 1, check (result=, >> expect=, n=0) at check.c:7 >> 7 { >> >> As you can see I can NOT inspect 'n' value. Is there a way to enforce even >> at -O2 clang to generate debug informations so that I can inspect 'n' value >> ? >> Or is it a BUG from clang for x86 32-bit ? >> Thanks for your answers. >> Best Regards >> Seb >> >> > > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu
http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > I do have to take exception to James Molloy's assessment of the variable "n" as "optimized away" because the debug info clearly thought it wasn't. (The "n=0" shows that the debug info described some kind of location; if "n" was indeed optimized away, it should have said so. Either the debug info does have a bug--giving a location for a nonexistent variable--or something else is interfering with our expectations.) It is very easy for people to dismiss problems with optimized-code debugging with a "well, what did you expect??" kind of attitude. Actually there are three broad categories that can apply to any such issue. (a) The compiler is trying to keep the debug info in sync with the generated code, but got it wrong. This is a correctness bug. (b) The compiler isn't bothering to keep the debug info in sync with the generated code, even though it reasonably could do so. This is a quality-of-implementation issue. (c) Optimization has made the situation too complicated for the debug info to reasonably keep track of things. This happens. There is some slop between the categories, because there are judgement calls involved in whether something is bad enough to be a bug or is more of a heuristic that isn't as good as we'd like; equally, there are judgement calls in what's a "reasonable" degree of effort to keep track of complicated cases. Blithely tossing every problem into the third category is inappropriate. I spent nontrivial time (on a previous compiler project) tracking down optimized-code debugging issues, and probably half of the time I could do something easy to address it. Sometimes the compiler was attaching the wrong source location (bug), sometimes it wasn't bothering to keep track at all even though it would be easy (quality of implementation). 
In a few cases, generating accurate debug info required extra analysis, and once or twice we went to that extra trouble; the rest of the time, it didn't seem worthwhile (judgement calls). Getting down to the specifics of this case, I downloaded the example programs and tried them as described. I got the same behavior as Seb described. As for our friend "n=0", after I hit the breakpoint I tried stepping once. At that point, "print n" showed "53" just as we would want. While it would be ideal to see "n=53" at the breakpoint, sometimes debug info and function prologs don't line up exactly, and stepping sometimes causes things to become more sensible. So, I think the 32-bit debug info is doing something reasonable, if not exactly what you would want. Pogo From tobias at grosser.es Fri Mar 9 02:34:26 2012 From: tobias at grosser.es (Tobias Grosser) Date: Fri, 09 Mar 2012 09:34:26 +0100 Subject: [LLVMdev] Introducing julia, and gauging interest in a julia BOF session at the upcoming LLVM conference in London In-Reply-To: <23793385-DB60-4E87-94C1-3045266E19FD@mayin.org> References: <23793385-DB60-4E87-94C1-3045266E19FD@mayin.org> Message-ID: <4F59C092.2000809@grosser.es> On 03/08/2012 02:04 PM, Viral Shah wrote: > Folks, > > We are contemplating holding a Birds of a Feather session titled "Julia and LLVM: Implementing a fast dynamic language for technical computing" at the LLVM 2012 European Conference on April 12-13 in London. > > http://llvm.org/devmtg/2012-04-12/ > > Would this be of interest to the LLVM developer and user community? It would be great if you could drop me a line. It will help us gauge the interest and decide if we should hold the session or not. Sure. Sounds interesting to me. Tobi From babslachem at gmail.com Fri Mar 9 02:47:30 2012 From: babslachem at gmail.com (Seb) Date: Fri, 9 Mar 2012 09:47:30 +0100 Subject: [LLVMdev] Problem with x86 32-bit debug information ? 
In-Reply-To: <4f59bad6.8c3ad80a.48ba.3ae2SMTPIN_ADDED@mx.google.com> References: <4f576184.04e7d80a.7225.194cSMTPIN_ADDED@mx.google.com> <4f576fad.a705b40a.44bd.ffffcc63SMTPIN_ADDED@mx.google.com> <4f59bad6.8c3ad80a.48ba.3ae2SMTPIN_ADDED@mx.google.com> Message-ID: Hi Pogo & James, Pogo, that is exactly the kind of answer I was expecting. Thanks for the time you spent on this problem. I also did some experiments myself and found a way to get what I'm expecting, but I think that, at least for x86 or for any parameter passed on the stack on a different architecture, the way LLVM handles debug information might be a problem. So here was the situation: My front-end at -O0 generates direct access to parameters and I used llvm.dbg.value to associate metadata and I couldn't get llc to generate fp related debug info that would have made 'n' value available at first breakpoint. I eventually found, looking at what's generated by clang at -O0, that I should first 'home' the parameter (make a local copy to a variable) then use llc -disable-fp-elim so that 'n' value can be inspected. My feeling is that LLVM debug info generation is based on the LLVM code style emitted by clang and might not cope when code is emitted by a different front-end. James, no need to apologize, we all make mistakes, including myself. Best Regards Seb -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120309/8e5e8bad/attachment.html From jobnoorman at gmail.com Fri Mar 9 04:52:46 2012 From: jobnoorman at gmail.com (Job Noorman) Date: Fri, 9 Mar 2012 11:52:46 +0100 Subject: [LLVMdev] Stack protector performance Message-ID: I have a question about the performance of the implementation of the stack protector in LLVM. Consider the following C program:
=====
void canary()
{
    char buf[20];
    buf[0]++;
}

int main()
{
    int i;
    for (i = 0; i < 1000000000; ++i)
        canary();
    return 0;
}
=====
This should definitely run slower when stack protection is enabled, right? I have measured the runtime of this program on two different systems compiled with GCC and LLVM. Here are the results (percentages are the difference with the unprotected version of the program):

     | Desktop | Laptop |
-----+---------+--------+
GCC  |    +13% |  +277% |
LLVM |  -3%(!) |  +330% |

(These measurements are the median values of 10 runs.) So the obvious question is: can anybody explain how it is possible that using the stack protector causes the program to run 3% faster on my desktop? I have tried profiling the program using valgrind (cachegrind & callgrind) but the results show absolutely no reason at all for these measurements. I have attached an archive with the source code and compiled binaries. Here are the specs of the two systems:
* Desktop
  - Ubuntu 11.10
  - Linux 3.0.0-16-generic-pae
  - Intel(R) Core(TM)2 Duo CPU E4500 @ 2.20GHz (2048K cache)
* Laptop
  - Ubuntu 11.10
  - Linux 3.0.0-16-generic
  - Intel(R) Atom(TM) CPU N450 @ 1.66GHz (512K cache)
Kind regards, Job -------------- next part -------------- A non-text attachment was scrubbed...
Name: canary.tgz Type: application/x-gzip Size: 4321 bytes Desc: not available Url : http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120309/22de8da6/attachment.tgz From james.molloy at arm.com Fri Mar 9 06:09:06 2012 From: james.molloy at arm.com (James Molloy) Date: Fri, 9 Mar 2012 12:09:06 -0000 Subject: [LLVMdev] Euro-LLVM 2012 - BoF and lightning talk deadline Message-ID: <008a01ccfded$6d03e4f0$470baed0$@molloy@arm.com> Hi, [Apologies to all you US-types for the spam!] The deadline for BoF proposals has now been set as two weeks from now: Thursday 22nd March 2012, 12:00 BST. We currently have no BoF proposals, so please do send some in! :) We also have space for lightning talks, and will run a lightning talk session if there is enough interest. Please send lightning talk proposals to the same address as usual, Euro-LLVM at arm.com, for the same deadline as above. Also please remember we have subsidised rooms available at the hotel - this has cost us money in deposits so please take advantage of them! https://hotelres.vbookings.co.uk/b/armeullvm/ Cheers! James -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120309/42609737/attachment.html From geovanisouza92 at gmail.com Fri Mar 9 08:14:50 2012 From: geovanisouza92 at gmail.com (geovanisouza92 at gmail.com) Date: Fri, 9 Mar 2012 11:14:50 -0300 Subject: [LLVMdev] How to avoid include the same source-file more than once? Message-ID: How can I avoid including the same source file more than once, or different files with the same content, when all files will be merged into a single binary file? Let me clarify: in my programming language project, two classes with the same name will be merged rather than generating a semantic error or the like. It isn't a bug, it's a feature. :o) E.g. ## file1.arc include "base.arc" ## file2.arc include "base.arc" I don't want to use a C/C++-style include guard or similar.
I think about an list of complete path of each file inside the compiler, or an checksum, generated over the bytes of file, like git index does. -- @geovanisouza92 - Geovani de Souza -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120309/28e629aa/attachment.html From baldrick at free.fr Fri Mar 9 08:42:50 2012 From: baldrick at free.fr (Duncan Sands) Date: Fri, 09 Mar 2012 15:42:50 +0100 Subject: [LLVMdev] How to avoid include the same source-file more than once? In-Reply-To: References: Message-ID: <4F5A16EA.6070303@free.fr> Hi, On 09/03/12 15:14, geovanisouza92 at gmail.com wrote: > How can I avoid include the same source-file more than once or the different > files with the same content, when all files will be merged in only one binary file? > > Let me clear this: In my programming language project, two classes with the same > names will be merged, not generating a semantic error or whatever. Isn't a bug, > is a feature. :o) it sounds like you are looking for weak linkage (there are several kinds of weak linkage, see http://llvm.org/docs/LangRef.html#linkage). Ciao, Duncan. > > E.g. > > ## file1.arc > include "base.arc" > > ## file2.arc > include "base.arc" > > I don't want use C/C++-like style guard or similar. > I think about an list of complete path of each file inside the compiler, or an > checksum, generated over the bytes of file, like git index does. > > -- > @geovanisouza92 - Geovani de Souza > > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From rkotler at mips.com Fri Mar 9 09:35:30 2012 From: rkotler at mips.com (Reed Kotler) Date: Fri, 9 Mar 2012 07:35:30 -0800 Subject: [LLVMdev] complete llvm ports Message-ID: <4F5A2342.2030004@mips.com> There used to be a list of all the llvm ports and the status. 
The x86 was the only compiler that was a "full port". We are preparing to add our native linux compiler to the official build bots. Are there various official "gating" criteria for different levels of llvm "doneness" so to speak? There is a matrix I see in http://llvm.org/releases/3.0/docs/CodeGenerator.html#targetfeatures which seems to be old. For example, I would definitely consider the MIPS port to be reliable and other things like .o writing are definitely in there. The various current llvm build bots seem to do different levels of testing. From kcc at google.com Fri Mar 9 09:39:23 2012 From: kcc at google.com (Kostya Serebryany) Date: Fri, 9 Mar 2012 07:39:23 -0800 Subject: [LLVMdev] Stack protector performance In-Reply-To: References: Message-ID: What optimization level are you using? -O0 is not interesting, and at -O1 the optimizer nukes all the code. In your example, the stack variable and the stack accesses are optimized away:

% ./build/Release+Asserts/bin/clang -O1 -S -emit-llvm -o - stack.c
define void @canary() nounwind uwtable readnone {
entry:
  ret void
}

define i32 @main() nounwind uwtable readnone {
for.end:
  ret i32 0
}

You need to prepare a more optimizer-resistant benchmark. --kcc On Fri, Mar 9, 2012 at 2:52 AM, Job Noorman wrote: > I have a question about the performance of the implementation of the stack > protector in LLVM. > > Consider the following C program: > ===== > void canary() > { > char buf[20]; > buf[0]++; > } > > int main() > { > int i; > for (i = 0; i < 1000000000; ++i) > canary(); > return 0; > } > ===== > > This should definitely run slower when stack protection is enabled, right? > > I have measured the runtime of this program on two different systems > compiled > with GCC and LLVM. Here are the results (percentages are the difference > with > the unprotected version of the program): > > | Desktop | Laptop | > -----+---------+--------+ > GCC | +13% | +277% | > LLVM | -3%(!)
| +330% | > > (These measurements are the median values of 10 runs.) > > So the obvious question is: can anybody explain how it is possible that > using > the stack protector causes the program to run 3% faster on my desktop? > > I have tried profiling the program using valgrind (cachegrind & callgrind) > but > the results show absolutely no reason at all for these measurements. > > I have attached an archive with the source code and compiled binaries. > > Here are the specs of the two systems: > * Desktop > - Ubuntu 11.10 > - Linux 3.0.0-16-generic-pae > - Intel(R) Core(TM)2 Duo CPU E4500 @ 2.20GHz (2048K cache) > * Laptop > - Ubuntu 11.10 > - Linux 3.0.0-16-generic > - Intel(R) Atom(TM) CPU N450 @ 1.66GHz (512K cache) > > Kind regards, > Job > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120309/fd7103c1/attachment.html From criswell at illinois.edu Fri Mar 9 09:47:09 2012 From: criswell at illinois.edu (John Criswell) Date: Fri, 9 Mar 2012 09:47:09 -0600 Subject: [LLVMdev] Introducing julia, and gauging interest in a julia BOF session at the upcoming LLVM conference in London In-Reply-To: <23793385-DB60-4E87-94C1-3045266E19FD@mayin.org> References: <23793385-DB60-4E87-94C1-3045266E19FD@mayin.org> Message-ID: <4F5A25FD.5040108@illinois.edu> Dear Viral, Would you like Julia listed on the LLVM User's page (http://llvm.org/Users.html)? If so, please email me an appropriate description, and I'll add it. On a related note, if someone asked for an entry to be added and it hasn't yet, please feel free to remind me. :) -- John T. 
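On the stack-protector benchmark from the thread above: Kostya Serebryany's reply points out that at -O1 the buffer and its accesses are optimized away, so the loop ends up measuring nothing. A minimal sketch of a more optimizer-resistant variant, written here in C++, might look like the following. The names `BENCH_NOINLINE`, `canary(unsigned)` and `run` are made up for illustration and are not from the original attachment. The sketch assumes a GCC- or Clang-compatible compiler for the `noinline` attribute, and whether a canary is actually emitted still depends on the stack-protector flags in use, so the generated assembly should be inspected before trusting any timing.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical macro: ask GCC/Clang not to inline the function, so the
// call (and any canary check inside it) stays in the loop.
#if defined(__GNUC__)
#define BENCH_NOINLINE __attribute__((noinline))
#else
#define BENCH_NOINLINE
#endif

// Accesses to a volatile object are observable behaviour, so the compiler
// may not delete the buffer or the loads and stores that touch it.
BENCH_NOINLINE unsigned canary(unsigned seed) {
  volatile char buf[20];
  for (std::size_t i = 0; i < sizeof buf; ++i)
    buf[i] = static_cast<char>(seed + i);
  // Return a value that depends on the buffer contents.
  return static_cast<unsigned char>(buf[seed % sizeof buf]);
}

// Accumulate the results so that every call contributes to the output.
unsigned run(unsigned iterations) {
  unsigned sum = 0;
  for (unsigned i = 0; i < iterations; ++i)
    sum += canary(i);
  return sum;
}
```

Calling `run` from `main` with a large iteration count and printing the returned sum keeps the whole computation live, which is the property the original loop lacked.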
From preston.briggs at gmail.com Fri Mar 9 11:34:14 2012 From: preston.briggs at gmail.com (Preston Briggs) Date: Fri, 9 Mar 2012 09:34:14 -0800 Subject: [LLVMdev] Scalar replacement of arrays Message-ID: Nicolas Capens wrote: > [...] > I'm not sure if that's going to help achieve optimal code > for when the array is sometimes being dynamically indexed. > Essentially there should be some kind of store to load copy > propagation. As far as I know that's exactly what mem2reg > does, except that it only considers scalars and not elements > of arrays. > > So would it be hard to extend mem2reg to also consider elements > of arrays for promotion? It should obviously not perform the promotion > when in between the store and load there's a dynamically indexed > access to the array. Correct me if I'm wrong, but that seems it would > be superior to scalarrepl itself (for arrays). > > Is there anyone experienced with mem2reg who wants to implement this? > If not, any advice on how to best approach this? Classically, we use dependence analysis to support such optimizations. For example, see Chapter 8 in Allen & Kennedy's book, "Optimizing Compilers for Modern Architectures." Preston From baldrick at free.fr Fri Mar 9 12:41:44 2012 From: baldrick at free.fr (Duncan Sands) Date: Fri, 09 Mar 2012 19:41:44 +0100 Subject: [LLVMdev] LLVM Value Tracking Analysis In-Reply-To: References: <4F529A93.5040109@free.fr> Message-ID: <4F5A4EE8.2090505@free.fr> Hi Xin, On 04/03/12 00:57, Xin Tong wrote: > On Sat, Mar 3, 2012 at 5:26 PM, Duncan Sands wrote: >> Hi Xin, >> >>> It seems to me that LLVM does not do too much on value range analysis. >>> i.e. what are the value constraints on a variable at a given point in >>> the program. The closest thing i can find is the ValueTracking API, >>> which can do some simple analysis on the value of a variables. Am I >>> missing something/Is there a plan on the implementation of a more >>> powerful value range analysis ? 
>> >> as far as I know there have been two implementations of this kind of thing >> in the past, but they were each removed in turn. IIRC, this was due to them >> significantly increasing compilation time without a sufficient improvement in >> the quality of code to justify the compile time cost. Currently the closest >> thing is the correlated value propagation pass, but I doubt it will be useful >> for >> you. > > The correlated value propagation pass is what is currently in the > ValueTrack.cpp file ? no, it is in lib/Transforms/Scalar/CorrelatedValuePropagation.cpp. But the analysis it uses, that you might want to use, is LazyValueInfo, see include/llvm/Analysis/LazyValueInfo.h do you know when the two implementations are > removed ? and where i can get them ? and how difficult is it to bring > them up to the current src tree. The original was called predsimplify and was removed in LLVM 2.7. The other was based on the ABCD algorithm and SSI and was removed in 2.8. Ciao, Duncan. From ryta1203 at gmail.com Fri Mar 9 13:18:21 2012 From: ryta1203 at gmail.com (Ryan Taylor) Date: Fri, 9 Mar 2012 11:18:21 -0800 Subject: [LLVMdev] Updating value from PHI In-Reply-To: References: Message-ID: I think the SSAupdater might do the trick. The issue with replaceUses is that it would work great for the current block but what about the PHIs in continuing blocks, think that would be a problem. Thanks. On Thu, Mar 8, 2012 at 1:28 PM, Marc de Kruijf wrote: > It sounds like Transforms/Utils/SSAUpdater may be what you are looking > for. > A good example of how to use it -- one that sounds very similar to what > you're doing -- can be found in Transforms/Scalar/LoopRotation.cpp > > On Wed, Mar 7, 2012 at 2:03 PM, Ryan Taylor wrote: > >> I am splitting a one BB loop into two BB. >> >> Basically, the one loop BB has 3 incoming values, one form back edge two >> from other edges. 
I want to extract the PHIs from the other two edges out >> into it's own BB and delete that from the loop, then redirect the backedge >> to the loopbody (non extracted portion) and create a new PHI coming from >> the extracted BB and the backedge. >> >> I can do this; however, the PHIs following in all the other BBs are not >> getting updated, neither are the statements in the loopbody. >> >> What is the easieset way to propagate these changes downward? >> >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120309/019a3109/attachment.html From wendling at apple.com Fri Mar 9 14:21:36 2012 From: wendling at apple.com (Bill Wendling) Date: Fri, 09 Mar 2012 12:21:36 -0800 Subject: [LLVMdev] RFE: Rename LLVM_ATTRIBUTE_{READONLY, READNONE} to LLVM_{READONLY, READNONE} In-Reply-To: References: <83758F5D-01C0-473F-ABA3-C7CA8CA904B8@apple.com> <2BD90B2D-9597-49F7-A766-3C62D36367B8@apple.com> Message-ID: <65C2F9E3-B520-4DC2-9745-A8FC8F5C8C0C@apple.com> On Mar 8, 2012, at 8:08 PM, Daniel Dunbar wrote: > On Thu, Mar 8, 2012 at 7:03 PM, Chris Lattner wrote: >> >> On Mar 8, 2012, at 6:31 PM, Bill Wendling wrote: >> >>> What's the reason for renaming them? >> >> I can think of two things: >> >> 1) it being an attribute is an implementation detail >> 2) it being long makes it annoying to use :) > > Exactly. > Okay. -bw From fanl at csail.mit.edu Fri Mar 9 16:10:02 2012 From: fanl at csail.mit.edu (Fan Long) Date: Fri, 9 Mar 2012 17:10:02 -0500 Subject: [LLVMdev] How to keep FunctionPass analysis result alive in Module Pass? 
Message-ID: <3EDC7AC1-478A-41CD-A970-7D75B5BD0E67@csail.mit.edu>

Hello,

I am trying to write a new ModulePass that uses the LoopInfo analysis result, but it seems I misunderstand some concept about the PassManager. Basically I want to keep the LoopInfo analysis result alive. Here is an example showing the problem I encountered, assuming I have already called addRequired<llvm::LoopInfo>() in getAnalysisUsage:

void foo(llvm::Function *F1, llvm::Function *F2) {
  llvm::LoopInfo *LI1, *LI2;
  LI1 = &getAnalysis<llvm::LoopInfo>(*F1);
  llvm::Loop* L1 = LI1->getLoopFor(F1->begin());
  LI2 = &getAnalysis<llvm::LoopInfo>(*F2);
  llvm::Loop* L2 = LI2->getLoopFor(F2->begin());
  L1->dump(); // crash
  L2->dump();
}

I checked why this program crashes. It is because getAnalysis returns the same LoopInfo instance. Each time, it clears the previous results and runs on the new function. Thus it invalidates the pointer L1 after calling &getAnalysis<llvm::LoopInfo>(*F2).

My question is whether there is a way to get around this, and to keep the FunctionPass analysis results of all functions alive during my ModulePass. I am using the LLVM-3.1-svn version. I would really appreciate your help!

Best,
Fan

From criswell at illinois.edu Fri Mar 9 16:20:00 2012
From: criswell at illinois.edu (John Criswell)
Date: Fri, 9 Mar 2012 16:20:00 -0600
Subject: [LLVMdev] How to keep FunctionPass analysis result alive in Module Pass?
In-Reply-To: <3EDC7AC1-478A-41CD-A970-7D75B5BD0E67@csail.mit.edu>
References: <3EDC7AC1-478A-41CD-A970-7D75B5BD0E67@csail.mit.edu>
Message-ID: <4F5A8210.3010004@illinois.edu>

On 3/9/12 4:10 PM, Fan Long wrote:
> Hello,
> I am trying to write a new ModulePass using LoopInfo analysis result, but it seems I misunderstand some concept about PassManager. Basically I want to keep LoopInfo analysis result alive.
Here is an example showing the problem I encountered, assuming I already addRequired() in getAnalysisUsage: > > void foo(llvm::Function *F1, llvm::Function *F2) { > llvm::LoopInfo *LI1, LI2; > LI1 =&getAnalysis(*F1); > llvm::Loop* L1 = LI1->getLoopFor(F1->begin()); > LI2 =&getAnalysis(*F2); > llvm::Loop* L2 = LI2->getLoopFor(F2->begin()); > L1->dump(); // crash > L2->dump(); > } > > I checked why this program crashes. It is because the getAnalysis returns same LoopInfo instance. Each time it clears previous results and run it on the new function. Thus it invalidate the pointer L1 after calling&getAnalysis(*F2). To the best of my knowledge, the LLVM pass manager never preserves a FunctionPass analysis that is requested by a ModulePass; every time you call getAnalysis for a function, the FunctionPass is re-run. > > My questions is whether there is a way to get around this, and to keep the analysis result of Function Pass of all functions alive during my Module Pass? I am using LLVM-3.1-svn version. I would really appreciate your help! The trick I've used is to structure the code so that getAnalysis<>() is only called once per function. For example, your ModulePass can have a std::map that maps between Function * and LoopInfo *. You then provide a method getLoopInfo(Function * F) that checks to see if F is in the map. If it is, it returns what is in the map. If it isn't, it calls getAnalysis on F, stores the result in the map, and returns the LoopInfo pointer. This is important not only for functionality (in your case) but also for performance; you don't want to calculate an analysis twice for the same function. -- John T. > > Best, > Fan > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120309/458b5e04/attachment.html From fanl at csail.mit.edu Fri Mar 9 16:28:54 2012 From: fanl at csail.mit.edu (Fan Long) Date: Fri, 9 Mar 2012 17:28:54 -0500 Subject: [LLVMdev] How to keep FunctionPass analysis result alive in Module Pass? In-Reply-To: <4F5A8210.3010004@illinois.edu> References: <3EDC7AC1-478A-41CD-A970-7D75B5BD0E67@csail.mit.edu> <4F5A8210.3010004@illinois.edu> Message-ID: <7DB7911A-4FBD-4601-A117-D6FF5441636A@csail.mit.edu> Thank you for your quick reply. Actually I am using a std::map to map Function* to LoopInfo*, but that does not help in this case. Each time I call getAnalysis(*F), it returns the same instance of llvm::LoopInfo, so the std::map is just mapping every function into the same instance. It seems only the analysis result for the last function is valid, because all the result for all previous functions are erased. The only workaround solution I have now is to copy all analysis result out of the data structure of LoopInfo before I call next &getAnalysis(). Because llvm::LoopInfo does not provide copy method, this will be very dirty to do so. Best, Fan On Mar 9, 2012, at 5:20 PM, John Criswell wrote: > On 3/9/12 4:10 PM, Fan Long wrote: >> >> Hello, >> I am trying to write a new ModulePass using LoopInfo analysis result, but it seems I misunderstand some concept about PassManager. Basically I want to keep LoopInfo analysis result alive. Here is an example showing the problem I encountered, assuming I already addRequired() in getAnalysisUsage: >> >> void foo(llvm::Function *F1, llvm::Function *F2) { >> llvm::LoopInfo *LI1, LI2; >> LI1 = &getAnalysis(*F1); >> llvm::Loop* L1 = LI1->getLoopFor(F1->begin()); >> LI2 = &getAnalysis(*F2); >> llvm::Loop* L2 = LI2->getLoopFor(F2->begin()); >> L1->dump(); // crash >> L2->dump(); >> } >> >> I checked why this program crashes. It is because the getAnalysis returns same LoopInfo instance. 
Each time it clears previous results and run it on the new function. Thus it invalidate the pointer L1 after calling &getAnalysis(*F2). > > To the best of my knowledge, the LLVM pass manager never preserves a FunctionPass analysis that is requested by a ModulePass; every time you call getAnalysis for a function, the FunctionPass is re-run. >> >> My questions is whether there is a way to get around this, and to keep the analysis result of Function Pass of all functions alive during my Module Pass? I am using LLVM-3.1-svn version. I would really appreciate your help! > > The trick I've used is to structure the code so that getAnalysis<>() is only called once per function. For example, your ModulePass can have a std::map that maps between Function * and LoopInfo *. You then provide a method getLoopInfo(Function * F) that checks to see if F is in the map. If it is, it returns what is in the map. If it isn't, it calls getAnalysis on F, stores the result in the map, and returns the LoopInfo pointer. > > This is important not only for functionality (in your case) but also for performance; you don't want to calculate an analysis twice for the same function. > > -- John T. > >> >> Best, >> Fan >> >> >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120309/f8452b45/attachment.html -------------- next part -------------- A non-text attachment was scrubbed... 
Name: smime.p7s Type: application/pkcs7-signature Size: 3744 bytes Desc: not available Url : http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120309/f8452b45/attachment.bin From criswell at illinois.edu Fri Mar 9 16:34:58 2012 From: criswell at illinois.edu (John Criswell) Date: Fri, 9 Mar 2012 16:34:58 -0600 Subject: [LLVMdev] How to keep FunctionPass analysis result alive in Module Pass? In-Reply-To: <7DB7911A-4FBD-4601-A117-D6FF5441636A@csail.mit.edu> References: <3EDC7AC1-478A-41CD-A970-7D75B5BD0E67@csail.mit.edu> <4F5A8210.3010004@illinois.edu> <7DB7911A-4FBD-4601-A117-D6FF5441636A@csail.mit.edu> Message-ID: <4F5A8592.4000600@illinois.edu> On 3/9/12 4:28 PM, Fan Long wrote: > Thank you for your quick reply. > > Actually I am using a std::map to map Function* to LoopInfo*, but that > does not help in this case. Each time I call > getAnalysis(*F), it returns the same instance of > llvm::LoopInfo, so the std::map is just mapping every function into > the same instance. It seems only the analysis result for the last > function is valid, because all the result for all previous functions > are erased. Just to make sure I understand: you are saying that every time you call getAnalysis(), you get the *same* LoopInfo * regardless of whether you call it on the same function or on a different function. Is that correct? Getting the same LoopInfo * when you call getAnalysis<> on the same function twice would not surprise me. Getting the same LoopInfo * when you call getAnalysis on F1 and F2 where F1 and F2 are different functions would surprise me greatly. > > The only workaround solution I have now is to copy all analysis result > out of the data structure of LoopInfo before I call next > &getAnalysis(). Because llvm::LoopInfo does not provide copy method, > this will be very dirty to do so. Yes, that may be what you have to do. -- John T. 
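The copy-out workaround above can be sketched roughly as follows. This is only an illustration with stand-in types, not real LLVM code: FakeFunction, FakeLoopInfo, getFakeAnalysis, LoopSummary and collectSummaries are all hypothetical names. It assumes, as reported in this thread, that every request recomputes into one shared analysis object, so a stored pointer to it goes stale; the module-level pass therefore copies what it needs into its own map before asking for the next function.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Stand-in for llvm::Function (hypothetical, for illustration only).
struct FakeFunction {
  std::string Name;
  std::vector<std::string> LoopHeaders;
};

// Stand-in for an analysis result such as LoopInfo (hypothetical).
struct FakeLoopInfo {
  std::vector<std::string> Headers;
};

// Models the behaviour reported in the thread: a single shared result
// object that is overwritten on every request.
FakeLoopInfo &getFakeAnalysis(const FakeFunction &F) {
  static FakeLoopInfo Shared;
  Shared.Headers = F.LoopHeaders; // previous contents are lost here
  return Shared;
}

// Plain data owned by the module-level pass; safe to keep around.
struct LoopSummary {
  std::vector<std::string> Headers;
};

// Copy the needed facts out of the shared object before the next request.
std::map<const FakeFunction *, LoopSummary>
collectSummaries(const std::vector<FakeFunction> &Fns) {
  std::map<const FakeFunction *, LoopSummary> Out;
  for (const FakeFunction &F : Fns) {
    FakeLoopInfo &LI = getFakeAnalysis(F);
    Out[&F] = LoopSummary{LI.Headers}; // deep copy, independent of LI
  }
  return Out;
}
```

Keying the map on the function pointer mirrors the std::map idea discussed earlier in the thread, but the mapped value is a copy rather than a pointer into the shared analysis object. With the real LoopInfo, deciding which facts to copy out is the hard part, since it provides no copy method.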
> > Best, > Fan > > On Mar 9, 2012, at 5:20 PM, John Criswell wrote: > >> On 3/9/12 4:10 PM, Fan Long wrote: >>> Hello, >>> I am trying to write a new ModulePass using LoopInfo analysis result, but it seems I misunderstand some concept about PassManager. Basically I want to keep LoopInfo analysis result alive. Here is an example showing the problem I encountered, assuming I already addRequired() in getAnalysisUsage: >>> >>> void foo(llvm::Function *F1, llvm::Function *F2) { >>> llvm::LoopInfo *LI1, LI2; >>> LI1 =&getAnalysis(*F1); >>> llvm::Loop* L1 = LI1->getLoopFor(F1->begin()); >>> LI2 =&getAnalysis(*F2); >>> llvm::Loop* L2 = LI2->getLoopFor(F2->begin()); >>> L1->dump(); // crash >>> L2->dump(); >>> } >>> >>> I checked why this program crashes. It is because the getAnalysis returns same LoopInfo instance. Each time it clears previous results and run it on the new function. Thus it invalidate the pointer L1 after calling&getAnalysis(*F2). >> >> To the best of my knowledge, the LLVM pass manager never preserves a >> FunctionPass analysis that is requested by a ModulePass; every time >> you call getAnalysis for a function, the FunctionPass is re-run. >>> >>> My questions is whether there is a way to get around this, and to keep the analysis result of Function Pass of all functions alive during my Module Pass? I am using LLVM-3.1-svn version. I would really appreciate your help! >> >> The trick I've used is to structure the code so that getAnalysis<>() >> is only called once per function. For example, your ModulePass can >> have a std::map that maps between Function * and LoopInfo *. You >> then provide a method getLoopInfo(Function * F) that checks to see if >> F is in the map. If it is, it returns what is in the map. If it >> isn't, it calls getAnalysis on F, stores the result in the map, and >> returns the LoopInfo pointer. 
>>
>> This is important not only for functionality (in your case) but also
>> for performance; you don't want to calculate an analysis twice for
>> the same function.
>>
>> -- John T.
>>
>>> Best,
>>> Fan
>>
>

From fanl at csail.mit.edu Fri Mar 9 16:42:14 2012
From: fanl at csail.mit.edu (Fan Long)
Date: Fri, 9 Mar 2012 17:42:14 -0500
Subject: [LLVMdev] How to keep FunctionPass analysis result alive in Module Pass?
In-Reply-To: <4F5A8592.4000600@illinois.edu>
References: <3EDC7AC1-478A-41CD-A970-7D75B5BD0E67@csail.mit.edu> <4F5A8210.3010004@illinois.edu> <7DB7911A-4FBD-4601-A117-D6FF5441636A@csail.mit.edu> <4F5A8592.4000600@illinois.edu>
Message-ID: <1DB2DC88-0B01-4BEF-A25D-4BA66FE3100A@csail.mit.edu>

This surprises me too. Here is the real code from my module pass:

bool SymbolicDataflow::runOnModule(llvm::Module &M) {
  // Init per module goes here
  AA = &getAnalysis();
  LIs.clear();
  DTs.clear();
  for (llvm::Module::iterator it = M.begin(); it != M.end(); ++it) {
    llvm::Function *F = &*it;
    if (!F->isDeclaration()) {
      llvm::LoopInfo *LI = &getAnalysis<llvm::LoopInfo>(*F);
      llvm::DominatorTree *DT = &getAnalysis<llvm::DominatorTree>(*F);
      LIs[F] = LI;
      DTs[F] = DT;
      DEBUG(llvm::errs() << "PASS INIT " << LI << " " << DT << " " << F->getName() << "\n");
    }
  }
  ...

It prints out the pointer value of each instance, and it is the same for all functions. At least on my machine...

Best,
Fan

On Mar 9, 2012, at 5:34 PM, John Criswell wrote:

> On 3/9/12 4:28 PM, Fan Long wrote:
>>
>> Thank you for your quick reply.
>>
>> Actually I am using a std::map to map Function* to LoopInfo*, but that does not help in this case.
Each time I call getAnalysis(*F), it returns the same instance of llvm::LoopInfo, so the std::map is just mapping every function into the same instance. It seems only the analysis result for the last function is valid, because all the result for all previous functions are erased. > > Just to make sure I understand: you are saying that every time you call getAnalysis(), you get the *same* LoopInfo * regardless of whether you call it on the same function or on a different function. Is that correct? > > Getting the same LoopInfo * when you call getAnalysis<> on the same function twice would not surprise me. Getting the same LoopInfo * when you call getAnalysis on F1 and F2 where F1 and F2 are different functions would surprise me greatly. > >> >> The only workaround solution I have now is to copy all analysis result out of the data structure of LoopInfo before I call next &getAnalysis(). Because llvm::LoopInfo does not provide copy method, this will be very dirty to do so. > > Yes, that may be what you have to do. > > -- John T. > >> >> Best, >> Fan >> >> On Mar 9, 2012, at 5:20 PM, John Criswell wrote: >> >>> On 3/9/12 4:10 PM, Fan Long wrote: >>>> >>>> Hello, >>>> I am trying to write a new ModulePass using LoopInfo analysis result, but it seems I misunderstand some concept about PassManager. Basically I want to keep LoopInfo analysis result alive. Here is an example showing the problem I encountered, assuming I already addRequired() in getAnalysisUsage: >>>> >>>> void foo(llvm::Function *F1, llvm::Function *F2) { >>>> llvm::LoopInfo *LI1, LI2; >>>> LI1 = &getAnalysis(*F1); >>>> llvm::Loop* L1 = LI1->getLoopFor(F1->begin()); >>>> LI2 = &getAnalysis(*F2); >>>> llvm::Loop* L2 = LI2->getLoopFor(F2->begin()); >>>> L1->dump(); // crash >>>> L2->dump(); >>>> } >>>> >>>> I checked why this program crashes. It is because the getAnalysis returns same LoopInfo instance. Each time it clears previous results and run it on the new function. 
Thus it invalidate the pointer L1 after calling &getAnalysis(*F2). >>> >>> To the best of my knowledge, the LLVM pass manager never preserves a FunctionPass analysis that is requested by a ModulePass; every time you call getAnalysis for a function, the FunctionPass is re-run. >>>> >>>> My questions is whether there is a way to get around this, and to keep the analysis result of Function Pass of all functions alive during my Module Pass? I am using LLVM-3.1-svn version. I would really appreciate your help! >>> >>> The trick I've used is to structure the code so that getAnalysis<>() is only called once per function. For example, your ModulePass can have a std::map that maps between Function * and LoopInfo *. You then provide a method getLoopInfo(Function * F) that checks to see if F is in the map. If it is, it returns what is in the map. If it isn't, it calls getAnalysis on F, stores the result in the map, and returns the LoopInfo pointer. >>> >>> This is important not only for functionality (in your case) but also for performance; you don't want to calculate an analysis twice for the same function. >>> >>> -- John T. >>> >>>> Best, >>>> Fan >>>> >>>> >>>> _______________________________________________ >>>> LLVM Developers mailing list >>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120309/e1da9d6b/attachment.html -------------- next part -------------- A non-text attachment was scrubbed... 
Name: smime.p7s Type: application/pkcs7-signature Size: 3744 bytes Desc: not available Url : http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120309/e1da9d6b/attachment.bin From deeppatel1987 at gmail.com Fri Mar 9 19:47:33 2012 From: deeppatel1987 at gmail.com (Sandeep Patel) Date: Sat, 10 Mar 2012 01:47:33 +0000 Subject: [LLVMdev] Adding a new function attribute In-Reply-To: References: Message-ID: On Sun, Mar 4, 2012 at 9:07 PM, Borja Ferrer wrote: > I'm adding a new function attribute in clang and llvm for a backend I'm > writing that treats prolog and epilogue code in a special way inside > interrupt handlers, similar to what naked does. One way I've seen to do this > is to add a new attribute type in Attributes.h, however to me it feels bad > to add a target dependent attribute into this place which is very target > independent. So what's the best way to do this or is there an api to handle > this kind of issues? FWIW, GCC has something similar. In LLVM IR, interrupt handlers could be considered either a function attribute or a calling convention. I don't have a particular preference. If using a CC, I'd suggest making it be target neutral like fastcc. deep From bwendling at apple.com Fri Mar 9 22:17:08 2012 From: bwendling at apple.com (Bill Wendling) Date: Fri, 09 Mar 2012 20:17:08 -0800 Subject: [LLVMdev] Stack protector performance In-Reply-To: References: Message-ID: <5745C02D-DAFA-4BBD-A923-0463F258D246@apple.com> If you compile this with optimizations, then the 'canary()' function should be totally inlined into the 'main()' function. In that case, the cost of the stack protectors will be very small compared to the loop. -bw On Mar 9, 2012, at 2:52 AM, Job Noorman wrote: > I have a question about the performance of the implementation of the stack > protector in LLVM. 
> > Consider the following C program: > ===== > void canary() > { > char buf[20]; > buf[0]++; > } > > int main() > { > int i; > for (i = 0; i < 1000000000; ++i) > canary(); > return 0; > } > ===== > > This should definately run slower when stack protection is enabled, right? > > I have measured the runtime of this program on two different systems compiled > with GCC and LLVM. Here are the results (percentages are the difference with > the unprotected version of the program): > > | Desktop | Laptop | > -----+---------+--------+ > GCC | +13% | +277% | > LLVM | -3%(!) | +330% | > > (These measurements are the median values of 10 runs.) > > So the obvious question is: can anybody explain how it is possible that using > the stack protector causes the program to run 3% faster on my desktop? > > I have tried profiling the program using valgrind (cachegrind & callgrind) but > the results show absolutely no reason at all for these measurements. > > I have attached an archive with the source code and compiled binaries. > > Here are the specs of the two systems: > * Desktop > - Ubuntu 11.10 > - Linux 3.0.0-16-generic-pae > - Intel(R) Core(TM)2 Duo CPU E4500 @ 2.20GHz (2048K cache) > * Laptop > - Ubuntu 11.10 > - Linux 3.0.0-16-generic > - Intel(R) Atom(TM) CPU N450 @ 1.66GHz (512K cache) > > Kind regards, > Job > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From bwendling at apple.com Fri Mar 9 22:22:44 2012 From: bwendling at apple.com (Bill Wendling) Date: Fri, 09 Mar 2012 20:22:44 -0800 Subject: [LLVMdev] complete llvm ports In-Reply-To: <4F5A2342.2030004@mips.com> References: <4F5A2342.2030004@mips.com> Message-ID: <3CFD8A92-C171-4857-9771-223BC15BFA9B@apple.com> On Mar 9, 2012, at 7:35 AM, Reed Kotler wrote: > There used to be a list of all the llvm ports and the status. The x86 > was the only compiler > that was a "full port". 
> > We are preparing to add out native linux compiler to the official build > bots. > > Are there various official "gating" criteria for different levels of > llvm "doneness" so to speak? > > There is a matrix I see in > http://llvm.org/releases/3.0/docs/CodeGenerator.html#targetfeatures > which seems to be old. For example, I would definitely consider the MIPS > port to be reliable and other > things like .o writing are definitely in there. > > The various current llvm build bots seem to do different levels of testing. While some platforms have very mature code and are generally reliable, we do not list them as officially supported because they don't go through the official release testing that X86 goes through. The official release testing is a fairly lengthy testing process (usually about a month) where we test clang/LLVM before we are confident that it can be called an official release. This involves running the full testsuite and making sure that there are no regressions from the previous release, and having the community compile their projects with the compiler and reporting any issues. We don't say that we support anything other that X86 because we don't have resources -- both testers and equipment -- to run an official release on them. If we do have the resources, then we would be glad to list them as officially supported. The caveat being that testing is a fairly extensive commitment, but is very very welcome. :-) -bw From jobnoorman at gmail.com Sat Mar 10 04:00:03 2012 From: jobnoorman at gmail.com (Job Noorman) Date: Sat, 10 Mar 2012 11:00:03 +0100 Subject: [LLVMdev] Stack protector performance In-Reply-To: <5745C02D-DAFA-4BBD-A923-0463F258D246@apple.com> References: <5745C02D-DAFA-4BBD-A923-0463F258D246@apple.com> Message-ID: <4471878.lWPpbSDvYz@squatpc> > If you compile this with optimizations, then the 'canary()' function should > be totally inlined into the 'main()' function. 
In that case, the cost of
> the stack protectors will be very small compared to the loop.

Yes, I know. I'm just really interested in an explanation of how it is possible that the use of canaries results in faster code in the binaries I attached to my original message (which are unoptimized).

If you look at the binaries, you see that the bodies of canary() are exactly the same except that, in the protected binary, it has some extra stuff in the prologue/epilogue. So, how can it be that a function that does exactly the same plus something extra runs faster?

On Friday 09 March 2012 20:17:08 you wrote:
> If you compile this with optimizations, then the 'canary()' function should
> be totally inlined into the 'main()' function. In that case, the cost of
> the stack protectors will be very small compared to the loop.
>
> -bw
>
> On Mar 9, 2012, at 2:52 AM, Job Noorman wrote:
> > I have a question about the performance of the implementation of the stack
> > protector in LLVM.
> >
> > Consider the following C program:
> > =====
> > void canary()
> > {
> >     char buf[20];
> >     buf[0]++;
> > }
> >
> > int main()
> > {
> >     int i;
> >     for (i = 0; i < 1000000000; ++i)
> >         canary();
> >     return 0;
> > }
> > =====
> >
> > This should definitely run slower when stack protection is enabled, right?
> >
> > I have measured the runtime of this program on two different systems
> > compiled with GCC and LLVM. Here are the results (percentages are the
> > difference with the unprotected version of the program):
> >
> >      | Desktop | Laptop |
> > -----+---------+--------+
> > GCC  |  +13%   |  +277% |
> > LLVM |  -3%(!) |  +330% |
> >
> > (These measurements are the median values of 10 runs.)
> >
> > So the obvious question is: can anybody explain how it is possible that
> > using the stack protector causes the program to run 3% faster on my
> > desktop?
> > > > I have tried profiling the program using valgrind (cachegrind & callgrind) > > but the results show absolutely no reason at all for these measurements. > > > > I have attached an archive with the source code and compiled binaries. > > > > Here are the specs of the two systems: > > * Desktop > > - Ubuntu 11.10 > > - Linux 3.0.0-16-generic-pae > > - Intel(R) Core(TM)2 Duo CPU E4500 @ 2.20GHz (2048K cache) > > * Laptop > > - Ubuntu 11.10 > > - Linux 3.0.0-16-generic > > - Intel(R) Atom(TM) CPU N450 @ 1.66GHz (512K cache) > > > > Kind regards, > > Job > > _______________________________________________ > > LLVM Developers mailing list > > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From fandawei.s at gmail.com Sat Mar 10 11:34:58 2012 From: fandawei.s at gmail.com (Fan Dawei) Date: Sat, 10 Mar 2012 12:34:58 -0500 Subject: [LLVMdev] scalarrepl fails to promote array of vector Message-ID: Hi all, I want to use scalarrepl pass to eliminate the allocation of mat_alloc which is of type [4 x <4 x float>] in the following program. 
$cat test.ll ; ModuleID = 'test.ll' define void @main(<4 x float>* %inArg, <4 x float>* %outArg, [4 x <4 x float>]* %constants) nounwind { entry: %inArg1 = load <4 x float>* %inArg %mat_alloc = alloca [4 x <4 x float>] %matVal = load [4 x <4 x float>]* %constants store [4 x <4 x float>] %matVal, [4 x <4 x float>]* %mat_alloc %0 = getelementptr inbounds [4 x <4 x float>]* %mat_alloc, i32 0, i32 0 %1 = load <4 x float>* %0 %2 = fmul <4 x float> %1, %inArg1 %3 = getelementptr inbounds [4 x <4 x float>]* %mat_alloc, i32 0, i32 1 %4 = load <4 x float>* %3 %5 = fmul <4 x float> %4, %inArg1 %6 = fadd <4 x float> %2, %5 %7 = getelementptr inbounds [4 x <4 x float>]* %mat_alloc, i32 0, i32 2 %8 = load <4 x float>* %7 %9 = fmul <4 x float> %8, %inArg1 %10 = fadd <4 x float> %6, %9 %11 = getelementptr inbounds [4 x <4 x float>]* %mat_alloc, i32 0, i32 3 %12 = load <4 x float>* %11 %13 = fadd <4 x float> %10, %12 %14 = getelementptr <4 x float>* %outArg, i32 1 store <4 x float> %13, <4 x float>* %14 ret void } $ opt -S -stats -scalarrepl test.ll No transformation is performed. I've examined the source code of scalarrepl. It seems this pass does not handle array allocations. Is there other transformation pass I can use to eliminate this allocation? Thanks, David -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120310/655a4200/attachment-0001.html From clattner at apple.com Sat Mar 10 15:22:40 2012 From: clattner at apple.com (Chris Lattner) Date: Sat, 10 Mar 2012 13:22:40 -0800 Subject: [LLVMdev] scalarrepl fails to promote array of vector In-Reply-To: References: Message-ID: On Mar 10, 2012, at 9:34 AM, Fan Dawei wrote: > Hi all, > > I want to use scalarrepl pass to eliminate the allocation of mat_alloc which is of type [4 x <4 x float>] in the following program. 
> > $cat test.ll > > ; ModuleID = 'test.ll' > > define void @main(<4 x float>* %inArg, <4 x float>* %outArg, [4 x <4 x float>]* %constants) nounwind { > entry: > %inArg1 = load <4 x float>* %inArg > %mat_alloc = alloca [4 x <4 x float>] > %matVal = load [4 x <4 x float>]* %constants > store [4 x <4 x float>] %matVal, [4 x <4 x float>]* %mat_alloc > %0 = getelementptr inbounds [4 x <4 x float>]* %mat_alloc, i32 0, i32 0 > %1 = load <4 x float>* %0 > %2 = fmul <4 x float> %1, %inArg1 > %3 = getelementptr inbounds [4 x <4 x float>]* %mat_alloc, i32 0, i32 1 > %4 = load <4 x float>* %3 > %5 = fmul <4 x float> %4, %inArg1 > %6 = fadd <4 x float> %2, %5 > %7 = getelementptr inbounds [4 x <4 x float>]* %mat_alloc, i32 0, i32 2 > %8 = load <4 x float>* %7 > %9 = fmul <4 x float> %8, %inArg1 > %10 = fadd <4 x float> %6, %9 > %11 = getelementptr inbounds [4 x <4 x float>]* %mat_alloc, i32 0, i32 3 > %12 = load <4 x float>* %11 > %13 = fadd <4 x float> %10, %12 > %14 = getelementptr <4 x float>* %outArg, i32 1 > store <4 x float> %13, <4 x float>* %14 > ret void > } > > $ opt -S -stats -scalarrepl test.ll > > No transformation is performed. I've examined the source code of scalarrepl. It seems this pass does not handle array allocations. Is there other transformation pass I can use to eliminate this allocation? Hi David, ScalarRepl gets shy about loads and stores of the entire aggregate: > %matVal = load [4 x <4 x float>]* %constants > store [4 x <4 x float>] %matVal, [4 x <4 x float>]* %mat_alloc It is possible to generalize scalarrepl to handle these similar to the way it handles memcpy, but noone has done that yet. Also, it's not generally recommended to do stuff like this, because you'll get inefficient code from many parts of the optimizer and code generator. 
-Chris From javier.e.martinez at intel.com Sat Mar 10 17:06:30 2012 From: javier.e.martinez at intel.com (Martinez, Javier E) Date: Sat, 10 Mar 2012 23:06:30 +0000 Subject: [LLVMdev] Expand vector type In-Reply-To: <4F551105.9060007@free.fr> References: <004a01ccf6cd$ce071140$6a1533c0$@molloy@arm.com> <4F551105.9060007@free.fr> Message-ID: Hi Duncan, I somehow missed your reply. I didn't try 3.0 but looking at the code in the truck the behavior is the same. The original email has the call sequence I expect in 3.0. The value returned from FindMemType() is the same in 2.7 and 3.0 a vec2 if that's a legal type for the target. I'm my opinion it's the target that should be in charge of deciding how the types are converted but in this case the value returned by TLI.getTypeToTransformTo() is not the one used. Thanks, Javier -----Original Message----- From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Duncan Sands Sent: Monday, March 05, 2012 11:16 AM To: llvmdev at cs.uiuc.edu Subject: Re: [LLVMdev] Expand vector type Hi Javier, On 05/03/12 18:10, Martinez, Javier E wrote: > I still haven't received any feedback on me adding support for > widening of stores. Is there interest? did you try LLVM 3.0? Ciao, Duncan. > > Thanks, > > Javier > > *From:*llvmdev-bounces at cs.uiuc.edu > [mailto:llvmdev-bounces at cs.uiuc.edu] *On Behalf Of *Martinez, Javier E > *Sent:* Wednesday, February 29, 2012 11:35 AM > *To:* James Molloy; llvmdev at cs.uiuc.edu > *Subject:* Re: [LLVMdev] Expand vector type > > James, > > Thanks for your response. I'm working in LLVM 2.7 (I know, it's old) > and the default behavior is already promote. This means that for > example a call to > DAGTypeLegalizer::getTypeAction(v3i32) in my case and I presume in ARM > NEON returns TypeWidenVector. From here legalization calls > WidenVectorOperand() to process the STORE node and follows the call > chain I have on my original email to FindMemType(). 
> > If my analysis is correct then your v316 STOREs are being broken into > multiple ones depending on ARM NEON support. Can you please confirm? > > Thanks, > > Javier > > *From:*James Molloy [mailto:james.molloy at arm.com] > > *Sent:* Wednesday, February 29, 2012 2:35 AM > *To:* Martinez, Javier E; llvmdev at cs.uiuc.edu > > *Subject:* RE: Expand vector type > > Hi, > > * *Is there a way to setup LLVM to automatically convert vec3s to > vec4s? * > > ** > > Yes, if you specify v3i16 and friends as "promote" instead of "legal", > llvm will promote it to a v4i16. The ARM NEON backend does this > already. I'm surprised you haven't got this happening already as you > mention that LLVM widens your loads to 4-element vectors... (this should happen during DAG type legalization, by the way). > > Cheers, > > James > > *From:*llvmdev-bounces at cs.uiuc.edu > > [mailto:llvmdev-bounces at cs.uiuc.edu] > *On Behalf Of *Martinez, > Javier E > *Sent:* 29 February 2012 00:27 > *To:* llvmdev at cs.uiuc.edu > *Subject:* [LLVMdev] Expand vector type > > Hello, > > My input language has support for 3 and 4 element vectors but my > target only has support for the latter. The language defines vec3 with > the same storage space as > vec4 so from a backend perspective they are both the same. I'd really > like if I could have LLVM treat vec3 as vec4 but I haven't found out how. > > Currently the target has emulated support for vec3 through LLVM. Loads > are already widened by LLVM to a vec4. Stores are kind of funny. By > default LLVM sets the action to 'widen' but in GenWidenVectorStores > what ends up happening is an 2:1 split of the vector that's less > efficient in this case than actually widening the vector. The reason > is that at this point the call to FindMemType assumes that stores can > never be widened to a bigger type and so those types are not > considered. 
The call sequence I'm looking at is WidenVectorOperand() > -> > WidenVecOp_STORE() -> GenWidenVectorStores() -> FindMemType(). I've > made a very small modification to enable support for widening stores to a larger type. > > Before spending more time on working on a generic solution I have a > couple of > questions: > > * *Is there a way to setup LLVM to automatically convert vec3s to > vec4s?* > > * *Is there interest in adding support for widened vector stores to a > larger type?* > > Thanks, > > Javier > > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev _______________________________________________ LLVM Developers mailing list LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From r4start at gmail.com Sun Mar 11 03:04:58 2012 From: r4start at gmail.com (r4start) Date: Sun, 11 Mar 2012 12:04:58 +0400 Subject: [LLVMdev] [cfe-dev] Microsoft constructors implementation problem. In-Reply-To: <74333B5B-4B2F-42A8-8EE3-B88556ACAEFD@apple.com> References: <4F4B6C08.1080104@gmail.com> <4F547BE6.1060006@gmail.com> <74333B5B-4B2F-42A8-8EE3-B88556ACAEFD@apple.com> Message-ID: <4F5C5CAA.7080202@gmail.com> On 08/03/2012 06:25, John McCall wrote: > On Mar 5, 2012, at 12:40 AM, r4start wrote: >> I have another question. >> If ctor was called from other ctor then additional parameter must be >> equal 0 otherwise it`s equal 1. > The rule isn't "Is this constructor being called from another constructor?", > it's "Is this constructor being used to initialize a base subobject?". That's > equivalent to the Itanium ABI's concept of a constructor variant. > EmitCXXConstructorCall gets this information already. > Thx for your response. I forgot about CXXCtorType. - Dmitry. 
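To illustrate the distinction between initializing a complete object and initializing a base subobject, here is a small C++ sketch. It is not Clang code: the names `VBase`, `Mid`, `Most`, and the counter are made up for illustration. It only shows the language-level behaviour that makes the distinction matter, namely that a virtual base is constructed by the constructor of the complete object and the initializer in an intermediate class is ignored when that class is merely a base subobject. How a given ABI communicates this to the constructor is not shown here.

```cpp
#include <cassert>

// Hypothetical counter: how many times the virtual base was constructed.
static int vbase_ctor_calls = 0;

struct VBase {
    int tag;
    explicit VBase(int t) : tag(t) { ++vbase_ctor_calls; }
};

struct Mid : virtual VBase {
    // Used only when Mid is the complete object; ignored when Mid is a base.
    Mid() : VBase(1) {}
};

struct Most : Mid {
    // Most is the complete object here, so it constructs VBase itself.
    Most() : VBase(2), Mid() {}
};

// Tag chosen by whichever constructor actually built the virtual base.
int tag_when_mid_is_complete() { Mid m; return m.tag; }
int tag_when_mid_is_base() { Most d; return d.tag; }

// Number of VBase constructions while building one Most object.
int vbase_constructions_for_most() {
    vbase_ctor_calls = 0;
    Most d;
    (void)d;
    return vbase_ctor_calls;
}

// Small usage check of the behaviour described above.
void demo_ctor_variants() {
    assert(tag_when_mid_is_complete() == 1);
    assert(tag_when_mid_is_base() == 2);
    assert(vbase_constructions_for_most() == 1);
}
```

The same `Mid` constructor therefore has two observable behaviours, which is why a code generator needs to know which kind of initialization it is emitting.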
From screwer at gmail.com Sun Mar 11 13:03:34 2012 From: screwer at gmail.com (screwer) Date: Sun, 11 Mar 2012 22:03:34 +0400 Subject: [LLVMdev] LLVM backend for Z80 CPU Message-ID: I want to write a Z80 codegen backend for LLVM. Clang+LLVM looks perfect after some quick playing around. I have some questions: 1) Does LLVM support generating "mixed" code? For example, in the microcontroller world a subroutine call with constant arguments may be laid out like this: call subroutine defw wArg1 defw wArg2 ; ordinary code continues here The subroutine pops the return address from the stack, reads the arguments, does some work, and returns via a jump to the address after the arguments it read. This is much faster than pushing arguments onto the stack. 2) Does LLVM support generating self-modifying code? For example, saving a register value directly into an opcode, because loading a constant is faster than indirect addressing. Example: ld (hl_restore+1), hl ; some code modifying hl hl_restore: ld hl, 0 ; load stored value as constant 3) Is it possible to commit a new backend to the main LLVM repository? Thanks in advance, Dmitry. From fandawei.s at gmail.com Sun Mar 11 22:35:19 2012 From: fandawei.s at gmail.com (Fan Dawei) Date: Mon, 12 Mar 2012 11:35:19 +0800 Subject: [LLVMdev] scalarrepl fails to promote array of vector In-Reply-To: References: Message-ID: Hi Chris, Thanks for your reply. You said that scalarRepl gets shy about loads and stores of the entire aggregate.
Then I use a test case: ; ModuleID = 'test1.ll' define i32 @fun(i32* nocapture %X, i32 %i) nounwind uwtable readonly { %stackArray = alloca <4 x i32> %XC = bitcast i32* %X to <4 x i32>* %arrayVal = load <4 x i32>* %XC store <4 x i32> %arrayVal, <4 x i32>* %stackArray %arrayVal1 = load <4 x i32>* %stackArray %1 = extractelement <4 x i32> %arrayVal1, i32 1 ret i32 %1 } $ opt -S -stats -scalarrepl test1.ll ; ModuleID = 'test1.ll' define i32 @fun(i32* nocapture %X, i32 %i) nounwind uwtable readonly { %XC = bitcast i32* %X to <4 x i32>* %arrayVal = load <4 x i32>* %XC %1 = extractelement <4 x i32> %arrayVal, i32 1 ret i32 %1 } ===-------------------------------------------------------------------------=== ... Statistics Collected ... ===-------------------------------------------------------------------------=== 1 mem2reg - Number of alloca's promoted with a single store 1 scalarrepl - Number of allocas promoted You can see that the stackArray is eliminated, although there is loads and stores of the entire aggregate. However, the optimised code is still not optimal. I want the code just load one element from X instead of the whole array. Thanks, David On Sun, Mar 11, 2012 at 5:22 AM, Chris Lattner wrote: > > On Mar 10, 2012, at 9:34 AM, Fan Dawei wrote: > > > Hi all, > > > > I want to use scalarrepl pass to eliminate the allocation of mat_alloc > which is of type [4 x <4 x float>] in the following program. 
> > > > $cat test.ll > > > > ; ModuleID = 'test.ll' > > > > define void @main(<4 x float>* %inArg, <4 x float>* %outArg, [4 x <4 x > float>]* %constants) nounwind { > > entry: > > %inArg1 = load <4 x float>* %inArg > > %mat_alloc = alloca [4 x <4 x float>] > > %matVal = load [4 x <4 x float>]* %constants > > store [4 x <4 x float>] %matVal, [4 x <4 x float>]* %mat_alloc > > %0 = getelementptr inbounds [4 x <4 x float>]* %mat_alloc, i32 0, i32 0 > > %1 = load <4 x float>* %0 > > %2 = fmul <4 x float> %1, %inArg1 > > %3 = getelementptr inbounds [4 x <4 x float>]* %mat_alloc, i32 0, i32 1 > > %4 = load <4 x float>* %3 > > %5 = fmul <4 x float> %4, %inArg1 > > %6 = fadd <4 x float> %2, %5 > > %7 = getelementptr inbounds [4 x <4 x float>]* %mat_alloc, i32 0, i32 2 > > %8 = load <4 x float>* %7 > > %9 = fmul <4 x float> %8, %inArg1 > > %10 = fadd <4 x float> %6, %9 > > %11 = getelementptr inbounds [4 x <4 x float>]* %mat_alloc, i32 0, i32 > 3 > > %12 = load <4 x float>* %11 > > %13 = fadd <4 x float> %10, %12 > > %14 = getelementptr <4 x float>* %outArg, i32 1 > > store <4 x float> %13, <4 x float>* %14 > > ret void > > } > > > > $ opt -S -stats -scalarrepl test.ll > > > > No transformation is performed. I've examined the source code of > scalarrepl. It seems this pass does not handle array allocations. Is there > other transformation pass I can use to eliminate this allocation? > > Hi David, > > ScalarRepl gets shy about loads and stores of the entire aggregate: > > > %matVal = load [4 x <4 x float>]* %constants > > store [4 x <4 x float>] %matVal, [4 x <4 x float>]* %mat_alloc > > It is possible to generalize scalarrepl to handle these similar to the way > it handles memcpy, but noone has done that yet. Also, it's not generally > recommended to do stuff like this, because you'll get inefficient code from > many parts of the optimizer and code generator. > > -Chris > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120312/eed446ef/attachment.html From baldrick at free.fr Mon Mar 12 03:20:14 2012 From: baldrick at free.fr (Duncan Sands) Date: Mon, 12 Mar 2012 09:20:14 +0100 Subject: [LLVMdev] scalarrepl fails to promote array of vector In-Reply-To: References: Message-ID: <4F5DB1BE.2010704@free.fr> Hi Fan, > You said that scalarRepl gets shy about loads and stores of the entire > aggregate. Then I use a test case: > > ; ModuleID = 'test1.ll' > define i32 @fun(i32* nocapture %X, i32 %i) nounwind uwtable readonly { > %stackArray = alloca <4 x i32> > %XC = bitcast i32* %X to <4 x i32>* > %arrayVal = load <4 x i32>* %XC > store <4 x i32> %arrayVal, <4 x i32>* %stackArray > %arrayVal1 = load <4 x i32>* %stackArray > %1 = extractelement <4 x i32> %arrayVal1, i32 1 > ret i32 %1 > } > > $ opt -S -stats -scalarrepl test1.ll > ; ModuleID = 'test1.ll' > > define i32 @fun(i32* nocapture %X, i32 %i) nounwind uwtable readonly { > %XC = bitcast i32* %X to <4 x i32>* > %arrayVal = load <4 x i32>* %XC > %1 = extractelement <4 x i32> %arrayVal, i32 1 > ret i32 %1 > } > ===-------------------------------------------------------------------------=== > ... Statistics Collected ... > ===-------------------------------------------------------------------------=== > > 1 mem2reg - Number of alloca's promoted with a single store > 1 scalarrepl - Number of allocas promoted > > You can see that the stackArray is eliminated, I think you may be confusing arrays and vectors: there is no stack array in your example, only the vector <4 x i32>. As a general rule hardly any optimization is done for loads and stores of arrays because front-ends don't produce them much. Much more effort is made for vectors because they can be important for getting good performance. Ciao, Duncan. although there is loads and > stores of the entire aggregate. > > However, the optimised code is still not optimal. 
I want the code just load one > element from X instead of the whole array. > > Thanks, > David > > > > > > On Sun, Mar 11, 2012 at 5:22 AM, Chris Lattner > wrote: > > > On Mar 10, 2012, at 9:34 AM, Fan Dawei wrote: > > > Hi all, > > > > I want to use scalarrepl pass to eliminate the allocation of mat_alloc > which is of type [4 x <4 x float>] in the following program. > > > > $cat test.ll > > > > ; ModuleID = 'test.ll' > > > > define void @main(<4 x float>* %inArg, <4 x float>* %outArg, [4 x <4 x > float>]* %constants) nounwind { > > entry: > > %inArg1 = load <4 x float>* %inArg > > %mat_alloc = alloca [4 x <4 x float>] > > %matVal = load [4 x <4 x float>]* %constants > > store [4 x <4 x float>] %matVal, [4 x <4 x float>]* %mat_alloc > > %0 = getelementptr inbounds [4 x <4 x float>]* %mat_alloc, i32 0, i32 0 > > %1 = load <4 x float>* %0 > > %2 = fmul <4 x float> %1, %inArg1 > > %3 = getelementptr inbounds [4 x <4 x float>]* %mat_alloc, i32 0, i32 1 > > %4 = load <4 x float>* %3 > > %5 = fmul <4 x float> %4, %inArg1 > > %6 = fadd <4 x float> %2, %5 > > %7 = getelementptr inbounds [4 x <4 x float>]* %mat_alloc, i32 0, i32 2 > > %8 = load <4 x float>* %7 > > %9 = fmul <4 x float> %8, %inArg1 > > %10 = fadd <4 x float> %6, %9 > > %11 = getelementptr inbounds [4 x <4 x float>]* %mat_alloc, i32 0, i32 3 > > %12 = load <4 x float>* %11 > > %13 = fadd <4 x float> %10, %12 > > %14 = getelementptr <4 x float>* %outArg, i32 1 > > store <4 x float> %13, <4 x float>* %14 > > ret void > > } > > > > $ opt -S -stats -scalarrepl test.ll > > > > No transformation is performed. I've examined the source code of > scalarrepl. It seems this pass does not handle array allocations. Is there > other transformation pass I can use to eliminate this allocation? 
> > Hi David, > > ScalarRepl gets shy about loads and stores of the entire aggregate: > > > %matVal = load [4 x <4 x float>]* %constants > > store [4 x <4 x float>] %matVal, [4 x <4 x float>]* %mat_alloc > > It is possible to generalize scalarrepl to handle these similar to the way > it handles memcpy, but noone has done that yet. Also, it's not generally > recommended to do stuff like this, because you'll get inefficient code from > many parts of the optimizer and code generator. > > -Chris > > > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From patrik.h.hagglund at ericsson.com Mon Mar 12 05:38:00 2012 From: patrik.h.hagglund at ericsson.com (=?iso-8859-1?Q?Patrik_H=E4gglund_H?=) Date: Mon, 12 Mar 2012 11:38:00 +0100 Subject: [LLVMdev] Assignment of large objects, optimization? Message-ID: Hi, My fronted generates (bad) code, which I see that LLVM is unable to optimize. For example, code similar to: %a = type [32 x i16] declare void @set_obj(%a*) declare void @use_obj(%a*) define void @foo() { entry: %a1 = alloca %a %a2 = alloca %a call void @set_obj(%a* %a2) %a3 = load %a* %a2 store %a %a3, %a* %a1 call void @use_obj(%a* %a1) ret void } (Or with load/store replaced with memcpy). In C pseudo-code this is similar to: a a1; a a2 = set_obj(); a1 = a2; use_obj(a1); and the corresponding LLVM IR in foo() can be simplified to: %a1 = alloca %a call void @set_obj(%a* %a1) call void @use_obj(%a* %a1) Is it unreasonable to expect LLVM to do this kind of simplifications? On a side note: Why isn't there an assignment operator in the LLVM IR? Other compilers I have seen have some kind of assignment operator in the IR. /Patrik H?gglund -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120312/e3a795df/attachment.html From xerox.time.tech at gmail.com Mon Mar 12 07:17:08 2012 From: xerox.time.tech at gmail.com (Xin Tong) Date: Mon, 12 Mar 2012 08:17:08 -0400 Subject: [LLVMdev] GPU thread/block/grid size contraints in LLVM PTX backend Message-ID: I am wondering how the LLVM PTX backend finds out the constraints on GPU thread/block/grid sizes (e.g. a block can have at most 1024 threads). Can anyone point me to the code? I need this information in the optimizer; how can I get it? Thanks Xin From frasercrmck at gmail.com Mon Mar 12 07:38:17 2012 From: frasercrmck at gmail.com (Fraser Cormack) Date: Mon, 12 Mar 2012 05:38:17 -0700 (PDT) Subject: [LLVMdev] LLI Segfaulting Message-ID: <33486161.post@talk.nabble.com> Hi, I've been stuck with this problem for a while now, and my supervisor's starting to think it's a bug in lli, but I thought I'd ask here before going down that route. I have this code, which stores an array in my 'MainClass', and prints out an element of it.
Note that the print statement is irrelevant here, it segfaults regardless, and this code has been run with -O2 optimization level, but segfaults either way (the code is just a lot shorter and easier to post this way) %MainClass = type { { i32, [0 x i32] } } @.gvar_array = private unnamed_addr constant [5 x i32] [i32 1, i32 2, i32 3, i32 4, i32 5] define void @"MainClass::()"(%MainClass* nocapture %this_ptr) nounwind { allocas: %0 = getelementptr inbounds %MainClass* %this_ptr, i64 0, i32 0, i32 0 store i32 5, i32* %0, align 4 %1 = getelementptr inbounds %MainClass* %this_ptr, i64 0, i32 0, i32 1 %2 = bitcast [0 x i32]* %1 to i8* tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %2, i8* bitcast ([5 x i32]* @.gvar_array to i8*), i64 20, i32 4, i1 false) ret void } define void @main() nounwind { allocas: %0 = alloca { i32, [0 x i32] }, align 8 %1 = getelementptr inbounds { i32, [0 x i32] }* %0, i64 0, i32 0 store i32 5, i32* %1, align 8 %2 = getelementptr inbounds { i32, [0 x i32] }* %0, i64 0, i32 1 %3 = bitcast [0 x i32]* %2 to i8* call void @llvm.memcpy.p0i8.p0i8.i64(i8* %3, i8* bitcast ([5 x i32]* @.gvar_array to i8*), i64 20, i32 4, i1 false) %4 = getelementptr inbounds [0 x i32]* %2, i64 0, i64 0 %5 = load i32* %4, align 4 %6 = call i32 (i8*, ...)* @printf(i8* getelementptr inbounds ([4 x i8]* @.int_and_newline, i64 0, i64 0), i32 %5) nounwind ret void } declare void @llvm.memcpy.p0i8.p0i8.i64(i8* nocapture, i8* nocapture, i64, i32, i1) nounwind > lli array_ex.bc 1 Segmentation fault If anyone would care to let me know what information is required to help here, then I'll supply it. Thanks, Fraser -- View this message in context: http://old.nabble.com/LLI-Segfaulting-tp33486161p33486161.html Sent from the LLVM - Dev mailing list archive at Nabble.com. 
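For readers checking the sizes in the IR above, a rough C++ analogue may help. This is only a sketch: `ArrayHeader` and the helper functions are hypothetical names, and the only assumption is a 4-byte `std::int32_t`. It compares the storage provided by a header-only type (the counterpart of `{ i32, [0 x i32] }`) with the storage needed once five `i32` elements are copied in behind the header.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical C++ counterpart of the IR type { i32, [0 x i32] }:
// only the length field contributes to the size of the type itself.
struct ArrayHeader {
    std::int32_t length;
};

// Storage provided by allocating just the header type.
constexpr std::size_t bytes_allocated() { return sizeof(ArrayHeader); }

// Storage needed for the header plus `count` trailing i32 elements.
constexpr std::size_t bytes_needed(std::size_t count) {
    return sizeof(ArrayHeader) + count * sizeof(std::int32_t);
}

// Bytes that copying `count` elements would write past the allocation.
constexpr std::size_t bytes_overrun(std::size_t count) {
    return bytes_needed(count) - bytes_allocated();
}

// Small usage check for the five-element case used in the IR.
void demo_sizes() {
    assert(bytes_allocated() == 4);
    assert(bytes_needed(5) == 24);
    assert(bytes_overrun(5) == 20);
}
```

The 20-byte figure matches the length argument of the `llvm.memcpy` call in the posted IR.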
From baldrick at free.fr Mon Mar 12 07:57:36 2012 From: baldrick at free.fr (Duncan Sands) Date: Mon, 12 Mar 2012 13:57:36 +0100 Subject: [LLVMdev] LLI Segfaulting In-Reply-To: <33486161.post@talk.nabble.com> References: <33486161.post@talk.nabble.com> Message-ID: <4F5DF2C0.9040108@free.fr> Hi Fraser, it looks to me like you are smashing the stack. > define void @main() nounwind { > allocas: > %0 = alloca { i32, [0 x i32] }, align 8 ^ this allocates 4 bytes on the stack. > %2 = getelementptr inbounds { i32, [0 x i32] }* %0, i64 0, i32 1 ^ this gets a pointer to the byte after the 4 allocated bytes. > %3 = bitcast [0 x i32]* %2 to i8* > call void @llvm.memcpy.p0i8.p0i8.i64(i8* %3, i8* bitcast ([5 x i32]* > @.gvar_array to i8*), i64 20, i32 4, i1 false) This copies 20 bytes there, kaboom! Ciao, Duncan. From david.tweed at gmail.com Mon Mar 12 07:28:04 2012 From: david.tweed at gmail.com (David Tweed) Date: Mon, 12 Mar 2012 12:28:04 +0000 Subject: [LLVMdev] LLVM for automatic differentiation or linear algebra? Message-ID: Hi, no-one else has said anything more pertinent so here's my two-pence. I have been thinking for a while about LLVM in the context of simulating _small_ stochastic systems, by which I mean systems with very much non-trivial stochastic transition functions, but still small enough that, if compiled carefully down to machine code via LLVM, there's a good chance that they'll be faster. (With even "moderate sized" transition functions I suspect that things are going to be spilled from registers enough that the speed advantage over carefully optimised "generic" code is going to be pretty insignificant.) I thought about generating a complete block of LLVM code and then applying automatic differentiation to it, and decided that, at least for an initial prototype, it looked like a better bet to do the automatic differentiation on a pre-LLVM form before generating final instructions.
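As a rough illustration of differentiating at a level above LLVM, the sketch below carries a derivative alongside each numerical value (forward-mode automatic differentiation with dual numbers). This is a generic textbook technique and not the poster's implementation; `Dual`, `sin_dual`, `transition`, and the sample function are hypothetical. Bookkeeping such as index computations would stay as plain integers and never acquire a derivative part.

```cpp
#include <cassert>
#include <cmath>

// A value together with its derivative with respect to one chosen input.
struct Dual {
    double val;
    double der;
};

// Sum rule: derivatives add.
Dual operator+(Dual a, Dual b) { return {a.val + b.val, a.der + b.der}; }

// Product rule: (ab)' = a'b + ab'.
Dual operator*(Dual a, Dual b) {
    return {a.val * b.val, a.der * b.val + a.val * b.der};
}

// Chain rule for sine: (sin a)' = cos(a) * a'.
Dual sin_dual(Dual a) { return {std::sin(a.val), std::cos(a.val) * a.der}; }

// Hypothetical transition function f(x) = x * x + sin(x).
Dual transition(Dual x) { return x * x + sin_dual(x); }

// Evaluate f and f' at x by seeding the derivative part with 1.
Dual eval_with_derivative(double x) { return transition(Dual{x, 1.0}); }

// Small usage check: f(0) = 0 and f'(0) = cos(0) = 1.
void demo_dual() {
    const Dual r = eval_with_derivative(0.0);
    assert(r.val == 0.0);
    assert(r.der == 1.0);
    const Dual s = eval_with_derivative(2.0);
    assert(std::fabs(s.der - (4.0 + std::cos(2.0))) < 1e-12);
}
```

Each arithmetic operation emits its derivative update at the same moment as the value update, so nothing has to be recovered later from lowered instructions.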
This was primarily because it's easier for the code to know which instructions could contribute to numerical results and which ones are "infrastructure" (e.g., computing array indices to load data from) and so don't even need consideration. In contrast, once things have become LLVM I've either got to attach metadata expressing that high-level knowledge before applying auto-diff or just blindly auto-diff everything and rely on quite intelligent and aggressive constant-propagation and dead-code elimination to remove all the pointless instructions. So the basic approach I've been working on is lowering to LLVM IR, adding any necessary instructions to track the derivatives implicit in a higher level instruction at the same time as the instruction itself. (This also has the advantage that at the higher level it's easier to know that certain conditions are mutually exclusive, rather than needing to do some intricate inference on groups of LLVM conditional branches and blocks.) However, that was in my personal problem context, and I would be very interested in a more "generic" auto-diff directly on LLVM. (Indeed, maybe after prototyping I'll have a better handle on whether doing it generically on LLVM would actually work quite well.) Anyway, hope this is of interest, David Tweed From amit.poojary.15 at gmail.com Sun Mar 11 21:08:24 2012 From: amit.poojary.15 at gmail.com (amit poojary) Date: Mon, 12 Mar 2012 07:38:24 +0530 Subject: [LLVMdev] Finding the value of variables Message-ID: Hello, Are there any built in functions to find the value of a variable on the current stack frame? How do I go about it? Eg 1 : %1 = alloca i32, align 4 store i32 0, i32* %1 How do I know the constant value that %1 contains? Eg 2: %struct.a = type { [10 x i32], float } %as = alloca %struct.a*, align 4 How do I find the size of memory that has been allocated to 'as'? Thank You. -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120312/93e3a334/attachment.html From frasercrmck at gmail.com Mon Mar 12 09:35:59 2012 From: frasercrmck at gmail.com (Fraser Cormack) Date: Mon, 12 Mar 2012 07:35:59 -0700 (PDT) Subject: [LLVMdev] LLI Segfaulting In-Reply-To: <4F5DF2C0.9040108@free.fr> References: <33486161.post@talk.nabble.com> <4F5DF2C0.9040108@free.fr> Message-ID: <33486962.post@talk.nabble.com> Hi Duncan, Duncan Sands wrote: > > Hi Fraser, it looks to me like you are smashing the stack. > >> define void @main() nounwind { >> allocas: >> %0 = alloca { i32, [0 x i32] }, align 8 > > ^ this allocates 4 bytes on the stack. > >> %2 = getelementptr inbounds { i32, [0 x i32] }* %0, i64 0, i32 1 > > ^ this gets a pointer to the byte after the 4 allocated bytes. > >> %3 = bitcast [0 x i32]* %2 to i8* >> call void @llvm.memcpy.p0i8.p0i8.i64(i8* %3, i8* bitcast ([5 x i32]* >> @.gvar_array to i8*), i64 20, i32 4, i1 false) > > This copies 20 bytes there, kaboom! > Such a painfully obvious answer, thank you! I'm assuming this is what happens when I use the unoptimized version of the code and call > %0 = alloca %MainClass then transfer the array into that. If I'm taking a MainClass pointer into my function, can I then just re-allocate it as a { i32, [5 x i32] } when I learn about the length? That doesn't sound like the nicest option. I'm not aware of a way of only allocating a part of a literal struct, is that possible? Cheers, Fraser -- View this message in context: http://old.nabble.com/LLI-Segfaulting-tp33486161p33486962.html Sent from the LLVM - Dev mailing list archive at Nabble.com. 
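One way to picture the "allocate once the length is known" option is the following C++ sketch. It is not taken from the thread: `make_array`, `element`, `sample_array`, and the use of `std::vector<std::int32_t>` as backing storage are assumptions chosen to keep the example portable. The idea is simply to size the allocation as header plus `count` elements before copying anything in, which corresponds to allocating `{ i32, [N x i32] }` with the real N rather than the zero-length placeholder.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical layout: slot 0 holds the length, slots 1..length hold the
// elements, mirroring { i32, [N x i32] } with N chosen at run time.
std::vector<std::int32_t> make_array(const std::int32_t* src, std::size_t count) {
    assert(src != nullptr || count == 0);
    std::vector<std::int32_t> storage(1 + count);   // header + elements
    storage[0] = static_cast<std::int32_t>(count);  // the length field
    for (std::size_t i = 0; i < count; ++i)
        storage[1 + i] = src[i];                    // stays inside the allocation
    return storage;
}

// Bounds-checked element access; returns `fallback` when idx is out of range.
std::int32_t element(const std::vector<std::int32_t>& a, std::size_t idx,
                     std::int32_t fallback) {
    if (a.empty()) return fallback;
    const std::size_t count = static_cast<std::size_t>(a[0]);
    return idx < count ? a[1 + idx] : fallback;
}

// The five constants from the posted IR, packed behind a length header.
std::vector<std::int32_t> sample_array() {
    const std::int32_t values[] = {1, 2, 3, 4, 5};
    return make_array(values, 5);
}
```

Because the storage is sized from `count`, the copy can never run past the end, at the cost of deferring the allocation until the length is known.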
From gavin.har at gmail.com Mon Mar 12 09:43:09 2012 From: gavin.har at gmail.com (Gavin Harrison) Date: Mon, 12 Mar 2012 10:43:09 -0400 Subject: [LLVMdev] LLI Segfaulting In-Reply-To: <33486962.post@talk.nabble.com> References: <33486161.post@talk.nabble.com> <4F5DF2C0.9040108@free.fr> <33486962.post@talk.nabble.com> Message-ID: Hi Fraser, Is there anything preventing you from using a pointer for the second part of the structure and allocating memory for it later? Thanks, Gavin On Mar 12, 2012, at 10:35 AM, Fraser Cormack wrote: > > Hi Duncan, > > > Duncan Sands wrote: >> >> Hi Fraser, it looks to me like you are smashing the stack. >> >>> define void @main() nounwind { >>> allocas: >>> %0 = alloca { i32, [0 x i32] }, align 8 >> >> ^ this allocates 4 bytes on the stack. >> >>> %2 = getelementptr inbounds { i32, [0 x i32] }* %0, i64 0, i32 1 >> >> ^ this gets a pointer to the byte after the 4 allocated bytes. >> >>> %3 = bitcast [0 x i32]* %2 to i8* >>> call void @llvm.memcpy.p0i8.p0i8.i64(i8* %3, i8* bitcast ([5 x i32]* >>> @.gvar_array to i8*), i64 20, i32 4, i1 false) >> >> This copies 20 bytes there, kaboom! >> > > Such a painfully obvious answer, thank you! I'm assuming this is what > happens when I use the unoptimized version of the code and call > >> %0 = alloca %MainClass > > then transfer the array into that. If I'm taking a MainClass pointer into my > function, can I then just re-allocate it as a { i32, [5 x i32] } when > I learn about the length? That doesn't sound like the nicest option. I'm not > aware of a way of only allocating a part of a literal struct, is that > possible? > > Cheers, > Fraser > -- > View this message in context: http://old.nabble.com/LLI-Segfaulting-tp33486161p33486962.html > Sent from the LLVM - Dev mailing list archive at Nabble.com. 
> > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From frasercrmck at gmail.com Mon Mar 12 09:59:25 2012 From: frasercrmck at gmail.com (Fraser Cormack) Date: Mon, 12 Mar 2012 07:59:25 -0700 (PDT) Subject: [LLVMdev] LLI Segfaulting In-Reply-To: References: <33486161.post@talk.nabble.com> <4F5DF2C0.9040108@free.fr> <33486962.post@talk.nabble.com> Message-ID: <33487147.post@talk.nabble.com> Hi Gavin, Do you mean something along the lines of having my array struct as { i32, i32* } and then indexing it with a gep and allocating the appropriate memory when I learn of it? Thanks, Fraser Gavin Harrison-2 wrote: > > Hi Fraser, > > Is there anything preventing you from using a pointer for the second part > of the structure and allocating memory for it later? > > Thanks, > Gavin > > On Mar 12, 2012, at 10:35 AM, Fraser Cormack wrote: > >> >> Hi Duncan, >> >> >> Duncan Sands wrote: >>> >>> Hi Fraser, it looks to me like you are smashing the stack. >>> >>>> define void @main() nounwind { >>>> allocas: >>>> %0 = alloca { i32, [0 x i32] }, align 8 >>> >>> ^ this allocates 4 bytes on the stack. >>> >>>> %2 = getelementptr inbounds { i32, [0 x i32] }* %0, i64 0, i32 1 >>> >>> ^ this gets a pointer to the byte after the 4 allocated bytes. >>> >>>> %3 = bitcast [0 x i32]* %2 to i8* >>>> call void @llvm.memcpy.p0i8.p0i8.i64(i8* %3, i8* bitcast ([5 x i32]* >>>> @.gvar_array to i8*), i64 20, i32 4, i1 false) >>> >>> This copies 20 bytes there, kaboom! >>> >> >> Such a painfully obvious answer, thank you! I'm assuming this is what >> happens when I use the unoptimized version of the code and call >> >>> %0 = alloca %MainClass >> >> then transfer the array into that. If I'm taking a MainClass pointer into >> my >> function, can I then just re-allocate it as a { i32, [5 x i32] } >> when >> I learn about the length? That doesn't sound like the nicest option. 
I'm >> not >> aware of a way of only allocating a part of a literal struct, is that >> possible? >> >> Cheers, >> Fraser >> -- >> View this message in context: >> http://old.nabble.com/LLI-Segfaulting-tp33486161p33486962.html >> Sent from the LLVM - Dev mailing list archive at Nabble.com. >> >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > -- View this message in context: http://old.nabble.com/LLI-Segfaulting-tp33486161p33487147.html Sent from the LLVM - Dev mailing list archive at Nabble.com. From baldrick at free.fr Mon Mar 12 10:24:50 2012 From: baldrick at free.fr (Duncan Sands) Date: Mon, 12 Mar 2012 16:24:50 +0100 Subject: [LLVMdev] Assignment of large objects, optimization? In-Reply-To: References: Message-ID: <4F5E1542.7050709@free.fr> Hi Patrik, > My fronted generates (bad) code, which I see that LLVM is unable to optimize. > For example, code similar to: > %a = type [32 x i16] > declare void @set_obj(%a*) > declare void @use_obj(%a*) > define void @foo() { > entry: > %a1 = alloca %a > %a2 = alloca %a > call void @set_obj(%a* %a2) > %a3 = load %a* %a2 > store %a %a3, %a* %a1 > call void @use_obj(%a* %a1) > ret void > } > (Or with load/store replaced with memcpy). > In C pseudo-code this is similar to: > a a1; > a a2 = set_obj(); > a1 = a2; > use_obj(a1); > and the corresponding LLVM IR in foo() can be simplified to: > %a1 = alloca %a > call void @set_obj(%a* %a1) > call void @use_obj(%a* %a1) no it can't. That's because set_obj may have remembered the address passed to it, for example by storing it in a global variable. 
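The address-capture concern can be made concrete with a small C++ sketch. The bodies of `set_obj` and `use_obj` below are invented for illustration (the thread only declares them), and `Obj` and `remembered` are hypothetical names. The sketch shows one way a callee could tell apart the version of `foo` with two objects and a copy from the version with a single object.

```cpp
#include <cassert>
#include <cstdint>

// Stands in for %a = type [32 x i16].
struct Obj {
    std::int16_t data[32];
};

// Hypothetical global in which set_obj remembers the address it was given.
static const Obj* remembered = nullptr;

// Invented body: fills the object and captures its address.
void set_obj(Obj* p) {
    remembered = p;
    for (std::int16_t& e : p->data) e = 7;
}

// Invented body: reports whether it was handed the object set_obj saw.
bool use_obj(const Obj* p) { return p == remembered; }

// Original shape: set_obj fills a2, the value is copied into a1,
// and use_obj receives a1.
bool foo_with_copy() {
    Obj a1, a2;
    set_obj(&a2);
    a1 = a2;
    return use_obj(&a1);  // a different address from the one set_obj saw
}

// Proposed simplified shape: both calls receive the same object.
bool foo_without_copy() {
    Obj a1;
    set_obj(&a1);
    return use_obj(&a1);  // the same address set_obj saw
}

// Small usage check: the two shapes are observably different.
void demo_capture() {
    assert(foo_with_copy() == false);
    assert(foo_without_copy() == true);
}
```

With these bodies the two versions of `foo` produce different results, so removing the copy is only safe when the optimizer can rule out this kind of capture.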
Then use_obj might compare the address passed to it with the address that set_obj stashes away, and make decisions based on whether they compare equal or not. > Is it unreasonable to expect LLVM to do this kind of simplifications? Try adding the nocapture attribute to the argument of set_obj. > On a side note: Why isn't there an assignment operator in the LLVM IR? Other > compilers I have seen have some kind of assignment operator in the IR. That's because LLVM IR is always in SSA form. SSA form makes assignments pointless. For example, suppose you could write %x := %y (assignment). Thanks to SSA form, you know that %x can only get a value once, and thus %y is that value: %x is equal to %y throughout the function. But then what's the point of %x? You might as well just use %y wherever you see %x. Ciao, Duncan. From gavin.har at gmail.com Mon Mar 12 10:28:57 2012 From: gavin.har at gmail.com (Gavin Harrison) Date: Mon, 12 Mar 2012 11:28:57 -0400 Subject: [LLVMdev] LLI Segfaulting In-Reply-To: <33487147.post@talk.nabble.com> References: <33486161.post@talk.nabble.com> <4F5DF2C0.9040108@free.fr> <33486962.post@talk.nabble.com> <33487147.post@talk.nabble.com> Message-ID: Yes, that is what I mean. :-) On Mar 12, 2012 11:02 AM, "Fraser Cormack" wrote: > > Hi Gavin, > > Do you mean something along the lines of having my array struct as { i32, > i32* } and then indexing it with a gep and allocating the appropriate > memory > when I learn of it? > > Thanks, > Fraser > > > Gavin Harrison-2 wrote: > > > > Hi Fraser, > > > > Is there anything preventing you from using a pointer for the second part > > of the structure and allocating memory for it later? > > > > Thanks, > > Gavin > > > > On Mar 12, 2012, at 10:35 AM, Fraser Cormack wrote: > > > >> > >> Hi Duncan, > >> > >> > >> Duncan Sands wrote: > >>> > >>> Hi Fraser, it looks to me like you are smashing the stack. 
> >>> > >>>> define void @main() nounwind { > >>>> allocas: > >>>> %0 = alloca { i32, [0 x i32] }, align 8 > >>> > >>> ^ this allocates 4 bytes on the stack. > >>> > >>>> %2 = getelementptr inbounds { i32, [0 x i32] }* %0, i64 0, i32 1 > >>> > >>> ^ this gets a pointer to the byte after the 4 allocated bytes. > >>> > >>>> %3 = bitcast [0 x i32]* %2 to i8* > >>>> call void @llvm.memcpy.p0i8.p0i8.i64(i8* %3, i8* bitcast ([5 x i32]* > >>>> @.gvar_array to i8*), i64 20, i32 4, i1 false) > >>> > >>> This copies 20 bytes there, kaboom! > >>> > >> > >> Such a painfully obvious answer, thank you! I'm assuming this is what > >> happens when I use the unoptimized version of the code and call > >> > >>> %0 = alloca %MainClass > >> > >> then transfer the array into that. If I'm taking a MainClass pointer > into > >> my > >> function, can I then just re-allocate it as a { i32, [5 x i32] } > >> when > >> I learn about the length? That doesn't sound like the nicest option. I'm > >> not > >> aware of a way of only allocating a part of a literal struct, is that > >> possible? > >> > >> Cheers, > >> Fraser > >> -- > >> View this message in context: > >> http://old.nabble.com/LLI-Segfaulting-tp33486161p33486962.html > >> Sent from the LLVM - Dev mailing list archive at Nabble.com. > >> > >> _______________________________________________ > >> LLVM Developers mailing list > >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > > > > > _______________________________________________ > > LLVM Developers mailing list > > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > > > > > -- > View this message in context: > http://old.nabble.com/LLI-Segfaulting-tp33486161p33487147.html > Sent from the LLVM - Dev mailing list archive at Nabble.com. 
> > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120312/b48f1bf3/attachment.html From fandawei.s at gmail.com Mon Mar 12 11:25:43 2012 From: fandawei.s at gmail.com (Fan Dawei) Date: Tue, 13 Mar 2012 00:25:43 +0800 Subject: [LLVMdev] scalarrepl fails to promote array of vector In-Reply-To: <4F5DB1BE.2010704@free.fr> References: <4F5DB1BE.2010704@free.fr> Message-ID: Thanks Duncan and Chris! I have this problem solved after I add the target layout definition at the beginning of the ii source code. It seems that the optimization pass rely on this information during transformation. I'll figure it out. All the allocations including the array of vector in the previous examples are eliminated. Now my compiler can generate pretty neat and efficient code. Thanks! Cheers! David On Mon, Mar 12, 2012 at 4:20 PM, Duncan Sands wrote: > Hi Fan, > > > You said that scalarRepl gets shy about loads and stores of the entire > > aggregate. 
Then I use a test case: > > > > ; ModuleID = 'test1.ll' > > define i32 @fun(i32* nocapture %X, i32 %i) nounwind uwtable readonly { > > %stackArray = alloca <4 x i32> > > %XC = bitcast i32* %X to <4 x i32>* > > %arrayVal = load <4 x i32>* %XC > > store <4 x i32> %arrayVal, <4 x i32>* %stackArray > > %arrayVal1 = load <4 x i32>* %stackArray > > %1 = extractelement <4 x i32> %arrayVal1, i32 1 > > ret i32 %1 > > } > > > > $ opt -S -stats -scalarrepl test1.ll > > ; ModuleID = 'test1.ll' > > > > define i32 @fun(i32* nocapture %X, i32 %i) nounwind uwtable readonly { > > %XC = bitcast i32* %X to <4 x i32>* > > %arrayVal = load <4 x i32>* %XC > > %1 = extractelement <4 x i32> %arrayVal, i32 1 > > ret i32 %1 > > } > > > ===-------------------------------------------------------------------------=== > > ... Statistics Collected ... > > > ===-------------------------------------------------------------------------=== > > > > 1 mem2reg - Number of alloca's promoted with a single store > > 1 scalarrepl - Number of allocas promoted > > > > You can see that the stackArray is eliminated, > > I think you may be confusing arrays and vectors: there is no stack array in > your example, only the vector <4 x i32>. As a general rule hardly any > optimization is done for loads and stores of arrays because front-ends > don't > produce them much. Much more effort is made for vectors because they can > be > important for getting good performance. > > Ciao, Duncan. > > although there is loads and > > stores of the entire aggregate. > > > > However, the optimised code is still not optimal. I want the code just > load one > > element from X instead of the whole array. > > > > Thanks, > > David > > > > > > > > > > > > On Sun, Mar 11, 2012 at 5:22 AM, Chris Lattner > > wrote: > > > > > > On Mar 10, 2012, at 9:34 AM, Fan Dawei wrote: > > > > > Hi all, > > > > > > I want to use scalarrepl pass to eliminate the allocation of > mat_alloc > > which is of type [4 x <4 x float>] in the following program. 
> > > > > > $cat test.ll > > > > > > ; ModuleID = 'test.ll' > > > > > > define void @main(<4 x float>* %inArg, <4 x float>* %outArg, [4 x > <4 x > > float>]* %constants) nounwind { > > > entry: > > > %inArg1 = load <4 x float>* %inArg > > > %mat_alloc = alloca [4 x <4 x float>] > > > %matVal = load [4 x <4 x float>]* %constants > > > store [4 x <4 x float>] %matVal, [4 x <4 x float>]* %mat_alloc > > > %0 = getelementptr inbounds [4 x <4 x float>]* %mat_alloc, i32 > 0, i32 0 > > > %1 = load <4 x float>* %0 > > > %2 = fmul <4 x float> %1, %inArg1 > > > %3 = getelementptr inbounds [4 x <4 x float>]* %mat_alloc, i32 > 0, i32 1 > > > %4 = load <4 x float>* %3 > > > %5 = fmul <4 x float> %4, %inArg1 > > > %6 = fadd <4 x float> %2, %5 > > > %7 = getelementptr inbounds [4 x <4 x float>]* %mat_alloc, i32 > 0, i32 2 > > > %8 = load <4 x float>* %7 > > > %9 = fmul <4 x float> %8, %inArg1 > > > %10 = fadd <4 x float> %6, %9 > > > %11 = getelementptr inbounds [4 x <4 x float>]* %mat_alloc, i32 > 0, i32 3 > > > %12 = load <4 x float>* %11 > > > %13 = fadd <4 x float> %10, %12 > > > %14 = getelementptr <4 x float>* %outArg, i32 1 > > > store <4 x float> %13, <4 x float>* %14 > > > ret void > > > } > > > > > > $ opt -S -stats -scalarrepl test.ll > > > > > > No transformation is performed. I've examined the source code of > > scalarrepl. It seems this pass does not handle array allocations. Is > there > > other transformation pass I can use to eliminate this allocation? > > > > Hi David, > > > > ScalarRepl gets shy about loads and stores of the entire aggregate: > > > > > %matVal = load [4 x <4 x float>]* %constants > > > store [4 x <4 x float>] %matVal, [4 x <4 x float>]* %mat_alloc > > > > It is possible to generalize scalarrepl to handle these similar to > the way > > it handles memcpy, but noone has done that yet. Also, it's not > generally > > recommended to do stuff like this, because you'll get inefficient > code from > > many parts of the optimizer and code generator. 
> > > > -Chris
> >
> > _______________________________________________
> > LLVM Developers mailing list
> > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

From fandawei.s at gmail.com Mon Mar 12 11:29:16 2012
From: fandawei.s at gmail.com (Fan Dawei)
Date: Tue, 13 Mar 2012 00:29:16 +0800
Subject: [LLVMdev] scalarrepl fails to promote array of vector
In-Reply-To: <4F5DB1BE.2010704@free.fr>
References: <4F5DB1BE.2010704@free.fr>
Message-ID:

Thanks Duncan!

I solved this problem by adding the target layout definition at the beginning of the ll source file. It seems that the optimization passes rely on this information during transformation. All the allocations, including the array of vectors in the previous examples, are now eliminated. My compiler can now generate pretty neat and efficient code.

Thanks! Cheers!
David

On Mon, Mar 12, 2012 at 4:20 PM, Duncan Sands wrote:
> Hi Fan,
>
> > You said that scalarRepl gets shy about loads and stores of the entire
> > aggregate.
Then I use a test case: > > > > ; ModuleID = 'test1.ll' > > define i32 @fun(i32* nocapture %X, i32 %i) nounwind uwtable readonly { > > %stackArray = alloca <4 x i32> > > %XC = bitcast i32* %X to <4 x i32>* > > %arrayVal = load <4 x i32>* %XC > > store <4 x i32> %arrayVal, <4 x i32>* %stackArray > > %arrayVal1 = load <4 x i32>* %stackArray > > %1 = extractelement <4 x i32> %arrayVal1, i32 1 > > ret i32 %1 > > } > > > > $ opt -S -stats -scalarrepl test1.ll > > ; ModuleID = 'test1.ll' > > > > define i32 @fun(i32* nocapture %X, i32 %i) nounwind uwtable readonly { > > %XC = bitcast i32* %X to <4 x i32>* > > %arrayVal = load <4 x i32>* %XC > > %1 = extractelement <4 x i32> %arrayVal, i32 1 > > ret i32 %1 > > } > > > ===-------------------------------------------------------------------------=== > > ... Statistics Collected ... > > > ===-------------------------------------------------------------------------=== > > > > 1 mem2reg - Number of alloca's promoted with a single store > > 1 scalarrepl - Number of allocas promoted > > > > You can see that the stackArray is eliminated, > > I think you may be confusing arrays and vectors: there is no stack array in > your example, only the vector <4 x i32>. As a general rule hardly any > optimization is done for loads and stores of arrays because front-ends > don't > produce them much. Much more effort is made for vectors because they can > be > important for getting good performance. > > Ciao, Duncan. > > although there is loads and > > stores of the entire aggregate. > > > > However, the optimised code is still not optimal. I want the code just > load one > > element from X instead of the whole array. > > > > Thanks, > > David > > > > > > > > > > > > On Sun, Mar 11, 2012 at 5:22 AM, Chris Lattner > > wrote: > > > > > > On Mar 10, 2012, at 9:34 AM, Fan Dawei wrote: > > > > > Hi all, > > > > > > I want to use scalarrepl pass to eliminate the allocation of > mat_alloc > > which is of type [4 x <4 x float>] in the following program. 
> > > > > > $cat test.ll > > > > > > ; ModuleID = 'test.ll' > > > > > > define void @main(<4 x float>* %inArg, <4 x float>* %outArg, [4 x > <4 x > > float>]* %constants) nounwind { > > > entry: > > > %inArg1 = load <4 x float>* %inArg > > > %mat_alloc = alloca [4 x <4 x float>] > > > %matVal = load [4 x <4 x float>]* %constants > > > store [4 x <4 x float>] %matVal, [4 x <4 x float>]* %mat_alloc > > > %0 = getelementptr inbounds [4 x <4 x float>]* %mat_alloc, i32 > 0, i32 0 > > > %1 = load <4 x float>* %0 > > > %2 = fmul <4 x float> %1, %inArg1 > > > %3 = getelementptr inbounds [4 x <4 x float>]* %mat_alloc, i32 > 0, i32 1 > > > %4 = load <4 x float>* %3 > > > %5 = fmul <4 x float> %4, %inArg1 > > > %6 = fadd <4 x float> %2, %5 > > > %7 = getelementptr inbounds [4 x <4 x float>]* %mat_alloc, i32 > 0, i32 2 > > > %8 = load <4 x float>* %7 > > > %9 = fmul <4 x float> %8, %inArg1 > > > %10 = fadd <4 x float> %6, %9 > > > %11 = getelementptr inbounds [4 x <4 x float>]* %mat_alloc, i32 > 0, i32 3 > > > %12 = load <4 x float>* %11 > > > %13 = fadd <4 x float> %10, %12 > > > %14 = getelementptr <4 x float>* %outArg, i32 1 > > > store <4 x float> %13, <4 x float>* %14 > > > ret void > > > } > > > > > > $ opt -S -stats -scalarrepl test.ll > > > > > > No transformation is performed. I've examined the source code of > > scalarrepl. It seems this pass does not handle array allocations. Is > there > > other transformation pass I can use to eliminate this allocation? > > > > Hi David, > > > > ScalarRepl gets shy about loads and stores of the entire aggregate: > > > > > %matVal = load [4 x <4 x float>]* %constants > > > store [4 x <4 x float>] %matVal, [4 x <4 x float>]* %mat_alloc > > > > It is possible to generalize scalarrepl to handle these similar to > the way > > it handles memcpy, but noone has done that yet. Also, it's not > generally > > recommended to do stuff like this, because you'll get inefficient > code from > > many parts of the optimizer and code generator. 
> > > > -Chris

From ryta1203 at gmail.com Mon Mar 12 12:58:55 2012
From: ryta1203 at gmail.com (Ryan Taylor)
Date: Mon, 12 Mar 2012 10:58:55 -0700
Subject: [LLVMdev] [cfe-dev] Compiling Multiple Files
In-Reply-To:
References:
Message-ID:

I believe it might actually. Do you know if it's possible to inline functions without an external node? It doesn't appear to be so.

On Mon, Mar 12, 2012 at 1:24 AM, James Molloy wrote:
> Hi Ryan,
>
> I would just compile to multiple IR files then link them together with
> llvm-link.
>
> Would that work for you?
>
> Cheers,
>
> James
>
> From: cfe-dev-bounces at cs.uiuc.edu [mailto:cfe-dev-bounces at cs.uiuc.edu] On
> Behalf Of Ryan Taylor
> Sent: 09 March 2012 19:32
> To: cfe-dev at cs.uiuc.edu
> Subject: [cfe-dev] Compiling Multiple Files
>
> Clangers,
>
> What's the best way to compile multiple files in one LLVM IR file? It
> doesn't appear that clang supports the gcc -combine feature.
>
> -- IMPORTANT NOTICE: The contents of this email and any attachments are
> confidential and may also be privileged. If you are not the intended
> recipient, please notify the sender immediately and do not disclose the
> contents to any other person, use it for any purpose, or store or copy the
> information in any medium. Thank you.
From James.Molloy at arm.com Mon Mar 12 13:23:39 2012
From: James.Molloy at arm.com (James Molloy)
Date: Mon, 12 Mar 2012 18:23:39 +0000
Subject: [LLVMdev] [cfe-dev] Compiling Multiple Files
In-Reply-To:
References: ,
Message-ID:

Hi Ryan,

> Do you know if it's possible to inline functions without an external node?

Sorry, I don't know to what you're referring here. Could you please rephrase? What do you mean by "external node"?

Cheers,

James
________________________________________
From: Ryan Taylor [ryta1203 at gmail.com]
Sent: 12 March 2012 17:58
To: James Molloy
Cc: llvmdev at cs.uiuc.edu
Subject: Re: [cfe-dev] Compiling Multiple Files

I believe it might actually. Do you know if it's possible to inline functions without an external node? It doesn't appear to be so.

On Mon, Mar 12, 2012 at 1:24 AM, James Molloy wrote:
Hi Ryan,

I would just compile to multiple IR files then link them together with llvm-link.

Would that work for you?

Cheers,

James

From: cfe-dev-bounces at cs.uiuc.edu [mailto:cfe-dev-bounces at cs.uiuc.edu] On Behalf Of Ryan Taylor
Sent: 09 March 2012 19:32
To: cfe-dev at cs.uiuc.edu
Subject: [cfe-dev] Compiling Multiple Files

Clangers,

What's the best way to compile multiple files in one LLVM IR file? It doesn't appear that clang supports the gcc -combine feature.

-- IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged.
If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you. From ryta1203 at gmail.com Mon Mar 12 13:25:10 2012 From: ryta1203 at gmail.com (Ryan Taylor) Date: Mon, 12 Mar 2012 11:25:10 -0700 Subject: [LLVMdev] [cfe-dev] Compiling Multiple Files In-Reply-To: References: Message-ID: James, Sure. I want to inline functions in a C program that has no external node, or "main". So the "top" function is not main and there does not exist a main in the file. Thanks. On Mon, Mar 12, 2012 at 11:23 AM, James Molloy wrote: > Hi Ryan, > > > Do you know if it's possible to inline functions without an external > node? > > Sorry, I don't know to what you're referring here. Could you please > rephrase? what do you mean be "external node"? > > Cheers, > > James > ________________________________________ > From: Ryan Taylor [ryta1203 at gmail.com] > Sent: 12 March 2012 17:58 > To: James Molloy > Cc: llvmdev at cs.uiuc.edu > Subject: Re: [cfe-dev] Compiling Multiple Files > > I believe it might actually. Do you know if it's possible to inline > functions without an external node? It doesn't appear to be so. > > On Mon, Mar 12, 2012 at 1:24 AM, James Molloy > wrote: > Hi Ryan, > > I would just compile to multiple IR files then link them together with > llvm-link. > > Would that work for you? > > Cheers, > > James > > From: cfe-dev-bounces at cs.uiuc.edu > [mailto:cfe-dev-bounces at cs.uiuc.edu] > On Behalf Of Ryan Taylor > Sent: 09 March 2012 19:32 > To: cfe-dev at cs.uiuc.edu > Subject: [cfe-dev] Compiling Multiple Files > > Clangers, > > What's the best way to compile multiple files in one LLVM IR file? It > doesn't appear that clang supports the gcc -combine feature. > > -- IMPORTANT NOTICE: The contents of this email and any attachments are > confidential and may also be privileged. 
If you are not the intended > recipient, please notify the sender immediately and do not disclose the > contents to any other person, use it for any purpose, or store or copy the > information in any medium. Thank you. > > > -- IMPORTANT NOTICE: The contents of this email and any attachments are > confidential and may also be privileged. If you are not the intended > recipient, please notify the sender immediately and do not disclose the > contents to any other person, use it for any purpose, or store or copy the > information in any medium. Thank you. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120312/e44d9d8a/attachment.html From James.Molloy at arm.com Mon Mar 12 13:30:48 2012 From: James.Molloy at arm.com (James Molloy) Date: Mon, 12 Mar 2012 18:30:48 +0000 Subject: [LLVMdev] [cfe-dev] Compiling Multiple Files In-Reply-To: References: , Message-ID: Hi Ryan, I see. Well, that shouldn't be an issue. If you link the bitcode files together with llvm-link you can then do several things: (1) Run clang on it as you normally would with -O3 for maximum inlining (2) Run 'llc' manually with -O3 and LTO, which will do the maximum link time optimisations. (3) Run 'opt' manually with -O3, LTO which will produce another bitcode file, which you can then again give to Clang to codegen. Cheers, James ________________________________________ From: Ryan Taylor [ryta1203 at gmail.com] Sent: 12 March 2012 18:25 To: James Molloy Cc: llvmdev at cs.uiuc.edu Subject: Re: [cfe-dev] Compiling Multiple Files James, Sure. I want to inline functions in a C program that has no external node, or "main". So the "top" function is not main and there does not exist a main in the file. Thanks. On Mon, Mar 12, 2012 at 11:23 AM, James Molloy > wrote: Hi Ryan, > Do you know if it's possible to inline functions without an external node? Sorry, I don't know to what you're referring here. 
Could you please rephrase? What do you mean by "external node"?

Cheers,

James
________________________________________
From: Ryan Taylor [ryta1203 at gmail.com]
Sent: 12 March 2012 17:58
To: James Molloy
Cc: llvmdev at cs.uiuc.edu
Subject: Re: [cfe-dev] Compiling Multiple Files

I believe it might actually. Do you know if it's possible to inline functions without an external node? It doesn't appear to be so.

On Mon, Mar 12, 2012 at 1:24 AM, James Molloy wrote:
Hi Ryan,

I would just compile to multiple IR files then link them together with llvm-link.

Would that work for you?

Cheers,

James

From: cfe-dev-bounces at cs.uiuc.edu [mailto:cfe-dev-bounces at cs.uiuc.edu] On Behalf Of Ryan Taylor
Sent: 09 March 2012 19:32
To: cfe-dev at cs.uiuc.edu
Subject: [cfe-dev] Compiling Multiple Files

Clangers,

What's the best way to compile multiple files in one LLVM IR file? It doesn't appear that clang supports the gcc -combine feature.
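A rough sketch of the situation discussed in this thread, offered as an illustration rather than anything posted to the list: a C translation unit with no main() whose helper calls can still be inlined. The function names (scale, add_one, top), the file names and the exact command lines in the leading comment are assumptions and may need adjusting for a given LLVM version; the tools themselves (clang, llvm-link, opt) are the ones named in the thread.

```c
/*
 * Assumed workflow (hypothetical file names):
 *
 *   clang -emit-llvm -c top.c   -o top.bc
 *   clang -emit-llvm -c other.c -o other.bc
 *   llvm-link top.bc other.bc -o all.bc
 *   opt -O3 all.bc -o all.opt.bc
 *
 * Nothing here defines main(); the inliner works on whatever call
 * edges it can see in the linked module.
 */

/* An ordinary external function; imagine it came from other.c. */
int scale(int x) { return 3 * x; }

/* Internal linkage: once every call site is inlined, the optimizer
   is free to drop this definition. */
static int add_one(int x) { return x + 1; }

/* "top" plays the role of the entry point in a module without main(). */
int top(int x) { return add_one(scale(x)); }
```

Because scale and top have external linkage, their definitions stay in the output module even when their bodies are inlined elsewhere; only internal-linkage functions such as add_one can disappear entirely.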
From frasercrmck at gmail.com Mon Mar 12 13:43:23 2012 From: frasercrmck at gmail.com (Fraser Cormack) Date: Mon, 12 Mar 2012 11:43:23 -0700 (PDT) Subject: [LLVMdev] LLI Segfaulting In-Reply-To: References: <33486161.post@talk.nabble.com> <4F5DF2C0.9040108@free.fr> <33486962.post@talk.nabble.com> <33487147.post@talk.nabble.com> Message-ID: <33489084.post@talk.nabble.com> Sorry for my ignorance, but I'm unaware as to how I'd achieve this. I know I'd have this, %MainClass = type { { i32, i32* } } And something along these lines: %1 = getelementptr inbounds %MainClass* %0, i32 0, i32 0 ; get to the 'array' %2 = getelementptr inbounds { i32, i32* }* %1, i32 0, i32 1 ; index the data pointer And then I'd want to: alloca i32, i64 5 But I don't understand how I'd associate this allocation with the existing integer 'array' pointer. Also, what would happen when the function returns, wouldn't the new array allocation portion go out of scope, as it's allocated on the function's stack frame? Thanks, Fraser Gavin Harrison-2 wrote: > > Yes, that is what I mean. :-) > On Mar 12, 2012 11:02 AM, "Fraser Cormack" wrote: > >> >> Hi Gavin, >> >> Do you mean something along the lines of having my array struct as { i32, >> i32* } and then indexing it with a gep and allocating the appropriate >> memory >> when I learn of it? >> >> Thanks, >> Fraser >> >> >> Gavin Harrison-2 wrote: >> > >> > Hi Fraser, >> > >> > Is there anything preventing you from using a pointer for the second >> part >> > of the structure and allocating memory for it later? >> > >> > Thanks, >> > Gavin >> > >> > On Mar 12, 2012, at 10:35 AM, Fraser Cormack wrote: >> > >> >> >> >> Hi Duncan, >> >> >> >> >> >> Duncan Sands wrote: >> >>> >> >>> Hi Fraser, it looks to me like you are smashing the stack. >> >>> >> >>>> define void @main() nounwind { >> >>>> allocas: >> >>>> %0 = alloca { i32, [0 x i32] }, align 8 >> >>> >> >>> ^ this allocates 4 bytes on the stack. 
>> >>> >> >>>> %2 = getelementptr inbounds { i32, [0 x i32] }* %0, i64 0, i32 1 >> >>> >> >>> ^ this gets a pointer to the byte after the 4 allocated bytes. >> >>> >> >>>> %3 = bitcast [0 x i32]* %2 to i8* >> >>>> call void @llvm.memcpy.p0i8.p0i8.i64(i8* %3, i8* bitcast ([5 x >> i32]* >> >>>> @.gvar_array to i8*), i64 20, i32 4, i1 false) >> >>> >> >>> This copies 20 bytes there, kaboom! >> >>> >> >> >> >> Such a painfully obvious answer, thank you! I'm assuming this is what >> >> happens when I use the unoptimized version of the code and call >> >> >> >>> %0 = alloca %MainClass >> >> >> >> then transfer the array into that. If I'm taking a MainClass pointer >> into >> >> my >> >> function, can I then just re-allocate it as a { i32, [5 x i32] >> } >> >> when >> >> I learn about the length? That doesn't sound like the nicest option. >> I'm >> >> not >> >> aware of a way of only allocating a part of a literal struct, is that >> >> possible? >> >> >> >> Cheers, >> >> Fraser >> >> -- >> >> View this message in context: >> >> http://old.nabble.com/LLI-Segfaulting-tp33486161p33486962.html >> >> Sent from the LLVM - Dev mailing list archive at Nabble.com. >> >> >> >> _______________________________________________ >> >> LLVM Developers mailing list >> >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >> > >> > >> > _______________________________________________ >> > LLVM Developers mailing list >> > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >> > >> > >> >> -- >> View this message in context: >> http://old.nabble.com/LLI-Segfaulting-tp33486161p33487147.html >> Sent from the LLVM - Dev mailing list archive at Nabble.com. 
>> >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >> > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > -- View this message in context: http://old.nabble.com/LLI-Segfaulting-tp33486161p33489084.html Sent from the LLVM - Dev mailing list archive at Nabble.com. From ryta1203 at gmail.com Mon Mar 12 13:53:38 2012 From: ryta1203 at gmail.com (Ryan Taylor) Date: Mon, 12 Mar 2012 11:53:38 -0700 Subject: [LLVMdev] [cfe-dev] Compiling Multiple Files In-Reply-To: References: Message-ID: James, Thanks. It wouldn't take the LTO option; however, I can get it to inline using -cppgen=inline. However, when I run clang the second time it gives me an error, stating that it expects a top level entity. I think I've run into this issue before. Any ideas? On Mon, Mar 12, 2012 at 11:30 AM, James Molloy wrote: > Hi Ryan, > > I see. Well, that shouldn't be an issue. If you link the bitcode files > together with llvm-link you can then do several things: > > (1) Run clang on it as you normally would with -O3 for maximum inlining > (2) Run 'llc' manually with -O3 and LTO, which will do the maximum link > time optimisations. > (3) Run 'opt' manually with -O3, LTO which will produce another bitcode > file, which you can then again give to Clang to codegen. > > Cheers, > > James > ________________________________________ > From: Ryan Taylor [ryta1203 at gmail.com] > Sent: 12 March 2012 18:25 > To: James Molloy > Cc: llvmdev at cs.uiuc.edu > Subject: Re: [cfe-dev] Compiling Multiple Files > > James, > > Sure. I want to inline functions in a C program that has no external > node, or "main". So the "top" function is not main and there does not exist > a main in the file. > > Thanks. 
> > On Mon, Mar 12, 2012 at 11:23 AM, James Molloy > wrote: > Hi Ryan, > > > Do you know if it's possible to inline functions without an external > node? > > Sorry, I don't know to what you're referring here. Could you please > rephrase? what do you mean be "external node"? > > Cheers, > > James > ________________________________________ > From: Ryan Taylor [ryta1203 at gmail.com] > Sent: 12 March 2012 17:58 > To: James Molloy > Cc: llvmdev at cs.uiuc.edu > Subject: Re: [cfe-dev] Compiling Multiple Files > > I believe it might actually. Do you know if it's possible to inline > functions without an external node? It doesn't appear to be so. > > On Mon, Mar 12, 2012 at 1:24 AM, James Molloy James.Molloy at arm.com>>> wrote: > Hi Ryan, > > I would just compile to multiple IR files then link them together with > llvm-link. > > Would that work for you? > > Cheers, > > James > > From: cfe-dev-bounces at cs.uiuc.edu >> > [mailto:cfe-dev-bounces at cs.uiuc.edu >>] > On Behalf Of Ryan Taylor > Sent: 09 March 2012 19:32 > To: cfe-dev at cs.uiuc.edu cfe-dev at cs.uiuc.edu> > Subject: [cfe-dev] Compiling Multiple Files > > Clangers, > > What's the best way to compile multiple files in one LLVM IR file? It > doesn't appear that clang supports the gcc -combine feature. > > -- IMPORTANT NOTICE: The contents of this email and any attachments are > confidential and may also be privileged. If you are not the intended > recipient, please notify the sender immediately and do not disclose the > contents to any other person, use it for any purpose, or store or copy the > information in any medium. Thank you. > > > -- IMPORTANT NOTICE: The contents of this email and any attachments are > confidential and may also be privileged. If you are not the intended > recipient, please notify the sender immediately and do not disclose the > contents to any other person, use it for any purpose, or store or copy the > information in any medium. Thank you. 
> > > > -- IMPORTANT NOTICE: The contents of this email and any attachments are > confidential and may also be privileged. If you are not the intended > recipient, please notify the sender immediately and do not disclose the > contents to any other person, use it for any purpose, or store or copy the > information in any medium. Thank you. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120312/1936f91d/attachment.html From James.Molloy at arm.com Mon Mar 12 13:58:52 2012 From: James.Molloy at arm.com (James Molloy) Date: Mon, 12 Mar 2012 18:58:52 +0000 Subject: [LLVMdev] [cfe-dev] Compiling Multiple Files In-Reply-To: References: , Message-ID: Hi Ryan, Well, when linking, it'll need a top level entity. Else what is it going to link to? (unless you're building a shared library?) ________________________________________ From: Ryan Taylor [ryta1203 at gmail.com] Sent: 12 March 2012 18:53 To: James Molloy Cc: llvmdev at cs.uiuc.edu Subject: Re: [cfe-dev] Compiling Multiple Files James, Thanks. It wouldn't take the LTO option; however, I can get it to inline using -cppgen=inline. However, when I run clang the second time it gives me an error, stating that it expects a top level entity. I think I've run into this issue before. Any ideas? On Mon, Mar 12, 2012 at 11:30 AM, James Molloy > wrote: Hi Ryan, I see. Well, that shouldn't be an issue. If you link the bitcode files together with llvm-link you can then do several things: (1) Run clang on it as you normally would with -O3 for maximum inlining (2) Run 'llc' manually with -O3 and LTO, which will do the maximum link time optimisations. (3) Run 'opt' manually with -O3, LTO which will produce another bitcode file, which you can then again give to Clang to codegen. 
Cheers, James ________________________________________ From: Ryan Taylor [ryta1203 at gmail.com] Sent: 12 March 2012 18:25 To: James Molloy Cc: llvmdev at cs.uiuc.edu Subject: Re: [cfe-dev] Compiling Multiple Files James, Sure. I want to inline functions in a C program that has no external node, or "main". So the "top" function is not main and there does not exist a main in the file. Thanks. On Mon, Mar 12, 2012 at 11:23 AM, James Molloy >> wrote: Hi Ryan, > Do you know if it's possible to inline functions without an external node? Sorry, I don't know to what you're referring here. Could you please rephrase? what do you mean be "external node"? Cheers, James ________________________________________ From: Ryan Taylor [ryta1203 at gmail.com>] Sent: 12 March 2012 17:58 To: James Molloy Cc: llvmdev at cs.uiuc.edu> Subject: Re: [cfe-dev] Compiling Multiple Files I believe it might actually. Do you know if it's possible to inline functions without an external node? It doesn't appear to be so. On Mon, Mar 12, 2012 at 1:24 AM, James Molloy >>>> wrote: Hi Ryan, I would just compile to multiple IR files then link them together with llvm-link. Would that work for you? Cheers, James From: cfe-dev-bounces at cs.uiuc.edu>>> [mailto:cfe-dev-bounces at cs.uiuc.edu>>>] On Behalf Of Ryan Taylor Sent: 09 March 2012 19:32 To: cfe-dev at cs.uiuc.edu>>> Subject: [cfe-dev] Compiling Multiple Files Clangers, What's the best way to compile multiple files in one LLVM IR file? It doesn't appear that clang supports the gcc -combine feature. -- IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you. -- IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. 
If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.

From gavin.har at gmail.com Mon Mar 12 14:02:46 2012
From: gavin.har at gmail.com (Gavin Harrison)
Date: Mon, 12 Mar 2012 15:02:46 -0400
Subject: [LLVMdev] LLI Segfaulting
In-Reply-To: <33489084.post@talk.nabble.com>
References: <33486161.post@talk.nabble.com> <4F5DF2C0.9040108@free.fr> <33486962.post@talk.nabble.com> <33487147.post@talk.nabble.com> <33489084.post@talk.nabble.com>
Message-ID: <9110B5C8-12C3-4B9C-B44E-697D22534043@gmail.com>

One way to find out how to do these sorts of things is to write them in C or C++ and see what clang does :)

In C you would do something like this:

    a = malloc(sizeof(int) * n);

The associated LLVM IR would look similar to this:

    %a = alloca i32*             ; %a's type is i32**
    %n = alloca i32
    ...
    %1 = load i32* %n
    %2 = sext i32 %1 to i64
    %3 = mul i64 4, %2
    %4 = call i8* @malloc(i64 %3)
    %5 = bitcast i8* %4 to i32*
    store i32* %5, i32** %a

Thanks,
Gavin

On Mar 12, 2012, at 2:43 PM, Fraser Cormack wrote:
>
> Sorry for my ignorance, but I'm unaware as to how I'd achieve this.
> > I know I'd have this, > > %MainClass = type { { i32, i32* } } > > And something along these lines: > > %1 = getelementptr inbounds %MainClass* %0, i32 0, i32 0 ; get to the > 'array' > %2 = getelementptr inbounds { i32, i32* }* %1, i32 0, i32 1 ; index the > data pointer > > And then I'd want to: > > alloca i32, i64 5 > > But I don't understand how I'd associate this allocation with the existing > integer 'array' pointer. Also, what would happen when the function returns, > wouldn't the new array allocation portion go out of scope, as it's allocated > on the function's stack frame? > > Thanks, > Fraser > > > > Gavin Harrison-2 wrote: >> >> Yes, that is what I mean. :-) >> On Mar 12, 2012 11:02 AM, "Fraser Cormack" wrote: >> >>> >>> Hi Gavin, >>> >>> Do you mean something along the lines of having my array struct as { i32, >>> i32* } and then indexing it with a gep and allocating the appropriate >>> memory >>> when I learn of it? >>> >>> Thanks, >>> Fraser >>> >>> >>> Gavin Harrison-2 wrote: >>>> >>>> Hi Fraser, >>>> >>>> Is there anything preventing you from using a pointer for the second >>> part >>>> of the structure and allocating memory for it later? >>>> >>>> Thanks, >>>> Gavin >>>> >>>> On Mar 12, 2012, at 10:35 AM, Fraser Cormack wrote: >>>> >>>>> >>>>> Hi Duncan, >>>>> >>>>> >>>>> Duncan Sands wrote: >>>>>> >>>>>> Hi Fraser, it looks to me like you are smashing the stack. >>>>>> >>>>>>> define void @main() nounwind { >>>>>>> allocas: >>>>>>> %0 = alloca { i32, [0 x i32] }, align 8 >>>>>> >>>>>> ^ this allocates 4 bytes on the stack. >>>>>> >>>>>>> %2 = getelementptr inbounds { i32, [0 x i32] }* %0, i64 0, i32 1 >>>>>> >>>>>> ^ this gets a pointer to the byte after the 4 allocated bytes. >>>>>> >>>>>>> %3 = bitcast [0 x i32]* %2 to i8* >>>>>>> call void @llvm.memcpy.p0i8.p0i8.i64(i8* %3, i8* bitcast ([5 x >>> i32]* >>>>>>> @.gvar_array to i8*), i64 20, i32 4, i1 false) >>>>>> >>>>>> This copies 20 bytes there, kaboom! 
>>>>>> >>>>> >>>>> Such a painfully obvious answer, thank you! I'm assuming this is what >>>>> happens when I use the unoptimized version of the code and call >>>>> >>>>>> %0 = alloca %MainClass >>>>> >>>>> then transfer the array into that. If I'm taking a MainClass pointer >>> into >>>>> my >>>>> function, can I then just re-allocate it as a { i32, [5 x i32] >>> } >>>>> when >>>>> I learn about the length? That doesn't sound like the nicest option. >>> I'm >>>>> not >>>>> aware of a way of only allocating a part of a literal struct, is that >>>>> possible? >>>>> >>>>> Cheers, >>>>> Fraser >>>>> -- >>>>> View this message in context: >>>>> http://old.nabble.com/LLI-Segfaulting-tp33486161p33486962.html >>>>> Sent from the LLVM - Dev mailing list archive at Nabble.com. >>>>> >>>>> _______________________________________________ >>>>> LLVM Developers mailing list >>>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>>> >>>> >>>> _______________________________________________ >>>> LLVM Developers mailing list >>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>>> >>>> >>> >>> -- >>> View this message in context: >>> http://old.nabble.com/LLI-Segfaulting-tp33486161p33487147.html >>> Sent from the LLVM - Dev mailing list archive at Nabble.com. >>> >>> _______________________________________________ >>> LLVM Developers mailing list >>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>> >> >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >> >> > -- > View this message in context: http://old.nabble.com/LLI-Segfaulting-tp33486161p33489084.html > Sent from the LLVM - Dev mailing list archive at Nabble.com. 
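The suggestion in this thread (keep a pointer in the struct and allocate the elements once the length is known) can be sketched in C++ roughly as follows. This is only an illustration under stated assumptions: `IntArray`, `initArray` and `freeArray` are hypothetical names that do not appear in the thread, and the struct is meant to mirror the `{ i32, i32* }` layout discussed above. Because the buffer comes from `malloc` rather than `alloca`, it is not tied to the stack frame of the function that allocates it, so it stays valid after that function returns.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical C++ mirror of the IR type { i32, i32* } from the thread.
struct IntArray {
    std::int32_t length;
    std::int32_t *data;
};

// Allocate the element buffer on the heap once the length is known, then
// copy the initial values into it. Returns false if malloc returns null.
bool initArray(IntArray *arr, const std::int32_t *src, std::int32_t n) {
    const std::size_t bytes = sizeof(std::int32_t) * static_cast<std::size_t>(n);
    arr->data = static_cast<std::int32_t *>(std::malloc(bytes));
    if (arr->data == nullptr) {
        arr->length = 0;
        return false;
    }
    std::memcpy(arr->data, src, bytes);
    arr->length = n;
    return true;
}

// Release the heap buffer; the struct itself is owned by the caller.
void freeArray(IntArray *arr) {
    std::free(arr->data);
    arr->data = nullptr;
    arr->length = 0;
}
```

A caller would declare an `IntArray`, pass its address to `initArray` together with the five initial values, and call `freeArray` when done. Compiling such a file with clang and inspecting the emitted IR is one way to see the corresponding `malloc`, `bitcast` and `store` sequence.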
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

From eli.friedman at gmail.com Mon Mar 12 19:04:09 2012
From: eli.friedman at gmail.com (Eli Friedman)
Date: Mon, 12 Mar 2012 17:04:09 -0700
Subject: [LLVMdev] fix a "does not name a type" bug in VASTContext.h
In-Reply-To:
References:
Message-ID:

On Thu, Mar 8, 2012 at 6:22 AM, Qingrui Liu wrote:
> Hi all,
>    I find a bug in the VASTContext.h of the latest clang. I fixed it and
> commit a patch for it. As follows:
>
> From 447d31176b513a03b253eb25ef314c2a3c0e428a Mon Sep 17 00:00:00 2001
> From: Tsingray
> Date: Thu, 8 Mar 2012 22:11:54 +0800
> Subject: [PATCH] fix a 'does not name a type' bug in VASTContext.h
>
> ---
>  include/clang/AST/ASTContext.h |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/include/clang/AST/ASTContext.h b/include/clang/AST/ASTContext.h
> index 3bdac2d..530f957 100644
> --- a/include/clang/AST/ASTContext.h
> +++ b/include/clang/AST/ASTContext.h
> @@ -480,7 +480,7 @@ public:
>                                     const FieldDecl *LastFD) const;
>
>    // Access to the set of methods overridden by the given C++ method.
> -  typedef CXXMethodVector::const_iterator overridden_cxx_method_iterator;
> +  typedef CXXMethodVector::iterator overridden_cxx_method_iterator;
>    overridden_cxx_method_iterator
>    overridden_methods_begin(const CXXMethodDecl *Method) const;

Can you "svn up" your llvm and clang sources and check if you are still having issues?

-Eli

From atrick at apple.com Tue Mar 13 00:39:28 2012
From: atrick at apple.com (Andrew Trick)
Date: Mon, 12 Mar 2012 22:39:28 -0700
Subject: [LLVMdev] Question about post RA scheduler
In-Reply-To:
References: <8EABC991-2BE1-45E5-A388-ABCCE5E826FB@apple.com>
Message-ID:

On Mar 7, 2012, at 11:34 AM, Akira Hatanaka wrote:
> I filed a bug report (Bug 12205).
> Please take a look when you have time. > > Per your suggestion, I also attached a patch which attaches to load or > store nodes a machinepointerinfo that points to a stack frame object > when it can infer they are actually reading from or writing to the > stack. The test that was failing passes if I apply this patch, but I > doubt this is the right approach, because this will fail if > InferPointerInfo in SelectionDAG.cpp cannot discover a load or store > is accessing a stack object (it can only infer the information if the > expression for the pointer is simple, for example add FI + const). > > An alternative approach might be to make the machinepointerinfo of the > stores refer to %struct.ObjPointStruct* byval %P or refer to nothing, > but that currently doesn't seem to be possible. I've thought of several ways we could potentially handle this. All are fairly messy without recognizing the situation during argument lowering. I'm not very familiar with the argument lowering code. But it seems to me you should be able to lookup the Value for the formal argument when you generate stack stores. Can you create a MachinePointerInfo for each store that refers to the argument value and proper offset? These initializers will no longer appear to alias with stack accesses, but that's probably ok. What exactly do you think is not possible? If finding the formal argument value and offset is too hard, I suppose there are other hacks you could try. I'm not encouraging it though. Is it valid to set MachinePointerInfo.V = 0? You could try overriding it after calling getStore. If that's not valid, you could probably create a PseudoSourceValue that aliases with everything. I suppose the hackiest thing would be marking the store volatile. The alternative would be to define a new MachineMemOperand flag. I really don't think we should have to go that far though. 
-Andy > On Tue, Mar 6, 2012 at 6:01 PM, Andrew Trick wrote: >> On Mar 6, 2012, at 5:05 PM, Akira Hatanaka wrote: >>> I am having trouble trying to enable post RA scheduler for the Mips backend. >>> >>> This is the bit code of the function I am compiling: >>> >>> (gdb) p MF.Fn->dump() >>> >>> define void @PointToHPoint(%struct.HPointStruct* noalias sret >>> %agg.result, %struct.ObjPointStruct* byval %P) nounwind { >>> entry: >>> %res = alloca %struct.HPointStruct, align 8 >>> %x2 = bitcast %struct.ObjPointStruct* %P to double* >>> %0 = load double* %x2, align 8 >>> >>> The third instruction is loading the first floating point double of >>> structure %P which is being passed by value. >>> >>> This is the machine function right after completion of isel: >>> (gdb) p MF->dump() >>> # Machine code for function PointToHPoint: >>> Frame Objects: >>> fi#-1: size=48, align=8, fixed, at location [SP+8] >>> fi#0: size=32, align=8, at location [SP] >>> Function Live Ins: %A0 in %vreg0, %A2 in %vreg1, %A3 in %vreg2 >>> >>> BB#0: derived from LLVM BB %entry >>> SW %vreg2, , 4; mem:ST4[FixedStack-1+4] CPURegs:%vreg2 >>> SW %vreg1, , 0; mem:ST4[FixedStack-1](align=8) CPURegs:%vreg1 >>> %vreg3 = COPY %vreg0; CPURegs:%vreg3,%vreg0 >>> %vreg4 = LDC1 , 0; mem:LD8[%x2] AFGR64:%vreg4 >>> >>> >>> The first two stores write the values in argument registers $6 and $7 >>> to frame object -1 >>> (Mips stores byval arguments passed in registers to the stack). >>> The fourth instruction LDC1 loads the value written by the first two >>> stores as a floating point double. 
>>> >>> This is the machine function just before post RA scheduling: >>> (gdb) p MF.dump() >>> # Machine code for function PointToHPoint: >>> Frame Objects: >>> fi#-1: size=48, align=8, fixed, at location [SP+8] >>> fi#0: size=32, align=8, at location [SP-32] >>> Function Live Ins: %A0 in %vreg0, %A2 in %vreg1, %A3 in %vreg2 >>> >>> BB#0: derived from LLVM BB %entry >>> Live Ins: %A0 %A2 %A3 >>> %SP = ADDiu %SP, -32 >>> PROLOG_LABEL >>> SW %A3, %SP, 44; mem:ST4[FixedStack-1+4] >>> SW %A2, %SP, 40; mem:ST4[FixedStack-1](align=8) >>> %D0 = LDC1 %SP, 40; mem:LD8[%x2] >>> >>> >>> The frame index operands of the first two stores and the fourth load >>> have been lowered to real addresses. >>> Since the first two SWs store to ($sp + 44) and ($sp + 40), and >>> instruction LDC1 loads from ($sp + 40), >>> there should be a dependency between these instructions. >>> >>> However, when ScheduleDAGInstrs::BuildSchedGraph(AliasAnalysis *AA) >>> builds the schedule graph, >>> there are no dependency edges added between the two SWs and LDC1 because >>> getUnderlyingObjectForInstr returns different objects for these instructions: >>> >>> underlying object of SWs: FixedStack-1 >>> underlying object of LDC1: struct.ObjPointStruct* %P >>> >>> >>> Is this a bug? >>> Or are there ways to tell BuildSchedGraph it should add dependency edges? >> >> This is a wild guess. But it looks to me like your load's machineMemOperand should have been converted to refer to the stack frame. I would call that an ISEL bug. I can't say where the bug is without stepping through a test case. >> >> Maybe someone who's worked in this area of ISEL can give you a better hint. In the meantime, I would file a PR. 
>> >> -Andy > From clchiou at gmail.com Tue Mar 13 04:19:13 2012 From: clchiou at gmail.com (Che-Liang Chiou) Date: Tue, 13 Mar 2012 17:19:13 +0800 Subject: [LLVMdev] GPU thread/block/grid size contraints in LLVM PTX backend In-Reply-To: References: Message-ID: You specify shader model, bit size and etc. arch-specified parameters though -march, -mattr and -mcpu, but AFAIK, PTX backend does not use the GPU thread/block/grid size information in optimization yet. On Mon, Mar 12, 2012 at 8:17 PM, Xin Tong wrote: > I am wondering that how does the LLVM PTX backend find out the > constraints on executing GPU thread/block/grid size ( i.e. a block can > at most have 1024 threads). Can anyone point me to the code ? I need > information in the optimizer, ?how can I get it ? > > Thanks > > Xin > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu ? ? ? ? http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From patrik.h.hagglund at ericsson.com Tue Mar 13 07:10:44 2012 From: patrik.h.hagglund at ericsson.com (=?iso-8859-1?Q?Patrik_H=E4gglund_H?=) Date: Tue, 13 Mar 2012 13:10:44 +0100 Subject: [LLVMdev] Assignment of large objects, optimization? In-Reply-To: <4F5E1542.7050709@free.fr> References: <4F5E1542.7050709@free.fr> Message-ID: Hi Duncan, > Try adding the nocapture attribute to the argument of set_obj. Thanks! My fault. However, that don't seems to make any difference in this example. Adding nocapture in use_obj as well does the trick, but I don't think that can be applied to the code from my front-end. (And it don't seems to work when replacing load+store with memcpy). Here is corresponding C code: typedef struct obj { unsigned arr[32]; } obj_t; void use_obj(obj_t *a1); obj_t set_obj(void); void foo() { obj_t a1, a2 = set_obj(); a1 = a2; use_obj(&a1); } (Both clang-trunk and gcc-4.6.2 retain the a1 = a2 copying. 
But one of our other compilers, partly developed in-house, seems to remove the copying.)

> SSA form makes assignments pointless.

At the LLVM assembler level (the interface for the front-end), redundancies are sometimes helpful, and therefore not completely pointless. For example, being able to do such things as %x = add %y, 0 (which implies %x := %y), may be convenient.

However, in this case, I mostly thought of memory objects, i.e. *a := *b instead of using memcpy (or load+store). (For example, byval parameters seem to be constructed by targets using memcpy nodes.)

I found another example (simple, but contrived), using Clang, where the reasoning about memory copies seems suboptimal:

typedef struct obj { unsigned arr[32]; } obj_t;
obj_t a;
obj_t bar(void) {
  obj_t b = a, c = b;
  return b; // ignoring c!
}

Using clang -c -Os (on x86-64) I get far from space-optimized code (due to memcpy replaced with load+store).

/Patrik Hägglund

-----Original Message-----
From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Duncan Sands
Sent: 12 March 2012 16:25
To: llvmdev at cs.uiuc.edu
Subject: Re: [LLVMdev] Assignment of large objects, optimization?

Hi Patrik,

> My front-end generates (bad) code, which I see that LLVM is unable to optimize.
> For example, code similar to:
> %a = type [32 x i16]
> declare void @set_obj(%a*)
> declare void @use_obj(%a*)
> define void @foo() {
> entry:
> %a1 = alloca %a
> %a2 = alloca %a
> call void @set_obj(%a* %a2)
> %a3 = load %a* %a2
> store %a %a3, %a* %a1
> call void @use_obj(%a* %a1)
> ret void
> }
> (Or with load/store replaced with memcpy).
> In C pseudo-code this is similar to:
> a a1;
> a a2 = set_obj();
> a1 = a2;
> use_obj(a1);
> and the corresponding LLVM IR in foo() can be simplified to:
> %a1 = alloca %a
> call void @set_obj(%a* %a1)
> call void @use_obj(%a* %a1)

no it can't. That's because set_obj may have remembered the address passed to it, for example by storing it in a global variable.
Then use_obj might compare the address passed to it with the address that set_obj stashes away, and make decisions based on whether they compare equal or not. > Is it unreasonable to expect LLVM to do this kind of simplifications? Try adding the nocapture attribute to the argument of set_obj. > On a side note: Why isn't there an assignment operator in the LLVM IR? Other > compilers I have seen have some kind of assignment operator in the IR. That's because LLVM IR is always in SSA form. SSA form makes assignments pointless. For example, suppose you could write %x := %y (assignment). Thanks to SSA form, you know that %x can only get a value once, and thus %y is that value: %x is equal to %y throughout the function. But then what's the point of %x? You might as well just use %y wherever you see %x. Ciao, Duncan. _______________________________________________ LLVM Developers mailing list LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From kursh at ispras.ru Tue Mar 13 08:27:58 2012 From: kursh at ispras.ru (Shamil Kurmangaleev) Date: Tue, 13 Mar 2012 17:27:58 +0400 Subject: [LLVMdev] MC JIT on ARM can't generate valid code for external functions call Message-ID: <4F5F4B5E.5030902@ispras.ru> Hello. We found the following problem with MC JIT, on ARM it can't generate valid code for instruction "bl " like: bl printf Because the ELF file in memory generated by MC JIT does not have the .plt section, but we need to have the following code to be emitted in it: .plt:00008290 STR LR, [SP,#-4]! .plt:00008294 LDR LR, =_GLOBAL_OFFSET_TABLE_ ; PIC mode .plt:00008298 NOP .plt:0000829C LDR PC, [LR,#8]! Also GOT section doesn't exists. To fix this we need to generate the valid entries in GOT and PLT sections We propose adding these sections and generating a thunk, same as in the usual compilation pipeline. What is the best way to fix these issues? 
--- Kurmangaleev Shamil,

From dengjunqi06323011 at gmail.com Tue Mar 13 08:31:42 2012
From: dengjunqi06323011 at gmail.com (Jun-qi Deng)
Date: Tue, 13 Mar 2012 21:31:42 +0800
Subject: [LLVMdev] llvm-config --cxxflags does not give the result the configuration script wants?
Message-ID:

Hello LLVM-DEV!

Recently I think I found a bug in llvm's CMakeLists. (I use llvm 3.1svn and Clang 3.1.) I followed the normal way to compile an application that uses both clang and llvm: I ./autogen.sh it, ./configure it and make it. But the make fails. In the end I found that the failure is because the Makefile's CXXFLAGS does not contain the -fno-rtti option, which is needed by the compilation. The "configure" file of the application uses llvm-config --cxxflags to set CXXFLAGS. But although I see that, on my platform, almost all the files inside LLVM and Clang are compiled with -fno-rtti, llvm-config --cxxflags does not report the flags those files were actually compiled with.

The following is something I thought valuable to mention here:

As I probed the CMAKE_CXX_FLAGS of some of the CMakeLists.txt files, I found there is a "baseline" value of this variable; mine is "-fPIC -fvisibility-inlines-hidden". This means that no matter how a single CMakeLists.txt changes CMAKE_CXX_FLAGS, it will be back at the baseline when it goes into another CMakeLists.txt.

Inside llvm-config's source folder, the CMakeLists.txt shows how its --cxxflags is set:

set(CXX_FLGS "${CMAKE_CXX_FLAGS} ${CMAKE_CXX_FLAGS_${uppercase_CMAKE_BUILD_TYPE}} ${LLVM_DEFINITIONS}")
...
COMMAND echo s!@LLVM_CXXFLAGS@!${CXX_FLGS}! >> ${SEDSCRIPT_OBJPATH}

and my LLVM_DEFINITIONS is:

-D_GNU_SOURCE -Wall -W -Wno-unused-parameter -Wwrite-strings -pedantic -Wno-long-long -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS

I also used "make VERBOSE=1" to check how llvm and clang are compiled. I see clearly that they are both compiled with -fno-rtti.
So, at least under my system, when using cmake to configure the project, no matter how the files are compiled, llvm-config --cxxflags will be the same. It is: -fPIC -fvisibility-inlines-hidden -D_GNU_SOURCE -Wall -W -Wno-unused-parameter -Wwrite-strings -pedantic -Wno-long-long -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS. It is determined by the "baseline" of CMAKE_CXX_FLAGS and LLVM_DEFINITIONS. Is this probably a problem? -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120313/f4434bf7/attachment.html From joerg at britannica.bec.de Tue Mar 13 08:47:06 2012 From: joerg at britannica.bec.de (Joerg Sonnenberger) Date: Tue, 13 Mar 2012 14:47:06 +0100 Subject: [LLVMdev] MC JIT on ARM can't generate valid code for external functions call In-Reply-To: <4F5F4B5E.5030902@ispras.ru> References: <4F5F4B5E.5030902@ispras.ru> Message-ID: <20120313134706.GB6633@britannica.bec.de> On Tue, Mar 13, 2012 at 05:27:58PM +0400, Shamil Kurmangaleev wrote: > Because the ELF file in memory generated by MC JIT does not have the > .plt section, but we need to have the following code to be emitted in it: Why do you need it to emit PIC? You know the offsets of all functions it is calling. Joerg From xerox.time.tech at gmail.com Tue Mar 13 08:58:31 2012 From: xerox.time.tech at gmail.com (Xin Tong) Date: Tue, 13 Mar 2012 09:58:31 -0400 Subject: [LLVMdev] GPU thread/block/grid size contraints in LLVM PTX backend In-Reply-To: References: Message-ID: but does it have default values ? Thanks Xin On Tue, Mar 13, 2012 at 5:19 AM, Che-Liang Chiou wrote: > You specify shader model, bit size and etc. arch-specified parameters > though -march, -mattr and -mcpu, but AFAIK, PTX backend does not use > the GPU thread/block/grid size information in optimization yet. 
> > On Mon, Mar 12, 2012 at 8:17 PM, Xin Tong wrote: >> I am wondering that how does the LLVM PTX backend find out the >> constraints on executing GPU thread/block/grid size ( i.e. a block can >> at most have 1024 threads). Can anyone point me to the code ? I need >> information in the optimizer, ?how can I get it ? >> >> Thanks >> >> Xin >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu ? ? ? ? http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From ofv at wanadoo.es Tue Mar 13 09:31:41 2012 From: ofv at wanadoo.es (=?utf-8?Q?=C3=93scar_Fuentes?=) Date: Tue, 13 Mar 2012 15:31:41 +0100 Subject: [LLVMdev] llvm-config --cxxflags does not give the result the configuration script wants? References: Message-ID: <87aa3klghe.fsf@wanadoo.es> Jun-qi Deng writes: > Recently I think I found a bug in llvm's CMakeLists.(I use llvm 3.1svn and > Clang 3.1) I follow the normal way to try to compile an application that > utilize both clang and llvm: I ./autogen.sh it, ./configure it and make it. > But the make fails. At last I found out the failure is because that the > Makefile's CXXFLAGS does not contain -fno-rtti option which is needed by > the compilation. Is it? In my experience, it isn't. Please show the relevant command generated by your makefile and the associated error message(s). RTTI is an on/off option that changes per LLVM library, so setting -fno-rtti for using LLVM makes no sense. VMCore and Support have -frtti while most of the rest have -fno-rtti. The switch is decided on cmake/modules/LLVMProcessSources.cmake depending on the value of LLVM_REQUIRES_RTTI. [snip] From baldrick at free.fr Tue Mar 13 10:26:25 2012 From: baldrick at free.fr (Duncan Sands) Date: Tue, 13 Mar 2012 16:26:25 +0100 Subject: [LLVMdev] Your commit 103140 Message-ID: <4F5F6721.60308@free.fr> Hi Chris, your commit 103140 broke PR397 for llvm-gcc (in LLVM 2.9) and dragonegg. 
In the PR, asm renaming creates two linkonce functions with the same asm name (in the IR they are @"\01lstat64" and "@lstat64". What used to happen is that they were both output to the assembler file, both with the name lstat64, exactly like GCC does. The assembler and linker are perfectly happy about this, presumably because the functions have weak linkage. What happens now is that compilation fails with "label emitted multiple times to assembly file". Do you agree that it is reasonable to support outputting multiple functions with the same name, as long as they have weak linkage? Ciao, Duncan. PS: The alternative is to follow clang and have the front-end take care of dropping one of the functions, rather than leaving it to the linker. > Index: test/CodeGen/X86/label-redefinition.ll > =================================================================== > --- test/CodeGen/X86/label-redefinition.ll (revision 0) > +++ test/CodeGen/X86/label-redefinition.ll (revision 103140) > @@ -0,0 +1,15 @@ > +; PR7054 > +; RUN: not llc %s -o - |& grep {'_foo' label emitted multiple times to assembly} > +target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:64:64-v128:128:128-a0:0:64-f80:128:128-n8:16:32" > +target triple = "i386-apple-darwin10.0.0" > + > +define i32 @"\01_foo"() { > + unreachable > +} > + > +define i32 @foo() { > +entry: > + unreachable > +} > + > +declare i32 @xstat64(i32, i8*, i8*) > Index: lib/CodeGen/AsmPrinter/AsmPrinter.cpp > =================================================================== > --- lib/CodeGen/AsmPrinter/AsmPrinter.cpp (revision 103139) > +++ lib/CodeGen/AsmPrinter/AsmPrinter.cpp (revision 103140) > @@ -408,7 +408,13 @@ > /// EmitFunctionEntryLabel - Emit the label that is the entrypoint for the > /// function. This can be overridden by targets as required to do custom stuff. 
> void AsmPrinter::EmitFunctionEntryLabel() { > - OutStreamer.EmitLabel(CurrentFnSym); > + // The function label could have already been emitted if two symbols end up > + // conflicting due to asm renaming. Detect this and emit an error. > + if (CurrentFnSym->isUndefined()) > + return OutStreamer.EmitLabel(CurrentFnSym); > + > + report_fatal_error("'" + Twine(CurrentFnSym->getName()) + > + "' label emitted multiple times to assembly file"); > } > From arnamoy at ualberta.ca Tue Mar 13 11:30:17 2012 From: arnamoy at ualberta.ca (Arnamoy Bhattacharyya) Date: Tue, 13 Mar 2012 10:30:17 -0600 Subject: [LLVMdev] About Implementation of Pettis-Hansen's / Gloy's Code Layout Transformation in LLVM Message-ID: Hi; I was planning to implement a profile guided optimization technique in LLVM. In the open source projects list of the LLVM site; I saw "code layout" is a transformation that can be worth looking at as it will use of profiles (possibly path profiles). So I was thinking of implementing either Pettis-Hansen's (Profile guided code positioning, Pettis & Hansen) or Gloy's (Procedure Placement Using Temporal-Ordering Information, Gloy, Smith) procedure placement algorithm in LLVM because first of all I have not seen any implementation of it in LLVM till** and second, these are the classic procedure placement algorithms. So I would like to get some advice from the seniors - 1. Has there been any attempt to implement these before that I don't know about. And what was the success/ failure of that implementation? 2. Gloy's algorithm aims at reducing the I-Cache misses. But how wise would it be to aim at that to optimize performance while the LLVM already does something to reduce I-Cache misses? To rephrase my question, is there really any scope of improvement for I-Cache misses? 
(I know the answer depends on the kind of application we are trying to compile, but let's say we are using applications which can have a large number of I-Cache conflict misses and have a large Working Set as well e.g gcc, go, postscript etc. I really don't know how well LLVM handles I-Cache misses for these programs. I mentioned them because these are the benchmarks Gloy used to measure performance of his transformation and they have interesting instruction memory behaviour ) 3. Is this a good idea in terms of the complexity of implementing it? (To be frank, I will be doing this work for my Master's thesis and I have just more than a year in my hand) Any comment on my idea (whether it is stupid / wise / can't tell without actually implementing it) would be appreciated. Also any pointers to how I-Cache misses are handled (reduced) by LLVM will be good. Thank you for your help; ** - I have seen that there is already PH's Basic Block Placement transformation is implemented in LLVM but not Code Layout. -- Arnamoy Bhattacharyya Athabasca Hall 143 Department of Computing Science - University of Alberta Edmonton, Alberta, Canada, T6G 2E8 780-680-7073 -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120313/4e0446ad/attachment.html From tobias.von.koch at gmail.com Tue Mar 13 11:39:14 2012 From: tobias.von.koch at gmail.com (Tobias von Koch) Date: Tue, 13 Mar 2012 16:39:14 +0000 Subject: [LLVMdev] How to keep FunctionPass analysis result alive in Module Pass? In-Reply-To: <4F5A8592.4000600@illinois.edu> References: <3EDC7AC1-478A-41CD-A970-7D75B5BD0E67@csail.mit.edu> <4F5A8210.3010004@illinois.edu> <7DB7911A-4FBD-4601-A117-D6FF5441636A@csail.mit.edu> <4F5A8592.4000600@illinois.edu> Message-ID: Hi John & Fan, I hit the exact same problem today. 
I can confirm that Fan's observation of getting the *same* LoopInfo* from subsequent calls to getAnalysis(function) for *distinct* functions is indeed true. I was very surprised by this at first as well, but I think I've found an explanation - please anyone correct me if this is wrong: What you're getting from getAnalysis<>(function) is a reference to the function pass after it has been run on the specified function. While you can run a function pass on many different functions, there still exists only *one* instance of the pass itself. The only thing that changes between different calls to getAnalysis(F) is the analysis information held by the LoopInfo pass in its LoopInfoBase member. It gets released and overwritten on every call to LoopInfo::runOnFunction() - see the call to releaseMemory() right at the beginning. The idea of creating some sort of Map of Function* ----> LoopInfo* therefore won't work. It also doesn't make sense to keep Loop* pointers around after getAnalysis() has been called again because all that memory gets released (which is how I hit this problem)... Now, Fan, the practical consequence of this is that if you want to use LoopInfo in a ModulePass, you either have to do all your work that uses LoopInfo in between getAnalysis calls (if that's possible you're probably better off writing a FunctionPass in the first place) *OR* keep re-running getAnalysis which is very inefficient. I'd imagine the same goes for DominatorTree. In general, it would be nice if there was some logical separation between a *Function Pass *and the *Analysis Information *it produces. For LoopInfo, it's kind of there since all the data is in this LoopInfoBase object but there is no way of taking ownership of that... -- Tobias On Fri, Mar 9, 2012 at 22:34, John Criswell wrote: > On 3/9/12 4:28 PM, Fan Long wrote: > > Thank you for your quick reply. > > Actually I am using a std::map to map Function* to LoopInfo*, but that > does not help in this case. 
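The behaviour described in this thread (one pass instance whose results are released and overwritten each time it runs) can be modelled without LLVM at all. The sketch below is a toy stand-in, not the LLVM API: `ToyLoopInfo`, `ToyPassManager` and their members are hypothetical names, and the "loops" are plain strings. It only illustrates why a map from function to analysis pointer ends up with every entry pointing at the same object, and why results have to be taken out before the next run. What would need to be copied out of the real LoopInfo depends on the client, and this sketch does not address that.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy stand-in for a function pass: a single instance whose results are
// overwritten on every run. NOT the LLVM API; all names are hypothetical.
class ToyLoopInfo {
public:
    void runOnFunction(const std::string &fn, int loopCount) {
        loops.clear();  // plays the role of releaseMemory() in the thread
        for (int i = 0; i < loopCount; ++i)
            loops.push_back(fn + ".loop" + std::to_string(i));
    }
    const std::vector<std::string> &getLoops() const { return loops; }

private:
    std::vector<std::string> loops;
};

// Toy pass manager: owns exactly one pass object and re-runs it on demand,
// always handing back a reference to that same object.
class ToyPassManager {
public:
    ToyLoopInfo &getAnalysis(const std::string &fn, int loopCount) {
        pass.runOnFunction(fn, loopCount);
        return pass;
    }

private:
    ToyLoopInfo pass;
};
```

With this model, storing `&pm.getAnalysis("A", 2)` and `&pm.getAnalysis("B", 1)` in a `std::map<std::string, ToyLoopInfo *>` yields two equal pointers, and the entry for "A" then reports the single loop of "B". Storing a copy of `getLoops()` right after each call keeps the per-function results apart.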
Each time I call > getAnalysis(*F), it returns the same instance of > llvm::LoopInfo, so the std::map is just mapping every function into the > same instance. It seems only the analysis result for the last function is > valid, because all the result for all previous functions are erased. > > > Just to make sure I understand: you are saying that every time you call > getAnalysis(), you get the *same* LoopInfo * regardless of > whether you call it on the same function or on a different function. Is > that correct? > > Getting the same LoopInfo * when you call getAnalysis<> on the same > function twice would not surprise me. Getting the same LoopInfo * when you > call getAnalysis on F1 and F2 where F1 and F2 are different functions would > surprise me greatly. > > > > The only workaround solution I have now is to copy all analysis result > out of the data structure of LoopInfo before I call next &getAnalysis(). > Because llvm::LoopInfo does not provide copy method, this will be very > dirty to do so. > > > Yes, that may be what you have to do. > > > -- John T. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120313/d87f3053/attachment.html From criswell at illinois.edu Tue Mar 13 11:41:52 2012 From: criswell at illinois.edu (John Criswell) Date: Tue, 13 Mar 2012 11:41:52 -0500 Subject: [LLVMdev] How to keep FunctionPass analysis result alive in Module Pass? In-Reply-To: References: <3EDC7AC1-478A-41CD-A970-7D75B5BD0E67@csail.mit.edu> <4F5A8210.3010004@illinois.edu> <7DB7911A-4FBD-4601-A117-D6FF5441636A@csail.mit.edu> <4F5A8592.4000600@illinois.edu> Message-ID: <4F5F78D0.8020907@illinois.edu> On 3/13/12 11:39 AM, Tobias von Koch wrote: > Hi John & Fan, > > I hit the exact same problem today. I can confirm that Fan's > observation of getting the /*same*/ LoopInfo* from subsequent calls to > getAnalysis(function) for /*distinct*/ functions is indeed true. 
> > I was very surprised by this at first as well, but I think I've found > an explanation - please anyone correct me if this is wrong: > > What you're getting from getAnalysis<>(function) is a reference to the > function pass after it has been run on the specified function. While > you can run a function pass on many different functions, there still > exists only *one* instance of the pass itself. The only thing that > changes between different calls to getAnalysis(F) is the > analysis information held by the LoopInfo pass in its LoopInfoBase > member. It gets released and overwritten on every call to > LoopInfo::runOnFunction() - see the call to releaseMemory() right at > the beginning. That seems like a reasonable explanation. > > The idea of creating some sort of Map of Function* ----> LoopInfo* > therefore won't work. It also doesn't make sense to keep Loop* > pointers around after getAnalysis() has been called again > because all that memory gets released (which is how I hit this problem)... > > Now, Fan, the practical consequence of this is that if you want to use > LoopInfo in a ModulePass, you either have to do all your work that > uses LoopInfo in between getAnalysis calls (if that's > possible you're probably better off writing a FunctionPass in the > first place) /OR/ keep re-running getAnalysis which is very > inefficient. I'd imagine the same goes for DominatorTree. > > In general, it would be nice if there was some logical separation > between a /Function Pass /and the /Analysis Information /it produces. > For LoopInfo, it's kind of there since all the data is in this > LoopInfoBase object but there is no way of taking ownership of that... Can't you just copy the analysis results out of LoopInfo as Fan suggested? I would think that if you can query it, you can copy it. -- John T. > > -- Tobias > > > On Fri, Mar 9, 2012 at 22:34, John Criswell > wrote: > > On 3/9/12 4:28 PM, Fan Long wrote: >> Thank you for your quick reply. 
>> >> Actually I am using a std::map to map Function* to LoopInfo*, but >> that does not help in this case. Each time I call >> getAnalysis(*F), it returns the same instance of >> llvm::LoopInfo, so the std::map is just mapping every function >> into the same instance. It seems only the analysis result for the >> last function is valid, because all the result for all previous >> functions are erased. > > Just to make sure I understand: you are saying that every time you > call getAnalysis(), you get the *same* LoopInfo * > regardless of whether you call it on the same function or on a > different function. Is that correct? > > Getting the same LoopInfo * when you call getAnalysis<> on the > same function twice would not surprise me. Getting the same > LoopInfo * when you call getAnalysis on F1 and F2 where F1 and F2 > are different functions would surprise me greatly. > > >> >> The only workaround solution I have now is to copy all analysis >> result out of the data structure of LoopInfo before I call next >> &getAnalysis(). Because llvm::LoopInfo does not provide copy >> method, this will be very dirty to do so. > > Yes, that may be what you have to do. > > > -- John T. > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120313/cc87a0c7/attachment.html From tobias.von.koch at gmail.com Tue Mar 13 12:03:44 2012 From: tobias.von.koch at gmail.com (Tobias von Koch) Date: Tue, 13 Mar 2012 17:03:44 +0000 Subject: [LLVMdev] How to keep FunctionPass analysis result alive in Module Pass? In-Reply-To: <4F5F78D0.8020907@illinois.edu> References: <3EDC7AC1-478A-41CD-A970-7D75B5BD0E67@csail.mit.edu> <4F5A8210.3010004@illinois.edu> <7DB7911A-4FBD-4601-A117-D6FF5441636A@csail.mit.edu> <4F5A8592.4000600@illinois.edu> <4F5F78D0.8020907@illinois.edu> Message-ID: Hi John, glad the explanation made sense :) On Tue, Mar 13, 2012 at 16:41, John Criswell wrote: [...] 
> Can't you just copy the analysis results out of LoopInfo as Fan > suggested? I would think that if you can query it, you can copy it. > > Well, LoopInfoBase has a private copy constructor (for good reasons) so you can't just copy the entire thing out. The same goes for the Loop class. At the end of the day, you just have to do the actual work already at this point which you were going to do with the analysis results later on. That's possible in my case (although it means I have to restructure my module pass quite a bit), but it might not always be? -- Tobias -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120313/fd13428c/attachment.html From hfinkel at anl.gov Tue Mar 13 12:40:55 2012 From: hfinkel at anl.gov (Hal Finkel) Date: Tue, 13 Mar 2012 12:40:55 -0500 Subject: [LLVMdev] OpenMP support for LLVM In-Reply-To: <4F169BAD.6040401@playingwithpointers.com> References: <1315508012.73397.YahooMailNeo@web130205.mail.mud.yahoo.com> <4E6E28D0.1070401@grosser.es> <1315844883.92889.YahooMailNeo@web130213.mail.mud.yahoo.com> <4E6E3622.9020701@grosser.es> <4F07AC84.2030501@grosser.es> <4F144CE5.3090609@grosser.es> <1326732428.14506.382.camel@sapling> <4F169BAD.6040401@playingwithpointers.com> Message-ID: <20120313124055.169023ed@sapling2> On Wed, 18 Jan 2012 15:45:09 +0530 Sanjoy Das wrote: > Hi all, > > I'd like to put in some effort into this too -- perhaps I can write an > backend for libgomp while someone else works on a libmpc one. > > As far as the architecture is concerned, I concur with what has > already been discussed: mapping OpenMP constructs to LLVM intrinsics. > I think it would make sense to leave out intrinsics for things like > "parallel for", "parallel sections", which are essentially syntactic > sugar. I agree. 
> > One way to denote a structured block could be by nesting it within two > intrinsics like I am not sure how well this will work in practice, but we should come up with a plan. As far as I can tell, we have two options: intrinsics and metadata; we may want to use some combination of the two. One key issue is to decide how optimization passes will interact with the parallelization constructs. I think that it is particularly important that using OpenMP does not preclude loop unrolling and LICM, for example. As you mention, we'll also want to provide the ability to do target-specific lowering of certain constructs (like atomics). We also want to make sure that we can recognize and optimize expressions that are constant for a given thread within a parallel region (like a get-thread-id operation). Thoughts? -Hal > > llvm.openmp.parallel_begin (llvm.openmp.clause_if(%1) ) > body > llvm.openmp.parallel_end() > > or we could pass in an end label to parallel_begin (we can then call > it parallel instead of parallel_begin). I'm not sure which one is the > better idea. There will have to be restrictions on the body, of > course. We can't allow control to jump outside body without > encountering the barrier, for instance. > > The more difficult problem would be, I think, to lift the structured > block and create a function out of it that takes a closure. > > I also think there is potential for optimizing things like the > reduction clause and the atomic construct by lowering them directly > into CAS or other machine instructions. But I'll stop here before I > get ahead of myself. :) > > Thanks! 
--
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory

From rafael.espindola at gmail.com Tue Mar 13 13:06:02 2012
From: rafael.espindola at gmail.com (Rafael Espíndola)
Date: Tue, 13 Mar 2012 15:06:02 -0300
Subject: [LLVMdev] Your commit 103140
In-Reply-To: <4F5F6721.60308@free.fr>
References: <4F5F6721.60308@free.fr>
Message-ID:

On 13 March 2012 12:26, Duncan Sands wrote:
> Hi Chris, your commit 103140 broke PR397 for llvm-gcc (in LLVM 2.9) and
> dragonegg. In the PR, asm renaming creates two linkonce functions with
> the same asm name (in the IR they are @"\01lstat64" and "@lstat64". What
> used to happen is that they were both output to the assembler file, both
> with the name lstat64, exactly like GCC does. The assembler and linker
> are perfectly happy about this, presumably because the functions have weak
> linkage. What happens now is that compilation fails with "label emitted
> multiple times to assembly file".
>
> Do you agree that it is reasonable to support outputting multiple functions
> with the same name, as long as they have weak linkage?
>
> Ciao, Duncan.
>
> PS: The alternative is to follow clang and have the front-end take care of
> dropping one of the functions, rather than leaving it to the linker.

If you can implement this I think it is better. Having two functions
with the same name can cause problems for libLTO, for example. Which
function should it use?

Cheers,
Rafael

From kursh at ispras.ru Tue Mar 13 15:19:05 2012
From: kursh at ispras.ru (kursh)
Date: Wed, 14 Mar 2012 00:19:05 +0400
Subject: [LLVMdev] MC JIT on ARM can't generate valid code for external functions call
In-Reply-To: <20120313134706.GB6633@britannica.bec.de>
References: <4F5F4B5E.5030902@ispras.ru> <20120313134706.GB6633@britannica.bec.de>
Message-ID:

The BL instruction in ARM mode can only branch to an offset within +/- 32 MB.
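To make the range limit concrete, here is a minimal sketch of the check a loader could make before deciding whether a direct BL is enough. It is not taken from the LLVM sources, and the names fitsInArmBL and checkArmBLRange are made up for illustration. It assumes the ARM-mode encoding, where BL holds a signed 24-bit word offset that is added to PC, and PC reads as the address of the BL instruction plus 8.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, for illustration only (not an LLVM API).
// ARM-mode BL stores a signed 24-bit immediate that is shifted left by 2
// and added to PC; PC reads as the address of the BL instruction plus 8.
// The reachable displacements are therefore -2^25 .. 2^25 - 4 bytes,
// and they must be multiples of 4.
inline bool fitsInArmBL(std::uint64_t blAddr, std::uint64_t target) {
  const std::int64_t disp = static_cast<std::int64_t>(target) -
                            static_cast<std::int64_t>(blAddr + 8);
  const std::int64_t limit = std::int64_t{1} << 25; // 32 MB
  return (disp % 4 == 0) && disp >= -limit && disp <= limit - 4;
}

// Small sanity check of the boundaries; call it from a test or from main().
inline void checkArmBLRange() {
  const std::uint64_t bl = 0x4000000;               // BL placed at 64 MB
  const std::uint64_t pc = bl + 8;                  // value PC reads as
  const std::uint64_t span = std::uint64_t{1} << 25;
  assert(fitsInArmBL(bl, pc + span - 4));           // farthest forward target
  assert(!fitsInArmBL(bl, pc + span));              // one word too far forward
  assert(fitsInArmBL(bl, pc - span));               // farthest backward target
  assert(!fitsInArmBL(bl, pc - span - 4));          // one word too far backward
}
```

When such a check fails, the call cannot be encoded as a single BL and has to go through some stub that materializes the full address.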
With an absolute address, we would need to generate a few additional
instructions, for example: load the address into a register and branch to
that register, or save LR and load the address into PC. But in both cases,
changing the code size of the function may invalidate the offsets computed
by the "ARM constant island placement and branch shortening pass", so we
still need the thunk.

---
Kurmangaleev Shamil

13.03.2012 17:47, Joerg Sonnenberger wrote:
> On Tue, Mar 13, 2012 at 05:27:58PM +0400, Shamil Kurmangaleev wrote:
>> Because the ELF file in memory generated by MC JIT does not have the
>> .plt section, but we need to have the following code to be emitted in it:
>
> Why do you need it to emit PIC? You know the offsets of all functions
> it is calling.
>
> Joerg
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120314/ea508aab/attachment.html

From baldrick at free.fr Tue Mar 13 15:48:43 2012
From: baldrick at free.fr (Duncan Sands)
Date: Tue, 13 Mar 2012 21:48:43 +0100
Subject: [LLVMdev] Your commit 103140
In-Reply-To: <8C54CF4A-54E8-4E4A-8F49-623515A8C259@apple.com>
References: <4F5F6721.60308@free.fr> <8C54CF4A-54E8-4E4A-8F49-623515A8C259@apple.com>
Message-ID: <4F5FB2AB.5020007@free.fr>

Hi Chris,

On 13/03/12 21:11, Chris Lattner wrote:
>
> On Mar 13, 2012, at 8:26 AM, Duncan Sands wrote:
>
>> Hi Chris, your commit 103140 broke PR397 for llvm-gcc (in LLVM 2.9) and
>> dragonegg.
>
> Wow, this is an old patch :).
>
>> In the PR, asm renaming creates two linkonce functions with
>> the same asm name (in the IR they are @"\01lstat64" and "@lstat64".
What >> used to happen is that they were both output to the assembler file, both >> with the name lstat64, exactly like GCC does. The assembler and linker >> are perfectly happy about this, presumably because the functions have weak >> linkage. What happens now is that compilation fails with "label emitted >> multiple times to assembly file". >> >> Do you agree that it is reasonable to support outputting multiple functions >> with the same name, as long as they have weak linkage? > > No, I don't. I think that an IR module should be required to be well defined and obey the rules. For GCC/clang (and any other compilers that support things like asm renaming and USER_LABEL_PREFIX), I think it is best for the frontend to not use the "\01" prefix in a case that conflicts with the normal USER_LABEL_PREFIX. For example, on an _'y system, if asm-renamed to "_foo", the IR name should be just @"foo", not "\01_foo". I think I agree. My agreement is helped by noticing that the assembler produced by gcc for the testcase from PR397 doesn't assemble! One of my side worries was that the MC layer wouldn't be able to assemble the assembler produced by gcc if it contained multiple functions with the same name, but since gas rejects it too there is no compatibility issue there after all. Ciao, Duncan. From clattner at apple.com Tue Mar 13 15:11:30 2012 From: clattner at apple.com (Chris Lattner) Date: Tue, 13 Mar 2012 13:11:30 -0700 Subject: [LLVMdev] Your commit 103140 In-Reply-To: <4F5F6721.60308@free.fr> References: <4F5F6721.60308@free.fr> Message-ID: <8C54CF4A-54E8-4E4A-8F49-623515A8C259@apple.com> On Mar 13, 2012, at 8:26 AM, Duncan Sands wrote: > Hi Chris, your commit 103140 broke PR397 for llvm-gcc (in LLVM 2.9) and > dragonegg. Wow, this is an old patch :). > In the PR, asm renaming creates two linkonce functions with > the same asm name (in the IR they are @"\01lstat64" and "@lstat64". 
What > used to happen is that they were both output to the assembler file, both > with the name lstat64, exactly like GCC does. The assembler and linker > are perfectly happy about this, presumably because the functions have weak > linkage. What happens now is that compilation fails with "label emitted > multiple times to assembly file". > > Do you agree that it is reasonable to support outputting multiple functions > with the same name, as long as they have weak linkage? No, I don't. I think that an IR module should be required to be well defined and obey the rules. For GCC/clang (and any other compilers that support things like asm renaming and USER_LABEL_PREFIX), I think it is best for the frontend to not use the "\01" prefix in a case that conflicts with the normal USER_LABEL_PREFIX. For example, on an _'y system, if asm-renamed to "_foo", the IR name should be just @"foo", not "\01_foo". -Chris From andrew.kaylor at intel.com Tue Mar 13 17:24:55 2012 From: andrew.kaylor at intel.com (Kaylor, Andrew) Date: Tue, 13 Mar 2012 22:24:55 +0000 Subject: [LLVMdev] MC JIT on ARM can't generate valid code for external functions call In-Reply-To: <4F5F4B5E.5030902@ispras.ru> References: <4F5F4B5E.5030902@ispras.ru> Message-ID: <0983E6C011D2DC4188F8761B533492DE0B2254@ORSMSX105.amr.corp.intel.com> There is a patch in progress (http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20120220/137666.html -- I think it's still just in progress) which refactors the MCJIT dynamic loading and adds some new features. I believe that support for external functions on ARM is one of the features that was added. -Andy -----Original Message----- From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Shamil Kurmangaleev Sent: Tuesday, March 13, 2012 6:28 AM To: llvmdev at cs.uiuc.edu Subject: [LLVMdev] MC JIT on ARM can't generate valid code for external functions call Hello. 
We found the following problem with MC JIT: on ARM it can't generate
valid code for an instruction "bl " such as:

    bl printf

This is because the ELF file in memory generated by MC JIT does not have
a .plt section, but we need the following code to be emitted in it:

    .plt:00008290    STR    LR, [SP,#-4]!
    .plt:00008294    LDR    LR, =_GLOBAL_OFFSET_TABLE_ ; PIC mode
    .plt:00008298    NOP
    .plt:0000829C    LDR    PC, [LR,#8]!

The GOT section doesn't exist either. To fix this we need to generate
valid entries in the GOT and PLT sections.

We propose adding these sections and generating a thunk, the same as in
the usual compilation pipeline.

What is the best way to fix these issues?

---
Kurmangaleev Shamil,
_______________________________________________
LLVM Developers mailing list
LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

From davidterei at gmail.com Tue Mar 13 18:36:16 2012
From: davidterei at gmail.com (David Terei)
Date: Tue, 13 Mar 2012 16:36:16 -0700
Subject: [LLVMdev] LLVM GHC Backend: Tables Next To Code
In-Reply-To: <607B6576-F8F5-4293-900D-EFA21B7FD48C@apple.com>
References: <607B6576-F8F5-4293-900D-EFA21B7FD48C@apple.com>
Message-ID:

Hi Chris,

One remaining question here: if the GHC team tries some of these
alternative schemes and finds them unsatisfactory, what is the LLVM
community's feeling with regard to extending LLVM IR to support
implementing TNTC directly? How do you envision this would look at the IR
level, how much work do you think it would be, and, most importantly, do
you feel LLVM would be willing to accept patches for it?

We simply post-process the assembly right now to get the desired code,
and it works very well, but it means we can't use the integrated
assembler, which annoys me.

Cheers,
David.

On 14 February 2012 02:59, Chris Lattner wrote:
> On Feb 13, 2012, at 6:49 AM, Sergiu Ivanov wrote:
>> On behalf of GHC hackers, I would like to discuss the possibility of
>> having a proper implementation of the tables-next-to-code optimisation
>> in LLVM.
>
> It would be great to have this. However, the design will be tricky. Is
> there anything that spells out how the TNTC optimization works at the
> actual machine instruction level? It seems that there should be a blog
> post somewhere that shows the code with and without the optimization,
> but I can't find it offhand.
>
>> This, obviously, requires certain
>> ordering of data and text in the object code. Since LLVM does not
>> make it possible to explicitly control the placement of data and code,
>> the necessary ordering is currently achieved by injecting GNU
>> Assembler subsections on platforms supported by GNU Assembler. Mac
>> assembler, however, does not support this feature, so the resulting
>> object code is post-processed directly.
>
> It's interesting that you bring this up. It turns out that on the mac
> toolchain (unless you disable subsectionsviasymbol, a gross hack) does
> not give you the ability to control the ordering of blobs of code
> separated by global labels (aka 'atoms' in the linker's terminology).
> This is important because it enables link-time dead code elimination,
> profile based code reordering etc. My understanding is that ELF
> toolchains don't have something like this, but it would be unfortunate
> if TNTC fundamentally prevents something like this from working.
>
> Beyond this, the proposed model has some other issues: code ordering
> only makes sense within a linker section, but modeling "the table" and
> "the code" as two different LLVM values (a global value and a function)
> would mean that the optimizer will be tempted to put them into different
> sections, do dead code elimination, etc.
> >> He proposes adding a "placebefore" >> attribute to global variables (or, similarly, a "placeafter" attribute >> for functions). ?The corresponding example is: > > This is a non-starter for a few reasons, but that doesn't mean that there aren't other reasonable options. ?I'd really like to see the codegen that you guys are after to try to help come up with another suggestion that isn't a complete one-off hack for GHC. :) > > One random question: have you considered placing the table *inside* of the function? ?If the prologue for the closure was effectively: > > Closure: > ?jmp .LAfterTable > ?.word ... > ?.word ... > .LAfterTable: > ?push $rbp > ?... > > then you can avoid a lot of problems. ?I realize that this is not going to be absolutely as fast as your current TNTC implementation, but processors are *really really* good at predicting unconditional branches, so the cost is probably minimal, and it is likely to be much much faster than not having TNTC at all. > > Getting even this to work will not be fully straight-forward, but again I'd like to understand more of what you're looking for from codegen to understand what the constraints are. > > -Chris > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu ? ? ? ? http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From dengjunqi06323011 at gmail.com Wed Mar 14 06:20:13 2012 From: dengjunqi06323011 at gmail.com (Jun-qi Deng) Date: Wed, 14 Mar 2012 19:20:13 +0800 Subject: [LLVMdev] llvm-config --cxxflags does not give the result the configuration script wants? In-Reply-To: <87aa3klghe.fsf@wanadoo.es> References: <87aa3klghe.fsf@wanadoo.es> Message-ID: > > Recently I think I found a bug in llvm's CMakeLists.(I use llvm 3.1svn > and > > Clang 3.1) I follow the normal way to try to compile an application that > > utilize both clang and llvm: I ./autogen.sh it, ./configure it and make > it. > > But the make fails. 
At last I found out the failure is because that the > > Makefile's CXXFLAGS does not contain -fno-rtti option which is needed by > > the compilation. > > Is it? In my experience, it isn't. Please show the relevant command > generated by your makefile and the associated error message(s). > > RTTI is an on/off option that changes per LLVM library, so setting > -fno-rtti for using LLVM makes no sense. VMCore and Support have -frtti > while most of the rest have -fno-rtti. The switch is decided on > cmake/modules/LLVMProcessSources.cmake depending on the value of > LLVM_REQUIRES_RTTI. > > > Hello! I'm sorry, because I've already solved the build problem with the mentioned app manually, now it's not very convenient for me to reproduce the error for the moment. I agree that "RTTI is an on/off option". But I think the point of the problem I mentioned here is: some applications that depend on llvm and clang use llvm-config to configure their Makefile, while the llvm-config cannot provide the correct information they need. And this is what actually happened under my platform. Regards, TangKK > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120314/20d10cbf/attachment.html From senthilkumar_ttv at yahoo.com Wed Mar 14 07:52:45 2012 From: senthilkumar_ttv at yahoo.com (Senthil Kumar) Date: Wed, 14 Mar 2012 05:52:45 -0700 (PDT) Subject: [LLVMdev] ARM EHABI support in LLVM + clang Message-ID: <1331729565.85187.YahooMailNeo@web140805.mail.bf1.yahoo.com> Hi all, ????????I found some problem when trying to use exception handling with LLVM (SVN revision 152113)?+?clang combination (SVN revision 152115).?I am observing failure in __gnu_unwind_pr_common unwind-arm.c (gcc/config/arm dir in gcc 4.5.3 source). 
Personality routine 0 is used.

Inside __gnu_unwind_pr_common, after
"switch (((offset & 1) << 1) | (len & 1))", control reaches
default: return _URC_FAILURE. offset was == 0x808f and len was == 0xff.
(((offset & 1) << 1) | (len & 1)) was == 3. 3 is an undefined descriptor
as per ARM EHABI spec section 9.2. In DwarfException::EmitExceptionTable,
the call

    Asm->EmitULEB128(TTypeBaseOffset, "@TType base offset", SizeAlign);

was generating

    .asciz  "\217\200"             @ @TType base offset

in the .s file, leading to the problem.

I would like to know whether anyone has managed to use ARM EHABI
successfully. Also, I can see from the 3.0 release html notes that EHABI
support might be there in 3.1, although it was only in an html comment.
Can someone let me know what the current state of ARM EHABI support for
C++ exceptions is? I am not sure whether
DwarfException::EmitExceptionTable is the appropriate place to make ARM
EHABI specific changes. If someone is already working on this, please let
me know your comments. The steps I followed are listed below.

$LLVM_BIN_PATH/clang --sysroot=$SYSROOT a.cpp -funwind-tables -ccc-host-triple arm-linux-gnueabi -mcpu=cortex-a9 -emit-llvm -c -o a.bc
$LLVM_BIN_PATH/llc -mtriple=arm-linux-gnueabi -arm-enable-ehabi -arm-enable-ehabi-descriptors -march=arm a.bc
arm-linux-gnueabi-g++ --sysroot=$SYSROOT -static a.s -o a.bin

==== a.cpp ====
class A {};

void throwA(int x) {
  if (x == 0)
    throw A();
}

int main() {
  try {
    throwA(0);
  } catch (A a) {
    return 0;
  }
  return 1;
}

Thanks in advance
Senthil Kumar

From ofv at wanadoo.es Wed Mar 14 08:00:16 2012
From: ofv at wanadoo.es (Óscar Fuentes)
Date: Wed, 14 Mar 2012 14:00:16 +0100
Subject: [LLVMdev] llvm-config --cxxflags does not give the result the configuration script wants?
In-Reply-To: (Jun-qi Deng's message of "Wed, 14 Mar 2012 19:20:13 +0800") References: <87aa3klghe.fsf@wanadoo.es> Message-ID: <878vj3xrq7.fsf@wanadoo.es> Jun-qi Deng writes: >> Is it? In my experience, it isn't. Please show the relevant command >> generated by your makefile and the associated error message(s). >> > >> RTTI is an on/off option that changes per LLVM library, so setting >> -fno-rtti for using LLVM makes no sense. VMCore and Support have -frtti >> while most of the rest have -fno-rtti. The switch is decided on >> cmake/modules/LLVMProcessSources.cmake depending on the value of >> LLVM_REQUIRES_RTTI. >> >> >> Hello! I'm sorry, because I've already solved the build problem with the > mentioned app manually, now it's not very convenient for me to reproduce > the error for the moment. I agree that "RTTI is an on/off option". But I > think the point of the problem I mentioned here is: some applications that > depend on llvm and clang use llvm-config to configure their Makefile, while > the llvm-config cannot provide the correct information they need. And this > is what actually happened under my platform. If you read again my post, you'll see that I doubt that the absence of -fno-rtti on the output of llvm-config is the problem. Using -fno-rtti is an internal decision taken while building LLVM and it makes no sense to impose it on third party code. In short: LLVM does not require -fno-rtti from the projects that link to it. Most likely, you were experiencing link problems on *your* code that were resolved by using -fno-rtti. But as you are not interested on investigating the issue, we'll never know for sure. From dengjunqi06323011 at gmail.com Wed Mar 14 08:40:01 2012 From: dengjunqi06323011 at gmail.com (Jun-qi Deng) Date: Wed, 14 Mar 2012 21:40:01 +0800 Subject: [LLVMdev] llvm-config --cxxflags does not give the result the configuration script wants? 
In-Reply-To: <878vj3xrq7.fsf@wanadoo.es> References: <87aa3klghe.fsf@wanadoo.es> <878vj3xrq7.fsf@wanadoo.es> Message-ID: I got your point. Thank you, and I'd like to provide the relative message now. But firstly, what do you mean by the "relevant command generated by your makefile"? What I can tell you now is: The Error Message: make[3]: Entering directory `/home/tang.kk/ppcg/ppcg/isl/interface' CXXLD extract_interface extract_interface.o:(.data.rel.ro._ZTI13MyASTConsumer[typeinfo for MyASTConsumer]+0x10): undefined reference to `typeinfo for clang::ASTConsumer' collect2: ld returned 1 exit status make[3]: *** [extract_interface] Error 1 make[3]: Leaving directory `/home/tang.kk/ppcg/ppcg/isl/interface' make[2]: *** [all-recursive] Error 1 make[2]: Leaving directory `/home/tang.kk/ppcg/ppcg/isl' make[1]: *** [all] Error 2 make[1]: Leaving directory `/home/tang.kk/ppcg/ppcg/isl' make: *** [all-recursive] Error 1 The "make V=1" is: tang.kk at linux-eda-0:~/ppcg/ppcg> make V=1 Making all in isl make[1]: Entering directory `/home/tang.kk/ppcg/ppcg/isl' make all-recursive make[2]: Entering directory `/home/tang.kk/ppcg/ppcg/isl' Making all in . 
make[3]: Entering directory `/home/tang.kk/ppcg/ppcg/isl' make[3]: Leaving directory `/home/tang.kk/ppcg/ppcg/isl' Making all in interface make[3]: Entering directory `/home/tang.kk/ppcg/ppcg/isl/interface' /bin/sh ../libtool --tag=CXX --mode=link g++ -I/home/tang.kk/work/llvm-install/include -fPIC -fvisibility-inlines-hidden -D_GNU_SOURCE -Wall -W -Wno-unused-parameter -Wwrite-strings -pedantic -Wno-long-long -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -g -O2 -save-temps -L/home/tang.kk/work/llvm-install/lib -ldl -lpthread -o extract_interface python.o extract_interface.o -lclangFrontend -lclangSerialization -lclangParse -lclangSema -lclangAnalysis -lclangAST -lclangLex -lclangBasic -lclangDriver -lLLVMAnalysis -lLLVMTarget -lLLVMMC -lLLVMObject -lLLVMCore -lLLVMSupport -L/home/tang.kk/work/llvm-install/lib -ldl -lpthread libtool: link: g++ -I/home/tang.kk/work/llvm-install/include -fPIC -fvisibility-inlines-hidden -D_GNU_SOURCE -Wall -W -Wno-unused-parameter -Wwrite-strings -pedantic -Wno-long-long -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -g -O2 -save-temps -o extract_interface python.o extract_interface.o -L/home/tang.kk/work/llvm-install/lib -lclangFrontend -lclangSerialization -lclangParse -lclangSema -lclangAnalysis -lclangAST -lclangLex -lclangBasic -lclangDriver -lLLVMAnalysis -lLLVMTarget -lLLVMMC -lLLVMObject -lLLVMCore -lLLVMSupport -ldl -lpthread extract_interface.o:(.data.rel.ro._ZTI13MyASTConsumer[typeinfo for MyASTConsumer]+0x10): undefined reference to `typeinfo for clang::ASTConsumer' collect2: ld returned 1 exit status make[3]: *** [extract_interface] Error 1 make[3]: Leaving directory `/home/tang.kk/ppcg/ppcg/isl/interface' make[2]: *** [all-recursive] Error 1 make[2]: Leaving directory `/home/tang.kk/ppcg/ppcg/isl' make[1]: *** [all] Error 2 make[1]: Leaving directory `/home/tang.kk/ppcg/ppcg/isl' make: *** [all-recursive] Error 1 And my llvm-config --cxxflags is: -fPIC 
-fvisibility-inlines-hidden -D_GNU_SOURCE -Wall -W -Wno-unused-parameter -Wwrite-strings -pedantic -Wno-long-long -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS Best Regards, TangKK > >> Is it? In my experience, it isn't. Please show the relevant command > >> generated by your makefile and the associated error message(s). > >> > > > >> RTTI is an on/off option that changes per LLVM library, so setting > >> -fno-rtti for using LLVM makes no sense. VMCore and Support have -frtti > >> while most of the rest have -fno-rtti. The switch is decided on > >> cmake/modules/LLVMProcessSources.cmake depending on the value of > >> LLVM_REQUIRES_RTTI. > >> > >> > >> Hello! I'm sorry, because I've already solved the build problem with the > > mentioned app manually, now it's not very convenient for me to reproduce > > the error for the moment. I agree that "RTTI is an on/off option". But I > > think the point of the problem I mentioned here is: some applications > that > > depend on llvm and clang use llvm-config to configure their Makefile, > while > > the llvm-config cannot provide the correct information they need. And > this > > is what actually happened under my platform. > > If you read again my post, you'll see that I doubt that the absence of > -fno-rtti on the output of llvm-config is the problem. Using -fno-rtti > is an internal decision taken while building LLVM and it makes no sense > to impose it on third party code. In short: LLVM does not require > -fno-rtti from the projects that link to it. > > Most likely, you were experiencing link problems on *your* code that > were resolved by using -fno-rtti. But as you are not interested on > investigating the issue, we'll never know for sure. > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120314/e6bb93dc/attachment.html

From konstantin.vladimirov at gmail.com Wed Mar 14 08:57:27 2012
From: konstantin.vladimirov at gmail.com (Konstantin Vladimirov)
Date: Wed, 14 Mar 2012 17:57:27 +0400
Subject: [LLVMdev] How to set constant pool section?
Message-ID:

Hi,

The document http://llvm.org/docs/WritingAnLLVMBackend.html describes an
example like this:

SparcTargetAsmInfo::SparcTargetAsmInfo(const SparcTargetMachine &TM) {
  Data16bitsDirective = "\t.half\t";
  Data32bitsDirective = "\t.word\t";
  Data64bitsDirective = 0;  // .xword is only supported by V9.
  ZeroDirective = "\t.skip\t";
  CommentString = "!";
  ConstantPoolSection = "\t.section \".rodata\",#alloc\n";
}

That is wrong for LLVM 3.0. In the latest LLVM versions, Sparc has an MC
subtarget, and the code looks like this:

SparcELFMCAsmInfo::SparcELFMCAsmInfo(const Target &T, StringRef TT) {
  IsLittleEndian = false;
  Triple TheTriple(TT);
  if (TheTriple.getArch() == Triple::sparcv9)
    PointerSize = 8;

  Data16bitsDirective = "\t.half\t";
  Data32bitsDirective = "\t.word\t";
  Data64bitsDirective = 0;  // .xword is only supported by V9.
  ZeroDirective = "\t.skip\t";
  CommentString = "!";
  HasLEB128 = true;
  SupportsDebugInformation = true;

  SunStyleELFSectionSwitchSyntax = true;
  UsesELFSectionDirectiveForBSS = true;

  WeakRefDirective = "\t.weak\t";

  PrivateGlobalPrefix = ".L";
}

But I cannot find any code in Sparc, or in any other backend, that sets
ConstantPoolSection. I tried deriving from MCAsmInfo in my backend, but it
seems that ConstantPoolSection is not a member of MCAsmInfo.

In my backend I really need a value for this section that is distinct
from the default. Where can I set it?

Thanks in advance for all suggestions.
--- With best regards, Konstantin From ivanllopard at gmail.com Wed Mar 14 09:07:57 2012 From: ivanllopard at gmail.com (Ivan Llopard) Date: Wed, 14 Mar 2012 15:07:57 +0100 Subject: [LLVMdev] Data/Address registers In-Reply-To: <4AE96120-FE98-4DC2-B963-A60C043B33E2@apple.com> References: <4F521304.1030900@gmail.com> <46738C23-06A8-4F28-8068-FEBC0B39798E@apple.com> <4F576F44.6000801@gmail.com> <4AE96120-FE98-4DC2-B963-A60C043B33E2@apple.com> Message-ID: <4F60A63D.9070409@gmail.com> Le 07/03/2012 17:36, Jim Grosbach a ?crit : > On Mar 7, 2012, at 6:23 AM, Ivan Llopard wrote: > >> Hi Jim, >> >> Thanks for your response. >> >> Le 06/03/2012 22:54, Jim Grosbach a ?crit : >>> Hi Ivan, >>> On Mar 3, 2012, at 4:48 AM, Ivan Llopard wrote: >>> >>>> Hi, >>>> >>>> I'm facing a problem in llvm while porting it to a new target and I'll >>>> need some support. >>>> We have 2 kind of register, one for general purposes (i.e. arithmetic, >>>> comparisons, etc.) and the other for memory addressing. >>> OK. Separate register classes should be able to handle this. >>> >>>> Cross copies are not allowed (no data path). >>> You mean you can't copy directly from a general purpose register to an address register? That's an unfortunate architectural quirk. You may have to write some interesting and potentially ugly code in copyPhysReg() to handle that. >>> >> Actually, I can't copy them in any way, it's just impossible :-/. > Do you have load/store instructions for each register class? Worst case you could do a push/pop pair on the stack. It's really, really important that there be a way, even a very expensive way, to do this. I'm curious, why is it so important ? We are trying hard to avoid this kind of situations. >>>> We use clang 3.0 to produce assembler code. >>>> Because both registers have the same size and type (i16), I don't know >>>> what would be the best solution to distinguish them in order to match >>>> the right instructions. >>> The register classes should take care of this. 
>> I tried but IMO the matching rule should be context-dependent, i.e. an i16 addition should match machine additions with operands being either data registers or address registers depending on its usage. Even if I look at index operands of load/stores (into the DAG) to match target's addressing modes, I can't assume that some operations are not being used for something else than basic arithmetics (like comparisons which are not supported for address regs). Is it still possible to get ride of this with register classes ? > It should be, yes. For a contrived example of a simple add-immediate instruction for each: > > def ADD_address_reg: myBaseInstrClass<(outs ADDR_REG:$dst), (ins ADDR_REG:$src, i32imm:$imm), [(set ADDR_REG:$dst, (add ADDR_REG:$dst, i32imm:$imm)]>; > def ADD_general_reg: myBaseInstrClass<(outs GPR:$dst), (ins GPR:$src, i32imm:$imm), [(set GPR:$dst, (add GPR:$dst, i32imm:$imm)]>; > > Likewise, other operations that can target either register class should have a variant for each. ISel will choose the appropriate one based on the rest of the operands. Thanks for your advice Jim, I did what you said it but it didn't work and I have no clue what is going wrong. I can't realize where register classes are matched in order to pick the right instructions. I couldn't find a trace of register classes in the instruction selection process. I have these patterns defined so far: def AADDMri { // Instruction MephInstr AGInstr dag OutOperandList = (outs AGRegs:$dst); dag InOperandList = (ins AGRegs:$a, i16imm:$b); list Pattern = [(set AGRegs:$dst, (add AGRegs:$a, imm:$b))]; ... } def DADDri { // Pattern Pat dag PatternToMatch = (add LSubRegs:$a, imm:$b); list ResultInstrs = [(asrsat (asextr (sextr iRSubRegs:$a), (XLoadImm imm:$b)), (i16 0))]; } where asrsat has LSubRegs as its output operand. Both patterns have the same complexity and they are located at different scopes. 
For these two patterns, tblgen is producing the following isel opcodes:

/*3244*/ /*Scope*/ 20, /*->3265*/
/*3245*/   OPC_RecordChild1, // #1 = $b
/*3246*/   OPC_MoveChild, 1,
/*3248*/   OPC_CheckOpcode, TARGET_VAL(ISD::Constant),
/*3251*/   OPC_MoveParent,
/*3252*/   OPC_CheckType, MVT::i16,
/*3254*/   OPC_EmitConvertToTarget, 1,
/*3256*/   OPC_MorphNodeTo, TARGET_VAL(ME::AADDMri), 0,

and in the same logic chain of pattern checking, DADDri comes right after AADDMri (with Scope changes in the middle)

/*3285*/ OPC_RecordChild0, // #0 = $a
/*3286*/ OPC_RecordChild1, // #1 = $b
/*3287*/ OPC_Scope, 42, /*->3331*/ // 2 children in Scope
/*3289*/   OPC_MoveChild, 1,
/*3291*/   OPC_CheckOpcode, TARGET_VAL(ISD::Constant),
/*3294*/   OPC_MoveParent,
/*3295*/   OPC_CheckType, MVT::i16,
/*3297*/   OPC_EmitNode, TARGET_VAL(ME::sextr), 0,
               1/*#VTs*/, MVT::i64, 1/*#Ops*/, 0, // Results = #2
/*3305*/   OPC_EmitConvertToTarget, 1,
/*3307*/   OPC_EmitNodeXForm, 0, 3, // XLoadImm
...

AADDMri supersedes DADDri (the same checks are performed). It's worth noting that the result is used by another instruction which has LSubRegs as its source operand, and I got copy instructions added by the instruction selector to meet this requirement.
I really would like to know why this is happening. It's as if tblgen is not taking into account the register class assignments of both instructions :-/.

Ivan

>> I can make a pass before ISel to annotate the code identifying those registers which are only used for addressing (by doing a simple data-flow analysis), can it help ISelector later ?
>> Because I could not find how to get metadata from the DAG to drive matching rules or lowering phases, is it possible ? How is metadata transferred to the DAG, where should I look for it ?
>>
> Metadata should not be necessary for this. In general, metadata should never be used for anything that's required information, only for optional information. I.e., if it's stripped out of the IR, the backend should still generate correct code.
> > -Jim > >> Ivan >> >>>> Moreover, the standard pointer arithmetic is not >>>> enough for us (we need to support modulo operations also). >>>> I thought that I could manually match every arithmetic operation while >>>> matching the addressing mode but it doesn't work because intermediate >>>> results are sometimes reused for other purposes (e.g. comparisons). >>> I suggest getting things working correctly first and then coming back to things like this as an optimization. >>> >>>> Do I need to add another type to clang/llvm ? >>>> >>> Unlikely. >>> >>> Regards, >>> Jim >>> >>> >>>> Thanks in advance, >>>> >>>> Ivan >>>> _______________________________________________ >>>> LLVM Developers mailing list >>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From ofv at wanadoo.es Wed Mar 14 09:08:47 2012 From: ofv at wanadoo.es (=?us-ascii?Q?=3D=3Futf-8=3FQ=3F=3DC3=3D93scar=5FFuentes=3F=3D?=) Date: Wed, 14 Mar 2012 15:08:47 +0100 Subject: [LLVMdev] llvm-config --cxxflags does not give the result the configuration script wants? In-Reply-To: (Jun-qi Deng's message of "Wed, 14 Mar 2012 21:40:01 +0800") References: <87aa3klghe.fsf@wanadoo.es> <878vj3xrq7.fsf@wanadoo.es> Message-ID: <874ntrxok0.fsf@wanadoo.es> Jun-qi Deng writes: > I got your point. Thank you, and I'd like to provide the relative message > now. But firstly, what do you mean by the "relevant command generated by > your makefile"? 
What I can tell you now is:
>
> The Error Message:
> make[3]: Entering directory `/home/tang.kk/ppcg/ppcg/isl/interface'
> CXXLD extract_interface
> extract_interface.o:(.data.rel.ro._ZTI13MyASTConsumer[typeinfo for
> MyASTConsumer]+0x10): undefined reference to `typeinfo for
> clang::ASTConsumer'
> collect2: ld returned 1 exit status

[snip]

So you define a class MyASTConsumer that derives from clang::ASTConsumer
and then the link fails because the typeinfo for MyASTConsumer cannot
reference the typeinfo of clang::ASTConsumer, because the latter is
undefined as a consequence of building Clang with -fno-rtti.

In this case the right thing is to apply -fno-rtti to the specific
source file that defines MyASTConsumer.

Please note that deriving from a Clang/LLVM class is not something that
all projects do. The typical compiler that uses LLVM as a backend does
not need to derive from LLVM classes at all. Imposing -fno-rtti on those
projects would most likely cause breakage in users' code.

From ivanllopard at gmail.com Wed Mar 14 10:16:32 2012
From: ivanllopard at gmail.com (Ivan Llopard)
Date: Wed, 14 Mar 2012 16:16:32 +0100
Subject: [LLVMdev] Lowering formal pointer arguments
Message-ID: <4F60B650.1060406@gmail.com>

Hi,

How can I get the llvm-type of the formal argument while lowering it?

My target needs to map pointer and non-pointer parameters to different
registers. In addition, parameter lowering is address space dependent
(another reason why I need such information). Looking at the DAGBuilder,
I found that it is dropping it when translating llvm-types to BE types.
Even if the base type is saved into the MVT structure, it's a private
member. What's the reason for hiding it? Why not keep such information?

Thanks in advance,

Ivan

From anton at korobeynikov.info Wed Mar 14 12:29:30 2012
From: anton at korobeynikov.info (Anton Korobeynikov)
Date: Wed, 14 Mar 2012 21:29:30 +0400
Subject: [LLVMdev] How to set constant pool section?
In-Reply-To:
References:
Message-ID:

Hello

> I really need in my backend value for this section, distinct from
> default. Where can I set it?
It was renamed to ReadOnlySection. You might want to check the logic
inside CodeGen/TargetLoweringObjectFileImpl.cpp (in particular -
TargetLoweringObjectFile::SelectSectionForGlobal) to see how it's
used.

--
With best regards, Anton Korobeynikov
Faculty of Mathematics and Mechanics, Saint Petersburg State University

From grosbach at apple.com Wed Mar 14 12:38:28 2012
From: grosbach at apple.com (Jim Grosbach)
Date: Wed, 14 Mar 2012 10:38:28 -0700
Subject: [LLVMdev] Data/Address registers
In-Reply-To: <4F60A63D.9070409@gmail.com>
References: <4F521304.1030900@gmail.com> <46738C23-06A8-4F28-8068-FEBC0B39798E@apple.com> <4F576F44.6000801@gmail.com> <4AE96120-FE98-4DC2-B963-A60C043B33E2@apple.com> <4F60A63D.9070409@gmail.com>
Message-ID: <457DBDB9-6AA6-4081-B7D0-6A576042A59A@apple.com>

On Mar 14, 2012, at 7:07 AM, Ivan Llopard wrote:

> On 07/03/2012 17:36, Jim Grosbach wrote:
>> On Mar 7, 2012, at 6:23 AM, Ivan Llopard wrote:
>>
>>> Hi Jim,
>>>
>>> Thanks for your response.
>>>
>>> On 06/03/2012 22:54, Jim Grosbach wrote:
>>>> Hi Ivan,
>>>> On Mar 3, 2012, at 4:48 AM, Ivan Llopard wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I'm facing a problem in llvm while porting it to a new target and I'll
>>>>> need some support.
>>>>> We have 2 kind of register, one for general purposes (i.e. arithmetic,
>>>>> comparisons, etc.) and the other for memory addressing.
>>>> OK. Separate register classes should be able to handle this.
>>>>
>>>>> Cross copies are not allowed (no data path).
>>>> You mean you can't copy directly from a general purpose register to an address register? That's an unfortunate architectural quirk. You may have to write some interesting and potentially ugly code in copyPhysReg() to handle that.
>>>>
>>> Actually, I can't copy them in any way, it's just impossible :-/.
>> Do you have load/store instructions for each register class? Worst case you could do a push/pop pair on the stack. It's really, really important that there be a way, even a very expensive way, to do this. > > I'm curious, why is it so important ? We are trying hard to avoid this kind of situations. Sometimes the allocator, and other bits, will need to do a cross-class copy. It's assumed that a value can be copied between register classes for which that value type is legal. The coalescer will then go through and try hard to get rid of any copies that aren't actually needed. Specifically, as I understand it, there needs to be a way to copy between any two register classes for which the same ValueType is legal. > >>>>> We use clang 3.0 to produce assembler code. >>>>> Because both registers have the same size and type (i16), I don't know >>>>> what would be the best solution to distinguish them in order to match >>>>> the right instructions. >>>> The register classes should take care of this. >>> I tried but IMO the matching rule should be context-dependent, i.e. an i16 addition should match machine additions with operands being either data registers or address registers depending on its usage. Even if I look at index operands of load/stores (into the DAG) to match target's addressing modes, I can't assume that some operations are not being used for something else than basic arithmetics (like comparisons which are not supported for address regs). Is it still possible to get ride of this with register classes ? >> It should be, yes. 
For a contrived example of a simple add-immediate instruction for each: >> >> def ADD_address_reg: myBaseInstrClass<(outs ADDR_REG:$dst), (ins ADDR_REG:$src, i32imm:$imm), [(set ADDR_REG:$dst, (add ADDR_REG:$dst, i32imm:$imm)]>; >> def ADD_general_reg: myBaseInstrClass<(outs GPR:$dst), (ins GPR:$src, i32imm:$imm), [(set GPR:$dst, (add GPR:$dst, i32imm:$imm)]>; >> >> Likewise, other operations that can target either register class should have a variant for each. ISel will choose the appropriate one based on the rest of the operands. > > Thanks for your advice Jim, I did what you said it but it didn't work and I have no clue what is going wrong. I can't realize where register classes are matched in order to pick the right instructions. I couldn't find a trace of register classes in the instruction selection process. > I have these patterns defined so far: > > def AADDMri { // Instruction MephInstr AGInstr > dag OutOperandList = (outs AGRegs:$dst); > dag InOperandList = (ins AGRegs:$a, i16imm:$b); > list Pattern = [(set AGRegs:$dst, (add AGRegs:$a, imm:$b))]; > ? > } > > def DADDri { // Pattern Pat > dag PatternToMatch = (add LSubRegs:$a, imm:$b); > list ResultInstrs = [(asrsat (asextr (sextr iRSubRegs:$a), (XLoadImm imm:$b)), (i16 0))]; > } > > where asrsat has LSubRegs as its output operand. Both patterns have the same complexity and they are located at different scopes. 
For these two patterns, tblgen is producing the following isel opcodes: > > /*3244*/ /*Scope*/ 20, /*->3265*/ > /*3245*/ OPC_RecordChild1, // #1 = $b > /*3246*/ OPC_MoveChild, 1, > /*3248*/ OPC_CheckOpcode, TARGET_VAL(ISD::Constant), > /*3251*/ OPC_MoveParent, > /*3252*/ OPC_CheckType, MVT::i16, > /*3254*/ OPC_EmitConvertToTarget, 1, > /*3256*/ OPC_MorphNodeTo, TARGET_VAL(ME::AADDMri), 0, > > and in the same logic chain of pattern checking, DADDri comes right after AADDMri (with Scope changes in the middle) > > /*3285*/ OPC_RecordChild0, // #0 = $a > /*3286*/ OPC_RecordChild1, // #1 = $b > /*3287*/ OPC_Scope, 42, /*->3331*/ // 2 children in Scope > /*3289*/ OPC_MoveChild, 1, > /*3291*/ OPC_CheckOpcode, TARGET_VAL(ISD::Constant), > /*3294*/ OPC_MoveParent, > /*3295*/ OPC_CheckType, MVT::i16, > /*3297*/ OPC_EmitNode, TARGET_VAL(ME::sextr), 0, > 1/*#VTs*/, MVT::i64, 1/*#Ops*/, 0, // Results = #2 > /*3305*/ OPC_EmitConvertToTarget, 1, Huh. I would have expected OPC_EmitRegister here. Probably something different in your target causing this. I don't anticipate that it'll cause a problem, though, as there's the CheckType bits to keep things sane. > /*3307*/ OPC_EmitNodeXForm, 0, 3, // XLoadImm > ... > > AADDMri supersedes DADDri (the same checks are performed). It's worth to note that the result is used by another instruction which has LSubRegs as its source operand and I got copy instructions added by the iselector to meet this requirement. Hmm.. OK. So it's correctly understanding the class requirements of the instruction, just not doing what we want in order to meet them. I'm suspecting TableGen isn't as ambitious as one would hope in this regard. That is, defining separate instructions w/ the different register classes is a necessary, but not sufficient, condition to getting where you want to go. ISel is being driven by the ValueType, which is in turn mapped to a register class to use for that value type by default. 
When instructions need a different register class, regalloc will insert copies to satisfy the constraint. That is, isel is driving the register class selection. I'd thought there was at least some information flowing the other direction, but it looks like I was mistaken.

> I really would like to know why this is happening. It's like tblgen is not taking into account the register class assignations of both instructions :-/.

Well, the differences are taken into account, because we're seeing the copy inserted to handle them. There's just insufficient effort made to avoid the copy entirely.

Now, that's all fine and good, but doesn't directly help you solve your original problem. The more I think about it, this is effectively a heuristically based problem, as there's no 100% "right" answer. Consider the following contrived example:

define i16 @foo(i16* %ptr, i16 %a) nounwind ssp {
  %1 = getelementptr inbounds i16* %ptr, i16 %a
  %2 = ptrtoint i16* %1 to i16
  store i16 %2, i16* %ptr, align 4
  ret i16 %2
}

The same intermediate value (%1) is being used here both as a generic i16 and as a pointer value. Which register class should be used to compute the value? There will be a cross-class copy instruction either way.

I think you may be stuck having smart custom lowering for all the operations you want to work on whichever register class isn't the default for i16. That and/or have a custom target pass that runs before register allocation to go through and clean things up, changing which instructions are used based on context.

Personally, I'd probably go with the latter. Get your target basically working using the (expensive) copies first. Then start building up smarts to make the generated code efficient, not just correct.
For example, a simple pattern to look for is to identify loads or stores where the address is coming from a copy of a value computed by a chain of arithmetic instructions and the values defined by those instructions have no other uses outside just computing the address. You can trivially swap those instructions (and the register classes of the operands) with the versions that operate on address registers and get rid of the copy. Honestly, that alone will likely be good enough for most cases. Regards, Jim > Ivan > >>> I can make a pass before ISel to annotate the code identifying those registers which are only used for addressing (by doing a simple data-flow analysis), can it help ISelector later ? >>> Because I could not find how to get metadata from the DAG to drive matching rules or lowering phases, is it possible ? How is metadata transferred to the DAG, where should I look for it ? >>> >> Metadata should not be necessary for this. In general, metadata should never be used for anything that's required information, only for optional information. I.e., if it's stripped out of the IR, the backend should still generate correct code. >> >> -Jim >> >>> Ivan >>> >>>>> Moreover, the standard pointer arithmetic is not >>>>> enough for us (we need to support modulo operations also). >>>>> I thought that I could manually match every arithmetic operation while >>>>> matching the addressing mode but it doesn't work because intermediate >>>>> results are sometimes reused for other purposes (e.g. comparisons). >>>> I suggest getting things working correctly first and then coming back to things like this as an optimization. >>>> >>>>> Do I need to add another type to clang/llvm ? >>>>> >>>> Unlikely. 
>>>> >>>> Regards, >>>> Jim >>>> >>>> >>>>> Thanks in advance, >>>>> >>>>> Ivan >>>>> _______________________________________________ >>>>> LLVM Developers mailing list >>>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From rkotler at mips.com Wed Mar 14 16:43:12 2012 From: rkotler at mips.com (reed kotler) Date: Wed, 14 Mar 2012 14:43:12 -0700 Subject: [LLVMdev] ENABLE_OPTIMIZED=1 Message-ID: <4F6110F0.2050803@mips.com> When building without ENABLE_OPTIMIZED=1, is that supposed to take 10 times longer to "build" LLVM than with ENABLE_OPTIMIZED=1? Thanks. Reed From James.Molloy at arm.com Wed Mar 14 16:50:33 2012 From: James.Molloy at arm.com (James Molloy) Date: Wed, 14 Mar 2012 21:50:33 +0000 Subject: [LLVMdev] ENABLE_OPTIMIZED=1 In-Reply-To: <4F6110F0.2050803@mips.com> References: <4F6110F0.2050803@mips.com> Message-ID: Building with debug info makes the links take a really long time on a box with <4GB RAM. ________________________________________ From: llvmdev-bounces at cs.uiuc.edu [llvmdev-bounces at cs.uiuc.edu] On Behalf Of reed kotler [rkotler at mips.com] Sent: 14 March 2012 21:43 To: ll >> "llvmdev at cs.uiuc.edu" Subject: [LLVMdev] ENABLE_OPTIMIZED=1 When building without ENABLE_OPTIMIZED=1, is that supposed to take 10 times longer to "build" LLVM than with ENABLE_OPTIMIZED=1? Thanks. Reed _______________________________________________ LLVM Developers mailing list LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev -- IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you. 
From welson.sun at gmail.com Wed Mar 14 18:26:07 2012
From: welson.sun at gmail.com (Welson Sun)
Date: Wed, 14 Mar 2012 16:26:07 -0700
Subject: [LLVMdev] Linking static external library into an LLVM pass library?
Message-ID:

This document http://llvm.org/docs/Projects.html says that USEDLIBS should be used to statically link libraries:

USEDLIBS: This variable holds a space separated list of libraries that should be linked into the program. These libraries must be libraries that come from your *lib* directory. The libraries must be specified without their "lib" prefix. For example, to link libsample.a, you would set USEDLIBS to sample.a.

Note that this works only for statically linked libraries.

But what is that "lib" directory? How can I specify an external libxyz.a file that doesn't live in this lib directory?

Has anybody done this?

Thanks!
Welson

From paul at lucasmail.org Wed Mar 14 19:02:37 2012
From: paul at lucasmail.org (Paul J. Lucas)
Date: Wed, 14 Mar 2012 17:02:37 -0700
Subject: [LLVMdev] Using JIT code to code a program to call C++
Message-ID:

My project has a C++ library that I want to allow the user to use via some programming language to be JIT'd to call functions in said library. For the sake of simplicity, assume the library has classes like:

  class item {
  public:
    item();
    item( int );
    ~item();
    // ...
  };

  class item_iterator {
  public:
    virtual ~item_iterator();
    virtual bool next( item *result ) = 0;
  };

  class singleton_iterator : public item_iterator {
  public:
    singleton_iterator( item const &i );
    // ...
  };

I'm aware that LLVM doesn't know anything about C++ and that one way to call C++ functions is to wrap them in C thunks:

  extern "C" {

    void thunk_item_M_new( item *addr ) {
      new( addr ) item;
    }

    void thunk_singleton_iterator_M_new( singleton_iterator *addr, item *i ) {
      new( addr ) singleton_iterator( *i );
    }

    bool thunk_iterator_M_next( item_iterator *that, item *result ) {
      return that->next( result );
    }

  } // extern "C"

The first problem is how to allocate an item from LLVM. I know how to create StructTypes and add fields to them, but I don't want to have to parallel the C++ class layout -- that's tedious and error-prone.

The idea I got was simply to add a char[sizeof(T)] as the only field to a StructType for a C++ class type:

  StructType *const llvm_item_type = StructType::create( llvm_ctx, "item" );
  vector<Type*> llvm_struct_types;
  llvm_struct_types.push_back( ArrayType::get( IntegerType::get( llvm_ctx, 8 ), sizeof( item ) ) );
  llvm_item_type->setBody( llvm_struct_types, false );
  PointerType *const llvm_item_ptr_type = PointerType::getUnqual( llvm_item_type );

I would think that, because it's a StructType, the alignment would be correct and the sizeof(item) would get the size right. Will that work? Is there a better way?

The second problem is that, unlike the C++ class hierarchy, there's no inheritance relationship between StructTypes. If I create a Function that takes an llvm_iterator_type but try to build a Function object using an llvm_singleton_iterator_type, the LLVM verifyModule() function complains at me:

> Call parameter type does not match function signature!

So then I thought I'd simply use void* everywhere:

  Type *const llvm_void_type = Type::getVoidTy( llvm_ctx );
  PointerType *const llvm_void_ptr_type = PointerType::getUnqual( llvm_void_type );

but verifyModule() still complains at me because, apparently, there's no automatic casting to void* types in LLVM. How can I solve this problem?
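A minimal, LLVM-free sketch of the opaque-storage idea described above, in plain C++. It only illustrates the mechanics of "a byte array of sizeof(T) plus C thunks" and makes several assumptions that are not in the message: the body of item, the extra thunks thunk_item_M_new_int, thunk_item_M_delete and thunk_item_M_value, and the names item_storage and demo_roundtrip are all hypothetical. The alignment is requested explicitly with alignas here rather than assumed, since a plain byte array on its own typically carries only byte alignment.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical stand-in for the library class; the real body is elided
// in the message above.
class item {
public:
  item() : value_(0) {}
  explicit item(int v) : value_(v) {}
  int value() const { return value_; }
private:
  int value_;
};

extern "C" {
  // Thunks in the style of the message: construct into caller-provided storage.
  void thunk_item_M_new(item *addr) { new (addr) item; }
  void thunk_item_M_new_int(item *addr, int v) { new (addr) item(v); }  // hypothetical
  void thunk_item_M_delete(item *addr) { addr->~item(); }              // hypothetical
  int thunk_item_M_value(item const *addr) { return addr->value(); }   // hypothetical
}

// Opaque buffer playing the role of the one-field "[sizeof(item) x i8]" struct.
// The alignment is stated explicitly instead of being left to the byte array.
struct item_storage {
  alignas(item) unsigned char bytes[sizeof(item)];
};

static_assert(sizeof(item_storage) >= sizeof(item), "buffer must hold an item");

// What JIT'd code would do: reserve the opaque slot, then call only the thunks.
int demo_roundtrip(int v) {
  item_storage slot;
  item *p = reinterpret_cast<item *>(slot.bytes);
  thunk_item_M_new_int(p, v);       // placement-construct inside the slot
  int result = thunk_item_M_value(p);
  thunk_item_M_delete(p);           // explicit destructor call, no deallocation
  return result;
}
```

The caller never needs to know the layout of item, only its size and alignment, which is the property the char[sizeof(T)] trick relies on.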
- Paul

From welson.sun at gmail.com Wed Mar 14 19:17:59 2012
From: welson.sun at gmail.com (Welson Sun)
Date: Wed, 14 Mar 2012 17:17:59 -0700
Subject: [LLVMdev] Linking static external library into an LLVM pass library?
In-Reply-To:
References:
Message-ID:

A related question that I cannot find an answer to by Googling:

How should you write the main function to compile an executable LLVM pass?

- Welson

On Wed, Mar 14, 2012 at 4:26 PM, Welson Sun wrote:

> This document http://llvm.org/docs/Projects.html says the USEDLIBS
> should be used to statically link libraries:
>
> USEDLIBS: This variable holds a space separated list of libraries that
> should be linked into the program. These libraries must be libraries that
> come from your *lib* directory. The libraries must be specified without
> their "lib" prefix. For example, to link libsample.a, you would set
> USEDLIBS to sample.a.
>
> Note that this works only for statically linked libraries.
>
>
> But, what is that "lib" directory? How can I specify external libxyz.a
> file that doesn't live in this libs directory?
>
> Anybody has done this?
>
>
> Thanks!
> Welson
>
>

--
Welson
Phone: (408) 418-8385
Email: welson.sun at gmail.com

From ofv at wanadoo.es Wed Mar 14 19:51:11 2012
From: ofv at wanadoo.es (Óscar Fuentes)
Date: Thu, 15 Mar 2012 01:51:11 +0100
Subject: [LLVMdev] Using JIT code to code a program to call C++
References:
Message-ID: <87zkbiwutc.fsf@wanadoo.es>

"Paul J. Lucas" writes:

[snip]

> I'm aware that LLVM doesn't know anything about C++ and that one way
> to call C++ functions is to wrap them in C thunks:

Yes, this is the only sane way unless you are willing to replicate the ABI of your chosen C++ implementation onto your system.
Let's forget about dealing with more than one C++ ABI ;-)

[snip]

> I would think that, because it's a StructType, the alignment would be
> correct and the sizeof(item) would get the size right. Will that
> work?

Yes. I do the same on my compiler. Please note that this trick is only
necessary if you intend to put C++ objects on your stack. If all C++
objects your JITted code creates are on the heap, you can simply use
`new' instead of placement `new' in the C function that creates the C++
object and deal with the returned pointer.

> Is there a better way?

It all depends on the characteristics of your language.

[snip]

> but verifyModule() still complains at me because, apparently, there's
> no automatic casting to void* types in LLVM. How can I solve this
> problem?

You must do the cast yourself, or simply use void* for all pointers to
your C++ objects and deal with their real types in the upper part of
the compiler. As said above, it all depends on the details of your
language.

From dengjunqi06323011 at gmail.com Wed Mar 14 20:21:26 2012
From: dengjunqi06323011 at gmail.com (Jun-qi Deng)
Date: Thu, 15 Mar 2012 09:21:26 +0800
Subject: [LLVMdev] llvm-config --cxxflags does not give the result the configuration script wants?
In-Reply-To: <874ntrxok0.fsf@wanadoo.es>
References: <87aa3klghe.fsf@wanadoo.es> <878vj3xrq7.fsf@wanadoo.es> <874ntrxok0.fsf@wanadoo.es>
Message-ID:

2012/3/14 Óscar Fuentes

> Jun-qi Deng writes:
>
> > I got your point. Thank you, and I'd like to provide the relative message
> > now. But firstly, what do you mean by the "relevant command generated by
> > your makefile"?
What I can tell you now is: > > > > The Error Message: > > make[3]: Entering directory `/home/tang.kk/ppcg/ppcg/isl/interface' > > CXXLD extract_interface > > extract_interface.o:(.data.rel.ro._ZTI13MyASTConsumer[typeinfo for > > MyASTConsumer]+0x10): undefined reference to `typeinfo for > > clang::ASTConsumer' > > collect2: ld returned 1 exit status > > [snip] > > So you define a class MyAstConsumer that derives from clang::ASTConsumer > and then the link fails because the typinfo for MyAstConsumer can not > reference the typeinfo of clang::ASTConsumer, because the latter is > undefined as a consecuence of building Clang with -fno-rtti. > Oh, I got it! Thank you very much! It helps me a lot. > > In this case the right thing is to apply -fno-rtti to the specific > source file that defines MyAstConsumer. > > Please note that deriving from a Clang/LLVM class is not something that > all projects do. The typical compiler that uses LLVM as a backend does > not need to derive from LLVM classes at all. Imposing -fno-rtti on those > projects most likely would cause breakage on user's code. > Best Regards! TangKK -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120315/ac712c73/attachment.html From ahatanak at gmail.com Wed Mar 14 21:07:13 2012 From: ahatanak at gmail.com (Akira Hatanaka) Date: Wed, 14 Mar 2012 19:07:13 -0700 Subject: [LLVMdev] Lowering formal pointer arguments In-Reply-To: <4F60B650.1060406@gmail.com> References: <4F60B650.1060406@gmail.com> Message-ID: If you need llvm::Argument, this returns the iterator pointing to the first argument: Function::const_arg_iterator Arg = DAG.getMachineFunction().getFunction()->arg_begin(); On Wed, Mar 14, 2012 at 8:16 AM, Ivan Llopard wrote: > Hi, > > How can I get the llvm-type of the formal argument while lowering it ? > > My target needs to map pointer and non-pointer parameters to different > registers. 
In addition, parameter lowering is address space dependent > (another reason why I need such information). Looking at the DAGBuilder, > I found that it is dropping it when translating llvm-types to BE types. > Even if the base type is saved into the MVT structure, it's a private > member. What's the reason for hiding it ? Why not to keep such information ? > > Thanks in advance, > > Ivan > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu ? ? ? ? http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From shanmuk.rao008 at gmail.com Thu Mar 15 00:58:14 2012 From: shanmuk.rao008 at gmail.com (shanmuk rao) Date: Thu, 15 Mar 2012 11:28:14 +0530 Subject: [LLVMdev] Problem with LoopDependenceAnalysis Message-ID: Hi, I am using LLVM for implementing LoopFission pass. I am using LoopPass. I know that for checking circular dependency in loop I have to use LoopDependenceAnalysis This is what i want to do. for(int i = 0; i< n ; i++){ s1 : a[i] = a[i] + x[i]; s2 : x[i] = x[i+1] + i*2 ; } /**there is no dependence from s2 to s1/ so after distribution(it should be) : for(int i = 0; i< n ; i++) s1: a[i] = a[i] + x[i]; for(int i = 0; i< n ; i++) s2: x[i] = x[i+1] + i*2 ; but in llvm i couldn't able to find there is no dependency from s2 to s1. LoopDependenceAnalyis always gives there is a dependency from every load instructions to every store instructions. is there any other alternative to LoopDependencyAnalysis ? thank you ...... Regards, Shanmukha Rao Compilers lab, Indian Institute of Science, Bangalore. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120315/94716630/attachment.html From konstantin.vladimirov at gmail.com Thu Mar 15 02:00:54 2012 From: konstantin.vladimirov at gmail.com (Konstantin Vladimirov) Date: Thu, 15 Mar 2012 11:00:54 +0400 Subject: [LLVMdev] How to set constant pool section? 
In-Reply-To:
References:
Message-ID:

Hi,

Thanks for pointing me in the right direction. As far as I understand
from reverse engineering, the logic that I want to override is buried in:

lib/MC/MCSectionELF.cpp

MCSectionELF::PrintSwitchToSection

  if (ShouldOmitSectionDirective(SectionName, MAI)) {
    OS << '\t' << getSectionName() << '\n';
    return;
  }

  // otherwise print ".section" directive and then section name

So I need to override the behavior of ShouldOmitSectionDirective. But this
method of MCSectionELF is not virtual.
As a workaround, I stubbed it in core LLVM code
(MCSectionELF::ShouldOmitSectionDirective), and everything works, but
it is ugly. Maybe you can advise further?

---
With best regards, Konstantin

On Wed, Mar 14, 2012 at 9:29 PM, Anton Korobeynikov wrote:
> Hello
>
>> I really need in my backend value for this section, distinct from
>> default. Where can I set it?
> It was renamed to ReadOnlySection. You might want to check the logic
> inside CodeGen/TargetLoweringObjectFileImp.cpp (in particular -
> TargetLoweringObjectFile::SelectionSectionForGlobal) to see how it's
> used.
>
> --
> With best regards, Anton Korobeynikov
> Faculty of Mathematics and Mechanics, Saint Petersburg State University

From joerg at britannica.bec.de Thu Mar 15 02:40:01 2012
From: joerg at britannica.bec.de (Joerg Sonnenberger)
Date: Thu, 15 Mar 2012 08:40:01 +0100
Subject: [LLVMdev] How to set constant pool section?
In-Reply-To:
References:
Message-ID: <20120315074001.GB15474@britannica.bec.de>

On Thu, Mar 15, 2012 at 11:00:54AM +0400, Konstantin Vladimirov wrote:
> Hi,
>
> Thanks for pointing direction. As far, as I understand by reversing,
> logic, that I want to overwrite is digged into:
>
> lib/MC/MCSectionELF.cpp
>
> MCSectionELF::PrintSwitchToSection
>
> if (ShouldOmitSectionDirective(SectionName, MAI)) {
> OS << '\t' << getSectionName() << '\n';
> return;
> }
>
> // otherwise print ".section" directive and then section name
>
> So I need to overwrite ShouldOmitSectionDirective behavior.
But this > method of MCSectionELF is not virtual. > As a workaround, I stubbed it in core LLVM code > (MCSectionELF::ShouldOmitSectionDirective), and everything works, but > it is ugly. May be you can advise further? I think you are off the mark here. The fragment above is used to create .text instead of .section ".text" or other, uglier forms. This is really just an optimisation for readability and compatibility with ancient tools. Joerg From anton at korobeynikov.info Thu Mar 15 03:47:13 2012 From: anton at korobeynikov.info (Anton Korobeynikov) Date: Thu, 15 Mar 2012 12:47:13 +0400 Subject: [LLVMdev] How to set constant pool section? In-Reply-To: References: Message-ID: Hello > So I need to overwrite ShouldOmitSectionDirective behavior. But this > method of MCSectionELF is not virtual. > As a workaround, I stubbed it in core LLVM code > (MCSectionELF::ShouldOmitSectionDirective), and everything works, but > it is ugly. May be you can advise further? So, it seems that: 1. You're using ELF for your target 2. The assembler you're using, despite pretending to be ELF-ish, is so broken that it requires something like ".foo" instead of ".section .foo"? Is that so?
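For readers following the thread, the two output forms under discussion can be sketched in a few lines of standalone C++. This is not LLVM code: `should_omit_section_directive`, `switch_to_section`, and the list of section names treated as "well known" are assumptions made for illustration, modeled loosely on the fragment quoted above.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical stand-in for the omit check. The set of names is an
// assumption for this sketch, not taken from the LLVM sources.
bool should_omit_section_directive(const std::string& name) {
    return name == ".text" || name == ".data" || name == ".bss";
}

// Returns the assembler text used to switch to the given section,
// following the shape of the fragment quoted in the thread.
std::string switch_to_section(const std::string& name) {
    std::ostringstream os;
    if (should_omit_section_directive(name)) {
        os << '\t' << name << '\n';        // short form, e.g. "\t.text"
        return os.str();
    }
    os << "\t.section\t" << name << '\n';  // generic ".section NAME" form
    return os.str();
}
```

In this toy model, a section such as `.rodata` always gets the generic `.section` form, which is the behavior Konstantin was trying to change for his assembler.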
-- With best regards, Anton Korobeynikov Faculty of Mathematics and Mechanics, Saint Petersburg State University From ivanllopard at gmail.com Thu Mar 15 03:55:31 2012 From: ivanllopard at gmail.com (Ivan Llopard) Date: Thu, 15 Mar 2012 09:55:31 +0100 Subject: [LLVMdev] Data/Address registers In-Reply-To: <457DBDB9-6AA6-4081-B7D0-6A576042A59A@apple.com> References: <4F521304.1030900@gmail.com> <46738C23-06A8-4F28-8068-FEBC0B39798E@apple.com> <4F576F44.6000801@gmail.com> <4AE96120-FE98-4DC2-B963-A60C043B33E2@apple.com> <4F60A63D.9070409@gmail.com> <457DBDB9-6AA6-4081-B7D0-6A576042A59A@apple.com> Message-ID: <4F61AE83.1070202@gmail.com> On 14/03/2012 18:38, Jim Grosbach wrote: > On Mar 14, 2012, at 7:07 AM, Ivan Llopard wrote: > >> On 07/03/2012 17:36, Jim Grosbach wrote: >>> On Mar 7, 2012, at 6:23 AM, Ivan Llopard wrote: >>> >>>> Hi Jim, >>>> >>>> Thanks for your response. >>>> >>>> On 06/03/2012 22:54, Jim Grosbach wrote: >>>>> Hi Ivan, >>>>> On Mar 3, 2012, at 4:48 AM, Ivan Llopard wrote: >>>>> >>>>>> Hi, >>>>>> >>>>>> I'm facing a problem in llvm while porting it to a new target and I'll >>>>>> need some support. >>>>>> We have 2 kinds of registers, one for general purposes (i.e. arithmetic, >>>>>> comparisons, etc.) and the other for memory addressing. >>>>> OK. Separate register classes should be able to handle this. >>>>> >>>>>> Cross copies are not allowed (no data path). >>>>> You mean you can't copy directly from a general purpose register to an address register? That's an unfortunate architectural quirk. You may have to write some interesting and potentially ugly code in copyPhysReg() to handle that. >>>>> >>>> Actually, I can't copy them in any way, it's just impossible :-/. >>> Do you have load/store instructions for each register class? Worst case you could do a push/pop pair on the stack. It's really, really important that there be a way, even a very expensive way, to do this. >> I'm curious, why is it so important ?
We are trying hard to avoid this kind of situation. > Sometimes the allocator, and other bits, will need to do a cross-class copy. It's assumed that a value can be copied between register classes for which that value type is legal. The coalescer will then go through and try hard to get rid of any copies that aren't actually needed. > > Specifically, as I understand it, there needs to be a way to copy between any two register classes for which the same ValueType is legal. Ok, I understand, thanks. >>>>>> We use clang 3.0 to produce assembler code. >>>>>> Because both registers have the same size and type (i16), I don't know >>>>>> what would be the best solution to distinguish them in order to match >>>>>> the right instructions. >>>>> The register classes should take care of this. >>>> I tried but IMO the matching rule should be context-dependent, i.e. an i16 addition should match machine additions with operands being either data registers or address registers depending on its usage. Even if I look at index operands of load/stores (into the DAG) to match target's addressing modes, I can't assume that some operations are not being used for something else than basic arithmetic (like comparisons which are not supported for address regs). Is it still possible to get rid of this with register classes ? >>> It should be, yes. For a contrived example of a simple add-immediate instruction for each: >>> >>> def ADD_address_reg: myBaseInstrClass<(outs ADDR_REG:$dst), (ins ADDR_REG:$src, i32imm:$imm), [(set ADDR_REG:$dst, (add ADDR_REG:$src, i32imm:$imm))]>; >>> def ADD_general_reg: myBaseInstrClass<(outs GPR:$dst), (ins GPR:$src, i32imm:$imm), [(set GPR:$dst, (add GPR:$src, i32imm:$imm))]>; >>> >>> Likewise, other operations that can target either register class should have a variant for each. ISel will choose the appropriate one based on the rest of the operands.
>> Thanks for your advice Jim, I did what you said but it didn't work and I have no clue what is going wrong. I can't work out where register classes are matched in order to pick the right instructions. I couldn't find a trace of register classes in the instruction selection process. >> I have these patterns defined so far: >> >> def AADDMri { // Instruction MephInstr AGInstr >> dag OutOperandList = (outs AGRegs:$dst); >> dag InOperandList = (ins AGRegs:$a, i16imm:$b); >> list<dag> Pattern = [(set AGRegs:$dst, (add AGRegs:$a, imm:$b))]; >> ? >> } >> >> def DADDri { // Pattern Pat >> dag PatternToMatch = (add LSubRegs:$a, imm:$b); >> list<dag> ResultInstrs = [(asrsat (asextr (sextr iRSubRegs:$a), (XLoadImm imm:$b)), (i16 0))]; >> } >> >> where asrsat has LSubRegs as its output operand. Both patterns have the same complexity and they are located at different scopes. For these two patterns, tblgen is producing the following isel opcodes: >> >> /*3244*/ /*Scope*/ 20, /*->3265*/ >> /*3245*/ OPC_RecordChild1, // #1 = $b >> /*3246*/ OPC_MoveChild, 1, >> /*3248*/ OPC_CheckOpcode, TARGET_VAL(ISD::Constant), >> /*3251*/ OPC_MoveParent, >> /*3252*/ OPC_CheckType, MVT::i16, >> /*3254*/ OPC_EmitConvertToTarget, 1, >> /*3256*/ OPC_MorphNodeTo, TARGET_VAL(ME::AADDMri), 0, >> >> and in the same logic chain of pattern checking, DADDri comes right after AADDMri (with Scope changes in the middle) >> >> /*3285*/ OPC_RecordChild0, // #0 = $a >> /*3286*/ OPC_RecordChild1, // #1 = $b >> /*3287*/ OPC_Scope, 42, /*->3331*/ // 2 children in Scope >> /*3289*/ OPC_MoveChild, 1, >> /*3291*/ OPC_CheckOpcode, TARGET_VAL(ISD::Constant), >> /*3294*/ OPC_MoveParent, >> /*3295*/ OPC_CheckType, MVT::i16, >> /*3297*/ OPC_EmitNode, TARGET_VAL(ME::sextr), 0, >> 1/*#VTs*/, MVT::i64, 1/*#Ops*/, 0, // Results = #2 >> /*3305*/ OPC_EmitConvertToTarget, 1, > Huh. I would have expected OPC_EmitRegister here. Probably something different in your target causing this.
I don't anticipate that it'll cause a problem, though, as there's the CheckType bits to keep things sane. > >> /*3307*/ OPC_EmitNodeXForm, 0, 3, // XLoadImm >> ... >> >> AADDMri supersedes DADDri (the same checks are performed). It's worth noting that the result is used by another instruction which has LSubRegs as its source operand and I got copy instructions added by the iselector to meet this requirement. > Hmm.. OK. So it's correctly understanding the class requirements of the instruction, just not doing what we want in order to meet them. I'm suspecting TableGen isn't as ambitious as one would hope in this regard. That is, defining separate instructions w/ the different register classes is a necessary, but not sufficient, condition to getting where you want to go. > > ISel is being driven by the ValueType, which is in turn mapped to a register class to use for that value type by default. When instructions need a different register class, regalloc will insert copies to satisfy the constraint. That is, isel is driving the register class selection. I'd thought there was at least some information flowing the other direction, but it looks like I was mistaken. I wonder whether isel could have additional opcode(s) that also check register class consistency. For example: a = op1 b, c d = op2 a, f where op2 is meant to match mop2. It would be nice to have op1 match 2 different machine instructions depending on a's register class (mop1a or mop1b). Because LLVM has a bottom-up ISel, when it reaches "a", "d" will already be selected and "a" will have a well-defined regclass (let's say A). It would be interesting to be able to choose between mop1a (Aa) and mop1b (Ab) depending on register class cardinality (taking the smaller one) while satisfying A<=Aa and A<=Ab (inclusion relationship). It would make isel more accurate and reduce regclass cross-copies. If they are still needed, the regalloc will do the job either way. What do you think ?
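The selection rule proposed in the paragraph above (among the variants whose result class includes the required class A, take the one with the fewest registers) can be sketched as a small standalone C++ model. This is only an illustration of the idea, not how LLVM's instruction selector works: `RegClass`, `Variant`, `includes`, and `pick_variant` are hypothetical names, and register classes are modeled as plain bit sets.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model: one bit per physical register.
typedef std::bitset<16> RegClass;

struct Variant {
    std::string name;       // machine instruction, e.g. "mop1a"
    RegClass result_class;  // registers its result may be assigned to
};

// True if every register of `inner` also belongs to `outer` (inner <= outer).
bool includes(const RegClass& outer, const RegClass& inner) {
    return (inner & ~outer).none();
}

// Returns the index of the variant with the smallest result class that
// still includes `required`, or -1 if no variant qualifies (in which case
// a cross-class copy would be needed).
int pick_variant(const std::vector<Variant>& variants,
                 const RegClass& required) {
    int best = -1;
    for (std::size_t i = 0; i < variants.size(); ++i) {
        const RegClass& rc = variants[i].result_class;
        if (!includes(rc, required)) continue;
        if (best < 0 || rc.count() < variants[best].result_class.count())
            best = static_cast<int>(i);
    }
    return best;
}
```

For example, with a data class of `0x00FF` and an address class of `0xFF00`, a required class of `0x0F00` selects the address variant, while a required class that straddles both finds no match unless a wider variant exists.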
>> I really would like to know why this is happening. It's like tblgen is not taking into account the register class assignations of both instructions :-/. > Well, the differences are taken into account, because we're seeing the copy inserted to handle them. There's just insufficient effort made to avoid the copy entirely. > > Now, that's all fine and good, but doesn't directly help you solve your original problem. > > The more I think about it, this is effectively a heuristically based problem, as there's no 100% "right" answer. Consider the following contrived example: > define i16 @foo(i16* %ptr, i16 %a) nounwind ssp { > %1 = getelementptr inbounds i16* %ptr, i16 %a > %2 = ptrtoint i16* %1 to i16 > store i16 %2, i16* %ptr, align 4 > ret i16 %2 > } > > The same intermediate value (%1) is being used here both as a generic i16 and as a pointer value. Which register class should be used to compute the value? There will be a cross-class copy instruction either way. > > I think you may be stuck having smart custom lowering for all the operations you want to work on whichever register class isn't the default for i16. That and/or or have a custom target pass that runs before register allocation to go through and clean things up, changing which instructions are used based on context. Personally, I'd probably go with the latter. Get your target basically working using the (expensive) copies first. Then start building up smarts to make the generated code efficient, not just correct. For example, a simple pattern to look for is to identify loads or stores where the address is coming from a copy of a value computed by a chain of arithmetic instructions and the values defined by those instructions have no other uses outside just computing the address. You can trivially swap those instructions (and the register classes of the operands) with the versions that operate on address registers and get rid of the copy. Honestly, that alone will likely be good enough for most cases. 
Thanks for the idea! Regards, Ivan > Regards, > Jim > > >> Ivan >> >>>> I can make a pass before ISel to annotate the code identifying those registers which are only used for addressing (by doing a simple data-flow analysis), can it help ISelector later ? >>>> Because I could not find how to get metadata from the DAG to drive matching rules or lowering phases, is it possible ? How is metadata transferred to the DAG, where should I look for it ? >>>> >>> Metadata should not be necessary for this. In general, metadata should never be used for anything that's required information, only for optional information. I.e., if it's stripped out of the IR, the backend should still generate correct code. >>> >>> -Jim >>> >>>> Ivan >>>> >>>>>> Moreover, the standard pointer arithmetic is not >>>>>> enough for us (we need to support modulo operations also). >>>>>> I thought that I could manually match every arithmetic operation while >>>>>> matching the addressing mode but it doesn't work because intermediate >>>>>> results are sometimes reused for other purposes (e.g. comparisons). >>>>> I suggest getting things working correctly first and then coming back to things like this as an optimization. >>>>> >>>>>> Do I need to add another type to clang/llvm ? >>>>>> >>>>> Unlikely. >>>>> >>>>> Regards, >>>>> Jim >>>>> >>>>> >>>>>> Thanks in advance, >>>>>> >>>>>> Ivan >>>>>> _______________________________________________ >>>>>> LLVM Developers mailing list >>>>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From konstantin.vladimirov at gmail.com Thu Mar 15 03:41:00 2012 From: konstantin.vladimirov at gmail.com (Konstantin Vladimirov) Date: Thu, 15 Mar 2012 12:41:00 +0400 Subject: [LLVMdev] How to set constant pool section? 
In-Reply-To: <20120315074001.GB15474@britannica.bec.de> References: <20120315074001.GB15474@britannica.bec.de> Message-ID: Hi, What I want is to emit an exact custom string to switch to rodata, without ".section" in front of it. That is the only place I can see where the ".section" directive is hardcoded. The hardcoding is relatively new; earlier versions had a ConstantPoolSection string to work with. --- With best regards, Konstantin On Thu, Mar 15, 2012 at 11:40 AM, Joerg Sonnenberger wrote: > On Thu, Mar 15, 2012 at 11:00:54AM +0400, Konstantin Vladimirov wrote: >> Hi, >> >> Thanks for pointing direction. As far, as I understand by reversing, >> logic, that I want to overwrite is digged into: >> >> lib/MC/MCSectionELF.cpp >> >> MCSectionELF::PrintSwitchToSection >> >> if (ShouldOmitSectionDirective(SectionName, MAI)) { >> OS << '\t' << getSectionName() << '\n'; >> return; >> } >> >> // otherwise print ".section" directive and then section name >> >> So I need to overwrite ShouldOmitSectionDirective behavior. But this >> method of MCSectionELF is not virtual. >> As a workaround, I stubbed it in core LLVM code >> (MCSectionELF::ShouldOmitSectionDirective), and everything works, but >> it is ugly. May be you can advise further? > > I think you are off the mark here. The fragment above is used to create > > .text > > instead of > > .section ".text" > > or other more ugly forms. This is really just an optimisation for > readability and compatibility with ancient tools. > > Joerg > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu
http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From ivanllopard at gmail.com Thu Mar 15 04:02:15 2012 From: ivanllopard at gmail.com (Ivan Llopard) Date: Thu, 15 Mar 2012 10:02:15 +0100 Subject: [LLVMdev] Lowering formal pointer arguments In-Reply-To: References: <4F60B650.1060406@gmail.com> Message-ID: <4F61B017.9040304@gmail.com> On 15/03/2012 03:07, Akira Hatanaka wrote: > If you need llvm::Argument, this returns the iterator pointing to the > first argument: > > Function::const_arg_iterator Arg = > DAG.getMachineFunction().getFunction()->arg_begin(); Thanks Akira. Ivan > > On Wed, Mar 14, 2012 at 8:16 AM, Ivan Llopard wrote: >> Hi, >> >> How can I get the llvm-type of the formal argument while lowering it ? >> >> My target needs to map pointer and non-pointer parameters to different >> registers. In addition, parameter lowering is address space dependent >> (another reason why I need such information). Looking at the DAGBuilder, >> I found that it is dropping it when translating llvm-types to BE types. >> Even if the base type is saved into the MVT structure, it's a private >> member. What's the reason for hiding it ? Why not to keep such information ? >> >> Thanks in advance, >> >> Ivan >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From baldrick at free.fr Thu Mar 15 03:54:50 2012 From: baldrick at free.fr (Duncan Sands) Date: Thu, 15 Mar 2012 09:54:50 +0100 Subject: [LLVMdev] Problem with LoopDependenceAnalysis In-Reply-To: References: Message-ID: <4F61AE5A.1060205@free.fr> Hi, did you do this on optimized IR? These kinds of analyses only do a decent job if at least a basic set of optimizations has been applied. Ciao, Duncan. > I am using LLVM for implementing LoopFission pass. > I am using LoopPass.
> I know that for checking circular dependency in loop I have to use > LoopDependenceAnalysis > > This is what i want to do. > for(int i = 0; i< n ; i++){ > > s1 : a[i] = a[i] + x[i]; > s2 : x[i] = x[i+1] + i*2 ; > } > > /**there is no dependence from s2 to s1/ > so after distribution(it should be) : > > for(int i = 0; i< n ; i++) > s1: a[i] = a[i] + x[i]; > > for(int i = 0; i< n ; i++) > s2: x[i] = x[i+1] + i*2 ; > > > but in llvm i couldn't able to find there is no dependency from s2 to s1. > > LoopDependenceAnalyis always gives there is a dependency from every load instructions to every store instructions. > > > is there any other alternative to LoopDependencyAnalysis ? > thank you > > ...... > Regards, > Shanmukha Rao > Compilers lab, > Indian Institute of Science, Bangalore. > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From clattner at apple.com Thu Mar 15 03:58:27 2012 From: clattner at apple.com (Chris Lattner) Date: Thu, 15 Mar 2012 01:58:27 -0700 Subject: [LLVMdev] LLVM GHC Backend: Tables Next To Code In-Reply-To: References: <607B6576-F8F5-4293-900D-EFA21B7FD48C@apple.com> Message-ID: <8437383B-0304-4C62-95BA-CC050266D281@apple.com> On Mar 13, 2012, at 4:36 PM, David Terei wrote: > Hi Chris, > > One remaining question here is, if the GHC team tries some of these > alternative schemes and finds them unsatisfactory what is the LLVM > communities feeling in regards to extending LLVM IR to support > directly implementing TNTC? I'm strongly in favor of getting proper support for TNTC, because your workarounds (while very pragmatic!) give me the shivers :). I'd much rather have a proper solution that works well with the rest of the llvm toolchain. That said, the design and implementation needs to fit in well with the rest of llvm. 
I'm not willing to add a crazy completely-special purpose bolt-on extension just to support this, which means that we need to find a way to design it that makes sense in the larger context of llvm. > How do you envision this would look at the > IR level, how much work do you think it would be and most importantly > do you feel LLVM would be willing to accept patches for it? I really like the idea of adding this as an inline asm blob at the start of a function, and biasing the actual address of the closure based on the size of the table. I'm not 100% confident that it will work (not being very familiar with TNTC) but it seems quite plausible and the impact on LLVM would be quite reasonable (some new calling convention work?) -Chris From r.jordans at tue.nl Thu Mar 15 03:58:44 2012 From: r.jordans at tue.nl (Roel Jordans) Date: Thu, 15 Mar 2012 09:58:44 +0100 Subject: [LLVMdev] Linking static external library into an LLVM pass library? In-Reply-To: References: Message-ID: <4F61AF44.5000807@tue.nl> You could try to use the LIBS variable, that one gets passed directly to the linker and takes arguments in the standard linking convention. It's a few items down on the page you linked. - Roel On 03/15/2012 12:26 AM, Welson Sun wrote: > This document http://llvm.org/docs/Projects.html says the USEDLIBS > should be used to statically link libraries: > > USEDLIBS > This variable holds a space separated list of libraries that should > be linked into the program. These libraries must be libraries that > come from your *lib* directory. The libraries must be specified > without their "lib" prefix. For example, to link libsample.a, you > would set USEDLIBS to sample.a. > > Note that this works only for statically linked libraries. > > > But, what is that "lib" directory? How can I specify external libxyz.a > file that doesn't live in this libs directory? > > Anybody has done this? > > > Thanks! 
> Welson > From chandlerc at gmail.com Thu Mar 15 04:16:39 2012 From: chandlerc at gmail.com (Chandler Carruth) Date: Thu, 15 Mar 2012 02:16:39 -0700 Subject: [LLVMdev] FYI -- potential compile time regression on boost spirit with r152737 and/or r152752 Message-ID: Just wanted to drop folks a note in case they started investigating issues... Eric let me know that he was seeing a significant compile time regression (3x!!!) for O2 builds of Boost spirit on the nightly testers. The really weird thing is that this was only happening for the ARM targeted build. =/ Very strange, and makes it more likely that there is a smoking gun of "oh, oops". I strongly suspect one (or both) of the inliner changes I made as they were specifically targeting C++-y template and header-based code. I'll be looking into these first thing in the morning, and I'll revert if there isn't any quick fix so that bots get back on their feet. On the flip side, there seem to be some significant performance improvements for other benchmarks. =/ Hopefully the compile time issues can be sorted out reasonably. -Chandler From ggreif at gmail.com Thu Mar 15 04:30:48 2012 From: ggreif at gmail.com (Gabor Greif) Date: Thu, 15 Mar 2012 10:30:48 +0100 Subject: [LLVMdev] LLVM GHC Backend: Tables Next To Code In-Reply-To: <8437383B-0304-4C62-95BA-CC050266D281@apple.com> References: <607B6576-F8F5-4293-900D-EFA21B7FD48C@apple.com> <8437383B-0304-4C62-95BA-CC050266D281@apple.com> Message-ID: Chris said: > I really like the idea of adding this as an inline asm blob at the start of a function, and biasing the actual address of the closure based on the size of the table.
I'm not 100% confident that it will work (not being very familiar with TNTC) but it seems quite plausible and the impact on LLVM would be quite reasonable (some new calling convention work?) While reading this I had the idea that the LLVM code generator could watch out for the specific combination of inline asm and calling convention and strip off the fat. This way the code increase could be dealt with. OTOH the inline asm would still need to be target specific, which is very ugly. What about a new intrinsic which holds a reference to the global and creates the right assembly in the backend? Since the reference to the global (which is the table) would not be used otherwise, the linker could drop it, thus no code increase and no redundant global. Does this make sense? Cheers, Gabor From clchiou at gmail.com Thu Mar 15 04:47:39 2012 From: clchiou at gmail.com (Che-Liang Chiou) Date: Thu, 15 Mar 2012 17:47:39 +0800 Subject: [LLVMdev] GPU thread/block/grid size contraints in LLVM PTX backend In-Reply-To: References: Message-ID: I don't think so, but you should check the source code. On Tue, Mar 13, 2012 at 9:58 PM, Xin Tong wrote: > but does it have default values ? > > Thanks > > Xin > > On Tue, Mar 13, 2012 at 5:19 AM, Che-Liang Chiou wrote: >> You specify shader model, bit size and etc. arch-specified parameters >> through -march, -mattr and -mcpu, but AFAIK, PTX backend does not use >> the GPU thread/block/grid size information in optimization yet. >> >> On Mon, Mar 12, 2012 at 8:17 PM, Xin Tong wrote: >>> I am wondering that how does the LLVM PTX backend find out the >>> constraints on executing GPU thread/block/grid size ( i.e. a block can >>> at most have 1024 threads). Can anyone point me to the code ? I need >>> information in the optimizer, how can I get it ? >>> >>> Thanks >>> >>> Xin >>> _______________________________________________ >>> LLVM Developers mailing list >>> LLVMdev at cs.uiuc.edu
http://llvm.cs.uiuc.edu >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From patrik.h.hagglund at ericsson.com Thu Mar 15 04:55:04 2012 From: patrik.h.hagglund at ericsson.com (Patrik Hägglund H) Date: Thu, 15 Mar 2012 10:55:04 +0100 Subject: [LLVMdev] Lowering formal pointer arguments In-Reply-To: <4F61B017.9040304@gmail.com> References: <4F60B650.1060406@gmail.com> <4F61B017.9040304@gmail.com> Message-ID: Our target also uses different registers for pointer and non-pointer parameters. > If you need llvm::Argument, this returns the iterator pointing to the > first argument: DAG.getMachineFunction().getFunction() only works in LowerFormalArguments (there it returns the callee), not in LowerCall (where it returns the caller, rather than the callee). You need to pass more information about the function type to LowerCall (besides partial information such as the isVarArg parameter). I can provide a patch if you are interested. (Unfortunately, pushing this upstream has been on my to-do list for a while). /Patrik Hägglund -----Original Message----- From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Ivan Llopard Sent: 15 March 2012 10:02 To: Akira Hatanaka Cc: LLVM Developers Mailing List Subject: Re: [LLVMdev] Lowering formal pointer arguments On 15/03/2012 03:07, Akira Hatanaka wrote: > If you need llvm::Argument, this returns the iterator pointing to the > first argument: > > Function::const_arg_iterator Arg = > DAG.getMachineFunction().getFunction()->arg_begin(); Thanks Akira. Ivan > > On Wed, Mar 14, 2012 at 8:16 AM, Ivan Llopard wrote: >> Hi, >> >> How can I get the llvm-type of the formal argument while lowering it ? >> >> My target needs to map pointer and non-pointer parameters to different >> registers. In addition, parameter lowering is address space dependent >> (another reason why I need such information).
Looking at the DAGBuilder, >> I found that it is dropping it when translating llvm-types to BE types. >> Even if the base type is saved into the MVT structure, it's a private >> member. What's the reason for hiding it ? Why not to keep such information ? >> >> Thanks in advance, >> >> Ivan >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev _______________________________________________ LLVM Developers mailing list LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From jobnoorman at gmail.com Thu Mar 15 07:56:12 2012 From: jobnoorman at gmail.com (Job Noorman) Date: Thu, 15 Mar 2012 13:56:12 +0100 Subject: [LLVMdev] Dragonegg stack variables reorderings Message-ID: <1368636.oedjX9M6N5@squatpc> I have noticed that dragonegg sometimes allocates stack objects in a different order than they were declared in the source file. I experienced this behavior when compiling RIPE (https://github.com/johnwilander/RIPE) in the function perform_attack. Unfortunately, I haven't been able to reproduce this in a minimal example. (Note that when compiling RIPE with GCC, the order of stack variables is preserved) So, I have two questions about this behavior: 1) When exactly does dragonegg reorder stack variables? 2) Is there a way to always keep the variables in declared order? Kind regards, Job Noorman From baldrick at free.fr Thu Mar 15 08:42:03 2012 From: baldrick at free.fr (Duncan Sands) Date: Thu, 15 Mar 2012 14:42:03 +0100 Subject: [LLVMdev] Dragonegg stack variables reorderings In-Reply-To: <1368636.oedjX9M6N5@squatpc> References: <1368636.oedjX9M6N5@squatpc> Message-ID: <4F61F1AB.9000705@free.fr> Hi Job, On 15/03/12 13:56, Job Noorman wrote: > I have noticed that dragonegg sometimes allocates stack objects in a different > order than they were declared in the source file. 
> > I experienced this behavior when compiling RIPE > (https://github.com/johnwilander/RIPE) in the function perform_attack. > Unfortunately, I haven't been able to reproduce this in a minimal example. > > (Note that when compiling RIPE with GCC, the order of stack variables is > preserved) as far as I know reordering stack variables or placing arbitrary padding between them is perfectly conformant with the C standard. The LLVM optimizers know this, so if you compile with optimization then code that relies on a particular stack layout is liable to break even if the front-end outputs everything in textual order. > So, I have two questions about this behavior: > 1) When exactly does dragonegg reorder stack variables? I think this is probably due to stack variables being output lazily, i.e. when first used. For example, if you declare variables A and B but use B first then probably B will get output to the LLVM IR first. > 2) Is there a way to always keep the variables in declared order? I guess I could arrange for them all to be output in one fell swoop at the start of the function. Why do you need this? Ciao, Duncan. From hkultala at iki.fi Thu Mar 15 09:31:50 2012 From: hkultala at iki.fi (Heikki Kultala) Date: Thu, 15 Mar 2012 16:31:50 +0200 Subject: [LLVMdev] rematerialization question Message-ID: <4F61FD56.7080602@iki.fi> I am a bit confused how the rematerialization works. It seems currently in our backend we get lots of code where some stack offset address is calculated, but this address is then spilled to stack, and loaded from stack later. This does not make sense, it would be better to just recalculate the address later, ie rematerialize the original stack offset calculation. But marking some instruction rematerializable means all operands have to be always available? So I cannot make my add operation which is used for the stack offset calculations rematerializable because not all operands of all adds are always available? 
So how can I make llvm to try to rematerialize only those stack offset adds, but not all adds? And also, when I have the stack update code in beginning and at end of the functions, these stack+immediate adds/subs also should not be rematerialized as the sp has changed? From jobnoorman at gmail.com Thu Mar 15 10:29:46 2012 From: jobnoorman at gmail.com (Job Noorman) Date: Thu, 15 Mar 2012 16:29:46 +0100 Subject: [LLVMdev] Dragonegg stack variables reorderings Message-ID: <8433345.Iinx5bSMku@squatpc> Hi Duncan, > I think this is probably due to stack variables being output lazily, i.e. > when first used. For example, if you declare variables A and B but use B > first then probably B will get output to the LLVM IR first. I think you're right: I fixed my particular problem by initializing all stack variables. > I guess I could arrange for them all to be output in one fell swoop at the > start of the function. Why do you need this? I need this to make RIPE (https://github.com/johnwilander/RIPE) work. RIPE is the "runtime intrusion prevention evaluator" and it is a program that performs a lot of different attacks on itself. It relies on the order of stack variables for its stack smashing attacks. Regards, Job From ivanllopard at gmail.com Thu Mar 15 11:09:19 2012 From: ivanllopard at gmail.com (Ivan Llopard) Date: Thu, 15 Mar 2012 17:09:19 +0100 Subject: [LLVMdev] rematerialization question In-Reply-To: <4F61FD56.7080602@iki.fi> References: <4F61FD56.7080602@iki.fi> Message-ID: <4F62142F.1070105@gmail.com> Hi Heikki, Le 15/03/2012 15:31, Heikki Kultala a ?crit : > I am a bit confused how the rematerialization works. > > It seems currently in our backend we get lots of code where some stack > offset address is calculated, but this address is then spilled to stack, > and loaded from stack later. > > This does not make sense, it would be better to just recalculate the > address later, ie rematerialize the original stack offset calculation. 
> > But marking some instruction rematerializable means all operands have to > be always available? Yes, the inline spiller will verify that condition. > So I cannot make my add operation which is used for the stack offset > calculations rematerializable because not all operands of all adds are > always available? Well, you can mark your add operation as being rematerializable in your BE but it will not get remat'd if its operands are not available. > > So how can I make llvm to try to rematerialize only those stack offset > adds, but not all adds? You can force remat for those adds you want to by patching the spiller (see InlineSpiller.cpp). I did it for my target (a dirty patch though) but I cannot assure that remat will always be possible :(. Regards, Ivan > > And also, when I have the stack update code in beginning and at end of > the functions, these stack+immediate adds/subs also should not be > rematerialized as the sp has changed? > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From anton at korobeynikov.info Thu Mar 15 11:02:10 2012 From: anton at korobeynikov.info (Anton Korobeynikov) Date: Thu, 15 Mar 2012 20:02:10 +0400 Subject: [LLVMdev] Dragonegg stack variables reorderings In-Reply-To: <8433345.Iinx5bSMku@squatpc> References: <8433345.Iinx5bSMku@squatpc> Message-ID: > I need this to make RIPE (https://github.com/johnwilander/RIPE) work. RIPE is > the "runtime intrusion prevention evaluator" and it is a program that performs > a lot of different attacks on itself. It relies on the order of stack > variables for its stack smashing attacks. Interesting, how much of other undefined / implementation-defined behaviors it uses then? 
:) -- With best regards, Anton Korobeynikov Faculty of Mathematics and Mechanics, Saint Petersburg State University From jobnoorman at gmail.com Thu Mar 15 11:46:26 2012 From: jobnoorman at gmail.com (Job Noorman) Date: Thu, 15 Mar 2012 17:46:26 +0100 Subject: [LLVMdev] Dragonegg stack variables reorderings In-Reply-To: References: <8433345.Iinx5bSMku@squatpc> Message-ID: <19589806.etJiFk7zo7@squatpc> Hi Anton, > Interesting, how much of other undefined / implementation-defined > behaviors it uses then? :) A lot:-) For example, the offset between parameters and the return address, to name one. For variables, it relies on the order 1) on the stack; 2) in structs; 3) in the data segment; 4) in the bss segment. 1) is definately undefined, 2) is definately defined and I'm not sure about 3) and 4). Regards, Job On Thursday 15 March 2012 20:02:10 Anton Korobeynikov wrote: > > I need this to make RIPE (https://github.com/johnwilander/RIPE) work. RIPE > > is the "runtime intrusion prevention evaluator" and it is a program that > > performs a lot of different attacks on itself. It relies on the order of > > stack variables for its stack smashing attacks. > > Interesting, how much of other undefined / implementation-defined > behaviors it uses then? :) From preston.briggs at gmail.com Thu Mar 15 11:57:00 2012 From: preston.briggs at gmail.com (Preston Briggs) Date: Thu, 15 Mar 2012 09:57:00 -0700 Subject: [LLVMdev] Problem with LoopDependenceAnalysis Message-ID: Shanmukha Rao wrote: > I am using LLVM for implementing LoopFission pass. > I am using LoopPass. > I know that for checking circular dependency in loop I have to use LoopDependenceAnalysis > > This is what i want to do. > ? ? ? ? 
for(int i = 0; i< n ; i++){ > s1 : a[i] = a[i] + x[i]; > s2 : x[i] = x[i+1] + i*2 ; > } > > /**there is no dependence from s2 to s1/ > so after distribution(it should be) : > > for(int i = 0; i< n ; i++) > s1: a[i] = a[i] + x[i]; > > for(int i = 0; i< n ; i++) > s2: x[i] = x[i+1] + i*2 ; > > but in llvm i couldn't able to find there is no dependency from s2 to s1. > LoopDependenceAnalyis always gives there is a dependency from every load instructions to every store instructions. > Is there any other alternative to LoopDependenceAnalysis ? LoopDependenceAnalysis is the right tool, but it's not completely implemented. If you look at the source, you'll see the implementation is just outlined. I'm working on a more complete implementation, but much work remains. Preston From hfinkel at anl.gov Thu Mar 15 12:05:12 2012 From: hfinkel at anl.gov (Hal Finkel) Date: Thu, 15 Mar 2012 12:05:12 -0500 Subject: [LLVMdev] Problem with LoopDependenceAnalysis In-Reply-To: References: Message-ID: <20120315120512.528854b8@sapling2> On Thu, 15 Mar 2012 09:57:00 -0700 Preston Briggs wrote: > Shanmukha Rao wrote: > > I am using LLVM for implementing LoopFission pass. > > I am using LoopPass. > > I know that for checking circular dependency in loop I have to use > > LoopDependenceAnalysis > > > > This is what i want to do. > > ? ? ? ? for(int i = 0; i< n ; i++){ > > s1 : a[i] = a[i] + x[i]; > > s2 : x[i] = x[i+1] + i*2 ; > > } > > > > /**there is no dependence from s2 to s1/ > > so after distribution(it should be) : > > > > for(int i = 0; i< n ; i++) > > s1: a[i] = a[i] + x[i]; > > > > for(int i = 0; i< n ; i++) > > s2: x[i] = x[i+1] + i*2 ; > > > > but in llvm i couldn't able to find there is no dependency from s2 > > to s1. LoopDependenceAnalyis always gives there is a dependency > > from every load instructions to every store instructions. Is there > > any other alternative to LoopDependenceAnalysis ? > > LoopDependenceAnalysis is the right tool, but it's not completely > implemented. 
If you look at the source, you'll see the implementation > is just outlined. > > I'm working on a more complete implementation, but much work remains. Have you looked at the recently-proposed patch by Sanjoy Das? He has done some work on actually implementing the SIV tests. http://permalink.gmane.org/gmane.comp.compilers.llvm.cvs/109305 Preston, if you're also working on this, can you please look over Sanjoy's patch? -Hal > > Preston > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev -- Hal Finkel Postdoctoral Appointee Leadership Computing Facility Argonne National Laboratory From ahatanak at gmail.com Thu Mar 15 12:22:25 2012 From: ahatanak at gmail.com (Akira Hatanaka) Date: Thu, 15 Mar 2012 10:22:25 -0700 Subject: [LLVMdev] Question about post RA scheduler In-Reply-To: References: <8EABC991-2BE1-45E5-A388-ABCCE5E826FB@apple.com> Message-ID: Thank you for your suggestions. I implemented the first approach (provided the byval argument and offset to MachinePointerInfo) and it seems to have fixed the instruction ordering problem. It was a lot simpler than initially expected. In this particular case, is the user responsible for providing alias information to MachinePointerInfo to guarantee instructions are emitted in the correct order? It seems to me that getStore should not try to infer pointer information unless the user explicitly asks for it. The scheduler will then conservatively treat it as a load or store that aliases anything. On Mon, Mar 12, 2012 at 10:39 PM, Andrew Trick wrote: > > On Mar 7, 2012, at 11:34 AM, Akira Hatanaka wrote: > >> I filed a bug report (Bug 12205). >> Please take a look when you have time. 
>> >> Per your suggestion, I also attached a patch which attaches to load or >> store nodes a machinepointerinfo that points to a stack frame object >> when it can infer they are actually reading from or writing to the >> stack. The test that was failing passes if I apply this patch, but I >> doubt this is the right approach, because this will fail if >> InferPointerInfo in SelectionDAG.cpp cannot discover a load or store >> is accessing a stack object (it can only infer the information if the >> expression for the pointer is simple, for example add FI + const). >> >> An alternative approach might be to make the machinepointerinfo of the >> stores refer to %struct.ObjPointStruct* byval %P or refer to nothing, >> but that currently doesn't seem to be possible. > > I've thought of several ways we could potentially handle this. All are fairly messy without recognizing the situation during argument lowering. I'm not very familiar with the argument lowering code. But it seems to me you should be able to lookup the Value for the formal argument when you generate stack stores. Can you create a MachinePointerInfo for each store that refers to the argument value and proper offset? These initializers will no longer appear to alias with stack accesses, but that's probably ok. What exactly do you think is not possible? > > If finding the formal argument value and offset is too hard, I suppose there are other hacks you could try. I'm not encouraging it though. Is it valid to set MachinePointerInfo.V = 0? You could try overriding it after calling getStore. If that's not valid, you could probably create a PseudoSourceValue that aliases with everything. I suppose the hackiest thing would be marking the store volatile. The alternative would be to define a new MachineMemOperand flag. I really don't think we should have to go that far though. 
>
> -Andy
>
>> On Tue, Mar 6, 2012 at 6:01 PM, Andrew Trick wrote:
>>> On Mar 6, 2012, at 5:05 PM, Akira Hatanaka wrote:
>>>> I am having trouble trying to enable post RA scheduler for the Mips backend.
>>>>
>>>> This is the bit code of the function I am compiling:
>>>>
>>>> (gdb) p MF.Fn->dump()
>>>>
>>>> define void @PointToHPoint(%struct.HPointStruct* noalias sret
>>>> %agg.result, %struct.ObjPointStruct* byval %P) nounwind {
>>>> entry:
>>>>   %res = alloca %struct.HPointStruct, align 8
>>>>   %x2 = bitcast %struct.ObjPointStruct* %P to double*
>>>>   %0 = load double* %x2, align 8
>>>>
>>>> The third instruction is loading the first floating point double of
>>>> structure %P which is being passed by value.
>>>>
>>>> This is the machine function right after completion of isel:
>>>> (gdb) p MF->dump()
>>>> # Machine code for function PointToHPoint:
>>>> Frame Objects:
>>>>   fi#-1: size=48, align=8, fixed, at location [SP+8]
>>>>   fi#0: size=32, align=8, at location [SP]
>>>> Function Live Ins: %A0 in %vreg0, %A2 in %vreg1, %A3 in %vreg2
>>>>
>>>> BB#0: derived from LLVM BB %entry
>>>>       SW %vreg2, <fi#-1>, 4; mem:ST4[FixedStack-1+4] CPURegs:%vreg2
>>>>       SW %vreg1, <fi#-1>, 0; mem:ST4[FixedStack-1](align=8) CPURegs:%vreg1
>>>>       %vreg3 = COPY %vreg0; CPURegs:%vreg3,%vreg0
>>>>       %vreg4 = LDC1 <fi#-1>, 0; mem:LD8[%x2] AFGR64:%vreg4
>>>>
>>>>
>>>> The first two stores write the values in argument registers $6 and $7
>>>> to frame object -1
>>>> (Mips stores byval arguments passed in registers to the stack).
>>>> The fourth instruction LDC1 loads the value written by the first two
>>>> stores as a floating point double.
>>>>
>>>> This is the machine function just before post RA scheduling:
>>>> (gdb) p MF.dump()
>>>> # Machine code for function PointToHPoint:
>>>> Frame Objects:
>>>>   fi#-1: size=48, align=8, fixed, at location [SP+8]
>>>>   fi#0: size=32, align=8, at location [SP-32]
>>>> Function Live Ins: %A0 in %vreg0, %A2 in %vreg1, %A3 in %vreg2
>>>>
>>>> BB#0: derived from LLVM BB %entry
>>>>     Live Ins: %A0 %A2 %A3
>>>>       %SP = ADDiu %SP, -32
>>>>       PROLOG_LABEL
>>>>       SW %A3, %SP, 44; mem:ST4[FixedStack-1+4]
>>>>       SW %A2, %SP, 40; mem:ST4[FixedStack-1](align=8)
>>>>       %D0 = LDC1 %SP, 40; mem:LD8[%x2]
>>>>
>>>>
>>>> The frame index operands of the first two stores and the fourth load
>>>> have been lowered to real addresses.
>>>> Since the first two SWs store to ($sp + 44) and ($sp + 40), and
>>>> instruction LDC1 loads from ($sp + 40),
>>>> there should be a dependency between these instructions.
>>>>
>>>> However, when ScheduleDAGInstrs::BuildSchedGraph(AliasAnalysis *AA)
>>>> builds the schedule graph,
>>>> there are no dependency edges added between the two SWs and LDC1 because
>>>> getUnderlyingObjectForInstr returns different objects for these instructions:
>>>>
>>>> underlying object of SWs: FixedStack-1
>>>> underlying object of LDC1: struct.ObjPointStruct* %P
>>>>
>>>>
>>>> Is this a bug?
>>>> Or are there ways to tell BuildSchedGraph it should add dependency edges?
>>>
>>> This is a wild guess. But it looks to me like your load's machineMemOperand should have been converted to refer to the stack frame. I would call that an ISEL bug. I can't say where the bug is without stepping through a test case.
>>>
>>> Maybe someone who's worked in this area of ISEL can give you a better hint. In the meantime, I would file a PR.
>>> >>> -Andy >> > From atrick at apple.com Thu Mar 15 12:37:53 2012 From: atrick at apple.com (Andrew Trick) Date: Thu, 15 Mar 2012 10:37:53 -0700 Subject: [LLVMdev] Question about post RA scheduler In-Reply-To: References: <8EABC991-2BE1-45E5-A388-ABCCE5E826FB@apple.com> Message-ID: <8B7B9083-EF96-4608-B417-1ED3D8E2BAB6@apple.com> On Mar 15, 2012, at 10:22 AM, Akira Hatanaka wrote: > Thank you for your suggestions. > > I implemented the first approach (provided the byval argument and > offset to MachinePointerInfo) and it seems to have fixed the > instruction ordering problem. It was a lot simpler than initially > expected. > > In this particular case, is the user responsible for providing alias > information to MachinePointerInfo to guarantee instructions are > emitted in the correct order? It seems to me that getStore should not > try to infer pointer information unless the user explicitly asks for > it. The scheduler will then conservatively treat it as a load or store > that aliases anything. I think the pointer type inference is correct in the absence of any stronger information provided when the target lowers the store. In this case, the store is special because it initializes an object that already has a name in the IR. So it's the job of whoever creates that store to produce correct MachinePointerInfo. Unfortunately, that requirement is not obvious. If you can think of a way to clarify the lowering code through better comments, stricter API, or verification code, a patch would be most welcome. 
Thanks, -Andy From hfinkel at anl.gov Thu Mar 15 12:47:18 2012 From: hfinkel at anl.gov (Hal Finkel) Date: Thu, 15 Mar 2012 12:47:18 -0500 Subject: [LLVMdev] Problem with LoopDependenceAnalysis In-Reply-To: <20120315120512.528854b8@sapling2> References: <20120315120512.528854b8@sapling2> Message-ID: <20120315124718.01722ea7@sapling2> On Thu, 15 Mar 2012 12:05:12 -0500 Hal Finkel wrote: > On Thu, 15 Mar 2012 09:57:00 -0700 > Preston Briggs wrote: > > > Shanmukha Rao wrote: > > > I am using LLVM for implementing LoopFission pass. > > > I am using LoopPass. > > > I know that for checking circular dependency in loop I have to use > > > LoopDependenceAnalysis > > > > > > This is what i want to do. > > > ? ? ? ? for(int i = 0; i< n ; i++){ > > > s1 : a[i] = a[i] + x[i]; > > > s2 : x[i] = x[i+1] + i*2 ; > > > } > > > > > > /**there is no dependence from s2 to s1/ > > > so after distribution(it should be) : > > > > > > for(int i = 0; i< n ; i++) > > > s1: a[i] = a[i] + x[i]; > > > > > > for(int i = 0; i< n ; i++) > > > s2: x[i] = x[i+1] + i*2 ; > > > > > > but in llvm i couldn't able to find there is no dependency from s2 > > > to s1. LoopDependenceAnalyis always gives there is a dependency > > > from every load instructions to every store instructions. Is there > > > any other alternative to LoopDependenceAnalysis ? > > > > LoopDependenceAnalysis is the right tool, but it's not completely > > implemented. If you look at the source, you'll see the > > implementation is just outlined. > > > > I'm working on a more complete implementation, but much work > > remains. > > Have you looked at the recently-proposed patch by Sanjoy Das? He has > done some work on actually implementing the SIV tests. > http://permalink.gmane.org/gmane.comp.compilers.llvm.cvs/109305 Better link: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20120227/138320.html -Hal > > Preston, if you're also working on this, can you please look over > Sanjoy's patch? 
> > -Hal > > > > > Preston > > > > _______________________________________________ > > LLVM Developers mailing list > > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > > -- Hal Finkel Postdoctoral Appointee Leadership Computing Facility Argonne National Laboratory From atrick at apple.com Thu Mar 15 13:06:24 2012 From: atrick at apple.com (Andrew Trick) Date: Thu, 15 Mar 2012 11:06:24 -0700 Subject: [LLVMdev] Question about post RA scheduler In-Reply-To: <8B7B9083-EF96-4608-B417-1ED3D8E2BAB6@apple.com> References: <8EABC991-2BE1-45E5-A388-ABCCE5E826FB@apple.com> <8B7B9083-EF96-4608-B417-1ED3D8E2BAB6@apple.com> Message-ID: <714CEF71-24FC-4719-970B-C519BAE239A3@apple.com> On Mar 15, 2012, at 10:37 AM, Andrew Trick wrote: > > On Mar 15, 2012, at 10:22 AM, Akira Hatanaka wrote: > >> Thank you for your suggestions. >> >> I implemented the first approach (provided the byval argument and >> offset to MachinePointerInfo) and it seems to have fixed the >> instruction ordering problem. It was a lot simpler than initially >> expected. >> >> In this particular case, is the user responsible for providing alias >> information to MachinePointerInfo to guarantee instructions are >> emitted in the correct order? It seems to me that getStore should not >> try to infer pointer information unless the user explicitly asks for >> it. The scheduler will then conservatively treat it as a load or store >> that aliases anything. > > I think the pointer type inference is correct in the absence of any stronger information provided when the target lowers the store. In this case, the store is special because it initializes an object that already has a name in the IR. So it's the job of whoever creates that store to produce correct MachinePointerInfo. Unfortunately, that requirement is not obvious. 
If you can think of a way to clarify the lowering code through better comments, stricter API, or verification code, a patch would be most welcome. ...actually, if you're willing to submit a patch, then there's no need for me to justify the current implementation. I agree with your proposal in principal. Pointer inference in codegen is fundamentally incompatible with our representation of AliasAnalysis. If lowering code knows that its accessing a stack slot that cannot alias with any IR level objects, then it should explicitly ask for an inferred stack pointer. Otherwise we should conservatively assume the stack access can alias with anything. -Andy From stoklund at 2pi.dk Thu Mar 15 13:14:29 2012 From: stoklund at 2pi.dk (Jakob Stoklund Olesen) Date: Thu, 15 Mar 2012 11:14:29 -0700 Subject: [LLVMdev] commit r152019 broke architectures with more than 255 registers In-Reply-To: <94C47797-91E0-4BB0-856E-782CC5F3507F@2pi.dk> References: <4F54C204.1070108@cs.tut.fi> <94C47797-91E0-4BB0-856E-782CC5F3507F@2pi.dk> Message-ID: On Mar 5, 2012, at 10:40 AM, Jakob Stoklund Olesen wrote: > > On Mar 5, 2012, at 5:39 AM, Heikki Kultala wrote: > >> Our architecture(TCE) can have LOTS of registers. >> >> It seems r152019 changed some register bookkeeping data structures to >> 8-bit. This broke support for architectures with >255 registers. >> >> Please revert this change or make those register-related values at least >> 16 bits wide. > > I agree. We can limit the number of physregs to 64k, but no more. I have reverted the commits that limited the concatenated register and instruction names to 64k. They would have caused problems for a 16k register target. Heikki, please let me know if you are having problems with the limits enforced by TableGen now. 
/jakob

From dggraham at email.wm.edu Thu Mar 15 13:32:53 2012
From: dggraham at email.wm.edu (Daniel Graham)
Date: Thu, 15 Mar 2012 14:32:53 -0400
Subject: [LLVMdev] Register Allocation, Spill Weight calculation
Message-ID:

To whom it may concern,

I would like to edit how the spill weights are calculated for the greedy algorithm. Does anyone know where I could start?

V/R
Daniel Graham

From nicolas.capens at gmail.com Thu Mar 15 14:07:45 2012
From: nicolas.capens at gmail.com (Nicolas Capens)
Date: Thu, 15 Mar 2012 15:07:45 -0400
Subject: [LLVMdev] Scalar replacement of arrays
In-Reply-To:
References:
Message-ID: <4F623E01.4090304@gmail.com>

Hi Preston,

Thanks for the suggestion. Unfortunately I don't know how to apply it to LLVM. I'm struggling with the problem of spilling. Basically scalarrepl makes the original array alloca disappear and replaces it with individual scalar allocas, which then also disappear when mem2reg puts them into SSA form. Then register allocation puts stuff on the stack again, but not where the allocas were. That's a problem when you actually expect to be able to dynamically index the array.

I'd like the array alloca to stay where it is, but to see the same optimizations as achieved by scalarrepl+mem2reg in sections of code where no dynamic indexing is occurring, and the register allocator should spill and restore to/from the original array so that for dynamic indexing a minimal number of registers is written back to memory.

I'm starting to think this actually can't be achieved with SSA at all. Perhaps I shouldn't run scalarrepl and see if the optimizations can be performed at the register allocator level instead?

Cheers,
Nicolas

On 09/03/2012 12:34 PM, Preston Briggs wrote:
> Nicolas Capens wrote:
>> [...]
>> I'm not sure if that's going to help achieve optimal code >> for when the array is sometimes being dynamically indexed. >> Essentially there should be some kind of store to load copy >> propagation. As far as I know that's exactly what mem2reg >> does, except that it only considers scalars and not elements >> of arrays. >> >> So would it be hard to extend mem2reg to also consider elements >> of arrays for promotion? It should obviously not perform the promotion >> when in between the store and load there's a dynamically indexed >> access to the array. Correct me if I'm wrong, but that seems it would >> be superior to scalarrepl itself (for arrays). >> >> Is there anyone experienced with mem2reg who wants to implement this? >> If not, any advice on how to best approach this? > Classically, we use dependence analysis to support such optimizations. > For example, see Chapter 8 in Allen& Kennedy's book, > "Optimizing Compilers for Modern Architectures." > > Preston > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From jake at ehrlichks.net Thu Mar 15 16:06:23 2012 From: jake at ehrlichks.net (ishkabible) Date: Thu, 15 Mar 2012 14:06:23 -0700 (PDT) Subject: [LLVMdev] New executable format Message-ID: <33513282.post@talk.nabble.com> So I was wondering how one would go about adding a new target format. I can't find any documentation on the matter. I want to create my own much simplified executable format for a simple OS but I definitely don't wan't to re-implement a compiler and I really want the powerful optimizations that LLVM is already capable of. How would it be possible to do this so that after all my items are linked together I have the output format I desire? The linker doesn't strictly need to accept this format as input but it does need to be able to produce it as output. 
I have looked at the class TargetLoweringObjectFile but I'm really not sure what to do with it.

--
View this message in context: http://old.nabble.com/New-executable-format-tp33513282p33513282.html
Sent from the LLVM - Dev mailing list archive at Nabble.com.

From stoklund at 2pi.dk Thu Mar 15 16:33:38 2012
From: stoklund at 2pi.dk (Jakob Stoklund Olesen)
Date: Thu, 15 Mar 2012 14:33:38 -0700
Subject: [LLVMdev] Register Allocation, Spill Weight calculation
In-Reply-To:
References:
Message-ID: <2D68DA82-3F12-4303-8BD2-E5891A9201BC@2pi.dk>

On Mar 15, 2012, at 11:32 AM, Daniel Graham wrote:
> To Who it may concern,
> I would like to edit the how the spill weight are calculated for the greedy algorithm. Does anyone know where I could start.

CalcSpillWeights.cpp ;-)

/jakob

From davidterei at gmail.com Thu Mar 15 18:53:49 2012
From: davidterei at gmail.com (David Terei)
Date: Thu, 15 Mar 2012 16:53:49 -0700
Subject: [LLVMdev] LLVM GHC Backend: Tables Next To Code
In-Reply-To: <8437383B-0304-4C62-95BA-CC050266D281@apple.com>
References: <607B6576-F8F5-4293-900D-EFA21B7FD48C@apple.com> <8437383B-0304-4C62-95BA-CC050266D281@apple.com>
Message-ID:

On 15 March 2012 01:58, Chris Lattner wrote:
>
> On Mar 13, 2012, at 4:36 PM, David Terei wrote:
>
>> Hi Chris,
>>
>> One remaining question here is, if the GHC team tries some of these
>> alternative schemes and finds them unsatisfactory what is the LLVM
>> communities feeling in regards to extending LLVM IR to support
>> directly implementing TNTC?
>
> I'm strongly in favor of getting proper support for TNTC, because your workarounds (while very pragmatic!) give me the shivers :). I'd much rather have a proper solution that works well with the rest of the llvm toolchain.
>
> That said, the design and implementation needs to fit in well with the rest of llvm.
> I'm not willing to add a crazy completely-special purpose bolt-on extension just to support this, which means that we need to find a way to design it that makes sense in the larger context of llvm.

OK great! I have no idea when we on the GHC side (me) will get time to tackle this so don't hold your breath. Good to know where you guys stand though. We began this conversation as a GSoC student is interested in tackling it so we may progress down that path.

>
>> How do you envision this would look at the
>> IR level, how much work do you think it would be and most importantly
>> do you feel LLVM would be willing to accept patches for it?
>
> I really like the idea of adding this as an inline asm blob at the start of a function, and biasing the actual address of the closure based on the size of the table. I'm not 100% confident that it will work (not being very familiar with TNTC) but it seems quite plausible and the impact on LLVM would be quite reasonable (some new calling convention work?)
>
> -Chris

I personally would prefer something like Gabor suggests that is platform agnostic but am not fussed on the issue, whatever works (and isn't a hack) is my goal.

Cheers,
David

From chandlerc at gmail.com Thu Mar 15 19:10:57 2012
From: chandlerc at gmail.com (Chandler Carruth)
Date: Thu, 15 Mar 2012 17:10:57 -0700
Subject: [LLVMdev] FYI -- potential compile time regression on boost spirit with r152737 and/or r152752
In-Reply-To:
References:
Message-ID:

Just a brief follow-up, mostly relaying my findings from IRC:

I looked in depth at loop_unroll to see why it slowed down. The inliner ran for 4.2% of the time, 2.7% of which was spent actually doing inlines. So the cost analysis is not hurting us here. However, we are spending quite a bit of time in the optimizations I expect to benefit from better inlining: InstCombine, LSR, and GVN.
And, thankfully, we're getting significant runtime improvements from the time spent in these optimizers, so they aren't going off the deep end, they're actually simplifying code (if perhaps not as quickly as we'd like).

The conclusion seems to be that these patches are fine, and we just need to keep pressure on the scalar optimization passes to run as efficiently as possible. The improved inlining costs us compile time but seems to pay handsomely at runtime. If folks have other significant compile-time regressions, I would be interested in having repro instructions. =]

Last but not least, the refactoring to do inline cost analysis per-callsite may actually make the analysis *faster* in several situations. As I'm going through this I'm finding lots of inefficiencies in the current design that should be fixed along the way.

-Chandler

From gregory.szorc at gmail.com Thu Mar 15 23:15:02 2012
From: gregory.szorc at gmail.com (Gregory Szorc)
Date: Thu, 15 Mar 2012 21:15:02 -0700
Subject: [LLVMdev] Python bindings in tree
Message-ID: <4F62BE46.7070705@gmail.com>

There was some talk on IRC last week about desire for Python bindings to LLVM's Object.h C interface. So, I coded up some and you can now find some Python bindings in trunk at bindings/python. Currently, the interfaces for Object.h and Disassembler.h are implemented.

I'd like to stress that things are still rough around the edges, so use at your own risk. I intend to smooth things over in the next week or so.

I'd really like to fill out the implementation to cover the entirety of the C interface.
Since this will require a lot of work (Core.h is *massive*), I wanted to run things by the community before I invest too much time and create something people don't want (I already had to back out the Python binding to EnhancedDisassembly because I didn't realize it was deprecated - oops). Are people interested in more expansive in-tree Python bindings? Specifically, do we want a Python API for the IR primitives like type and value that sit lower than the module APIs? I know there are other Python bindings floating around and from the perspective of the project, one option is to just tell people to go use them. But llvm-py seems to have fallen to the wayside (although I did read a blog post last week where somebody forked it on GitHub and ported it to work with current SVN HEAD). Having in-tree bindings would certainly help prevent bit rot (especially if Python test regressions can mark builds as failed). Finally, I checked in the new bindings with no review (I was given the OK over IRC). If someone would be so kind as to review them, I'd really appreciate the feedback. Also, if I am to commit new features to the Python bindings, does anyone have a problem with continuing to hold off on the review [of new code] until after check-in? I think this would help lower the "time to market" and get more eyes and early testers using the bindings. From my experience with Clang, people aren't exactly lining up to review Python patches, so I fear that new Python features would be sitting in patch purgatory instead of being tested by early adopters. Gregory Szorc gregory.szorc at gmail.com From ggreif at gmail.com Fri Mar 16 04:32:47 2012 From: ggreif at gmail.com (Gabor Greif) Date: Fri, 16 Mar 2012 10:32:47 +0100 Subject: [LLVMdev] PowerPC codegen experts looking for challenges? Message-ID: Hi all, at my paid job I am pushing the Clang/LLVM combo into evaluation (we currently use a gcc3.4 generation toolchain). 
Since we produce for the embedded domain we need a reliable host (i.e. simulation i686) / target (PPC) dual setup. To this end I almost succeeded grinding through our large(ish) codebase but found some PPC snags. I filed these bugs, complete with repro IR code: http://llvm.org/bugs/show_bug.cgi?id=12201 http://llvm.org/bugs/show_bug.cgi?id=12203 I gather that the FreeBSD project is aiming at production use of LLVM for PPC also, so these might have interesting fixes for those folks too. Another thing, it appears that thread-global data is not yet supported by the PPC backend. Is there a tracking bug for this or any plans to fill this gap? Cheers, Gabor From hfinkel at anl.gov Fri Mar 16 08:08:16 2012 From: hfinkel at anl.gov (Hal Finkel) Date: Fri, 16 Mar 2012 08:08:16 -0500 Subject: [LLVMdev] PowerPC codegen experts looking for challenges? In-Reply-To: References: Message-ID: <20120316080816.7b1871b0@sapling2> On Fri, 16 Mar 2012 10:32:47 +0100 Gabor Greif wrote: > Hi all, > > at my paid job I am pushing the Clang/LLVM combo into evaluation (we > currently use a gcc3.4 generation toolchain). Great! Since we produce for the > embedded domain we need a reliable > host (i.e. simulation i686) / target (PPC) dual setup. To this end I > almost succeeded grinding through our large(ish) codebase but found > some PPC snags. > > I filed these bugs, complete with repro IR code: > > http://llvm.org/bugs/show_bug.cgi?id=12201 Interesting. I'll try to get this fixed next week. > http://llvm.org/bugs/show_bug.cgi?id=12203 Roman and I have been working on this one over the last few days; expect a fix soon. Please keep submitting bug reports; there are a number of us who are interested in PPC support. -Hal > > I gather that the FreeBSD project is aiming at production use of LLVM > for PPC also, so these might have interesting fixes for those folks > too. > > Another thing, it appears that thread-global data is not yet supported > by the PPC backend. 
> Is there a tracking bug for this or any plans to
> fill this gap?
>
> Cheers,
>
> Gabor
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

--
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory

From christophg+llvm at grenz-bonn.de Fri Mar 16 08:12:08 2012
From: christophg+llvm at grenz-bonn.de (Christoph Grenz)
Date: Fri, 16 Mar 2012 14:12:08 +0100
Subject: [LLVMdev] Python bindings in tree
In-Reply-To: <4F62BE46.7070705@gmail.com>
References: <4F62BE46.7070705@gmail.com>
Message-ID: <2358486.PuEW3MFQ3C@deepthought>

Hello,

On Thursday, 15 March 2012, 21:15:02, Gregory Szorc wrote:
> There was some talk on IRC last week about desire for Python bindings to
> LLVM's Object.h C interface. So, I coded up some and you can now find
> some Python bindings in trunk at bindings/python. Currently, the
> interfaces for Object.h and Disassembler.h are implemented.

FYI: I recently started working on Python3 bindings for LLVM 3 as all bindings I could find were for LLVM 2.x and up to Python 2.6. I used Cython for easier coding and already ported a big part of Core.h including all Type and Value classes.

https://www.gitorious.org/python-llvm3

> [...]
>
> Gregory Szorc
> gregory.szorc at gmail.com

Christoph Grenz

From rkotler at mips.com Fri Mar 16 08:55:49 2012
From: rkotler at mips.com (Reed Kotler)
Date: Fri, 16 Mar 2012 06:55:49 -0700
Subject: [LLVMdev] tablegen question
Message-ID: <4F634665.9070401@mips.com>

Trying to resolve some general tablegen questions.
Consider the test case for TableGen called eq.td:

class Base<int V> {
  int Value = V;
}

class Derived<string Truth> :
  Base<!if(!eq(Truth, "true"), 1, 0)>;

def TRUE : Derived<"true">;
def FALSE : Derived<"false">;

If I process this through tablegen I get:

------------- Classes -----------------
class Base {
  int Value = Base:V;
  string NAME = ?;
}
class Derived {	// Base
  int Value = !if(!eq(Derived:Truth, "true"), 1, 0);
  string NAME = ?;
}
------------- Defs -----------------
def FALSE {	// Base Derived
  int Value = 0;
  string NAME = ?;
}
def TRUE {	// Base Derived
  int Value = 1;
  string NAME = ?;
}

Why is NAME = ? in FALSE and TRUE? Shouldn't it be FALSE and TRUE?

From christoph at sicherha.de Fri Mar 16 09:39:26 2012
From: christoph at sicherha.de (Christoph Erhardt)
Date: Fri, 16 Mar 2012 15:39:26 +0100
Subject: [LLVMdev] Lowering formal pointer arguments
In-Reply-To:
References: <4F60B650.1060406@gmail.com> <4F61B017.9040304@gmail.com>
Message-ID: <4F63509E.8020503@sicherha.de>

Hi Patrik,

> DAG.getMachineFunction().getFunction() only works in LowerFormalArguments (there it returns the callee), not in LowerCall (where it returns the caller, rather than the callee). You need to pass more information about the function type to LowerCall (besides partial information such as the isVarArg parameter).
>
> I can provide a patch if you are interested. (Unfortunately, to push this upstream has been on my to-do-list for while).

Please do! I have been facing the same problem and am very interested in a clean solution for this.

Best regards,
Christoph

From borja.ferav at gmail.com Fri Mar 16 11:01:26 2012
From: borja.ferav at gmail.com (Borja Ferrer)
Date: Fri, 16 Mar 2012 17:01:26 +0100
Subject: [LLVMdev] Lowering formal pointer arguments
Message-ID:

I had the same issue as both of you when I was implementing this for my backend.
In LowerCall you can get the callee prototype info when the Callee SDValue is a GlobalAddressSDNode by doing cast<Function>(G->getGlobal()) (where G is GlobalAddressSDNode *G = dyn_cast<GlobalAddressSDNode>(Callee)), but this won't work when it is an ExternalSymbolSDNode. For that case I had to add additional info into the ISD::OutputArg struct to know the real size of the split argument.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120316/b55df694/attachment.html

From jcarter at mips.com Fri Mar 16 13:49:34 2012
From: jcarter at mips.com (Carter, Jack)
Date: Fri, 16 Mar 2012 18:49:34 +0000
Subject: [LLVMdev] Clang target specific test case question
Message-ID: <86AC779C188FE74F88F6494478B46332EB329F@exchdb03.mips.com>

I notice that there don't seem to be any target specific test directories for clang. In llvm proper we put all/most target specific test cases in subdirectories named after the target (such as Mips) under the generic category (such as MC).

I'd like to submit my target specific clang test in the same manner, i.e. in clang/test/CodeGen/Mips instead of in clang/test/CodeGen.

Are there any philosophical problems with this? I believe the testing software doesn't care.

Cheers,
Jack
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120316/485051f5/attachment.html

From anton at korobeynikov.info Fri Mar 16 15:05:25 2012
From: anton at korobeynikov.info (Anton Korobeynikov)
Date: Sat, 17 Mar 2012 00:05:25 +0400
Subject: [LLVMdev] LLVM is participating in Google Summer of Code 2012
Message-ID:

Hello Everyone

I'd like to announce that our application for participation in the GSoC 2012 was accepted. More details for prospective students will follow.
PS: To whom it may concern: it's a good time to update "Open Projects" pages ;) -- With best regards, Anton Korobeynikov Faculty of Mathematics and Mechanics, Saint Petersburg State University From etherzhhb at gmail.com Fri Mar 16 20:18:49 2012 From: etherzhhb at gmail.com (Hongbin Zheng) Date: Sat, 17 Mar 2012 09:18:49 +0800 Subject: [LLVMdev] [llvm-commits] Review Request: Use SmallPtrSetImpl instead of SmallPtrSet in funciton IVUsers::AddUsersIfInteresting In-Reply-To: <6D6B49D7-C9AA-40DA-885E-46F6F2C33C80@apple.com> References: <6D6B49D7-C9AA-40DA-885E-46F6F2C33C80@apple.com> Message-ID: hi, On Sat, Mar 17, 2012 at 2:11 AM, Andrew Trick wrote: > Yep. I normally do that. I was under some strange impression last night that SmallPtrSetImpl wasn't a template. The patch is incorrect because the SmallPtrSetImpl is neither a template nor has an "insert" function... After a detailed look at the header of SmallPtrSet, I found that the SmallPtrSetImpl has an "insert_imp" function which accepts void pointer as argument, and the "insert" function in SmallPtrSet simply cast the incoming pointer to void* by the "PtrTraits::getAsVoidPointer" function and pass the void pointer to insert_imp. So I wonder can we make SmallPtrSetImpl become a template just like SmallVectorImpl? After that we can move insert/erase/count from SmallPtrSet to SmallPtrSetImpl, and users can pass a SmallPtrSetImpl as argument instead of pass something like SmallPtrSet. > Please check in, and you can simultaneously fix the polly branch. Temporary fixed by passing a dummy set. > > Incidentally, the only reason I didn't hide the SmallPtrSet argument behind the API is that all the callers outside IVUsers will be removed from the codebase soon. So which function is supposed to call by outside caller? 
> > -Andy best regards ether From anders at 0x63.nu Sat Mar 17 18:14:43 2012 From: anders at 0x63.nu (Anders Waldenborg) Date: Sun, 18 Mar 2012 00:14:43 +0100 Subject: [LLVMdev] Python bindings in tree In-Reply-To: <2358486.PuEW3MFQ3C@deepthought> References: <4F62BE46.7070705@gmail.com> <2358486.PuEW3MFQ3C@deepthought> Message-ID: <87mx7eg6qk.wl%anders@0x63.nu> At Fri, 16 Mar 2012 14:12:08 +0100, Christoph Grenz wrote: > > Hello, > > Am Donnerstag, 15. M?rz 2012, 21:15:02 schrieb Gregory Szorc: > > There was some talk on IRC last week about desire for Python bindings to > > LLVM's Object.h C interface. So, I coded up some and you can now find > > some Python bindings in trunk at bindings/python. Currently, the > > interfaces for Object.h and Disassembler.h are implemented. > > > FYI: > > I recently startet working on Python3 bindings for LLVM 3 as all bindings I > could find were for LLVM 2.x and up to Python 2.6. > I used Cython for easier coding and already ported a big part of Core.h > including all Type and Value classes. FYI: I've also been working on new python bindings. My bindings are written using ctypes (just like the in-tree clang/cindex bindings). Most of Core.h is bound, and stuff from ExecutionEngine.h, Analysis, BitReader, BitWriter. The have fairly good test coverage (using nosetests). The ctypes definitions are generated from the header files using the clang python bindings. My local copy also contain a few patches to llvm-c. Everything can be found here: http://people.0x63.nu/~andersg/llvm-python-bindings/ * 0001-Fix-class-hierarchy-indentation-in-LLVM_FOR_EACH_VAL.patch * 0029-Trivial-copy-paste-error-in-LangRef.patch These are just cosmetic stuff that I stumbled upon * 0004-Add-LLVMPrintModule-to-llvm-c.patch Adds a new LLVMPrintModule function which is similar to LLVMDumpModule but dumps to a string instead of stdout. * 0005-Add-LLVMCreateMemoryBufferFromData-to-llvm-c.patch Adds LLVMCreateMemoryBufferFromData function. 
* 0015-LLVMMessageRef.patch Adds a "typedef char *LLVMMessageRef;". Which may seem useless. But it acts as documentation. All functions that return a string that should be freed with LLVMDisposeMessage are changed to use this type instead. * bindings-python.tar.gz The bindings/python/ directory. There are some hardcoded paths and hacks here and there.
From jobnoorman at gmail.com Sun Mar 18 04:15:38 2012 From: jobnoorman at gmail.com (Job Noorman) Date: Sun, 18 Mar 2012 10:15:38 +0100 Subject: [LLVMdev] Dematerializing functions during opt Message-ID: <1741795.ey69bixLQy@squatpc> I'm writing an opt pass that adds a lot of new functions to a module. In some extreme cases, this causes opt to fail with out-of-memory errors. Since all the created functions quickly become unneeded for my pass, I am trying to find a way to discard them from memory (i.e., write them to disk). I noticed there is a method to do just this: GlobalValue::Dematerialize. However, there does not seem to be an appropriate GVMaterializer to do the job. So, I was wondering if there is an existing way to dematerialize functions during opt. And if there's not, could someone give some pointers on how to add one?
Thanks in advance, Job From gtsiour at softlab.ntua.gr Sun Mar 18 05:47:44 2012 From: gtsiour at softlab.ntua.gr (Yiannis Tsiouris) Date: Sun, 18 Mar 2012 12:47:44 +0200 Subject: [LLVMdev] Python bindings in tree In-Reply-To: <87obrug6r2.wl%anders@0x63.nu> References: <4F62BE46.7070705@gmail.com> <2358486.PuEW3MFQ3C@deepthought> <87obrug6r2.wl%anders@0x63.nu> Message-ID: <4F65BD50.1060903@softlab.ntua.gr> Hi Anders, On 03/18/2012 01:14 AM, Anders Waldenborg wrote: > ... > My local copy also contain a few patches to llvm-c. > > Everything can be found here: > http://people.0x63.nu/~andersg/llvm-python-bindings/ > > > * 0001-Fix-class-hierarchy-indentation-in-LLVM_FOR_EACH_VAL.patch > * 0029-Trivial-copy-paste-error-in-LangRef.patch > These are just cosmetic stuff that I stumbled upon > > * 0004-Add-LLVMPrintModule-to-llvm-c.patch > Adds a new LLVMPrintModule function which is similar to > LLVMDumpModule but dumps to a string instead of stdout. > > * 0005-Add-LLVMCreateMemoryBufferFromData-to-llvm-c.patch > Adds LLVMCreateMemoryBufferFromData function. > I'm very interested on the 0004,0005 above! Please, try to share/push them upstream! :-) Yiannis -- Yiannis Tsiouris Ph.D. student, Software Engineering Laboratory, National Technical University of Athens WWW: http://www.softlab.ntua.gr/~gtsiour From keithshep at gmail.com Sun Mar 18 13:04:50 2012 From: keithshep at gmail.com (Keith Sheppard) Date: Sun, 18 Mar 2012 14:04:50 -0400 Subject: [LLVMdev] a place for listing LLVM binding implementations? Message-ID: Hello, I didn't see any section on this site for LLVM language bindings. There is http://llvm.org/ProjectsWithLLVM/ but that seems to be more about self-contained applications of LLVM. I think it would be useful to add a page (or section to an existing page) if you all agree. My binding is https://github.com/keithshep/llvm-fs and I know that there are many others. 
Best, Keith From christophg+llvm at grenz-bonn.de Sun Mar 18 20:21:47 2012 From: christophg+llvm at grenz-bonn.de (Christoph Grenz) Date: Mon, 19 Mar 2012 02:21:47 +0100 Subject: [LLVMdev] a place for listing LLVM binding implementations? In-Reply-To: References: Message-ID: <7110006.uF3KBG717H@deepthought> Am Sonntag, 18. M?rz 2012, 14:04:50 schrieb Keith Sheppard: > Hello, I didn't see any section on this site for LLVM language > bindings. There is http://llvm.org/ProjectsWithLLVM/ but that seems to > be more about self-contained applications of LLVM. I think it would be > useful to add a page (or section to an existing page) if you all > agree. My binding is https://github.com/keithshep/llvm-fs and I know > that there are many others. I agree. A page listing all language bindings, their extent/features and if possible the last supported LLVM version would really help. > Best, Keith Best regards, Christoph Grenz > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From gregory.szorc at gmail.com Sun Mar 18 23:52:12 2012 From: gregory.szorc at gmail.com (Gregory Szorc) Date: Sun, 18 Mar 2012 21:52:12 -0700 Subject: [LLVMdev] Python bindings in tree In-Reply-To: <87obrug6r2.wl%anders@0x63.nu> References: <4F62BE46.7070705@gmail.com> <2358486.PuEW3MFQ3C@deepthought> <87obrug6r2.wl%anders@0x63.nu> Message-ID: <4F66BB7C.8060500@gmail.com> On 3/17/2012 4:14 PM, Anders Waldenborg wrote: > FYI: > > I've also been working on new python bindings. > > My bindings are written using ctypes (just like the in-tree > clang/cindex bindings). Most of Core.h is bound, and stuff from > ExecutionEngine.h, Analysis, BitReader, BitWriter. The have fairly > good test coverage (using nosetests). The ctypes definitions are > generated from the header files using the clang python bindings. 
The automatic generation of the Python ctypes interfaces using the Clang Python bindings is pretty friggin cool! > My local copy also contain a few patches to llvm-c. > > Everything can be found here: > http://people.0x63.nu/~andersg/llvm-python-bindings/ > > > * 0004-Add-LLVMPrintModule-to-llvm-c.patch > Adds a new LLVMPrintModule function which is similar to > LLVMDumpModule but dumps to a string instead of stdout. > > * 0005-Add-LLVMCreateMemoryBufferFromData-to-llvm-c.patch > Adds LLVMCreateMemoryBufferFromData function. These are desperately needed by the C API. Can you please submit them? FWIW, all my work is at https://github.com/indygreg/llvm/tree/python_bindings/bindings/python. Parts of Core.h still need love (especially the Value system). I'm doing some dynamic type creation at run-time using the Value hierarchy. Somewhat scary stuff, but it does seem to work. I really need a LLVMGetValueID() API to fetch llvm::Value::getValueID() to enable more efficient value casting. From some discussion on #llvm, I think people are receptive to this. The main concern would be that the C API would be tied to a specific version of the shared library because the value ID enumeration aren't guaranteed for all of time. But, that contract is already broken, so I don't think it's a big deal: just something that needs to be documented. Of course, Python is a dynamic language, so if there were a C API that exposed the llvm::Value class hierarchy, we could always have Python dynamically create types at run-time :) I've also implemented some missing C APIs (such as IR parsing and more ObjectFile APIs) and have patches awaiting review on the mailing list. Greg From anton at korobeynikov.info Mon Mar 19 02:46:28 2012 From: anton at korobeynikov.info (Anton Korobeynikov) Date: Mon, 19 Mar 2012 11:46:28 +0400 Subject: [LLVMdev] a place for listing LLVM binding implementations? In-Reply-To: References: Message-ID: Hello Everyone > bindings. 
There is http://llvm.org/ProjectsWithLLVM/ but that seems to
> be more about self-contained applications of LLVM. I think it would be
> useful to add a page (or section to an existing page) if you all
> agree. My binding is https://github.com/keithshep/llvm-fs and I know
> that there are many others.

+1.

Keith, if you create such a page, including your bindings and some others you're aware of, I'll put the stuff on llvm.org

--
With best regards, Anton Korobeynikov
Faculty of Mathematics and Mechanics, Saint Petersburg State University

From patrik.h.hagglund at ericsson.com Mon Mar 19 03:00:55 2012
From: patrik.h.hagglund at ericsson.com (Patrik Hägglund H)
Date: Mon, 19 Mar 2012 09:00:55 +0100
Subject: [LLVMdev] Lowering formal pointer arguments
In-Reply-To: <4F63509E.8020503@sicherha.de>
References: <4F60B650.1060406@gmail.com> <4F61B017.9040304@gmail.com> <4F63509E.8020503@sicherha.de>
Message-ID:

Here is a quick-and-dirty fix, done on top of trunk from Jan 25. It just adds FTy as an extra parameter.

Inside LowerCall:

  ...
  FTy->getReturnType();
  for (FunctionType::param_iterator i = FTy->param_begin(),
         e = FTy->param_end(); i != e; ++i) {
    Type *T = *i;
    ...

However, using NULL as the default value probably breaks some code, which needs to be fixed (before submitting upstream). Also, for example, the isVarArg parameter should probably be removed.

/Patrik Hägglund

-----Original Message-----
From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Christoph Erhardt
Sent: 16 March 2012 15:39
To: llvmdev at cs.uiuc.edu
Subject: Re: [LLVMdev] Lowering formal pointer arguments

Hi Patrik,

> DAG.getMachineFunction().getFunction() only works in LowerFormalArguments (there it returns the callee), not in LowerCall (where it returns the caller, rather than the callee). You need to pass more information about the function type to LowerCall (besides partial information such as the isVarArg parameter).
> > I can provide a patch if you are interested. (Unfortunately, to push this upstream has been on my to-do-list for while). please do! I have been facing the same problem and am very interested in a clean solution for this. Best regards, Christoph _______________________________________________ LLVM Developers mailing list LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev -------------- next part -------------- A non-text attachment was scrubbed... Name: lowercall_fty.diff Type: application/octet-stream Size: 18766 bytes Desc: lowercall_fty.diff Url : http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120319/9301afd0/attachment.obj From kevin at kelleysoft.com Mon Mar 19 05:46:37 2012 From: kevin at kelleysoft.com (Kevin Kelley) Date: Mon, 19 Mar 2012 05:46:37 -0500 Subject: [LLVMdev] a place for listing LLVM binding implementations? In-Reply-To: References: Message-ID: <4F670E8D.6000308@kelleysoft.com> On 3/19/2012 2:46 AM, Anton Korobeynikov wrote: > Hello Everyone > >> bindings. There is http://llvm.org/ProjectsWithLLVM/ but that seems to >> be more about self-contained applications of LLVM. I think it would be >> useful to add a page (or section to an existing page) if you all >> agree. My binding is https://github.com/keithshep/llvm-fs and I know >> that there are many others. > +1. > > Keith, if you'll create such page including yours bindings and some > others you're aware of - I'll put the stuff on llvm.org > I'm doing a Java binding -- http://code.google.com/p/llvm-j/ The APIs are mostly present, now, and a very simple JIT function does work. Still work in progress, though. LLVM-3.0; developing on Windows. I'm also aware of a jllvm java binding project, Swig-based, 2.8-level, on googlecode. 
Kevin From v.d.sorokin at gmail.com Mon Mar 19 08:45:49 2012 From: v.d.sorokin at gmail.com (Vladimir Sorokin) Date: Mon, 19 Mar 2012 17:45:49 +0400 Subject: [LLVMdev] [patch] Enhance of asm macros In-Reply-To: References: <082C3546-45AE-4800-870C-254C040C1B96@apple.com> Message-ID: Hi llvm users & developers! Attached patches: 1) rewrite previous patch, now for darwin platform applied old mechanism 2) patch added processing .rept directive 3) patch added processing .irp directive 4) patch added processing .irpc directive -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120319/bba2d4bb/attachment-0001.html -------------- next part -------------- A non-text attachment was scrubbed... Name: 1.macro-enh.patch Type: text/x-patch Size: 27672 bytes Desc: not available Url : http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120319/bba2d4bb/attachment-0004.bin -------------- next part -------------- A non-text attachment was scrubbed... Name: 2.rept-directive.patch Type: text/x-patch Size: 3657 bytes Desc: not available Url : http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120319/bba2d4bb/attachment-0005.bin -------------- next part -------------- A non-text attachment was scrubbed... Name: 3.irp-directive.patch Type: text/x-patch Size: 4944 bytes Desc: not available Url : http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120319/bba2d4bb/attachment-0006.bin -------------- next part -------------- A non-text attachment was scrubbed... Name: 4.irpc-directive.patch Type: text/x-patch Size: 5030 bytes Desc: not available Url : http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120319/bba2d4bb/attachment-0007.bin From keithshep at gmail.com Mon Mar 19 09:33:07 2012 From: keithshep at gmail.com (Keith Sheppard) Date: Mon, 19 Mar 2012 10:33:07 -0400 Subject: [LLVMdev] a place for listing LLVM binding implementations? 
In-Reply-To: References: Message-ID: OK I'll do that. I might not get to it until the weekend depending on how much free time I have. Thanks, Keith On Mon, Mar 19, 2012 at 3:46 AM, Anton Korobeynikov wrote: > Hello Everyone > >> bindings. There is http://llvm.org/ProjectsWithLLVM/ but that seems to >> be more about self-contained applications of LLVM. I think it would be >> useful to add a page (or section to an existing page) if you all >> agree. My binding is https://github.com/keithshep/llvm-fs and I know >> that there are many others. > +1. > > Keith, if you'll create such page including yours bindings and some > others you're aware of - I'll put the stuff on llvm.org > > -- > With best regards, Anton Korobeynikov > Faculty of Mathematics and Mechanics, Saint Petersburg State University From anton at korobeynikov.info Mon Mar 19 09:34:03 2012 From: anton at korobeynikov.info (Anton Korobeynikov) Date: Mon, 19 Mar 2012 17:34:03 +0300 Subject: [LLVMdev] a place for listing LLVM binding implementations? In-Reply-To: References: Message-ID: > OK I'll do that. I might not get to it until the weekend depending on > how much free time I have. Thanks, Keith! -- With best regards, Anton Korobeynikov Faculty of Mathematics and Mechanics, Saint Petersburg State University From hkultala at cs.tut.fi Mon Mar 19 09:41:08 2012 From: hkultala at cs.tut.fi (Heikki Kultala) Date: Mon, 19 Mar 2012 16:41:08 +0200 Subject: [LLVMdev] floating point immediate problem Message-ID: <4F674584.6050206@cs.tut.fi> I tried to generate pattern for instruction which transports floating point immediate to a floating point register. 
def MOVF32fk : InstTCE<(outs F32Regs:$dst), (ins f32imm:$val), "$val -> $dst;", [(set F32Regs:$dst, (f32 imm:$val))]>; This causes an type contradiction: /home/hkultala26/src/devel/tce/src/applibs/LLVMBackend/plugin//TCEInstrInfo.td:109:1: error: In MOVF32fk: Type inference contradiction found, 'f32' needs to be integer def MOVF32fk : InstTCE<(outs F32Regs:$dst), (ins f32imm:$val), why? Why does llvm assume floating point immediate needs to be integer? From matt.pharr at gmail.com Mon Mar 19 10:17:23 2012 From: matt.pharr at gmail.com (Matt Pharr) Date: Mon, 19 Mar 2012 08:17:23 -0700 Subject: [LLVMdev] Publication: ispc compiler paper Message-ID: <351485AB-1D82-40C9-9669-08FEFAB6221E@gmail.com> An addition for the publications page on llvm.org (and of potential interest to other people using LLVM for high-performance SIMD computation.) The ispc project would never have been possible without LLVM; many thanks to all involved in the LLVM project for building such a great system. Thanks, -matt ispc: A SPMD Compiler for High-Performance CPU Programming Matt Pharr and William R. Mark Innovative Parallel Computing (InPar) 2012 http://cloud.github.com/downloads/ispc/ispc/ispc_inpar_2012.pdf Abstract: SIMD parallelism has become an increasingly important mechanism for delivering performance in modern CPUs, due its power efficiency and relatively low cost in die area compared to other forms of parallelism. Unfortunately, languages and compilers for CPUs have not kept up with the hardware's capabilities. Existing CPU parallel programming models focus primarily on multi-core parallelism, neglecting the substantial computational capabilities that are available in CPU SIMD vector units. GPU-oriented languages like OpenCL support SIMD but lack capabilities needed to achieve maximum efficiency on CPUs and suffer from GPU-driven constraints that impair ease of use on CPUs. 
We have developed a compiler, the Intel SPMD Program Compiler (ispc), that delivers very high performance on CPUs thanks to effective use of both multiple processor cores and SIMD vector units. ispc draws from GPU programming languages, which have shown that for many applications the easiest way to program SIMD units is to use a single-program, multiple-data (SPMD) model, with each instance of the program mapped to one SIMD lane. We discuss language features that make ispc easy to adopt and use productively with existing software systems and show that ispc delivers up to 35x speedups on a 4-core system and up to 240x speedups on a 40-core system for complex workloads (compared to serial C++ code). -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120319/57b29f97/attachment.html From Micah.Villmow at amd.com Mon Mar 19 10:36:38 2012 From: Micah.Villmow at amd.com (Villmow, Micah) Date: Mon, 19 Mar 2012 15:36:38 +0000 Subject: [LLVMdev] floating point immediate problem In-Reply-To: <4F674584.6050206@cs.tut.fi> References: <4F674584.6050206@cs.tut.fi> Message-ID: <88EE5EEF64BDB14686BA3D45C5C30BA318549464@sausexdag03.amd.com> (f32 imm:$val))]>; <-- this needs to be fpimm, 'imm' is an integer immediate. > -----Original Message----- > From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] > On Behalf Of Heikki Kultala > Sent: Monday, March 19, 2012 7:41 AM > To: LLVM Dev > Subject: [LLVMdev] floating point immediate problem > > I tried to generate pattern for instruction which transports floating > point immediate to a floating point register. 
> > def MOVF32fk : InstTCE<(outs F32Regs:$dst), (ins f32imm:$val), > "$val -> $dst;", > [(set F32Regs:$dst, (f32 imm:$val))]>; > > This causes an type contradiction: > > /home/hkultala26/src/devel/tce/src/applibs/LLVMBackend/plugin//TCEInstr > Info.td:109:1: > error: In MOVF32fk: Type inference contradiction found, 'f32' needs to > be integer > def MOVF32fk : InstTCE<(outs F32Regs:$dst), (ins f32imm:$val), > > > > why? Why does llvm assume floating point immediate needs to be integer? > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From anders at 0x63.nu Mon Mar 19 12:44:41 2012 From: anders at 0x63.nu (Anders Waldenborg) Date: Mon, 19 Mar 2012 18:44:41 +0100 Subject: [LLVMdev] Python bindings in tree In-Reply-To: <4F66BB7C.8060500@gmail.com> References: <4F62BE46.7070705@gmail.com> <2358486.PuEW3MFQ3C@deepthought> <87obrug6r2.wl%anders@0x63.nu> <4F66BB7C.8060500@gmail.com> Message-ID: <20120319174441.GJ7216@gagarin.0x63.nu> On Sun, Mar 18, 2012 at 09:52:12PM -0700, Gregory Szorc wrote: > The automatic generation of the Python ctypes interfaces using the Clang > Python bindings is pretty friggin cool! A nice side effect is that everything is added to the interface. So it is easy to add a small proxy over the lib that shows which parts of the llvm-c API that is exercised by the tests. (have that in my bindings) > > * 0004-Add-LLVMPrintModule-to-llvm-c.patch > > Adds a new LLVMPrintModule function which is similar to > > LLVMDumpModule but dumps to a string instead of stdout. > > > > * 0005-Add-LLVMCreateMemoryBufferFromData-to-llvm-c.patch > > Adds LLVMCreateMemoryBufferFromData function. > > These are desperately needed by the C API. Can you please submit them? Will do! > FWIW, all my work is at > https://github.com/indygreg/llvm/tree/python_bindings/bindings/python. Excellent! 
I'll try to see if I can adapt my bindings to yours to fill in the gaps.

There does indeed seem to be much overlap in our bindings. But there are a few things where the design differs. If we should try to combine our work I guess it would be a good idea to discuss these differences, to make sure we work towards a common goal.

I think the main differences between our bindings are:

* Auto generated vs manual ctypes declarations.

  From your comment above I assume you would prefer auto generated too.

* Types inheriting from c_void_p vs having a ptr attribute.

  My bindings have, for example, Module (indirectly) inheriting from c_void_p. That way no "from_param" methods are needed, and there is no extra attribute for the actual pointer. I'm not sure this is better. I might have gone with a separate pointer, as you have, if I started from scratch today.

* Use of constructor vs "new" static methods.

  When using the bindings one never initializes the class manually. Instead a "factory" method is used:

    mymod = Module.from_file(...)
    mymod = Module.from_data(...)
    mymod = Module.new("foo")
    ity = Type.int(32)

  instead of

    mymod = Module(file=...)
    mymod = Module(data=...)
    mymod = Module(name="foo")
    ity = IntType(32)

  I prefer this, especially in the cases where there are many different ways to construct an item. Also, many objects are not really created standalone, e.g. a function is added:

    f = Module.add_function(FTy, "foo")

  and the Function constructor is never used. That way, having the policy "never use the constructor" to create objects makes it consistent. Also this makes it consistent with the old defunct llvm-py bindings. (Partially this is also a consequence of the fact that my bindings inherit from c_void_p, making it a bit messier.)

* Directory layout

  Just a minor thing. My bindings have

    python/bindings/lib/llvm
                   /tests
                   /tools

  I do like having the tests outside the dir.

> Parts of Core.h still need love (especially the Value system).
I'm doing > some dynamic type creation at run-time using the Value hierarchy. > Somewhat scary stuff, but it does seem to work. I really need a > LLVMGetValueID() API to fetch llvm::Value::getValueID() to enable more > efficient value casting. I'm doing the very same thing in my bindings, and yes it is a bit inefficient, but seems to work fine and should work fine as long as classes are not moved in the hierarchy. I use the same hierarchy at python level. And at python level recursivly drills down into the correct subclass by doing LLVMIsA* for the possible (direct) subclasses. > From some discussion on #llvm, I think people > are receptive to this. The main concern would be that the C API would be > tied to a specific version of the shared library because the value ID > enumeration aren't guaranteed for all of time. But, that contract is > already broken, so I don't think it's a big deal: just something that > needs to be documented. Of course, Python is a dynamic language, so if > there were a C API that exposed the llvm::Value class hierarchy, we > could always have Python dynamically create types at run-time :) I guess we could have a separate valueid enum and a mapping between llvm-c<->c++ valueid. IIRC the clang python bindings does that for for something. That way there wont be any breakage if the c++ side is changed. anders From ahatanak at gmail.com Mon Mar 19 15:39:33 2012 From: ahatanak at gmail.com (Akira Hatanaka) Date: Mon, 19 Mar 2012 13:39:33 -0700 Subject: [LLVMdev] Sorting relocation entries Message-ID: What would be the best way to sort relocation entries before they are written out in ELFObjectWriter::WriteRelocationsFragment? According to the Mips ABI documents I have, there are certain restrictions on the order relocations appear in the table (e.g. R_MIPS_HI16 and R_MIPS_GOT16 must be followed immediately by a R_MIPS_LO16). 
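The ordering rule described above can be sketched roughly as follows. This is only an illustration in Python, not ELFObjectWriter code, and it is much simpler than what gas's mips_frob_file does: it matches a HI16/GOT16 to a LO16 purely by symbol name, whereas a real implementation would presumably also have to look at offsets and addends. The (offset, type, symbol) tuple layout and the name pair_relocations are made up for this example.

```python
from collections import defaultdict, deque

# Relocation types that must be followed immediately by a matching LO16.
HI_LIKE = {"R_MIPS_HI16", "R_MIPS_GOT16"}
LO = "R_MIPS_LO16"


def pair_relocations(relocs):
    """Reorder relocs so each HI16/GOT16 is directly followed by a LO16.

    relocs is a list of (offset, type, symbol) tuples (hypothetical layout).
    Matching is by symbol only, which is a simplifying assumption.
    """
    # Collect the indices of all LO16 entries, per symbol, in original order.
    free_lo = defaultdict(deque)
    for idx, (_, rtype, sym) in enumerate(relocs):
        if rtype == LO:
            free_lo[sym].append(idx)

    # Give each HI16/GOT16 the first still-unclaimed LO16 for its symbol.
    partner = {}
    claimed = set()
    for idx, (_, rtype, sym) in enumerate(relocs):
        if rtype in HI_LIKE and free_lo[sym]:
            lo_idx = free_lo[sym].popleft()
            partner[idx] = lo_idx
            claimed.add(lo_idx)

    # Emit in original order, moving each claimed LO16 right behind its partner.
    ordered = []
    for idx, reloc in enumerate(relocs):
        if idx in claimed:
            continue
        ordered.append(reloc)
        if idx in partner:
            ordered.append(relocs[partner[idx]])
    return ordered
```

Entries that are not part of a pair keep their relative order, and a LO16 that no HI16/GOT16 claims stays where it was.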
When I enable post RA scheduling, some of the restrictions are violated in the generated object code, which results in incorrect relocation values generated by the linker. I am considering imitating what gas does in function mips_frob_file (line 15522 of tc-mips.c) to fix this problem: http://repo.or.cz/w/binutils.git/blob/master:/gas/config/tc-mips.c Are there any other targets that have similar restrictions or requirements? From preston.briggs at gmail.com Mon Mar 19 15:45:38 2012 From: preston.briggs at gmail.com (Preston Briggs) Date: Mon, 19 Mar 2012 13:45:38 -0700 Subject: [LLVMdev] SIV tests in LoopDependence Analysis, Sanjoy's patch In-Reply-To: References: Message-ID: Gents, I spent some time reading over Sanjoy's patch for LoopDependenceAnalysis. Unfortunately, an early version of these notes escaped; this is the complete review. First off, I agree with his choice to implement the SIV tests. For scientific Fortran, the SIV (and the simpler ZIV) tests cover about 85% of the cases in practice. For C and C++, I expect the percentage will be much higher. Someday we might like to see the general SIV test and handling of MIV subscripts, but we can get a long ways without them. It was my intention to implement exactly these tests, but Sanjoy is way ahead of me. My biggest problem is with the choice (not Sanjoy's, I think) of implementing *Loop*DependenceAnalysis as a *Loop*Pass. Dependence analysis needn't be restricted to loop nests. I'd like to see DependenceAnalysis for an entire function, so we can do things like loop fusion, scheduling, etc. In particular, we should be able to test for dependence between references in disjoint loops. Here's an (incomplete) description of what I'm thinking: https://sites.google.com/site/parallelizationforllvm/ In the large, I think we want to build a dependence graph for an entire function, with edges for dependences, annotated with direction/distance info. I imagine the code divided into 2 big chunks: 1. 
the dependence graph builder, that walks around the code looking for interesting pairs of references, calling the dependence tester, and assembling the results into a dependence graph

2. the dependence tester, that takes a pair of references and tests them for dependence, computing direction/distance vectors when possible (very close to what Sanjoy has built).

In the meantime, I agree with Sanjoy's idea that a great next step would be to compute direction vectors (distance vectors when possible, strong SIV). I'd reorganize things a bit, maybe, so we return NULL for proven independence and a pointer to a dependence description otherwise, including a direction/distance vector with an entry for all common loops (or potentially common loops, in case of loop fusion). Entries in the vector should include <, =, > (and combos like <=) and distances, but also entries for Scalar and Unknown.

Remember that a single pair can end up with several dependences. Remember loop-independent dependences.

We might make provision for finding input dependences. They're expensive (typically lots of them), but very useful to guide restructuring to improve use of cache and registers.

More detailed comments follow below.

Preston

LoopDependenceAnalysis.h
- comment about isAffine() seems wrong. Consider A[2*i + 3*j + 10], where i and j are both induction variables in the loop nest. Isn't that affine?
- findOrInsertDependencePair()
  - insert into what? Could use some comments explaining what's going on here
  - cache should perhaps not be based on pairs of instructions, but on pairs of subscripts, since the analysis for A[i] and A[i+1] is exactly the same as the analysis for B[i] and B[i + 1]

LoopDependenceAnalysis.cpp
- AnalyzePair
  - should check for mod-ref info with calls, so we can take advantage of any available interprocedural analysis. For example, if we have a load from A[i] and a call, you immediately give up and call it Unknown.
That's pessimistic; we should first check whether A is actually modified by the call.
  - when analyzing subscript pairs, if a result is Unknown, you give up. That's pessimistic; you should look at remaining subscript pairs too. If one proves Independent, then there's no dependence.
  - Of course, this is the place to accumulate and merge direction vectors.
- isSIVPair
  - only works with a single loop nest. We'd prefer that it work with disjoint loops too, so we can do loop fusion. See Wolfe's "Optimizing Supercompilers for Supercomputers", page 18 and chapter 5. Instead of a set of Loop *, accumulate a set of loop levels (ints).
- analyzeSIV
  - annoying to search for the common loop again, since we had to find it to arrive here
  - want to be able to analyze references in disjoint loops as if already fused; will need to adapt SIV tests, since loop bounds aren't always available
  - the actual tests look ok

Result types for analyzePair and analyzeSubscript (and analyzeSIV, et al.) should probably be different.

From kcc at google.com Mon Mar 19 16:52:17 2012
From: kcc at google.com (Kostya Serebryany)
Date: Mon, 19 Mar 2012 14:52:17 -0700
Subject: [LLVMdev] recognizing DTORs and vptr updates in LLVM.
Message-ID:

Hello,

While instrumenting LLVM IR in ThreadSanitizer (race detector), I need to distinguish between a store to the vtable pointer (vptr) and any other regular store. This special treatment should be limited to class DTORs, so I should also know when a function is a DTOR.
Rationale: need to distinguish benign and harmful races on vptr ( http://code.google.com/p/data-race-test/wiki/PopularDataRaces#Data_race_on_vptr ).

Currently, I can figure out when a function is a DTOR and when a store touches vptr by analyzing mangled names.
_ZN1BD1Ev=="B::~B()"
_ZTV1B=="vtable for B"

define linkonce_odr void @_ZN1BD1Ev(%struct.B* %this) unnamed_addr nounwind uwtable align 2 {
entry:
  ....
  store i32 (...)** bitcast (i8** getelementptr inbounds ([5 x i8*]* @_ZTV1B, i64 0, i64 2) to i32 (...)**), i32 (...)*** %0, align 8

However, this does not sound right.
What would be the right way to pass this information from clang to LLVM?
Would using metadata for this purpose be the right solution?
(insn-level metadata for the vptr store and module-level metadata for DTORs)

Thanks,
--kcc

From rkotler at mips.com Mon Mar 19 16:55:56 2012
From: rkotler at mips.com (reed kotler)
Date: Mon, 19 Mar 2012 14:55:56 -0700
Subject: [LLVMdev] tablegen nomenclature
Message-ID: <4F67AB6C.6060300@mips.com>

What would you call elements of the form xyz:$abc? "Variables" seems to be the name used in the tablegen code.

They are not mentioned in the tablegen user's guide but are of course used heavily.

Must xyz be a def, and must it exist?

Thanks.

Reed

From eli.friedman at gmail.com Mon Mar 19 17:15:05 2012
From: eli.friedman at gmail.com (Eli Friedman)
Date: Mon, 19 Mar 2012 15:15:05 -0700
Subject: [LLVMdev] recognizing DTORs and vptr updates in LLVM.
In-Reply-To:
References:
Message-ID:

On Mon, Mar 19, 2012 at 2:52 PM, Kostya Serebryany wrote:
> Hello,
>
> While instrumenting LLVM IR in ThreadSanitizer (race detector), I need
> to distinguish between a store to vtable pointer (vptr) and any other
> regular store.
> This special treatment should be limited to class DTORs, so I should also
> know when a function is a DTOR.
> Rationale: need to distinguish benign and harmful races on vptr
> (http://code.google.com/p/data-race-test/wiki/PopularDataRaces#Data_race_on_vptr).
>
> Currently, I can figure out when a function is a DTOR and when a store
> touches vptr by analyzing mangled names.
> _ZN1BD1Ev=="B::~B()"
> _ZTV1B=="vtable for B"
>
> define linkonce_odr void @_ZN1BD1Ev(%struct.B* %this) unnamed_addr nounwind
> uwtable align 2 {
> entry:
>   ....
>   store i32 (...)** bitcast (i8** getelementptr inbounds ([5 x i8*]*
> @_ZTV1B, i64 0, i64 2) to i32 (...)**), i32 (...)*** %0, align 8
>
> However, this does not sound right.
> What would be the right way to pass this information from clang to LLVM?
> Will using metadata for this purpose be a right solution?
> (insn-level metadata for vptr store and module-level metadata for DTORs)

It's worth pointing out that, according to the abstract LLVM IR model, your "benign" races are in fact undefined behavior. The only reason it appears to work is that in practice non-atomic loads and stores usually result in the same generated code as "relaxed" atomic loads and stores. If we are in fact supposed to guarantee some sort of behavior here, we should generate an atomic store. If we aren't, I'm not sure why AddressSanitizer needs to distinguish between "usually appears to work" and "almost always appears to work".

-Eli

From kcc at google.com Mon Mar 19 17:38:16 2012
From: kcc at google.com (Kostya Serebryany)
Date: Mon, 19 Mar 2012 15:38:16 -0700
Subject: [LLVMdev] recognizing DTORs and vptr updates in LLVM.
In-Reply-To:
References:
Message-ID:

On Mon, Mar 19, 2012 at 3:15 PM, Eli Friedman wrote:

> On Mon, Mar 19, 2012 at 2:52 PM, Kostya Serebryany wrote:
> > Hello,
> >
> > While instrumenting LLVM IR in ThreadSanitizer (race detector), I need
> > to distinguish between a store to vtable pointer (vptr) and any other
> > regular store.
> > This special treatment should be limited to class DTORs, so I should also
> > know when a function is a DTOR.
> > Rationale: need to distinguish benign and harmful races on vptr
> > (
> http://code.google.com/p/data-race-test/wiki/PopularDataRaces#Data_race_on_vptr
> ).
> > > > Currently, I can figure out when a function is a DTOR and when a store > > touches vptr by analyzing mangled names. > > _ZN1BD1Ev=="B::~B()" > > _ZTV1B=="vtable for B" > > > > define linkonce_odr void @_ZN1BD1Ev(%struct.B* %this) unnamed_addr > nounwind > > uwtable align 2 { > > entry: > > .... > > store i32 (...)** bitcast (i8** getelementptr inbounds ([5 x i8*]* > > @_ZTV1B, i64 0, i64 2) to i32 (...)**), i32 (...)*** %0, align 8 > > > > However, this does not sound right. > > What would be the right way to pass this information from clang to LLVM? > > Will using metadata for this purpose be a right solution? > > (insn-level metadata for vptr store and module-level metadata for DTORs) > > It's worth pointing out the according to the abstract LLVM IR model, > your "benign" races are in fact undefined behavior. Oh yes. According to C++11 too, I believe. But C++98 did not define threads, so we are in the grey area here. > The only reason > it appears to work is that in practice non-atomic loads and stores > usually result in the same generated code as "relaxed" atomic loads > and stores. If we are in fact supposed to guarantee some sort of > behavior here, we should generate an atomic store. If we aren't, I'm > not sure why AddressSanitizer s/AddressSanitizer/ThreadSanitizer/ > needs to distinguish between "usually > appears to work" and "almost always appears to work". > This is more like "almost always works in practice" and "certainly broken". We run ThreadSanitizer (the old valgrind-based one) on millions lines of legacy code (not always our own code) and we see a lot of those "benign" vptr races. Ignoring those (while still detecting really harmful ones) has been #1 feature request from our users until we implemented it in valgrind-based ThreadSanitizer. --kcc > > -Eli > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120319/881d89d2/attachment.html From clattner at apple.com Mon Mar 19 18:30:12 2012 From: clattner at apple.com (Chris Lattner) Date: Mon, 19 Mar 2012 16:30:12 -0700 Subject: [LLVMdev] recognizing DTORs and vptr updates in LLVM. In-Reply-To: References: Message-ID: <1F3170AA-F2C0-4DA5-8F79-FCE423EFFE27@apple.com> On Mar 19, 2012, at 2:52 PM, Kostya Serebryany wrote: > Hello, > > While instrumenting LLVM IR in ThreadSanitizer (race detector), I need to distinguish between a store to vtable pointer (vptr) and any other regular store. > This special treatment should be limited to class DTORs, so I should also know when a function is a DTOR. > Rationale: need to distinguish benign and harmful races on vptr (http://code.google.com/p/data-race-test/wiki/PopularDataRaces#Data_race_on_vptr). > > Currently, I can figure out when a function is a DTOR and when a store touches vptr by analyzing mangled names. > _ZN1BD1Ev=="B::~B()" > _ZTV1B=="vtable for B" > > define linkonce_odr void @_ZN1BD1Ev(%struct.B* %this) unnamed_addr nounwind uwtable align 2 { > entry: > .... > store i32 (...)** bitcast (i8** getelementptr inbounds ([5 x i8*]* @_ZTV1B, i64 0, i64 2) to i32 (...)**), i32 (...)*** %0, align 8 > > However, this does not sound right. > What would be the right way to pass this information from clang to LLVM? > Will using metadata for this purpose be a right solution? > (insn-level metadata for vptr store and module-level metadata for DTORs) Using instruction level metadata for this would be appropriate. However, I also don't understand why a race on this is truly benign. I'm also concerned that you're adding even more knobs to clang and IR for special case situations. How many more special cases like this are you going to require? -Chris -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120319/b6d399c0/attachment.html

From eli.friedman at gmail.com Mon Mar 19 18:46:44 2012
From: eli.friedman at gmail.com (Eli Friedman)
Date: Mon, 19 Mar 2012 16:46:44 -0700
Subject: [LLVMdev] recognizing DTORs and vptr updates in LLVM.
In-Reply-To: <1F3170AA-F2C0-4DA5-8F79-FCE423EFFE27@apple.com>
References: <1F3170AA-F2C0-4DA5-8F79-FCE423EFFE27@apple.com>
Message-ID:

On Mon, Mar 19, 2012 at 4:30 PM, Chris Lattner wrote:
>
> On Mar 19, 2012, at 2:52 PM, Kostya Serebryany wrote:
>
> Hello,
>
> While instrumenting LLVM IR in ThreadSanitizer (race detector), I need
> to distinguish between a store to vtable pointer (vptr) and any other
> regular store.
> This special treatment should be limited to class DTORs, so I should also
> know when a function is a DTOR.
> Rationale: need to distinguish benign and harmful races on vptr
> (http://code.google.com/p/data-race-test/wiki/PopularDataRaces#Data_race_on_vptr).
>
> Currently, I can figure out when a function is a DTOR and when a store
> touches vptr by analyzing mangled names.
> _ZN1BD1Ev=="B::~B()"
> _ZTV1B=="vtable for B"
>
> define linkonce_odr void @_ZN1BD1Ev(%struct.B* %this) unnamed_addr nounwind
> uwtable align 2 {
> entry:
>   ....
>   store i32 (...)** bitcast (i8** getelementptr inbounds ([5 x i8*]*
> @_ZTV1B, i64 0, i64 2) to i32 (...)**), i32 (...)*** %0, align 8
>
> However, this does not sound right.
> What would be the right way to pass this information from clang to LLVM?
> Will using metadata for this purpose be a right solution?
> (insn-level metadata for vptr store and module-level metadata for DTORs)
>
>
> Using instruction level metadata for this would be appropriate. However, I
> also don't understand why a race on this is truly benign.

It isn't, really; calling it "benign" is deceptive.
It's just that storing a pointer which is equal to the existing pointer stored at a given address almost always makes the optimizer/codegen generate code which can't trigger the race in a way which visibly misbehaves. Therefore, as a heuristic, users apparently want ThreadSanitizer to ignore (or list separately) such races.

Given that, I'm not sure I really see the issue with just special-casing any store where the value stored is a pointer to a global... but it could be argued either way, I guess.

-Eli

From chandlerc at google.com Mon Mar 19 18:52:19 2012
From: chandlerc at google.com (Chandler Carruth)
Date: Mon, 19 Mar 2012 16:52:19 -0700
Subject: [LLVMdev] recognizing DTORs and vptr updates in LLVM.
In-Reply-To:
References: <1F3170AA-F2C0-4DA5-8F79-FCE423EFFE27@apple.com>
Message-ID:

On Mon, Mar 19, 2012 at 4:46 PM, Eli Friedman wrote:

> Given that, I'm not sure I really see the issue with just
> special-casing any store where the value stored is a pointer to a
> global... but it could be argued either way, I guess.
>

If users expect this to "just work", why not extend the language and make it just work?

We could, as an implementation, decide to emit these as relaxed atomic stores, making the code well defined without changing the semantics (or optimization) in any meaningful way, right?

From kcc at google.com Mon Mar 19 19:01:46 2012
From: kcc at google.com (Kostya Serebryany)
Date: Mon, 19 Mar 2012 17:01:46 -0700
Subject: [LLVMdev] recognizing DTORs and vptr updates in LLVM.
In-Reply-To: References: <1F3170AA-F2C0-4DA5-8F79-FCE423EFFE27@apple.com> Message-ID: On Mon, Mar 19, 2012 at 4:46 PM, Eli Friedman wrote: > On Mon, Mar 19, 2012 at 4:30 PM, Chris Lattner wrote: > > > > On Mar 19, 2012, at 2:52 PM, Kostya Serebryany wrote: > > > > Hello, > > > > While instrumenting LLVM IR in ThreadSanitizer (race detector), I need > > to distinguish between a store to vtable pointer (vptr) and any other > > regular store. > > This special treatment should be limited to class DTORs, so I should also > > know when a function is a DTOR. > > Rationale: need to distinguish benign and harmful races on vptr > > ( > http://code.google.com/p/data-race-test/wiki/PopularDataRaces#Data_race_on_vptr > ). > > > > Currently, I can figure out when a function is a DTOR and when a store > > touches vptr by analyzing mangled names. > > _ZN1BD1Ev=="B::~B()" > > _ZTV1B=="vtable for B" > > > > define linkonce_odr void @_ZN1BD1Ev(%struct.B* %this) unnamed_addr > nounwind > > uwtable align 2 { > > entry: > > .... > > store i32 (...)** bitcast (i8** getelementptr inbounds ([5 x i8*]* > > @_ZTV1B, i64 0, i64 2) to i32 (...)**), i32 (...)*** %0, align 8 > > > > However, this does not sound right. > > What would be the right way to pass this information from clang to LLVM? > > Will using metadata for this purpose be a right solution? > > (insn-level metadata for vptr store and module-level metadata for DTORs) > > > > > > Using instruction level metadata for this would be appropriate. > However, I > > also don't understand why a race on this is truly benign. > > It isn't, really; calling it "benign" is deceptive. Well, yes. Generally, I agree with you here. But then there are tsan users who have all that legacy code and want to find races that will harm them for sure and don't want to see "noise". These vptr races are hard to suppress w/o risking to hide some other races. 
> It's just that
> storing a pointer which is equal to the existing pointer stored at a
> given address almost always makes the optimizer/codegen generate code
> which can't trigger the race in a way which visibly misbehaves.
> Therefore, as a heuristic users apparently want ThreadSanitizer to
> ignore (or list separately) such races.
>

Yep.

>
> Given that, I'm not sure I really see the issue with just
> special-casing any store where the value stored is a pointer to a
> global... but it could be argued either way, I guess.
>

That will hide too many real races, I'm afraid, including those "harmful" vptr races.

> I'm also concerned that you're adding even more knobs to clang and IR for
> special case situations. How many more special cases like this are you
> going to require?

I don't remember more special cases off the top of my head. The valgrind-based variant has this special case and nothing else, I believe. We've run our race detector unit tests ( http://code.google.com/p/data-race-test/source/browse/trunk/unittest/racecheck_unittest.cc ) under the current LLVM-TSAN and this is the only thing we have found so far. But we have not run anything heavy under LLVM-TSAN yet, so something else may be hiding from us.

--kcc

From kcc at google.com Mon Mar 19 19:13:02 2012
From: kcc at google.com (Kostya Serebryany)
Date: Mon, 19 Mar 2012 17:13:02 -0700
Subject: [LLVMdev] recognizing DTORs and vptr updates in LLVM.
In-Reply-To: <1F3170AA-F2C0-4DA5-8F79-FCE423EFFE27@apple.com> References: <1F3170AA-F2C0-4DA5-8F79-FCE423EFFE27@apple.com> Message-ID: On Mon, Mar 19, 2012 at 4:30 PM, Chris Lattner wrote: > > On Mar 19, 2012, at 2:52 PM, Kostya Serebryany wrote: > > Hello, > > While instrumenting LLVM IR in ThreadSanitizer (race detector), I need > to distinguish between a store to vtable pointer (vptr) and any other > regular store. > This special treatment should be limited to class DTORs, so I should also > know when a function is a DTOR. > Rationale: need to distinguish benign and harmful races on vptr ( > http://code.google.com/p/data-race-test/wiki/PopularDataRaces#Data_race_on_vptr > ). > > Currently, I can figure out when a function is a DTOR and when a store > touches vptr by analyzing mangled names. > _ZN1BD1Ev=="B::~B()" > _ZTV1B=="vtable for B" > > define linkonce_odr void @_ZN1BD1Ev(%struct.B* %this) unnamed_addr > nounwind uwtable align 2 { > entry: > .... > store i32 (...)** bitcast (i8** getelementptr inbounds ([5 x i8*]* > @_ZTV1B, i64 0, i64 2) to i32 (...)**), i32 (...)*** %0, align 8 > > However, this does not sound right. > What would be the right way to pass this information from clang to LLVM? > Will using metadata for this purpose be a right solution? > (insn-level metadata for vptr store and module-level metadata for DTORs) > > > Using instruction level metadata for this would be appropriate. However, > I also don't understand why a race on this is truly benign. I'm also > concerned that you're adding even more knobs to clang and IR for special > case situations. > As for "more knobs", Chandler mentioned to me recently that being able to identify vptr accesses will help virtual function call inlining, so we may still need this knob for other purposes. --kcc > How many more special cases like this are you going to require? > > -Chris > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120319/e3049778/attachment.html From eli.friedman at gmail.com Mon Mar 19 19:24:27 2012 From: eli.friedman at gmail.com (Eli Friedman) Date: Mon, 19 Mar 2012 17:24:27 -0700 Subject: [LLVMdev] recognizing DTORs and vptr updates in LLVM. In-Reply-To: References: <1F3170AA-F2C0-4DA5-8F79-FCE423EFFE27@apple.com> Message-ID: On Mon, Mar 19, 2012 at 4:52 PM, Chandler Carruth wrote: > On Mon, Mar 19, 2012 at 4:46 PM, Eli Friedman > wrote: >> >> Given that, I'm not sure I really see the issue with just >> special-casing any store where the value stored is a pointer to a >> global... but it could be argued either way, I guess. > > > I users expect this to "just work", why not extend the language and make it > just work? I'm not sure anyone really expects this to "just work", just that they did it by accident. Making cross-thread unsynchronized virtual calls on an object which is being destroyed strikes me as a construct nobody would intentionally write. > We could, as an implementation, decide to emit these as relaxed atomic > stores, making the code well defined without changing the semantics (or > optimization) in any meaningful way, right? Making all vptr loads and stores atomic would block some optimizations (specifically, we can't perform certain optimizations involving memcpy, and IIRC some optimizers have incomplete atomics handling). Not sure if it would have much practical impact, though. Specifically just making vptr stores in destructors "unordered", and making unordered stores which don't change the stored value effectively no-ops in the memory model, could work too; the potential impact on optimization is much less, and I don't think the model changes would lead to any optimizer changes. -Eli From kcc at google.com Mon Mar 19 19:36:12 2012 From: kcc at google.com (Kostya Serebryany) Date: Mon, 19 Mar 2012 17:36:12 -0700 Subject: [LLVMdev] recognizing DTORs and vptr updates in LLVM. 
In-Reply-To: References: <1F3170AA-F2C0-4DA5-8F79-FCE423EFFE27@apple.com> Message-ID: On Mon, Mar 19, 2012 at 5:24 PM, Eli Friedman wrote: > On Mon, Mar 19, 2012 at 4:52 PM, Chandler Carruth > wrote: > > On Mon, Mar 19, 2012 at 4:46 PM, Eli Friedman > > wrote: > >> > >> Given that, I'm not sure I really see the issue with just > >> special-casing any store where the value stored is a pointer to a > >> global... but it could be argued either way, I guess. > > > > > > I users expect this to "just work", why not extend the language and make > it > > just work? > > I'm not sure anyone really expects this to "just work", just that they > did it by accident. Making cross-thread unsynchronized virtual calls > on an object which is being destroyed strikes me as a construct nobody > would intentionally write. > > > We could, as an implementation, decide to emit these as relaxed atomic > > stores, making the code well defined without changing the semantics (or > > optimization) in any meaningful way, right? > > Making all vptr loads and stores atomic would block some optimizations > .. and will not solve my problem -- I still need to distinguish between "benign-for-practical-purposes" and "definitely-harmful" vptr races. The only difference between those two cases lies outside of the instrumented function. (it depends on the dynamic type of the object being destroyed). --kcc > (specifically, we can't perform certain optimizations involving > memcpy, and IIRC some optimizers have incomplete atomics handling). > Not sure if it would have much practical impact, though. > > Specifically just making vptr stores in destructors "unordered", and > making unordered stores which don't change the stored value > effectively no-ops in the memory model, could work too; the potential > impact on optimization is much less, and I don't think the model > changes would lead to any optimizer changes. 
>
> -Eli
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>

From chandlerc at google.com Mon Mar 19 19:38:51 2012
From: chandlerc at google.com (Chandler Carruth)
Date: Mon, 19 Mar 2012 17:38:51 -0700
Subject: [LLVMdev] recognizing DTORs and vptr updates in LLVM.
In-Reply-To:
References: <1F3170AA-F2C0-4DA5-8F79-FCE423EFFE27@apple.com>
Message-ID:

On Mon, Mar 19, 2012 at 5:36 PM, Kostya Serebryany wrote:

> .. and will not solve my problem -- I still need to distinguish between
> "benign-for-practical-purposes" and "definitely-harmful" vptr races.
> The only difference between those two cases lies outside of the
> instrumented function.
> (it depends on the dynamic type of the object being destroyed).

I see. So essentially, this is purely a QoI issue, and it just happens to be so common that we can't get everyone to fix their code.

Instruction-level metadata sounds better and better. =[ It's not great, but at least you have evidence that this won't be an unending series of QoI issues.

From akonchady at gmail.com Mon Mar 19 22:17:08 2012
From: akonchady at gmail.com (Adarsh Konchady)
Date: Tue, 20 Mar 2012 08:47:08 +0530
Subject: [LLVMdev] Array Dependence Analysis
Message-ID:

Sir,
I was going through the following Old Nabble thread on array dependence analysis, which mentions some ongoing work on the topic. It was posted in 2008.
Array-Dependency-Analysis
I need to know whether array dependence analysis has been implemented in LLVM.
Regards,
Adarsh Konchady

From clattner at apple.com Mon Mar 19 22:54:22 2012
From: clattner at apple.com (Chris Lattner)
Date: Mon, 19 Mar 2012 20:54:22 -0700
Subject: [LLVMdev] recognizing DTORs and vptr updates in LLVM.
In-Reply-To:
References: <1F3170AA-F2C0-4DA5-8F79-FCE423EFFE27@apple.com>
Message-ID: <92F7D67A-7313-428B-A476-8203191811D2@apple.com>

On Mar 19, 2012, at 5:13 PM, Kostya Serebryany wrote:
>> However, this does not sound right.
>> What would be the right way to pass this information from clang to LLVM?
>> Will using metadata for this purpose be a right solution?
>> (insn-level metadata for vptr store and module-level metadata for DTORs)
>
> Using instruction level metadata for this would be appropriate. However, I also don't understand why a race on this is truly benign. I'm also concerned that you're adding even more knobs to clang and IR for special case situations.
>
> As for "more knobs", Chandler mentioned to me recently that being able to identify vptr accesses will help
> virtual function call inlining, so we may still need this knob for other purposes.

Right, but this would have to be designed properly. If someone wants to put forward a concrete use case and a design that would enable virtual call inlining and that is also useful for tsan, that would certainly be interesting.

-Chris

From etherzhhb at gmail.com Mon Mar 19 23:10:45 2012
From: etherzhhb at gmail.com (Hongbin Zheng)
Date: Tue, 20 Mar 2012 12:10:45 +0800
Subject: [LLVMdev] Array Dependence Analysis
In-Reply-To:
References:
Message-ID:

Hi, you may want to have a look at the dependence analysis pass in Polly.
best regards
ether

On Tue, Mar 20, 2012 at 11:17 AM, Adarsh Konchady wrote:
> Sir,
> I was going through the following link about Array dependence analysis in
> Old Nabble where they mentioned about some work going on array dependence
> analysis. It was posted in 2008.
> Array-Dependency-Analysis
> I need to know whether array dependence analysis has been implemented in
> LLVM.
>
> Regards,
> Adarsh Konchady

From baldrick at free.fr Tue Mar 20 02:51:25 2012
From: baldrick at free.fr (Duncan Sands)
Date: Tue, 20 Mar 2012 08:51:25 +0100
Subject: [LLVMdev] recognizing DTORs and vptr updates in LLVM.
In-Reply-To:
References: <1F3170AA-F2C0-4DA5-8F79-FCE423EFFE27@apple.com>
Message-ID: <4F6836FD.5020509@free.fr>

>> Using instruction level metadata for this would be appropriate. However, I
>> also don't understand why a race on this is truly benign.
>
> It isn't, really; calling it "benign" is deceptive. It's just that
> storing a pointer which is equal to the existing pointer stored at a
> given address almost always makes the optimizer/codegen generate code
> which can't trigger the race in a way which visibly misbehaves.
> Therefore, as a heuristic users apparently want ThreadSanitizer to
> ignore (or list separately) such races.

The gcc Ada front-end does this too, in quite a range of situations. For example, multiple threads racily initialize a pointer variable, but they all initialize it to the same value. The various valgrind-based race detection tools all complain about this, which makes them much less useful than they might be for Ada.

Ciao, Duncan.
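
A minimal sketch of the "every writer stores the same value" pattern discussed in this thread, written in standard C++ rather than LLVM IR. It assumes the racy writes are expressed as relaxed atomic stores, along the lines Chandler suggests above; the names kTable, g_ptr, init_same_value and current_value are hypothetical and are not part of LLVM, Clang, or ThreadSanitizer.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for a vtable-like global object.
static const int kTable = 42;

// Pointer that any number of threads may initialize to the same address.
static std::atomic<const int *> g_ptr{nullptr};

// May be called concurrently from several threads. Every caller stores the
// same address. With a plain (non-atomic) pointer, concurrent calls would be
// a data race and therefore undefined behavior in C++11; a relaxed atomic
// store keeps the behavior defined without imposing any ordering.
void init_same_value() {
  g_ptr.store(&kTable, std::memory_order_relaxed);
}

// Reads the pointer back, again without ordering guarantees.
const int *current_value() {
  return g_ptr.load(std::memory_order_relaxed);
}
```

Whether a given race detector reports, lists separately, or ignores such accesses is a policy decision of that tool; the sketch only shows one way to make the access itself well defined.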
From shanmuk.rao008 at gmail.com Tue Mar 20 04:25:35 2012
From: shanmuk.rao008 at gmail.com (shanmuk rao)
Date: Tue, 20 Mar 2012 14:55:35 +0530
Subject: [LLVMdev] Problem with LoopDependenceAnalysis
In-Reply-To:
References:
Message-ID:

Thank you all for your replies.

I looked at Sanjoy's patch for the SIV test, and it is exactly what I need. As the comments say, *analyseSIV(A,B)* checks whether subscript A can possibly have the same value as B, but I don't see how to use this information.

Take the program quoted below. When I use the *depends* function, it reports a dependence from the load of x to the store to x, and similarly for array a. But how can I be sure that there is no dependence from the store to x to the load of x in the next iteration?

On Thu, Mar 15, 2012 at 11:28 AM, shanmuk rao wrote:
> Hi,
> I am using LLVM for implementing LoopFission pass.
> I am using LoopPass.
> I know that for checking circular dependency in loop I have to use
> LoopDependenceAnalysis
>
> This is what i want to do.
> for(int i = 0; i< n ; i++){
>
> s1 : a[i] = a[i] + x[i];
> s2 : x[i] = x[i+1] + i*2 ;
> }
>
> /**there is no dependence from s2 to s1/
> so after distribution(it should be) :
>
> for(int i = 0; i< n ; i++)
> s1: a[i] = a[i] + x[i];
>
> for(int i = 0; i< n ; i++)
> s2: x[i] = x[i+1] + i*2 ;
>
>
> but in llvm i couldn't able to find there is no dependency from s2 to s1.
>
> LoopDependenceAnalyis always gives there is a dependency from every load instructions to every store instructions.
>
>
> is there any other alternative to LoopDependencyAnalysis ?
> thank you
>
> ......
> Regards,
> Shanmukha Rao
> Compilers lab,
> Indian Institute of Science, Bangalore.
>

--
......
Regards,
Shanmukha Rao
Compilers lab,
Indian Institute of Science, Bangalore.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120320/c66d5c41/attachment.html From ryta1203 at gmail.com Thu Mar 22 15:49:08 2012 From: ryta1203 at gmail.com (Ryan Taylor) Date: Thu, 22 Mar 2012 13:49:08 -0700 Subject: [LLVMdev] StructLayout getSizeInBits() Message-ID: LLVMers, I have a const StructType *StTy where I call TargetData->getStructLayout(const_cast(StTy))->getSizeInBits() but it continues to return 64 regardless of the actual size of the struct? I want the size of the structure in bits, including alignment padding for offset calculations, is this not the right function call? Thanks. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120322/50e06c07/attachment.html From ryta1203 at gmail.com Thu Mar 22 16:04:12 2012 From: ryta1203 at gmail.com (Ryan Taylor) Date: Thu, 22 Mar 2012 14:04:12 -0700 Subject: [LLVMdev] StructLayout getSizeInBits() In-Reply-To: References: Message-ID: Ah, my mistake, I forgot to realize the min struct size in the target datalayout, thanks. On Thu, Mar 22, 2012 at 1:49 PM, Ryan Taylor wrote: > LLVMers, > > I have a const StructType *StTy where I call > TargetData->getStructLayout(const_cast(StTy))->getSizeInBits() > but it continues to return 64 regardless of the actual size of the struct? > I want the size of the structure in bits, including alignment padding for > offset calculations, is this not the right function call? > > Thanks. > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120322/455959ba/attachment.html From ryta1203 at gmail.com Wed Mar 21 19:57:07 2012 From: ryta1203 at gmail.com (Ryan Taylor) Date: Wed, 21 Mar 2012 17:57:07 -0700 Subject: [LLVMdev] Target Data Message-ID: Is it possible to change the widths of types independent of the architecture? Or to reset the widths of types? I haven't seen anything like this. Thanks. 
-------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120321/780487da/attachment.html From kcc at google.com Wed Mar 21 19:06:25 2012 From: kcc at google.com (Kostya Serebryany) Date: Wed, 21 Mar 2012 17:06:25 -0700 Subject: [LLVMdev] recognizing DTORs and vptr updates in LLVM. In-Reply-To: References: <1F3170AA-F2C0-4DA5-8F79-FCE423EFFE27@apple.com> <4F6836FD.5020509@free.fr> Message-ID: Chris, is this how the tbaa for vtable loads/stores should look like? Metadata: !0 = metadata !{metadata !"vtable pointer", metadata !1} !1 = metadata !{metadata !"omnipotent char", metadata !2} !2 = metadata !{metadata !"Simple C/C++ TBAA", null} ... Load: %0 = bitcast %struct.A* %a to void (%struct.A*)*** %vtable = load void (%struct.A*)*** %0, align 8, !tbaa !0 Store: %0 = getelementptr inbounds %struct.B* %this, i64 0, i32 0, i32 0 store i32 (...)** bitcast (i8** getelementptr inbounds ([5 x i8*]* @_ZTV1B, i64 0, i64 2) to i32 (...)**), i32 (...)*** %0, align 8, !tbaa !0 --kcc On Wed, Mar 21, 2012 at 12:57 PM, Chris Lattner wrote: > > On Mar 21, 2012, at 11:53 AM, Kostya Serebryany wrote: > > > The gcc Ada front-end does this too, in quite a range of situations. For >> > example multiple threads racily initialize a pointer variable, but they >> all >> > initialize to the same value. The various valgrind based race detection >> > tools all complain about this, which makes them much less useful than >> they >> > might be for Ada. >> >> FWIW, after thinking about this for awhile, I realize that we already >> have the tools to handle this: TBAA. >> >> It would be general goodness for clang to emit VTable loads and stores in >> their with their own TBAA type class (one that does not even alias "char*"). > > > Indeed, sounds very nice. > I'll try to make a patch that adds TBAA metadata to VTable loads (unless > someone else knows how to do it off the top of his head). 
> > > Sounds great, thanks Kostya, > > -Chris > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120321/7697c5f7/attachment.html From grosbach at apple.com Wed Mar 21 17:50:21 2012 From: grosbach at apple.com (Jim Grosbach) Date: Wed, 21 Mar 2012 15:50:21 -0700 Subject: [LLVMdev] Sorting relocation entries In-Reply-To: References: Message-ID: <177AEDD6-ED13-479C-A8F6-9BAE636CC638@apple.com> Hi Akira, If I follow correctly, the relocation entries can thus be in a different order than the instructions that they're for? That seems a bit odd, but I suppose there's nothing inherently wrong with that. It's just not something, AFAIK, that llvm has had to deal with before. This should definitely be a target-specific thing, not a general ELFObjectWriter thing, as other targets may have entirely different needs. Offhand, it seems reasonable to have a post-processing pass over the relocation list before it's written out to the file. The target can manipulate the list in whatever manner it needs to. A hook on MCELFObjectTargetWriter should do the trick. -Jim On Mar 19, 2012, at 1:39 PM, Akira Hatanaka wrote: > What would be the best way to sort relocation entries before they are > written out in ELFObjectWriter::WriteRelocationsFragment? > > According to the Mips ABI documents I have, there are certain > restrictions on the order relocations appear in the table (e.g. > R_MIPS_HI16 and R_MIPS_GOT16 must be followed immediately by a > R_MIPS_LO16). When I enable post RA scheduling, some of the > restrictions are violated in the generated object code, which results > in incorrect relocation values generated by the linker. > > I am considering imitating what gas does in function mips_frob_file > (line 15522 of tc-mips.c) to fix this problem: > > http://repo.or.cz/w/binutils.git/blob/master:/gas/config/tc-mips.c > > Are there any other targets that have similar restrictions or requirements? 
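As a rough illustration of the ordering requirement described above (each R_MIPS_HI16 or R_MIPS_GOT16 immediately followed by an R_MIPS_LO16), here is a minimal C++ sketch of a post-processing pass over a relocation list. It is not LLVM code and not what gas does in mips_frob_file: Reloc, RelType and pair_hi_lo are hypothetical names made up for this example, and matching by symbol name alone is a simplifying assumption (a real implementation would also have to consider addends and local versus global symbols).

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified relocation record; not LLVM's ELFRelocationEntry.
enum class RelType { HI16, GOT16, LO16, Other };

struct Reloc {
    std::uint64_t offset;  // r_offset of the relocated instruction
    RelType type;
    std::string symbol;    // assumption: pairs are matched by symbol name only
};

// Sort by offset, then move the first unclaimed LO16 against the same symbol
// so that it directly follows each HI16/GOT16 entry.
std::vector<Reloc> pair_hi_lo(std::vector<Reloc> relocs) {
    std::stable_sort(relocs.begin(), relocs.end(),
                     [](const Reloc &a, const Reloc &b) {
                         return a.offset < b.offset;
                     });

    const std::size_t none = relocs.size();
    std::vector<std::size_t> partner(relocs.size(), none);
    std::vector<bool> claimed(relocs.size(), false);

    // First pass: let every HI16/GOT16 claim one matching LO16.
    for (std::size_t i = 0; i < relocs.size(); ++i) {
        if (relocs[i].type != RelType::HI16 && relocs[i].type != RelType::GOT16)
            continue;
        for (std::size_t j = 0; j < relocs.size(); ++j) {
            if (!claimed[j] && relocs[j].type == RelType::LO16 &&
                relocs[j].symbol == relocs[i].symbol) {
                partner[i] = j;
                claimed[j] = true;
                break;
            }
        }
    }

    // Second pass: emit in offset order, pulling each claimed LO16 up so it
    // sits right after the entry that claimed it.
    std::vector<Reloc> out;
    out.reserve(relocs.size());
    for (std::size_t i = 0; i < relocs.size(); ++i) {
        if (claimed[i])
            continue;
        out.push_back(relocs[i]);
        if (partner[i] != none)
            out.push_back(relocs[partner[i]]);
    }
    return out;
}
```

Under these assumptions, an input of GOT16(a), GOT16(b), LO16(a), LO16(b) in offset order comes out as GOT16(a), LO16(a), GOT16(b), LO16(b). Unpaired entries keep their offset order.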
> _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev From rkotler at mips.com Wed Mar 21 16:58:40 2012 From: rkotler at mips.com (reed kotler) Date: Wed, 21 Mar 2012 14:58:40 -0700 Subject: [LLVMdev] apparent mistake in several ports register td file ??? Message-ID: <4F6A4F10.5020004@mips.com> The field Num seems to have no meaning. It is not recognized by the backend tools. It does not hurt anything but should not be there. // We have banks of 32 registers each. class MipsReg : Register { field bits<5> Num; let Namespace = "Mips"; } class ARMReg num, string n, list subregs = []> : Register { field bits<4> Num; let Namespace = "ARM"; let SubRegs = subregs; // All bits of ARM registers with sub-registers are covered by sub-registers. let CoveredBySubRegs = 1; } class ARMFReg num, string n> : Register { field bits<6> Num; let Namespace = "ARM"; } class SparcReg : Register { field bits<5> Num; let Namespace = "SP"; } Then subsequently, further derived types copy the mistake. // Registers are identified with 5-bit ID numbers. // Ri - 32-bit integer registers class Ri num, string n> : SparcReg { let Num = num; } // Rf - 32-bit floating-point registers class Rf num, string n> : SparcReg { let Num = num; } // Rd - Slots in the FP register file for 64-bit floating-point values. class Rd num, string n, list subregs> : SparcReg { let Num = num; let SubRegs = subregs; let SubRegIndices = [sub_even, sub_odd]; let CoveredBySubRegs = 1; } ...... // Mips CPU Registers class MipsGPRReg num, string n> : MipsReg { let Num = num; } From paul at lucasmail.org Wed Mar 21 14:25:39 2012 From: paul at lucasmail.org (Paul J. Lucas) Date: Wed, 21 Mar 2012 12:25:39 -0700 Subject: [LLVMdev] Mailing list archives broken? Message-ID: <04CB4EA9-A6BF-4D43-B865-A4C7C20AD716@lucasmail.org> This URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/ returns "404 Not Found." 
- Paul From clattner at apple.com Wed Mar 21 14:54:50 2012 From: clattner at apple.com (Chris Lattner) Date: Wed, 21 Mar 2012 12:54:50 -0700 Subject: [LLVMdev] Mailing list archives Message-ID: FYI, the mailing list archives (e.g. http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev) are currently down. The disk that held them had a failure, and the machine is being worked on. -Chris From keithshep at gmail.com Wed Mar 21 15:01:34 2012 From: keithshep at gmail.com (Keith Sheppard) Date: Wed, 21 Mar 2012 16:01:34 -0400 Subject: [LLVMdev] [3.1 Release] Call For Testers! In-Reply-To: References: Message-ID: > As always, we support Intel on Darwin, Debian and Ubuntu Linux, Windows, and > FreeBSD. We haven't released binaries for ARM, but that may change this time > around (no guarantees though). If you have a platform for which you would > like to see binaries released, let me know! My 2 cents: I think it would be great if dynamic libs were a part of the binary releases. That would simplify the process of using LLVM with bindings for python, .net, java ... Best, Keith From clattner at apple.com Wed Mar 21 14:57:13 2012 From: clattner at apple.com (Chris Lattner) Date: Wed, 21 Mar 2012 12:57:13 -0700 Subject: [LLVMdev] recognizing DTORs and vptr updates in LLVM. In-Reply-To: References: <1F3170AA-F2C0-4DA5-8F79-FCE423EFFE27@apple.com> <4F6836FD.5020509@free.fr> Message-ID: On Mar 21, 2012, at 11:53 AM, Kostya Serebryany wrote: > > The gcc Ada front-end does this too, in quite a range of situations. For > > example multiple threads racily initialize a pointer variable, but they all > > initialize to the same value. The various valgrind based race detection > > tools all complain about this, which makes them much less useful than they > > might be for Ada. > > FWIW, after thinking about this for awhile, I realize that we already have the tools to handle this: TBAA. 
> >
> > It would be general goodness for clang to emit VTable loads and stores with their own TBAA type class (one that does not even alias "char*").
>
> Indeed, sounds very nice.
> I'll try to make a patch that adds TBAA metadata to VTable loads (unless someone else knows how to do it off the top of his head).
>

Sounds great, thanks Kostya,

-Chris

From kcconley at gmail.com Thu Mar 22 16:11:10 2012
From: kcconley at gmail.com (Kal Conley)
Date: Thu, 22 Mar 2012 22:11:10 +0100
Subject: [LLVMdev] Infinite recursion in sys::fs::create_directories()
Message-ID: <4F6B956E.302@gmail.com>

Hi,

sys::fs::create_directories() recurses infinitely for relative paths
with only one directory or where the first directory in path doesn't
exist. This was observed in r153176.

Example:

#include

using namespace llvm;

int main(int argc, char *argv[])
{
    bool existed;
    error_code ec = sys::fs::create_directories(Twine("log"), existed);
    return 0;
}

recurses infinitely in sys::fs::create_directories().

This happens because the parent of "log" is "" which doesn't exist so
the function recurses and looks for the parent of "" which is "" which
doesn't exist etc. The function should perhaps check if parent is empty.

Here is how I fixed it:

//------------------

error_code create_directories(const Twine &path, bool &existed) {
  SmallString<128> path_storage;
  StringRef p = path.toStringRef(path_storage);

  StringRef parent = path::parent_path(p);
  if (!parent.empty()) {
    bool parent_exists;
    if (error_code ec = exists(parent, parent_exists)) return ec;

    if (!parent_exists)
      if (error_code ec = create_directories(parent, existed)) return ec;
  }
  return create_directory(p, existed);
}

//------------------

Thanks,
Kal

From paul at lucasmail.org Thu Mar 22 13:40:51 2012
From: paul at lucasmail.org (Paul J.
Lucas) Date: Thu, 22 Mar 2012 11:40:51 -0700 Subject: [LLVMdev] Catching C++ exceptions, cleaning up, rethrowing In-Reply-To: <46631B9D-2E98-4DB4-80F0-1F484D6A9060@apple.com> References: <978903B6-BDA6-47CD-9E7B-B8214DEDE339@lucasmail.org> <46631B9D-2E98-4DB4-80F0-1F484D6A9060@apple.com> Message-ID: On Mar 22, 2012, at 12:28 AM, Bill Wendling wrote: > On Mar 20, 2012, at 7:38 PM, Paul J. Lucas wrote: > >> I've read the docs on LLVM exceptions, but I don't see any examples. A little help? > > I don't think this has anything to do with LLVM's IR-level exception system. It sounds to me like you just need a way to handle C++ exceptions inside of the C++ code and then rethrow so that the JIT's caller can do its thing. (Right?) Right. The call sequence is: my_lib(1) -> JIT_code -> C_thunk -> my_lib(2) The JIT code creates Functions that create C++ objects on their stacks (by using alloca instructions then calling a C thunk that calls the C++ object's constructor via placement new). If an exception is thrown in my_lib(2), then somewhere between there and when the stack unwinds to my_lib(1), the C++ objects that were created on the stack must have their destructors called (also via C thunks). Hence, some code somewhere between my_lib(1) and C_thunk has to catch all exceptions, call the destructors, and rethrow the exceptions. > You could move the C++ code into a C++ function that catches all exceptions. The C functions you provide would call the small bit of C++ code that would then execute the "real" functionality. You would have to wrap/unwrap the variables, of course. (There are examples of wrapping/unwrapping of variables in LLVM's source tree.) That way you will get to use C++'s exception handling system instead of creating your own, which is a huge massive undertaking full of pitfalls. When you rethrow the exception, it will propagate past the C function to the code calling the JIT'ed code. Unfortunately, I'm not following. 
How is having the code that catches all exceptions in a separate function different from what I proposed (putting the try/catch in the thunks)? (Ideally, I want to minimize layers of function calls.) Again for reference:

extern "C" bool thunk_iterator_M_next( void *v_that, void *v_result,
                                       dtor_pairs *dtors ) {
  try {
    item_iterator *const that = static_cast<item_iterator*>( v_that );
    item *const result = static_cast<item*>( v_result );
    return that->next( result );
  }
  catch ( ... ) {
    run_dtors( dtors );
    throw;
  }
}

- Paul

From bigcheesegs at gmail.com Thu Mar 22 16:20:52 2012
From: bigcheesegs at gmail.com (Michael Spencer)
Date: Thu, 22 Mar 2012 14:20:52 -0700
Subject: [LLVMdev] Infinite recursion in sys::fs::create_directories()
In-Reply-To: <4F6B956E.302@gmail.com>
References: <4F6B956E.302@gmail.com>
Message-ID:

On Thu, Mar 22, 2012 at 2:11 PM, Kal Conley wrote:
> Hi,
>
> sys::fs::create_directories() recurses infinitely for relative paths
> with only one directory or where the first directory in path doesn't
> exist. This was observed in r153176.
>
> Example:
>
> #include
>
> using namespace llvm;
>
> int main(int argc, char *argv[])
> {
>     bool existed;
>     error_code ec = sys::fs::create_directories(Twine("log"), existed);
>     return 0;
> }
>
> recurses infinitely in sys::fs::create_directories().
>
> This happens because the parent of "log" is "" which doesn't exist so
> the function recurses and looks for the parent of "" which is "" which
> doesn't exist etc. The function should perhaps check if parent is empty.
>
> Here is how I fixed it:
>
> //------------------
>
> error_code create_directories(const Twine &path, bool &existed) {
>   SmallString<128> path_storage;
>   StringRef p = path.toStringRef(path_storage);
>
>   StringRef parent = path::parent_path(p);
>   if (!parent.empty()) {
>     bool parent_exists;
>     if (error_code ec = exists(parent, parent_exists)) return ec;
>
>     if (!parent_exists)
>       if (error_code ec = create_directories(parent, existed)) return ec;
>   }
>   return create_directory(p, existed);
> }
>
> //------------------
>
> Thanks,
> Kal

Thanks, fixed in: r153225

- Michael Spencer

From ahatanak at gmail.com Thu Mar 22 13:11:57 2012
From: ahatanak at gmail.com (Akira Hatanaka)
Date: Thu, 22 Mar 2012 11:11:57 -0700
Subject: [LLVMdev] Sorting relocation entries
In-Reply-To: <177AEDD6-ED13-479C-A8F6-9BAE636CC638@apple.com>
References: <177AEDD6-ED13-479C-A8F6-9BAE636CC638@apple.com>
Message-ID:

Hi Jim,

Yes, the relocation entries have to be reordered so that the
got16/lo16 or hi16/lo16 pairs appear consecutively in the relocation
table. As a result, relocations can appear in a different order than
the instructions that they're for.

For example, in this code, the post-RA scheduler inserts an
instruction with relocation %got(body_ok) between %got(scope_top) and
%lo(scope_top).

$ cat z29.s
  lw  $3, %got(scope_top)($gp)
  lw  $2, %got(body_ok)($gp)
  lw  $3, %lo(scope_top)($3)
  addiu $2, $2, %lo(body_ok)

This is the assembled program generated by gas:
$ mips-linux-gnu-objdump -dr z29.gas.o

     748:       8f830000        lw      v1,0(gp)
                        748: R_MIPS_GOT16       .bss
     74c:       8f820000        lw      v0,0(gp)
                        74c: R_MIPS_GOT16       .bss
     750:       8c630000        lw      v1,0(v1)
                        750: R_MIPS_LO16        .bss
     754:       244245d4        addiu   v0,v0,17876
                        754: R_MIPS_LO16        .bss

gas reorders these relocations with the function in the following link:

http://repo.or.cz/w/binutils.git/blob/master:/gas/config/tc-mips.c#l15222

$ mips--linux-gnu-readelf -r z29.gas.o

Relocation section '.rel.text' at offset 0x4584 contains 705 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
...
00000748  00000409 R_MIPS_GOT16      00000000   .bss // %got(scope_top)
00000750  00000406 R_MIPS_LO16       00000000   .bss // %lo(scope_top)
0000074c  00000409 R_MIPS_GOT16      00000000   .bss // %got(body_ok)
00000754  00000406 R_MIPS_LO16       00000000   .bss // %lo(body_ok)

The attached patch makes the following changes to make direct object
emitter write out relocations in the correct order:

1.
Add a target hook MCELFObjectTargetWriter::ReorderRelocs. The default behavior sorts the relocations by the r_offset. 2. Move struct ELFRelocationEntry from ELFObjectWriter to MCELFObjectTargetWriter and add member fixup to it. The overridden version of ReorderRelocs (MipsELFObjectWriter::ReorderRelocs) needs access to ELFRelocationEntry::Type and MCFixup::Value to reorder the relocations. Do you think these changes are acceptable? On Wed, Mar 21, 2012 at 3:50 PM, Jim Grosbach wrote: > Hi Akira, > > If I follow correctly, the relocation entries can thus be in a different order than the instructions that they're for? That seems a bit odd, but I suppose there's nothing inherently wrong with that. It's just not something, AFAIK, that llvm has had to deal with before. This should definitely be a target-specific thing, not a general ELFObjectWriter thing, as other targets may have entirely different needs. Offhand, it seems reasonable to have a post-processing pass over the relocation list before it's written out to the file. The target can manipulate the list in whatever manner it needs to. A hook on MCELFObjectTargetWriter should do the trick. > > -Jim > > > On Mar 19, 2012, at 1:39 PM, Akira Hatanaka wrote: > >> What would be the best way to sort relocation entries before they are >> written out in ELFObjectWriter::WriteRelocationsFragment? >> >> According to the Mips ABI documents I have, there are certain >> restrictions on the order relocations appear in the table (e.g. >> R_MIPS_HI16 and R_MIPS_GOT16 must be followed immediately by a >> R_MIPS_LO16). When I enable post RA scheduling, some of the >> restrictions are violated in the generated object code, which results >> in incorrect relocation values generated by the linker. 
>>
>> I am considering imitating what gas does in function mips_frob_file
>> (line 15522 of tc-mips.c) to fix this problem:
>>
>> http://repo.or.cz/w/binutils.git/blob/master:/gas/config/tc-mips.c
>>
>> Are there any other targets that have similar restrictions or requirements?
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>

From ahatanak at gmail.com Thu Mar 22 13:13:01 2012
From: ahatanak at gmail.com (Akira Hatanaka)
Date: Thu, 22 Mar 2012 11:13:01 -0700
Subject: [LLVMdev] Sorting relocation entries
In-Reply-To:
References: <177AEDD6-ED13-479C-A8F6-9BAE636CC638@apple.com>
Message-ID:

Here is the patch.

On Thu, Mar 22, 2012 at 11:11 AM, Akira Hatanaka wrote:
> Hi Jim,
>
> Yes, the relocation entries have to be reordered so that the
> got16/lo16 or hi16/lo16 pairs appear consecutively in the relocation
> table. As a result, relocations can appear in a different order than
> the instructions that they're for.
>
> For example, in this code, the post-RA scheduler inserts an
> instruction with relocation %got(body_ok) between %got(scope_top) and
> %lo(scope_top).
>
> $ cat z29.s
>   lw  $3, %got(scope_top)($gp)
>   lw  $2, %got(body_ok)($gp)
>   lw  $3, %lo(scope_top)($3)
>   addiu $2, $2, %lo(body_ok)
>
> This is the assembled program generated by gas:
> $ mips-linux-gnu-objdump -dr z29.gas.o
>
>      748:       8f830000        lw      v1,0(gp)
>                         748: R_MIPS_GOT16       .bss
>      74c:       8f820000        lw      v0,0(gp)
>                         74c: R_MIPS_GOT16       .bss
>      750:       8c630000        lw      v1,0(v1)
>                         750: R_MIPS_LO16        .bss
>      754:       244245d4        addiu   v0,v0,17876
>                         754: R_MIPS_LO16        .bss
>
>
> gas reorders these relocations with the function in the following link:
>
> http://repo.or.cz/w/binutils.git/blob/master:/gas/config/tc-mips.c#l15222
>
>
> $ mips--linux-gnu-readelf -r z29.gas.o
>
> Relocation section '.rel.text' at offset 0x4584 contains 705 entries:
>  Offset     Info    Type            Sym.Value  Sym. Name
> ...
> 00000748  00000409 R_MIPS_GOT16      00000000   .bss // %got(scope_top)
> 00000750  00000406 R_MIPS_LO16       00000000   .bss // %lo(scope_top)
> 0000074c  00000409 R_MIPS_GOT16      00000000   .bss // %got(body_ok)
> 00000754  00000406 R_MIPS_LO16       00000000   .bss // %lo(body_ok)
>
>
> The attached patch makes the following changes to make direct object
> emitter write out relocations in the correct order:
>
> 1. Add a target hook MCELFObjectTargetWriter::ReorderRelocs. The
> default behavior sorts the relocations by the r_offset.
> 2. Move struct ELFRelocationEntry from ELFObjectWriter to
> MCELFObjectTargetWriter and add member fixup to it. The overridden
> version of ReorderRelocs (MipsELFObjectWriter::ReorderRelocs) needs
> access to ELFRelocationEntry::Type and MCFixup::Value to reorder the
> relocations.
>
> Do you think these changes are acceptable?
>
> On Wed, Mar 21, 2012 at 3:50 PM, Jim Grosbach wrote:
>> Hi Akira,
>>
>> If I follow correctly, the relocation entries can thus be in a different order than the instructions that they're for? That seems a bit odd, but I suppose there's nothing inherently wrong with that. It's just not something, AFAIK, that llvm has had to deal with before. This should definitely be a target-specific thing, not a general ELFObjectWriter thing, as other targets may have entirely different needs. Offhand, it seems reasonable to have a post-processing pass over the relocation list before it's written out to the file. The target can manipulate the list in whatever manner it needs to. A hook on MCELFObjectTargetWriter should do the trick.
>>
>> -Jim
>>
>>
>> On Mar 19, 2012, at 1:39 PM, Akira Hatanaka wrote:
>>
>>> What would be the best way to sort relocation entries before they are
>>> written out in ELFObjectWriter::WriteRelocationsFragment?
>>>
>>> According to the Mips ABI documents I have, there are certain
>>> restrictions on the order relocations appear in the table (e.g.
>>> R_MIPS_HI16 and R_MIPS_GOT16 must be followed immediately by a
>>> R_MIPS_LO16). When I enable post RA scheduling, some of the
>>> restrictions are violated in the generated object code, which results
>>> in incorrect relocation values generated by the linker.
>>>
>>> I am considering imitating what gas does in function mips_frob_file
>>> (line 15522 of tc-mips.c) to fix this problem:
>>>
>>> http://repo.or.cz/w/binutils.git/blob/master:/gas/config/tc-mips.c
>>>
>>> Are there any other targets that have similar restrictions or requirements?
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: reloc.patch
Type: text/x-patch
Size: 4776 bytes
Desc: not available
Url : http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120322/99b1f3dc/attachment.bin

From clattner at apple.com Thu Mar 22 16:22:53 2012
From: clattner at apple.com (Chris Lattner)
Date: Thu, 22 Mar 2012 14:22:53 -0700
Subject: [LLVMdev] recognizing DTORs and vptr updates in LLVM.
In-Reply-To:
References: <1F3170AA-F2C0-4DA5-8F79-FCE423EFFE27@apple.com> <4F6836FD.5020509@free.fr>
Message-ID: <49B34260-7293-4B87-B7EC-E5E5AECB2A15@apple.com>

On Mar 21, 2012, at 5:06 PM, Kostya Serebryany wrote:

> Chris, is this how the tbaa for vtable loads/stores should look like?
> > Metadata: > !0 = metadata !{metadata !"vtable pointer", metadata !1} > !1 = metadata !{metadata !"omnipotent char", metadata !2} > !2 = metadata !{metadata !"Simple C/C++ TBAA", null} char*'s can't point to vtables, so I think that "Simple C/C++ TBAA" should be the parent of vtables. Otherwise, looks great. -Chris > ... > > Load: > %0 = bitcast %struct.A* %a to void (%struct.A*)*** > %vtable = load void (%struct.A*)*** %0, align 8, !tbaa !0 > > Store: > %0 = getelementptr inbounds %struct.B* %this, i64 0, i32 0, i32 0 > store i32 (...)** bitcast (i8** getelementptr inbounds ([5 x i8*]* @_ZTV1B, i64 0, i64 2) to i32 (...)**), i32 (...)*** %0, align 8, !tbaa !0 > > --kcc > > > > > > On Wed, Mar 21, 2012 at 12:57 PM, Chris Lattner wrote: > > On Mar 21, 2012, at 11:53 AM, Kostya Serebryany wrote: > >> > The gcc Ada front-end does this too, in quite a range of situations. For >> > example multiple threads racily initialize a pointer variable, but they all >> > initialize to the same value. The various valgrind based race detection >> > tools all complain about this, which makes them much less useful than they >> > might be for Ada. >> >> FWIW, after thinking about this for awhile, I realize that we already have the tools to handle this: TBAA. >> >> It would be general goodness for clang to emit VTable loads and stores in their with their own TBAA type class (one that does not even alias "char*"). >> >> Indeed, sounds very nice. >> I'll try to make a patch that adds TBAA metadata to VTable loads (unless someone else knows how to do it off the top of his head). >> > > Sounds great, thanks Kostya, > > -Chris > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20120322/9a5c4b1c/attachment.html

From clattner at apple.com Thu Mar 22 16:25:08 2012
From: clattner at apple.com (Chris Lattner)
Date: Thu, 22 Mar 2012 14:25:08 -0700
Subject: [LLVMdev] Target Data
In-Reply-To:
References:
Message-ID: <72EB7528-4F94-4B1F-9BE9-7C03F31F5D16@apple.com>

On Mar 21, 2012, at 5:57 PM, Ryan Taylor wrote:

> Is it possible to change the widths of types independent of the architecture? Or to reset the widths of types?
>
> I haven't seen anything like this. Thanks.

The datalayout string is required to match the target, if it exists:
http://llvm.org/docs/LangRef.html#datalayout

You can't control it to change how things are laid out.

-Chris

From ryta1203 at gmail.com Thu Mar 22 16:28:19 2012
From: ryta1203 at gmail.com (Ryan Taylor)
Date: Thu, 22 Mar 2012 14:28:19 -0700
Subject: [LLVMdev] Target Data
In-Reply-To: <72EB7528-4F94-4B1F-9BE9-7C03F31F5D16@apple.com>
References: <72EB7528-4F94-4B1F-9BE9-7C03F31F5D16@apple.com>
Message-ID:

I see, thanks.

However, if I -emit-llvm and then append the "target datalayout" string (then operate on that emitted llvm), I can then change the data type sizes and alignments, correct?

Thanks.

On Thu, Mar 22, 2012 at 2:25 PM, Chris Lattner wrote:

>
> On Mar 21, 2012, at 5:57 PM, Ryan Taylor wrote:
>
> > Is it possible to change the widths of types independent of the
> architecture? Or to reset the widths of types?
> >
> > I haven't seen anything like this. Thanks.
>
> The datalayout string is required to match the target, if it exists:
> http://llvm.org/docs/LangRef.html#datalayout
>
> You can't control it to change how things are laid out.
>
> -Chris
>
>

From james.molloy at arm.com Thu Mar 22 04:04:19 2012
From: james.molloy at arm.com (James Molloy)
Date: Thu, 22 Mar 2012 09:04:19 -0000
Subject: [LLVMdev] Euro-LLVM 2012 BoFs and lightning talks - last call!
Message-ID: <00a601cd080a$c4050190$4c0f04b0$@molloy@arm.com>

Hi,

The deadline for BoF and lightning talk registration is in 3 hours time, at 12:00 BST. Please send in any contributions before then!

Cheers,

James

From clattner at apple.com Thu Mar 22 16:46:24 2012
From: clattner at apple.com (Chris Lattner)
Date: Thu, 22 Mar 2012 14:46:24 -0700
Subject: [LLVMdev] Target Data
In-Reply-To:
References: <72EB7528-4F94-4B1F-9BE9-7C03F31F5D16@apple.com>
Message-ID:

On Mar 22, 2012, at 2:28 PM, Ryan Taylor wrote:

> I see, thanks.
>
> However, if I -emit-llvm and then append the "target datalayout" string (then operate on that emitted llvm), I can then change the data type sizes and alignments, correct?

If the frontend is making different assumptions than the target data string, then you'll get broken code in a variety of situations. A trivial example is that sizeof() folds to an integer constant in the frontend.

-Chris

>
> Thanks.
>
> On Thu, Mar 22, 2012 at 2:25 PM, Chris Lattner wrote:
>
> On Mar 21, 2012, at 5:57 PM, Ryan Taylor wrote:
>
> > Is it possible to change the widths of types independent of the architecture? Or to reset the widths of types?
> >
> > I haven't seen anything like this. Thanks.
>
> The datalayout string is required to match the target, if it exists:
> http://llvm.org/docs/LangRef.html#datalayout
>
> You can't control it to change how things are laid out.
>
> -Chris
>
>

From wendling at apple.com Thu Mar 22 02:28:58 2012
From: wendling at apple.com (Bill Wendling)
Date: Thu, 22 Mar 2012 00:28:58 -0700
Subject: [LLVMdev] Catching C++ exceptions, cleaning up, rethrowing
In-Reply-To: <978903B6-BDA6-47CD-9E7B-B8214DEDE339@lucasmail.org>
References: <978903B6-BDA6-47CD-9E7B-B8214DEDE339@lucasmail.org>
Message-ID: <46631B9D-2E98-4DB4-80F0-1F484D6A9060@apple.com>

On Mar 20, 2012, at 7:38 PM, Paul J. Lucas wrote:

> To recap, on Mar 14, 2012, I wrote:
>
>> My project has a C++ library that I want to allow the user to use via some programming language to be JIT'd to call functions in said library. For the sake of simplicity, assume the library has classes like:
>>
>> class item_iterator {
>> public:
>>   virtual ~item_iterator();
>>   virtual bool next( item *result ) = 0;
>> };
>>
>> I'm aware that LLVM doesn't know anything about C++ and that one way to call C++ functions is to wrap them in C thunks:
>>
>> extern "C" bool thunk_iterator_M_next( void *v_that, void *v_result ) {
>>   item_iterator *const that = static_cast<item_iterator*>( v_that );
>>   item *const result = static_cast<item*>( v_result );
>>   return that->next( result );
>> }
>>
>> extern "C" void thunk_iterator_M_delete( void *v_that ) {
>>   item_iterator *const that = static_cast<item_iterator*>( v_that );
>>   that->~item_iterator();
>> }
>
> Thanks to a previous answer, I now have everything working. My next problem is to deal with exceptions that may be thrown from the C++ functions that are called via the thunks, e.g., what if that->next() throws an exception? I need to be able to catch it, call a clean-up function, and rethrow the exception so the code calling the JIT'd code can deal with the exception.
>
> I've read the docs on LLVM exceptions, but I don't see any examples. A little help?
>

I don't think this has anything to do with LLVM's IR-level exception system.
It sounds to me like you just need a way to handle C++ exceptions inside of the C++ code and then rethrow so that the JIT's caller can do its thing. (Right?)

You could move the C++ code into a C++ function that catches all exceptions. The C functions you provide would call the small bit of C++ code that would then execute the "real" functionality. You would have to wrap/unwrap the variables, of course. (There are examples of wrapping/unwrapping of variables in LLVM's source tree.) That way you will get to use C++'s exception handling system instead of creating your own, which is a huge undertaking full of pitfalls. When you rethrow the exception, it will propagate past the C function to the code calling the JIT'ed code.

> One thought might be to try to handle all the C++ exception code in the thunks. The JIT'd code would create/maintain a simple array-of-structs like:
>
> struct dtor_pair {
>   void (*dtor_fn)(void*);
>   void *that;
> };
> dtor_pair dtor_pairs[10];
>
> that contain a pointer to the thunk for a destructor and a pointer to the object to be destructed.
>
> As the JIT'd code creates objects on the stack, it populates the dtor_pairs array. This array could then be passed to every thunk:
>
> extern "C" bool thunk_iterator_M_next( void *v_that, void *v_result,
>                                        dtor_pair *dtors ) {
>   try {
>     item_iterator *const that = static_cast<item_iterator*>( v_that );
>     item *const result = static_cast<item*>( v_result );
>     return that->next( result );
>   }
>   catch ( ... ) {
>     run_dtors( dtors );
>     throw;
>   }
> }
>
> where run_dtors() would run through the array backwards calling the destructor functions in reverse order of construction.
>
> Would this work? If so, then I wouldn't have to mess with handling C++ exceptions from LLVM. But is there a better "LLVM way" to do what I want?
>

What you're doing is recreating what the personality function and DWARF unwinding library do.
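
A minimal, self-contained sketch of the thunk-level clean-up-and-rethrow idea quoted above may make the control flow easier to follow. It is not code from the thread: run_dtors, library_next, thunk_next, log_dtor and demo are hypothetical names, the destructor list is passed as a pointer plus a count (an assumption, since the quoted proposal does not say how the length is tracked), and it only shows the C++ side. It assumes the exception is able to unwind through every frame between the thunk and the outer caller.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical record, modeled on the dtor_pair struct quoted above.
struct dtor_pair {
  void (*dtor_fn)(void *);  // C thunk that destroys one object
  void *that;               // the object to destroy
};

// Hypothetical helper: run the registered destructors in reverse order of
// construction, as the quoted proposal describes.
static void run_dtors(dtor_pair *dtors, std::size_t count) {
  for (std::size_t i = count; i-- > 0;) {
    assert(dtors[i].dtor_fn != nullptr);
    dtors[i].dtor_fn(dtors[i].that);
  }
}

// Stand-in for a library call such as item_iterator::next() that may throw.
static bool library_next(bool fail) {
  if (fail)
    throw std::runtime_error("next() failed");
  return true;
}

// Thunk in the style of thunk_iterator_M_next: catch everything, clean up,
// then rethrow so the caller of the JIT'd code still sees the exception.
extern "C" bool thunk_next(bool fail, dtor_pair *dtors, std::size_t count) {
  try {
    return library_next(fail);
  } catch (...) {
    run_dtors(dtors, count);
    throw;
  }
}

// Demo state: each "destructor" appends its object's name to a log.
static std::string g_log;
static void log_dtor(void *p) { g_log += static_cast<const char *>(p); }

// Registers objects "A" then "B", forces a throw, and returns the log.
std::string demo() {
  static char a[] = "A", b[] = "B";
  g_log.clear();
  dtor_pair dtors[] = {{log_dtor, a}, {log_dtor, b}};
  try {
    thunk_next(true, dtors, 2);
  } catch (const std::runtime_error &e) {
    g_log += "|";
    g_log += e.what();
  }
  return g_log;
}
```

In this sketch demo() yields "BA|next() failed": B is destroyed before A, and the original exception still reaches the outer catch. Whether the same holds with real JIT'd frames in between depends on those frames carrying unwind information, which is the part the personality function and unwinder normally handle.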
-bw

From ryta1203 at gmail.com Thu Mar 22 16:56:33 2012
From: ryta1203 at gmail.com (Ryan Taylor)
Date: Thu, 22 Mar 2012 14:56:33 -0700
Subject: [LLVMdev] Fwd: Target Data
In-Reply-To:
References: <72EB7528-4F94-4B1F-9BE9-7C03F31F5D16@apple.com>
Message-ID:

---------- Forwarded message ----------
From: Ryan Taylor
Date: Thu, Mar 22, 2012 at 2:56 PM
Subject: Re: [LLVMdev] Target Data
To: Chris Lattner

So I read that link yesterday, and it says that defaults are used unless they are overridden by the datalayout keyword, which, from what I can tell, can only be put in an LLVM IR file to be read. Is there a way to set this via the API?

On Thu, Mar 22, 2012 at 2:46 PM, Chris Lattner wrote:

>
> On Mar 22, 2012, at 2:28 PM, Ryan Taylor wrote:
>
> I see, thanks.
>
> However, if I -emit-llvm and then append the "target datalayout" string
> (then operate on that emitted llvm), I can then change the data type sizes
> and alignments, correct?
>
>
> If the frontend is making different assumptions than the target data
> string, then you'll get broken code in a variety of situations. A trivial
> example is that sizeof() folds to an integer constant in the frontend.
>
> -Chris
>
>
> Thanks.
>
> On Thu, Mar 22, 2012 at 2:25 PM, Chris Lattner wrote:
>
>>
>> On Mar 21, 2012, at 5:57 PM, Ryan Taylor wrote:
>>
>> > Is it possible to change the widths of types independent of the
>> architecture? Or to reset the widths of types?
>> >
>> > I haven't seen anything like this. Thanks.
>>
>> The datalayout string is required to match the target, if it exists:
>> http://llvm.org/docs/LangRef.html#datalayout
>>
>> You can't control it to change how things are laid out.
>>
>> -Chris
>>
>>
>
>

From marco at peereboom.us Tue Mar 20 12:50:41 2012
From: marco at peereboom.us (Marco Peereboom)
Date: Tue, 20 Mar 2012 12:50:41 -0500
Subject: [LLVMdev] Runtime linker issue wtih X11R6 on i386 with -O3 optimization
Message-ID: <20120320175041.GR16415@peereboom.us>

Hi everybody.

I have an odd issue that I'd like to get some advice on. It is a bit of a long story so please bear with me.

X11R6 has a notion of modules so it basically compiles everything into shared libraries and at start-of-day it loads libraries (modules) as needed. A side effect of that is that they require really lazy binding because they do (can?) not enforce the load order.

The problem I am seeing is with any optimization higher than -O0 on the following code:

void
uxa_check_poly_lines(DrawablePtr pDrawable, GCPtr pGC,
                     int mode, int npt, DDXPointPtr ppt)
{
	ScreenPtr screen = pDrawable->pScreen;

	UXA_FALLBACK(("to %p (%c), width %d, mode %d, count %d\n",
	              pDrawable, uxa_drawable_location(pDrawable),
	              pGC->lineWidth, mode, npt));

	if (pGC->lineWidth == 0) {
		if (uxa_prepare_access(pDrawable, UXA_ACCESS_RW)) {
			if (uxa_prepare_access_gc(pGC)) {
				fbPolyLine(pDrawable, pGC, mode, npt, ppt);
				uxa_finish_access_gc(pGC);
			}
			uxa_finish_access(pDrawable);
		}
		return;
	}
	/* fb calls mi functions in the lineWidth != 0 case. */
	fbPolyLine(pDrawable, pGC, mode, npt, ppt);
}

This code optimizes into a TAILCALL and that makes X unhappy. To make things worse, this exact same code works fine on x86_64; I only see this issue on i386. Admittedly I have not looked at the x86_64 asm to look for differences.

All the code was compiled using clang 3.0 release on OpenBSD.

Prototyping the offending functions with __attribute__((weak)) works around the problem but is pretty ugly and unmaintainable in a project as old and as large as xorg. Is there a magic flag I can use to enforce this behavior, or can we consider this a bug of sorts?
I get why clang does what it does; unfortunately, it breaks stuff. And I'll add the mandatory whine: yes, it works with gcc at all optimization levels.

I can provide more information if needed.

=======================================================================
-O3

# objdump -R intel_drv.so | grep PolyLine
20014754 R_386_GLOB_DAT    fbPolyLine
20014048 R_386_JUMP_SLOT   fbPolyLine

So the problem is that clang generates R_386_GLOB_DAT for fbPolyLine in order to load the jump target into %eax, which is then jumped to from 5589f. The offending code is at 558a1. I get the optimization and think it is pretty cute; however, stuff like X11 relies on symbols being loaded really really late :(

There is one extra confusing factor. If I prototype the fbPolyLine function with __attribute__((weak)) the same code *does* work. I have not dug into this at all but since I found it as an ugly workaround I figured I'd mention it.

                        if (uxa_prepare_access_gc(pGC)) {
                                fbPolyLine(pDrawable, pGC, mode, npt, ppt);
   55844:       89 44 24 10             mov    %eax,0x10(%esp)
   55848:       8b 45 14                mov    0x14(%ebp),%eax
   5584b:       89 44 24 0c             mov    %eax,0xc(%esp)
   5584f:       8b 45 10                mov    0x10(%ebp),%eax
   55852:       89 44 24 08             mov    %eax,0x8(%esp)
   55856:       89 7c 24 04             mov    %edi,0x4(%esp)
   5585a:       8b 45 08                mov    0x8(%ebp),%eax
   5585d:       89 04 24                mov    %eax,(%esp)
   55860:       89 f3                   mov    %esi,%ebx
   55862:       e8 05 45 fb ff          call   9d6c <_init+0x77c>
   55867:       b8 c0 00 00 00          mov    $0xc0,%eax
   5586c:       23 47 10                and    0x10(%edi),%eax
   5586f:       83 f8 40                cmp    $0x40,%eax
   55872:       75 0d                   jne    55881
   55874:       8b 47 20                mov    0x20(%edi),%eax
   55877:       89 04 24                mov    %eax,(%esp)
   5587a:       89 f3                   mov    %esi,%ebx
   5587c:       e8 bb 5a fb ff          call   b33c <_init+0x1d4c>
   55881:       8b 47 24                mov    0x24(%edi),%eax
   55884:       85 c0                   test   %eax,%eax
   55886:       74 0a                   je     55892
   55888:       89 04 24                mov    %eax,(%esp)
   5588b:       89 f3                   mov    %esi,%ebx
   5588d:       e8 aa 5a fb ff          call   b33c <_init+0x1d4c>
                                uxa_finish_access_gc(pGC);
                        }
                        uxa_finish_access(pDrawable);
   55892:       8b 86 c4 09 00 00       mov    0x9c4(%esi),%eax
   55898:       83 c4 24                add    $0x24,%esp
   5589b:       5e                      pop    %esi
   5589c:       5f                      pop    %edi
   5589d:       5b                      pop    %ebx
   5589e:       5d                      pop    %ebp
   5589f:       ff e0                   jmp    *%eax
                }
                return;
        }
        /* fb calls mi functions in the lineWidth != 0 case. */
        fbPolyLine(pDrawable, pGC, mode, npt, ppt);
   558a1:       8b 86 f0 08 00 00       mov    0x8f0(%esi),%eax
   558a7:       eb ef                   jmp    55898

=======================================================================
-O0

# objdump -R intel_drv.so | grep PolyLine
200143cc R_386_JUMP_SLOT   fbPolyLine

The relevant asm. The juicy bits are at 8330d and 83358, which are the calls to fbPolyLine. Since it is always a direct call all is groovy.

                        if (uxa_prepare_access_gc(pGC)) {
   832cf:       8b 45 ec                mov    0xffffffec(%ebp),%eax
   832d2:       89 04 24                mov    %eax,(%esp)
   832d5:       8b 5d d8                mov    0xffffffd8(%ebp),%ebx
   832d8:       e8 1f 75 f8 ff          call   a7fc <_init+0x119c>
   832dd:       3d 00 00 00 00          cmp    $0x0,%eax
   832e2:       0f 84 38 00 00 00       je     83320
                                fbPolyLine(pDrawable, pGC, mode, npt, ppt);
   832e8:       8b 45 f0                mov    0xfffffff0(%ebp),%eax
   832eb:       8b 4d ec                mov    0xffffffec(%ebp),%ecx
   832ee:       8b 55 e8                mov    0xffffffe8(%ebp),%edx
   832f1:       8b 75 e4                mov    0xffffffe4(%ebp),%esi
   832f4:       8b 7d e0                mov    0xffffffe0(%ebp),%edi
   832f7:       89 04 24                mov    %eax,(%esp)
   832fa:       89 4c 24 04             mov    %ecx,0x4(%esp)
   832fe:       89 54 24 08             mov    %edx,0x8(%esp)
   83302:       89 74 24 0c             mov    %esi,0xc(%esp)
   83306:       89 7c 24 10             mov    %edi,0x10(%esp)
   8330a:       8b 5d d8                mov    0xffffffd8(%ebp),%ebx
   8330d:       e8 fa 6a f8 ff          call   9e0c <_init+0x7ac>
                                uxa_finish_access_gc(pGC);
   83312:       8b 45 ec                mov    0xffffffec(%ebp),%eax
   83315:       89 04 24                mov    %eax,(%esp)
   83318:       8b 5d d8                mov    0xffffffd8(%ebp),%ebx
   8331b:       e8 4c 73 f8 ff          call   a66c <_init+0x100c>
                        }
                        uxa_finish_access(pDrawable);
   83320:       8b 45 f0                mov    0xfffffff0(%ebp),%eax
   83323:       89 04 24                mov    %eax,(%esp)
   83326:       8b 5d d8                mov    0xffffffd8(%ebp),%ebx
   83329:       e8 9e 81 f8 ff          call   b4cc <_init+0x1e6c>
                }
                return;
   8332e:       e9 2a 00 00 00          jmp    8335d
        }
        /* fb calls mi functions in the lineWidth != 0 case. */
        fbPolyLine(pDrawable, pGC, mode, npt, ppt);
   83333:       8b 45 f0                mov    0xfffffff0(%ebp),%eax
   83336:       8b 4d ec                mov    0xffffffec(%ebp),%ecx
   83339:       8b 55 e8                mov    0xffffffe8(%ebp),%edx
   8333c:       8b 75 e4                mov    0xffffffe4(%ebp),%esi
   8333f:       8b 7d e0                mov    0xffffffe0(%ebp),%edi
   83342:       89 04 24                mov    %eax,(%esp)
   83345:       89 4c 24 04             mov    %ecx,0x4(%esp)
   83349:       89 54 24 08             mov    %edx,0x8(%esp)
   8334d:       89 74 24 0c             mov    %esi,0xc(%esp)
   83351:       89 7c 24 10             mov    %edi,0x10(%esp)
   83355:       8b 5d d8                mov    0xffffffd8(%ebp),%ebx
   83358:       e8 af 6a f8 ff          call   9e0c <_init+0x7ac>

========================================================================
xorg output:

$ startx
xauth: file /home/marco/.serverauth.12707 does not exist

X.Org X Server 1.11.4
Release Date: 2012-01-27
X Protocol Version 11, Revision 0
Build Operating System: OpenBSD 5.1 i386
Current Operating System: OpenBSD i386.peereboom.us 5.1 GENERIC.MP#4 i386
Build Date: 20 March 2012 11:29:41AM

Current version of pixman: 0.24.4
        Before reporting problems, check http://wiki.x.org
        to make sure that you have the latest version.
Markers: (--) probed, (**) from config file, (==) default setting,
        (++) from command line, (!!) notice, (II) informational,
        (WW) warning, (EE) error, (NI) not implemented, (??) unknown.
(==) Log file: "/var/log/Xorg.0.log", Time: Tue Mar 20 12:42:07 2012
(==) Using system config directory "/usr/X11R6/share/X11/xorg.conf.d"
/usr/X11R6/bin/X:/usr/X11R6/lib/modules/drivers/intel_drv.so: undefined symbol 'fbPolyLine'
/usr/X11R6/bin/X:/usr/X11R6/lib/modules/drivers/intel_drv.so: undefined symbol 'vgaHWSaveScreen'
/usr/X11R6/bin/X:/usr/X11R6/lib/modules/drivers/intel_drv.so: undefined symbol 'fbPolySegment'
(EE) Failed to load /usr/X11R6/lib/modules/drivers/intel_drv.so: Cannot load specified object
(EE) Failed to load module "intel" (loader failed, 7)
/usr/X11R6/bin/X:/usr/X11R6/lib/modules/drivers/vesa_drv.so: undefined symbol 'shadowUpdatePacked'
(EE) Failed to load /usr/X11R6/lib/modules/drivers/vesa_drv.so: Cannot load specified object
(EE) Failed to load module "vesa" (loader failed, 7)
(EE) No drivers available.

Fatal server error:
no screens found

Please consult the The X.Org Foundation support
         at http://wiki.x.org
 for help.
Please also check the log file at "/var/log/Xorg.0.log" for additional information.

Server terminated with error (1). Closing log file.
xinit: giving up
xinit: unable to connect to X server: Connection refused
xinit: server error

From nlamee at cs.mcgill.ca Thu Mar 22 10:31:51 2012
From: nlamee at cs.mcgill.ca (nlamee at cs.mcgill.ca)
Date: Thu, 22 Mar 2012 11:31:51 -0400 (EDT)
Subject: [LLVMdev] Execution Engine: CodeGenOpt level
Message-ID: <49368.132.206.3.112.1332430311.squirrel@mail.cs.mcgill.ca>

Hi,

How can I dynamically change the code generation optimization level (e.g., None) of a JIT in order to recompile a function with a new optimization level (e.g., Default)?

Thank you.

Best regards,
Nurudeen.
From abhishekr1982 at gmail.com Thu Mar 22 06:49:31 2012
From: abhishekr1982 at gmail.com (Abhishek Rhisheekesan)
Date: Thu, 22 Mar 2012 04:49:31 -0700 (PDT)
Subject: [LLVMdev] Problem using a label to a MachineBasicBlock
In-Reply-To: <7655ECAC2D5EEB4E95BCDC8D74CBE7B203B891EA18@DE02WXMBX1.internal.synopsys.com>
References: <7655ECAC2D5EEB4E95BCDC8D74CBE7B203B891E8C9@DE02WXMBX1.internal.synopsys.com> <7655ECAC2D5EEB4E95BCDC8D74CBE7B203B891EA18@DE02WXMBX1.internal.synopsys.com>
Message-ID: <33544612.post@talk.nabble.com>

Can you please post the code to split a MachineBasicBlock?

I am trying to split a MachineBasicBlock at a specific instruction in the MBB into, let us say, MBB1 and MBB2. This instruction should go into MBB2. Also, MBB1 should have an unconditional branch to MBB2 as its terminator (quite similar to splitBasicBlock in BasicBlock.cpp).

Meanwhile, I am trying to come up with a variant of SplitCriticalEdge to do this, but if someone can provide the code to split an MBB, it will be of great help.

Jeroen Dobbelaere-2 wrote:
>
> Using 'NEW_BB->setIsLandingPad(true);' seems to resolve everything.
>
> Greetings,
>
> Jeroen Dobbelaere
> [...]
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>

--
View this message in context: http://old.nabble.com/Problem-using-a-label-to-a-MachineBasicBlock-tp32889812p33544612.html
Sent from the LLVM - Dev mailing list archive at Nabble.com.

From grosbach at apple.com Thu Mar 22 18:42:11 2012
From: grosbach at apple.com (Jim Grosbach)
Date: Thu, 22 Mar 2012 16:42:11 -0700
Subject: [LLVMdev] Sorting relocation entries
In-Reply-To:
References: <177AEDD6-ED13-479C-A8F6-9BAE636CC638@apple.com>
Message-ID:

Hi Akira,

This is looking good. Some specific comments on the details below. Thanks!
Jim

> diff --git a/include/llvm/MC/MCELFObjectWriter.h b/include/llvm/MC/MCELFObjectWriter.h
> index 6e9f5d8..220ecd0 100644
> --- a/include/llvm/MC/MCELFObjectWriter.h
> +++ b/include/llvm/MC/MCELFObjectWriter.h
> @@ -13,6 +13,7 @@
>  #include "llvm/MC/MCObjectWriter.h"
>  #include "llvm/Support/DataTypes.h"
>  #include "llvm/Support/ELF.h"
> +#include <vector>
>
>  namespace llvm {
>  class MCELFObjectTargetWriter {
> @@ -27,6 +28,33 @@ protected:
>                            uint16_t EMachine_, bool HasRelocationAddend_);
>
>  public:
> +  /// @name Relocation Data
> +  /// @{
> +
> +  struct ELFRelocationEntry {
> +    // Make these big enough for both 32-bit and 64-bit
> +    uint64_t r_offset;
> +    int Index;
> +    unsigned Type;
> +    const MCSymbol *Symbol;
> +    uint64_t r_addend;
> +    const MCFixup *fixup;
> +
> +    ELFRelocationEntry()
> +      : r_offset(0), Index(0), Type(0), Symbol(0), r_addend(0), fixup(0) {}
> +
> +    ELFRelocationEntry(uint64_t RelocOffset, int Idx,
> +                       unsigned RelType, const MCSymbol *Sym,
> +                       uint64_t Addend, const MCFixup *Fixup)
> +      : r_offset(RelocOffset), Index(Idx), Type(RelType),
> +        Symbol(Sym), r_addend(Addend), fixup(Fixup) {}
> +
> +    // Support lexicographic sorting.
> +    bool operator<(const ELFRelocationEntry &RE) const {
> +      return RE.r_offset < r_offset;
> +    }
> +  };
> +

I don't think this really belongs to the MCELFObjectTargetWriter class, per se. I suggest moving it outside of the class definition.

>    static uint8_t getOSABI(Triple::OSType OSType) {
>      switch (OSType) {
>        case Triple::FreeBSD:
> @@ -52,6 +80,8 @@ public:
>    virtual void adjustFixupOffset(const MCFixup &Fixup,
>                                   uint64_t &RelocOffset);
>
> +  virtual void ReorderRelocs(const MCAssembler &Asm,

s/ReorderRelocs/reorderRelocs/. Function names start w/ a lower case letter. Personally, I prefer naming the prefix "sort" rather than "reorder", as it's a bit more descriptive, but not a big deal either way.

> +                             std::vector<ELFRelocationEntry>& Relocs);

The '&' binds to the identifier, not the type name, and should be formatted as such. I.e., space before the '&' and no space between it and "Relocs".

>    /// @name Accessors
>    /// @{
> diff --git a/lib/MC/ELFObjectWriter.cpp b/lib/MC/ELFObjectWriter.cpp
> index 36f94b4..093eb07 100644
> --- a/lib/MC/ELFObjectWriter.cpp
> +++ b/lib/MC/ELFObjectWriter.cpp
> @@ -84,31 +84,7 @@ class ELFObjectWriter : public MCObjectWriter {
>        }
>      };
>
> -    /// @name Relocation Data
> -    /// @{
> -
> -    struct ELFRelocationEntry {
> -      // Make these big enough for both 32-bit and 64-bit
> -      uint64_t r_offset;
> -      int Index;
> -      unsigned Type;
> -      const MCSymbol *Symbol;
> -      uint64_t r_addend;
> -
> -      ELFRelocationEntry()
> -        : r_offset(0), Index(0), Type(0), Symbol(0), r_addend(0) {}
> -
> -      ELFRelocationEntry(uint64_t RelocOffset, int Idx,
> -                         unsigned RelType, const MCSymbol *Sym,
> -                         uint64_t Addend)
> -        : r_offset(RelocOffset), Index(Idx), Type(RelType),
> -          Symbol(Sym), r_addend(Addend) {}
> -
> -      // Support lexicographic sorting.
> -      bool operator<(const ELFRelocationEntry &RE) const {
> -        return RE.r_offset < r_offset;
> -      }
> -    };
> +    typedef MCELFObjectTargetWriter::ELFRelocationEntry ELFRelocationEntry;

Scoping operators shouldn't be typedefed away. Spell it out explicitly when the type is referenced. It makes the code clearer, though a bit more verbose. That said, with the above tweak to move the relocation type out to the top level, there shouldn't need to be any explicit scope resolution.

>      /// The target specific ELF writer instance.
>      llvm::OwningPtr<MCELFObjectTargetWriter> TargetObjectWriter;
> @@ -786,7 +762,7 @@ void ELFObjectWriter::RecordRelocation(const MCAssembler &Asm,
>    else
>      assert(isInt<32>(Addend));
>
> -  ELFRelocationEntry ERE(RelocOffset, Index, Type, RelocSymbol, Addend);
> +  ELFRelocationEntry ERE(RelocOffset, Index, Type, RelocSymbol, Addend, &Fixup);
>    Relocations[Fragment->getParent()].push_back(ERE);
>  }
>
> @@ -1072,8 +1048,7 @@ void ELFObjectWriter::WriteRelocationsFragment(const MCAssembler &Asm,
>                                                 MCDataFragment *F,
>                                                 const MCSectionData *SD) {
>    std::vector<ELFRelocationEntry> &Relocs = Relocations[SD];
> -  // sort by the r_offset just like gnu as does
> -  array_pod_sort(Relocs.begin(), Relocs.end());
> +  TargetObjectWriter->ReorderRelocs(Asm, Relocs);

Please add a comment explaining a bit. Nothing elaborate, just something along the lines of, "Sort the relocation entries. Most targets just sort by r_offset, but some (e.g., MIPS) have additional constraints."

>
>    for (unsigned i = 0, e = Relocs.size(); i != e; ++i) {
>      ELFRelocationEntry entry = Relocs[e - i - 1];
> diff --git a/lib/MC/MCELFObjectTargetWriter.cpp b/lib/MC/MCELFObjectTargetWriter.cpp
> index 15bf476..4f3e3b2 100644
> --- a/lib/MC/MCELFObjectTargetWriter.cpp
> +++ b/lib/MC/MCELFObjectTargetWriter.cpp
> @@ -7,6 +7,7 @@
>  //
>  //===----------------------------------------------------------------------===//
>
> +#include "llvm/ADT/STLExtras.h"

Since we're moving the sort here from ELFObjectWriter.cpp, it may be possible to remove the STLExtras.h include from the latter. Please check and see.

>  #include "llvm/MC/MCELFObjectWriter.h"
>
>  using namespace llvm;
> @@ -36,3 +37,10 @@ const MCSymbol *MCELFObjectTargetWriter::ExplicitRelSym(const MCAssembler &Asm,
>  void MCELFObjectTargetWriter::adjustFixupOffset(const MCFixup &Fixup,
>                                                  uint64_t &RelocOffset) {
>  }
> +
> +void
> +MCELFObjectTargetWriter::ReorderRelocs(const MCAssembler &Asm,
> +                                       std::vector<ELFRelocationEntry>& Relocs) {

'&' binding thing again.
> + // Not original with you, but since we're in here anyway, this should be a well-formed sentence: "Sort by the r_offset, just like gnu as does."

> +  array_pod_sort(Relocs.begin(), Relocs.end());

Trailing whitespace.

> +}
>

On Mar 22, 2012, at 11:13 AM, Akira Hatanaka wrote:

> Here is the patch.
>
> On Thu, Mar 22, 2012 at 11:11 AM, Akira Hatanaka wrote:
>> Hi Jim,
>>
>> Yes, the relocation entries have to be reordered so that the
>> got16/lo16 or hi16/lo16 pairs appear consecutively in the relocation
>> table. As a result, relocations can appear in a different order than
>> the instructions that they're for.
>>
>> For example, in this code, the post-RA scheduler inserts an
>> instruction with relocation %got(body_ok) between %got(scope_top) and
>> %lo(scope_top).
>>
>> $ cat z29.s
>> lw $3, %got(scope_top)($gp)
>> lw $2, %got(body_ok)($gp)
>> lw $3, %lo(scope_top)($3)
>> addiu $2, $2, %lo(body_ok)
>>
>> This is the assembled program generated by gas:
>> $ mips-linux-gnu-objdump -dr z29.gas.o
>>
>>  748: 8f830000 lw v1,0(gp)
>>       748: R_MIPS_GOT16 .bss
>>  74c: 8f820000 lw v0,0(gp)
>>       74c: R_MIPS_GOT16 .bss
>>  750: 8c630000 lw v1,0(v1)
>>       750: R_MIPS_LO16 .bss
>>  754: 244245d4 addiu v0,v0,17876
>>       754: R_MIPS_LO16 .bss
>>
>>
>> gas reorders these relocations with the function in the following link:
>>
>> http://repo.or.cz/w/binutils.git/blob/master:/gas/config/tc-mips.c#l15222
>>
>>
>> $ mips-linux-gnu-readelf -r z29.gas.o
>>
>> Relocation section '.rel.text' at offset 0x4584 contains 705 entries:
>>  Offset     Info    Type            Sym.Value  Sym. Name
>> ...
>> 00000748  00000409 R_MIPS_GOT16      00000000   .bss  // %got(scope_top)
>> 00000750  00000406 R_MIPS_LO16       00000000   .bss  // %lo(scope_top)
>> 0000074c  00000409 R_MIPS_GOT16      00000000   .bss  // %got(body_ok)
>> 00000754  00000406 R_MIPS_LO16       00000000   .bss  // %lo(body_ok)
>>
>>
>> The attached patch makes the following changes to make direct object
>> emitter write out relocations in the correct order:
>>
>> 1. Add a target hook MCELFObjectTargetWriter::ReorderRelocs. The
>> default behavior sorts the relocations by the r_offset.
>> 2. Move struct ELFRelocationEntry from ELFObjectWriter to
>> MCELFObjectTargetWriter and add member fixup to it. The overridden
>> version of ReorderRelocs (MipsELFObjectWriter::ReorderRelocs) needs
>> access to ELFRelocationEntry::Type and MCFixup::Value to reorder the
>> relocations.
>>
>> Do you think these changes are acceptable?
>>
>> On Wed, Mar 21, 2012 at 3:50 PM, Jim Grosbach wrote:
>>> Hi Akira,
>>>
>>> If I follow correctly, the relocation entries can thus be in a different order than the instructions that they're for? That seems a bit odd, but I suppose there's nothing inherently wrong with that. It's just not something, AFAIK, that llvm has had to deal with before. This should definitely be a target-specific thing, not a general ELFObjectWriter thing, as other targets may have entirely different needs. Offhand, it seems reasonable to have a post-processing pass over the relocation list before it's written out to the file. The target can manipulate the list in whatever manner it needs to. A hook on MCELFObjectTargetWriter should do the trick.
>>>
>>> -Jim
>>>
>>>
>>> On Mar 19, 2012, at 1:39 PM, Akira Hatanaka wrote:
>>>
>>>> What would be the best way to sort relocation entries before they are
>>>> written out in ELFObjectWriter::WriteRelocationsFragment?
>>>>
>>>> According to the Mips ABI documents I have, there are certain
>>>> restrictions on the order relocations appear in the table (e.g.
>>>> R_MIPS_HI16 and R_MIPS_GOT16 must be followed immediately by a
>>>> R_MIPS_LO16). When I enable post RA scheduling, some of the
>>>> restrictions are violated in the generated object code, which results
>>>> in incorrect relocation values generated by the linker.
>>>>
>>>> I am considering imitating what gas does in function mips_frob_file
>>>> (line 15522 of tc-mips.c) to fix this problem:
>>>>
>>>> http://repo.or.cz/w/binutils.git/blob/master:/gas/config/tc-mips.c
>>>>
>>>> Are there any other targets that have similar restrictions or requirements?
>>>
>

From adonig at gmx.de Thu Mar 22 18:54:25 2012
From: adonig at gmx.de (Andreas Donig)
Date: Fri, 23 Mar 2012 00:54:25 +0100
Subject: [LLVMdev] SPEC CPU2006 bitcode files
Message-ID: <7E6959AC-F1DF-4873-91EE-C763D5666EAE@gmx.de>

Hello,

I'm trying to generate bitcode files from the SPEC CPU2006 benchmark suites. First I installed the benchmarks into ~/llvm/projects/test-suite/External/speccpu2006 and then I tried

~/mysandbox/bin/lnt runtest nt --sandbox=sandbox --cc=/Users/asd/llvm/Release/bin/clang --test-suite=/Users/asd/llvm/projects/test-suite --test-externals=/Users/asd/llvm/projects/test-suite/External -j 4 --only-test=External/SPEC --enable-jit

It looks like all but two tests ran fine, but the Output directories don't contain bitcode files; they contain only object files. Is there a way to generate bitcode files?

Regards
Andreas

From wendling at apple.com Thu Mar 22 19:29:51 2012
From: wendling at apple.com (Bill Wendling)
Date: Thu, 22 Mar 2012 17:29:51 -0700
Subject: [LLVMdev] Catching C++ exceptions, cleaning up, rethrowing
In-Reply-To:
References: <978903B6-BDA6-47CD-9E7B-B8214DEDE339@lucasmail.org> <46631B9D-2E98-4DB4-80F0-1F484D6A9060@apple.com>
Message-ID:

On Mar 22, 2012, at 11:40 AM, Paul J. Lucas wrote:

> On Mar 22, 2012, at 12:28 AM, Bill Wendling wrote:
>
>> On Mar 20, 2012, at 7:38 PM, Paul J. Lucas wrote:
>>
>>> I've read the docs on LLVM exceptions, but I don't see any examples. A little help?
>>
>> I don't think this has anything to do with LLVM's IR-level exception system. It sounds to me like you just need a way to handle C++ exceptions inside of the C++ code and then rethrow so that the JIT's caller can do its thing. (Right?)
>
> Right. The call sequence is:
>
> my_lib(1) -> JIT_code -> C_thunk -> my_lib(2)
>
> The JIT code creates Functions that create C++ objects on their stacks (by using alloca instructions then calling a C thunk that calls the C++ object's constructor via placement new). If an exception is thrown in my_lib(2), then somewhere between there and when the stack unwinds to my_lib(1), the C++ objects that were created on the stack must have their destructors called (also via C thunks). Hence, some code somewhere between my_lib(1) and C_thunk has to catch all exceptions, call the destructors, and rethrow the exceptions.
>
>> You could move the C++ code into a C++ function that catches all exceptions. The C functions you provide would call the small bit of C++ code that would then execute the "real" functionality. You would have to wrap/unwrap the variables, of course. (There are examples of wrapping/unwrapping of variables in LLVM's source tree.) That way you will get to use C++'s exception handling system instead of creating your own, which is a huge massive undertaking full of pitfalls. When you rethrow the exception, it will propagate past the C function to the code calling the JIT'ed code.
>
> Unfortunately, I'm not following. How is having the code that catches all exceptions in a separate function different from what I proposed (putting the try/catch in the thunks)? (Ideally, I want to minimize layers of function calls.) Again for reference:
>

No reason. But if you have the 'try{}catch(...){}', then it should run the d'tors for you. There's no reason for you to have a "run_dtors" function there.
-bw

> extern "C" bool thunk_iterator_M_next( void *v_that, void *v_result,
>                                        dtor_pair *dtors ) {
>   try {
>     item_iterator *const that = static_cast<item_iterator*>( v_that );
>     item *const result = static_cast<item*>( v_result );
>     return that->next( result );
>   }
>   catch ( ... ) {
>     run_dtors( dtors );
>     throw;
>   }
> }
>
> - Paul
>

From traf at kth.se Thu Mar 22 19:38:28 2012
From: traf at kth.se (Thibault Raffaillac)
Date: Fri, 23 Mar 2012 00:38:28 +0000
Subject: [LLVMdev] GSoC on LLVM usability?
Message-ID: <47466B24D5352C4AA77737202B165839227F52CF@EXDB1.ug.kth.se>

Hello all,

My name is Thibault Raffaillac, degree student at KTH, Stockholm, Sweden (in double-degree partnership with Ecole Centrale Marseille, France). I am currently carrying out my degree thesis on the usability of compilers, and as such would be very much interested in contributing to LLVM through a Google Summer of Code.

The task I would like to propose is implementing the work from my thesis: user-friendly feedback on the optimizations performed (http://www.csc.kth.se/~traf/traf-sketch.pdf). If this can be of interest (or is unclear), please tell me so that I can develop the project application further.

Best regards,
Thibault (http://www.csc.kth.se/~traf/)

From chenwj at iis.sinica.edu.tw Thu Mar 22 20:49:03 2012
From: chenwj at iis.sinica.edu.tw (陳韋任)
Date: Fri, 23 Mar 2012 09:49:03 +0800
Subject: [LLVMdev] Mailing list archives
In-Reply-To:
References:
Message-ID: <20120323014903.GA67430@cs.nctu.edu.tw>

On Wed, Mar 21, 2012 at 12:54:50PM -0700, Chris Lattner wrote:
> FYI, the mailing list archives (e.g. http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev) are currently down. The disk that held them had a failure, and the machine is being worked on.

Hope the data won't get lost. :)

--
Wei-Ren Chen (陳韋任)
Computer Systems Lab, Institute of Information Science,
Academia Sinica, Taiwan (R.O.C.)
Tel:886-2-2788-3799 #1667
Homepage: http://people.cs.nctu.edu.tw/~chenwj

From rkotler at mips.com Thu Mar 22 20:50:43 2012
From: rkotler at mips.com (reed kotler)
Date: Thu, 22 Mar 2012 18:50:43 -0700
Subject: [LLVMdev] apparent mistake in several ports register td file ???
In-Reply-To: <4F6A4F10.5020004@mips.com>
References: <4F6A4F10.5020004@mips.com>
Message-ID: <4F6BD6F3.2010507@mips.com>

At least for Mips, this line seems extraneous.

I removed it and all consequential uses of it (400 changes to MipsRegisterInfo.td), and make check for mips still works.

Am running our full test sequence now.

The Mips part of this was copied from the Sparc port.

Similar problems in other ports. It seems this has just been copied many times to new ports.

On 03/21/2012 02:58 PM, reed kotler wrote:
> The field Num seems to have no meaning. It is not recognized by the
> backend tools. It does not hurt anything but should not be there.
>
> // We have banks of 32 registers each.
> class MipsReg<string n> : Register<n> {
>   field bits<5> Num;
>   let Namespace = "Mips";
> }
>
> class ARMReg<bits<4> num, string n, list<Register> subregs = []> :
> Register<n> {
>   field bits<4> Num;
>   let Namespace = "ARM";
>   let SubRegs = subregs;
>   // All bits of ARM registers with sub-registers are covered by
> sub-registers.
>   let CoveredBySubRegs = 1;
> }
>
> class ARMFReg<bits<6> num, string n> : Register<n> {
>   field bits<6> Num;
>   let Namespace = "ARM";
> }
>
> class SparcReg<string n> : Register<n> {
>   field bits<5> Num;
>   let Namespace = "SP";
> }
>
>
>
>
> Then subsequently, further derived types copy the mistake.
>
> // Registers are identified with 5-bit ID numbers.
> // Ri - 32-bit integer registers
> class Ri<bits<5> num, string n> : SparcReg<n> {
>   let Num = num;
> }
> // Rf - 32-bit floating-point registers
> class Rf<bits<5> num, string n> : SparcReg<n> {
>   let Num = num;
> }
> // Rd - Slots in the FP register file for 64-bit floating-point values.
> class Rd<bits<5> num, string n, list<Register> subregs> : SparcReg<n> {
>   let Num = num;
>   let SubRegs = subregs;
>   let SubRegIndices = [sub_even, sub_odd];
>   let CoveredBySubRegs = 1;
> }
>
> ......
> // Mips CPU Registers
> class MipsGPRReg<bits<5> num, string n> : MipsReg<n> {
>   let Num = num;
> }
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

From chenwj at iis.sinica.edu.tw Thu Mar 22 21:15:47 2012
From: chenwj at iis.sinica.edu.tw (陳韋任)
Date: Fri, 23 Mar 2012 10:15:47 +0800
Subject: [LLVMdev] Execution Engine: CodeGenOpt level
In-Reply-To: <49368.132.206.3.112.1332430311.squirrel@mail.cs.mcgill.ca>
References: <49368.132.206.3.112.1332430311.squirrel@mail.cs.mcgill.ca>
Message-ID: <20120323021547.GA94394@cs.nctu.edu.tw>

> How can I dynamically change the code generation optimization level (e.g.,
> None) of a JIT in order to recompile a function with a new optimization
> level (e.g., Default)?

From the source code I'm reading, you might have to create another JIT with a different opt level.

Regards,
chenwj

--
Wei-Ren Chen (陳韋任)
Computer Systems Lab, Institute of Information Science,
Academia Sinica, Taiwan (R.O.C.)
Tel:886-2-2788-3799 #1667
Homepage: http://people.cs.nctu.edu.tw/~chenwj

From nlamee at cs.mcgill.ca Thu Mar 22 21:30:02 2012
From: nlamee at cs.mcgill.ca (nlamee at cs.mcgill.ca)
Date: Thu, 22 Mar 2012 22:30:02 -0400 (EDT)
Subject: [LLVMdev] Execution Engine: CodeGenOpt level
In-Reply-To: <20120323021547.GA94394@cs.nctu.edu.tw>
References: <49368.132.206.3.112.1332430311.squirrel@mail.cs.mcgill.ca> <20120323021547.GA94394@cs.nctu.edu.tw>
Message-ID: <57959.96.21.95.106.1332469802.squirrel@mail.cs.mcgill.ca>

Hi Chenwj,

Thank you for your response.

The problem with this approach is that global mappings have to be recreated in the new JIT. Can this be somehow avoided?

Best regards,
Nurudeen.
On Thu, March 22, 2012 10:15 pm, 陳韋任 wrote:
>> How can I dynamically change the code generation optimization level
>> (e.g.,
>> None) of a JIT in order to recompile a function with a new optimization
>> level (e.g., Default)?
>
> From the source code I'm reading, you might have to create another JIT
> with a different opt level.
>
> Regards,
> chenwj
>
> --
> Wei-Ren Chen (陳韋任)
> Computer Systems Lab, Institute of Information Science,
> Academia Sinica, Taiwan (R.O.C.)
> Tel:886-2-2788-3799 #1667
> Homepage: http://people.cs.nctu.edu.tw/~chenwj
>
>

From chenwj at iis.sinica.edu.tw Thu Mar 22 21:51:43 2012
From: chenwj at iis.sinica.edu.tw (陳韋任)
Date: Fri, 23 Mar 2012 10:51:43 +0800
Subject: [LLVMdev] Execution Engine: CodeGenOpt level
In-Reply-To: <57959.96.21.95.106.1332469802.squirrel@mail.cs.mcgill.ca>
References: <49368.132.206.3.112.1332430311.squirrel@mail.cs.mcgill.ca> <20120323021547.GA94394@cs.nctu.edu.tw> <57959.96.21.95.106.1332469802.squirrel@mail.cs.mcgill.ca>
Message-ID: <20120323025143.GA96570@cs.nctu.edu.tw>

> The problem with this approach is that global mappings have to be
> recreated in the new JIT. Can this be somehow avoided?

I am afraid not. Besides, when you create another ExecutionEngine, it'll take ownership of the module which contains the function you want to compile. You may have to create another module as well. But I am not an LLVM expert, so I will leave this for others to comment on.

Regards,
chenwj

--
Wei-Ren Chen (陳韋任)
Computer Systems Lab, Institute of Information Science,
Academia Sinica, Taiwan (R.O.C.)
Tel:886-2-2788-3799 #1667
Homepage: http://people.cs.nctu.edu.tw/~chenwj

From chenwj at iis.sinica.edu.tw Thu Mar 22 21:57:36 2012
From: chenwj at iis.sinica.edu.tw (陳韋任)
Date: Fri, 23 Mar 2012 10:57:36 +0800
Subject: [LLVMdev] SPEC CPU2006 bitcode files
In-Reply-To: <7E6959AC-F1DF-4873-91EE-C763D5666EAE@gmx.de>
References: <7E6959AC-F1DF-4873-91EE-C763D5666EAE@gmx.de>
Message-ID: <20120323025736.GB96570@cs.nctu.edu.tw>

> ~/mysandbox/bin/lnt runtest nt --sandbox=sandbox --cc=/Users/asd/llvm/Release/bin/clang --test-suite=/Users/asd/llvm/projects/test-suite --test-externals=/Users/asd/llvm/projects/test-suite/External -j 4 --only-test=External/SPEC --enable-jit

From your cmdline, I don't see any option to make clang output bitcode. If you want bitcode, try something like `clang -emit-llvm hello.c -c -o hello.bc`.

Regards,
chenwj

--
Wei-Ren Chen (陳韋任)
Computer Systems Lab, Institute of Information Science,
Academia Sinica, Taiwan (R.O.C.)
Tel:886-2-2788-3799 #1667
Homepage: http://people.cs.nctu.edu.tw/~chenwj

From Chuck.Caldarale at unisys.com Thu Mar 22 22:17:45 2012
From: Chuck.Caldarale at unisys.com (Caldarale, Charles R)
Date: Thu, 22 Mar 2012 22:17:45 -0500
Subject: [LLVMdev] Execution Engine: CodeGenOpt level
In-Reply-To: <49368.132.206.3.112.1332430311.squirrel@mail.cs.mcgill.ca>
References: <49368.132.206.3.112.1332430311.squirrel@mail.cs.mcgill.ca>
Message-ID: <99C8B2929B39C24493377AC7A121E21FB0144AC930@USEA-EXCH8.na.uis.unisys.com>

> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of nlamee at cs.mcgill.ca
> Subject: [LLVMdev] Execution Engine: CodeGenOpt level

> How can I dynamically change the code generation optimization level (e.g.,
> None) of a JIT in order to recompile a function with a new optimization
> level (e.g., Default)?

We set the optimization level with a PassManagerBuilder, which is initialized for each function compilation:

  PassManagerBuilder PMBuilder;
  ...
  PMBuilder.OptLevel = conf.value_of(CF_OPTLEVEL);
  PMBuilder.SizeLevel = conf.is_set(CF_OPTSIZE) ? 1 : 0;
  PMBuilder.DisableUnitAtATime = !conf.is_set(CF_OPTUNIT);
  PMBuilder.DisableUnrollLoops = !conf.is_set(CF_UNROLL);
  PMBuilder.DisableSimplifyLibCalls = !conf.is_set(CF_SIMPLIB);
  if (Opt != CodeGenOpt::None) {
    PMBuilder.Inliner =
        createFunctionInliningPass(Opt == CodeGenOpt::Aggressive ? 250 : 200);
  }

  // Function-level passes, run on the one function being compiled.
  pFPasses = new FunctionPassManager(pMod);
  pFPasses->add(new TargetData(*TD));
  PMBuilder.populateFunctionPassManager(*pFPasses);
  pFPasses->doInitialization();
  pFPasses->run(*pFun);
  pFPasses->doFinalization();
  delete pFPasses;

  // Module-level passes, run on the whole module.
  pMPasses = new PassManager();
  pMPasses->add(new TargetData(*TD));
  pMPasses->add(createCFGSimplificationPass());
  pMPasses->add(createBlockPlacementPass());
  PMBuilder.populateModulePassManager(*pMPasses);
  pMPasses->run(*pMod);
  delete pMPasses;

(There are likely some redundant and unnecessary steps in the above.)

 - Chuck

THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY MATERIAL and is thus for use only by the intended recipient. If you received this in error, please contact the sender and delete the e-mail and its attachments from all computers.

From chenwj at iis.sinica.edu.tw Thu Mar 22 22:36:27 2012
From: chenwj at iis.sinica.edu.tw (陳韋任)
Date: Fri, 23 Mar 2012 11:36:27 +0800
Subject: [LLVMdev] Execution Engine: CodeGenOpt level
In-Reply-To: <99C8B2929B39C24493377AC7A121E21FB0144AC930@USEA-EXCH8.na.uis.unisys.com>
References: <49368.132.206.3.112.1332430311.squirrel@mail.cs.mcgill.ca> <99C8B2929B39C24493377AC7A121E21FB0144AC930@USEA-EXCH8.na.uis.unisys.com>
Message-ID: <20120323033627.GA98586@cs.nctu.edu.tw>

Neat approach, I think. So you set the PassManager's opt level rather than the ExecutionEngine's?

Regards,
chenwj

--
Wei-Ren Chen (陳韋任)
Computer Systems Lab, Institute of Information Science,
Academia Sinica, Taiwan (R.O.C.)
Tel:886-2-2788-3799 #1667
Homepage: http://people.cs.nctu.edu.tw/~chenwj

From Chuck.Caldarale at unisys.com Thu Mar 22 22:50:52 2012
From: Chuck.Caldarale at unisys.com (Caldarale, Charles R)
Date: Thu, 22 Mar 2012 22:50:52 -0500
Subject: [LLVMdev] Execution Engine: CodeGenOpt level
In-Reply-To: <20120323033627.GA98586@cs.nctu.edu.tw>
References: <49368.132.206.3.112.1332430311.squirrel@mail.cs.mcgill.ca> <99C8B2929B39C24493377AC7A121E21FB0144AC930@USEA-EXCH8.na.uis.unisys.com> <20120323033627.GA98586@cs.nctu.edu.tw>
Message-ID: <99C8B2929B39C24493377AC7A121E21FB01450B2F4@USEA-EXCH8.na.uis.unisys.com>

> From: 陳韋任 [mailto:chenwj at iis.sinica.edu.tw]
> Subject: Re: [LLVMdev] Execution Engine: CodeGenOpt level

> So you set the PassManager's opt level rather than the ExecutionEngine's?

That's correct. I have no idea what difference doing one or the other (or both) makes.

 - Chuck

From adonig at gmx.de Thu Mar 22 23:04:34 2012
From: adonig at gmx.de (Andreas Donig)
Date: Fri, 23 Mar 2012 05:04:34 +0100
Subject: [LLVMdev] SPEC CPU2006 bitcode files
In-Reply-To: <20120323025736.GB96570@cs.nctu.edu.tw>
References: <7E6959AC-F1DF-4873-91EE-C763D5666EAE@gmx.de> <20120323025736.GB96570@cs.nctu.edu.tw>
Message-ID: <5AE3B5B1-329C-46F1-A788-0A234E1FDF5F@gmx.de>

Hi chenwj,

first let me thank you for your quick answer.

> From your cmdline, I don't see any option to make clang output bitcode. If you
> want bitcode, try something like `clang -emit-llvm hello.c -c -o hello.bc`.

I added --enable-jit because I had hoped this would make LNT run the JIT tests. I thought this process would cause the creation of bitcode files and then execute them using the LLVM interpreter.
I would love to simply run clang, but the SPEC CPU2006 benchmarks are quite complicated to build. Makefiles to build them already exist in the LLVM test-suite, and I had hoped I could use the test-suite to create the bitcode files. It looks like the SPEC makefiles contain rules to build and then run the benchmarks with the interpreter, but I could not figure out how to make this happen using the LNT tool. I tried to run the makefiles directly instead of using LNT, but it just caused a lot of terrible errors.

Regards
Andreas

From hfinkel at anl.gov Thu Mar 22 23:50:36 2012
From: hfinkel at anl.gov (Hal Finkel)
Date: Thu, 22 Mar 2012 23:50:36 -0500
Subject: [LLVMdev] Fixing VAARG on PPC64
Message-ID: <20120322235036.4adf4c93@sapling2>

The PowerPC backend on PPC64 for non-Darwin (SVR4 ABI) systems currently has a problem handling integer types smaller than 64 bits. This is because the ABI specifies that these types are zero-extended to 64 bits on the stack and the default logic provided in LegalizeDAG does not use that convention. Specifically, for these targets we have:

  setOperationAction(ISD::VAARG, MVT::Other, Expand);

I thought that I could solve this problem by:

  setOperationAction(ISD::VAARG, MVT::i1, Promote);
  AddPromotedToType (ISD::VAARG, MVT::i1, MVT::i64);
  setOperationAction(ISD::VAARG, MVT::i8, Promote);
  AddPromotedToType (ISD::VAARG, MVT::i8, MVT::i64);
  setOperationAction(ISD::VAARG, MVT::i16, Promote);
  AddPromotedToType (ISD::VAARG, MVT::i16, MVT::i64);
  setOperationAction(ISD::VAARG, MVT::i32, Promote);
  AddPromotedToType (ISD::VAARG, MVT::i32, MVT::i64);

but this does not seem to have any effect. I thought this would work because SDValue DAGTypeLegalizer::PromoteIntRes_VAARG seems to have the appropriate logic. Is this a bug, or am I misunderstanding how Promote works?
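
The convention described above can be illustrated outside of LLVM. The following is only a minimal sketch in C++11, assuming a big-endian argument area in which every integer argument occupies one 8-byte slot, as the message states. ArgArea, vaarg_promoted and vaarg_naive are hypothetical names made up for this illustration; none of them are LLVM APIs, and the "naive" reader is just one plausible model of an expansion that ignores the slot rule.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of a big-endian argument area in which every integer
// argument occupies one 8-byte slot (illustrative only, not an LLVM type).
struct ArgArea {
  std::vector<unsigned char> bytes;

  // Store a value zero-extended to 64 bits, most significant byte first.
  void push(std::uint64_t v) {
    for (int shift = 56; shift >= 0; shift -= 8)
      bytes.push_back(static_cast<unsigned char>(v >> shift));
  }
};

// Read the next argument as a full 64-bit slot, then truncate to T.
// This models "promote the VAARG to i64, then truncate the result".
template <typename T>
T vaarg_promoted(const ArgArea &area, std::size_t &offset) {
  std::uint64_t slot = 0;
  for (int i = 0; i < 8; ++i)
    slot = (slot << 8) | area.bytes[offset + i];
  offset += 8;                  // advance by the slot size, not sizeof(T)
  return static_cast<T>(slot);  // keep only the low-order bits
}

// Read sizeof(T) bytes at the current offset and advance by sizeof(T).
// This models a generic expansion that ignores the 8-byte slot rule.
template <typename T>
T vaarg_naive(const ArgArea &area, std::size_t &offset) {
  std::uint64_t v = 0;
  for (std::size_t i = 0; i < sizeof(T); ++i)
    v = (v << 8) | area.bytes[offset + i];
  offset += sizeof(T);
  return static_cast<T>(v);
}
```

With two 32-bit arguments pushed into such an area, the promoted reader returns both values and advances 8 bytes each time, while the naive reader first returns the zero padding of the first slot and only then the first value. That is the kind of mismatch the message describes.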
Thanks again,
Hal

--
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory

From wong.kwongyuan at gmail.com Fri Mar 23 01:28:53 2012
From: wong.kwongyuan at gmail.com (WANG.Jiong)
Date: Fri, 23 Mar 2012 14:28:53 +0800
Subject: [LLVMdev] tablegen question
In-Reply-To: <4F634665.9070401@mips.com>
References: <4F634665.9070401@mips.com>
Message-ID: <4F6C1825.1080106@gmail.com>

From my understanding, NAME is a special builtin entry dedicated to things related to multiclasses. So, is the following rewrite what you want?

class Base<int V> {
  int Value = V;
}

class Derived<string Truth> :
  Base<!if(!eq(Truth, "true"), 1, 0)>;

multiclass Derived_m<string Truth> {
  def #NAME# : Derived<Truth>;
}

defm TRUE : Derived_m<"true">;
defm FALSE : Derived_m<"false">;

tablegen result:

------------- Classes -----------------
class Base {
  int Value = Base:V;
  string NAME = ?;
}
class Derived { // Base
  int Value = !if(!eq(Derived:Truth, "true"), 1, 0);
  string NAME = ?;
}
------------- Defs -----------------
def FALSE { // Base Derived !strconcat(NAME, "")
  int Value = 0;
  string NAME = "FALSE";
}
def TRUE { // Base Derived !strconcat(NAME, "")
  int Value = 1;
  string NAME = "TRUE";
}

---
Regards,
WANG.Jiong

On 03/16/2012 09:55 PM, Reed Kotler wrote:
> Trying to resolve some general tablegen questions.
>
> Consider the test case for Tablegen called eq.td
>
> class Base<int V> {
>   int Value = V;
> }
>
> class Derived<string Truth> :
>   Base<!if(!eq(Truth, "true"), 1, 0)>;
>
> def TRUE : Derived<"true">;
> def FALSE : Derived<"false">;
>
> If I process this through tablegen I get:
>
> ------------- Classes -----------------
> class Base {
>   int Value = Base:V;
>   string NAME = ?;
> }
> class Derived { // Base
>   int Value = !if(!eq(Derived:Truth, "true"), 1, 0);
>   string NAME = ?;
> }
> ------------- Defs -----------------
> def FALSE { // Base Derived
>   int Value = 0;
>   string NAME = ?;
> }
> def TRUE { // Base Derived
>   int Value = 1;
>   string NAME = ?;
> }
>
> Why is NAME=? in FALSE and TRUE.
>
> Shouldn't it be FALSE and TRUE ??
>
>

From james.molloy at arm.com Fri Mar 23 02:50:04 2012
From: james.molloy at arm.com (James Molloy)
Date: Fri, 23 Mar 2012 07:50:04 -0000
Subject: [LLVMdev] FW: IntervalMap - maximum alignment requirements
Message-ID: <00ab01cd08c9$8f6a9cf0$ae3fd6d0$@molloy@arm.com>

Hi,

Bumping this as it appears to have been caught up in the general melee of mailing list outageness.

From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of James Molloy
Sent: 21 March 2012 18:21
To: llvmdev at cs.uiuc.edu
Subject: [LLVMdev] IntervalMap - maximum alignment requirements

Hi,

I'm debugging a fault seen in RuntimeDyldELF on 32-bit machines, stemming from its use of IntervalMap.

The documentation at the top of IntervalMap.h states that it is useful for 4 or 8 byte types, and so RuntimeDyldELF correctly uses it. However, further down the file is this comment:

  // The root data is either a RootLeaf or a RootBranchData instance.
  // We can't put them in a union since C++03 doesn't allow non-trivial
  // constructors in unions.
  // Instead, we use a char array with pointer alignment. The alignment is
  // ensured by the allocator member in the class, but still verified in the
  // constructor. We don't support keys or values that are more aligned than a
  // pointer.

Emphasis mine.

So my question is, is IntervalMap supposed to support values greater than the native pointer width, such as uint64_t on a 32-bit system? Should the documentation be updated to reflect this requirement (and RuntimeDyldELF rewritten to follow it), or should we fix IntervalMap? (My current hack fix is to add __attribute__((aligned)) to the end of the 'data[]' member, but I'm sure that's not the best way.)

Cheers,

James
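
The alignment concern in the IntervalMap question can be shown with standard C++ alone. This is only a sketch under stated assumptions: it uses C++11 alignas and alignof, which the C++03 code quoted above could not use, and WideKey, PointerAlignedStorage, TypeAlignedStorage and fits are hypothetical names for illustration, not part of IntervalMap.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical key type that demands 8-byte alignment on every target.
struct alignas(8) WideKey {
  std::uint64_t value;
};

// Raw storage aligned only as strictly as a pointer, which is the scheme
// the quoted comment describes.
template <std::size_t Size>
struct PointerAlignedStorage {
  alignas(void *) unsigned char data[Size];
};

// Raw storage aligned for the stored type T as well as for a pointer.
// With several alignas specifiers, the strictest one applies.
template <typename T, std::size_t Size>
struct TypeAlignedStorage {
  alignas(T) alignas(void *) unsigned char data[Size];
};

// True when storage with the given alignment is aligned strictly enough
// to hold a T. Alignments are powers of two, so a comparison suffices.
template <typename T>
constexpr bool fits(std::size_t storageAlign) {
  return storageAlign >= alignof(T);
}
```

On a target where alignof(void *) is 4, fits<WideKey>(alignof(PointerAlignedStorage<64>)) would be false, while the TypeAlignedStorage variant satisfies the check on any target. Whether IntervalMap itself should take this route is the open question in the message.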

From chenwj at iis.sinica.edu.tw Fri Mar 23 03:15:41 2012
From: chenwj at iis.sinica.edu.tw (陳韋任)
Date: Fri, 23 Mar 2012 16:15:41 +0800
Subject: [LLVMdev] SPEC CPU2006 bitcode files
In-Reply-To: <5AE3B5B1-329C-46F1-A788-0A234E1FDF5F@gmx.de>
References: <7E6959AC-F1DF-4873-91EE-C763D5666EAE@gmx.de> <20120323025736.GB96570@cs.nctu.edu.tw> <5AE3B5B1-329C-46F1-A788-0A234E1FDF5F@gmx.de>
Message-ID: <20120323081541.GA12194@cs.nctu.edu.tw>

On Fri, Mar 23, 2012 at 05:04:34AM +0100, Andreas Donig wrote:
> Hi chenwj,
>
> first let me thank you for your quick answer.
>
> > From your cmdline, I don't see any option to make clang output bitcode. If you
> > want bitcode, try something like `clang -emit-llvm hello.c -c -o hello.bc`.
>
> I added --enable-jit because I had hoped this would make LNT run the JIT tests. I thought this process would cause the creation of bitcode files and then execute them using the LLVM interpreter.
> I would love to simply run clang, but the SPEC CPU2006 benchmarks are quite complicated to build. Makefiles to build them already exist in the LLVM test-suite, and I had hoped I could use the test-suite to create the bitcode files. It looks like the SPEC makefiles contain rules to build and then run the benc