From gaeke at cs.uiuc.edu Mon Nov 24 11:04:01 2003 From: gaeke at cs.uiuc.edu (Brian Gaeke) Date: Mon Nov 24 11:04:01 2003 Subject: [llvm-commits] CVS: llvm/docs/Stacker.html Message-ID: <200311241703.LAA09428@zion.cs.uiuc.edu> Changes in directory llvm/docs: Stacker.html updated: 1.1 -> 1.2 --- Log message: Apply doc patch from PR136. --- Diffs of the changes: (+348 -60) Index: llvm/docs/Stacker.html diff -u llvm/docs/Stacker.html:1.1 llvm/docs/Stacker.html:1.2 --- llvm/docs/Stacker.html:1.1 Sun Nov 23 20:52:51 2003 +++ llvm/docs/Stacker.html Mon Nov 24 11:03:38 2003 @@ -6,9 +6,21 @@
Written by Reid Spencer
Exercise for the reader: how could you make this a one line program?
Stacker was written for two purposes: (a) to get the author over the learning curve and (b) to provide a simple example of how to write a compiler using LLVM. During the development of Stacker, many lessons about LLVM were learned. Those lessons are described in the following subsections.
Although I knew that LLVM used a Static Single Assignment (SSA) format, +it wasn't obvious to me how prevalent this idea was in LLVM until I really +started using it. Reading the Programmer's Manual and Language Reference, I +noted that most of the important LLVM IR (Intermediate Representation) C++ +classes were derived from the Value class. The full power of that simple +design only became clear once I started constructing executable +expressions for Stacker.
+This really makes your programming go faster. Think about compiling code +for the following C/C++ expression: (a|b)*((x+1)/(y+1)). You could write a +function using LLVM that does exactly that, this way:
+
+Value*
+expression(BasicBlock*bb, Value* a, Value* b, Value* x, Value* y )
+{
+ Instruction* tail = bb->getTerminator();
+ ConstantSInt* one = ConstantSInt::get( Type::IntTy, 1);
+ // BinaryOperator::create is a static factory, so it is called without "new".
+ BinaryOperator* or1 =
+ BinaryOperator::create( Instruction::Or, a, b, "", tail );
+ BinaryOperator* add1 =
+ BinaryOperator::create( Instruction::Add, x, one, "", tail );
+ BinaryOperator* add2 =
+ BinaryOperator::create( Instruction::Add, y, one, "", tail );
+ BinaryOperator* div1 =
+ BinaryOperator::create( Instruction::Div, add1, add2, "", tail );
+ BinaryOperator* mult1 =
+ BinaryOperator::create( Instruction::Mul, or1, div1, "", tail );
+
+ return mult1;
+}
+
+"Okay, big deal," you say. It is a big deal. Here's why. Note that I didn't +have to tell this function which kinds of Values are being passed in. They could be +instructions, Constants, Global Variables, etc. Furthermore, if you specify Values +that are incorrect for this sequence of operations, LLVM will either notice right +away (at compilation time) or the LLVM Verifier will pick up the inconsistency +when the compiler runs. In no case will you make a type error that gets passed +through to the generated program. This really helps you write a compiler +that always generates correct code!
+
The second point is that we don't have to worry about branching, registers, +stack variables, saving partial results, etc. The instructions we create +are the values we use. Note that all that was created in the above +code is a Constant value and five operators. Each of the instructions is +the resulting value of that instruction.
+The lesson is this: SSA form is very powerful: there is no difference + between a value and the instruction that created it. This is fully +enforced by the LLVM IR. Use it to your best advantage.
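The point that an instruction is its own result can be tried out without LLVM at all. The following is a minimal sketch in plain C++, not LLVM code: `ToyValue`, `ToyConstant`, `ToyBinOp` and `ToyBuilder` are hypothetical stand-ins invented for illustration, and the sketch evaluates the expression instead of generating code for it.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Toy stand-ins for illustration only; these are NOT the LLVM classes.
struct ToyValue {
  virtual ~ToyValue() {}
  virtual long eval() const = 0;  // the number this value stands for
};

struct ToyConstant : ToyValue {
  long v;
  explicit ToyConstant(long value) : v(value) {}
  long eval() const override { return v; }
};

// A binary instruction is itself a ToyValue, so it can serve as an operand.
struct ToyBinOp : ToyValue {
  char op;  // one of '|', '+', '/', '*'
  const ToyValue* lhs;
  const ToyValue* rhs;
  ToyBinOp(char o, const ToyValue* l, const ToyValue* r)
      : op(o), lhs(l), rhs(r) {
    assert(l != nullptr && r != nullptr);
    assert(o == '|' || o == '+' || o == '/' || o == '*');
  }
  long eval() const override {
    const long a = lhs->eval();
    const long b = rhs->eval();
    switch (op) {
      case '|': return a | b;
      case '+': return a + b;
      case '/': return a / b;  // caller must avoid a zero divisor
      default:  return a * b;  // '*'
    }
  }
};

// Owns every node so the raw operand pointers stay valid.
struct ToyBuilder {
  std::vector<std::unique_ptr<ToyValue>> nodes;
  const ToyValue* constant(long v) {
    nodes.push_back(std::unique_ptr<ToyValue>(new ToyConstant(v)));
    return nodes.back().get();
  }
  const ToyValue* binop(char op, const ToyValue* l, const ToyValue* r) {
    nodes.push_back(std::unique_ptr<ToyValue>(new ToyBinOp(op, l, r)));
    return nodes.back().get();
  }
};

// Builds (a|b)*((x+1)/(y+1)). The operands may be constants or the results
// of earlier instructions; the function cannot tell and does not care.
const ToyValue* expression(ToyBuilder& tb, const ToyValue* a, const ToyValue* b,
                           const ToyValue* x, const ToyValue* y) {
  const ToyValue* one = tb.constant(1);
  const ToyValue* or1 = tb.binop('|', a, b);
  const ToyValue* add1 = tb.binop('+', x, one);
  const ToyValue* add2 = tb.binop('+', y, one);
  const ToyValue* div1 = tb.binop('/', add1, add2);
  return tb.binop('*', or1, div1);
}
```

Because `expression` returns a `ToyValue`, its result can be fed straight back in as the `a` operand of a second call, which is the same property the LLVM version gets from deriving instructions from Value.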
+I had to learn about terminating blocks the hard way: using the debugger +to figure out what the LLVM verifier was trying to tell me and begging for +help on the LLVMdev mailing list. I hope you avoid this experience.
+Emblazon this rule in your mind:
+BasicBlocks in your compiler must be
+ terminated with a terminating instruction (branch, return, etc.).
+ Terminating instructions are a semantic requirement of the LLVM IR. There +is no facility for implicitly chaining together blocks placed into a function +in the order they occur. Indeed, in the general case, blocks will not be +added to the function in the order of execution because of the recursive +way compilers are written.
+Furthermore, if you don't terminate your blocks, your compiler code will +compile just fine. You won't find out about the problem until you're running +the compiler and the module you just created fails on the LLVM Verifier.
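To make the rule concrete, here is a small sketch of the kind of check involved, written in plain C++ under simplifying assumptions. It is not the LLVM Verifier: `OpcodeBlock`, `isTerminator` and `unterminatedBlocks` are hypothetical names, a block is modelled as a list of opcode strings, and only "br" and "ret" are treated as terminators.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy model: a block is just a name plus a list of opcode names.
struct OpcodeBlock {
  std::string name;
  std::vector<std::string> opcodes;
};

// In this toy model only "br" and "ret" count as terminators.
bool isTerminator(const std::string& opcode) {
  return opcode == "br" || opcode == "ret";
}

// Returns the names of blocks that are empty or do not end in a terminator.
std::vector<std::string> unterminatedBlocks(const std::vector<OpcodeBlock>& fn) {
  std::vector<std::string> bad;
  for (const OpcodeBlock& block : fn) {
    assert(!block.name.empty());  // a block needs a name to be reported
    if (block.opcodes.empty() || !isTerminator(block.opcodes.back())) {
      bad.push_back(block.name);
    }
  }
  return bad;
}
```

Running a check like this at the end of each compiler function is one way to catch a missing terminator close to where it was forgotten, rather than when the whole module is verified.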
+After a little initial fumbling around, I quickly caught on to how blocks +should be constructed. The use of the standard template library really helps +simplify the interface. In general, here's what I learned: +
insert_before argument. At first, I thought this was a mistake
+ because clearly the normal mode of inserting instructions would be one at
+ a time after some other instruction, not before. However,
+ if you hold on to your terminating instruction (or use the handy dandy
+ getTerminator() method on a BasicBlock), it can
+ always be used as the insert_before argument to your instruction
+ constructors. This causes the instruction to automatically be inserted in
+ the RightPlace™, just before the terminating instruction. The
+ nice thing about this design is that you can pass blocks around and insert
+ new instructions into them without ever knowing what instructions came
+ before. This makes for some very clean compiler design.
+The foregoing is such an important principle that it's worth making an idiom:
+
+
+BasicBlock* bb = new BasicBlock();
+bb->getInstList().push_back( new BranchInst( ... ) );
+new Instruction(..., bb->getTerminator() );
+
+
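The terminate-first, insert-before-the-terminator idiom can be mimicked with a plain `std::list`. This is a sketch only: `ToyBlock` and `insertBeforeTerminator` are hypothetical names, and instructions are represented as strings rather than LLVM objects.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <string>

// Hypothetical toy block: an ordered list of instruction names whose last
// entry is always the terminator.
struct ToyBlock {
  std::list<std::string> insts;

  // Terminate the block at construction time, before anything else goes in.
  explicit ToyBlock(const std::string& terminator) {
    insts.push_back(terminator);
  }

  // Position of the terminator (always the last instruction).
  std::list<std::string>::iterator terminator() {
    assert(!insts.empty());
    return std::prev(insts.end());
  }

  // New instructions always go just before the terminator, so callers never
  // need to know what was inserted earlier.
  void insertBeforeTerminator(const std::string& inst) {
    insts.insert(terminator(), inst);
  }
};
```

Inserting "add" and then "mul" into a block built with a "br" terminator leaves the order add, mul, br: each new instruction lands after the earlier ones and before the terminator.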
+To make this clear, consider the typical if-then-else statement +(see StackerCompiler::handle_if() method). We can set this up +in a single function using LLVM in the following way:
+
+using namespace llvm;
+BasicBlock*
+MyCompiler::handle_if( BasicBlock* bb, SetCondInst* condition )
+{
+ // Create the blocks to contain code in the structure of if/then/else.
+ // ("else" is a C++ keyword, so the variables carry a _bb suffix.)
+ BasicBlock* then_bb = new BasicBlock();
+ BasicBlock* else_bb = new BasicBlock();
+ BasicBlock* exit_bb = new BasicBlock();
+
+ // Insert the branch instruction for the "if"
+ bb->getInstList().push_back( new BranchInst( then_bb, else_bb, condition ) );
+
+ // Set up the terminating instructions
+ then_bb->getInstList().push_back( new BranchInst( exit_bb ) );
+ else_bb->getInstList().push_back( new BranchInst( exit_bb ) );
+
+ // Fill in the then part .. details excised for brevity
+ this->fill_in( then_bb );
+
+ // Fill in the else part .. details excised for brevity
+ this->fill_in( else_bb );
+
+ // Return a block to the caller that can be filled in with the code
+ // that follows the if/then/else construct.
+ return exit_bb;
+}
+
+Presumably in the foregoing, the calls to the "fill_in" method would add
+the instructions for the "then" and "else" parts. They would use the third part
+of the idiom almost exclusively (inserting new instructions before the
+terminator). Furthermore, they could even recurse back to handle_if
+should they encounter another if/then/else statement and it will all "just work".
+
+
Note how cleanly this all works out, in particular the push_back methods on
+the BasicBlock's instruction list. These are lists of
+Instructions, which also happen to be Values. To create
+the "if" branch we merely instantiate a BranchInst that takes as
+arguments the blocks to branch to and the condition to branch on. The blocks
+act like branch labels! This new BranchInst terminates
+the BasicBlock provided as an argument. To give the caller a way
+to keep inserting after calling handle_if we create an "exit" block
+which is returned to the caller. Note that a branch to the "exit" block is used as the
+terminator for both the "then" and the "else" blocks. This guarantees that no
+matter what else "handle_if" or "fill_in" does, they end up at the "exit" block.
+
+One of the first things I noticed is the frequent use of the "push_back" +method on the various lists. This is so common that it is worth mentioning. +The "push_back" method inserts a value at the end of an STL sequence container +(list, vector, deque, etc.). The method might have also been named "insert_tail" or "append". +Although I've used STL quite frequently, my use of push_back wasn't very +high in other programs. In LLVM, you'll use it all the time. +
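For readers who have not leaned on push_back much, here is a short standard-library sketch of its behaviour. `appendInOrder` and `pushBackDemo` are illustrative names, not part of LLVM or the STL.

```cpp
#include <cassert>
#include <list>
#include <string>
#include <vector>

// Appends each name to the end of any container that offers push_back.
template <typename Container>
void appendInOrder(Container& out, const std::vector<std::string>& names) {
  for (const std::string& name : names) {
    out.push_back(name);  // always lands after everything already present
  }
}

// Usage: the same helper works for a list and for a vector.
void pushBackDemo() {
  std::list<std::string> l;
  l.push_back("first");
  appendInOrder(l, {"second", "third"});
  assert(l.front() == "first" && l.back() == "third");

  std::vector<std::string> v;
  appendInOrder(v, {"a", "b"});
  assert(v.size() == 2 && v[1] == "b");
}
```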
++It took a little getting used to and several rounds of postings to the LLVM +mailing list to wrap my head around the GetElementPtrInst instruction correctly. Even though I had +read the Language Reference and Programmer's Manual a couple of times each, I still +missed a few very key points: +
+This means that when you look up an element in the global variable (assuming +it's a struct or array), you must dereference the pointer first! For many +things, this leads to the idiom: +
+
+std::vector<Value*> index_vector;
+index_vector.push_back( ConstantSInt::get( Type::LongTy, 0 ) );
+// ... push other indices ...
+GetElementPtrInst* gep = new GetElementPtrInst( ptr, index_vector );
+
+For example, suppose we have a global variable whose type is [24 x int]. The +variable itself represents a pointer to that array. To subscript the +array, we need two indices, not just one. The first index (0) dereferences the +pointer. The second index subscripts the array. If you're a "C" programmer, this +will run against your grain because you'll naturally think of the global array +variable and the address of its first element as the same. That tripped me up +for a while until I realized that they really do differ .. by type. +Remember that LLVM is a strongly typed language itself. Absolutely everything +has a type. The "type" of the global variable is [24 x int]*. That is, it's +a pointer to an array of 24 ints. When you dereference that global variable with +a single index, you now have a "[24 x int]" type; the pointer is gone. Although +the pointer value of the dereferenced global and the address of the zero'th element +in the array will be the same, they differ in their type. The zero'th element has +type "int" while the pointer value has type "[24 x int]".
+Get this one aspect of LLVM right in your head and you'll save yourself +a lot of compiler writing headaches down the road.
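C++ itself makes the same distinction, which may help C programmers see it. The sketch below is a plain C++ analogy and not LLVM code: `table`, `ArrayPtr`, `ElemPtr` and `elementAddress` are illustrative names. It shows that a pointer to the whole array and a pointer to its first element hold the same address but have different types, and that reaching an element through the array pointer takes two steps.

```cpp
#include <cassert>
#include <type_traits>

// Plain C++ analogy, not LLVM code: a global array of 24 ints.
int table[24];

// Pointer to the whole array, comparable to LLVM's [24 x int]*.
using ArrayPtr = int (*)[24];
// Pointer to a single element, comparable to int*.
using ElemPtr = int*;

static_assert(std::is_same<decltype(&table), ArrayPtr>::value,
              "&table points at the whole array");
static_assert(std::is_same<decltype(&table[0]), ElemPtr>::value,
              "&table[0] points at a single int");
static_assert(!std::is_same<ArrayPtr, ElemPtr>::value,
              "same address, different types");

// Two-step access mirroring the two indices: first step through the pointer
// (index 0), then subscript the array (index i).
int* elementAddress(ArrayPtr p, int i) {
  assert(p != nullptr && i >= 0 && i < 24);
  return &p[0][i];
}
```

The `p[0][i]` in `elementAddress` plays the role of the two entries pushed onto the index vector: the leading 0 steps through the pointer and the second index picks the element.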
+To be completed.
Linkage types in LLVM can be a little confusing, especially if your compiler +writing mind has affixed very hard concepts to particular words like "weak", +"external", "global", "linkonce", etc. LLVM does not use the precise +definitions of, say, ELF or GCC, even though they share common terms. To be fair, +the concepts are related and similar but not precisely the same. This can lead +you to think you know what a linkage type represents when in fact it is slightly +different. I recommend you read the + Language Reference on this topic very +carefully.
+
Here are some handy tips that I discovered along the way:
++Constants in LLVM took a little getting used to until I discovered a few utility +functions in the LLVM IR that make things easier. Here's what I learned:
+The source code, test programs, and sample programs can all be found -under the LLVM "projects" directory. You will need to obtain the LLVM sources -to find it (either via anonymous CVS or a tarball. See the -Getting Started document).
-Under the "projects" directory there is a directory named "stacker". That -directory contains everything, as follows:
-The following fully documented program highlights many of features of both -the Stacker language and what is possible with LLVM. The program simply -prints out the prime numbers until it reaches +
The following fully documented program highlights many features of both +the Stacker language and what is possible with LLVM. The program has two modes +of operation. If you provide numeric arguments to the program, it checks to see +if those arguments are prime numbers and prints out the results. Without any +arguments, the program prints out any prime numbers it finds between 1 and one +million (there's a lot of them!). The source code comments below tell the +remainder of the story.
-
################################################################################
#
# Brute force prime number generator
@@ -964,24 +1203,73 @@
ENDIF
0 ( push return code )
;
-]]>
-
To be completed.
This section is under construction. +
In the meantime, you can always read the code! It has comments!
+The source code, test programs, and sample programs can all be found +under the LLVM "projects" directory. You will need to obtain the LLVM sources +to find them (either via anonymous CVS or a tarball; see the +Getting Started document).
+Under the "projects" directory there is a directory named "stacker". That +directory contains everything, as follows:
+See projects/Stacker/lib/compiler/Lexer.l
+See projects/Stacker/lib/compiler/StackerParser.y
+See projects/Stacker/lib/compiler/StackerCompiler.cpp
+See projects/Stacker/lib/runtime/stacker_rt.c
+See projects/Stacker/tools/stkrc/stkrc.cpp
+See projects/Stacker/test/*.st
+