[LLVMdev] Stack roots and function parameters
jonathandeanharrop at googlemail.com
Sat Sep 25 05:45:42 CDT 2010
Forgive my top post but I hate Windows. J
I am surprised you (Talin) say that "we know conservative collectors work"
because my experience has very much been of them not working. Indeed, if you
have 400Mb of allocated heap blocks on a 32-bit machine is there not a 10%
chance of *each* random 32-bit int "pointing" into your heap, i.e. a false
positive? I just did a simple test here (create an array with 1M random
32-bit ints and then allocate blocks n=1..2^16 of size n bytes) and Boehm's
conservative GC sometimes takes 13s and other times runs out of memory after
several minutes. It even complains about the "Repeated allocation of very
large block (appr. size 65536)". Is 64kb really "very large" today? I think
Performance is certainly an important reason to implement an accurate
collector though. However, the performance of a conservative collector is
surely dominated by the cost of searching to see if each int "points" to
within any allocated block, a completely unnecessary operation that accurate
collectors simply do not perform. That overhead will dwarf any doubling up
of roots on the stack but the main problem with conservative collectors, as
I say, is not that they are slow but that they leak.
I agree that eliminating redundant roots is a desirable goal but I did only
the most basic things for HLVM and was very happy with the results. So I
think it is premature to value such optimizations. I personally would much
rather see a mature VM built on top of LLVM.
Here's another way to think about the issue: Compare this whole approach to
stack roots to that of a conservative collector. Since we know conservative
collectors work, the whole reason for making an accurate collector in the
first place is efficiency, right? If we couldn't make an accurate collector
that was more efficient than a conservative collector, then no one would
bother because it's such a pain.
However, looking strictly at stack frames, a conservative collector has a
much more compact stack layout. There's no need to copy anything to local
variables, because by the time you call into the collector proper,
everything - every register variable, every function parameter, and so on -
is already on the stack *somewhere*.
The stack frame layout dictated by llvm.gcroot, by contrast, is much, much
fatter. In a typical deeply-nested call stack, there will be many copies of
the same value on the stack. Let's take the 'this' pointer as an example.
Say we have some method that calls itself recursively, and we are currently
10 levels deep. That means that we have 20 copies of the 'this' pointer on
the stack: Because each invocation of the method has to push the 'this'
pointer on the stack as part of the calling protocol (even if it's passed in
registers, it still has to save the prior value of the register somewhere),
and in addition we also have to save the 'this' pointer as a stack root. A
conservative collector, on the other hand, would be able to get by with only
10 copies of the 'this' pointer.
As far as the issue of eliminating redundant stores, I'm pretty sure that
the optimizers cannot eliminate redundant roots. Figuring out the minimal
set of temporary roots needed for a function call is a non-trivial problem
(you don't want to just blindly make every pointer argument to a function a
temporary root, since that just bloats the size of the call frame
My entire goal here is to get a GC that has high performance and efficiency.
I want my GC to be able to run on everything from a data center to a beagle
board, and use the minimum resources possible.
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the LLVMdev