[llvm-commits] [PATCH] Allow SelectionDAGBuilder to reorder loads past stores
atrick at apple.com
Mon Apr 2 19:19:59 CDT 2012
On Mar 28, 2012, at 10:10 PM, Hal Finkel <hfinkel at anl.gov> wrote:
> Andrew, et al.,
> I've attached a rebased patch. This is still a "for discussion purposes
> only" change (although I do use it internally), but I would very much
> appreciate it being reviewed for conceptual errors. All feedback, of
> course, is welcome.
I think your changes make sense, and if they work for you that's great. There's more to worry about when you want to push them upstream. Lets talk about memory alias support and list-ilp heuristics separately even though there is a performance relationship.
1) memory alias support
Conceptually your approach seems like it could work. When I tried your initial patch I saw some failures and didn't investigate further. It may not be a serious issue, but either way I'm not anxious to add complexity to the SelectionDAG. It's hard for me to say whether some code that I'm unfamiliar with makes assumptions about the SD structure. Long term, we want to simplify SelectionDAG scheduling. It should only need to serialize the DAG in order to respect physical register dependencies. Aggressive load/store reordering will be performed (eventually) in the MachineScheduler if needed.
We need alias analysis support in the MachineScheduler's DAG builder. In fact, it's so important that I think we should do it right. Although your patch is clearly an improvement over nothing, we want something more complete in MachineScheduler. It seems that your conservative approach can get lucky and do the right thing in narrow cases, but a single stray store could serialize all memory at the wrong point. Minimally, I think we need to track the alias sets within the current DAG, where each alias set is a set of AA::Locations occurring in the scheduling region that may alias. "unknown" locations get a special set. If we partition memory this way, and limit the memory partition into a small finite list of set, then we can treat each alias set as a separate root and efficiently build the DAG without forcing arbitrary serialization. There may be more clever ways to do this. The approach I described is a conceptual baseline that we need to surpass.
2) x86_64 list_ilp heuristics
Your improvements make at least as much sense as the existing heuristics, but you're building on an unstable set of hacks to begin with. Those hacks were tuned to handle certain benchmarks, so we have to carefully measure and reevaluate any changes in this area. It's time consuming.
It probably doesn't make sense to add these if we don't have AA support, so let's revisit the problem that you're solving when the MachineScheduler reaches that point.
Thanks, and good work!
> Thanks again,
> On Mon, 16 Jan 2012 18:18:43 -0600
> Hal Finkel <hfinkel at anl.gov> wrote:
>> I've attached an updated patch. It includes a new ILP scheduling
>> heuristic that I found useful (on some x86_64 machines), and some
>> other improvements over the previous version.
>> Regarding the partial-aliasing problem you had pointed out, I don't
>> think that can be an problem here. The reason is that the patch
>> maintains the following invariant: all currently pending loads and
>> stores are commutable (exchangeable). There is still a critical chain,
>> but instead of consisting of all stores, the chain now consists of all
>> loads or stores that are not commutable with some previous load or
>> store. As a result, I do not think that it is possible to queue a load
>> or store that would cause an "aliasing cycle" such as in your example.
>> Please let me know what you think.
>> Thanks again,
>> On Wed, 2012-01-11 at 22:16 -0600, Hal Finkel wrote:
>>> On Wed, 2012-01-11 at 18:45 -0800, Andrew Trick wrote:
>>>> On Dec 19, 2011, at 9:42 AM, Hal Finkel wrote:
>>>>> The current SelectionDAGBuilder does not allow loads to be
>>>>> reordered past stores, and does not allow stores to be
>>>>> reordered. This is a side effect of the way the critical chain
>>>>> is constructed: there is a queue of pending loads that is
>>>>> flushed (in parallel) to the root of the chain upon
>>>>> encountering any store (and that store is also appended to the
>>>>> root of the chain). Among other things, loop unrolling is far
>>>>> less effective than it otherwise could be.
>>>>> The attached patch allows SelectionDAGBuilder to use the
>>>>> available alias analysis to reorder independent loads and
>>>>> stores. It changes the queue of pending loads into a more
>>>>> general queue of pending memory operations, and flushes, in
>>>>> parallel, all potentially-conflicting loads and stores as
>>>> I haven't tried out your patch, so correct me if I'm wrong. I
>>>> don't understand how this can work with a simple alias analysis
>>>> interface like we have in LLVM.
>>>> ld A
>>>> st B
>>>> st C
>>>> mayAlias(A,B) = true
>>>> mayAlias(A,C) = true
>>>> mayAlias(B,C) = false
>>> This is a good point, your situation is something like this:
>>> | A |
>>> | B | C |
>>> and without strict aliasing rules, this could certainly be an issue.
>>>> So we have the chain Root->A->B and Root->C which may result in
>>>> the schedule:
>>>> st C
>>>> ld A
>>>> st B
>>>> It seems that we have to order all stores if we want them to
>>>> "cover" preceding loads. What am I missing?
>>> In the patch, aliasing loads and stores still set the current root
>>> to their combined token factor. As a result, the reordering will
>>> never be as severe as in your example; the patch is safe in this
>>> particular case because the DAG root will be set to A once B is
>>> visited and so C will also be forced to come after A. I'll need to
>>> give your point further thought, however, because maybe this kind
>>> of thing can bite in some other way.
>>> We may need to build an aliasing set per chain in order to check for
>>> this kind of thing.
>>> Thanks again,
> Hal Finkel
> Postdoctoral Appointee
> Leadership Computing Facility
> Argonne National Laboratory
More information about the llvm-commits