[LLVMdev] GSoC 2012 Proposal: Automatic GPGPU code generation for llvm
yabin.hwu at gmail.com
Mon Apr 2 09:16:05 CDT 2012
I am a PhD student at Huazhong University of Sci&Tech, China. The
following is my GSoC 2012 proposal.
Comments are welcome!
*Title: Automatic GPGPU Code Generation for LLVM*
Manually developing a GPGPU application is often a time-consuming,
complex, error-prone and iterative process. In this project, I propose to
build an automatic GPGPU code generation framework for LLVM, based on two
successful LLVM (sub-)projects: Polly and the PTX backend. This can be very
useful to ease the burden of the long learning curve of various GPU
programming models.
As GPU computing becomes widespread, ordinary developers, and especially
domain experts who want to harness the computing power of GPUs, need an
easy, automatic tool for developing applications for GPUs or porting
existing ones. Polly already implements many transformations, such as
tiling, auto-vectorization and OpenMP code generation. With the help of
LLVM's PTX backend, I plan to extend Polly with GPGPU code generation.
In this project, we target parallel loops that can be described by
Polly's polyhedral model. We first translate the selected SCoPs (Static
Control Parts) into loop nests of depth 4 using Polly's schedule
optimization. Then we extract the loop body (or the inner non-parallel
loops) into an LLVM sub-function, tagged with the PTX_Kernel or PTX_Device
calling convention. After that, we use the PTX backend to translate the
sub-functions into a string of the corresponding PTX code. Finally, we
provide a runtime library to generate the executable program.
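
As a rough illustration of the first step, the sketch below rewrites a 2-D
parallel loop as a loop nest of depth 4, where the two outer loops would
correspond to the GPU grid and the two inner loops to the threads of a
block. This is only a model of the iteration-space mapping written in
Python; the function names and tile sizes are hypothetical and are not part
of Polly.

```python
from itertools import product


def tiled_iteration_space(n, m, tile_i, tile_j):
    """Enumerate an n x m parallel loop as a 4-deep nest.

    Yields ((block_i, block_j, thread_i, thread_j), (i, j)) pairs, where
    (i, j) is the original loop iteration handled by that thread.
    """
    blocks_i = (n + tile_i - 1) // tile_i  # grid size, rounded up
    blocks_j = (m + tile_j - 1) // tile_j
    for bi, bj, ti, tj in product(range(blocks_i), range(blocks_j),
                                  range(tile_i), range(tile_j)):
        i = bi * tile_i + ti
        j = bj * tile_j + tj
        if i < n and j < m:  # guard needed for partial tiles at the border
            yield (bi, bj, ti, tj), (i, j)


def run_kernel(kernel, n, m, tile_i=4, tile_j=8):
    """Sequentially emulate launching `kernel` once per original iteration."""
    for _, (i, j) in tiled_iteration_space(n, m, tile_i, tile_j):
        kernel(i, j)
```

Every original iteration (i, j) is visited exactly once, so the 4-deep nest
is a valid reordering as long as the original loops are parallel.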
There are three key challenges in this project.
1. How to get the optimal execution configuration for the GPU code.
The execution configuration is essential to the performance of the GPU
code. It is limited by many factors, including the hardware, the source
code, register usage, local store (device) usage, the original memory
access patterns and so on. We must take all of these factors into account.
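
To make the trade-off concrete, here is a minimal sketch of one possible
heuristic: among a few candidate block sizes, pick the one that keeps the
most threads resident on a multiprocessor given per-thread register and
local-store usage. The hardware limits in ILLUSTRATIVE_LIMITS are made-up
example numbers, and the function names are hypothetical; a real tuner would
also have to consider memory access patterns.

```python
# Illustrative per-multiprocessor limits; not taken from any specific GPU.
ILLUSTRATIVE_LIMITS = {
    "max_threads_per_block": 1024,
    "max_threads_per_sm": 1536,
    "max_blocks_per_sm": 8,
    "regs_per_sm": 32768,
    "smem_per_sm": 49152,  # bytes of local store
}


def resident_threads(block_size, regs_per_thread, smem_per_thread, limits):
    """Threads resident on one multiprocessor (0 if the block cannot launch)."""
    if block_size > limits["max_threads_per_block"]:
        return 0
    by_threads = limits["max_threads_per_sm"] // block_size
    by_regs = limits["regs_per_sm"] // (regs_per_thread * block_size)
    if smem_per_thread > 0:
        by_smem = limits["smem_per_sm"] // (smem_per_thread * block_size)
    else:
        by_smem = by_threads  # local store is not a constraint
    blocks = min(by_threads, by_regs, by_smem, limits["max_blocks_per_sm"])
    return blocks * block_size


def pick_block_size(regs_per_thread, smem_per_thread, limits,
                    candidates=(64, 128, 256, 512, 1024)):
    """Return the candidate with the most resident threads (smallest on ties)."""
    return max(candidates,
               key=lambda b: resident_threads(b, regs_per_thread,
                                              smem_per_thread, limits))
```

For example, with these limits a kernel using 20 registers per thread would
get 256 threads per block, while one using 40 registers per thread would get
128, because register pressure caps the number of resident blocks.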
2. How to automatically insert the synchronization code.
This is very important to preserve the original semantics. We must
correctly detect where synchronization needs to be inserted.
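
One conservative way to think about this is sketched below: walk the
statements of a kernel body in program order and place a barrier before any
statement that touches a shared buffer written since the last barrier, or
that writes a buffer read since the last barrier. The statement
representation and the name insert_barriers are assumptions made for this
example; a polyhedral implementation would use exact dependence information
and could avoid some of these barriers.

```python
def insert_barriers(statements):
    """Insert "barrier" markers between statements with shared-buffer hazards.

    statements: list of (name, reads, writes) in program order, where reads
    and writes are iterables of shared buffer names. Any two accesses to the
    same buffer are assumed to possibly come from different threads.
    """
    out = []
    pending_reads, pending_writes = set(), set()
    for name, reads, writes in statements:
        reads, writes = set(reads), set(writes)
        # Read-after-write or write-after-write on a pending write,
        # or write-after-read on a pending read.
        hazard = ((reads | writes) & pending_writes) or (writes & pending_reads)
        if hazard:
            out.append("barrier")
            pending_reads, pending_writes = set(), set()
        out.append(name)
        pending_reads |= reads
        pending_writes |= writes
    return out


body = [
    ("load_tile", (), ("tile",)),   # threads cooperatively fill a tile
    ("compute", ("tile",), ()),     # each thread reads the whole tile
    ("load_next", (), ("tile",)),   # the tile is overwritten
]
schedule = insert_barriers(body)
```

Here a barrier is needed both before "compute" (the tile must be complete)
and before "load_next" (all reads of the old tile must have finished).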
3. How to automatically generate the memory copy operations between host and
device. We must transport the input data to the GPU and copy the
results back. Fortunately, Polly has implemented a very expressive way to
describe memory access.
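
As a simplified stand-in for that access description, the sketch below
computes, for 1-D affine accesses over rectangular loop bounds, the interval
of array elements that must be copied and in which direction. It
over-approximates the footprint with a bounding interval, which is an
assumption of this example and not how Polly represents accesses; the names
access_range and copy_plan are hypothetical.

```python
def access_range(coeffs, const, bounds):
    """Min and max index touched by the access sum(c_k * i_k) + const.

    bounds[k] is the inclusive (lower, upper) range of loop variable i_k.
    """
    lo = hi = const
    for c, (lb, ub) in zip(coeffs, bounds):
        lo += min(c * lb, c * ub)
        hi += max(c * lb, c * ub)
    return lo, hi


def copy_plan(accesses, elem_size):
    """Derive per-array copy ranges from a list of affine accesses.

    accesses: list of (array, kind, coeffs, const, bounds) with kind being
    "read" or "write". Arrays that are read are copied to the device;
    arrays that are written are copied back to the host.
    """
    plan = {}
    for array, kind, coeffs, const, bounds in accesses:
        lo, hi = access_range(coeffs, const, bounds)
        entry = plan.setdefault(
            array, {"lo": lo, "hi": hi, "to_device": False, "to_host": False})
        entry["lo"] = min(entry["lo"], lo)
        entry["hi"] = max(entry["hi"], hi)
        entry["to_device"] |= kind == "read"
        entry["to_host"] |= kind == "write"
    for entry in plan.values():
        entry["bytes"] = (entry["hi"] - entry["lo"] + 1) * elem_size
    return plan


# C[i] = A[i + j] * B[2*j + 1]  for 0 <= i <= 7, 0 <= j <= 3, 4-byte elements
bounds = [(0, 7), (0, 3)]
plan = copy_plan([
    ("A", "read", (1, 1), 0, bounds),
    ("B", "read", (0, 2), 1, bounds),
    ("C", "write", (1, 0), 0, bounds),
], elem_size=4)
```

In this example A[0..10] and B[1..7] would be copied to the device before
the kernel runs, and C[0..7] would be copied back afterwards.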
May 21 ~ June 3 preliminary code generation for 1-D and 2-D parallel loops.
June 4 ~ June 11 code generation for parallel loops with non-parallel inner loops.
June 11 ~ June 24 automatic memory copy insertions.
June 25 ~ July 8 auto-tuning for GPU execution configuration.
July 9 ~ July 15 Midterm evaluation and writing documents.
July 16 ~ July 22 automatic synchronization insertion.
July 23 ~ August 3 testing on the PolyBench benchmarks.
August 4 ~ August 12 summarize and complete the final documents.
I have participated in several projects related to binary translation
(optimization) and run-time systems. I also implemented a frontend for
numerical computing languages like Octave/MATLAB, following the style of
Clang. Recently, I have been working closely with the Polly team,
contributing patches and investigating many details of polyhedral
transformation.
1. Tobias Grosser, Ragesh A. *Polly - First Successful Optimizations - How
to proceed?* LLVM Developer Meeting 2011.
2. Muthu Manikandan Baskaran, J. Ramanujam and P. Sadayappan. *Automatic
C-to-CUDA Code Generation for Affine Programs*. CC 2010.
3. Soufiane Baghdadi, Armin Größlinger, and Albert Cohen. *Putting
Automatic Polyhedral Compilation for GPGPU to Work*. In Proc. of Compilers
for Parallel Computers (CPC), 2010.