[LLVMdev] GSoC Proposal: Table-Driven Decompilation

Charles Davis cdavis at mymail.mines.edu
Wed Apr 4 00:08:39 CDT 2012


Here's one of my proposals for GSoC 2012. What do you think?


Project Title: Table-Driven Decompilation

Over the years, the LLVM family has grown to include nearly every type of build tool in existence. One of the few missing is a decompiler. LLVM's TableGen tool could potentially accelerate development of such a tool; most backends already have the information needed to implement it. This project proposes implementing support for decompilation in LLVM using information gleaned from target description files. Such a decompiler could be used for analysis, optimization, and recompilation of machine code.

Since its humble beginnings in 2001, LLVM has grown from a simple compiler toolkit to an entire family of build tools. Currently, it includes an assembler, a disassembler, a JIT, a C compiler, a debugger, an archiver, various tools for analyzing object files, and even a linker. In fact, just about the only tool missing from this set (aside from various language compilers) is a decompiler--a tool to turn machine code back into LLVM IR. This project proposes adding such a tool.

Some of the information needed to produce such a tool is already present in the target description files, some of which contain the patterns used to transform LLVM IR (or, more accurately, a selection DAG) into machine code. Since a decompiler is largely a compiler working in reverse, it should be possible to use these patterns to transform machine code back into a selection DAG, and then to transform that, in turn, back into raw LLVM IR. To actually read machine code, the decompiler will use the MC disassembler to produce MCInst objects. These can be transformed back into CodeGen's MachineInstr representation, which can then be matched against the selection patterns in reverse.
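As a rough illustration of the reverse lookup described above, here is a minimal, self-contained sketch. It does not use any real LLVM API: ToyInst, ToyDagNode, ReversePattern, reverseTable() and liftInst() are all hypothetical names, and the table is written by hand here, whereas in the proposal it would be emitted by a TableGen backend from the existing patterns. The opcode strings are only examples.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; none of these are real LLVM types.
enum class DagOp { Add, Sub, Load };

struct ToyInst {                      // plays the role of a MachineInstr
  std::string Opcode;                 // e.g. "ADD32rr"
  std::vector<std::string> Operands;  // destination first, then sources
};

struct ToyDagNode {                   // plays the role of a selection DAG node
  DagOp Op;
  std::string Result;
  std::vector<std::string> Inputs;
};

// One row of the reverse table: which DAG operation an opcode came from,
// and how many source operands that operation takes.
struct ReversePattern {
  DagOp Op;
  unsigned NumInputs;
};

// Assumption: hand-written here; the proposal would have a TableGen
// backend generate this from the target description files.
const std::map<std::string, ReversePattern> &reverseTable() {
  static const std::map<std::string, ReversePattern> Table = {
      {"ADD32rr", {DagOp::Add, 2}},
      {"SUB32rr", {DagOp::Sub, 2}},
      {"MOV32rm", {DagOp::Load, 1}},
  };
  return Table;
}

// Look the opcode up in the table and rebuild the DAG node it was selected
// from. Returns false if no pattern matches the instruction.
bool liftInst(const ToyInst &MI, ToyDagNode &Out) {
  assert(!MI.Opcode.empty() && "instruction must have an opcode");
  auto It = reverseTable().find(MI.Opcode);
  if (It == reverseTable().end())
    return false;
  const ReversePattern &P = It->second;
  if (MI.Operands.size() != P.NumInputs + 1)
    return false;
  Out.Op = P.Op;
  Out.Result = MI.Operands.front();
  Out.Inputs.assign(MI.Operands.begin() + 1, MI.Operands.end());
  return true;
}
```

A real implementation would have to match multi-instruction patterns and handle operand classes, which this one-opcode-per-row sketch ignores.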

Some DAG->machine code transformations aren't controlled by TableGen patterns, but by custom transformations implemented as C++ code. For those transformations, custom C++ code performing the reverse transformation will be necessary.
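One way the custom reverse transformations could sit alongside the generated tables is sketched below. This is only an assumption about a possible design, not a description of existing LLVM code: ToyLifter, CustomLiftFn and liftToyLea are hypothetical names, and the "LEA32r" hook is a made-up example of an instruction whose selection is assumed to be done by C++ code rather than by a plain pattern.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration; none of these are real LLVM types.
struct ToyInst {
  std::string Opcode;
  std::vector<std::string> Operands;  // destination first, then sources
};

struct ToyDagNode {
  std::string Op;
  std::string Result;
  std::vector<std::string> Inputs;
};

// Signature of a handwritten reverse transformation.
using CustomLiftFn = std::function<bool(const ToyInst &, ToyDagNode &)>;

class ToyLifter {
  std::map<std::string, std::string> Table;         // generated: opcode -> DAG op
  std::map<std::string, CustomLiftFn> CustomHooks;  // handwritten C++ fallbacks

public:
  void addTableEntry(const std::string &Opcode, const std::string &DagOp) {
    Table[Opcode] = DagOp;
  }
  void addCustomHook(const std::string &Opcode, CustomLiftFn Fn) {
    CustomHooks[Opcode] = std::move(Fn);
  }

  // Try the generated table first; fall back to a custom hook if there is one.
  bool lift(const ToyInst &MI, ToyDagNode &Out) const {
    assert(!MI.Operands.empty() && "expected at least a destination operand");
    auto T = Table.find(MI.Opcode);
    if (T != Table.end()) {
      Out.Op = T->second;
      Out.Result = MI.Operands.front();
      Out.Inputs.assign(MI.Operands.begin() + 1, MI.Operands.end());
      return true;
    }
    auto C = CustomHooks.find(MI.Opcode);
    if (C != CustomHooks.end())
      return C->second(MI, Out);
    return false;  // neither a pattern nor a hook covers this opcode
  }
};

// Hypothetical hook: treat a two-source address computation as an add.
bool liftToyLea(const ToyInst &MI, ToyDagNode &Out) {
  if (MI.Operands.size() != 3)
    return false;
  Out.Op = "add";
  Out.Result = MI.Operands[0];
  Out.Inputs = {MI.Operands[1], MI.Operands[2]};
  return true;
}
```

A target would register its hooks once, e.g. L.addCustomHook("LEA32r", liftToyLea), and every other opcode would go through the generated table.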

Proposed schedule:
Weeks 1-4: Work on MCInst->MachineInstr transformation.
Week 5: Work on TableGen backend to generate MachineInstr->SDAG tables.
Weeks 6-8: Work on support for custom MachineInstr->SDAG transformation.
Weeks 9-12: Work on SDAG->LLVM IR transformation.

Contact Information:
email: cdavis (at) mymail (dot) mines (dot) edu
phone: (seven-one-nine) 963-4781
IRC: cdavis5x at oftc, cdavis5x at freenode

Ever since Apple removed PowerPC support from Mac OS X 10.7, I have been searching for a way to run old PowerPC apps. While it would be easy to write my own recompiler by hand using LLVM's JIT, I believe it would be far more useful to create a general framework for transforming machine code back to LLVM IR.

Use to LLVM:
Decompiling machine code back to LLVM IR could have many potential uses within the LLVM community. Such a component would make it easier to build a recompiler--a tool that transforms machine code from one form to another. In addition, it would allow clients to use the existing LLVM optimization and analysis passes on machine code, without the need to implement special tools that operate directly on MCInst objects.

I have been working with LLVM for over two and a half years. I implemented support for the force_align_arg_pointer attribute, as well as the beginnings of Win64 exception-handling support. In GSoC 2010, I added support for multiple C++ ABIs, and the beginnings of support for the Microsoft Visual C++ ABI. I have also contributed to the Wine project--some of that work made it easier to compile with Clang :).
