LLVM-BAP Translation Tool #57

Open
ChaosData opened this issue Sep 2, 2013 · 4 comments

@ChaosData
Member

This would be a tool to translate between the Binary Analysis Platform Intermediate Language and the LLVM Intermediate Representation. This would allow for using LLVM tools on a correct (including side-effects) lifting from x86 binaries and allow use of the BAP optimizers and analysis tools on LLVM code.

http://bap.ece.cmu.edu/
http://llvm.org/releases/3.3/docs/LangRef.html

@ChaosData
Member Author

Notes from the Trenches

Below you will find a bit of a rant that hopefully provides enough useful
information to safely throw you into the deep end of BAP to LLVM shenanigans.

My original stepping stone goal, which appeared simple at first, was to take
the CMU Binary Analysis Platform IL of a hello world
binary, translate it into LLVM IR, and get it to run on the LLVM interpreter
(lli). The following are notes and important details I've noticed while trying
to get this to work.

Note: When I was working on this, the newest version of BAP was v0.6. Since
then, BAP v0.7 has been released.

New in BAP 0.7:
* New function identification heuristics in get_functions for stripped binaries
* Serialized output formats for easy parsing of BIL outside of BAP
* Support for ocaml 4.00 (see INSTALL)
* Support for OS X as a host platform (see INSTALL)
* New support for streaming symbolic execution of traces
* New VC framework
* New VC implementations: FWP and PWP
* Misc. bug fixes and performance improvements to SMT printers
* Misc. improvements to x86 lifting
* Steensgard loop nesting forest algorithm
* Improved loop unrolling for irreducible loops using Steensgard's algorithm

Originally, when I downloaded BAP, I noticed that it did in fact have a feature
to translate BIL into LLVM IR. Unfortunately, it was based on LLVM 2.9 and
didn't work with LLVM 3.0+. When I installed the older LLVM for compatibility,
it still didn't get me to my original goal: the LLVM 2.9 lli tool rejected
the LLVM IR that BAP produced.

While one of the first things you might want to do is read the LLVM language
reference manual, there are some important things about BAP and LLVM that
should be known before trying to just jump in:

Note: Changes to the BIL format/changes in v0.7 may have rendered some of
this obsolete.

  • _BAP is 32-bit only right now._ While it's possible to run BAP on x86_64, it
    only works for 32-bit x86 binaries. So you're going to want to run this in a
    32-bit VM.
  • If you want BAP to properly lift instructions to BIL, you're going to have
    to give it the proper offsets so it can start at the instruction you want it
    to start at.
  • BAP doesn't really attempt to understand calls or anything, which would
    involve having some understanding of binary formats like ELF. This means that
    all calls are simply to memory locations and you will need to do further
    parsing to determine what calls are being made.

The last point is particularly important with regard to LLVM. LLVM IR is not
an assembly language; it is a typed compiler IR, and code generated from it
normally leans on a libc implementation. An LLVM-based compiler would not
generate code containing raw system calls but would instead rely on the
system's libc, which provides system call stubs/wrapper functions. Because of
this, while BAP itself might have issues with lifting raw system calls (I
haven't tested it), LLVM IR needs quite a bit more information to do a call
than just the raw memory address that BAP will return.

So you still want to do this? Cool

Remember that if you want to do anything meaningful with LLVM and BAP, BIL
alone is not going to cut it and you're probably going to need to do additional
analysis of the target binary. Hopefully scripting up readelf will suffice for
most things.
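As a rough idea of what "scripting up readelf" could look like, here is a minimal Python sketch that pulls the imported function names out of the relocation table printed by `readelf -r`. The helper names (`parse_jump_slots`, `jump_slots_of`) and the `SAMPLE` text are made up for illustration, and the row layout is an assumption based on what GNU readelf typically prints for 32-bit x86 binaries, so adjust the pattern to whatever your readelf actually emits.

```python
import re
import subprocess

# Assumed row shape for a 32-bit x86 binary (GNU readelf), e.g.:
#   0804a00c  00000107 R_386_JUMP_SLOT   00000000   puts@GLIBC_2.0
JUMP_SLOT_ROW = re.compile(
    r"^\s*([0-9a-fA-F]{8})\s+[0-9a-fA-F]{8}\s+R_386_JUMP_SLOT"
    r"\s+[0-9a-fA-F]{8}\s+(\S+)"
)


def parse_jump_slots(readelf_text):
    """Map each GOT slot address to the imported symbol it resolves to."""
    slots = {}
    for line in readelf_text.splitlines():
        match = JUMP_SLOT_ROW.match(line)
        if match:
            # Drop a symbol-version suffix such as "@GLIBC_2.0" if present.
            name = match.group(2).split("@", 1)[0]
            slots[int(match.group(1), 16)] = name
    return slots


def jump_slots_of(path):
    """Run `readelf -r` on a binary and parse its jump-slot relocations."""
    result = subprocess.run(
        ["readelf", "-r", path], check=True, capture_output=True, text=True
    )
    return parse_jump_slots(result.stdout)


# Hypothetical sample text, written by hand to show the expected shape.
SAMPLE = """\
Relocation section '.rel.plt' at offset 0x298 contains 2 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
0804a00c  00000107 R_386_JUMP_SLOT   00000000   puts@GLIBC_2.0
0804a010  00000207 R_386_JUMP_SLOT   00000000   __libc_start_main@GLIBC_2.0
"""

for address, name in sorted(parse_jump_slots(SAMPLE).items()):
    print(hex(address), name)
```

Note that this only gives you GOT slot addresses. The call targets that show up in the BIL will more likely be PLT stub addresses, so you would still need to tie each stub to the slot it jumps through before you can name the call.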

The first thing I would recommend doing is digging into the LLVM IR, learning
the LLVM tools and writing some basic LLVM IR code and running it via lli or
compiling it to native code.
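For a first hand-written module to try, a small Python sketch like the one below can generate a hello world in LLVM IR. The `hello_ir` helper is hypothetical, and the emitted IR assumes a recent LLVM (roughly 15 or newer) that uses opaque `ptr` pointers; the LLVM 2.9/3.x releases discussed in this thread use typed pointers such as `i8*`, so the text would need adapting there.

```python
def hello_ir(message="hello world"):
    """Return the text of an LLVM IR module that prints `message` via puts."""
    data = message.encode("ascii") + b"\x00"  # C string, NUL-terminated
    # Hex-escape every byte so the string literal is always valid IR.
    escaped = "".join("\\%02X" % byte for byte in data)
    first_line = (
        f'@.str = private unnamed_addr constant [{len(data)} x i8] c"{escaped}"'
    )
    return "\n".join(
        [
            first_line,
            "",
            "; puts lives in libc, so the module only declares it.",
            "declare i32 @puts(ptr)",
            "",
            "define i32 @main() {",
            "  %ret = call i32 @puts(ptr @.str)",
            "  ret i32 0",
            "}",
            "",
        ]
    )


# Write the module to disk so the LLVM tools can be pointed at it.
with open("hello.ll", "w") as handle:
    handle.write(hello_ir())
```

Running `lli hello.ll` should then interpret the module, and `llc hello.ll` should lower it to native assembly. Notice that even this tiny program calls a declared, named function with a type signature, which is exactly the information a raw call address from BIL does not carry.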

Only after you have done the above should you start playing with BAP.

@darwinyip assigned darwinyip and unassigned darwinyip Apr 23, 2014
@sdconsta

Did anything ever come of this? Is there a way to translate from BIL to LLVM IR? If not, what were the obstacles?

@ChaosData
Member Author

ChaosData commented May 2, 2017

  1. Nope.
  2. Should totally be doable; see "dump program in LLVM IR" (BinaryAnalysisPlatform/bap#575) for BAP's own issue tracking this. But https://github.com/trailofbits/mcsema is probably worth using if you can (it requires IDA though).
  3. BAP now seems to be a lot less ghetto than back then, but I mostly just never got the time to focus on learning BAP/BIL and LLVM IR to do it. I also got semi-stuck on figuring out how to represent data/symbols especially for external/dynamic functions (e.g. listed in the PLT).

@ColdHeat

ColdHeat commented May 2, 2017

is that the one they call jefferson? is it truly him? back from the depths of nccgroup?
