123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323 |
- =======================================================
- Building a JIT: Starting out with KaleidoscopeJIT
- =======================================================
- .. contents::
- :local:
- Chapter 1 Introduction
- ======================
- **Warning: This tutorial is currently being updated to account for ORC API
- changes. Only Chapters 1 and 2 are up-to-date.**
- **Example code from Chapters 3 to 5 will compile and run, but has not been
- updated**
- Welcome to Chapter 1 of the "Building an ORC-based JIT in LLVM" tutorial. This
- tutorial runs through the implementation of a JIT compiler using LLVM's
- On-Request-Compilation (ORC) APIs. It begins with a simplified version of the
- KaleidoscopeJIT class used in the
- `Implementing a language with LLVM <LangImpl01.html>`_ tutorials and then
- introduces new features like concurrent compilation, optimization, lazy
- compilation and remote execution.
- The goal of this tutorial is to introduce you to LLVM's ORC JIT APIs, show how
- these APIs interact with other parts of LLVM, and to teach you how to recombine
- them to build a custom JIT that is suited to your use-case.
- The structure of the tutorial is:
- - Chapter #1: Investigate the simple KaleidoscopeJIT class. This will
- introduce some of the basic concepts of the ORC JIT APIs, including the
- idea of an ORC *Layer*.
- - `Chapter #2 <BuildingAJIT2.html>`_: Extend the basic KaleidoscopeJIT by adding
- a new layer that will optimize IR and generated code.
- - `Chapter #3 <BuildingAJIT3.html>`_: Further extend the JIT by adding a
- Compile-On-Demand layer to lazily compile IR.
- - `Chapter #4 <BuildingAJIT4.html>`_: Improve the laziness of our JIT by
- replacing the Compile-On-Demand layer with a custom layer that uses the ORC
- Compile Callbacks API directly to defer IR-generation until functions are
- called.
- - `Chapter #5 <BuildingAJIT5.html>`_: Add process isolation by JITing code into
- a remote process with reduced privileges using the JIT Remote APIs.
- To provide input for our JIT we will use a lightly modified version of the
- Kaleidoscope REPL from `Chapter 7 <LangImpl07.html>`_ of the "Implementing a
- language in LLVM tutorial".
- Finally, a word on API generations: ORC is the 3rd generation of LLVM JIT API.
- It was preceded by MCJIT, and before that by the (now deleted) legacy JIT.
- These tutorials don't assume any experience with these earlier APIs, but
- readers acquainted with them will see many familiar elements. Where appropriate
- we will make this connection with the earlier APIs explicit to help people who
- are transitioning from them to ORC.
- JIT API Basics
- ==============
- The purpose of a JIT compiler is to compile code "on-the-fly" as it is needed,
- rather than compiling whole programs to disk ahead of time as a traditional
- compiler does. To support that aim our initial, bare-bones JIT API will have
- just two functions:
- 1. ``Error addModule(std::unique_ptr<Module> M)``: Make the given IR module
- available for execution.
- 2. ``Expected<JITEvaluatedSymbol> lookup()``: Search for pointers to
- symbols (functions or variables) that have been added to the JIT.
- A basic use-case for this API, executing the 'main' function from a module,
- will look like:
- .. code-block:: c++
- JIT J;
- J.addModule(buildModule());
- auto *Main = (int(*)(int, char*[]))J.lookup("main").getAddress();
- int Result = Main();
- The APIs that we build in these tutorials will all be variations on this simple
- theme. Behind this API we will refine the implementation of the JIT to add
- support for concurrent compilation, optimization and lazy compilation.
- Eventually we will extend the API itself to allow higher-level program
- representations (e.g. ASTs) to be added to the JIT.
- KaleidoscopeJIT
- ===============
- In the previous section we described our API, now we examine a simple
- implementation of it: The KaleidoscopeJIT class [1]_ that was used in the
- `Implementing a language with LLVM <LangImpl01.html>`_ tutorials. We will use
- the REPL code from `Chapter 7 <LangImpl07.html>`_ of that tutorial to supply the
- input for our JIT: Each time the user enters an expression the REPL will add a
- new IR module containing the code for that expression to the JIT. If the
- expression is a top-level expression like '1+1' or 'sin(x)', the REPL will also
- use the lookup method of our JIT class find and execute the code for the
- expression. In later chapters of this tutorial we will modify the REPL to enable
- new interactions with our JIT class, but for now we will take this setup for
- granted and focus our attention on the implementation of our JIT itself.
- Our KaleidoscopeJIT class is defined in the KaleidoscopeJIT.h header. After the
- usual include guards and #includes [2]_, we get to the definition of our class:
- .. code-block:: c++
- #ifndef LLVM_EXECUTIONENGINE_ORC_KALEIDOSCOPEJIT_H
- #define LLVM_EXECUTIONENGINE_ORC_KALEIDOSCOPEJIT_H
- #include "llvm/ADT/StringRef.h"
- #include "llvm/ExecutionEngine/JITSymbol.h"
- #include "llvm/ExecutionEngine/Orc/CompileUtils.h"
- #include "llvm/ExecutionEngine/Orc/Core.h"
- #include "llvm/ExecutionEngine/Orc/ExecutionUtils.h"
- #include "llvm/ExecutionEngine/Orc/IRCompileLayer.h"
- #include "llvm/ExecutionEngine/Orc/JITTargetMachineBuilder.h"
- #include "llvm/ExecutionEngine/Orc/RTDyldObjectLinkingLayer.h"
- #include "llvm/ExecutionEngine/SectionMemoryManager.h"
- #include "llvm/IR/DataLayout.h"
- #include "llvm/IR/LLVMContext.h"
- #include <memory>
- namespace llvm {
- namespace orc {
- class KaleidoscopeJIT {
- private:
- ExecutionSession ES;
- RTDyldObjectLinkingLayer ObjectLayer;
- IRCompileLayer CompileLayer;
- DataLayout DL;
- MangleAndInterner Mangle;
- ThreadSafeContext Ctx;
- public:
- KaleidoscopeJIT(JITTargetMachineBuilder JTMB, DataLayout DL)
- : ObjectLayer(ES,
- []() { return std::make_unique<SectionMemoryManager>(); }),
- CompileLayer(ES, ObjectLayer, ConcurrentIRCompiler(std::move(JTMB))),
- DL(std::move(DL)), Mangle(ES, this->DL),
- Ctx(std::make_unique<LLVMContext>()) {
- ES.getMainJITDylib().setGenerator(
- cantFail(DynamicLibrarySearchGenerator::GetForCurrentProcess(DL)));
- }
- Our class begins with six member variables: An ExecutionSession member, ``ES``,
- which provides context for our running JIT'd code (including the string pool,
- global mutex, and error reporting facilities); An RTDyldObjectLinkingLayer,
- ``ObjectLayer``, that can be used to add object files to our JIT (though we will
- not use it directly); An IRCompileLayer, ``CompileLayer``, that can be used to
- add LLVM Modules to our JIT (and which builds on the ObjectLayer), A DataLayout
- and MangleAndInterner, ``DL`` and ``Mangle``, that will be used for symbol mangling
- (more on that later); and finally an LLVMContext that clients will use when
- building IR files for the JIT.
- Next up we have our class constructor, which takes a `JITTargetMachineBuilder``
- that will be used by our IRCompiler, and a ``DataLayout`` that we will use to
- initialize our DL member. The constructor begins by initializing our
- ObjectLayer. The ObjectLayer requires a reference to the ExecutionSession, and
- a function object that will build a JIT memory manager for each module that is
- added (a JIT memory manager manages memory allocations, memory permissions, and
- registration of exception handlers for JIT'd code). For this we use a lambda
- that returns a SectionMemoryManager, an off-the-shelf utility that provides all
- the basic memory management functionality required for this chapter. Next we
- initialize our CompileLayer. The CompileLayer needs three things: (1) A
- reference to the ExecutionSession, (2) A reference to our object layer, and (3)
- a compiler instance to use to perform the actual compilation from IR to object
- files. We use the off-the-shelf ConcurrentIRCompiler utility as our compiler,
- which we construct using this constructor's JITTargetMachineBuilder argument.
- The ConcurrentIRCompiler utility will use the JITTargetMachineBuilder to build
- llvm TargetMachines (which are not thread safe) as needed for compiles. After
- this, we initialize our supporting members: ``DL``, ``Mangler`` and ``Ctx`` with
- the input DataLayout, the ExecutionSession and DL member, and a new default
- constucted LLVMContext respectively. Now that our members have been initialized,
- so the one thing that remains to do is to tweak the configuration of the
- *JITDylib* that we will store our code in. We want to modify this dylib to
- contain not only the symbols that we add to it, but also the symbols from our
- REPL process as well. We do this by attaching a
- ``DynamicLibrarySearchGenerator`` instance using the
- ``DynamicLibrarySearchGenerator::GetForCurrentProcess`` method.
- .. code-block:: c++
- static Expected<std::unique_ptr<KaleidoscopeJIT>> Create() {
- auto JTMB = JITTargetMachineBuilder::detectHost();
- if (!JTMB)
- return JTMB.takeError();
- auto DL = JTMB->getDefaultDataLayoutForTarget();
- if (!DL)
- return DL.takeError();
- return std::make_unique<KaleidoscopeJIT>(std::move(*JTMB), std::move(*DL));
- }
- const DataLayout &getDataLayout() const { return DL; }
- LLVMContext &getContext() { return *Ctx.getContext(); }
- Next we have a named constructor, ``Create``, which will build a KaleidoscopeJIT
- instance that is configured to generate code for our host process. It does this
- by first generating a JITTargetMachineBuilder instance using that clases's
- detectHost method and then using that instance to generate a datalayout for
- the target process. Each of these operations can fail, so each returns its
- result wrapped in an Expected value [3]_ that we must check for error before
- continuing. If both operations succeed we can unwrap their results (using the
- dereference operator) and pass them into KaleidoscopeJIT's constructor on the
- last line of the function.
- Following the named constructor we have the ``getDataLayout()`` and
- ``getContext()`` methods. These are used to make data structures created and
- managed by the JIT (especially the LLVMContext) available to the REPL code that
- will build our IR modules.
- .. code-block:: c++
- void addModule(std::unique_ptr<Module> M) {
- cantFail(CompileLayer.add(ES.getMainJITDylib(),
- ThreadSafeModule(std::move(M), Ctx)));
- }
- Expected<JITEvaluatedSymbol> lookup(StringRef Name) {
- return ES.lookup({&ES.getMainJITDylib()}, Mangle(Name.str()));
- }
- Now we come to the first of our JIT API methods: addModule. This method is
- responsible for adding IR to the JIT and making it available for execution. In
- this initial implementation of our JIT we will make our modules "available for
- execution" by adding them to the CompileLayer, which will it turn store the
- Module in the main JITDylib. This process will create new symbol table entries
- in the JITDylib for each definition in the module, and will defer compilation of
- the module until any of its definitions is looked up. Note that this is not lazy
- compilation: just referencing a definition, even if it is never used, will be
- enough to trigger compilation. In later chapters we will teach our JIT to defer
- compilation of functions until they're actually called. To add our Module we
- must first wrap it in a ThreadSafeModule instance, which manages the lifetime of
- the Module's LLVMContext (our Ctx member) in a thread-friendly way. In our
- example, all modules will share the Ctx member, which will exist for the
- duration of the JIT. Once we switch to concurrent compilation in later chapters
- we will use a new context per module.
- Our last method is ``lookup``, which allows us to look up addresses for
- function and variable definitions added to the JIT based on their symbol names.
- As noted above, lookup will implicitly trigger compilation for any symbol
- that has not already been compiled. Our lookup method calls through to
- `ExecutionSession::lookup`, passing in a list of dylibs to search (in our case
- just the main dylib), and the symbol name to search for, with a twist: We have
- to *mangle* the name of the symbol we're searching for first. The ORC JIT
- components use mangled symbols internally the same way a static compiler and
- linker would, rather than using plain IR symbol names. This allows JIT'd code
- to interoperate easily with precompiled code in the application or shared
- libraries. The kind of mangling will depend on the DataLayout, which in turn
- depends on the target platform. To allow us to remain portable and search based
- on the un-mangled name, we just re-produce this mangling ourselves using our
- ``Mangle`` member function object.
- This brings us to the end of Chapter 1 of Building a JIT. You now have a basic
- but fully functioning JIT stack that you can use to take LLVM IR and make it
- executable within the context of your JIT process. In the next chapter we'll
- look at how to extend this JIT to produce better quality code, and in the
- process take a deeper look at the ORC layer concept.
- `Next: Extending the KaleidoscopeJIT <BuildingAJIT2.html>`_
- Full Code Listing
- =================
- Here is the complete code listing for our running example. To build this
- example, use:
- .. code-block:: bash
- # Compile
- clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core orcjit native` -O3 -o toy
- # Run
- ./toy
- Here is the code:
- .. literalinclude:: ../../examples/Kaleidoscope/BuildingAJIT/Chapter1/KaleidoscopeJIT.h
- :language: c++
- .. [1] Actually we use a cut-down version of KaleidoscopeJIT that makes a
- simplifying assumption: symbols cannot be re-defined. This will make it
- impossible to re-define symbols in the REPL, but will make our symbol
- lookup logic simpler. Re-introducing support for symbol redefinition is
- left as an exercise for the reader. (The KaleidoscopeJIT.h used in the
- original tutorials will be a helpful reference).
- .. [2] +-----------------------------+-----------------------------------------------+
- | File | Reason for inclusion |
- +=============================+===============================================+
- | JITSymbol.h | Defines the lookup result type |
- | | JITEvaluatedSymbol |
- +-----------------------------+-----------------------------------------------+
- | CompileUtils.h | Provides the SimpleCompiler class. |
- +-----------------------------+-----------------------------------------------+
- | Core.h | Core utilities such as ExecutionSession and |
- | | JITDylib. |
- +-----------------------------+-----------------------------------------------+
- | ExecutionUtils.h | Provides the DynamicLibrarySearchGenerator |
- | | class. |
- +-----------------------------+-----------------------------------------------+
- | IRCompileLayer.h | Provides the IRCompileLayer class. |
- +-----------------------------+-----------------------------------------------+
- | JITTargetMachineBuilder.h | Provides the JITTargetMachineBuilder class. |
- +-----------------------------+-----------------------------------------------+
- | RTDyldObjectLinkingLayer.h | Provides the RTDyldObjectLinkingLayer class. |
- +-----------------------------+-----------------------------------------------+
- | SectionMemoryManager.h | Provides the SectionMemoryManager class. |
- +-----------------------------+-----------------------------------------------+
- | DataLayout.h | Provides the DataLayout class. |
- +-----------------------------+-----------------------------------------------+
- | LLVMContext.h | Provides the LLVMContext class. |
- +-----------------------------+-----------------------------------------------+
- .. [3] See the ErrorHandling section in the LLVM Programmer's Manual
- (http://llvm.org/docs/ProgrammersManual.html#error-handling)
|