- ===================================
- Stack maps and patch points in LLVM
- ===================================
- .. contents::
- :local:
- :depth: 2
- Definitions
- ===========
- In this document we refer to the "runtime" collectively as all
- components that serve as the LLVM client, including the LLVM IR
- generator, object code consumer, and code patcher.
- A stack map records the location of ``live values`` at a particular
- instruction address. These ``live values`` do not refer to all the
- LLVM values live across the stack map. Instead, they are only the
- values that the runtime requires to be live at this point. For
- example, they may be the values the runtime will need to resume
- program execution at that point independent of the compiled function
- containing the stack map.
- LLVM emits stack map data into the object code within a designated
- :ref:`stackmap-section`. This stack map data contains a record for
- each stack map. The record stores the stack map's instruction address
- and contains an entry for each mapped value. Each entry encodes a
- value's location as a register, stack offset, or constant.
- A patch point is an instruction address at which space is reserved for
- patching a new instruction sequence at run time. To LLVM, patch points
- look much like calls. They take arguments that follow a calling
- convention and may return a value. They also imply stack map
- generation, which allows the runtime to locate the patchpoint and
- find the location of ``live values`` at that point.
- Motivation
- ==========
- This functionality is currently experimental but is potentially useful
- in a variety of settings, the most obvious being a runtime (JIT)
- compiler. Example applications of the patchpoint intrinsics are
- implementing an inline call cache for polymorphic method dispatch or
- optimizing the retrieval of properties in dynamically typed languages
- such as JavaScript.
- The intrinsics documented here are currently used by the JavaScript
- compiler within the open source WebKit project (see the `FTL JIT
- <https://trac.webkit.org/wiki/FTLJIT>`_), but they are designed to be
- used whenever stack maps or code patching are needed. Because the
- intrinsics have experimental status, compatibility across LLVM
- releases is not guaranteed.
- The stack map functionality described in this document is separate
- from the functionality described in
- :ref:`stack-map`. `GCFunctionMetadata` provides the location of
- pointers into a collected heap captured by the `GCRoot` intrinsic,
- which can also be considered a "stack map". Unlike the stack maps
- defined above, the `GCFunctionMetadata` stack map interface does not
- provide a way to associate live register values of arbitrary type with
- an instruction address, nor does it specify a format for the resulting
- stack map. The stack maps described here could potentially provide
- richer information to a garbage collecting runtime, but that usage
- will not be discussed in this document.
- Intrinsics
- ==========
- The following two kinds of intrinsics can be used to implement stack
- maps and patch points: ``llvm.experimental.stackmap`` and
- ``llvm.experimental.patchpoint``. Both kinds of intrinsics generate a
- stack map record, and they both allow some form of code patching. They
- can be used independently (i.e. ``llvm.experimental.patchpoint``
- implicitly generates a stack map without the need for an additional
- call to ``llvm.experimental.stackmap``). The choice of which to use
- depends on whether it is necessary to reserve space for code patching
- and whether any of the intrinsic arguments should be lowered according
- to calling conventions. ``llvm.experimental.stackmap`` does not
- reserve any space, nor does it expect any call arguments. If the
- runtime patches code at the stack map's address, it will destructively
- overwrite the program text. This is unlike
- ``llvm.experimental.patchpoint``, which reserves space for in-place
- patching without overwriting surrounding code. The
- ``llvm.experimental.patchpoint`` intrinsic also lowers a specified
- number of arguments according to its calling convention. This allows
- patched code to make in-place function calls without marshaling.
- Each instance of one of these intrinsics generates a stack map record
- in the :ref:`stackmap-section`. The record includes an ID, allowing
- the runtime to uniquely identify the stack map, and the offset within
- the code from the beginning of the enclosing function.
- '``llvm.experimental.stackmap``' Intrinsic
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- Syntax:
- """""""
- ::
- declare void
- @llvm.experimental.stackmap(i64 <id>, i32 <numShadowBytes>, ...)
- Overview:
- """""""""
- The '``llvm.experimental.stackmap``' intrinsic records the location of
- specified values in the stack map without generating any code.
- Operands:
- """""""""
- The first operand is an ID to be encoded within the stack map. The
- second operand is the number of shadow bytes following the
- intrinsic. The variable number of operands that follow are the ``live
- values`` for which locations will be recorded in the stack map.
- To use this intrinsic as a bare-bones stack map, with no code patching
- support, the number of shadow bytes can be set to zero.
- Semantics:
- """"""""""
- The stack map intrinsic generates no code in place, unless nops are
- needed to cover its shadow (see below). However, its offset from
- function entry is stored in the stack map. This is the relative
- instruction address immediately following the instructions that
- precede the stack map.
- The stack map ID allows a runtime to locate the desired stack map
- record. LLVM passes this ID through directly to the stack map
- record without checking uniqueness.
- LLVM guarantees a shadow of instructions following the stack map's
- instruction offset during which neither the end of the basic block nor
- another call to ``llvm.experimental.stackmap`` or
- ``llvm.experimental.patchpoint`` may occur. This allows the runtime to
- patch the code at this point in response to an event triggered from
- outside the code. The code for instructions following the stack map
- may be emitted in the stack map's shadow, and these instructions may
- be overwritten by destructive patching. Without shadow bytes, this
- destructive patching could overwrite program text or data outside the
- current function. We disallow overlapping stack map shadows so that
- the runtime does not need to consider this corner case.
- For example, a stack map with 8 byte shadow:
- .. code-block:: llvm
- call void @runtime()
- call void (i64, i32, ...) @llvm.experimental.stackmap(i64 77, i32 8,
- i64* %ptr)
- %val = load i64, i64* %ptr
- %add = add i64 %val, 3
- ret i64 %add
- May require one byte of nop-padding:
- .. code-block:: none
- 0x00 callq _runtime
- 0x05 nop <--- stack map address
- 0x06 movq (%rdi), %rax
- 0x07 addq $3, %rax
- 0x0a popq %rdx
- 0x0b ret <---- end of 8-byte shadow
- Now, if the runtime needs to invalidate the compiled code, it may
- patch 8 bytes of code at the stack map's address as follows:
- .. code-block:: none
- 0x00 callq _runtime
- 0x05 movl $0xffff, %eax <--- patched code at stack map address
- 0x0a callq *%rax <---- end of 8-byte shadow
- This way, after the normal call to the runtime returns, the code will
- execute a patched call to a special entry point that can rebuild a
- stack frame from the values located by the stack map.
- '``llvm.experimental.patchpoint.*``' Intrinsic
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- Syntax:
- """""""
- ::
- declare void
- @llvm.experimental.patchpoint.void(i64 <id>, i32 <numBytes>,
- i8* <target>, i32 <numArgs>, ...)
- declare i64
- @llvm.experimental.patchpoint.i64(i64 <id>, i32 <numBytes>,
- i8* <target>, i32 <numArgs>, ...)
- Overview:
- """""""""
- The '``llvm.experimental.patchpoint.*``' intrinsics create a function
- call to the specified ``<target>`` and records the location of specified
- values in the stack map.
- Operands:
- """""""""
- The first operand is an ID, the second operand is the number of bytes
- reserved for the patchable region, the third operand is the target
- address of a function (optionally null), and the fourth operand
- specifies how many of the following variable operands are considered
- function call arguments. The remaining variable number of operands are
- the ``live values`` for which locations will be recorded in the stack
- map.
- Semantics:
- """"""""""
- The patch point intrinsic generates a stack map. It also emits a
- function call to the address specified by ``<target>`` if the address
- is not a constant null. The function call and its arguments are
- lowered according to the calling convention specified at the
- intrinsic's callsite. Variants of the intrinsic with non-void return
- type also return a value according to calling convention.
- On PowerPC, note that ``<target>`` must be the ABI function pointer for the
- intended target of the indirect call. Specifically, when compiling for the
- ELF V1 ABI, ``<target>`` is the function-descriptor address normally used as
- the C/C++ function-pointer representation.
- Requesting zero patch point arguments is valid. In this case, all
- variable operands are handled just like those of
- ``llvm.experimental.stackmap``. The difference is that space will
- still be reserved for patching, a call will be emitted, and a return
- value is allowed.
- The locations of the arguments are not normally recorded in the stack
- map because they are already fixed by the calling convention. The
- remaining ``live values`` will have their location recorded, which
- could be a register, stack location, or constant. A special calling
- convention has been introduced for use with stack maps, anyregcc,
- which forces the arguments to be loaded into registers but allows
- those registers to be dynamically allocated. These argument registers
- will have their register locations recorded in the stack map in
- addition to the remaining ``live values``.
- The patch point also emits nops to cover at least ``<numBytes>`` of
- instruction encoding space. Hence, the client must ensure that
- ``<numBytes>`` is enough to encode a call to the target address on the
- supported targets. If the call target is constant null, then there is
- no minimum requirement. A zero-byte null target patchpoint is
- valid.
- The runtime may patch the code emitted for the patch point, including
- the call sequence and nops. However, the runtime may not assume
- anything about the code LLVM emits within the reserved space. Partial
- patching is not allowed. The runtime must patch all reserved bytes,
- padding with nops if necessary.
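The "patch all reserved bytes" rule above can be sketched as a small helper. This is only an illustration: the name ``pad_patch`` is hypothetical, and the single-byte ``0x90`` nop is an x86 assumption (other targets need their own nop encoding).

```python
X86_NOP = b"\x90"  # assumption: single-byte x86 nop


def pad_patch(code: bytes, num_bytes: int, nop: bytes = X86_NOP) -> bytes:
    """Return exactly num_bytes of patch text: code followed by nop padding."""
    if len(code) > num_bytes:
        raise ValueError("patch does not fit in the reserved space")
    padding = num_bytes - len(code)
    if padding % len(nop) != 0:
        raise ValueError("remaining space is not a whole number of nops")
    return code + nop * (padding // len(nop))


# A 2-byte sequence (callq *%rax, encoded as FF D0) padded to fill a
# 15-byte patch point.
patch = pad_patch(b"\xff\xd0", 15)
```

Rejecting an oversized sequence up front keeps the runtime from writing past the reserved region into the code that follows the patch point.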
- This example shows a patch point reserving 15 bytes, with one argument
- in %rdi, and a return value in %rax per native calling convention:
- .. code-block:: llvm
- %target = inttoptr i64 -281474976710654 to i8*
- %val = call i64 (i64, i32, i8*, i32, ...)
- @llvm.experimental.patchpoint.i64(i64 78, i32 15,
- i8* %target, i32 1, i64* %ptr)
- %add = add i64 %val, 3
- ret i64 %add
- May generate:
- .. code-block:: none
- 0x00 movabsq $0xffff000000000002, %r11 <--- patch point address
- 0x0a callq *%r11
- 0x0d nop
- 0x0e nop <--- end of reserved 15-bytes
- 0x0f addq $0x3, %rax
- 0x10 movq %rax, 8(%rsp)
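The ``inttoptr`` operand in the IR above is the signed decimal spelling of the address that appears as the ``movabsq`` immediate. A quick check of that arithmetic, with ``as_unsigned64`` as a hypothetical helper name:

```python
MASK64 = (1 << 64) - 1


def as_unsigned64(value: int) -> int:
    """Reinterpret a signed 64-bit integer as its unsigned bit pattern."""
    return value & MASK64


target = as_unsigned64(-281474976710654)
print(hex(target))  # 0xffff000000000002
```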
- Note that no stack map locations will be recorded. If the patched code
- sequence does not need arguments fixed to specific calling convention
- registers, then the ``anyregcc`` convention may be used:
- .. code-block:: none
- %val = call anyregcc @llvm.experimental.patchpoint(i64 78, i32 15,
- i8* %target, i32 1,
- i64* %ptr)
- The stack map now indicates the location of the %ptr argument and
- return value:
- .. code-block:: none
- Stack Map: ID=78, Loc0=%r9 Loc1=%r8
- The patch code sequence may now use the argument that happened to be
- allocated in %r8 and return a value allocated in %r9:
- .. code-block:: none
- 0x00 movslq 4(%r8), %r9 <--- patched code at patch point address
- 0x03 nop
- ...
- 0x0e nop <--- end of reserved 15-bytes
- 0x0f addq $0x3, %r9
- 0x10 movq %r9, 8(%rsp)
- .. _stackmap-format:
- Stack Map Format
- ================
- The existence of a stack map or patch point intrinsic within an LLVM
- Module forces code emission to create a :ref:`stackmap-section`. The
- format of this section follows:
- .. code-block:: none
- Header {
- uint8 : Stack Map Version (current version is 3)
- uint8 : Reserved (expected to be 0)
- uint16 : Reserved (expected to be 0)
- }
- uint32 : NumFunctions
- uint32 : NumConstants
- uint32 : NumRecords
- StkSizeRecord[NumFunctions] {
- uint64 : Function Address
- uint64 : Stack Size
- uint64 : Record Count
- }
- Constants[NumConstants] {
- uint64 : LargeConstant
- }
- StkMapRecord[NumRecords] {
- uint64 : PatchPoint ID
- uint32 : Instruction Offset
- uint16 : Reserved (record flags)
- uint16 : NumLocations
- Location[NumLocations] {
- uint8 : Register | Direct | Indirect | Constant | ConstantIndex
- uint8 : Reserved (expected to be 0)
- uint16 : Location Size
- uint16 : Dwarf RegNum
- uint16 : Reserved (expected to be 0)
- int32 : Offset or SmallConstant
- }
- uint32 : Padding (only if required to align to 8 byte)
- uint16 : Padding
- uint16 : NumLiveOuts
- LiveOuts[NumLiveOuts] {
- uint16 : Dwarf RegNum
- uint8 : Reserved
- uint8 : Size in Bytes
- }
- uint32 : Padding (only if required to align to 8 byte)
- }
- The first byte of each location encodes a type that indicates how to
- interpret the ``RegNum`` and ``Offset`` fields as follows:
- ======== ========== =================== ===========================
- Encoding Type       Value               Description
- -------- ---------- ------------------- ---------------------------
- 0x1      Register   Reg                 Value in a register
- 0x2      Direct     Reg + Offset        Frame index value
- 0x3      Indirect   [Reg + Offset]      Spilled value
- 0x4      Constant   Offset              Small constant
- 0x5      ConstIndex Constants[Offset]   Large constant
- ======== ========== =================== ===========================
- In the common case, a value is available in a register, and the
- ``Offset`` field will be zero. Values spilled to the stack are encoded
- as ``Indirect`` locations. The runtime must load those values from a
- stack address, typically in the form ``[BP + Offset]``. If an
- ``alloca`` value is passed directly to a stack map intrinsic, then
- LLVM may fold the frame index into the stack map as an optimization to
- avoid allocating a register or stack slot. These frame indices will be
- encoded as ``Direct`` locations in the form ``BP + Offset``. LLVM may
- also optimize constants by emitting them directly in the stack map,
- either in the ``Offset`` of a ``Constant`` location or in the constant
- pool, referred to by ``ConstantIndex`` locations.
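To make the layout concrete, here is a minimal sketch of a parser for the version 3 format described above. It assumes a little-endian target and that ``data`` holds the whole section starting at an 8-byte-aligned offset. The names ``parse_stackmaps``, ``Location``, ``Record`` and ``StackMaps`` are hypothetical and are not part of any LLVM API.

```python
import struct
from typing import NamedTuple

LOCATION_TYPES = {1: "Register", 2: "Direct", 3: "Indirect",
                  4: "Constant", 5: "ConstIndex"}


class Location(NamedTuple):
    kind: str
    size: int
    regnum: int
    offset: int      # Offset or SmallConstant


class Record(NamedTuple):
    id: int
    offset: int
    locations: list
    liveouts: list   # (DWARF regnum, size in bytes)


class StackMaps(NamedTuple):
    functions: list  # (address, stack size, record count)
    constants: list
    records: list


def parse_stackmaps(data: bytes, endian: str = "<") -> StackMaps:
    pos = 0

    def take(fmt):
        # Unpack one group of fields and advance the cursor.
        nonlocal pos
        fields = struct.unpack_from(endian + fmt, data, pos)
        pos += struct.calcsize(endian + fmt)
        return fields

    version, _, _ = take("BBH")
    if version != 3:
        raise ValueError(f"unsupported stack map version {version}")
    num_functions, num_constants, num_records = take("III")
    functions = [take("QQQ") for _ in range(num_functions)]
    constants = [take("Q")[0] for _ in range(num_constants)]
    records = []
    for _ in range(num_records):
        rec_id, offset, _flags, num_locations = take("QIHH")
        locations = []
        for _ in range(num_locations):
            kind, _, size, regnum, _, value = take("BBHHHi")
            locations.append(
                Location(LOCATION_TYPES[kind], size, regnum, value))
        pos += -pos % 8   # optional padding before the liveout header
        _, num_liveouts = take("HH")
        liveouts = [(reg, size) for reg, _, size in
                    (take("HBB") for _ in range(num_liveouts))]
        pos += -pos % 8   # optional padding before the next record
        records.append(Record(rec_id, offset, locations, liveouts))
    return StackMaps(functions, constants, records)
```

A runtime would typically run such a pass once after compiling a module and then keep the result in its own data structures.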
- At each callsite, a "liveout" register list is also recorded. These
- are the registers that are live across the stackmap and therefore must
- be saved by the runtime. This is an important optimization when the
- patchpoint intrinsic is used with a calling convention that by default
- preserves most registers as callee-save.
- Each entry in the liveout register list contains a DWARF register
- number and size in bytes. The stackmap format deliberately omits
- specific subregister information. Instead the runtime must interpret
- this information conservatively. For example, if the stackmap reports
- one byte at ``%rax``, then the value may be in either ``%al`` or
- ``%ah``. It doesn't matter in practice, because the runtime will
- simply save ``%rax``. However, if the stackmap reports 16 bytes at
- ``%ymm0``, then the runtime can safely optimize by saving only
- ``%xmm0``.
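The conservative interpretation described above might look like the following sketch. It assumes the x86-64 System V DWARF numbering, where 0 to 15 are the general-purpose registers and 17 to 32 are the vector registers. The function name ``save_width`` is hypothetical.

```python
def save_width(dwarf_regnum: int, size_in_bytes: int) -> int:
    """Smallest save width in bytes that covers a liveout entry."""
    if 0 <= dwarf_regnum <= 15:
        # General-purpose register: sub-register information is omitted,
        # so always save the full 64-bit register.
        return 8
    if 17 <= dwarf_regnum <= 32:
        # Vector register: save the narrowest of xmm, ymm or zmm that
        # covers the reported size.
        for width in (16, 32, 64):
            if size_in_bytes <= width:
                return width
    raise ValueError("unhandled liveout entry")
```

With this rule, one byte reported at ``%rax`` saves all 8 bytes, while 16 bytes reported at ``%ymm0`` saves only the 16-byte ``%xmm0`` portion.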
- The stack map format is a contract between an LLVM SVN revision and
- the runtime. It is currently experimental and may change in the short
- term, but minimizing the need to update the runtime is
- important. Consequently, the stack map design is motivated by
- simplicity and extensibility. Compactness of the representation is
- secondary because the runtime is expected to parse the data
- immediately after compiling a module and encode the information in its
- own format. Since the runtime controls the allocation of sections, it
- can reuse the same stack map space for multiple modules.
- Stackmap support is currently only implemented for 64-bit
- platforms. However, a 32-bit implementation should be able to use the
- same format with an insignificant amount of wasted space.
- .. _stackmap-section:
- Stack Map Section
- ^^^^^^^^^^^^^^^^^
- A JIT compiler can easily access this section by providing its own
- memory manager via the LLVM C API
- ``LLVMCreateSimpleMCJITMemoryManager()``. When creating the memory
- manager, the JIT provides a callback:
- ``LLVMMemoryManagerAllocateDataSectionCallback()``. When LLVM creates
- this section, it invokes the callback and passes the section name. The
- JIT can record the in-memory address of the section at this time and
- later parse it to recover the stack map data.
- For MachO (e.g. on Darwin), the stack map section name is
- "__llvm_stackmaps". The segment name is "__LLVM_STACKMAPS".
- For ELF (e.g. on Linux), the stack map section name is
- ".llvm_stackmaps". The segment name is "__LLVM_STACKMAPS".
- Stack Map Usage
- ===============
- The stack map support described in this document can be used to
- precisely determine the location of values at a specific position in
- the code. LLVM does not maintain any mapping between those values and
- any higher-level entity. The runtime must be able to interpret the
- stack map record given only the ID, offset, and the order of the
- locations, records, and functions, which LLVM preserves.
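As an illustration of relying on that ordering, the sketch below attributes records to functions using each function's record count and builds an index from ID to absolute instruction address. It assumes records appear grouped by function in the same order as the function entries, and that the function addresses have already been resolved in memory. ``index_records`` is a hypothetical name.

```python
def index_records(functions, records):
    """Map stack map ID -> list of absolute instruction addresses.

    functions: (address, stack_size, record_count) tuples, in section order.
    records:   (stack_map_id, instruction_offset) tuples, in section order.
    """
    index = {}
    remaining = iter(records)
    for address, _stack_size, record_count in functions:
        for _ in range(record_count):
            stack_map_id, offset = next(remaining)
            # LLVM does not check IDs for uniqueness, so keep a list per ID.
            index.setdefault(stack_map_id, []).append(address + offset)
    return index


functions = [(0x1000, 32, 2), (0x2000, 16, 1)]
records = [(77, 5), (78, 20), (77, 8)]
index = index_records(functions, records)
```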
- Note that this is quite different from the goal of debug information,
- which is a best-effort attempt to track the location of named
- variables at every instruction.
- An important motivation for this design is to allow a runtime to
- commandeer a stack frame when execution reaches an instruction address
- associated with a stack map. The runtime must be able to rebuild a
- stack frame and resume program execution using the information
- provided by the stack map. For example, execution may resume in an
- interpreter or a recompiled version of the same function.
- This usage restricts LLVM optimization. Clearly, LLVM must not move
- stores across a stack map. However, loads must also be handled
- conservatively. If the load may trigger an exception, hoisting it
- above a stack map could be invalid. For example, the runtime may
- determine that a load is safe to execute without a type check given
- the current state of the type system. If the type system changes while
- some activation of the load's function exists on the stack, the load
- becomes unsafe. The runtime can prevent subsequent execution of that
- load by immediately patching any stack map location that lies between
- the current call site and the load (typically, the runtime would
- simply patch all stack map locations to invalidate the function). If
- the compiler had hoisted the load above the stack map, then the
- program could crash before the runtime could take back control.
- To enforce these semantics, stackmap and patchpoint intrinsics are
- considered to potentially read and write all memory. This may limit
- optimization more than some clients desire. This limitation may be
- avoided by marking the call site as "readonly". In the future we may
- also allow meta-data to be added to the intrinsic call to express
- aliasing, thereby allowing optimizations to hoist certain loads above
- stack maps.
- Direct Stack Map Entries
- ^^^^^^^^^^^^^^^^^^^^^^^^
- As shown in :ref:`stackmap-format`, a Direct stack map location
- records the address of a frame index. This address is itself the value
- that the runtime requested. This differs from Indirect locations,
- which refer to stack locations from which the requested values must
- be loaded. Direct locations can communicate the address of an alloca,
- while Indirect locations handle register spills.
- while Indirect locations handle register spills.
- For example:
- .. code-block:: none
- entry:
- %a = alloca i64...
- llvm.experimental.stackmap(i64 <ID>, i32 <shadowBytes>, i64* %a)
- The runtime can determine this alloca's relative location on the
- stack immediately after compilation, or at any time thereafter. This
- differs from Register and Indirect locations, because the runtime can
- only read the values in those locations when execution reaches the
- instruction address of the stack map.
- This functionality requires LLVM to treat entry-block allocas
- specially when they are directly consumed by an intrinsic. (This is
- the same requirement imposed by the llvm.gcroot intrinsic.) LLVM
- transformations must not substitute the alloca with any intervening
- value. This can be verified by the runtime simply by checking that the
- stack map's location is a Direct location type.
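The difference between the location types can be summarized with a small resolver. This is a sketch under stated assumptions: ``registers`` is a mapping from DWARF register number to the register's value at the stack map address, ``read_memory`` is a caller-supplied callback, and the name ``resolve`` is hypothetical.

```python
def resolve(kind, regnum, offset, registers, constants, read_memory):
    """Return the value described by one stack map location."""
    if kind == "Register":
        return registers[regnum]
    if kind == "Direct":
        # The address itself is the requested value (e.g. an alloca).
        return registers[regnum] + offset
    if kind == "Indirect":
        # The requested value must be loaded from the stack slot.
        return read_memory(registers[regnum] + offset)
    if kind == "Constant":
        return offset
    if kind == "ConstIndex":
        return constants[offset]
    raise ValueError(f"unknown location type {kind!r}")


# Made-up machine state: DWARF register 6 is %rbp on x86-64.
registers = {6: 0x7FFF0000}
memory = {0x7FFF0000 - 16: 42}
direct = resolve("Direct", 6, -16, registers, [], memory.__getitem__)
indirect = resolve("Indirect", 6, -16, registers, [], memory.__getitem__)
```

Here ``direct`` is the slot's address and ``indirect`` is the value stored in that slot.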
- Supported Architectures
- =======================
- Support for StackMap generation and the related intrinsics requires
- some code for each backend. Today, only a subset of LLVM's backends
- are supported. The currently supported architectures are X86_64,
- PowerPC, and AArch64.