===================================
Stack maps and patch points in LLVM
===================================

.. contents::
   :local:
   :depth: 2

Definitions
===========

In this document we refer to the "runtime" collectively as all
components that serve as the LLVM client, including the LLVM IR
generator, object code consumer, and code patcher.

A stack map records the location of ``live values`` at a particular
instruction address. These ``live values`` do not refer to all the
LLVM values live across the stack map. Instead, they are only the
values that the runtime requires to be live at this point. For
example, they may be the values the runtime will need to resume
program execution at that point independent of the compiled function
containing the stack map.

LLVM emits stack map data into the object code within a designated
:ref:`stackmap-section`. This stack map data contains a record for
each stack map. The record stores the stack map's instruction address
and contains an entry for each mapped value. Each entry encodes a
value's location as a register, stack offset, or constant.

A patch point is an instruction address at which space is reserved for
patching a new instruction sequence at run time. Patch points look
much like calls to LLVM. They take arguments that follow a calling
convention and may return a value. They also imply stack map
generation, which allows the runtime to locate the patchpoint and
find the location of ``live values`` at that point.

Motivation
==========

This functionality is currently experimental but is potentially useful
in a variety of settings, the most obvious being a runtime (JIT)
compiler. Example applications of the patchpoint intrinsics are
implementing an inline call cache for polymorphic method dispatch or
optimizing the retrieval of properties in dynamically typed languages
such as JavaScript.

The intrinsics documented here are currently used by the JavaScript
compiler within the open source WebKit project, see the `FTL JIT
<https://trac.webkit.org/wiki/FTLJIT>`_, but they are designed to be
used whenever stack maps or code patching are needed. Because the
intrinsics have experimental status, compatibility across LLVM
releases is not guaranteed.

The stack map functionality described in this document is separate
from the functionality described in
:ref:`stack-map`. `GCFunctionMetadata` provides the location of
pointers into a collected heap captured by the `GCRoot` intrinsic,
which can also be considered a "stack map". Unlike the stack maps
defined above, the `GCFunctionMetadata` stack map interface does not
provide a way to associate live register values of arbitrary type with
an instruction address, nor does it specify a format for the resulting
stack map. The stack maps described here could potentially provide
richer information to a garbage collecting runtime, but that usage
will not be discussed in this document.

Intrinsics
==========

The following two kinds of intrinsics can be used to implement stack
maps and patch points: ``llvm.experimental.stackmap`` and
``llvm.experimental.patchpoint``. Both kinds of intrinsics generate a
stack map record, and they both allow some form of code patching. They
can be used independently (i.e. ``llvm.experimental.patchpoint``
implicitly generates a stack map without the need for an additional
call to ``llvm.experimental.stackmap``). The choice of which to use
depends on whether it is necessary to reserve space for code patching
and whether any of the intrinsic arguments should be lowered according
to calling conventions. ``llvm.experimental.stackmap`` does not
reserve any space, nor does it expect any call arguments. If the
runtime patches code at the stack map's address, it will destructively
overwrite the program text. This is unlike
``llvm.experimental.patchpoint``, which reserves space for in-place
patching without overwriting surrounding code. The
``llvm.experimental.patchpoint`` intrinsic also lowers a specified
number of arguments according to its calling convention. This allows
patched code to make in-place function calls without marshaling.

Each instance of one of these intrinsics generates a stack map record
in the :ref:`stackmap-section`. The record includes an ID, allowing
the runtime to uniquely identify the stack map, and the offset within
the code from the beginning of the enclosing function.

'``llvm.experimental.stackmap``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:
"""""""

::

      declare void
        @llvm.experimental.stackmap(i64 <id>, i32 <numShadowBytes>, ...)

Overview:
"""""""""

The '``llvm.experimental.stackmap``' intrinsic records the location of
specified values in the stack map without generating any code.

Operands:
"""""""""

The first operand is an ID to be encoded within the stack map. The
second operand is the number of shadow bytes following the
intrinsic. The variable number of operands that follow are the ``live
values`` for which locations will be recorded in the stack map.

To use this intrinsic as a bare-bones stack map, with no code patching
support, the number of shadow bytes can be set to zero.

Semantics:
""""""""""

The stack map intrinsic generates no code in place, unless nops are
needed to cover its shadow (see below). However, its offset from
function entry is stored in the stack map. This is the relative
instruction address immediately following the instructions that
precede the stack map.

The stack map ID allows a runtime to locate the desired stack map
record. LLVM passes this ID through directly to the stack map
record without checking uniqueness.

LLVM guarantees a shadow of instructions following the stack map's
instruction offset during which neither the end of the basic block nor
another call to ``llvm.experimental.stackmap`` or
``llvm.experimental.patchpoint`` may occur. This allows the runtime to
patch the code at this point in response to an event triggered from
outside the code. The code for instructions following the stack map
may be emitted in the stack map's shadow, and these instructions may
be overwritten by destructive patching. Without shadow bytes, this
destructive patching could overwrite program text or data outside the
current function. We disallow overlapping stack map shadows so that
the runtime does not need to consider this corner case.

For example, a stack map with an 8 byte shadow:

.. code-block:: llvm

  call void @runtime()
  call void (i64, i32, ...)* @llvm.experimental.stackmap(i64 77, i32 8,
                                                         i64* %ptr)
  %val = load i64* %ptr
  %add = add i64 %val, 3
  ret i64 %add

May require one byte of nop-padding:

.. code-block:: none

  0x00 callq _runtime
  0x05 nop                <--- stack map address
  0x06 movq (%rdi), %rax
  0x07 addq $3, %rax
  0x0a popq %rdx
  0x0b ret                <---- end of 8-byte shadow

Now, if the runtime needs to invalidate the compiled code, it may
patch 8 bytes of code at the stack map's address as follows:

.. code-block:: none

  0x00 callq _runtime
  0x05 movl $0xffff, %eax <--- patched code at stack map address
  0x0a callq *%rax        <---- end of 8-byte shadow

This way, after the normal call to the runtime returns, the code will
execute a patched call to a special entry point that can rebuild a
stack frame from the values located by the stack map.

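
As a rough illustration of the invalidation patch above, the following
Python sketch encodes a 32-bit immediate move into the low half of
``%rax`` followed by an indirect call through ``%rax``, and rejects
the patch if it would not fit in the stack map's shadow. The helper
names ``build_invalidation_patch`` and ``apply_patch`` are
hypothetical, and a ``bytearray`` stands in for the code buffer; a
real runtime would write into executable memory and perform whatever
synchronization its platform requires.

.. code-block:: python

  import struct

  def build_invalidation_patch(entry_point, shadow_bytes):
      """Encode 'mov $entry_point, %eax; call *%rax' for x86-64."""
      if not 0 <= entry_point <= 0xFFFFFFFF:
          raise ValueError("entry point must fit in a 32-bit immediate")
      # B8 id: 32-bit immediate move into %eax (zero-extends into %rax).
      mov = b"\xb8" + struct.pack("<I", entry_point)
      # FF D0: indirect call through %rax.
      call = b"\xff\xd0"
      patch = mov + call
      if len(patch) > shadow_bytes:
          raise ValueError("patch does not fit in the stack map shadow")
      return patch

  def apply_patch(code, stackmap_offset, patch):
      """Destructively overwrite the bytes at the stack map's offset."""
      code[stackmap_offset:stackmap_offset + len(patch)] = patch

With the values from the example, ``build_invalidation_patch(0xffff, 8)``
produces a 7-byte sequence, which fits inside the 8-byte shadow.
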
'``llvm.experimental.patchpoint.*``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:
"""""""

::

      declare void
        @llvm.experimental.patchpoint.void(i64 <id>, i32 <numBytes>,
                                           i8* <target>, i32 <numArgs>, ...)
      declare i64
        @llvm.experimental.patchpoint.i64(i64 <id>, i32 <numBytes>,
                                          i8* <target>, i32 <numArgs>, ...)

Overview:
"""""""""

The '``llvm.experimental.patchpoint.*``' intrinsics create a function
call to the specified ``<target>`` and record the location of specified
values in the stack map.

Operands:
"""""""""

The first operand is an ID, the second operand is the number of bytes
reserved for the patchable region, the third operand is the target
address of a function (optionally null), and the fourth operand
specifies how many of the following variable operands are considered
function call arguments. The remaining variable number of operands are
the ``live values`` for which locations will be recorded in the stack
map.

Semantics:
""""""""""

The patch point intrinsic generates a stack map. It also emits a
function call to the address specified by ``<target>`` if the address
is not a constant null. The function call and its arguments are
lowered according to the calling convention specified at the
intrinsic's callsite. Variants of the intrinsic with non-void return
type also return a value according to calling convention.

On PowerPC, note that ``<target>`` must be the ABI function pointer for the
intended target of the indirect call. Specifically, when compiling for the
ELF V1 ABI, ``<target>`` is the function-descriptor address normally used as
the C/C++ function-pointer representation.

Requesting zero patch point arguments is valid. In this case, all
variable operands are handled just like
``llvm.experimental.stackmap``. The difference is that space will
still be reserved for patching, a call will be emitted, and a return
value is allowed.

The locations of the arguments are not normally recorded in the stack
map because they are already fixed by the calling convention. The
remaining ``live values`` will have their location recorded, which
could be a register, stack location, or constant. A special calling
convention has been introduced for use with stack maps, ``anyregcc``,
which forces the arguments to be loaded into registers but allows
those registers to be dynamically allocated. These argument registers
will have their register locations recorded in the stack map in
addition to the remaining ``live values``.

The patch point also emits nops to cover at least ``<numBytes>`` of
instruction encoding space. Hence, the client must ensure that
``<numBytes>`` is enough to encode a call to the target address on the
supported targets. If the call target is constant null, then there is
no minimum requirement. A zero-byte null target patchpoint is
valid.

The runtime may patch the code emitted for the patch point, including
the call sequence and nops. However, the runtime may not assume
anything about the code LLVM emits within the reserved space. Partial
patching is not allowed. The runtime must patch all reserved bytes,
padding with nops if necessary.

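
The "patch all reserved bytes" rule can be sketched as a small helper
that always rewrites the whole reserved region and fills the unused
tail with nops. This is only an illustration: the function name
``patch_reserved_region`` is hypothetical, a ``bytearray`` stands in
for the code buffer, and the single-byte ``0x90`` nop is specific to
x86.

.. code-block:: python

  X86_NOP = 0x90  # single-byte nop on x86; other targets differ

  def patch_reserved_region(code, offset, num_bytes, new_code):
      """Overwrite all num_bytes reserved at offset with new_code plus nops."""
      if len(new_code) > num_bytes:
          raise ValueError("replacement is larger than the reserved space")
      if offset < 0 or offset + num_bytes > len(code):
          raise ValueError("reserved region lies outside the code buffer")
      # Never leave part of LLVM's original sequence behind: pad the
      # replacement so that every reserved byte is rewritten.
      padding = bytes([X86_NOP]) * (num_bytes - len(new_code))
      code[offset:offset + num_bytes] = new_code + padding

The helper rejects a replacement longer than ``num_bytes`` instead of
spilling into the instructions that follow the patch point.
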
This example shows a patch point reserving 15 bytes, with one argument
in %rdi, and a return value in %rax per native calling convention:

.. code-block:: llvm

  %target = inttoptr i64 -281474976710654 to i8*
  %val = call i64 (i64, i32, i8*, i32, ...)*
           @llvm.experimental.patchpoint.i64(i64 78, i32 15,
                                             i8* %target, i32 1, i64* %ptr)
  %add = add i64 %val, 3
  ret i64 %add

May generate:

.. code-block:: none

  0x00 movabsq $0xffff000000000002, %r11 <--- patch point address
  0x0a callq   *%r11
  0x0d nop
  0x0e nop                               <--- end of reserved 15 bytes
  0x0f addq    $0x3, %rax
  0x10 movq    %rax, 8(%rsp)

Note that no stack map locations will be recorded. If the patched code
sequence does not need arguments fixed to specific calling convention
registers, then the ``anyregcc`` convention may be used:

.. code-block:: llvm

  %val = call anyregcc i64 (i64, i32, i8*, i32, ...)*
           @llvm.experimental.patchpoint.i64(i64 78, i32 15,
                                             i8* %target, i32 1,
                                             i64* %ptr)

The stack map now indicates the location of the %ptr argument and
return value:

.. code-block:: none

  Stack Map: ID=78, Loc0=%r9 Loc1=%r8

The patch code sequence may now use the argument that happened to be
allocated in %r8 and return a value allocated in %r9:

.. code-block:: none

  0x00 movslq 4(%r8), %r9            <--- patched code at patch point address
  0x03 nop
  ...
  0x0e nop                           <--- end of reserved 15 bytes
  0x0f addq   $0x3, %r9
  0x10 movq   %r9, 8(%rsp)

.. _stackmap-format:

Stack Map Format
================

The existence of a stack map or patch point intrinsic within an LLVM
Module forces code emission to create a :ref:`stackmap-section`. The
format of this section follows:

.. code-block:: none

  Header {
    uint8  : Stack Map Version (current version is 3)
    uint8  : Reserved (expected to be 0)
    uint16 : Reserved (expected to be 0)
  }
  uint32 : NumFunctions
  uint32 : NumConstants
  uint32 : NumRecords
  StkSizeRecord[NumFunctions] {
    uint64 : Function Address
    uint64 : Stack Size
    uint64 : Record Count
  }
  Constants[NumConstants] {
    uint64 : LargeConstant
  }
  StkMapRecord[NumRecords] {
    uint64 : PatchPoint ID
    uint32 : Instruction Offset
    uint16 : Reserved (record flags)
    uint16 : NumLocations
    Location[NumLocations] {
      uint8  : Register | Direct | Indirect | Constant | ConstantIndex
      uint8  : Reserved (expected to be 0)
      uint16 : Location Size
      uint16 : Dwarf RegNum
      uint16 : Reserved (expected to be 0)
      int32  : Offset or SmallConstant
    }
    uint32 : Padding (only if required to align to 8 byte)
    uint16 : Padding
    uint16 : NumLiveOuts
    LiveOuts[NumLiveOuts] {
      uint16 : Dwarf RegNum
      uint8  : Reserved
      uint8  : Size in Bytes
    }
    uint32 : Padding (only if required to align to 8 byte)
  }

The first byte of each location encodes a type that indicates how to
interpret the ``RegNum`` and ``Offset`` fields as follows:

======== ========== =================== ===========================
Encoding Type       Value               Description
-------- ---------- ------------------- ---------------------------
0x1      Register   Reg                 Value in a register
0x2      Direct     Reg + Offset        Frame index value
0x3      Indirect   [Reg + Offset]      Spilled value
0x4      Constant   Offset              Small constant
0x5      ConstIndex Constants[Offset]   Large constant
======== ========== =================== ===========================

In the common case, a value is available in a register, and the
``Offset`` field will be zero. Values spilled to the stack are encoded
as ``Indirect`` locations. The runtime must load those values from a
stack address, typically in the form ``[BP + Offset]``. If an
``alloca`` value is passed directly to a stack map intrinsic, then
LLVM may fold the frame index into the stack map as an optimization to
avoid allocating a register or stack slot. These frame indices will be
encoded as ``Direct`` locations in the form ``BP + Offset``. LLVM may
also optimize constants by emitting them directly in the stack map,
either in the ``Offset`` of a ``Constant`` location or in the constant
pool, referred to by ``ConstantIndex`` locations.

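
The layout above can be read with a short parser. The following Python
sketch is a minimal illustration, not a reference implementation: it
assumes a version 3 section, little-endian byte order (as on x86-64),
and a complete, well-formed section held in a ``bytes`` object. The
function name ``parse_stackmap_section`` and the shape of the returned
dictionaries are choices made for this example only.

.. code-block:: python

  import struct

  LOCATION_KINDS = {1: "Register", 2: "Direct", 3: "Indirect",
                    4: "Constant", 5: "ConstIndex"}

  def align8(pos):
      """Round pos up to the next multiple of 8."""
      return (pos + 7) & ~7

  def parse_stackmap_section(data):
      version, _, _ = struct.unpack_from("<BBH", data, 0)
      if version != 3:
          raise ValueError("unsupported stack map version %d" % version)
      num_functions, num_constants, num_records = struct.unpack_from(
          "<III", data, 4)
      pos = 16

      functions = []  # (function address, stack size, record count)
      for _ in range(num_functions):
          functions.append(struct.unpack_from("<QQQ", data, pos))
          pos += 24

      constants = list(struct.unpack_from("<%dQ" % num_constants, data, pos))
      pos += 8 * num_constants

      records = []
      for _ in range(num_records):
          rec_id, offset, _flags, num_locations = struct.unpack_from(
              "<QIHH", data, pos)
          pos += 16
          locations = []  # (kind, size, DWARF register, offset or constant)
          for _ in range(num_locations):
              kind, _, size, regnum, _, value = struct.unpack_from(
                  "<BBHHHi", data, pos)
              pos += 12
              locations.append((LOCATION_KINDS[kind], size, regnum, value))
          pos = align8(pos)
          _padding, num_live_outs = struct.unpack_from("<HH", data, pos)
          pos += 4
          live_outs = []  # (DWARF register, size in bytes)
          for _ in range(num_live_outs):
              regnum, _, size = struct.unpack_from("<HBB", data, pos)
              pos += 4
              live_outs.append((regnum, size))
          pos = align8(pos)
          records.append({"id": rec_id, "offset": offset,
                          "locations": locations, "live_outs": live_outs})

      return {"functions": functions, "constants": constants,
              "records": records}

The two ``align8`` calls correspond to the two optional ``Padding``
fields in the record layout.
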
At each callsite, a "liveout" register list is also recorded. These
are the registers that are live across the stackmap and therefore must
be saved by the runtime. This is an important optimization when the
patchpoint intrinsic is used with a calling convention that by default
preserves most registers as callee-save.

Each entry in the liveout register list contains a DWARF register
number and size in bytes. The stackmap format deliberately omits
specific subregister information. Instead the runtime must interpret
this information conservatively. For example, if the stackmap reports
one byte at ``%rax``, then the value may be in either ``%al`` or
``%ah``. It doesn't matter in practice, because the runtime will
simply save ``%rax``. However, if the stackmap reports 16 bytes at
``%ymm0``, then the runtime can safely optimize by saving only
``%xmm0``.

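
One way a runtime might apply this conservative interpretation on
x86-64 is sketched below. The DWARF numbering (0 to 15 for the general
purpose registers, 17 to 32 for ``%xmm0`` to ``%xmm15``, shared with
the ``%ymm`` registers) follows the System V x86-64 ABI; the function
names and the policy of saving either 16 or 32 bytes of a vector
register are assumptions for illustration, and wider vector registers
are not handled.

.. code-block:: python

  GENERAL_PURPOSE = range(0, 16)   # %rax ... %r15
  VECTOR = range(17, 33)           # %xmm0 ... %xmm15 (also %ymm0 ... %ymm15)

  def save_width(dwarf_reg, size_in_bytes):
      """Return how many bytes of the register the runtime should save."""
      if dwarf_reg in GENERAL_PURPOSE:
          # Subregister information is omitted, so save the full register.
          return 8
      if dwarf_reg in VECTOR:
          # 16 bytes or fewer only needs the %xmm part of the register.
          return 16 if size_in_bytes <= 16 else 32
      raise ValueError("unhandled DWARF register %d" % dwarf_reg)

  def plan_saves(live_outs):
      """Map each live-out DWARF register to the widest save it needs."""
      plan = {}
      for dwarf_reg, size in live_outs:
          width = save_width(dwarf_reg, size)
          plan[dwarf_reg] = max(plan.get(dwarf_reg, 0), width)
      return plan
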
The stack map format is a contract between an LLVM SVN revision and
the runtime. It is currently experimental and may change in the short
term, but minimizing the need to update the runtime is
important. Consequently, the stack map design is motivated by
simplicity and extensibility. Compactness of the representation is
secondary because the runtime is expected to parse the data
immediately after compiling a module and encode the information in its
own format. Since the runtime controls the allocation of sections, it
can reuse the same stack map space for multiple modules.

Stackmap support is currently only implemented for 64-bit
platforms. However, a 32-bit implementation should be able to use the
same format with an insignificant amount of wasted space.

.. _stackmap-section:

Stack Map Section
^^^^^^^^^^^^^^^^^

A JIT compiler can easily access this section by providing its own
memory manager via the LLVM C API
``LLVMCreateSimpleMCJITMemoryManager()``. When creating the memory
manager, the JIT provides a callback:
``LLVMMemoryManagerAllocateDataSectionCallback()``. When LLVM creates
this section, it invokes the callback and passes the section name. The
JIT can record the in-memory address of the section at this time and
later parse it to recover the stack map data.

For MachO (e.g. on Darwin), the stack map section name is
"__llvm_stackmaps". The segment name is "__LLVM_STACKMAPS".

For ELF (e.g. on Linux), the stack map section name is
".llvm_stackmaps". The segment name is "__LLVM_STACKMAPS".

Stack Map Usage
===============

The stack map support described in this document can be used to
precisely determine the location of values at a specific position in
the code. LLVM does not maintain any mapping between those values and
any higher-level entity. The runtime must be able to interpret the
stack map record given only the ID, offset, and the order of the
locations, records, and functions, which LLVM preserves.

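
Because only IDs, offsets, and ordering are available, a runtime
typically rebuilds its own index after compilation. The sketch below
assumes that records appear grouped by function, in the same order as
the function entries, with each function's ``Record Count`` giving the
size of its group; that reading of the format, the function name
``records_by_address``, and the tuple shapes are assumptions made for
this illustration.

.. code-block:: python

  def records_by_address(functions, records):
      """Pair each record ID with its absolute instruction address.

      functions: list of (function address, stack size, record count)
      records:   list of (id, instruction offset) in section order
      """
      expected = sum(count for _, _, count in functions)
      if expected != len(records):
          raise ValueError("record counts do not match the number of records")
      result = []
      remaining = iter(records)
      for address, _stack_size, count in functions:
          for _ in range(count):
              rec_id, offset = next(remaining)
              # The offset is relative to the start of the enclosing function.
              result.append((rec_id, address + offset))
      return result

Since LLVM does not check IDs for uniqueness, the result is kept as a
list of pairs instead of a dictionary keyed by ID.
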
Note that this is quite different from the goal of debug information,
which is a best-effort attempt to track the location of named
variables at every instruction.

An important motivation for this design is to allow a runtime to
commandeer a stack frame when execution reaches an instruction address
associated with a stack map. The runtime must be able to rebuild a
stack frame and resume program execution using the information
provided by the stack map. For example, execution may resume in an
interpreter or a recompiled version of the same function.

This usage restricts LLVM optimization. Clearly, LLVM must not move
stores across a stack map. However, loads must also be handled
conservatively. If the load may trigger an exception, hoisting it
above a stack map could be invalid. For example, the runtime may
determine that a load is safe to execute without a type check given
the current state of the type system. If the type system changes while
some activation of the load's function exists on the stack, the load
becomes unsafe. The runtime can prevent subsequent execution of that
load by immediately patching any stack map location that lies between
the current call site and the load (typically, the runtime would
simply patch all stack map locations to invalidate the function). If
the compiler had hoisted the load above the stack map, then the
program could crash before the runtime could take back control.

To enforce these semantics, stackmap and patchpoint intrinsics are
considered to potentially read and write all memory. This may limit
optimization more than some clients desire. This limitation may be
avoided by marking the call site as "readonly". In the future we may
also allow metadata to be added to the intrinsic call to express
aliasing, thereby allowing optimizations to hoist certain loads above
stack maps.

Direct Stack Map Entries
^^^^^^^^^^^^^^^^^^^^^^^^

As shown in :ref:`stackmap-section`, a Direct stack map location
records the address of a frame index. This address is itself the value
that the runtime requested. This differs from Indirect locations,
which refer to stack locations from which the requested values must
be loaded. Direct locations can communicate the address of an alloca,
while Indirect locations handle register spills.

For example:

.. code-block:: none

  entry:
    %a = alloca i64...
    llvm.experimental.stackmap(i64 <ID>, i32 <shadowBytes>, i64* %a)

The runtime can determine this alloca's relative location on the
stack immediately after compilation, or at any time thereafter. This
differs from Register and Indirect locations, because the runtime can
only read the values in those locations when execution reaches the
instruction address of the stack map.

This functionality requires LLVM to treat entry-block allocas
specially when they are directly consumed by an intrinsic. (This is
the same requirement imposed by the llvm.gcroot intrinsic.) LLVM
transformations must not substitute the alloca with any intervening
value. This can be verified by the runtime simply by checking that the
stack map's location is a Direct location type.

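
The difference between the location types can be made concrete with a
small resolver. In this sketch the register file is a dictionary keyed
by DWARF register number and memory is read through a caller-supplied
function; both, along with the name ``resolve_location``, are
hypothetical stand-ins for whatever the runtime actually uses to
inspect a stopped frame.

.. code-block:: python

  def resolve_location(kind, regnum, offset, regs, read_memory, constants):
      """Return the value described by a single stack map location."""
      if kind == "Register":
          return regs[regnum]
      if kind == "Direct":
          # The address itself is the requested value (e.g. an alloca).
          return regs[regnum] + offset
      if kind == "Indirect":
          # The value was spilled and must be loaded from the stack.
          return read_memory(regs[regnum] + offset)
      if kind == "Constant":
          return offset
      if kind == "ConstIndex":
          return constants[offset]
      raise ValueError("unknown location kind %r" % (kind,))

For a Direct location only the frame register's value is needed to
compute the result, whereas an Indirect location additionally requires
reading the frame's memory while the stack map's address is live.
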
Supported Architectures
=======================

Support for StackMap generation and the related intrinsics requires
some code for each backend. Today, only a subset of LLVM's backends
are supported. The currently supported architectures are X86_64,
PowerPC, and AArch64.