===============
ShadowCallStack
===============

.. contents::
   :local:

Introduction
============

ShadowCallStack is an instrumentation pass, currently only implemented for
aarch64, that protects programs against return address overwrites
(e.g. stack buffer overflows). It works by saving a function's return address
to a separately allocated 'shadow call stack' in the function prolog in
non-leaf functions and loading the return address from the shadow call stack
in the function epilog. The return address is also stored on the regular stack
for compatibility with unwinders, but is otherwise unused.
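
The prolog and epilog behaviour described above can be modelled in plain C.
This is only a rough sketch: the array ``scs_storage``, the pointer
``scs_top`` (standing in for the register that tracks the top of the shadow
call stack), and the helper names are all hypothetical and are not part of
the instrumentation or of any runtime.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: an array stands in for the shadow call stack and a
 * pointer stands in for the register that tracks its top. */
#define SCS_DEPTH 64
static uintptr_t scs_storage[SCS_DEPTH];
static uintptr_t *scs_top = scs_storage;

/* Prolog of a non-leaf function: save the return address. */
static void scs_push(uintptr_t return_address) {
    assert(scs_top < scs_storage + SCS_DEPTH);
    *scs_top++ = return_address;
}

/* Epilog: reload the return address from the shadow call stack. */
static uintptr_t scs_pop(void) {
    assert(scs_top > scs_storage);
    return *--scs_top;
}

/* Simulates one call. The return address is saved in both places, the copy
 * on the regular stack is overwritten, and the function still returns to
 * the address held on the shadow call stack. */
static uintptr_t simulate_call(uintptr_t return_address,
                               uintptr_t attacker_value) {
    uintptr_t regular_stack_slot = return_address; /* kept for unwinders */
    scs_push(return_address);
    regular_stack_slot = attacker_value;           /* simulated overwrite */
    (void)regular_stack_slot;                      /* never used to return */
    return scs_pop();
}
```

In this model, ``simulate_call(0x1000, 0xdead)`` yields ``0x1000``: the
overwritten slot on the regular stack is never consulted when returning.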

The aarch64 implementation is considered production ready, and
an `implementation of the runtime`_ has been added to Android's libc
(bionic). An x86_64 implementation was evaluated using Chromium and was found
to have critical performance and security deficiencies; it was removed in
LLVM 9.0. Details on the x86_64 implementation can be found in the
`Clang 7.0.1 documentation`_.

.. _`implementation of the runtime`: https://android.googlesource.com/platform/bionic/+/808d176e7e0dd727c7f929622ec017f6e065c582/libc/bionic/pthread_create.cpp#128
.. _`Clang 7.0.1 documentation`: https://releases.llvm.org/7.0.1/tools/clang/docs/ShadowCallStack.html

Comparison
----------

To optimize for memory consumption and cache locality, the shadow call
stack stores only an array of return addresses. This is in contrast to other
schemes, like :doc:`SafeStack`, that mirror the entire stack and trade off
consuming more memory for shorter function prologs and epilogs with fewer
memory accesses.

`Return Flow Guard`_ is a pure software implementation of shadow call stacks
on x86_64. Like the previous implementation of ShadowCallStack on x86_64, it is
inherently racy due to the architecture's use of the stack for calls and
returns.

Intel `Control-flow Enforcement Technology`_ (CET) is a proposed hardware
extension that would add native support to use a shadow stack to store/check
return addresses at call/return time. Being a hardware implementation, it
would not suffer from race conditions and would not incur the overhead of
function instrumentation, but it does require operating system support.

.. _`Return Flow Guard`: https://xlab.tencent.com/en/2016/11/02/return-flow-guard/
.. _`Control-flow Enforcement Technology`: https://software.intel.com/sites/default/files/managed/4d/2a/control-flow-enforcement-technology-preview.pdf

Compatibility
-------------

A runtime is not provided in compiler-rt, so one must be provided by the
compiled application or the operating system. Integrating the runtime into
the operating system should be preferred since otherwise all thread creation
and destruction would need to be intercepted by the application.

The instrumentation makes use of the platform register ``x18``. On some
platforms, ``x18`` is reserved, and on others, it is designated as a scratch
register. This generally means that any code that may run on the same thread
as code compiled with ShadowCallStack must either target one of the platforms
whose ABI reserves ``x18`` (currently Android, Darwin, Fuchsia and Windows)
or be compiled with the flag ``-ffixed-x18``. If absolutely necessary, code
compiled without ``-ffixed-x18`` may be run on the same thread as code that
uses ShadowCallStack by saving the register value temporarily on the stack
(`example in Android`_), but this should be done with care since it risks
leaking the shadow call stack address.

.. _`example in Android`: https://android-review.googlesource.com/c/platform/frameworks/base/+/803717

Because of the use of register ``x18``, the ShadowCallStack feature is
incompatible with any other feature that may use ``x18``. However, there
is no inherent reason why ShadowCallStack needs to use register ``x18``
specifically; in principle, a platform could choose to reserve and use another
register for ShadowCallStack, but this would be incompatible with the AAPCS64.

Special unwind information is required on functions that are compiled
with ShadowCallStack and that may be unwound, i.e. functions compiled with
``-fexceptions`` (which is the default in C++). Some unwinders (such as the
libgcc 4.9 unwinder) do not understand this unwind info and will segfault
when encountering it. LLVM libunwind processes this unwind info correctly,
however. This means that if exceptions are used together with ShadowCallStack,
the program must use a compatible unwinder.

Security
========

ShadowCallStack is intended to be a stronger alternative to
``-fstack-protector``. It protects against non-linear overflows and arbitrary
memory writes to the return address slot.

The instrumentation makes use of the ``x18`` register to reference the shadow
call stack, meaning that references to the shadow call stack do not have
to be stored in memory. This makes it possible to implement a runtime that
avoids exposing the address of the shadow call stack to attackers that can
read arbitrary memory. However, attackers could still try to exploit side
channels exposed by the operating system `[1]`_ `[2]`_ or processor `[3]`_
to discover the address of the shadow call stack.

.. _`[1]`: https://eyalitkin.wordpress.com/2017/09/01/cartography-lighting-up-the-shadows/
.. _`[2]`: https://www.blackhat.com/docs/eu-16/materials/eu-16-Goktas-Bypassing-Clangs-SafeStack.pdf
.. _`[3]`: https://www.vusec.net/projects/anc/

Unless care is taken when allocating the shadow call stack, it may be
possible for an attacker to guess its address using the addresses of
other allocations. Therefore, the address should be chosen to make this
difficult. One way to do this is to allocate a large guard region without
read/write permissions, randomly select a small region within it to be
used as the address of the shadow call stack and mark only that region as
read/write. This also provides some mitigation against processor side
channels. The intent is that the Android runtime `will do this`_, but the
platform will first need to be `changed`_ to avoid using
``setrlimit(RLIMIT_AS)`` to limit memory allocations in certain processes,
as this also limits the number of guard regions that can be allocated.

.. _`will do this`: https://android-review.googlesource.com/c/platform/bionic/+/891622
.. _`changed`: https://android-review.googlesource.com/c/platform/frameworks/av/+/837745
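
The guard-region approach described above might look roughly like the
following on a POSIX system. This is a sketch under stated assumptions, not
the Android implementation: the 16 MiB guard size, the one-page shadow call
stack, the use of ``/dev/urandom`` as the randomness source, and every name
in the block (``scs_allocate``, ``scs_release``, ``struct scs_allocation``)
are hypothetical.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1 /* exposes MAP_ANONYMOUS under strict -std= modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical size, chosen for illustration only. */
#define GUARD_REGION_SIZE ((size_t)16 << 20) /* 16 MiB with no access */

struct scs_allocation {
    void *guard_base; /* start of the inaccessible guard region */
    void *scs_base;   /* start of the read/write shadow call stack */
    size_t scs_size;  /* one page in this sketch */
};

/* Fills *value from /dev/urandom; returns 0 on success, -1 on failure. */
static int random_size(size_t *value) {
    FILE *f = fopen("/dev/urandom", "rb");
    if (f == NULL) return -1;
    size_t n = fread(value, sizeof *value, 1, f);
    fclose(f);
    return n == 1 ? 0 : -1;
}

/* Reserves the guard region, then makes one randomly chosen page-sized
 * slot inside it readable and writable. Returns 0 on success. */
static int scs_allocate(struct scs_allocation *out) {
    assert(out != NULL);
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || (size_t)page > GUARD_REGION_SIZE) return -1;
    size_t scs_size = (size_t)page;

    size_t r;
    if (random_size(&r) != 0) return -1;

    void *guard = mmap(NULL, GUARD_REGION_SIZE, PROT_NONE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (guard == MAP_FAILED) return -1;

    /* Slots are scs_size apart, so the chosen stack is aligned to its size. */
    size_t slot_count = GUARD_REGION_SIZE / scs_size;
    char *scs = (char *)guard + (r % slot_count) * scs_size;
    if (mprotect(scs, scs_size, PROT_READ | PROT_WRITE) != 0) {
        munmap(guard, GUARD_REGION_SIZE);
        return -1;
    }

    out->guard_base = guard;
    out->scs_base = scs;
    out->scs_size = scs_size;
    return 0;
}

/* Unmaps the whole guard region, including the shadow call stack. */
static void scs_release(const struct scs_allocation *a) {
    munmap(a->guard_base, GUARD_REGION_SIZE);
}
```

A real runtime would call something like ``scs_allocate`` at thread creation
and load ``scs_base`` into ``x18`` before running instrumented code. Placing
slots at multiples of the stack size keeps the stack aligned to its size.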

The runtime will need the address of the shadow call stack in order to
deallocate it when destroying the thread. If the entire program is compiled
with ``-ffixed-x18``, this is trivial: the address can be derived from the
value stored in ``x18`` (e.g. by masking out the lower bits). If a guard
region is used, the address of the start of the guard region could then be
stored at the start of the shadow call stack itself. But if it is possible
for code compiled without ``-ffixed-x18`` to run on a thread managed by the
runtime, which is the case on Android for example, the address must be stored
somewhere else instead. On Android we store the address of the start of the
guard region in TLS and deallocate the entire guard region including the
shadow call stack at thread exit. This is considered acceptable given that
the address of the start of the guard region is already somewhat guessable.
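
As a small illustration of deriving the address by masking out the lower
bits, the sketch below assumes a shadow call stack whose size is a power of
two and whose base is aligned to that size. The 8 KiB ``SCS_SIZE`` and the
function name are hypothetical, and the value of ``x18`` is passed in as a
plain integer so that the arithmetic can be shown on any platform.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical size; the stack is assumed to be aligned to this value. */
#define SCS_SIZE ((uintptr_t)8192)
static_assert((SCS_SIZE & (SCS_SIZE - 1)) == 0,
              "SCS_SIZE must be a power of two");

/* Recovers the base of the shadow call stack from a value of x18 that
 * points somewhere inside the stack. */
static uintptr_t scs_base_from_x18(uintptr_t x18) {
    return x18 & ~(SCS_SIZE - 1);
}
```

For example, an ``x18`` value of ``0x10002018`` maps back to the base
``0x10002000``. The result is only meaningful while ``x18`` points inside
the stack; a completely full stack would leave ``x18`` one byte past the
end, which masks to the following slot instead.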

One way in which the address of the shadow call stack could leak is in the
``jmp_buf`` data structure used by ``setjmp`` and ``longjmp``. The Android
runtime `avoids this`_ by only storing the low bits of ``x18`` in the
``jmp_buf``, which requires the address of the shadow call stack to be
aligned to its size.

.. _`avoids this`: https://android.googlesource.com/platform/bionic/+/808d176e7e0dd727c7f929622ec017f6e065c582/libc/arch-arm64/bionic/setjmp.S#49
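
The idea of storing only the low bits can be sketched in C as below. This is
an illustration of the arithmetic rather than the Android code, which is
written in assembly; ``SCS_SIZE``, ``SCS_MASK`` and both function names are
hypothetical, and ``x18`` is again modelled as a plain integer.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical size; the stack is assumed to be aligned to this value. */
#define SCS_SIZE ((uintptr_t)8192)
#define SCS_MASK (SCS_SIZE - 1)
static_assert((SCS_SIZE & SCS_MASK) == 0, "SCS_SIZE must be a power of two");

/* setjmp side: keep only the offset within the shadow call stack, so the
 * jmp_buf never holds the stack's full address. */
static uintptr_t scs_bits_for_jmp_buf(uintptr_t x18) {
    return x18 & SCS_MASK;
}

/* longjmp side: combine the high bits of the live x18 with the saved
 * offset. This works only because the stack is aligned to its size, so
 * every address inside it shares the same high bits. */
static uintptr_t scs_restore_x18(uintptr_t current_x18, uintptr_t saved_bits) {
    return (current_x18 & ~SCS_MASK) | (saved_bits & SCS_MASK);
}
```

With a stack based at ``0x10002000``, saving at ``0x10002040`` stores only
``0x40``; restoring while ``x18`` is ``0x10002100`` gives back ``0x10002040``.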

The architecture's call and return instructions (``bl`` and ``ret``) operate on
a register rather than the stack, which means that leaf functions are generally
protected from return address overwrites even without ShadowCallStack.

Usage
=====

To enable ShadowCallStack, pass the ``-fsanitize=shadow-call-stack`` flag on
both the compile and link command lines. On aarch64, you also need to pass
``-ffixed-x18`` unless your target already reserves ``x18``.

Low-level API
-------------

``__has_feature(shadow_call_stack)``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In some cases one may need to execute different code depending on whether
ShadowCallStack is enabled. The macro ``__has_feature(shadow_call_stack)`` can
be used for this purpose.

.. code-block:: c

    #if defined(__has_feature)
    #  if __has_feature(shadow_call_stack)
    // code that builds only under ShadowCallStack
    #  endif
    #endif

``__attribute__((no_sanitize("shadow-call-stack")))``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Use ``__attribute__((no_sanitize("shadow-call-stack")))`` on a function
declaration to specify that the shadow call stack instrumentation should not be
applied to that function, even if enabled globally.
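
A minimal sketch of how the attribute might be applied is shown below. The
macro ``NO_SHADOW_CALL_STACK`` and the functions ``identity`` and
``checked_increment`` are hypothetical names introduced for illustration.
The macro expands to nothing when ShadowCallStack is not enabled, so the
same source also builds without the feature.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical convenience macro: expands to the attribute only when the
 * translation unit is built with ShadowCallStack enabled. */
#if defined(__has_feature)
#  if __has_feature(shadow_call_stack)
#    define NO_SHADOW_CALL_STACK \
         __attribute__((no_sanitize("shadow-call-stack")))
#  endif
#endif
#ifndef NO_SHADOW_CALL_STACK
#  define NO_SHADOW_CALL_STACK
#endif

/* An ordinary function; instrumented as usual if it is not a leaf. */
static int identity(int x) {
    return x;
}

/* Calls another function, so it would normally be instrumented. The
 * attribute asks the compiler to skip the instrumentation for it. */
NO_SHADOW_CALL_STACK
static int checked_increment(int x) {
    assert(x < INT_MAX);
    return identity(x) + 1;
}
```

Whether the call to ``identity`` survives optimization is up to the compiler;
if it is inlined, ``checked_increment`` becomes a leaf function and would not
be instrumented in any case.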

Example
=======

The following example code:

.. code-block:: c++

    int foo() {
      return bar() + 1;
    }

generates the following aarch64 assembly when compiled with ``-O2``:

.. code-block:: none

    stp     x29, x30, [sp, #-16]!
    mov     x29, sp
    bl      bar
    add     w0, w0, #1
    ldp     x29, x30, [sp], #16
    ret

Adding ``-fsanitize=shadow-call-stack`` would output the following assembly:

.. code-block:: none

    str     x30, [x18], #8
    stp     x29, x30, [sp, #-16]!
    mov     x29, sp
    bl      bar
    add     w0, w0, #1
    ldp     x29, x30, [sp], #16
    ldr     x30, [x18, #-8]!
    ret
  166. ret