|
@@ -0,0 +1,150 @@
|
|
|
+===============
|
|
|
+ShadowCallStack
|
|
|
+===============
|
|
|
+
|
|
|
+.. contents::
|
|
|
+ :local:
|
|
|
+
|
|
|
+Introduction
|
|
|
+============
|
|
|
+
|
|
|
+ShadowCallStack is an **experimental** instrumentation pass, currently only
|
|
|
+implemented for x86_64, that protects programs against return address
|
|
|
+overwrites (e.g. stack buffer overflows). It works by saving a function's return
|
|
|
+address to a separately allocated 'shadow call stack' in the function prolog and
|
|
|
+checking the return address on the stack against the shadow call stack in the
|
|
|
+function epilog.
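The save-then-check idea described above can be modeled in plain C. This is only an illustrative sketch: the real instrumentation is emitted by the compiler as machine code, and the names ``shadow_push``, ``shadow_check_and_pop``, and ``SHADOW_DEPTH`` are hypothetical, not part of any ShadowCallStack API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model only: a fixed-size array stands in for the
   separately allocated shadow call stack. */
enum { SHADOW_DEPTH = 64 };

static uintptr_t shadow_stack[SHADOW_DEPTH];
static size_t shadow_top; /* number of return addresses currently saved */

/* Models the function prolog: save the return address. */
static void shadow_push(uintptr_t return_address) {
    assert(shadow_top < SHADOW_DEPTH);
    shadow_stack[shadow_top++] = return_address;
}

/* Models the function epilog: pop the saved copy and compare it with the
   return address found on the regular stack. Returns false on a mismatch,
   which is where instrumented code would trap. */
static bool shadow_check_and_pop(uintptr_t return_address_on_stack) {
    assert(shadow_top > 0);
    return shadow_stack[--shadow_top] == return_address_on_stack;
}
```

In this model, a call that pushes an address and later checks the same address succeeds, while a check against an overwritten address fails.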
|
|
|
+
|
|
|
+Comparison
|
|
|
+----------
|
|
|
+
|
|
|
+To optimize for memory consumption and cache locality, the shadow call stack
|
|
|
+stores an index followed by an array of return addresses. This is in contrast
|
|
|
+to other schemes, like :doc:`SafeStack`, that mirror the entire stack and
|
|
|
+trade off higher memory consumption for shorter function prologs and epilogs with
|
|
|
+fewer memory accesses. Similarly, `Return Flow Guard`_ consumes more memory with
|
|
|
+shorter function prologs and epilogs than ShadowCallStack but suffers from the
|
|
|
+same race conditions (see `Security`_). Intel `Control-flow Enforcement Technology`_
|
|
|
+(CET) is a proposed hardware extension that would add native support for
|
|
|
+using a shadow stack to store/check return addresses at call/return time. It
|
|
|
+would not suffer from race conditions at calls and returns and would not incur the
|
|
|
+overhead of function instrumentation, but it does require operating system
|
|
|
+support.
|
|
|
+
|
|
|
+.. _`Return Flow Guard`: https://xlab.tencent.com/en/2016/11/02/return-flow-guard/
|
|
|
+.. _`Control-flow Enforcement Technology`: https://software.intel.com/sites/default/files/managed/4d/2a/control-flow-enforcement-technology-preview.pdf
|
|
|
+
|
|
|
+Compatibility
|
|
|
+-------------
|
|
|
+
|
|
|
+ShadowCallStack currently only supports x86_64. A runtime is not currently
|
|
|
+provided in compiler-rt, so one must be provided by the compiled application.
|
|
|
+
|
|
|
+Security
|
|
|
+========
|
|
|
+
|
|
|
+ShadowCallStack is intended to be a stronger alternative to
|
|
|
+``-fstack-protector``. It protects from non-linear overflows and arbitrary
|
|
|
+memory writes to the return address slot; however, similarly to
|
|
|
+``-fstack-protector``, this protection suffers from race conditions because of
|
|
|
+the call-return semantics on x86_64. There is a short race between the call
|
|
|
+instruction and the first instruction in the function that reads the return
|
|
|
+address where an attacker could overwrite the return address and bypass
|
|
|
+ShadowCallStack. Similarly, there is a time-of-check-to-time-of-use race in the
|
|
|
+function epilog where an attacker could overwrite the return address after it
|
|
|
+has been checked and before it has been returned to. Modifying the call-return
|
|
|
+semantics to fix this on x86_64 would incur an unacceptable performance overhead
|
|
|
+due to return branch prediction.
|
|
|
+
|
|
|
+The instrumentation makes use of the ``gs`` segment register to reference the
|
|
|
+shadow call stack, meaning that references to the shadow call stack do not have
|
|
|
+to be stored in memory. This makes it possible to implement a runtime that
|
|
|
+avoids exposing the address of the shadow call stack to attackers that can read
|
|
|
+arbitrary memory. However, attackers could still try to exploit side channels
|
|
|
+exposed by the operating system `[1]`_ `[2]`_ or processor `[3]`_ to discover
|
|
|
+the address of the shadow call stack.
|
|
|
+
|
|
|
+.. _`[1]`: https://eyalitkin.wordpress.com/2017/09/01/cartography-lighting-up-the-shadows/
|
|
|
+.. _`[2]`: https://www.blackhat.com/docs/eu-16/materials/eu-16-Goktas-Bypassing-Clangs-SafeStack.pdf
|
|
|
+.. _`[3]`: https://www.vusec.net/projects/anc/
|
|
|
+
|
|
|
+Leaf functions are optimized to store the return address in a free register
|
|
|
+and avoid writing to the shadow call stack if a register is available. Very
|
|
|
+short leaf functions are uninstrumented if their execution is judged to be
|
|
|
+shorter than the race condition window intrinsic to the instrumentation.
|
|
|
+
|
|
|
+Usage
|
|
|
+=====
|
|
|
+
|
|
|
+To enable ShadowCallStack, pass the ``-fsanitize=shadow-call-stack`` flag
|
|
|
+to both compile and link command lines.
|
|
|
+
|
|
|
+Low-level API
|
|
|
+-------------
|
|
|
+
|
|
|
+``__has_feature(shadow_call_stack)``
|
|
|
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
+
|
|
|
+In some cases one may need to execute different code depending on whether
|
|
|
+ShadowCallStack is enabled. The macro ``__has_feature(shadow_call_stack)`` can
|
|
|
+be used for this purpose.
|
|
|
+
|
|
|
+.. code-block:: c
|
|
|
+
|
|
|
+ #if defined(__has_feature)
|
|
|
+ # if __has_feature(shadow_call_stack)
|
|
|
+ // code that builds only under ShadowCallStack
|
|
|
+ # endif
|
|
|
+ #endif
|
|
|
+
|
|
|
+``__attribute__((no_sanitize("shadow-call-stack")))``
|
|
|
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
+
|
|
|
+Use ``__attribute__((no_sanitize("shadow-call-stack")))`` on a function
|
|
|
+declaration to specify that the shadow call stack instrumentation should not be
|
|
|
+applied to that function, even if enabled globally.
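As a minimal sketch, the attribute can be wrapped in a macro so that the same source still builds when ShadowCallStack is disabled or unsupported. The macro name ``NO_SHADOW_CALL_STACK`` and the function ``add_one`` are hypothetical and chosen only for illustration.

```c
#include <assert.h>

/* Define the (hypothetical) macro only when the feature is enabled;
   otherwise let it expand to nothing. */
#if defined(__has_feature)
#  if __has_feature(shadow_call_stack)
#    define NO_SHADOW_CALL_STACK \
       __attribute__((no_sanitize("shadow-call-stack")))
#  endif
#endif
#ifndef NO_SHADOW_CALL_STACK
#  define NO_SHADOW_CALL_STACK
#endif

/* Hypothetical function that opts out of the instrumentation even when
   -fsanitize=shadow-call-stack is passed for the whole translation unit. */
NO_SHADOW_CALL_STACK
int add_one(int x) {
    return x + 1;
}
```

The function behaves identically either way; only the emitted prolog and epilog differ.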
|
|
|
+
|
|
|
+Example
|
|
|
+=======
|
|
|
+
|
|
|
+The following example code:
|
|
|
+
|
|
|
+.. code-block:: c++
|
|
|
+
|
|
|
+ int foo() {
|
|
|
+ return bar() + 1;
|
|
|
+ }
|
|
|
+
|
|
|
+Generates the following x86_64 assembly when compiled with ``-O2``:
|
|
|
+
|
|
|
+.. code-block:: gas
|
|
|
+
|
|
|
+ push %rax
|
|
|
+ callq bar
|
|
|
+ add $0x1,%eax
|
|
|
+ pop %rcx
|
|
|
+ retq
|
|
|
+
|
|
|
+Adding ``-fsanitize=shadow-call-stack`` would output the following:
|
|
|
+
|
|
|
+.. code-block:: gas
|
|
|
+
|
|
|
+ mov (%rsp),%r10
|
|
|
+ xor %r11,%r11
|
|
|
+ addq $0x8,%gs:(%r11)
|
|
|
+ mov %gs:(%r11),%r11
|
|
|
+ mov %r10,%gs:(%r11)
|
|
|
+ push %rax
|
|
|
+ callq bar
|
|
|
+ add $0x1,%eax
|
|
|
+ pop %rcx
|
|
|
+ xor %r11,%r11
|
|
|
+ mov %gs:(%r11),%r10
|
|
|
+ mov %gs:(%r10),%r10
|
|
|
+ subq $0x8,%gs:(%r11)
|
|
|
+ cmp %r10,(%rsp)
|
|
|
+ jne trap
|
|
|
+ retq
|
|
|
+
|
|
|
+ trap:
|
|
|
+ ud2
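The instrumented sequence above can be read as the following C model. It is a sketch under stated assumptions: an ordinary array stands in for the ``gs``-relative memory, the region size ``SCS_BYTES`` is made up, and the helper names ``scs_prolog`` and ``scs_epilog`` are hypothetical. Slot 0 holds the byte offset of the most recently saved return address, and the saved addresses follow it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { SCS_BYTES = 4096 }; /* assumed region size, for illustration only */

/* Stand-in for the gs-relative region: word 0 is the byte offset of the
   top entry, the remaining words are saved return addresses. */
static uint64_t scs_region[SCS_BYTES / sizeof(uint64_t)];

/* Models the prolog:
     addq $0x8,%gs:(%r11)   ; bump the offset
     mov  %gs:(%r11),%r11   ; load the new offset
     mov  %r10,%gs:(%r11)   ; store the return address at that offset */
static void scs_prolog(uint64_t return_address) {
    scs_region[0] += 8;
    uint64_t offset = scs_region[0];
    assert(offset < SCS_BYTES);
    scs_region[offset / 8] = return_address;
}

/* Models the epilog: load the saved address, decrement the offset, and
   compare against the return address on the regular stack. Returns false
   where the instrumented code would jump to the ud2 trap. */
static bool scs_epilog(uint64_t return_address_on_stack) {
    uint64_t offset = scs_region[0];
    assert(offset >= 8);
    uint64_t saved = scs_region[offset / 8];
    scs_region[0] -= 8;
    return saved == return_address_on_stack;
}
```

Nested calls push and pop in order, so each epilog compares against the address saved by its own prolog.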
|