================================
Fuzzing LLVM libraries and tools
================================

.. contents::
   :local:
   :depth: 2

Introduction
============

The LLVM tree includes a number of fuzzers for various components. These are
built on top of :doc:`LibFuzzer <LibFuzzer>`. In order to build and run these
fuzzers, see :ref:`building-fuzzers`.

Available Fuzzers
=================

clang-fuzzer
------------

A |generic fuzzer| that tries to compile textual input as C++ code. Some of the
bugs this fuzzer has reported are `on bugzilla`__ and `on OSS Fuzz's
tracker`__.

__ https://llvm.org/pr23057
__ https://bugs.chromium.org/p/oss-fuzz/issues/list?q=proj-llvm+clang-fuzzer

clang-proto-fuzzer
------------------

A |protobuf fuzzer| that compiles valid C++ programs generated from a protobuf
class that describes a subset of the C++ language.

This fuzzer accepts clang command line options after
``-ignore_remaining_args=1``. For example, the following command will fuzz
clang with a higher optimization level:

.. code-block:: shell

   % bin/clang-proto-fuzzer <corpus-dir> -ignore_remaining_args=1 -O3

clang-format-fuzzer
-------------------

A |generic fuzzer| that runs clang-format_ on C++ text fragments. Some of the
bugs this fuzzer has reported are `on bugzilla`__
and `on OSS Fuzz's tracker`__.

.. _clang-format: https://clang.llvm.org/docs/ClangFormat.html
__ https://llvm.org/pr23052
__ https://bugs.chromium.org/p/oss-fuzz/issues/list?q=proj-llvm+clang-format-fuzzer

llvm-as-fuzzer
--------------

A |generic fuzzer| that tries to parse text as :doc:`LLVM assembly <LangRef>`.
Some of the bugs this fuzzer has reported are `on bugzilla`__.

__ https://llvm.org/pr24639

llvm-dwarfdump-fuzzer
---------------------

A |generic fuzzer| that interprets inputs as object files and runs
:doc:`llvm-dwarfdump <CommandGuide/llvm-dwarfdump>` on them. Some of the bugs
this fuzzer has reported are `on OSS Fuzz's tracker`__.

__ https://bugs.chromium.org/p/oss-fuzz/issues/list?q=proj-llvm+llvm-dwarfdump-fuzzer

llvm-demangle-fuzzer
--------------------

A |generic fuzzer| for the Itanium demangler used in various LLVM tools. We've
fuzzed ``__cxa_demangle`` to death, so why not fuzz LLVM's implementation of
the same function?

llvm-isel-fuzzer
----------------

A |LLVM IR fuzzer| aimed at finding bugs in instruction selection.

This fuzzer accepts flags after ``-ignore_remaining_args=1``. The flags match
those of :doc:`llc <CommandGuide/llc>` and the triple is required. For example,
the following command would fuzz AArch64 with :doc:`GlobalISel`:

.. code-block:: shell

   % bin/llvm-isel-fuzzer <corpus-dir> -ignore_remaining_args=1 -mtriple aarch64 -global-isel -O0

Some flags can also be specified in the binary name itself in order to support
OSS Fuzz, which has trouble with required arguments. To do this, you can copy
or move ``llvm-isel-fuzzer`` to ``llvm-isel-fuzzer--x-y-z``, separating options
from the binary name using ``--``. The valid options are architecture names
(``aarch64``, ``x86_64``), optimization levels (``O0``, ``O2``), or specific
keywords, like ``gisel`` for enabling global instruction selection. In this
mode, the same example could be run like so:

.. code-block:: shell

   % bin/llvm-isel-fuzzer--aarch64-O0-gisel <corpus-dir>

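
The naming scheme can be illustrated with a small shell sketch. Note that
``decode_fuzzer_name`` is a hypothetical helper, not part of LLVM. The mapping
from keywords to flags is an assumption inferred from the two equivalent
commands above, and the real fuzzer may accept more keywords or reject unknown
ones.

.. code-block:: shell

   #!/bin/sh
   # Hypothetical helper (not part of LLVM): print the flags that a binary
   # name of the form <tool>--x-y-z is assumed to stand for.
   decode_fuzzer_name() {
       name=$1
       case $name in
           *--*) opts=${name#*--} ;;   # everything after the first "--"
           *)    return 0 ;;           # no options encoded in the name
       esac
       flags=""
       old_ifs=$IFS
       IFS=-                           # options are separated by single dashes
       set -f                          # no globbing while splitting
       for opt in $opts; do
           case $opt in
               gisel)  flags="$flags -global-isel" ;;
               O[0-3]) flags="$flags -$opt" ;;
               *)      flags="$flags -mtriple $opt" ;;  # assumed: architecture
           esac
       done
       set +f
       IFS=$old_ifs
       printf '%s\n' "${flags# }"
   }

   decode_fuzzer_name llvm-isel-fuzzer--aarch64-O0-gisel
   # prints: -mtriple aarch64 -O0 -global-isel

The printed flags are the same set as in the explicit command line shown
earlier, only in the order in which they appear in the binary name.
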
llvm-opt-fuzzer
---------------

A |LLVM IR fuzzer| aimed at finding bugs in optimization passes.

It receives an optimization pipeline and runs it on each fuzzer input.

The interface of this fuzzer almost directly mirrors ``llvm-isel-fuzzer``. Both
the ``mtriple`` and ``passes`` arguments are required. Passes are specified in a
format suitable for the new pass manager. You can find some documentation about
this format in the doxygen for ``PassBuilder::parsePassPipeline``.

.. code-block:: shell

   % bin/llvm-opt-fuzzer <corpus-dir> -ignore_remaining_args=1 -mtriple x86_64 -passes instcombine

As with ``llvm-isel-fuzzer``, the arguments for some predefined configurations
can be embedded directly into the binary file name:

.. code-block:: shell

   % bin/llvm-opt-fuzzer--x86_64-instcombine <corpus-dir>

llvm-mc-assemble-fuzzer
-----------------------

A |generic fuzzer| that fuzzes the MC layer's assemblers by treating inputs as
target specific assembly.

Note that this fuzzer has an unusual command line interface which is not fully
compatible with all of libFuzzer's features. Fuzzer arguments must be passed
after ``--fuzzer-args``, and any ``llc`` flags must use two dashes. For
example, to fuzz the AArch64 assembler you might use the following command:

.. code-block:: console

   llvm-mc-fuzzer --triple=aarch64-linux-gnu --fuzzer-args -max_len=4

This scheme will likely change in the future.

llvm-mc-disassemble-fuzzer
--------------------------

A |generic fuzzer| that fuzzes the MC layer's disassemblers by treating inputs
as assembled binary data.

Note that this fuzzer has an unusual command line interface which is not fully
compatible with all of libFuzzer's features. See the notes above about
``llvm-mc-assemble-fuzzer`` for details.

.. |generic fuzzer| replace:: :ref:`generic fuzzer <fuzzing-llvm-generic>`
.. |protobuf fuzzer|
   replace:: :ref:`libprotobuf-mutator based fuzzer <fuzzing-llvm-protobuf>`
.. |LLVM IR fuzzer|
   replace:: :ref:`structured LLVM IR fuzzer <fuzzing-llvm-ir>`

Mutators and Input Generators
=============================

The inputs for a fuzz target are generated via random mutations of a
:ref:`corpus <libfuzzer-corpus>`. There are a few options for the kinds of
mutations that a fuzzer in LLVM might want.

.. _fuzzing-llvm-generic:

Generic Random Fuzzing
----------------------

The most basic form of input mutation is to use the built-in mutators of
LibFuzzer. These simply treat the input corpus as a bag of bits and make random
mutations. This type of fuzzer is good for stressing the surface layers of a
program, and is good at testing things like lexers, parsers, or binary
protocols.

Some of the in-tree fuzzers that use this type of mutator are `clang-fuzzer`_,
`clang-format-fuzzer`_, `llvm-as-fuzzer`_, `llvm-dwarfdump-fuzzer`_,
`llvm-mc-assemble-fuzzer`_, and `llvm-mc-disassemble-fuzzer`_.

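
As a rough sketch of how one of these generic fuzzers might be run, the script
below seeds a corpus with one small ``.ll`` file and starts ``llvm-as-fuzzer``
on it. The ``bin/llvm-as-fuzzer`` path and the ``corpus-llvm-as`` directory
name are assumptions about the local build layout. ``-max_total_time`` is a
standard libFuzzer option, used here to stop the run after 60 seconds.

.. code-block:: shell

   #!/bin/sh
   # Assumed names: adjust the corpus directory and fuzzer path to your build.
   corpus_dir=corpus-llvm-as
   fuzzer=bin/llvm-as-fuzzer

   # Seed the corpus with one small input so mutation starts from
   # something the parser accepts.
   mkdir -p "$corpus_dir"
   cat > "$corpus_dir/seed.ll" <<'EOF'
   define i32 @f(i32 %x) {
     ret i32 %x
   }
   EOF

   # Run the fuzzer only if it has been built.
   if [ -x "$fuzzer" ]; then
       "$fuzzer" "$corpus_dir" -max_total_time=60
   else
       echo "fuzzer not built: $fuzzer" >&2
   fi

New inputs that reach additional coverage are written back into the corpus
directory, so later runs can start from them.
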
.. _fuzzing-llvm-protobuf:

Structured Fuzzing using ``libprotobuf-mutator``
------------------------------------------------

We can use libprotobuf-mutator_ in order to perform structured fuzzing and
stress deeper layers of programs. This works by defining a protobuf class that
translates arbitrary data into structurally interesting input. Specifically, we
use this to work with a subset of the C++ language and perform mutations that
produce valid C++ programs in order to exercise parts of clang that are more
interesting than parser error handling.

To build this kind of fuzzer you need `protobuf`_ and its dependencies
installed, and you need to specify some extra flags when configuring the build
with :doc:`CMake <CMake>`. For example, `clang-proto-fuzzer`_ can be enabled by
adding ``-DCLANG_ENABLE_PROTO_FUZZER=ON`` to the flags described in
:ref:`building-fuzzers`.

The only in-tree fuzzer that uses ``libprotobuf-mutator`` today is
`clang-proto-fuzzer`_.

.. _libprotobuf-mutator: https://github.com/google/libprotobuf-mutator
.. _protobuf: https://github.com/google/protobuf

.. _fuzzing-llvm-ir:

Structured Fuzzing of LLVM IR
-----------------------------

We also use a more direct form of structured fuzzing for fuzzers that take
:doc:`LLVM IR <LangRef>` as input. This is achieved through the ``FuzzMutate``
library, which was `discussed at EuroLLVM 2017`_.

The ``FuzzMutate`` library is used to structurally fuzz backends in
`llvm-isel-fuzzer`_.

.. _discussed at EuroLLVM 2017: https://www.youtube.com/watch?v=UBbQ_s6hNgg

Building and Running
====================

.. _building-fuzzers:

Configuring LLVM to Build Fuzzers
---------------------------------

Fuzzers will be built and linked to libFuzzer by default as long as you build
LLVM with sanitizer coverage enabled. You would typically also enable at least
one sanitizer to find bugs faster. The most common way to build the fuzzers is
by adding the following two flags to your CMake invocation:
``-DLLVM_USE_SANITIZER=Address -DLLVM_USE_SANITIZE_COVERAGE=On``.

.. note:: If you have ``compiler-rt`` checked out in an LLVM tree when building
          with sanitizers, you'll want to specify ``-DLLVM_BUILD_RUNTIME=Off``
          to avoid building the sanitizers themselves with sanitizers enabled.

.. note:: You may run into issues if you build with BFD ld, which is the
          default linker on many unix systems. These issues are being tracked
          in https://llvm.org/PR34636.

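
A full configure-and-build using the two flags above might look like the
following sketch. The source path (``../llvm``), the Ninja generator, the
choice of ``clang`` as the host compiler, and the ``llvm-as-fuzzer`` build
target name are all assumptions about the local setup, not requirements stated
in this document.

.. code-block:: shell

   # Assumed layout: run from an empty build directory that sits next to
   # the llvm/ source directory.
   cmake -G Ninja ../llvm \
       -DCMAKE_BUILD_TYPE=Release \
       -DCMAKE_C_COMPILER=clang \
       -DCMAKE_CXX_COMPILER=clang++ \
       -DLLVM_USE_SANITIZER=Address \
       -DLLVM_USE_SANITIZE_COVERAGE=On

   # Build one fuzzer; the target name is assumed to match the tool name.
   ninja llvm-as-fuzzer
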
Continuously Running and Finding Bugs
-------------------------------------

There used to be a public buildbot running LLVM fuzzers continuously, and while
this did find issues, it didn't have a very good way to report problems in an
actionable way. Because of this, we're moving towards using `OSS Fuzz`_ more
instead.

You can browse the `LLVM project issue list`_ for the bugs found by
`LLVM on OSS Fuzz`_. These are also mailed to the `llvm-bugs mailing
list`_.

.. _OSS Fuzz: https://github.com/google/oss-fuzz
.. _LLVM project issue list:
   https://bugs.chromium.org/p/oss-fuzz/issues/list?q=Proj-llvm
.. _LLVM on OSS Fuzz:
   https://github.com/google/oss-fuzz/blob/master/projects/llvm
.. _llvm-bugs mailing list:
   http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs

Utilities for Writing Fuzzers
=============================

There are some utilities available for writing fuzzers in LLVM.

Some helpers for handling the command line interface are available in
``include/llvm/FuzzMutate/FuzzerCLI.h``, including functions to parse command
line options in a consistent way and to implement standalone main functions so
your fuzzer can be built and tested when not built against libFuzzer.

There is also some handling of the CMake config for fuzzers: use the
``add_llvm_fuzzer`` function to set up fuzzer targets. It works similarly to
functions such as ``add_llvm_tool``, but it takes care of linking to LibFuzzer
when appropriate and can be passed the ``DUMMY_MAIN`` argument to enable
standalone testing.