= Fuzzing =

== Introduction ==

This document describes the virtual-device fuzzing infrastructure in QEMU and
how to use it to implement additional fuzzers.

== Basics ==

Fuzzing operates by passing inputs to an entry point/target function. The
fuzzer tracks the code coverage triggered by the input. Based on these
findings, the fuzzer mutates the input and repeats the process.

To fuzz QEMU, we rely on libFuzzer. Unlike other fuzzers such as AFL, libFuzzer
is an _in-process_ fuzzer. For the developer, this means that it is their
responsibility to ensure that state is reset between fuzzing runs.
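
To illustrate what resetting state between runs means for an in-process
fuzzer, here is a minimal sketch of a libFuzzer entry point. It is not QEMU
code: device_regs and reset_state() are hypothetical stand-ins for device state
and its reset logic. Only LLVMFuzzerTestOneInput is a real libFuzzer name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for device state that survives across runs. */
static uint8_t device_regs[16];

/* Hypothetical helper: restore the state every input must start from. */
static void reset_state(void)
{
    memset(device_regs, 0, sizeof(device_regs));
}

/* libFuzzer calls this once per input, always inside the same process. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
    assert(data != NULL || size == 0);

    /* Interpret the input as (register index, value) byte pairs. */
    for (size_t i = 0; i + 1 < size; i += 2) {
        device_regs[data[i] % sizeof(device_regs)] = data[i + 1];
    }

    /* Without this call, writes from one input would leak into the next. */
    reset_state();
    return 0;
}
```

If the reset were omitted, a crash could depend on a sequence of earlier
inputs rather than on the one libFuzzer reports, which makes it hard to
reproduce.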

== Building the fuzzers ==

NOTE: If possible, build a 32-bit binary. When forking, the 32-bit fuzzer is
much faster, since the page-map has a smaller size. This is because
AddressSanitizer mmaps ~20TB of memory as part of its detection, which results
in a large page-map and a much slower fork().

To build the fuzzers, install a recent version of clang, then configure with
the command below (substitute the clang binaries with the version you
installed). Here, --enable-sanitizers is optional, but it allows us to reliably
detect bugs such as out-of-bounds accesses, use-after-frees, double-frees, etc.

    CC=clang-8 CXX=clang++-8 /path/to/configure --enable-fuzzing \
                                                --enable-sanitizers

Fuzz targets are built similarly to system/softmmu:

    make i386-softmmu/fuzz

This builds ./i386-softmmu/qemu-fuzz-i386

The first option to this command is: --fuzz-target=FUZZ_NAME
To list all of the available fuzzers, run qemu-fuzz-i386 with no arguments.

For example:

    ./i386-softmmu/qemu-fuzz-i386 --fuzz-target=virtio-scsi-fuzz

Internally, libFuzzer parses all arguments that do not begin with "--".
Information about these is available by passing -help=1

Now the only thing left to do is wait for the fuzzer to trigger potential
crashes.

== Useful libFuzzer flags ==

As mentioned above, libFuzzer accepts some arguments. Passing -help=1 will list
the available arguments. In particular, these arguments might be helpful:

$CORPUS_DIR/ : Specify a directory as the last argument to libFuzzer. libFuzzer
stores each "interesting" input in this corpus directory. The next time you run
libFuzzer, it will read all of the inputs from the corpus, and continue fuzzing
from there. You can also specify multiple directories. libFuzzer loads existing
inputs from all specified directories, but will only write new ones to the
first one specified.

-max_len=4096 : Specify the maximum byte-length of the inputs libFuzzer will
generate.

-close_fd_mask={1,2,3} : Close stdout (1), stderr (2), or both (3). Useful for
targets that trigger many debug/error messages, or create output on the serial
console.

-jobs=4 -workers=4 : These arguments configure libFuzzer to run 4 fuzzers in
parallel (4 fuzzing jobs in 4 worker processes). Alternatively, with only
-jobs=N, libFuzzer automatically spawns a number of workers less than or equal
to half the available CPU cores. Replace 4 with a number appropriate for your
machine. Make sure to specify a $CORPUS_DIR, which will allow the parallel
fuzzers to share information about the interesting inputs they find.

-use_value_profile=1 : For each comparison operation, libFuzzer computes
(caller_pc&4095) | (popcnt(Arg1 ^ Arg2) << 12) and places this in the coverage
table. Useful for targets with "magic" constants. If Arg1 came from the fuzzer's
input and Arg2 is a magic constant, then each time the Hamming distance
between Arg1 and Arg2 decreases, libFuzzer adds the input to the corpus.

-shrink=1 : Tries to make elements of the corpus "smaller". Might lead to
better coverage performance, depending on the target.

Note that libFuzzer's exact behavior will depend on the version of
clang and libFuzzer used to build the device fuzzers.
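
The -use_value_profile formula above can be written out in C to show why a
smaller Hamming distance yields a different coverage-table entry. This is only
a sketch of the arithmetic as stated in this document, not libFuzzer's source.
The function names are made up for illustration.

```c
#include <stdint.h>

/* Count the set bits in x (Kernighan's method, no compiler builtins). */
static unsigned popcount64(uint64_t x)
{
    unsigned n = 0;
    while (x) {
        x &= x - 1; /* clear the lowest set bit */
        n++;
    }
    return n;
}

/*
 * Hypothetical helper: the low 12 bits identify the comparison site, the
 * bits above them hold the Hamming distance between the two operands.
 */
static uint64_t value_profile_feature(uint64_t caller_pc,
                                      uint64_t arg1, uint64_t arg2)
{
    return (caller_pc & 4095) | ((uint64_t)popcount64(arg1 ^ arg2) << 12);
}
```

Each distinct Hamming distance at a given comparison site maps to a distinct
value, so an input that moves one bit closer to the magic constant registers
as new coverage.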

== Generating Coverage Reports ==

Code coverage is a crucial metric for evaluating a fuzzer's performance.
libFuzzer's output provides a "cov: " column that provides a total number of
unique blocks/edges covered. To examine coverage on a line-by-line basis we
can use Clang coverage:

 1. Configure libFuzzer to store a corpus of all interesting inputs (see
    CORPUS_DIR above)
 2. ./configure the QEMU build with:
        --enable-fuzzing \
        --extra-cflags="-fprofile-instr-generate -fcoverage-mapping"
 3. Re-run the fuzzer. Specify $CORPUS_DIR/* as an argument, telling libFuzzer
    to execute all of the inputs in $CORPUS_DIR and exit. Once the process
    exits, you should find a file, "default.profraw" in the working directory.
 4. Execute these commands to generate a detailed HTML coverage-report:
        llvm-profdata merge -output=default.profdata default.profraw
        llvm-cov show ./path/to/qemu-fuzz-i386 -instr-profile=default.profdata \
        --format html -output-dir=/path/to/output/report

== Adding a new fuzzer ==

Coverage over virtual devices can be improved by adding additional fuzzers.
Fuzzers are kept in tests/qtest/fuzz/ and should be added to
tests/qtest/fuzz/Makefile.include

Fuzzers can rely on both qtest and libqos to communicate with virtual devices.

1. Create a new source file. For example ``tests/qtest/fuzz/foo-device-fuzz.c``.

2. Write the fuzzing code using the libqtest/libqos API. See existing fuzzers
for reference.

3. Register the fuzzer in ``tests/qtest/fuzz/Makefile.include`` by appending the
corresponding object to fuzz-obj-y

Fuzzers can be more-or-less thought of as special qtest programs which can
modify the qtest commands and/or qtest command arguments based on inputs
provided by libFuzzer. libFuzzer passes a byte array and length. Commonly the
fuzzer loops over the byte-array interpreting it as a list of qtest commands,
addresses, or values.
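
The byte-array interpretation described above might look like the following
sketch. The record layout (one opcode byte, a two-byte little-endian address,
one value byte) and the names op_kind, struct op and decode_ops() are
assumptions made for this example. Existing fuzzers define their own formats.
To stay self-contained, the sketch only decodes the input. A real fuzzer would
dispatch each decoded operation to a qtest call such as qtest_outb() or
qtest_inb().

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical operation set; a real fuzzer chooses its own. */
enum op_kind { OP_WRITEB, OP_READB, OP_KIND_COUNT };

struct op {
    enum op_kind kind;
    uint16_t addr;
    uint8_t value;
};

/* Assumed encoding: 1 opcode byte, 2 address bytes, 1 value byte. */
#define OP_ENCODED_SIZE 4

/*
 * Decode up to max_ops operations from the fuzzer input. Returns the number
 * of operations decoded; a trailing partial record is ignored.
 */
static size_t decode_ops(const uint8_t *data, size_t size,
                         struct op *ops, size_t max_ops)
{
    size_t n = 0;

    while (size >= OP_ENCODED_SIZE && n < max_ops) {
        /* The modulo maps every possible byte to a valid operation. */
        ops[n].kind = (enum op_kind)(data[0] % OP_KIND_COUNT);
        ops[n].addr = (uint16_t)(data[1] | (data[2] << 8));
        ops[n].value = data[3];
        data += OP_ENCODED_SIZE;
        size -= OP_ENCODED_SIZE;
        n++;
    }
    return n;
}
```

Mapping every byte value to some valid operation keeps the fuzzer from
spending its mutations on inputs that the harness would reject outright.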

= Implementation Details =

== The Fuzzer's Lifecycle ==

The fuzzer has two entrypoints that libFuzzer calls. libFuzzer provides its
own main(), which performs some setup, and calls the entrypoints:

LLVMFuzzerInitialize: called prior to fuzzing. Used to initialize all of the
necessary state.

LLVMFuzzerTestOneInput: called for each fuzzing run. Processes the input and
resets the state at the end of each run.

In more detail:

LLVMFuzzerInitialize parses the arguments to the fuzzer (must start with two
dashes, so they are ignored by libFuzzer main()). Currently, the arguments
select the fuzz target. Then, the qtest client is initialized. If the target
requires qos, qgraph is set up and the QOM/LIBQOS modules are initialized.
Then the QGraph is walked and the QEMU cmd_line is determined and saved.

After this, the vl.c:qemu_main is called to set up the guest. There are
target-specific hooks that can be called before and after qemu_main, for
additional setup (e.g. PCI setup, or VM snapshotting).

LLVMFuzzerTestOneInput: Uses qtest/qos functions to act based on the fuzz
input. It is also responsible for manually calling the main loop/main_loop_wait
to ensure that bottom halves are executed, and for any cleanup required before
the next input.
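
The argument-parsing step of LLVMFuzzerInitialize can be sketched as follows.
This is a simplified illustration, not QEMU's implementation: the
selected_target variable is hypothetical, and the qtest, qgraph and guest
setup described above is omitted. The function signature is the one libFuzzer
expects.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical global holding the name given via --fuzz-target=. */
static const char *selected_target;

/* libFuzzer calls this once, before any input is executed. */
int LLVMFuzzerInitialize(int *argc, char ***argv)
{
    static const char prefix[] = "--fuzz-target=";
    const size_t prefix_len = sizeof(prefix) - 1;

    /* Skip argv[0]; single-dash arguments are left for libFuzzer. */
    for (int i = 1; i < *argc; i++) {
        const char *arg = (*argv)[i];
        if (strncmp(arg, prefix, prefix_len) == 0) {
            selected_target = arg + prefix_len;
        }
    }

    /* A real implementation would now set up qtest and start the guest. */
    return 0;
}
```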

Since the same process is reused for many fuzzing runs, QEMU state needs to
be reset at the end of each run. There are currently two implemented
options for resetting state:

1. Reboot the guest between runs.
   Pros: Straightforward and fast for simple fuzz targets.
   Cons: Depending on the device, does not reset all device state. If the
   device requires some initialization prior to being ready for fuzzing
   (common for QOS-based targets), this initialization needs to be done after
   each reboot.
   Example target: i440fx-qtest-reboot-fuzz
2. Run each test case in a separate forked process and copy the coverage
   information back to the parent. This is fairly similar to AFL's "deferred"
   fork-server mode [3].
   Pros: Relatively fast. Devices only need to be initialized once. No need
   to do slow reboots or vmloads.
   Cons: Not officially supported by libFuzzer. Does not work well for devices
   that rely on dedicated threads.
   Example target: virtio-net-fork-fuzz
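
The fork-based option can be illustrated with a small POSIX sketch. It shows
the general technique only (run the input in a child, pass coverage back
through shared memory) and is not how QEMU's fork fuzzers are implemented. The
coverage array, run_input() and run_input_forked() are hypothetical names.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define COV_SIZE 64

/* Hypothetical coverage counters kept by the parent process. */
static uint8_t coverage[COV_SIZE];

/* Stand-in for exercising a device: bumps one counter per input byte. */
static void run_input(const uint8_t *data, size_t size)
{
    for (size_t i = 0; i < size; i++) {
        coverage[data[i] % COV_SIZE]++;
    }
}

/* Run one input in a child process. Returns 0 on success, -1 on failure. */
static int run_input_forked(const uint8_t *data, size_t size)
{
    /* A shared mapping stays visible to both processes after fork(). */
    uint8_t *shared = mmap(NULL, COV_SIZE, PROT_READ | PROT_WRITE,
                           MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shared == MAP_FAILED) {
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        munmap(shared, COV_SIZE);
        return -1;
    }
    if (pid == 0) {
        /* Child: any state it dirties disappears when it exits. */
        run_input(data, size);
        memcpy(shared, coverage, COV_SIZE);
        _exit(0);
    }

    int status = 0;
    int ok = waitpid(pid, &status, 0) == pid &&
             WIFEXITED(status) && WEXITSTATUS(status) == 0;
    if (ok) {
        /* Parent: keep only the coverage; its own state is untouched. */
        memcpy(coverage, shared, COV_SIZE);
    }
    munmap(shared, COV_SIZE);
    return ok ? 0 : -1;
}
```

Because the parent never runs the input itself, no explicit reset is needed:
every child starts from the parent's already-initialized state.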