- = Fuzzing =
- == Introduction ==
- This document describes the virtual-device fuzzing infrastructure in QEMU and
- how to use it to implement additional fuzzers.
- == Basics ==
- Fuzzing operates by passing inputs to an entry point/target function. The
- fuzzer tracks the code coverage triggered by the input. Based on these
- findings, the fuzzer mutates the input and repeats the fuzzing.
- To fuzz QEMU, we rely on libFuzzer. Unlike other fuzzers such as AFL, libFuzzer
- is an _in-process_ fuzzer. For the developer, this means that it is their
- responsibility to ensure that state is reset between fuzzing runs.
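To make the state-reset requirement concrete, here is a minimal sketch of an in-process fuzz entry point. The `toy_device` structure and `toy_device_reset()` are hypothetical stand-ins, not QEMU code; only the `LLVMFuzzerTestOneInput` signature comes from libFuzzer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical device state; stands in for whatever the target mutates. */
struct toy_device {
    uint8_t regs[4];
    unsigned writes;
};

static struct toy_device dev;

/* Restore the state that every run is expected to start from. */
static void toy_device_reset(void)
{
    memset(&dev, 0, sizeof(dev));
}

/* libFuzzer calls this once per input, always in the same process. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
    /* Treat the input as (register, value) pairs. */
    for (size_t i = 0; i + 1 < size; i += 2) {
        dev.regs[data[i] % sizeof(dev.regs)] = data[i + 1];
        dev.writes++;
    }

    /* Without this, the next input would start from a dirty device. */
    toy_device_reset();
    return 0;
}
```

If the reset were omitted, a crash could depend on the whole sequence of earlier inputs rather than on the input libFuzzer reports, which makes it hard to reproduce.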
- == Building the fuzzers ==
- NOTE: If possible, build a 32-bit binary. When forking, the 32-bit fuzzer is
- much faster, since the page-map has a smaller size. This is due to the fact that
- AddressSanitizer mmaps ~20TB of memory, as part of its detection. This results
- in a large page-map, and a much slower fork().
- To build the fuzzers, install a recent version of clang, then configure QEMU as
- shown below (substitute the clang binaries with the version you installed).
- --enable-sanitizers is optional, but it allows us to reliably detect bugs
- such as out-of-bounds accesses, use-after-frees, double-frees etc.
- CC=clang-8 CXX=clang++-8 /path/to/configure --enable-fuzzing \
- --enable-sanitizers
- Fuzz targets are built similarly to system/softmmu:
- make i386-softmmu/fuzz
- This builds ./i386-softmmu/qemu-fuzz-i386
- The first option to this command is: --fuzz-target=FUZZ_NAME
- To list all of the available fuzzers run qemu-fuzz-i386 with no arguments.
- For example:
- ./i386-softmmu/qemu-fuzz-i386 --fuzz-target=virtio-scsi-fuzz
- Internally, libfuzzer parses all arguments that do not begin with "--".
- Information about these is available by passing -help=1
- Now the only thing left to do is wait for the fuzzer to trigger potential
- crashes.
- == Useful libFuzzer flags ==
- As mentioned above, libFuzzer accepts some arguments. Passing -help=1 will list
- the available arguments. In particular, these arguments might be helpful:
- $CORPUS_DIR/ : Specify a directory as the last argument to libFuzzer. libFuzzer
- stores each "interesting" input in this corpus directory. The next time you run
- libFuzzer, it will read all of the inputs from the corpus, and continue fuzzing
- from there. You can also specify multiple directories. libFuzzer loads existing
- inputs from all specified directories, but will only write new ones to the
- first one specified.
- -max_len=4096 : specify the maximum byte-length of the inputs libFuzzer will
- generate.
- -close_fd_mask={1,2,3} : close stdout (1), stderr (2), or both (3). Useful for
- targets that trigger many debug/error messages, or create output on the serial
- console.
- -jobs=4 -workers=4 : These arguments configure libFuzzer to run 4 fuzzers in
- parallel (4 fuzzing jobs in 4 worker processes). Alternatively, with only
- -jobs=N, libFuzzer automatically spawns a number of workers less than or equal
- to half the available CPU cores. Replace 4 with a number appropriate for your
- machine. Make sure to specify a $CORPUS_DIR, which will allow the parallel
- fuzzers to share information about the interesting inputs they find.
- -use_value_profile=1 : For each comparison operation, libFuzzer computes
- (caller_pc&4095) | (popcnt(Arg1 ^ Arg2) << 12) and places this in the coverage
- table. Useful for targets with "magic" constants. If Arg1 came from the fuzzer's
- input and Arg2 is a magic constant, then each time the Hamming distance
- between Arg1 and Arg2 decreases, libFuzzer adds the input to the corpus.
- -shrink=1 : Tries to make elements of the corpus "smaller". Might lead to
- better coverage performance, depending on the target.
- Note that libFuzzer's exact behavior will depend on the version of
- clang and libFuzzer used to build the device fuzzers.
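The -use_value_profile formula quoted above can be written out as a small function. This is only a sketch of the expression as stated in this document, not libFuzzer's own code; the function name is hypothetical and `__builtin_popcountll` is a GCC/Clang builtin.

```c
#include <stdint.h>

/*
 * Index computed for one comparison, following the formula quoted above:
 *   (caller_pc & 4095) | (popcnt(arg1 ^ arg2) << 12)
 */
static uint64_t value_profile_index(uint64_t caller_pc,
                                    uint64_t arg1, uint64_t arg2)
{
    /* Hamming distance: number of bit positions where the values differ. */
    uint64_t distance = (uint64_t)__builtin_popcountll(arg1 ^ arg2);

    return (caller_pc & 4095) | (distance << 12);
}
```

For the same comparison site, an input that gets one bit closer to the magic constant produces a different index, so it shows up as a new entry in the coverage table and the input is kept.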
- == Generating Coverage Reports ==
- Code coverage is a crucial metric for evaluating a fuzzer's performance.
- libFuzzer's output includes a "cov: " column that reports the total number of
- unique blocks/edges covered. To examine coverage on a line-by-line basis we
- can use Clang coverage:
- 1. Configure libFuzzer to store a corpus of all interesting inputs (see
- CORPUS_DIR above)
- 2. ./configure the QEMU build with:
- --enable-fuzzing \
- --extra-cflags="-fprofile-instr-generate -fcoverage-mapping"
- 3. Re-run the fuzzer. Specify $CORPUS_DIR/* as an argument, telling libfuzzer
- to execute all of the inputs in $CORPUS_DIR and exit. Once the process
- exits, you should find a file, "default.profraw" in the working directory.
- 4. Execute these commands to generate a detailed HTML coverage-report:
- llvm-profdata merge -output=default.profdata default.profraw
- llvm-cov show ./path/to/qemu-fuzz-i386 -instr-profile=default.profdata \
- --format html -output-dir=/path/to/output/report
- == Adding a new fuzzer ==
- Coverage over virtual devices can be improved by adding additional fuzzers.
- Fuzzers are kept in tests/qtest/fuzz/ and should be added to
- tests/qtest/fuzz/Makefile.include
- Fuzzers can rely on both qtest and libqos to communicate with virtual devices.
- 1. Create a new source file. For example ``tests/qtest/fuzz/foo-device-fuzz.c``.
- 2. Write the fuzzing code using the libqtest/libqos API. See existing fuzzers
- for reference.
- 3. Register the fuzzer in ``tests/qtest/fuzz/Makefile.include`` by appending the
- corresponding object to fuzz-obj-y
- Fuzzers can be more-or-less thought of as special qtest programs which can
- modify the qtest commands and/or qtest command arguments based on inputs
- provided by libFuzzer. libFuzzer passes a byte array and its length. Commonly, the
- fuzzer loops over the byte array, interpreting it as a list of qtest commands,
- addresses, or values.
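The loop described above might look roughly like the following sketch. The record layout, the `toy_` names and the counters are all hypothetical; a real fuzzer would issue qtest/libqos calls where the counters are incremented.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-size record decoded from the fuzzer input. */
struct toy_action {
    uint8_t opcode;
    uint8_t addr;
    uint8_t value;
};

enum { TOY_WRITE, TOY_READ, TOY_ACTION_MAX };

/* Counters standing in for the qtest calls a real fuzzer would make. */
static unsigned toy_writes, toy_reads;

static void toy_fuzz(const uint8_t *data, size_t size)
{
    struct toy_action a;

    /* Consume the input one whole record at a time; drop a short tail. */
    while (size >= sizeof(a)) {
        memcpy(&a, data, sizeof(a));

        /* Map every possible opcode byte onto a valid action. */
        switch (a.opcode % TOY_ACTION_MAX) {
        case TOY_WRITE:
            /* A real target would write a.value to a.addr via qtest. */
            toy_writes++;
            break;
        case TOY_READ:
            /* A real target would read from a.addr via qtest. */
            toy_reads++;
            break;
        }
        data += sizeof(a);
        size -= sizeof(a);
    }
}
```

Reducing the opcode modulo the number of actions means no input byte is wasted on an invalid command, which tends to help the mutator make progress.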
- = Implementation Details =
- == The Fuzzer's Lifecycle ==
- The fuzzer has two entrypoints that libFuzzer calls. libFuzzer provides its
- own main(), which performs some setup, and calls the entrypoints:
- LLVMFuzzerInitialize: called prior to fuzzing. Used to initialize all of the
- necessary state
- LLVMFuzzerTestOneInput: called for each fuzzing run. Processes the input and
- resets the state at the end of each run.
- In more detail:
- LLVMFuzzerInitialize parses the arguments to the fuzzer (must start with two
- dashes, so they are ignored by libfuzzer main()). Currently, the arguments
- select the fuzz target. Then, the qtest client is initialized. If the target
- requires qos, qgraph is set up and the QOM/LIBQOS modules are initialized.
- Then the QGraph is walked and the QEMU cmd_line is determined and saved.
- After this, vl.c:qemu_main is called to set up the guest. There are
- target-specific hooks that can be called before and after qemu_main, for
- additional setup (e.g. PCI setup, or VM snapshotting).
- LLVMFuzzerTestOneInput: Uses qtest/qos functions to act based on the fuzz
- input. It is also responsible for manually calling the main loop/main_loop_wait
- to ensure that bottom halves are executed, and for performing any cleanup
- required before the next input.
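The division of work between the two entry points can be sketched as follows. The target table and every `toy_` name are hypothetical; only the two `LLVMFuzzer*` signatures and the `--fuzz-target=` option come from libFuzzer and this document. The QEMU-specific steps (qtest, QGraph, guest setup, main loop) are reduced to comments.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical target table; real targets are registered elsewhere. */
struct toy_target {
    const char *name;
    void (*fuzz)(const uint8_t *data, size_t size);
};

static unsigned toy_runs;

static void toy_fuzz_fn(const uint8_t *data, size_t size)
{
    (void)data;
    (void)size;
    toy_runs++;
}

static const struct toy_target toy_targets[] = {
    { "toy-a-fuzz", toy_fuzz_fn },
    { "toy-b-fuzz", toy_fuzz_fn },
};

static const struct toy_target *selected;

/* Called once by libFuzzer's main() before any input is run. */
int LLVMFuzzerInitialize(int *argc, char ***argv)
{
    static const char prefix[] = "--fuzz-target=";
    const size_t plen = sizeof(prefix) - 1;
    const size_t n = sizeof(toy_targets) / sizeof(toy_targets[0]);

    for (int i = 1; i < *argc; i++) {
        const char *arg = (*argv)[i];

        if (strncmp(arg, prefix, plen) != 0) {
            continue;   /* not ours; libFuzzer handles its own flags */
        }
        for (size_t t = 0; t < n; t++) {
            if (strcmp(arg + plen, toy_targets[t].name) == 0) {
                selected = &toy_targets[t];
            }
        }
    }
    /* QEMU would go on to build the command line and set up the guest. */
    return 0;
}

/* Called once per input, after LLVMFuzzerInitialize has returned. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
    if (selected) {
        selected->fuzz(data, size);
        /* QEMU would now run the main loop and reset state. */
    }
    return 0;
}
```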
- Since the same process is reused for many fuzzing runs, QEMU state needs to
- be reset at the end of each run. There are currently two implemented
- options for resetting state:
- 1. Reboot the guest between runs.
- Pros: Straightforward and fast for simple fuzz targets.
- Cons: Depending on the device, does not reset all device state. If the
- device requires some initialization prior to being ready for fuzzing
- (common for QOS-based targets), this initialization needs to be done after
- each reboot.
- Example target: i440fx-qtest-reboot-fuzz
- 2. Run each test case in a separate forked process and copy the coverage
- information back to the parent. This is fairly similar to AFL's "deferred"
- fork-server mode [3]
- Pros: Relatively fast. Devices only need to be initialized once. No need
- to do slow reboots or vmloads.
- Cons: Not officially supported by libfuzzer. Does not work well for devices
- that rely on dedicated threads.
- Example target: virtio-net-fork-fuzz
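The fork-based option can be illustrated with a small POSIX sketch. It is not the QEMU implementation: the counter array, the `toy_` names and the use of an anonymous shared mapping are assumptions chosen to show the idea of running an input in a child and copying only the coverage back.

```c
#define _DEFAULT_SOURCE     /* for MAP_ANONYMOUS on glibc (assumption) */
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define TOY_COUNTERS 16

/* Process-private: stands in for coverage counters and device state. */
static uint8_t toy_counters[TOY_COUNTERS];
static unsigned toy_device_state;

/* Hypothetical target: dirties device state and bumps counters. */
static void toy_run_input(const uint8_t *data, size_t size)
{
    for (size_t i = 0; i < size; i++) {
        toy_counters[data[i] % TOY_COUNTERS]++;
        toy_device_state++;
    }
}

/* Run one input in a child; return 0 on success, -1 on failure. */
static int toy_fork_run(const uint8_t *data, size_t size)
{
    uint8_t *shared = mmap(NULL, TOY_COUNTERS, PROT_READ | PROT_WRITE,
                           MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    int status = 0;
    pid_t pid;

    if (shared == MAP_FAILED) {
        return -1;
    }
    pid = fork();
    if (pid == 0) {
        toy_run_input(data, size);
        /* Publish the counters through the shared mapping. */
        memcpy(shared, toy_counters, TOY_COUNTERS);
        _exit(0);           /* all other child state is thrown away */
    }
    if (pid < 0 || waitpid(pid, &status, 0) < 0) {
        munmap(shared, TOY_COUNTERS);
        return -1;
    }
    /* Copy coverage back; the parent's device state was never touched. */
    memcpy(toy_counters, shared, TOY_COUNTERS);
    munmap(shared, TOY_COUNTERS);
    return WIFEXITED(status) && WEXITSTATUS(status) == 0 ? 0 : -1;
}
```

Because only the calling thread survives fork(), a target that depends on helper threads would find them missing in the child, which is one way to read the "dedicated threads" limitation listed above.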