- ========
- Fuzzing
- ========
- This document describes the virtual-device fuzzing infrastructure in QEMU and
- how to use it to implement additional fuzzers.
- Basics
- ------
- Fuzzing operates by passing inputs to an entry point/target function. The
- fuzzer tracks the code coverage triggered by the input. Based on these
- findings, the fuzzer mutates the input and repeats the fuzzing.
- To fuzz QEMU, we rely on libfuzzer. Unlike other fuzzers such as AFL, libfuzzer
- is an *in-process* fuzzer. For the developer, this means that it is their
- responsibility to ensure that state is reset between fuzzing-runs.
- Building the fuzzers
- --------------------
- To build the fuzzers, install a recent version of clang, then configure the
- build as shown below, substituting the clang binaries for the version you
- installed. The ``--enable-asan`` and ``--enable-ubsan`` options are optional,
- but they allow us to reliably detect bugs such as out-of-bounds accesses,
- use-after-free, and double-free errors::
- CC=clang-8 CXX=clang++-8 /path/to/configure \
- --enable-fuzzing --enable-asan --enable-ubsan
- Fuzz targets are built similarly to system targets::
- make qemu-fuzz-i386
- This builds ``./qemu-fuzz-i386``
- The first option to this command is: ``--fuzz-target=FUZZ_NAME``
- To list all of the available fuzzers run ``qemu-fuzz-i386`` with no arguments.
- For example::
- ./qemu-fuzz-i386 --fuzz-target=virtio-scsi-fuzz
- Internally, libFuzzer parses all arguments that do not begin with ``"--"``.
- Information about these is available by passing ``-help=1``.
- Now the only thing left to do is wait for the fuzzer to trigger potential
- crashes.
- Useful libFuzzer flags
- ----------------------
- As mentioned above, libFuzzer accepts some arguments. Passing ``-help=1`` will
- list the available arguments. In particular, these arguments might be helpful:
- * ``CORPUS_DIR/`` : Specify a directory as the last argument to libFuzzer.
- libFuzzer stores each "interesting" input in this corpus directory. The next
- time you run libFuzzer, it will read all of the inputs from the corpus, and
- continue fuzzing from there. You can also specify multiple directories.
- libFuzzer loads existing inputs from all specified directories, but will only
- write new ones to the first one specified.
- * ``-max_len=4096`` : specify the maximum byte-length of the inputs libFuzzer
- will generate.
- * ``-close_fd_mask={1,2,3}`` : close stdout (1), stderr (2), or both (3). Useful for targets that
- trigger many debug/error messages, or create output on the serial console.
- * ``-jobs=4 -workers=4`` : These arguments configure libFuzzer to run 4 fuzzers in
- parallel (4 fuzzing jobs in 4 worker processes). Alternatively, with only
- ``-jobs=N``, libFuzzer automatically spawns a number of workers less than or equal
- to half the available CPU cores. Replace 4 with a number appropriate for your
- machine. Make sure to specify a ``CORPUS_DIR``, which will allow the parallel
- fuzzers to share information about the interesting inputs they find.
- * ``-use_value_profile=1`` : For each comparison operation, libFuzzer computes
- ``(caller_pc&4095) | (popcnt(Arg1 ^ Arg2) << 12)`` and places this in the
- coverage table. Useful for targets with "magic" constants. If Arg1 came from
- the fuzzer's input and Arg2 is a magic constant, then each time the Hamming
- distance between Arg1 and Arg2 decreases, libFuzzer adds the input to the
- corpus.
- * ``-shrink=1`` : Tries to make elements of the corpus "smaller". Might lead to
- better coverage performance, depending on the target.
- Note that libFuzzer's exact behavior will depend on the version of
- clang and libFuzzer used to build the device fuzzers.
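To make the ``-use_value_profile=1`` formula above concrete, here is a minimal Python sketch that evaluates it for a series of guesses converging on a magic constant. The program counter and the constant are hypothetical values chosen for illustration. This is a model of the arithmetic only, not libFuzzer's implementation.

```python
def value_profile_feature(caller_pc: int, arg1: int, arg2: int) -> int:
    """Evaluate (caller_pc & 4095) | (popcnt(arg1 ^ arg2) << 12).

    Arguments are assumed to be non-negative integers.
    """
    hamming = bin(arg1 ^ arg2).count("1")  # popcnt of the XOR = Hamming distance
    return (caller_pc & 4095) | (hamming << 12)


MAGIC = 0xDEADBEEF   # hypothetical "magic" constant checked by the target
CALLER_PC = 0x401234  # hypothetical address of the comparison instruction

# Inputs that get progressively closer to the magic value.
guesses = [0x00000000, 0xDEAD0000, 0xDEADBE00, 0xDEADBEEF]
features = [value_profile_feature(CALLER_PC, g, MAGIC) for g in guesses]
distances = [f >> 12 for f in features]

for guess, dist, feat in zip(guesses, distances, features):
    print(f"guess={guess:#010x} hamming={dist:2d} feature={feat:#x}")
```

Each step closer to the constant produces a distinct feature value, which is why such inputs count as "interesting" even though no new basic block was reached.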
- Generating Coverage Reports
- ---------------------------
- Code coverage is a crucial metric for evaluating a fuzzer's performance.
- libFuzzer's output provides a "cov: " column that provides a total number of
- unique blocks/edges covered. To examine coverage on a line-by-line basis we
- can use Clang coverage:
- 1. Configure libFuzzer to store a corpus of all interesting inputs (see
- CORPUS_DIR above)
- 2. ``./configure`` the QEMU build with ::
- --enable-fuzzing \
- --extra-cflags="-fprofile-instr-generate -fcoverage-mapping"
- 3. Re-run the fuzzer. Specify ``$CORPUS_DIR/*`` as an argument, telling libFuzzer
- to execute all of the inputs in ``$CORPUS_DIR`` and exit. Once the process
- exits, you should find a file named ``default.profraw`` in the working directory.
- 4. Execute these commands to generate a detailed HTML coverage-report::
- llvm-profdata merge -output=default.profdata default.profraw
- llvm-cov show ./path/to/qemu-fuzz-i386 -instr-profile=default.profdata \
- --format html -output-dir=/path/to/output/report
- Adding a new fuzzer
- -------------------
- Coverage over virtual devices can be improved by adding additional fuzzers.
- Fuzzers are kept in ``tests/qtest/fuzz/`` and should be added to
- ``tests/qtest/fuzz/meson.build``
- Fuzzers can rely on both qtest and libqos to communicate with virtual devices.
- 1. Create a new source file. For example ``tests/qtest/fuzz/foo-device-fuzz.c``.
- 2. Write the fuzzing code using the libqtest/libqos API. See existing fuzzers
- for reference.
- 3. Add the fuzzer to ``tests/qtest/fuzz/meson.build``.
- Fuzzers can be more-or-less thought of as special qtest programs which can
- modify the qtest commands and/or qtest command arguments based on inputs
- provided by libfuzzer. Libfuzzer passes a byte array and length. Commonly the
- fuzzer loops over the byte-array interpreting it as a list of qtest commands,
- addresses, or values.
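The decoding loop described above can be modeled in a few lines. Real fuzzers are written in C against libqtest/libqos; the Python sketch below only illustrates how a byte array might be sliced into fixed-size records and mapped onto qtest port I/O commands. The record layout and the operation table are assumptions made for this example, not the format used by any existing QEMU fuzzer.

```python
import struct

# Hypothetical record layout: 1-byte operation selector, 16-bit port address,
# 32-bit value, little-endian, no padding (7 bytes per record).
RECORD = struct.Struct("<BHI")

# qtest port I/O commands the selector is mapped onto.
OPS = ("outb", "outw", "outl", "inb", "inw", "inl")
WIDTH_MASK = {"b": 0xFF, "w": 0xFFFF, "l": 0xFFFFFFFF}


def decode(data: bytes) -> list[str]:
    """Turn fuzzer-provided bytes into a list of qtest command strings."""
    commands = []
    # Walk over complete records only; trailing bytes are ignored.
    for offset in range(0, len(data) - RECORD.size + 1, RECORD.size):
        selector, addr, value = RECORD.unpack_from(data, offset)
        op = OPS[selector % len(OPS)]  # any byte value selects a valid op
        if op.startswith("out"):
            commands.append(f"{op} {addr:#x} {value & WIDTH_MASK[op[-1]]:#x}")
        else:
            commands.append(f"{op} {addr:#x}")
    return commands


sample = RECORD.pack(0, 0x3F8, 0x141) + RECORD.pack(3, 0x3F8, 0) + b"\xff\xff"
print(decode(sample))
```

Mapping the selector with a modulo means every input byte decodes to a valid command, so the fuzzer does not waste mutations on inputs that are rejected before they reach the device.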
- The Generic Fuzzer
- ------------------
- Writing a fuzz target can be a lot of effort (especially if a device driver has
- not been built out within libqos). Many devices can be fuzzed to some degree,
- without any device-specific code, using the generic-fuzz target.
- The generic-fuzz target is capable of fuzzing devices over their PIO, MMIO,
- and DMA input-spaces. To apply the generic-fuzz to a device, we need to define
- two env-variables, at minimum:
- * ``QEMU_FUZZ_ARGS=`` is the set of QEMU arguments used to configure a machine, with
- the device attached. For example, if we want to fuzz the virtio-net device
- attached to a pc-i440fx machine, we can specify::
- QEMU_FUZZ_ARGS="-M pc -nodefaults -netdev user,id=user0 \
- -device virtio-net,netdev=user0"
- * ``QEMU_FUZZ_OBJECTS=`` is a set of space-delimited strings used to identify
- the MemoryRegions that will be fuzzed. These strings are compared against
- MemoryRegion names and MemoryRegion owner names, to decide whether each
- MemoryRegion should be fuzzed. These strings support globbing. For the
- virtio-net example, we could use one of ::
- QEMU_FUZZ_OBJECTS='virtio-net'
- QEMU_FUZZ_OBJECTS='virtio*'
- QEMU_FUZZ_OBJECTS='virtio* pcspk' # Fuzz the virtio devices and the speaker
- QEMU_FUZZ_OBJECTS='*' # Fuzz the whole machine
- The ``"info mtree"`` and ``"info qom-tree"`` monitor commands can be especially
- useful for identifying the ``MemoryRegion`` and ``Object`` names used for
- matching.
- As a generic rule-of-thumb, the more ``MemoryRegions``/Devices we match, the
- greater the input-space, and the smaller the probability of finding crashing
- inputs for individual devices. As such, it is usually a good idea to limit the
- fuzzer to only a few ``MemoryRegions``.
- To ensure that these env variables have been configured correctly, we can use::
- ./qemu-fuzz-i386 --fuzz-target=generic-fuzz -runs=0
- The output should contain a complete list of matched MemoryRegions.
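As a rough model of the ``QEMU_FUZZ_OBJECTS`` matching rule described above, the following Python sketch splits the variable on spaces and compares each glob against both the MemoryRegion name and its owner name. Python's ``fnmatch`` is used here as a stand-in and may differ in detail from the glob matcher QEMU actually uses. The region and owner names are hypothetical. Use ``info mtree`` and ``info qom-tree`` to find the real ones.

```python
from fnmatch import fnmatchcase


def matched_regions(fuzz_objects: str, regions):
    """Return names of regions whose name or owner matches any glob.

    fuzz_objects: space-delimited globs, as in QEMU_FUZZ_OBJECTS.
    regions: iterable of (region_name, owner_name) pairs.
    """
    globs = fuzz_objects.split()
    return [
        name
        for name, owner in regions
        if any(fnmatchcase(name, g) or fnmatchcase(owner, g) for g in globs)
    ]


# Hypothetical (MemoryRegion name, owner name) pairs for illustration only.
regions = [
    ("virtio-pci-common", "virtio-net-pci"),
    ("pcspk", "isa-pcspk"),
    ("vga.vram", "VGA"),
]

print(matched_regions("virtio* pcspk", regions))
print(matched_regions("*", regions))
```

The second call shows why ``'*'`` widens the input space so much: every region in the machine becomes a fuzzing target.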
- OSS-Fuzz
- --------
- QEMU is continuously fuzzed on `OSS-Fuzz
- <https://github.com/google/oss-fuzz>`_. By default, the OSS-Fuzz build
- will try to fuzz every fuzz-target. Since the generic-fuzz target
- requires additional information provided in environment variables, we
- pre-define some generic-fuzz configs in
- ``tests/qtest/fuzz/generic_fuzz_configs.h``. Each config must specify:
- - ``.name``: To identify the fuzzer config
- - ``.args`` OR ``.argfunc``: A string or pointer to a function returning a
- string. These strings are used to specify the ``QEMU_FUZZ_ARGS``
- environment variable. ``argfunc`` is useful when the config relies on e.g.
- a dynamically created temp directory, or a free tcp/udp port.
- - ``.objects``: A string that specifies the ``QEMU_FUZZ_OBJECTS`` environment
- variable.
- To fuzz additional devices or device configurations on OSS-Fuzz, send patches for
- either a new device-specific fuzzer or a new generic-fuzz config.
- Build details:
- - The Dockerfile that sets up the environment for building QEMU's
- fuzzers on OSS-Fuzz can be found in the OSS-Fuzz repository
- (https://github.com/google/oss-fuzz/blob/master/projects/qemu/Dockerfile)
- - The script responsible for building the fuzzers can be found in the
- QEMU source tree at ``scripts/oss-fuzz/build.sh``
- Building Crash Reproducers
- --------------------------
- When we find a crash, we should try to create an independent reproducer, that
- can be used on a non-fuzzer build of QEMU. This filters out any potential
- false-positives, and improves the debugging experience for developers.
- Here are the steps for building a reproducer for a crash found by the
- generic-fuzz target.
- - Ensure the crash reproduces::
- qemu-fuzz-i386 --fuzz-target... ./crash-...
- - Gather the QTest output for the crash::
- QEMU_FUZZ_TIMEOUT=0 QTEST_LOG=1 FUZZ_SERIALIZE_QTEST=1 \
- qemu-fuzz-i386 --fuzz-target... ./crash-... &> /tmp/trace
- - Reorder and clean-up the resulting trace::
- scripts/oss-fuzz/reorder_fuzzer_qtest_trace.py /tmp/trace > /tmp/reproducer
- - Get the arguments needed to start qemu, and provide a path to qemu::
- less /tmp/trace # The args should be logged at the top of this file
- export QEMU_ARGS="-machine ..."
- export QEMU_PATH="path/to/qemu-system"
- - Ensure the crash reproduces in qemu-system::
- $QEMU_PATH $QEMU_ARGS -qtest stdio < /tmp/reproducer
- - From the crash output, obtain some string that identifies the crash. This
- can be a line in the stack-trace, for example::
- export CRASH_TOKEN="hw/usb/hcd-xhci.c:1865"
- - Minimize the reproducer::
- scripts/oss-fuzz/minimize_qtest_trace.py -M1 -M2 \
- /tmp/reproducer /tmp/reproducer-minimized
- - Confirm that the minimized reproducer still crashes::
- $QEMU_PATH $QEMU_ARGS -qtest stdio < /tmp/reproducer-minimized
- - Create a one-liner reproducer that can be sent over email::
- ./scripts/oss-fuzz/output_reproducer.py -bash /tmp/reproducer-minimized
- - Output the C source code for a test case that will reproduce the bug::
- ./scripts/oss-fuzz/output_reproducer.py -owner "John Smith <john@smith.com>" \
- -name "test_function_name" /tmp/reproducer-minimized
- - Report the bug and send a patch with the C reproducer upstream
- Implementation Details / Fuzzer Lifecycle
- -----------------------------------------
- The fuzzer has two entrypoints that libFuzzer calls. libFuzzer provides its
- own ``main()``, which performs some setup, and calls the entrypoints:
- ``LLVMFuzzerInitialize``: called prior to fuzzing. Used to initialize all of the
- necessary state
- ``LLVMFuzzerTestOneInput``: called for each fuzzing run. Processes the input and
- resets the state at the end of each run.
- In more detail:
- ``LLVMFuzzerInitialize`` parses the arguments to the fuzzer (must start with two
- dashes, so they are ignored by libfuzzer ``main()``). Currently, the arguments
- select the fuzz target. Then, the qtest client is initialized. If the target
- requires qos, qgraph is set up and the QOM/LIBQOS modules are initialized.
- Then the QGraph is walked and the QEMU cmd_line is determined and saved.
- After this, ``vl.c:main`` is called to set up the guest. There are
- target-specific hooks that can be called before and after main, for
- additional setup (e.g. PCI setup, or VM snapshotting).
- ``LLVMFuzzerTestOneInput``: Uses qtest/qos functions to act based on the fuzz
- input. It is also responsible for manually calling ``main_loop_wait`` to ensure
- that bottom halves are executed, and for any cleanup required before the next input.
- Since the same process is reused for many fuzzing runs, QEMU state needs to
- be reset at the end of each run. For example, this can be done by rebooting the
- VM after each run.
- - *Pros*: Straightforward and fast for simple fuzz targets.
- - *Cons*: Depending on the device, does not reset all device state. If the
- device requires some initialization prior to being ready for fuzzing (common
- for QOS-based targets), this initialization needs to be done after each
- reboot.
- - *Example target*: ``i440fx-qtest-reboot-fuzz``
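The lifecycle above (initialize once, then process and reset for every input) can be illustrated with a toy model. The class and method names below are hypothetical stand-ins chosen to mirror ``LLVMFuzzerInitialize`` and ``LLVMFuzzerTestOneInput``. They are not QEMU or libFuzzer APIs, and the "device" is just a dictionary of registers.

```python
class ToyDevice:
    """Hypothetical stand-in for guest/device state."""

    def __init__(self):
        self.regs = {}

    def write(self, addr: int, value: int) -> None:
        self.regs[addr] = value

    def reset(self) -> None:
        # Models the reboot performed at the end of each fuzzing run.
        self.regs.clear()


class ToyHarness:
    def initialize(self) -> None:
        """Called once before fuzzing (cf. LLVMFuzzerInitialize)."""
        self.device = ToyDevice()
        self.runs = 0

    def test_one_input(self, data: bytes) -> list[int]:
        """Called per input (cf. LLVMFuzzerTestOneInput)."""
        # Interpret the input as (address, value) byte pairs.
        for addr, value in zip(data[0::2], data[1::2]):
            self.device.write(addr, value)
        touched = sorted(self.device.regs)
        # The process is reused, so state must not leak into the next run.
        self.device.reset()
        self.runs += 1
        return touched


harness = ToyHarness()
harness.initialize()
print(harness.test_one_input(b"\x01\xaa\x02\xbb"))
print(harness.test_one_input(b"\x03\xcc"))
```

If the ``reset()`` call were removed, the second run would also report the registers written by the first, which is the kind of cross-run contamination that makes crashes hard to reproduce.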