- Block I/O error injection using ``blkdebug``
- ============================================
- ..
- Copyright (C) 2014-2015 Red Hat Inc
- This work is licensed under the terms of the GNU GPL, version 2 or later. See
- the COPYING file in the top-level directory.
- The ``blkdebug`` block driver is a rule-based error injection engine. It can be
- used to exercise error code paths in block drivers by injecting errors such as
- ``ENOSPC`` (out of space) and ``EIO`` (I/O error).
- This document gives an overview of the features available in ``blkdebug``.
- Background
- ----------
- Block drivers have many error code paths that handle I/O errors. Image formats
- are especially complex since metadata I/O errors during cluster allocation or
- while updating tables happen halfway through request processing and require
- discipline to keep image files consistent.
- Error injection allows test cases to trigger I/O errors at specific points.
- This way, all error paths can be tested to make sure they are correct.
- Rules
- -----
- The ``blkdebug`` block driver takes a list of "rules" that tell the error injection
- engine when to fail an I/O request.
- Each I/O request is evaluated against the rules. If a rule matches the request
- then its "action" is executed.
- Rules can be placed in a configuration file; the configuration file
- follows the same .ini-like format used by QEMU's ``-readconfig`` option, and
- each section of the file represents a rule.
- The following configuration file defines a single rule::
- $ cat blkdebug.conf
- [inject-error]
- event = "read_aio"
- errno = "28"
- This rule fails all aio read requests with ``ENOSPC`` (28). Note that the errno
- value depends on the host. On Linux, see
- ``/usr/include/asm-generic/errno-base.h`` for errno values.
- Invoke QEMU as follows::
- $ qemu-system-x86_64 \
- -drive if=none,cache=none,file=blkdebug:blkdebug.conf:test.img,id=drive0 \
- -device virtio-blk-pci,drive=drive0,id=virtio-blk-pci0
- Rules support the following attributes:
- ``event``
- which type of operation to match (e.g. ``read_aio``, ``write_aio``,
- ``flush_to_os``, ``flush_to_disk``). See `Events`_ for
- information on events.
- ``state``
- (optional) the engine must be in this state number in order for this
- rule to match. See `State transitions`_ for information
- on states.
- ``errno``
- the numeric errno value to return when a request matches this rule.
- The errno values depend on the host since the numeric values are not
- standardized in the POSIX specification.
- ``sector``
- (optional) a sector number that the request must overlap in order to
- match this rule
- ``once``
- (optional, default ``off``) only execute this action on the first
- matching request
- ``immediately``
- (optional, default ``off``) return a NULL ``BlockAIOCB``
- pointer and fail without an errno instead. This
- exercises the code path where ``BlockAIOCB`` fails and the
- caller's ``BlockCompletionFunc`` is not invoked.
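- The optional attributes can be combined in one rule. As an illustrative sketch
- (the sector number is arbitrary and the errno value assumes a Linux host), the
- following rule should fail only the first ``write_aio`` request that overlaps
- sector 2048 with ``EIO`` (5); later requests are not affected::
- # illustrative values: sector 2048 is arbitrary, 5 is EIO on Linux
- [inject-error]
- event = "write_aio"
- errno = "5"
- sector = "2048"
- once = "on"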
- Events
- ------
- Block drivers provide information about the type of I/O request they are about
- to make so rules can match specific types of requests. For example, the ``qcow2``
- block driver tells ``blkdebug`` when it accesses the L1 table so rules can match
- only L1 table accesses and not other metadata or guest data requests.
- The core events are:
- ``read_aio``
- guest data read
- ``write_aio``
- guest data write
- ``flush_to_os``
- write out unwritten block driver state (e.g. cached metadata)
- ``flush_to_disk``
- flush the host block device's disk cache
- See ``qapi/block-core.json:BlkdebugEvent`` for the full list of events.
- You may need to grep block driver source code to understand the
- meaning of specific events.
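- A rule can target metadata accesses in the same way as guest data requests. The
- sketch below assumes the ``l1_update`` event name from ``BlkdebugEvent``, which
- the ``qcow2`` driver is expected to signal when it writes an L1 table entry.
- Verify the event name against the QEMU version in use::
- # assumption: l1_update is signalled by qcow2 when it writes an L1 table entry
- # 5 is EIO on Linux
- [inject-error]
- event = "l1_update"
- errno = "5"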
- State transitions
- -----------------
- There are cases where more power is needed to match a particular I/O request in
- a longer sequence of requests. For example::
- write_aio
- flush_to_disk
- write_aio
- How do we match the 2nd ``write_aio`` but not the first? This is where state
- transitions come in.
- The error injection engine has an integer called the "state" that is always
- initialized to 1. The state integer is internal to ``blkdebug`` and cannot be
- observed from outside, but rules can interact with it for powerful matching
- behavior.
- Rules can be conditional on the current state and they can transition to a new
- state.
- When a rule's "state" attribute is non-zero then the current state must equal
- the attribute in order for the rule to match.
- For example, to match the 2nd ``write_aio`` request::
- [set-state]
- event = "write_aio"
- state = "1"
- new_state = "2"
- [inject-error]
- event = "write_aio"
- state = "2"
- errno = "5"
- The first ``write_aio`` request matches the ``set-state`` rule and transitions from
- state 1 to state 2. Once state 2 has been entered, the ``set-state`` rule no
- longer matches since it requires state 1. But the ``inject-error`` rule now
- matches the next ``write_aio`` request and injects ``EIO`` (5).
- State transition rules support the following attributes:
- ``event``
- which type of operation to match (e.g. ``read_aio``, ``write_aio``,
- ``flush_to_os``, ``flush_to_disk``). See `Events`_ for
- information on events.
- ``state``
- (optional) the engine must be in this state number in order for this
- rule to match
- ``new_state``
- transition to this state number
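- The state transition does not have to be triggered by the same event that
- fails. As one possible sketch for a ``write_aio``, ``flush_to_disk``,
- ``write_aio`` sequence, the flush below moves the engine to state 2, after
- which every ``write_aio`` request fails with ``EIO`` (5 on Linux). The state
- numbers are arbitrary, and adding ``once`` would limit the failure to a single
- request::
- # the first flush_to_disk moves the engine from state 1 to state 2
- [set-state]
- event = "flush_to_disk"
- state = "1"
- new_state = "2"
- # writes issued while in state 2 fail with EIO (5 on Linux)
- [inject-error]
- event = "write_aio"
- state = "2"
- errno = "5"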
- Suspend and resume
- ------------------
- Exercising code paths in block drivers may require specific ordering amongst
- concurrent requests. The "breakpoint" feature allows requests to be halted on
- a ``blkdebug`` event and resumed later. This makes it possible to achieve
- deterministic ordering when multiple requests are in flight.
- Breakpoints on ``blkdebug`` events are associated with a user-defined ``tag`` string.
- This tag serves as an identifier by which the request can be resumed at a later
- point.
- See the ``qemu-io(1)`` ``break``, ``resume``, ``remove_break``, and ``wait_break``
- commands for details.
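- A breakpoint session might look like the following sketch, entered at the
- ``qemu-io`` prompt after opening an image through ``blkdebug``. The tag ``A``,
- the offsets, and the sizes are arbitrary. ``aio_write`` and ``aio_flush`` are
- ordinary ``qemu-io`` commands that this document does not otherwise cover, and
- whether ``write_aio`` is signalled depends on the block driver in use. The
- intent is to suspend the first write, issue a second write while the first is
- halted, then resume the first and wait for both to complete::
- break write_aio A
- aio_write 0 4k
- wait_break A
- aio_write 64k 4k
- resume A
- aio_flush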