Block I/O error injection using ``blkdebug``
============================================

..
   Copyright (C) 2014-2015 Red Hat Inc

   This work is licensed under the terms of the GNU GPL, version 2 or later. See
   the COPYING file in the top-level directory.

The ``blkdebug`` block driver is a rule-based error injection engine. It can be
used to exercise error code paths in block drivers including ``ENOSPC`` (out of
space) and ``EIO``.

This document gives an overview of the features available in ``blkdebug``.

Background
----------

Block drivers have many error code paths that handle I/O errors. Image formats
are especially complex since metadata I/O errors during cluster allocation or
while updating tables happen halfway through request processing and require
discipline to keep image files consistent.

Error injection allows test cases to trigger I/O errors at specific points.
This way, all error paths can be tested to make sure they are correct.

Rules
-----

The ``blkdebug`` block driver takes a list of "rules" that tell the error injection
engine when to fail an I/O request.

Each I/O request is evaluated against the rules. If a rule matches the request
then its "action" is executed.

Rules can be placed in a configuration file; the configuration file
follows the same .ini-like format used by QEMU's ``-readconfig`` option, and
each section of the file represents a rule.

The following configuration file defines a single rule::

  $ cat blkdebug.conf
  [inject-error]
  event = "read_aio"
  errno = "28"

This rule fails all aio read requests with ``ENOSPC`` (28). Note that the errno
value depends on the host. On Linux, see
``/usr/include/asm-generic/errno-base.h`` for errno values.

Invoke QEMU as follows::

  $ qemu-system-x86_64 \
        -drive if=none,cache=none,file=blkdebug:blkdebug.conf:test.img,id=drive0 \
        -device virtio-blk-pci,drive=drive0,id=virtio-blk-pci0

Rules support the following attributes:

``event``
  which type of operation to match (e.g. ``read_aio``, ``write_aio``,
  ``flush_to_os``, ``flush_to_disk``). See `Events`_ for
  information on events.

``state``
  (optional) the engine must be in this state number in order for this
  rule to match. See `State transitions`_ for information
  on states.

``errno``
  the numeric errno value to return when a request matches this rule.
  The errno values depend on the host since the numeric values are not
  standardized in the POSIX specification.

``sector``
  (optional) a sector number that the request must overlap in order to
  match this rule

``once``
  (optional, default ``off``) only execute this action on the first
  matching request

``immediately``
  (optional, default ``off``) return a NULL ``BlockAIOCB``
  pointer and fail without an errno instead. This
  exercises the code path where ``BlockAIOCB`` fails and the
  caller's ``BlockCompletionFunc`` is not invoked.
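
The optional attributes can be combined in a single rule. As a sketch, with a
sector number chosen arbitrarily for illustration, the following rule fails
only the first ``write_aio`` request that overlaps sector 128 with ``EIO`` (5
on Linux). Because ``once`` is set, later matching requests are not failed by
this rule::

  [inject-error]
  event = "write_aio"
  sector = "128"
  once = "on"
  errno = "5"
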

Events
------

Block drivers provide information about the type of I/O request they are about
to make so rules can match specific types of requests. For example, the ``qcow2``
block driver tells ``blkdebug`` when it accesses the L1 table so rules can match
only L1 table accesses and not other metadata or guest data requests.

The core events are:

``read_aio``
  guest data read

``write_aio``
  guest data write

``flush_to_os``
  write out unwritten block driver state (e.g. cached metadata)

``flush_to_disk``
  flush the host block device's disk cache

See ``qapi/block-core.json:BlkdebugEvent`` for the full list of events.
You may need to grep block driver source code to understand the
meaning of specific events.
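
Format-specific events are used in rules the same way as the core events. As
an illustration, assuming the ``l1_update`` event (one of the qcow2 events
listed in ``BlkdebugEvent``; check the list for your QEMU version), the
following rule injects ``ENOSPC`` (28 on Linux) when qcow2 updates its L1
table, without matching other metadata or guest data requests::

  [inject-error]
  event = "l1_update"
  errno = "28"
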

State transitions
-----------------

There are cases where more power is needed to match a particular I/O request in
a longer sequence of requests. For example::

  write_aio
  flush_to_disk
  write_aio

How do we match the 2nd ``write_aio`` but not the first? This is where state
transitions come in.

The error injection engine has an integer called the "state" that always starts
initialized to 1. The state integer is internal to ``blkdebug`` and cannot be
observed from outside but rules can interact with it for powerful matching
behavior.

Rules can be conditional on the current state and they can transition to a new
state.

When a rule's "state" attribute is non-zero then the current state must equal
the attribute in order for the rule to match.

For example, to match the 2nd ``write_aio``::

  [set-state]
  event = "write_aio"
  state = "1"
  new_state = "2"

  [inject-error]
  event = "write_aio"
  state = "2"
  errno = "5"

The first ``write_aio`` request matches the ``set-state`` rule and transitions from
state 1 to state 2. Once state 2 has been entered, the ``set-state`` rule no
longer matches since it requires state 1. But the ``inject-error`` rule now
matches the next ``write_aio`` request and injects ``EIO`` (5).
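
A state transition does not have to be triggered by the same event as the
error. As a sketch based on the request sequence shown earlier in this
section, and assuming no flush is issued before the first write, the following
rules use the intervening ``flush_to_disk`` to tell the two writes apart::

  [set-state]
  event = "flush_to_disk"
  state = "1"
  new_state = "2"

  [inject-error]
  event = "write_aio"
  state = "2"
  errno = "5"

The first ``write_aio`` arrives in state 1 and matches neither rule. The
``flush_to_disk`` moves the engine to state 2, so the second ``write_aio``
matches ``inject-error`` and fails with ``EIO`` (5). The rule keeps matching
writes for as long as the engine stays in state 2; adding ``once = "on"``
limits it to the first matching request.
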

State transition rules support the following attributes:

``event``
  which type of operation to match (e.g. ``read_aio``, ``write_aio``,
  ``flush_to_os``, ``flush_to_disk``). See `Events`_ for
  information on events.

``state``
  (optional) the engine must be in this state number in order for this
  rule to match

``new_state``
  transition to this state number
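
Transitions can be chained to count requests. The following sketch fails the
third ``write_aio`` rather than the second. It assumes that a configuration
file may contain several ``set-state`` sections, each defining its own rule,
and that a new state only becomes current once the request that triggered the
transition has been evaluated::

  [set-state]
  event = "write_aio"
  state = "1"
  new_state = "2"

  [set-state]
  event = "write_aio"
  state = "2"
  new_state = "3"

  [inject-error]
  event = "write_aio"
  state = "3"
  errno = "5"

Under those assumptions, the first two writes each advance the state by one
and the third write, arriving in state 3, is the first to match
``inject-error``.
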

Suspend and resume
------------------

Exercising code paths in block drivers may require specific ordering amongst
concurrent requests. The "breakpoint" feature allows requests to be halted on
a ``blkdebug`` event and resumed later. This makes it possible to achieve
deterministic ordering when multiple requests are in flight.

Breakpoints on ``blkdebug`` events are associated with a user-defined ``tag`` string.
This tag serves as an identifier by which the request can be resumed at a later
point.

See the ``qemu-io(1)`` ``break``, ``resume``, ``remove_break``, and ``wait_break``
commands for details.