Copyright (c) 2014 Red Hat Inc.

This work is licensed under the terms of the GNU GPL, version 2 or later. See
the COPYING file in the top-level directory.

This document explains the IOThread feature and how to write code that runs
outside the QEMU global mutex.

The main loop and IOThreads
---------------------------
QEMU is an event-driven program that can do several things at once using an
event loop. The VNC server and the QMP monitor are both processed from the
same event loop, which monitors their file descriptors until they become
readable and then invokes a callback.
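
The "monitor file descriptors, then invoke a callback" pattern can be sketched
with plain POSIX poll(). This is a toy model and not QEMU's main loop: the
names ToyFdWatch, toy_loop_iterate() and toy_loop_demo() are hypothetical, and
two pipes stand in for the VNC and QMP sockets.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

#define TOY_MAX_FDS 16

/* Hypothetical handler type: invoked when the watched fd becomes readable. */
typedef void ToyReadHandler(int fd, void *opaque);

typedef struct {
    int fd;
    ToyReadHandler *on_readable;
    void *opaque;
} ToyFdWatch;

/*
 * Run one event loop iteration: wait up to timeout_ms for any watched fd to
 * become readable, then invoke its callback.  Returns the number of callbacks
 * invoked, or -1 on error.
 */
static int toy_loop_iterate(ToyFdWatch *watches, int n, int timeout_ms)
{
    struct pollfd pfds[TOY_MAX_FDS];
    int i, dispatched = 0;

    if (n < 0 || n > TOY_MAX_FDS) {
        return -1;
    }
    for (i = 0; i < n; i++) {
        pfds[i].fd = watches[i].fd;
        pfds[i].events = POLLIN;
        pfds[i].revents = 0;
    }
    if (poll(pfds, (nfds_t)n, timeout_ms) < 0) {
        return -1;
    }
    for (i = 0; i < n; i++) {
        if (pfds[i].revents & POLLIN) {
            watches[i].on_readable(watches[i].fd, watches[i].opaque);
            dispatched++;
        }
    }
    return dispatched;
}

/* Example handler: consume one byte and count how often we were called. */
static void toy_count_reads(int fd, void *opaque)
{
    char c;

    if (read(fd, &c, 1) == 1) {
        (*(int *)opaque)++;
    }
}

/* Two pipes share one loop; only the one with pending data is dispatched. */
int toy_loop_demo(void)
{
    int vnc[2], qmp[2], hits = 0, dispatched;
    ToyFdWatch watches[2];

    if (pipe(vnc) < 0 || pipe(qmp) < 0) {
        return -1;
    }
    watches[0] = (ToyFdWatch){ vnc[0], toy_count_reads, &hits };
    watches[1] = (ToyFdWatch){ qmp[0], toy_count_reads, &hits };

    if (write(qmp[1], "x", 1) != 1) {
        return -1;
    }
    dispatched = toy_loop_iterate(watches, 2, 1000);

    close(vnc[0]); close(vnc[1]);
    close(qmp[0]); close(qmp[1]);
    assert(dispatched == 1);
    return hits;
}
```

A real event loop calls the iteration function repeatedly; the demo runs it
once so that the effect of a single readable descriptor is easy to see.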

The default event loop is called the main loop (see main-loop.c). It is
possible to create additional event loop threads using -object
iothread,id=my-iothread.

Side note: The main loop and IOThread are both event loops but their code is
not shared completely. Sometimes it is useful to remember that although they
are conceptually similar they are currently not interchangeable.

Why IOThreads are useful
------------------------
IOThreads allow the user to control the placement of work. The main loop is a
scalability bottleneck on hosts with many CPUs. Work can be spread across
several IOThreads instead of just one main loop. When set up correctly this
can improve I/O latency and reduce jitter seen by the guest.

The main loop is also deeply associated with the QEMU global mutex, which is a
scalability bottleneck in itself. vCPU threads and the main loop use the QEMU
global mutex to serialize execution of QEMU code. This mutex is necessary
because a lot of QEMU's code historically was not thread-safe.

The fact that all I/O processing is done in a single main loop and that the
QEMU global mutex is contended by all vCPU threads and the main loop explains
why it is desirable to place work into IOThreads.

The experimental virtio-blk data-plane implementation has been benchmarked and
shows these effects:
ftp://public.dhe.ibm.com/linux/pdfs/KVM_Virtualized_IO_Performance_Paper.pdf

How to program for IOThreads
----------------------------
The main difference between legacy code and new code that can run in an
IOThread is dealing explicitly with the event loop object, AioContext
(see include/block/aio.h). Code that only works in the main loop
implicitly uses the main loop's AioContext. Code that supports running
in IOThreads must be aware of its AioContext.

AioContext supports the following services:
 * File descriptor monitoring (read/write/error on POSIX hosts)
 * Event notifiers (inter-thread signalling)
 * Timers
 * Bottom Halves (BH) deferred callbacks

There are several old APIs that use the main loop AioContext:
 * LEGACY qemu_aio_set_fd_handler() - monitor a file descriptor
 * LEGACY qemu_aio_set_event_notifier() - monitor an event notifier
 * LEGACY timer_new_ms() - create a timer
 * LEGACY qemu_bh_new() - create a BH
 * LEGACY qemu_aio_wait() - run an event loop iteration

Since they implicitly work on the main loop they cannot be used in code that
runs in an IOThread. They might cause a crash or deadlock if called from an
IOThread since the QEMU global mutex is not held.

Instead, use the AioContext functions directly (see include/block/aio.h):
 * aio_set_fd_handler() - monitor a file descriptor
 * aio_set_event_notifier() - monitor an event notifier
 * aio_timer_new() - create a timer
 * aio_bh_new() - create a BH
 * aio_poll() - run an event loop iteration

The AioContext can be obtained from the IOThread using
iothread_get_aio_context() or for the main loop using qemu_get_aio_context().
Code that takes an AioContext argument works both in IOThreads and in the main
loop, depending on which AioContext instance the caller passes in.
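
The difference between an implicit and an explicit event loop argument can be
illustrated with a small single-threaded model. Everything here is
hypothetical (ToyContext, toy_ctx_bh_schedule(), toy_legacy_bh_schedule(),
toy_ctx_poll()); it only mirrors the shape of the legacy and AioContext-aware
calls listed above and does not use the real QEMU API.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_BH 8

typedef void ToyBHFunc(void *opaque);

/* Hypothetical stand-in for an event loop object such as AioContext. */
typedef struct {
    const char *name;
    ToyBHFunc *cb[TOY_MAX_BH];
    void *opaque[TOY_MAX_BH];
    int pending;
} ToyContext;

/* The one context that legacy-style calls use implicitly. */
static ToyContext toy_main_ctx = { .name = "main-loop" };

/* New style: the caller says which event loop should run the callback. */
static int toy_ctx_bh_schedule(ToyContext *ctx, ToyBHFunc *cb, void *opaque)
{
    if (ctx->pending == TOY_MAX_BH) {
        return -1;
    }
    ctx->cb[ctx->pending] = cb;
    ctx->opaque[ctx->pending] = opaque;
    ctx->pending++;
    return 0;
}

/* Legacy style: no context argument, so the main loop is the only target. */
static int toy_legacy_bh_schedule(ToyBHFunc *cb, void *opaque)
{
    return toy_ctx_bh_schedule(&toy_main_ctx, cb, opaque);
}

/* Run one iteration: dispatch what was pending on entry, return the count. */
static int toy_ctx_poll(ToyContext *ctx)
{
    ToyBHFunc *cb[TOY_MAX_BH];
    void *opaque[TOY_MAX_BH];
    int i, n = ctx->pending;

    /* Copy first so callbacks may safely schedule new work on ctx. */
    for (i = 0; i < n; i++) {
        cb[i] = ctx->cb[i];
        opaque[i] = ctx->opaque[i];
    }
    ctx->pending = 0;
    for (i = 0; i < n; i++) {
        cb[i](opaque[i]);
    }
    return n;
}

static void toy_bump(void *opaque)
{
    (*(int *)opaque)++;
}

int toy_ctx_demo(void)
{
    ToyContext iothread_ctx = { .name = "my-iothread" };
    int main_hits = 0, io_hits = 0;

    toy_legacy_bh_schedule(toy_bump, &main_hits);
    toy_ctx_bh_schedule(&iothread_ctx, toy_bump, &io_hits);
    toy_ctx_bh_schedule(&iothread_ctx, toy_bump, &io_hits);

    /* Polling one context never runs work queued on the other. */
    toy_ctx_poll(&iothread_ctx);
    assert(main_hits == 0 && io_hits == 2);

    toy_ctx_poll(&toy_main_ctx);
    return main_hits * 10 + io_hits;
}
```

The same toy_bump() callback runs in either loop; only the context passed by
the caller decides where.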

How to synchronize with an IOThread
-----------------------------------
AioContext is not thread-safe so some rules must be followed when using file
descriptors, event notifiers, timers, or BHs across threads:

1. AioContext functions can always be called safely. They handle their
own locking internally.

2. Other threads wishing to access the AioContext must use
aio_context_acquire()/aio_context_release() for mutual exclusion. Once the
context is acquired no other thread can access it or run event loop iterations
in this AioContext.

aio_context_acquire()/aio_context_release() calls may be nested. This
means you can call them if you're not sure whether #2 applies.
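
Nesting can be modelled with a POSIX recursive mutex. This is a sketch under
that assumption only; it makes no claim about how aio_context_acquire() is
implemented, and ToyContext, toy_context_acquire(), toy_context_release() and
toy_helper() are hypothetical names.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <pthread.h>

/* Hypothetical context guarded by a recursive lock. */
typedef struct {
    pthread_mutex_t lock;
    int depth;                  /* nesting depth, protected by lock */
} ToyContext;

static int toy_context_init(ToyContext *ctx)
{
    pthread_mutexattr_t attr;
    int ret;

    ctx->depth = 0;
    if (pthread_mutexattr_init(&attr) != 0) {
        return -1;
    }
    /* A recursive mutex lets the owning thread lock it again. */
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    ret = pthread_mutex_init(&ctx->lock, &attr);
    pthread_mutexattr_destroy(&attr);
    return ret == 0 ? 0 : -1;
}

static void toy_context_acquire(ToyContext *ctx)
{
    pthread_mutex_lock(&ctx->lock);
    ctx->depth++;
}

static void toy_context_release(ToyContext *ctx)
{
    assert(ctx->depth > 0);
    ctx->depth--;
    pthread_mutex_unlock(&ctx->lock);
}

/*
 * A helper that does not know whether its caller already holds the context.
 * Because acquire/release nest, it can simply take the lock itself.
 * Returns the nesting depth it observed.
 */
static int toy_helper(ToyContext *ctx)
{
    int depth;

    toy_context_acquire(ctx);
    depth = ctx->depth;
    toy_context_release(ctx);
    return depth;
}

int toy_nesting_demo(void)
{
    ToyContext ctx;
    int unheld, held;

    if (toy_context_init(&ctx) < 0) {
        return -1;
    }
    unheld = toy_helper(&ctx);          /* caller does not hold the context */

    toy_context_acquire(&ctx);
    held = toy_helper(&ctx);            /* caller holds it: nested acquire */
    toy_context_release(&ctx);

    pthread_mutex_destroy(&ctx.lock);
    return unheld * 10 + held;
}
```

With a non-recursive mutex the second toy_helper() call would deadlock, which
is the situation nesting is meant to avoid.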

There is currently no lock ordering rule if a thread needs to acquire multiple
AioContexts simultaneously. Therefore, it is only safe for code holding the
QEMU global mutex to acquire other AioContexts.

Side note: the best way to schedule a function call across threads is to call
aio_bh_schedule_oneshot(). No acquire/release or locking is needed.
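
The idea of handing a one-shot callback to another thread's event loop can be
sketched with pthreads. This is a toy model, not the implementation behind
aio_bh_schedule_oneshot(): ToyLoop, toy_bh_schedule_oneshot() and
toy_loop_run() are hypothetical, and the queue is a small fixed-size ring.
The point it shows is that the caller takes no lock of its own; the scheduling
function handles synchronization internally.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define TOY_QUEUE_LEN 8

typedef void ToyBHFunc(void *opaque);

typedef struct {
    pthread_mutex_t lock;           /* protects all fields below */
    pthread_cond_t wake;            /* signalled when work or stop arrives */
    ToyBHFunc *cb[TOY_QUEUE_LEN];
    void *opaque[TOY_QUEUE_LEN];
    unsigned head, tail;            /* head == tail means empty */
    int stop;
} ToyLoop;

/* Safe to call from any thread.  Returns -1 if the queue is full. */
static int toy_bh_schedule_oneshot(ToyLoop *loop, ToyBHFunc *cb, void *opaque)
{
    int ret = -1;

    pthread_mutex_lock(&loop->lock);
    if (loop->tail - loop->head < TOY_QUEUE_LEN) {
        loop->cb[loop->tail % TOY_QUEUE_LEN] = cb;
        loop->opaque[loop->tail % TOY_QUEUE_LEN] = opaque;
        loop->tail++;
        pthread_cond_signal(&loop->wake);
        ret = 0;
    }
    pthread_mutex_unlock(&loop->lock);
    return ret;
}

/* Thread body: run queued callbacks until stopped and the queue is empty. */
static void *toy_loop_run(void *arg)
{
    ToyLoop *loop = arg;

    pthread_mutex_lock(&loop->lock);
    for (;;) {
        ToyBHFunc *cb;
        void *opaque;

        while (loop->head == loop->tail && !loop->stop) {
            pthread_cond_wait(&loop->wake, &loop->lock);
        }
        if (loop->head == loop->tail) {
            break;                  /* stop requested and nothing pending */
        }
        cb = loop->cb[loop->head % TOY_QUEUE_LEN];
        opaque = loop->opaque[loop->head % TOY_QUEUE_LEN];
        loop->head++;

        pthread_mutex_unlock(&loop->lock);
        cb(opaque);                 /* run without holding the queue lock */
        pthread_mutex_lock(&loop->lock);
    }
    pthread_mutex_unlock(&loop->lock);
    return NULL;
}

typedef struct {
    pthread_t scheduler;            /* thread that scheduled the callback */
    int value;
    int ran_elsewhere;
} ToyResult;

static void toy_set_result(void *opaque)
{
    ToyResult *r = opaque;

    r->value = 42;
    r->ran_elsewhere = !pthread_equal(pthread_self(), r->scheduler);
}

int toy_oneshot_demo(void)
{
    ToyLoop loop = { .head = 0, .tail = 0, .stop = 0 };
    ToyResult res = { .value = 0, .ran_elsewhere = 0 };
    pthread_t tid;

    res.scheduler = pthread_self();
    pthread_mutex_init(&loop.lock, NULL);
    pthread_cond_init(&loop.wake, NULL);
    if (pthread_create(&tid, NULL, toy_loop_run, &loop) != 0) {
        return -1;
    }

    toy_bh_schedule_oneshot(&loop, toy_set_result, &res);

    pthread_mutex_lock(&loop.lock);
    loop.stop = 1;
    pthread_cond_signal(&loop.wake);
    pthread_mutex_unlock(&loop.lock);
    pthread_join(tid, NULL);        /* after this, res is safe to read */

    pthread_cond_destroy(&loop.wake);
    pthread_mutex_destroy(&loop.lock);
    return res.value == 42 && res.ran_elsewhere;
}
```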

AioContext and the block layer
------------------------------
The AioContext originates from the QEMU block layer, even though nowadays
AioContext is a generic event loop that can be used by any QEMU subsystem.

The block layer has support for AioContext integrated. Each BlockDriverState
is associated with an AioContext using bdrv_set_aio_context() and
bdrv_get_aio_context(). This allows block layer code to process I/O inside the
right AioContext. Other subsystems may wish to follow a similar approach.

Block layer code must therefore expect to run in an IOThread and avoid using
old APIs that implicitly use the main loop. See the "How to program for
IOThreads" section above for information on how to do that.

If main loop code such as a QMP function wishes to access a BlockDriverState
it must first call aio_context_acquire(bdrv_get_aio_context(bs)) to ensure
that callbacks in the IOThread do not run in parallel.

Code running in the monitor typically needs to ensure that past
requests from the guest are completed. When a block device is running
in an IOThread, the IOThread can also process requests from the guest
(via ioeventfd). To achieve both objectives, wrap the code between
bdrv_drained_begin() and bdrv_drained_end(), thus creating a "drained
section". The functions must be called between aio_context_acquire()
and aio_context_release(). You can freely release and re-acquire the
AioContext within a drained section.
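
The two guarantees of a drained section (earlier requests have completed, and
no new guest requests are processed meanwhile) can be modelled with a few
counters. This single-threaded sketch is an assumption-laden illustration:
ToyDisk and all toy_*() functions are hypothetical stand-ins, and deferring
guest requests is a modelling choice, not a description of how QEMU quiesces a
device.

```c
#include <assert.h>

typedef struct {
    int in_flight;          /* requests submitted but not yet completed */
    int quiesce_counter;    /* > 0 while inside a drained section */
    int deferred;           /* guest requests that arrived while drained */
    int lock_depth;         /* stand-in for holding the AioContext */
} ToyDisk;

static void toy_acquire(ToyDisk *d)
{
    d->lock_depth++;
}

static void toy_release(ToyDisk *d)
{
    assert(d->lock_depth > 0);
    d->lock_depth--;
}

/* Guest submits a request: it starts immediately unless we are drained. */
static void toy_guest_submit(ToyDisk *d)
{
    if (d->quiesce_counter > 0) {
        d->deferred++;
    } else {
        d->in_flight++;
    }
}

static void toy_drained_begin(ToyDisk *d)
{
    assert(d->lock_depth > 0);      /* must be called with the context held */
    d->quiesce_counter++;
    /* Stand-in for running the event loop until pending I/O has finished. */
    while (d->in_flight > 0) {
        d->in_flight--;
    }
}

static void toy_drained_end(ToyDisk *d)
{
    assert(d->lock_depth > 0);
    assert(d->quiesce_counter > 0);
    if (--d->quiesce_counter == 0) {
        /* Resume: requests that arrived meanwhile are started now. */
        d->in_flight += d->deferred;
        d->deferred = 0;
    }
}

int toy_drained_demo(void)
{
    ToyDisk d = { 0, 0, 0, 0 };
    int seen_in_flight;

    toy_guest_submit(&d);
    toy_guest_submit(&d);           /* two requests are now in flight */

    toy_acquire(&d);
    toy_drained_begin(&d);
    toy_guest_submit(&d);           /* arrives during the section: deferred */
    seen_in_flight = d.in_flight;   /* what the monitor command observes */
    toy_drained_end(&d);
    toy_release(&d);

    return seen_in_flight * 10 + d.in_flight;
}
```

Inside the section the monitor code sees no requests in flight; the request
that arrived in the meantime only starts once the section ends.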

Long-running jobs (usually in the form of coroutines) are best scheduled in
the BlockDriverState's AioContext to avoid the need to acquire/release around
each bdrv_*() call. The functions bdrv_add/remove_aio_context_notifier,
or alternatively blk_add/remove_aio_context_notifier if you use BlockBackends,
can be used to get a notification whenever bdrv_set_aio_context() moves a
BlockDriverState to a different AioContext.