===============
QPL Compression
===============
The Intel Query Processing Library (Intel ``QPL``) is an open-source library
that provides compression and decompression features based on the deflate
compression algorithm (RFC 1951).

The ``QPL`` compression relies on the Intel In-Memory Analytics Accelerator
(``IAA``) and Shared Virtual Memory (``SVM``) technology. Both are available
starting with 4th Gen Intel Xeon Scalable processors, codenamed Sapphire
Rapids (``SPR``).

For an introduction to ``QPL``, please refer to `QPL Introduction
<https://intel.github.io/qpl/documentation/introduction_docs/introduction.html>`_


QPL Compression Framework
=========================

::

  +----------------+       +------------------+
  | MultiFD Thread |       |accel-config tool |
  +-------+--------+       +--------+---------+
          |                         |
          |                         |
          |compress/decompress      |
  +-------+--------+                | Setup IAA
  |  QPL library   |                | Resources
  +-------+---+----+                |
          |   |                     |
          |   +-------------+-------+
          |   Open IAA      |
          |   Devices +-----+-----+
          |           |idxd driver|
          |           +-----+-----+
          |                 |
          |                 |
          |           +-----+-----+
          +-----------+IAA Devices|
      Submit jobs     +-----------+
      via enqcmd


QPL Build And Installation
--------------------------

.. code-block:: shell

  $git clone --recursive https://github.com/intel/qpl.git qpl
  $mkdir qpl/build
  $cd qpl/build
  $cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/usr -DQPL_LIBRARY_TYPE=SHARED ..
  $sudo cmake --build . --target install

For more details about ``QPL`` installation, please refer to `QPL Installation
<https://intel.github.io/qpl/documentation/get_started_docs/installation.html>`_


IAA Device Management
---------------------

The number of ``IAA`` devices varies depending on the Xeon product model.
On an ``SPR`` server, there can be a maximum of 8 ``IAA`` devices, with up to
4 devices per socket.

By default, all ``IAA`` devices are disabled and need to be configured and
enabled manually by the user.

Check the number of devices with the following command:

.. code-block:: shell

  #lspci -d 8086:0cfe
  6a:02.0 System peripheral: Intel Corporation Device 0cfe
  6f:02.0 System peripheral: Intel Corporation Device 0cfe
  74:02.0 System peripheral: Intel Corporation Device 0cfe
  79:02.0 System peripheral: Intel Corporation Device 0cfe
  e7:02.0 System peripheral: Intel Corporation Device 0cfe
  ec:02.0 System peripheral: Intel Corporation Device 0cfe
  f1:02.0 System peripheral: Intel Corporation Device 0cfe
  f6:02.0 System peripheral: Intel Corporation Device 0cfe

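
For scripting, the listing above can be reduced to a single device count. The
following is a minimal sketch, not part of QEMU or ``accel-config``; the
helper name ``count_iaa_devices`` is hypothetical, and it assumes that every
non-empty line of the ``lspci -d 8086:0cfe`` output is one ``IAA`` device.

.. code-block:: shell

  # Count IAA devices from `lspci -d 8086:0cfe` output read on stdin.
  # Assumption: each non-empty line describes exactly one IAA device,
  # as in the listing above.
  count_iaa_devices() {
      # grep -c prints 0 and exits non-zero when nothing matches,
      # so the exit status is normalized with "|| true".
      grep -c '[^[:space:]]' || true
  }

  # Usage on a real host (requires the lspci tool):
  #   lspci -d 8086:0cfe | count_iaa_devices

On the server shown above this would report 8, the maximum for an ``SPR``
system.
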

IAA Device Configuration And Enabling
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``accel-config`` tool is used to enable ``IAA`` devices and configure
``IAA`` hardware resources (work queues and engines). One ``IAA`` device
has 8 work queues and 8 processing engines; multiple engines can be assigned
to a work queue via the ``group`` attribute.

For ``accel-config`` installation, please refer to `accel-config installation
<https://github.com/intel/idxd-config>`_

An example of configuring and enabling an ``IAA`` device:

.. code-block:: shell

  #accel-config config-engine iax1/engine1.0 -g 0
  #accel-config config-engine iax1/engine1.1 -g 0
  #accel-config config-engine iax1/engine1.2 -g 0
  #accel-config config-engine iax1/engine1.3 -g 0
  #accel-config config-engine iax1/engine1.4 -g 0
  #accel-config config-engine iax1/engine1.5 -g 0
  #accel-config config-engine iax1/engine1.6 -g 0
  #accel-config config-engine iax1/engine1.7 -g 0
  #accel-config config-wq iax1/wq1.0 -g 0 -s 128 -p 10 -b 1 -t 128 -m shared -y user -n app1 -d user
  #accel-config enable-device iax1
  #accel-config enable-wq iax1/wq1.0

.. note::

   IAX is an early name for IAA

- The ``IAA`` device index is 1. Use the ``ls -lh /sys/bus/dsa/devices/iax*``
  command to query the ``IAA`` device index.

- 8 engines and 1 work queue are configured in group 0, so all compression jobs
  submitted to this work queue can be processed by all engines at the same time.

- Set the work queue attributes, including the work mode, the work queue size
  and so on.

- Enable the ``iax1`` device and work queue 1.0.

.. note::

   Set the work queue mode to shared mode, since the ``QPL`` library only
   supports shared mode.

For more detailed configuration, please refer to `IAA Configuration Samples
<https://github.com/intel/idxd-config/tree/stable/Documentation/accfg>`_

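
The eight ``config-engine`` calls in the example differ only in the engine
number, so they can be produced by a loop. The sketch below only prints the
commands and does not run them. The helper name ``gen_iaa_config`` is
hypothetical, the work queue options are copied unchanged from the example,
and the ``engine<dev>.<n>`` and ``wq<dev>.0`` naming for other device indexes
is an assumption generalized from ``iax1``.

.. code-block:: shell

  # Print (do not run) the accel-config commands that put the engines of
  # one IAA device and its first work queue into group 0, then enable the
  # device and the work queue.
  # $1: IAA device index, e.g. 1 for iax1
  # $2: number of engines to assign (default 8, the per-device count above)
  gen_iaa_config() {
      dev=$1
      engines=${2:-8}
      i=0
      while [ "$i" -lt "$engines" ]; do
          echo "accel-config config-engine iax${dev}/engine${dev}.${i} -g 0"
          i=$((i + 1))
      done
      # Work queue options are taken verbatim from the example above.
      echo "accel-config config-wq iax${dev}/wq${dev}.0 -g 0 -s 128 -p 10 -b 1 -t 128 -m shared -y user -n app1 -d user"
      echo "accel-config enable-device iax${dev}"
      echo "accel-config enable-wq iax${dev}/wq${dev}.0"
  }

  # Usage: review the output first, then feed it to a root shell, e.g.
  #   gen_iaa_config 1 | sh

Printing instead of executing keeps the helper safe to try on a machine
without ``IAA`` hardware and makes the generated configuration easy to review.
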

IAA Unit Test
^^^^^^^^^^^^^

- To enable ``IAA`` devices on a Xeon platform, please refer to `IAA User Guide
  <https://www.intel.com/content/www/us/en/content-details/780887/intel-in-memory-analytics-accelerator-intel-iaa.html>`_

- The ``IAA`` device driver is the Intel Data Accelerator Driver (idxd). The
  recommended minimum Linux kernel version is 5.18.

- Add the ``"intel_iommu=on,sm_on"`` parameter to the kernel command line
  to enable the ``SVM`` feature.

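
The two software prerequisites in the list above (kernel version and kernel
command line) can be checked from a script. This is a minimal sketch with
hypothetical helper names. It assumes that the parameter appears on the
command line exactly as ``intel_iommu=on,sm_on``; other spellings or extra
comma-separated options are not recognized.

.. code-block:: shell

  # Return 0 if the kernel release string $1 (e.g. "6.5.0-generic") is at
  # least 5.18, the minimum version recommended above.
  kernel_at_least_5_18() {
      major=${1%%.*}
      rest=${1#*.}
      # Keep only the leading digits of the minor version.
      minor=${rest%%[!0-9]*}
      [ "$major" -gt 5 ] || { [ "$major" -eq 5 ] && [ "${minor:-0}" -ge 18 ]; }
  }

  # Return 0 if the kernel command line $1 contains intel_iommu=on,sm_on
  # as a separate, exactly spelled parameter (assumption, see above).
  cmdline_has_sm_on() {
      case " $1 " in
          *" intel_iommu=on,sm_on "*) return 0 ;;
          *) return 1 ;;
      esac
  }

  # Usage on a real host:
  #   kernel_at_least_5_18 "$(uname -r)" || echo "kernel older than 5.18"
  #   cmdline_has_sm_on "$(cat /proc/cmdline)" || echo "sm_on not set"
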

Here is an easy way to verify the ``IAA`` device driver and ``SVM`` with `iaa_test
<https://github.com/intel/idxd-config/tree/stable/test>`_

.. code-block:: shell

  #./test/iaa_test
   [ info] alloc wq 0 shared size 128 addr 0x7f26cebe5000 batch sz 0xfffffffe xfer sz 0x80000000
   [ info] test noop: tflags 0x1 num_desc 1
   [ info] preparing descriptor for noop
   [ info] Submitted all noop jobs
   [ info] verifying task result for 0x16f7e20
   [ info] test with op 0 passed


IAA Resources Allocation For Migration
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

There are no ``IAA`` resource configuration parameters for migration, and
the ``accel-config`` tool configuration cannot directly specify the ``IAA``
resources used for migration.

The multifd migration with the ``QPL`` compression method will use all work
queues that are enabled and in shared mode.

.. note::

  Accessing IAA resources requires the ``sudo`` command or ``root`` privileges
  by default. Administrators can modify the IAA device node ownership
  so that QEMU can use IAA with specified user permissions.

  For example::

    #chown -R qemu /dev/iax

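
Since migration picks up every enabled work queue in shared mode, it can help
to list them before migrating. The following sketch is an illustration only.
The helper name is hypothetical, and it assumes that the idxd driver exposes
each work queue under ``/sys/bus/dsa/devices`` as ``wq<dev>.<n>`` with
``state`` and ``mode`` attribute files. It does not distinguish ``IAA`` work
queues from those of other idxd devices.

.. code-block:: shell

  # List work queues that are enabled and in shared mode.
  # $1: directory holding the idxd devices (default /sys/bus/dsa/devices).
  # Assumption: each wq<dev>.<n> entry has "state" and "mode" files.
  list_shared_enabled_wqs() {
      root=${1:-/sys/bus/dsa/devices}
      for wq in "$root"/wq*; do
          # Skip entries without readable attributes (or an unmatched glob).
          [ -r "$wq/state" ] && [ -r "$wq/mode" ] || continue
          state=$(cat "$wq/state")
          mode=$(cat "$wq/mode")
          if [ "$state" = "enabled" ] && [ "$mode" = "shared" ]; then
              basename "$wq"
          fi
      done
  }

  # Usage on a real host:
  #   list_shared_enabled_wqs

The directory argument exists so that the helper can be tried against a
scratch directory on a machine without ``IAA`` hardware.
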

Shared Virtual Memory(SVM) Introduction
=======================================

``SVM`` is the ability of an accelerator I/O device to operate in the same
virtual memory space as applications on host processors. It also implies the
ability to operate from pageable memory, avoiding the functional requirement
to pin memory for DMA operations.

When using ``SVM`` technology, users do not need to reserve memory for the
``IAA`` device or pin memory. The ``IAA`` device can directly access data
using the virtual addresses of the process.

For more information about ``SVM`` technology, please refer to
`Shared Virtual Addressing (SVA) with ENQCMD
<https://docs.kernel.org/next/x86/sva.html>`_


How To Use QPL Compression In Migration
=======================================

1 - Install the ``QPL`` library, and the ``accel-config`` library if using IAA

2 - Configure and enable ``IAA`` devices and work queues via ``accel-config``

3 - Build ``QEMU`` with the ``--enable-qpl`` parameter

  E.g. configure --target-list=x86_64-softmmu --enable-kvm ``--enable-qpl``

4 - Enable ``QPL`` compression during migration

  Set ``migrate_set_parameter multifd-compression qpl`` when migrating. The
  ``QPL`` compression does not support configuring the compression level; it
  only supports one compression level.

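
Step 4 is issued in the QEMU monitor. As an illustration, the helper below
prints one plausible HMP command sequence for each side of the migration. It
sends nothing to QEMU. The helper name, the example URI and the default of 2
channels are assumptions made for this sketch. It further assumes that the
``multifd`` capability and the compression method are set on both sides, and
that the destination was started with ``-incoming defer``.

.. code-block:: shell

  # Print the HMP commands for one side of a multifd migration using QPL.
  # $1: "source" or "destination"
  # $2: migration URI, e.g. tcp:192.0.2.10:4444 (hypothetical address)
  # $3: number of multifd channels (default 2, an arbitrary choice here)
  qpl_migration_hmp() {
      side=$1
      uri=$2
      channels=${3:-2}
      # Pick the final command first so a bad argument prints nothing.
      case "$side" in
          source) last="migrate -d ${uri}" ;;
          destination) last="migrate_incoming ${uri}" ;;
          *) echo "unknown side: ${side}" >&2; return 1 ;;
      esac
      echo "migrate_set_capability multifd on"
      echo "migrate_set_parameter multifd-channels ${channels}"
      echo "migrate_set_parameter multifd-compression qpl"
      echo "$last"
  }

  # Usage: paste the output into the monitor of the matching side, e.g.
  #   qpl_migration_hmp destination tcp:0:4444
  #   qpl_migration_hmp source tcp:192.0.2.10:4444 4
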

The Difference Between QPL And ZLIB
===================================

Although both ``QPL`` and ``ZLIB`` are based on the deflate compression
algorithm, and ``QPL`` can support the header and tail of ``ZLIB``, ``QPL``
is still not fully compatible with the ``ZLIB`` compression in the migration.

``QPL`` only supports a 4K history buffer, while ``ZLIB`` uses 32K by default,
so ``QPL`` may not correctly decompress data compressed by ``ZLIB``, and
vice versa.

``QPL`` does not support the ``Z_SYNC_FLUSH`` operation used in ``ZLIB``
streaming compression. The current ``ZLIB`` implementation uses
``Z_SYNC_FLUSH``: each ``multifd`` thread has a ``ZLIB`` streaming context,
and all page compression and decompression are based on this stream. ``QPL``
cannot decompress such data, and vice versa.

For an introduction to ``Z_SYNC_FLUSH``, please refer to `Zlib Manual
<https://www.zlib.net/manual.html>`_


The Best Practices
==================

When the user enables the IAA device for ``QPL`` compression, it is recommended
to add the ``-mem-prealloc`` parameter to the destination boot parameters. This
parameter avoids I/O page faults and reduces the overhead of IAA compression
and decompression.

An example of booting with the ``-mem-prealloc`` parameter:

.. code-block:: shell

  $qemu-system-x86_64 --enable-kvm -cpu host --mem-prealloc ...

An example of measuring I/O page faults on the destination without
``-mem-prealloc``. The ``svm_prq`` row shows the number of I/O page faults
and their processing time.

.. code-block:: shell

  #echo 1 > /sys/kernel/debug/iommu/intel/dmar_perf_latency
  #echo 2 > /sys/kernel/debug/iommu/intel/dmar_perf_latency
  #echo 3 > /sys/kernel/debug/iommu/intel/dmar_perf_latency
  #echo 4 > /sys/kernel/debug/iommu/intel/dmar_perf_latency
  #cat /sys/kernel/debug/iommu/intel/dmar_perf_latency
  IOMMU: dmar18 Register Base Address: c87fc000
                  <0.1us   0.1us-1us    1us-10us  10us-100us   100us-1ms    1ms-10ms      >=10ms     min(us)     max(us) average(us)
   inv_iotlb           0         286         123           0           0           0           0           0           1           0
  inv_devtlb           0         276         133           0           0           0           0           0           2           0
     inv_iec           0           0           0           0           0           0           0           0           0           0
     svm_prq           0           0       25206         364         395           0           0           1         556           9

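
To compare runs with and without ``-mem-prealloc``, the ``svm_prq`` row can be
condensed into one line. The sketch below uses a hypothetical helper name and
assumes the row layout of the sample above: the row name, seven latency
buckets, then the minimum, maximum and average in microseconds.

.. code-block:: shell

  # Summarize the svm_prq row of dmar_perf_latency output read on stdin.
  # Assumption: fields 2-8 are the seven latency buckets and fields 9-11
  # are min, max and average in microseconds, as in the sample above.
  summarize_svm_prq() {
      awk '$1 == "svm_prq" {
          total = 0
          for (i = 2; i <= 8; i++) total += $i
          printf "faults=%d min_us=%d max_us=%d avg_us=%d\n", total, $9, $10, $11
      }'
  }

  # Usage on a real host (as root):
  #   cat /sys/kernel/debug/iommu/intel/dmar_perf_latency | summarize_svm_prq

For the sample output above, the bucket counts add up to 25965 I/O page
faults. One line is printed for each IOMMU that reports an ``svm_prq`` row.
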