@@ -0,0 +1,166 @@
+===============================
+IOMMUFD BACKEND usage with VFIO
+===============================
+
+(In this document, "backend", "container" and "BE" have the same meaning.)
+
+With the introduction of iommufd, the Linux kernel provides a generic
+interface for user space drivers to propagate their DMA mappings to the kernel
+for assigned devices. While the legacy kernel interface is group-centric,
+the new iommufd interface is device-centric, relying on a device fd and an
+iommufd.
+
+To support both interfaces in the QEMU VFIO device, a base container is
+introduced to abstract the parts common to the VFIO legacy and iommufd
+containers, so that the generic VFIO code can use either container.
+
+The base container implements generic functions such as the memory_listener
+and address space management, whereas the derived container implements
+callbacks specific to either legacy or iommufd. Each container has its own way
+to set up the secure context and the DMA management interface. The diagram
+below shows how this looks with both containers.
+
+::
+
+                      VFIO                           AddressSpace/Memory
+      +-------+  +----------+  +-----+  +-----+
+      |  pci  |  | platform |  |  ap |  | ccw |
+      +---+---+  +----+-----+  +--+--+  +--+--+     +----------------------+
+          |           |           |        |        |   AddressSpace       |
+          |           |           |        |        +------------+---------+
+      +---V-----------V-----------V--------V----+               /
+      |           VFIOAddressSpace              | <------------+
+      |                  |                      |  MemoryListener
+      |        VFIOContainerBase list           |
+      +-------+----------------------------+----+
+              |                            |
+              |                            |
+      +-------V------+            +--------V----------+
+      |   iommufd    |            |    vfio legacy    |
+      |  container   |            |     container     |
+      +-------+------+            +--------+----------+
+              |                            |
+              | /dev/iommu                 | /dev/vfio/vfio
+              | /dev/vfio/devices/vfioX    | /dev/vfio/$group_id
+  Userspace   |                            |
+  ============+============================+===========================
+  Kernel      |  device fd                 |
+              +---------------+            | group/container fd
+              | (BIND_IOMMUFD |            | (SET_CONTAINER/SET_IOMMU)
+              |  ATTACH_IOAS) |            | device fd
+              |               |            |
+              |       +-------V------------V-----------------+
+      iommufd |       |                vfio                  |
+  (map/unmap  |       +---------+--------------------+-------+
+  ioas_copy)  |                 |                    | map/unmap
+              |                 |                    |
+      +-------V------+    +-----V------+      +------V--------+
+      | iommufd core |    |  device    |      |  vfio iommu   |
+      +--------------+    +------------+      +---------------+
+
+* Secure Context setup
+
+  - iommufd BE: uses the device fd and iommufd to set up the secure context
+    (bind_iommufd, attach_ioas)
+  - vfio legacy BE: uses the group fd and container fd to set up the secure
+    context (set_container, set_iommu)
+
+* Device access
+
+  - iommufd BE: the device fd is opened through ``/dev/vfio/devices/vfioX``
+  - vfio legacy BE: the device fd is retrieved from a group fd ioctl
+
+* DMA Mapping flow
+
+  1. VFIOAddressSpace receives MemoryRegion add/del via MemoryListener
+  2. VFIO populates DMA map/unmap via the container BEs
+
+     * iommufd BE: uses iommufd
+     * vfio legacy BE: uses container fd
+
+Example configuration
+=====================
+
+Step 1: configure the host device
+---------------------------------
+
+This step is exactly the same as for a VFIO device with the legacy VFIO
+container.
+
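+For reference, the usual host-side procedure can be sketched as follows. This
+is only a minimal sketch, not part of the iommufd work itself: it assumes the
+PCI device ``0000:02:00.0`` used in the examples in this document, a
+``vfio-pci`` host driver built as a module, and the standard sysfs
+``driver_override`` interface. The cdev name (``vfioX``) is assigned by the
+kernel and only appears if the host kernel has VFIO cdev support.
+
+.. code-block:: bash
+
+    # Assumption: the device to assign is 0000:02:00.0
+    dev=0000:02:00.0
+    modprobe vfio-pci
+    # Detach the device from its current host driver, if it has one
+    if [ -e /sys/bus/pci/devices/$dev/driver ]; then
+        echo $dev > /sys/bus/pci/devices/$dev/driver/unbind
+    fi
+    # Let only vfio-pci claim the device, then re-probe it
+    echo vfio-pci > /sys/bus/pci/devices/$dev/driver_override
+    echo $dev > /sys/bus/pci/drivers_probe
+    # Name of the per-device cdev (vfioX) and the nodes used by the iommufd BE
+    ls /sys/bus/pci/devices/$dev/vfio-dev/
+    ls -l /dev/iommu /dev/vfio/devices/
+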
+Step 2: configure QEMU
+----------------------
+
+Interactions with ``/dev/iommu`` are abstracted by a new iommufd
+object (compiled in with the ``CONFIG_IOMMUFD`` option).
+
+Any QEMU device (e.g. a VFIO device) wishing to use ``/dev/iommu`` must
+be linked with an iommufd object. Such a device gets a new optional property
+named ``iommufd``, which is used to pass an iommufd object. Taking the
+``vfio-pci`` device as an example:
+
+.. code-block:: bash
+
+    -object iommufd,id=iommufd0
+    -device vfio-pci,host=0000:02:00.0,iommufd=iommufd0
+
+Note that ``/dev/iommu`` and the VFIO cdev can be opened externally by a
+management layer. In that case the fds are passed to QEMU through the ``fd``
+property, which accepts either a number or a string naming the fd, for example:
+
+.. code-block:: bash
+
+    -object iommufd,id=iommufd0,fd=22
+    -device vfio-pci,iommufd=iommufd0,fd=23
+
+If the ``fd`` property is not passed, the fd is opened by QEMU.
+
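+As an illustration of fd passing, a hypothetical management script could open
+both nodes itself and let QEMU inherit them. This is only a sketch under
+stated assumptions: the shell is bash, the assigned device is exposed as
+``/dev/vfio/devices/vfio0``, and fd numbers 22 and 23 are free. A real
+management layer would more likely open the nodes in its own process before
+spawning QEMU.
+
+.. code-block:: bash
+
+    # Open the iommufd and the VFIO device cdev read-write on fixed fd numbers
+    exec 22<>/dev/iommu
+    exec 23<>/dev/vfio/devices/vfio0
+    # The child QEMU process inherits fds 22 and 23 from the shell
+    qemu-system-x86_64 \
+        -object iommufd,id=iommufd0,fd=22 \
+        -device vfio-pci,iommufd=iommufd0,fd=23
+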
+If no ``iommufd`` object is passed to the ``vfio-pci`` device, iommufd
+is not used and the user gets the behavior based on the legacy VFIO
+container:
+
+.. code-block:: bash
+
+    -device vfio-pci,host=0000:02:00.0
+
+Supported platform
+==================
+
+x86, ARM and s390x are currently supported.
+
+Caveats
+=======
+
+Dirty page sync
+---------------
+
+Dirty page sync with the iommufd backend is not supported yet, so live
+migration is disabled by default. It can be force-enabled as shown below,
+although with low efficiency.
+
+.. code-block:: bash
+
+    -object iommufd,id=iommufd0
+    -device vfio-pci,host=0000:02:00.0,iommufd=iommufd0,enable-migration=on
+
+P2P DMA
+-------
+
+PCI P2P DMA is unsupported because IOMMUFD does not yet support mapping
+hardware PCI BAR regions. The warning below is shown for an assigned PCI
+device; it is not a bug.
+
+.. code-block:: none
+
+    qemu-system-x86_64: warning: IOMMU_IOAS_MAP failed: Bad address, PCI BAR?
+    qemu-system-x86_64: vfio_container_dma_map(0x560cb6cb1620, 0xe000000021000, 0x3000, 0x7f32ed55c000) = -14 (Bad address)
+
+FD passing with mdev
+--------------------
+
+The ``vfio-pci`` device checks the ``sysfsdev`` property to decide whether the
+backend is an mdev. If FD passing is used, there is no way to know that, and
+the mdev is treated like a real PCI device. The error below is reported if the
+user tries to enable RAM discarding for the mdev.
+
+.. code-block:: none
+
+    qemu-system-x86_64: -device vfio-pci,iommufd=iommufd0,x-balloon-allowed=on,fd=9: vfio VFIO_FD9: x-balloon-allowed only potentially compatible with mdev devices
+
+``vfio-ap`` and ``vfio-ccw`` devices do not have this issue, as their backend
+devices are always mdevs and RAM discarding is force-enabled.