123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393394395396397398399400401402403404405406407408409410411412413414415416417418419420421422423424425426427428429430431432433434435436437438439440441442443444445446447448449450451452453454455456457458459460461462463464465466467468469470471472473474475476477478479480481482483484485486487488489490491492493494495496497498499500501502503504505506507508509510511512513514515516517518519520521522523524525526527528529530531532533534535536537538539540541542543544545546547548549550551552553554555556557558559560561562563564565566567568569570571572573574575576577578579580581582583584585586587588589590591592593594595596597598599600601602603604605606607608609610611612613614615616617618619620621622623624625626627628629630631632633634635636637638639640641642643644645646647648649650651652653654655656657658659660661662663664665666667668669670671672673674675676677678679680681682683684685686687688689690691692693694695696697698699700701702703704705706707708709710711712713714715716717718719720721722723724725726727728729730731732733734735736737738739740741742743744745746747748749750751752753754755756757758759760761762763764765766767768769770771772773774775776777778779780781782783784785786787788789790791792793794795796797798799800801802803804805806807808809810811812813814815816817818819820821822823824825826827828829830831832833834835836837838839840841842843844845846847848849850851852853854855856857858859860861862863864865866867868869870871872873874875876877878879880881882883884885886887888889890891892893894895896897898899900901902903904905906907908909910911912913914915916917918919920921922923924925926927928929930931932933934935936937938939940941942943944945946947948949950951952953954955956957958959960961962963964965966967968969970971972973974975976977978979980981982983984985986987988989990991992993994995996997998999100010011002100310041005100610071008100910101011101210131014101510161017101810191020102110221023102410251026102710281029103010311032103310341035103610371038103910401041104210431044104510461047104810491050105110521053105410551056105710581059106010611062106310641065106610671068106910701071107210731074107510761077107810791080108110821083108410851086108710881089109010911092109310941095109610971098109911001101110211031104110511061107110811091110111111121113111411151116111711181119112011211122112311241125112611271128112911301131113211331134113511361137113811391140114111421143114411451146114711481149115011511152115311541155115611571158115911601161116211631164116511661167116811691170117111721173117411751176117711781179118011811182118311841185118611871188118911901191119211931194119511961197119811991200120112021203120412051206120712081209121012111212121312141215121612171218121912201221122212231224122512261227122812291230123112321233123412351236123712381239124012411242124312441245124612471248124912501251125212531254125512561257125812591260126112621263126412651266126712681269127012711272127312741275127612771278127912801281128212831284128512861287128812891290129112921293129412951296129712981299130013011302130313041305130613071308130913101311131213131314131513161317131813191320132113221323132413251326132713281329133013311332133313341335133613371338133913401341134213431344134513461347134813491350135113521353135413551356135713581359136013611362136313641365136613671368136913701371137213731374137513761377137813791380138113821383138413851386138713881389139013911392139313941395139613971398139914001401140214031404140514061407140814091410141114121413141414151416141714181419142014211422142314241425142614271428142914301431143214331434143514361437143814391440144114421443144414451446144714481449145014511452145314541455145614571458145914601461146214631464146514661467146814691470147114721473147414751476147714781479148014811482148314841485148614871488148914901491149214931494149514961497149814991500150115021503150415051506150715081509151015111512151315141515151615171518151915201521152215231524152515261527152815291530153115321533153415351536153715381539154015411542154315441545154615471548154915501551155215531554155515561557155815591560156115621563156415651566156715681569157015711572157315741575157615771578157915801581158215831584158515861587158815891590159115921593159415951596159715981599160016011602160316041605160616071608160916101611161216131614161516161617161816191620162116221623162416251626162716281629163016311632163316341635163616371638163916401641164216431644164516461647164816491650165116521653165416551656165716581659166016611662166316641665166616671668166916701671167216731674167516761677167816791680168116821683168416851686168716881689169016911692169316941695169616971698169917001701170217031704170517061707170817091710171117121713171417151716171717181719172017211722172317241725172617271728172917301731173217331734173517361737173817391740174117421743174417451746174717481749175017511752175317541755175617571758175917601761176217631764176517661767176817691770177117721773177417751776177717781779178017811782178317841785178617871788178917901791179217931794179517961797179817991800180118021803180418051806180718081809181018111812181318141815181618171818181918201821182218231824182518261827182818291830183118321833183418351836183718381839184018411842184318441845184618471848184918501851185218531854185518561857185818591860186118621863186418651866186718681869187018711872187318741875187618771878187918801881188218831884188518861887188818891890189118921893189418951896189718981899190019011902190319041905190619071908190919101911191219131914191519161917191819191920192119221923192419251926192719281929193019311932193319341935193619371938193919401941194219431944194519461947194819491950195119521953195419551956195719581959196019611962196319641965196619671968196919701971197219731974197519761977197819791980198119821983198419851986198719881989199019911992199319941995199619971998199920002001200220032004200520062007200820092010 |
- .. _vhost_user_proto:
- ===================
- Vhost-user Protocol
- ===================
- ..
- Copyright 2014 Virtual Open Systems Sarl.
- Copyright 2019 Intel Corporation
- Licence: This work is licensed under the terms of the GNU GPL,
- version 2 or later. See the COPYING file in the top-level
- directory.
- .. contents:: Table of Contents
- Introduction
- ============
- This protocol is aiming to complement the ``ioctl`` interface used to
- control the vhost implementation in the Linux kernel. It implements
- the control plane needed to establish virtqueue sharing with a user
- space process on the same host. It uses communication over a Unix
- domain socket to share file descriptors in the ancillary data of the
- message.
- The protocol defines 2 sides of the communication, *front-end* and
- *back-end*. The *front-end* is the application that shares its virtqueues, in
- our case QEMU. The *back-end* is the consumer of the virtqueues.
- In the current implementation QEMU is the *front-end*, and the *back-end*
- is the external process consuming the virtio queues, for example a
- software Ethernet switch running in user space, such as Snabbswitch,
- or a block device back-end processing read & write to a virtual
- disk. In order to facilitate interoperability between various back-end
- implementations, it is recommended to follow the :ref:`Backend program
- conventions <backend_conventions>`.
- The *front-end* and *back-end* can be either a client (i.e. connecting) or
- server (listening) in the socket communication.
- Support for platforms other than Linux
- --------------------------------------
- While vhost-user was initially developed targeting Linux, nowadays it
- is supported on any platform that provides the following features:
- - A way for requesting shared memory represented by a file descriptor
- so it can be passed over a UNIX domain socket and then mapped by the
- other process.
- - AF_UNIX sockets with SCM_RIGHTS, so QEMU and the other process can
- exchange messages through it, including ancillary data when needed.
- - Either eventfd or pipe/pipe2. On platforms where eventfd is not
- available, QEMU will automatically fall back to pipe2 or, as a last
- resort, pipe. Each file descriptor will be used for receiving or
- sending events by reading or writing (respectively) an 8-byte value
- to the corresponding it. The 8-value itself has no meaning and
- should not be interpreted.
- Message Specification
- =====================
- .. Note:: All numbers are in the machine native byte order.
- A vhost-user message consists of 3 header fields and a payload.
- +---------+-------+------+---------+
- | request | flags | size | payload |
- +---------+-------+------+---------+
- Header
- ------
- :request: 32-bit type of the request
- :flags: 32-bit bit field
- - Lower 2 bits are the version (currently 0x01)
- - Bit 2 is the reply flag - needs to be sent on each reply from the back-end
- - Bit 3 is the need_reply flag - see :ref:`REPLY_ACK <reply_ack>` for
- details.
- :size: 32-bit size of the payload
- Payload
- -------
- Depending on the request type, **payload** can be:
- A single 64-bit integer
- ^^^^^^^^^^^^^^^^^^^^^^^
- +-----+
- | u64 |
- +-----+
- :u64: a 64-bit unsigned integer
- A vring state description
- ^^^^^^^^^^^^^^^^^^^^^^^^^
- +-------+-----+
- | index | num |
- +-------+-----+
- :index: a 32-bit index
- :num: a 32-bit number
- A vring descriptor index for split virtqueues
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- +-------------+---------------------+
- | vring index | index in avail ring |
- +-------------+---------------------+
- :vring index: 32-bit index of the respective virtqueue
- :index in avail ring: 32-bit value, of which currently only the lower 16
- bits are used:
- - Bits 0–15: Index of the next *Available Ring* descriptor that the
- back-end will process. This is a free-running index that is not
- wrapped by the ring size.
- - Bits 16–31: Reserved (set to zero)
- Vring descriptor indices for packed virtqueues
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- +-------------+--------------------+
- | vring index | descriptor indices |
- +-------------+--------------------+
- :vring index: 32-bit index of the respective virtqueue
- :descriptor indices: 32-bit value:
- - Bits 0–14: Index of the next *Available Ring* descriptor that the
- back-end will process. This is a free-running index that is not
- wrapped by the ring size.
- - Bit 15: Driver (Available) Ring Wrap Counter
- - Bits 16–30: Index of the entry in the *Used Ring* where the back-end
- will place the next descriptor. This is a free-running index that
- is not wrapped by the ring size.
- - Bit 31: Device (Used) Ring Wrap Counter
- A vring address description
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^
- +-------+-------+------------+------+-----------+-----+
- | index | flags | descriptor | used | available | log |
- +-------+-------+------------+------+-----------+-----+
- :index: a 32-bit vring index
- :flags: a 32-bit vring flags
- :descriptor: a 64-bit ring address of the vring descriptor table
- :used: a 64-bit ring address of the vring used ring
- :available: a 64-bit ring address of the vring available ring
- :log: a 64-bit guest address for logging
- Note that a ring address is an IOVA if ``VIRTIO_F_IOMMU_PLATFORM`` has
- been negotiated. Otherwise it is a user address.
- .. _memory_region_description:
- Memory region description
- ^^^^^^^^^^^^^^^^^^^^^^^^^
- +---------------+------+--------------+-------------+
- | guest address | size | user address | mmap offset |
- +---------------+------+--------------+-------------+
- :guest address: a 64-bit guest address of the region
- :size: a 64-bit size
- :user address: a 64-bit user address
- :mmap offset: a 64-bit offset where region starts in the mapped memory
- When the ``VHOST_USER_PROTOCOL_F_XEN_MMAP`` protocol feature has been
- successfully negotiated, the memory region description contains two extra
- fields at the end.
- +---------------+------+--------------+-------------+----------------+-------+
- | guest address | size | user address | mmap offset | xen mmap flags | domid |
- +---------------+------+--------------+-------------+----------------+-------+
- :xen mmap flags: a 32-bit bit field
- - Bit 0 is set for Xen foreign memory mapping.
- - Bit 1 is set for Xen grant memory mapping.
- - Bit 8 is set if the memory region can not be mapped in advance, and memory
- areas within this region must be mapped / unmapped only when required by the
- back-end. The back-end shouldn't try to map the entire region at once, as the
- front-end may not allow it. The back-end should rather map only the required
- amount of memory at once and unmap it after it is used.
- :domid: a 32-bit Xen hypervisor specific domain id.
- Single memory region description
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- +---------+--------+
- | padding | region |
- +---------+--------+
- :padding: 64-bit
- :region: region is represented by :ref:`Memory region description <memory_region_description>`.
- Multiple Memory regions description
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- +-------------+---------+---------+-----+---------+
- | num regions | padding | region0 | ... | region7 |
- +-------------+---------+---------+-----+---------+
- :num regions: a 32-bit number of regions
- :padding: 32-bit
- :regions: regions field contains 8 regions of type :ref:`Memory region description <memory_region_description>`.
- Log description
- ^^^^^^^^^^^^^^^
- +----------+------------+
- | log size | log offset |
- +----------+------------+
- :log size: a 64-bit size of area used for logging
- :log offset: a 64-bit offset from start of supplied file descriptor where
- logging starts (i.e. where guest address 0 would be
- logged)
- An IOTLB message
- ^^^^^^^^^^^^^^^^
- +------+------+--------------+-------------------+------+
- | iova | size | user address | permissions flags | type |
- +------+------+--------------+-------------------+------+
- :iova: a 64-bit I/O virtual address programmed by the guest
- :size: a 64-bit size
- :user address: a 64-bit user address
- :permissions flags: an 8-bit value:
- - 0: No access
- - 1: Read access
- - 2: Write access
- - 3: Read/Write access
- :type: an 8-bit IOTLB message type:
- - 1: IOTLB miss
- - 2: IOTLB update
- - 3: IOTLB invalidate
- - 4: IOTLB access fail
- Virtio device config space
- ^^^^^^^^^^^^^^^^^^^^^^^^^^
- +--------+------+-------+---------+
- | offset | size | flags | payload |
- +--------+------+-------+---------+
- :offset: a 32-bit offset of virtio device's configuration space
- :size: a 32-bit configuration space access size in bytes
- :flags: a 32-bit value:
- - 0: Vhost front-end messages used for writable fields
- - 1: Vhost front-end messages used for live migration
- :payload: Size bytes array holding the contents of the virtio
- device's configuration space
- Vring area description
- ^^^^^^^^^^^^^^^^^^^^^^
- +-----+------+--------+
- | u64 | size | offset |
- +-----+------+--------+
- :u64: a 64-bit integer contains vring index and flags
- :size: a 64-bit size of this area
- :offset: a 64-bit offset of this area from the start of the
- supplied file descriptor
- Inflight description
- ^^^^^^^^^^^^^^^^^^^^
- +-----------+-------------+------------+------------+
- | mmap size | mmap offset | num queues | queue size |
- +-----------+-------------+------------+------------+
- :mmap size: a 64-bit size of area to track inflight I/O
- :mmap offset: a 64-bit offset of this area from the start
- of the supplied file descriptor
- :num queues: a 16-bit number of virtqueues
- :queue size: a 16-bit size of virtqueues
- VhostUserShared
- ^^^^^^^^^^^^^^^
- +------+
- | UUID |
- +------+
- :UUID: 16 bytes UUID, whose first three components (a 32-bit value, then
- two 16-bit values) are stored in big endian.
- Device state transfer parameters
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- +--------------------+-----------------+
- | transfer direction | migration phase |
- +--------------------+-----------------+
- :transfer direction: a 32-bit enum, describing the direction in which
- the state is transferred:
- - 0: Save: Transfer the state from the back-end to the front-end,
- which happens on the source side of migration
- - 1: Load: Transfer the state from the front-end to the back-end,
- which happens on the destination side of migration
- :migration phase: a 32-bit enum, describing the state in which the VM
- guest and devices are:
- - 0: Stopped (in the period after the transfer of memory-mapped
- regions before switch-over to the destination): The VM guest is
- stopped, and the vhost-user device is suspended (see
- :ref:`Suspended device state <suspended_device_state>`).
- In the future, additional phases might be added e.g. to allow
- iterative migration while the device is running.
- C structure
- -----------
- In QEMU the vhost-user message is implemented with the following struct:
- .. code:: c
- typedef struct VhostUserMsg {
- VhostUserRequest request;
- uint32_t flags;
- uint32_t size;
- union {
- uint64_t u64;
- struct vhost_vring_state state;
- struct vhost_vring_addr addr;
- VhostUserMemory memory;
- VhostUserLog log;
- struct vhost_iotlb_msg iotlb;
- VhostUserConfig config;
- VhostUserVringArea area;
- VhostUserInflight inflight;
- };
- } QEMU_PACKED VhostUserMsg;
- Communication
- =============
- The protocol for vhost-user is based on the existing implementation of
- vhost for the Linux Kernel. Most messages that can be sent via the
- Unix domain socket implementing vhost-user have an equivalent ioctl to
- the kernel implementation.
- The communication consists of the *front-end* sending message requests and
- the *back-end* sending message replies. Most of the requests don't require
- replies, except for the following requests:
- * ``VHOST_USER_GET_FEATURES``
- * ``VHOST_USER_GET_PROTOCOL_FEATURES``
- * ``VHOST_USER_GET_VRING_BASE``
- * ``VHOST_USER_SET_LOG_BASE`` (if ``VHOST_USER_PROTOCOL_F_LOG_SHMFD``)
- * ``VHOST_USER_GET_INFLIGHT_FD`` (if ``VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD``)
- .. seealso::
- :ref:`REPLY_ACK <reply_ack>`
- The section on ``REPLY_ACK`` protocol extension.
- There are several messages that the front-end sends with file descriptors passed
- in the ancillary data:
- * ``VHOST_USER_ADD_MEM_REG``
- * ``VHOST_USER_SET_MEM_TABLE``
- * ``VHOST_USER_SET_LOG_BASE`` (if ``VHOST_USER_PROTOCOL_F_LOG_SHMFD``)
- * ``VHOST_USER_SET_LOG_FD``
- * ``VHOST_USER_SET_VRING_KICK``
- * ``VHOST_USER_SET_VRING_CALL``
- * ``VHOST_USER_SET_VRING_ERR``
- * ``VHOST_USER_SET_BACKEND_REQ_FD`` (previous name ``VHOST_USER_SET_SLAVE_REQ_FD``)
- * ``VHOST_USER_SET_INFLIGHT_FD`` (if ``VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD``)
- * ``VHOST_USER_SET_DEVICE_STATE_FD``
- If *front-end* is unable to send the full message or receives a wrong
- reply it will close the connection. An optional reconnection mechanism
- can be implemented.
- If *back-end* detects some error such as incompatible features, it may also
- close the connection. This should only happen in exceptional circumstances.
- Any protocol extensions are gated by protocol feature bits, which
- allows full backwards compatibility on both front-end and back-end. As
- older back-ends don't support negotiating protocol features, a feature
- bit was dedicated for this purpose::
- #define VHOST_USER_F_PROTOCOL_FEATURES 30
- Note that VHOST_USER_F_PROTOCOL_FEATURES is the UNUSED (30) feature
- bit defined in `VIRTIO 1.1 6.3 Legacy Interface: Reserved Feature Bits
- <https://docs.oasis-open.org/virtio/virtio/v1.1/cs01/virtio-v1.1-cs01.html#x1-4130003>`_.
- VIRTIO devices do not advertise this feature bit and therefore VIRTIO
- drivers cannot negotiate it.
- This reserved feature bit was reused by the vhost-user protocol to add
- vhost-user protocol feature negotiation in a backwards compatible
- fashion. Old vhost-user front-end and back-end implementations continue to
- work even though they are not aware of vhost-user protocol feature
- negotiation.
- Ring states
- -----------
- Rings have two independent states: started/stopped, and enabled/disabled.
- * While a ring is stopped, the back-end must not process the ring at
- all, regardless of whether it is enabled or disabled. The
- enabled/disabled state should still be tracked, though, so it can come
- into effect once the ring is started.
- * started and disabled: The back-end must process the ring without
- causing any side effects. For example, for a networking device,
- in the disabled state the back-end must not supply any new RX packets,
- but must process and discard any TX packets.
- * started and enabled: The back-end must process the ring normally, i.e.
- process all requests and execute them.
- Each ring is initialized in a stopped and disabled state. The back-end
- must start a ring upon receiving a kick (that is, detecting that file
- descriptor is readable) on the descriptor specified by
- ``VHOST_USER_SET_VRING_KICK`` or receiving the in-band message
- ``VHOST_USER_VRING_KICK`` if negotiated, and stop a ring upon receiving
- ``VHOST_USER_GET_VRING_BASE``.
- Rings can be enabled or disabled by ``VHOST_USER_SET_VRING_ENABLE``.
- In addition, upon receiving a ``VHOST_USER_SET_FEATURES`` message from
- the front-end without ``VHOST_USER_F_PROTOCOL_FEATURES`` set, the
- back-end must enable all rings immediately.
- While processing the rings (whether they are enabled or not), the back-end
- must support changing some configuration aspects on the fly.
- .. _suspended_device_state:
- Suspended device state
- ^^^^^^^^^^^^^^^^^^^^^^
- While all vrings are stopped, the device is *suspended*. In addition to
- not processing any vring (because they are stopped), the device must:
- * not write to any guest memory regions,
- * not send any notifications to the guest,
- * not send any messages to the front-end,
- * still process and reply to messages from the front-end.
- Multiple queue support
- ----------------------
- Many devices have a fixed number of virtqueues. In this case the front-end
- already knows the number of available virtqueues without communicating with the
- back-end.
- Some devices do not have a fixed number of virtqueues. Instead the maximum
- number of virtqueues is chosen by the back-end. The number can depend on host
- resource availability or back-end implementation details. Such devices are called
- multiple queue devices.
- Multiple queue support allows the back-end to advertise the maximum number of
- queues. This is treated as a protocol extension, hence the back-end has to
- implement protocol features first. The multiple queues feature is supported
- only when the protocol feature ``VHOST_USER_PROTOCOL_F_MQ`` (bit 0) is set.
- The max number of queues the back-end supports can be queried with message
- ``VHOST_USER_GET_QUEUE_NUM``. Front-end should stop when the number of requested
- queues is bigger than that.
- As all queues share one connection, the front-end uses a unique index for each
- queue in the sent message to identify a specified queue.
- The front-end enables queues by sending message ``VHOST_USER_SET_VRING_ENABLE``.
- vhost-user-net has historically automatically enabled the first queue pair.
- Back-ends should always implement the ``VHOST_USER_PROTOCOL_F_MQ`` protocol
- feature, even for devices with a fixed number of virtqueues, since it is simple
- to implement and offers a degree of introspection.
- Front-ends must not rely on the ``VHOST_USER_PROTOCOL_F_MQ`` protocol feature for
- devices with a fixed number of virtqueues. Only true multiqueue devices
- require this protocol feature.
- Migration
- ---------
- During live migration, the front-end may need to track the modifications
- the back-end makes to the memory mapped regions. The front-end should mark
- the dirty pages in a log. Once it complies to this logging, it may
- declare the ``VHOST_F_LOG_ALL`` vhost feature.
- To start/stop logging of data/used ring writes, the front-end may send
- messages ``VHOST_USER_SET_FEATURES`` with ``VHOST_F_LOG_ALL`` and
- ``VHOST_USER_SET_VRING_ADDR`` with ``VHOST_VRING_F_LOG`` in ring's
- flags set to 1/0, respectively.
- All the modifications to memory pointed by vring "descriptor" should
- be marked. Modifications to "used" vring should be marked if
- ``VHOST_VRING_F_LOG`` is part of ring's flags.
- Dirty pages are of size::
- #define VHOST_LOG_PAGE 0x1000
- The log memory fd is provided in the ancillary data of
- ``VHOST_USER_SET_LOG_BASE`` message when the back-end has
- ``VHOST_USER_PROTOCOL_F_LOG_SHMFD`` protocol feature.
- The size of the log is supplied as part of ``VhostUserMsg`` which
- should be large enough to cover all known guest addresses. Log starts
- at the supplied offset in the supplied file descriptor. The log
- covers from address 0 to the maximum of guest regions. In pseudo-code,
- to mark page at ``addr`` as dirty::
- page = addr / VHOST_LOG_PAGE
- log[page / 8] |= 1 << page % 8
- Where ``addr`` is the guest physical address.
- Use atomic operations, as the log may be concurrently manipulated.
- Note that when logging modifications to the used ring (when
- ``VHOST_VRING_F_LOG`` is set for this ring), ``log_guest_addr`` should
- be used to calculate the log offset: the write to first byte of the
- used ring is logged at this offset from log start. Also note that this
- value might be outside the legal guest physical address range
- (i.e. does not have to be covered by the ``VhostUserMemory`` table), but
- the bit offset of the last byte of the ring must fall within the size
- supplied by ``VhostUserLog``.
- ``VHOST_USER_SET_LOG_FD`` is an optional message with an eventfd in
- ancillary data, it may be used to inform the front-end that the log has
- been modified.
- Once the source has finished migration, rings will be stopped by the
- source (:ref:`Suspended device state <suspended_device_state>`). No
- further update must be done before rings are restarted.
- In postcopy migration the back-end is started before all the memory has
- been received from the source host, and care must be taken to avoid
- accessing pages that have yet to be received. The back-end opens a
- 'userfault'-fd and registers the memory with it; this fd is then
- passed back over to the front-end. The front-end services requests on the
- userfaultfd for pages that are accessed and when the page is available
- it performs WAKE ioctl's on the userfaultfd to wake the stalled
- back-end. The front-end indicates support for this via the
- ``VHOST_USER_PROTOCOL_F_PAGEFAULT`` feature.
- .. _migrating_backend_state:
- Migrating back-end state
- ^^^^^^^^^^^^^^^^^^^^^^^^
- Migrating device state involves transferring the state from one
- back-end, called the source, to another back-end, called the
- destination. After migration, the destination transparently resumes
- operation without requiring the driver to re-initialize the device at
- the VIRTIO level. If the migration fails, then the source can
- transparently resume operation until another migration attempt is made.
- Generally, the front-end is connected to a virtual machine guest (which
- contains the driver), which has its own state to transfer between source
- and destination, and therefore will have an implementation-specific
- mechanism to do so. The ``VHOST_USER_PROTOCOL_F_DEVICE_STATE`` feature
- provides functionality to have the front-end include the back-end's
- state in this transfer operation so the back-end does not need to
- implement its own mechanism, and so the virtual machine may have its
- complete state, including vhost-user devices' states, contained within a
- single stream of data.
- To do this, the back-end state is transferred from back-end to front-end
- on the source side, and vice versa on the destination side. This
- transfer happens over a channel that is negotiated using the
- ``VHOST_USER_SET_DEVICE_STATE_FD`` message. This message has two
- parameters:
- * Direction of transfer: On the source, the data is saved, transferring
- it from the back-end to the front-end. On the destination, the data
- is loaded, transferring it from the front-end to the back-end.
- * Migration phase: Currently, the only supported phase is the period
- after the transfer of memory-mapped regions before switch-over to the
- destination, when both the source and destination devices are
- suspended (:ref:`Suspended device state <suspended_device_state>`).
- In the future, additional phases might be supported to allow iterative
- migration while the device is running.
- The nature of the channel is implementation-defined, but it must
- generally behave like a pipe: The writing end will write all the data it
- has into it, signalling the end of data by closing its end. The reading
- end must read all of this data (until encountering the end of file) and
- process it.
- * When saving, the writing end is the source back-end, and the reading
- end is the source front-end. After reading the state data from the
- channel, the source front-end must transfer it to the destination
- front-end through an implementation-defined mechanism.
- * When loading, the writing end is the destination front-end, and the
- reading end is the destination back-end. After reading the state data
- from the channel, the destination back-end must deserialize its
- internal state from that data and set itself up to allow the driver to
- seamlessly resume operation on the VIRTIO level.
- Seamlessly resuming operation means that the migration must be
- transparent to the guest driver, which operates on the VIRTIO level.
- This driver will not perform any re-initialization steps, but continue
- to use the device as if no migration had occurred. The vhost-user
- front-end, however, will re-initialize the vhost state on the
- destination, following the usual protocol for establishing a connection
- to a vhost-user back-end: This includes, for example, setting up memory
- mappings and kick and call FDs as necessary, negotiating protocol
- features, or setting the initial vring base indices (to the same value
- as on the source side, so that operation can resume).
- Both on the source and on the destination side, after the respective
- front-end has seen all data transferred (when the transfer FD has been
- closed), it sends the ``VHOST_USER_CHECK_DEVICE_STATE`` message to
- verify that data transfer was successful in the back-end, too. The
- back-end responds once it knows whether the transfer and processing was
- successful or not.
- Memory access
- -------------
- The front-end sends a list of vhost memory regions to the back-end using the
- ``VHOST_USER_SET_MEM_TABLE`` message. Each region has two base
- addresses: a guest address and a user address.
- Messages contain guest addresses and/or user addresses to reference locations
- within the shared memory. The mapping of these addresses works as follows.
- User addresses map to the vhost memory region containing that user address.
- When the ``VIRTIO_F_IOMMU_PLATFORM`` feature has not been negotiated:
- * Guest addresses map to the vhost memory region containing that guest
- address.
- When the ``VIRTIO_F_IOMMU_PLATFORM`` feature has been negotiated:
- * Guest addresses are also called I/O virtual addresses (IOVAs). They are
- translated to user addresses via the IOTLB.
- * The vhost memory region guest address is not used.
- IOMMU support
- -------------
- When the ``VIRTIO_F_IOMMU_PLATFORM`` feature has been negotiated, the
- front-end sends IOTLB entries update & invalidation by sending
- ``VHOST_USER_IOTLB_MSG`` requests to the back-end with a ``struct
- vhost_iotlb_msg`` as payload. For update events, the ``iotlb`` payload
- has to be filled with the update message type (2), the I/O virtual
- address, the size, the user virtual address, and the permissions
- flags. Addresses and size must be within vhost memory regions set via
- the ``VHOST_USER_SET_MEM_TABLE`` request. For invalidation events, the
- ``iotlb`` payload has to be filled with the invalidation message type
- (3), the I/O virtual address and the size. On success, the back-end is
- expected to reply with a zero payload, non-zero otherwise.
- The back-end relies on the back-end communication channel (see :ref:`Back-end
- communication <backend_communication>` section below) to send IOTLB miss
- and access failure events, by sending ``VHOST_USER_BACKEND_IOTLB_MSG``
- requests to the front-end with a ``struct vhost_iotlb_msg`` as
- payload. For miss events, the iotlb payload has to be filled with the
- miss message type (1), the I/O virtual address and the permissions
- flags. For access failure event, the iotlb payload has to be filled
- with the access failure message type (4), the I/O virtual address and
- the permissions flags. For synchronization purpose, the back-end may
- rely on the reply-ack feature, so the front-end may send a reply when
- operation is completed if the reply-ack feature is negotiated and
- back-ends requests a reply. For miss events, completed operation means
- either front-end sent an update message containing the IOTLB entry
- containing requested address and permission, or front-end sent nothing if
- the IOTLB miss message is invalid (invalid IOVA or permission).
- The front-end isn't expected to take the initiative to send IOTLB update
- messages, as the back-end sends IOTLB miss messages for the guest virtual
- memory areas it needs to access.
- .. _backend_communication:
- Back-end communication
- ----------------------
- An optional communication channel is provided if the back-end declares
- ``VHOST_USER_PROTOCOL_F_BACKEND_REQ`` protocol feature, to allow the
- back-end to make requests to the front-end.
- The fd is provided via ``VHOST_USER_SET_BACKEND_REQ_FD`` ancillary data.
- A back-end may then send ``VHOST_USER_BACKEND_*`` messages to the front-end
- using this fd communication channel.
- If ``VHOST_USER_PROTOCOL_F_BACKEND_SEND_FD`` protocol feature is
- negotiated, back-end can send file descriptors (at most 8 descriptors in
- each message) to front-end via ancillary data using this fd communication
- channel.
- Inflight I/O tracking
- ---------------------
- To support reconnecting after restart or crash, back-end may need to
- resubmit inflight I/Os. If virtqueue is processed in order, we can
- easily achieve that by getting the inflight descriptors from
- descriptor table (split virtqueue) or descriptor ring (packed
- virtqueue). However, it can't work when we process descriptors
- out-of-order because some entries which store the information of
- inflight descriptors in available ring (split virtqueue) or descriptor
- ring (packed virtqueue) might be overridden by new entries. To solve
- this problem, the back-end need to allocate an extra buffer to store this
- information of inflight descriptors and share it with front-end for
- persistent. ``VHOST_USER_GET_INFLIGHT_FD`` and
- ``VHOST_USER_SET_INFLIGHT_FD`` are used to transfer this buffer
- between front-end and back-end. And the format of this buffer is described
- below:
- +---------------+---------------+-----+---------------+
- | queue0 region | queue1 region | ... | queueN region |
- +---------------+---------------+-----+---------------+
- N is the number of available virtqueues. The back-end could get it from num
- queues field of ``VhostUserInflight``.
- For split virtqueue, queue region can be implemented as:
- .. code:: c
- typedef struct DescStateSplit {
- /* Indicate whether this descriptor is inflight or not.
- * Only available for head-descriptor. */
- uint8_t inflight;
- /* Padding */
- uint8_t padding[5];
- /* Maintain a list for the last batch of used descriptors.
- * Only available when batching is used for submitting */
- uint16_t next;
- /* Used to preserve the order of fetching available descriptors.
- * Only available for head-descriptor. */
- uint64_t counter;
- } DescStateSplit;
- typedef struct QueueRegionSplit {
- /* The feature flags of this region. Now it's initialized to 0. */
- uint64_t features;
- /* The version of this region. It's 1 currently.
- * Zero value indicates an uninitialized buffer */
- uint16_t version;
- /* The size of DescStateSplit array. It's equal to the virtqueue size.
- * The back-end could get it from queue size field of VhostUserInflight. */
- uint16_t desc_num;
- /* The head of list that track the last batch of used descriptors. */
- uint16_t last_batch_head;
- /* Store the idx value of used ring */
- uint16_t used_idx;
- /* Used to track the state of each descriptor in descriptor table */
- DescStateSplit desc[];
- } QueueRegionSplit;
- To track inflight I/O, the queue region should be processed as follows:
- When receiving available buffers from the driver:
- #. Get the next available head-descriptor index from available ring, ``i``
- #. Set ``desc[i].counter`` to the value of global counter
- #. Increase global counter by 1
- #. Set ``desc[i].inflight`` to 1
- When supplying used buffers to the driver:
- 1. Get corresponding used head-descriptor index, i
- 2. Set ``desc[i].next`` to ``last_batch_head``
- 3. Set ``last_batch_head`` to ``i``
- #. Steps 1,2,3 may be performed repeatedly if batching is possible
- #. Increase the ``idx`` value of used ring by the size of the batch
- #. Set the ``inflight`` field of each ``DescStateSplit`` entry in the batch to 0
- #. Set ``used_idx`` to the ``idx`` value of used ring
- When reconnecting:
- #. If the value of ``used_idx`` does not match the ``idx`` value of
- used ring (means the inflight field of ``DescStateSplit`` entries in
- last batch may be incorrect),
- a. Subtract the value of ``used_idx`` from the ``idx`` value of
- used ring to get last batch size of ``DescStateSplit`` entries
- #. Set the ``inflight`` field of each ``DescStateSplit`` entry to 0 in last batch
- list which starts from ``last_batch_head``
- #. Set ``used_idx`` to the ``idx`` value of used ring
- #. Resubmit inflight ``DescStateSplit`` entries in order of their
- counter value
- For packed virtqueue, queue region can be implemented as:
- .. code:: c
- typedef struct DescStatePacked {
- /* Indicate whether this descriptor is inflight or not.
- * Only available for head-descriptor. */
- uint8_t inflight;
- /* Padding */
- uint8_t padding;
- /* Link to the next free entry */
- uint16_t next;
- /* Link to the last entry of descriptor list.
- * Only available for head-descriptor. */
- uint16_t last;
- /* The length of descriptor list.
- * Only available for head-descriptor. */
- uint16_t num;
- /* Used to preserve the order of fetching available descriptors.
- * Only available for head-descriptor. */
- uint64_t counter;
- /* The buffer id */
- uint16_t id;
- /* The descriptor flags */
- uint16_t flags;
- /* The buffer length */
- uint32_t len;
- /* The buffer address */
- uint64_t addr;
- } DescStatePacked;
- typedef struct QueueRegionPacked {
- /* The feature flags of this region. Now it's initialized to 0. */
- uint64_t features;
- /* The version of this region. It's 1 currently.
- * Zero value indicates an uninitialized buffer */
- uint16_t version;
- /* The size of DescStatePacked array. It's equal to the virtqueue size.
- * The back-end could get it from queue size field of VhostUserInflight. */
- uint16_t desc_num;
- /* The head of free DescStatePacked entry list */
- uint16_t free_head;
- /* The old head of free DescStatePacked entry list */
- uint16_t old_free_head;
- /* The used index of descriptor ring */
- uint16_t used_idx;
- /* The old used index of descriptor ring */
- uint16_t old_used_idx;
- /* Device ring wrap counter */
- uint8_t used_wrap_counter;
- /* The old device ring wrap counter */
- uint8_t old_used_wrap_counter;
- /* Padding */
- uint8_t padding[7];
- /* Used to track the state of each descriptor fetched from descriptor ring */
- DescStatePacked desc[];
- } QueueRegionPacked;
- To track inflight I/O, the queue region should be processed as follows:
- When receiving available buffers from the driver:
- #. Get the next available descriptor entry from descriptor ring, ``d``
- #. If ``d`` is head descriptor,
- a. Set ``desc[old_free_head].num`` to 0
- #. Set ``desc[old_free_head].counter`` to the value of global counter
- #. Increase global counter by 1
- #. Set ``desc[old_free_head].inflight`` to 1
- #. If ``d`` is last descriptor, set ``desc[old_free_head].last`` to
- ``free_head``
- #. Increase ``desc[old_free_head].num`` by 1
- #. Set ``desc[free_head].addr``, ``desc[free_head].len``,
- ``desc[free_head].flags``, ``desc[free_head].id`` to ``d.addr``,
- ``d.len``, ``d.flags``, ``d.id``
- #. Set ``free_head`` to ``desc[free_head].next``
- #. If ``d`` is last descriptor, set ``old_free_head`` to ``free_head``
- When supplying used buffers to the driver:
- 1. Get corresponding used head-descriptor entry from descriptor ring,
- ``d``
- 2. Get corresponding ``DescStatePacked`` entry, ``e``
- 3. Set ``desc[e.last].next`` to ``free_head``
- 4. Set ``free_head`` to the index of ``e``
- #. Steps 1,2,3,4 may be performed repeatedly if batching is possible
- #. Increase ``used_idx`` by the size of the batch and update
- ``used_wrap_counter`` if needed
- #. Update ``d.flags``
- #. Set the ``inflight`` field of each head ``DescStatePacked`` entry
- in the batch to 0
- #. Set ``old_free_head``, ``old_used_idx``, ``old_used_wrap_counter``
- to ``free_head``, ``used_idx``, ``used_wrap_counter``
- When reconnecting:
- #. If ``used_idx`` does not match ``old_used_idx`` (means the
- ``inflight`` field of ``DescStatePacked`` entries in last batch may
- be incorrect),
- a. Get the next descriptor ring entry through ``old_used_idx``, ``d``
- #. Use ``old_used_wrap_counter`` to calculate the available flags
- #. If ``d.flags`` is not equal to the calculated flags value (means
- back-end has submitted the buffer to guest driver before crash, so
- it has to commit the in-progress update), set ``old_free_head``,
- ``old_used_idx``, ``old_used_wrap_counter`` to ``free_head``,
- ``used_idx``, ``used_wrap_counter``
- #. Set ``free_head``, ``used_idx``, ``used_wrap_counter`` to
- ``old_free_head``, ``old_used_idx``, ``old_used_wrap_counter``
- (roll back any in-progress update)
- #. Set the ``inflight`` field of each ``DescStatePacked`` entry in
- free list to 0
- #. Resubmit inflight ``DescStatePacked`` entries in order of their
- counter value
- In-band notifications
- ---------------------
- In some limited situations (e.g. for simulation) it is desirable to
- have the kick, call and error (if used) signals done via in-band
- messages instead of asynchronous eventfd notifications. This can be
- done by negotiating the ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS``
- protocol feature.
- Note that due to the fact that too many messages on the sockets can
- cause the sending application(s) to block, it is not advised to use
- this feature unless absolutely necessary. It is also considered an
- error to negotiate this feature without also negotiating
- ``VHOST_USER_PROTOCOL_F_BACKEND_REQ`` and ``VHOST_USER_PROTOCOL_F_REPLY_ACK``,
- the former is necessary for getting a message channel from the back-end
- to the front-end, while the latter needs to be used with the in-band
- notification messages to block until they are processed, both to avoid
- blocking later and for proper processing (at least in the simulation
- use case.) As it has no other way of signalling this error, the back-end
- should close the connection as a response to a
- ``VHOST_USER_SET_PROTOCOL_FEATURES`` message that sets the in-band
- notifications feature flag without the other two.
- Protocol features
- -----------------
- .. code:: c
- #define VHOST_USER_PROTOCOL_F_MQ 0
- #define VHOST_USER_PROTOCOL_F_LOG_SHMFD 1
- #define VHOST_USER_PROTOCOL_F_RARP 2
- #define VHOST_USER_PROTOCOL_F_REPLY_ACK 3
- #define VHOST_USER_PROTOCOL_F_MTU 4
- #define VHOST_USER_PROTOCOL_F_BACKEND_REQ 5
- #define VHOST_USER_PROTOCOL_F_CROSS_ENDIAN 6
- #define VHOST_USER_PROTOCOL_F_CRYPTO_SESSION 7
- #define VHOST_USER_PROTOCOL_F_PAGEFAULT 8
- #define VHOST_USER_PROTOCOL_F_CONFIG 9
- #define VHOST_USER_PROTOCOL_F_BACKEND_SEND_FD 10
- #define VHOST_USER_PROTOCOL_F_HOST_NOTIFIER 11
- #define VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD 12
- #define VHOST_USER_PROTOCOL_F_RESET_DEVICE 13
- #define VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS 14
- #define VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS 15
- #define VHOST_USER_PROTOCOL_F_STATUS 16
- #define VHOST_USER_PROTOCOL_F_XEN_MMAP 17
- #define VHOST_USER_PROTOCOL_F_SHARED_OBJECT 18
- #define VHOST_USER_PROTOCOL_F_DEVICE_STATE 19
- Front-end message types
- -----------------------
- ``VHOST_USER_GET_FEATURES``
- :id: 1
- :equivalent ioctl: ``VHOST_GET_FEATURES``
- :request payload: N/A
- :reply payload: ``u64``
- Get from the underlying vhost implementation the features bitmask.
- Feature bit ``VHOST_USER_F_PROTOCOL_FEATURES`` signals back-end support
- for ``VHOST_USER_GET_PROTOCOL_FEATURES`` and
- ``VHOST_USER_SET_PROTOCOL_FEATURES``.
- ``VHOST_USER_SET_FEATURES``
- :id: 2
- :equivalent ioctl: ``VHOST_SET_FEATURES``
- :request payload: ``u64``
- :reply payload: N/A
- Enable features in the underlying vhost implementation using a
- bitmask. Feature bit ``VHOST_USER_F_PROTOCOL_FEATURES`` signals
- back-end support for ``VHOST_USER_GET_PROTOCOL_FEATURES`` and
- ``VHOST_USER_SET_PROTOCOL_FEATURES``.
- ``VHOST_USER_GET_PROTOCOL_FEATURES``
- :id: 15
- :equivalent ioctl: ``VHOST_GET_FEATURES``
- :request payload: N/A
- :reply payload: ``u64``
- Get the protocol feature bitmask from the underlying vhost
- implementation. Only legal if feature bit
- ``VHOST_USER_F_PROTOCOL_FEATURES`` is present in
- ``VHOST_USER_GET_FEATURES``. It does not need to be acknowledged by
- ``VHOST_USER_SET_FEATURES``.
- .. Note::
- Back-ends that report ``VHOST_USER_F_PROTOCOL_FEATURES`` must
- support this message even before ``VHOST_USER_SET_FEATURES`` was
- called.
- ``VHOST_USER_SET_PROTOCOL_FEATURES``
- :id: 16
- :equivalent ioctl: ``VHOST_SET_FEATURES``
- :request payload: ``u64``
- :reply payload: N/A
- Enable protocol features in the underlying vhost implementation.
- Only legal if feature bit ``VHOST_USER_F_PROTOCOL_FEATURES`` is present in
- ``VHOST_USER_GET_FEATURES``. It does not need to be acknowledged by
- ``VHOST_USER_SET_FEATURES``.
- .. Note::
- Back-ends that report ``VHOST_USER_F_PROTOCOL_FEATURES`` must support
- this message even before ``VHOST_USER_SET_FEATURES`` was called.
- ``VHOST_USER_SET_OWNER``
- :id: 3
- :equivalent ioctl: ``VHOST_SET_OWNER``
- :request payload: N/A
- :reply payload: N/A
- Issued when a new connection is established. It marks the sender
- as the front-end that owns of the session. This can be used on the *back-end*
- as a "session start" flag.
- ``VHOST_USER_RESET_OWNER``
- :id: 4
- :request payload: N/A
- :reply payload: N/A
- .. admonition:: Deprecated
- This is no longer used. Used to be sent to request disabling all
- rings, but some back-ends interpreted it to also discard connection
- state (this interpretation would lead to bugs). It is recommended
- that back-ends either ignore this message, or use it to disable all
- rings.
- ``VHOST_USER_SET_MEM_TABLE``
- :id: 5
- :equivalent ioctl: ``VHOST_SET_MEM_TABLE``
- :request payload: multiple memory regions description
- :reply payload: (postcopy only) multiple memory regions description
- Sets the memory map regions on the back-end so it can translate the
- vring addresses. In the ancillary data there is an array of file
- descriptors for each memory mapped region. The size and ordering of
- the fds matches the number and ordering of memory regions.
- When ``VHOST_USER_POSTCOPY_LISTEN`` has been received,
- ``SET_MEM_TABLE`` replies with the bases of the memory mapped
- regions to the front-end. The back-end must have mmap'd the regions but
- not yet accessed them and should not yet generate a userfault
- event.
- .. Note::
- ``NEED_REPLY_MASK`` is not set in this case. QEMU will then
- reply back to the list of mappings with an empty
- ``VHOST_USER_SET_MEM_TABLE`` as an acknowledgement; only upon
- reception of this message may the guest start accessing the memory
- and generating faults.
- ``VHOST_USER_SET_LOG_BASE``
- :id: 6
- :equivalent ioctl: ``VHOST_SET_LOG_BASE``
- :request payload: u64
- :reply payload: N/A
- Sets logging shared memory space.
- When the back-end has ``VHOST_USER_PROTOCOL_F_LOG_SHMFD`` protocol feature,
- the log memory fd is provided in the ancillary data of
- ``VHOST_USER_SET_LOG_BASE`` message, the size and offset of shared
- memory area provided in the message.
- ``VHOST_USER_SET_LOG_FD``
- :id: 7
- :equivalent ioctl: ``VHOST_SET_LOG_FD``
- :request payload: N/A
- :reply payload: N/A
- Sets the logging file descriptor, which is passed as ancillary data.
- ``VHOST_USER_SET_VRING_NUM``
- :id: 8
- :equivalent ioctl: ``VHOST_SET_VRING_NUM``
- :request payload: vring state description
- :reply payload: N/A
- Set the size of the queue.
- ``VHOST_USER_SET_VRING_ADDR``
- :id: 9
- :equivalent ioctl: ``VHOST_SET_VRING_ADDR``
- :request payload: vring address description
- :reply payload: N/A
- Sets the addresses of the different aspects of the vring.
- ``VHOST_USER_SET_VRING_BASE``
- :id: 10
- :equivalent ioctl: ``VHOST_SET_VRING_BASE``
- :request payload: vring descriptor index/indices
- :reply payload: N/A
- Sets the next index to use for descriptors in this vring:
- * For a split virtqueue, sets only the next descriptor index to
- process in the *Available Ring*. The device is supposed to read the
- next index in the *Used Ring* from the respective vring structure in
- guest memory.
- * For a packed virtqueue, both indices are supplied, as they are not
- explicitly available in memory.
- Consequently, the payload type is specific to the type of virt queue
- (*a vring descriptor index for split virtqueues* vs. *vring descriptor
- indices for packed virtqueues*).
- ``VHOST_USER_GET_VRING_BASE``
- :id: 11
- :equivalent ioctl: ``VHOST_USER_GET_VRING_BASE``
- :request payload: vring state description
- :reply payload: vring descriptor index/indices
- Stops the vring and returns the current descriptor index or indices:
- * For a split virtqueue, returns only the 16-bit next descriptor
- index to process in the *Available Ring*. Note that this may
- differ from the available ring index in the vring structure in
- memory, which points to where the driver will put new available
- descriptors. For the *Used Ring*, the device only needs the next
- descriptor index at which to put new descriptors, which is the
- value in the vring structure in memory, so this value is not
- covered by this message.
- * For a packed virtqueue, neither index is explicitly available to
- read from memory, so both indices (as maintained by the device) are
- returned.
- Consequently, the payload type is specific to the type of virt queue
- (*a vring descriptor index for split virtqueues* vs. *vring descriptor
- indices for packed virtqueues*).
- When and as long as all of a device's vrings are stopped, it is
- *suspended*, see :ref:`Suspended device state
- <suspended_device_state>`.
- The request payload's *num* field is currently reserved and must be
- set to 0.
- ``VHOST_USER_SET_VRING_KICK``
- :id: 12
- :equivalent ioctl: ``VHOST_SET_VRING_KICK``
- :request payload: ``u64``
- :reply payload: N/A
- Set the event file descriptor for adding buffers to the vring. It is
- passed in the ancillary data.
- Bits (0-7) of the payload contain the vring index. Bit 8 is the
- invalid FD flag. This flag is set when there is no file descriptor
- in the ancillary data. This signals that polling should be used
- instead of waiting for the kick. Note that if the protocol feature
- ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` has been negotiated
- this message isn't necessary as the ring is also started on the
- ``VHOST_USER_VRING_KICK`` message, it may however still be used to
- set an event file descriptor (which will be preferred over the
- message) or to enable polling.
- ``VHOST_USER_SET_VRING_CALL``
- :id: 13
- :equivalent ioctl: ``VHOST_SET_VRING_CALL``
- :request payload: ``u64``
- :reply payload: N/A
- Set the event file descriptor to signal when buffers are used. It is
- passed in the ancillary data.
- Bits (0-7) of the payload contain the vring index. Bit 8 is the
- invalid FD flag. This flag is set when there is no file descriptor
- in the ancillary data. This signals that polling will be used
- instead of waiting for the call. Note that if the protocol features
- ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` and
- ``VHOST_USER_PROTOCOL_F_BACKEND_REQ`` have been negotiated this message
- isn't necessary as the ``VHOST_USER_BACKEND_VRING_CALL`` message can be
- used, it may however still be used to set an event file descriptor
- or to enable polling.
- ``VHOST_USER_SET_VRING_ERR``
- :id: 14
- :equivalent ioctl: ``VHOST_SET_VRING_ERR``
- :request payload: ``u64``
- :reply payload: N/A
- Set the event file descriptor to signal when error occurs. It is
- passed in the ancillary data.
- Bits (0-7) of the payload contain the vring index. Bit 8 is the
- invalid FD flag. This flag is set when there is no file descriptor
- in the ancillary data. Note that if the protocol features
- ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` and
- ``VHOST_USER_PROTOCOL_F_BACKEND_REQ`` have been negotiated this message
- isn't necessary as the ``VHOST_USER_BACKEND_VRING_ERR`` message can be
- used, it may however still be used to set an event file descriptor
- (which will be preferred over the message).
- ``VHOST_USER_GET_QUEUE_NUM``
- :id: 17
- :equivalent ioctl: N/A
- :request payload: N/A
- :reply payload: u64
- Query how many queues the back-end supports.
- This request should be sent only when ``VHOST_USER_PROTOCOL_F_MQ``
- is set in queried protocol features by
- ``VHOST_USER_GET_PROTOCOL_FEATURES``.
- ``VHOST_USER_SET_VRING_ENABLE``
- :id: 18
- :equivalent ioctl: N/A
- :request payload: vring state description
- :reply payload: N/A
- Signal the back-end to enable or disable corresponding vring.
- This request should be sent only when
- ``VHOST_USER_F_PROTOCOL_FEATURES`` has been negotiated.
- ``VHOST_USER_SEND_RARP``
- :id: 19
- :equivalent ioctl: N/A
- :request payload: ``u64``
- :reply payload: N/A
- Ask vhost user back-end to broadcast a fake RARP to notify the migration
- is terminated for guest that does not support GUEST_ANNOUNCE.
- Only legal if feature bit ``VHOST_USER_F_PROTOCOL_FEATURES`` is
- present in ``VHOST_USER_GET_FEATURES`` and protocol feature bit
- ``VHOST_USER_PROTOCOL_F_RARP`` is present in
- ``VHOST_USER_GET_PROTOCOL_FEATURES``. The first 6 bytes of the
- payload contain the mac address of the guest to allow the vhost user
- back-end to construct and broadcast the fake RARP.
- ``VHOST_USER_NET_SET_MTU``
- :id: 20
- :equivalent ioctl: N/A
- :request payload: ``u64``
- :reply payload: N/A
- Set host MTU value exposed to the guest.
- This request should be sent only when ``VIRTIO_NET_F_MTU`` feature
- has been successfully negotiated, ``VHOST_USER_F_PROTOCOL_FEATURES``
- is present in ``VHOST_USER_GET_FEATURES`` and protocol feature bit
- ``VHOST_USER_PROTOCOL_F_NET_MTU`` is present in
- ``VHOST_USER_GET_PROTOCOL_FEATURES``.
- If ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` is negotiated, the back-end must
- respond with zero in case the specified MTU is valid, or non-zero
- otherwise.
- ``VHOST_USER_SET_BACKEND_REQ_FD`` (previous name ``VHOST_USER_SET_SLAVE_REQ_FD``)
- :id: 21
- :equivalent ioctl: N/A
- :request payload: N/A
- :reply payload: N/A
- Set the socket file descriptor for back-end initiated requests. It is passed
- in the ancillary data.
- This request should be sent only when
- ``VHOST_USER_F_PROTOCOL_FEATURES`` has been negotiated, and protocol
- feature bit ``VHOST_USER_PROTOCOL_F_BACKEND_REQ`` bit is present in
- ``VHOST_USER_GET_PROTOCOL_FEATURES``. If
- ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` is negotiated, the back-end must
- respond with zero for success, non-zero otherwise.
- ``VHOST_USER_IOTLB_MSG``
- :id: 22
- :equivalent ioctl: N/A (equivalent to ``VHOST_IOTLB_MSG`` message type)
- :request payload: ``struct vhost_iotlb_msg``
- :reply payload: ``u64``
- Send IOTLB messages with ``struct vhost_iotlb_msg`` as payload.
- The front-end sends such requests to update and invalidate entries in the
- device IOTLB. The back-end has to acknowledge the request with sending
- zero as ``u64`` payload for success, non-zero otherwise.
- This request should be send only when ``VIRTIO_F_IOMMU_PLATFORM``
- feature has been successfully negotiated.
- ``VHOST_USER_SET_VRING_ENDIAN``
- :id: 23
- :equivalent ioctl: ``VHOST_SET_VRING_ENDIAN``
- :request payload: vring state description
- :reply payload: N/A
- Set the endianness of a VQ for legacy devices. Little-endian is
- indicated with state.num set to 0 and big-endian is indicated with
- state.num set to 1. Other values are invalid.
- This request should be sent only when
- ``VHOST_USER_PROTOCOL_F_CROSS_ENDIAN`` has been negotiated.
- Backends that negotiated this feature should handle both
- endiannesses and expect this message once (per VQ) during device
- configuration (ie. before the front-end starts the VQ).
- ``VHOST_USER_GET_CONFIG``
- :id: 24
- :equivalent ioctl: N/A
- :request payload: virtio device config space
- :reply payload: virtio device config space
- When ``VHOST_USER_PROTOCOL_F_CONFIG`` is negotiated, this message is
- submitted by the vhost-user front-end to fetch the contents of the
- virtio device configuration space, vhost-user back-end's payload size
- MUST match the front-end's request, vhost-user back-end uses zero length of
- payload to indicate an error to the vhost-user front-end. The vhost-user
- front-end may cache the contents to avoid repeated
- ``VHOST_USER_GET_CONFIG`` calls.
- ``VHOST_USER_SET_CONFIG``
- :id: 25
- :equivalent ioctl: N/A
- :request payload: virtio device config space
- :reply payload: N/A
- When ``VHOST_USER_PROTOCOL_F_CONFIG`` is negotiated, this message is
- submitted by the vhost-user front-end when the Guest changes the virtio
- device configuration space and also can be used for live migration
- on the destination host. The vhost-user back-end must check the flags
- field, and back-ends MUST NOT accept SET_CONFIG for read-only
- configuration space fields unless the live migration bit is set.
- ``VHOST_USER_CREATE_CRYPTO_SESSION``
- :id: 26
- :equivalent ioctl: N/A
- :request payload: crypto session description
- :reply payload: crypto session description
- Create a session for crypto operation. The back-end must return
- the session id, 0 or positive for success, negative for failure.
- This request should be sent only when
- ``VHOST_USER_PROTOCOL_F_CRYPTO_SESSION`` feature has been
- successfully negotiated. It's a required feature for crypto
- devices.
- ``VHOST_USER_CLOSE_CRYPTO_SESSION``
- :id: 27
- :equivalent ioctl: N/A
- :request payload: ``u64``
- :reply payload: N/A
- Close a session for crypto operation which was previously
- created by ``VHOST_USER_CREATE_CRYPTO_SESSION``.
- This request should be sent only when
- ``VHOST_USER_PROTOCOL_F_CRYPTO_SESSION`` feature has been
- successfully negotiated. It's a required feature for crypto
- devices.
- ``VHOST_USER_POSTCOPY_ADVISE``
- :id: 28
- :request payload: N/A
- :reply payload: userfault fd
- When ``VHOST_USER_PROTOCOL_F_PAGEFAULT`` is supported, the front-end
- advises back-end that a migration with postcopy enabled is underway,
- the back-end must open a userfaultfd for later use. Note that at this
- stage the migration is still in precopy mode.
- ``VHOST_USER_POSTCOPY_LISTEN``
- :id: 29
- :request payload: N/A
- :reply payload: N/A
- The front-end advises back-end that a transition to postcopy mode has
- happened. The back-end must ensure that shared memory is registered
- with userfaultfd to cause faulting of non-present pages.
- This is always sent sometime after a ``VHOST_USER_POSTCOPY_ADVISE``,
- and thus only when ``VHOST_USER_PROTOCOL_F_PAGEFAULT`` is supported.
- ``VHOST_USER_POSTCOPY_END``
- :id: 30
- :request payload: N/A
- :reply payload: ``u64``
- The front-end advises that postcopy migration has now completed. The back-end
- must disable the userfaultfd. The reply is an acknowledgement
- only.
- When ``VHOST_USER_PROTOCOL_F_PAGEFAULT`` is supported, this message
- is sent at the end of the migration, after
- ``VHOST_USER_POSTCOPY_LISTEN`` was previously sent.
- The value returned is an error indication; 0 is success.
- ``VHOST_USER_GET_INFLIGHT_FD``
- :id: 31
- :equivalent ioctl: N/A
- :request payload: inflight description
- :reply payload: N/A
- When ``VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD`` protocol feature has
- been successfully negotiated, this message is submitted by the front-end to
- get a shared buffer from back-end. The shared buffer will be used to
- track inflight I/O by back-end. QEMU should retrieve a new one when vm
- reset.
- ``VHOST_USER_SET_INFLIGHT_FD``
- :id: 32
- :equivalent ioctl: N/A
- :request payload: inflight description
- :reply payload: N/A
- When ``VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD`` protocol feature has
- been successfully negotiated, this message is submitted by the front-end to
- send the shared inflight buffer back to the back-end so that the back-end
- could get inflight I/O after a crash or restart.
- ``VHOST_USER_GPU_SET_SOCKET``
- :id: 33
- :equivalent ioctl: N/A
- :request payload: N/A
- :reply payload: N/A
- Sets the GPU protocol socket file descriptor, which is passed as
- ancillary data. The GPU protocol is used to inform the front-end of
- rendering state and updates. See vhost-user-gpu.rst for details.
- ``VHOST_USER_RESET_DEVICE``
- :id: 34
- :equivalent ioctl: N/A
- :request payload: N/A
- :reply payload: N/A
- Ask the vhost user back-end to disable all rings and reset all
- internal device state to the initial state, ready to be
- reinitialized. The back-end retains ownership of the device
- throughout the reset operation.
- Only valid if the ``VHOST_USER_PROTOCOL_F_RESET_DEVICE`` protocol
- feature is set by the back-end.
- ``VHOST_USER_VRING_KICK``
- :id: 35
- :equivalent ioctl: N/A
- :request payload: vring state description
- :reply payload: N/A
- When the ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` protocol
- feature has been successfully negotiated, this message may be
- submitted by the front-end to indicate that a buffer was added to
- the vring instead of signalling it using the vring's kick file
- descriptor or having the back-end rely on polling.
- The state.num field is currently reserved and must be set to 0.
- ``VHOST_USER_GET_MAX_MEM_SLOTS``
- :id: 36
- :equivalent ioctl: N/A
- :request payload: N/A
- :reply payload: u64
- When the ``VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS`` protocol
- feature has been successfully negotiated, this message is submitted
- by the front-end to the back-end. The back-end should return the message with a
- u64 payload containing the maximum number of memory slots for
- QEMU to expose to the guest. The value returned by the back-end
- will be capped at the maximum number of ram slots which can be
- supported by the target platform.
- ``VHOST_USER_ADD_MEM_REG``
- :id: 37
- :equivalent ioctl: N/A
- :request payload: N/A
- :reply payload: single memory region description
- When the ``VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS`` protocol
- feature has been successfully negotiated, this message is submitted
- by the front-end to the back-end. The message payload contains a memory
- region descriptor struct, describing a region of guest memory which
- the back-end device must map in. When the
- ``VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS`` protocol feature has
- been successfully negotiated, along with the
- ``VHOST_USER_REM_MEM_REG`` message, this message is used to set and
- update the memory tables of the back-end device.
- Exactly one file descriptor from which the memory is mapped is
- passed in the ancillary data.
- In postcopy mode (see ``VHOST_USER_POSTCOPY_LISTEN``), the back-end
- replies with the bases of the memory mapped region to the front-end.
- For further details on postcopy, see ``VHOST_USER_SET_MEM_TABLE``.
- They apply to ``VHOST_USER_ADD_MEM_REG`` accordingly.
- ``VHOST_USER_REM_MEM_REG``
- :id: 38
- :equivalent ioctl: N/A
- :request payload: N/A
- :reply payload: single memory region description
- When the ``VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS`` protocol
- feature has been successfully negotiated, this message is submitted
- by the front-end to the back-end. The message payload contains a memory
- region descriptor struct, describing a region of guest memory which
- the back-end device must unmap. When the
- ``VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS`` protocol feature has
- been successfully negotiated, along with the
- ``VHOST_USER_ADD_MEM_REG`` message, this message is used to set and
- update the memory tables of the back-end device.
- The memory region to be removed is identified by its guest address,
- user address and size. The mmap offset is ignored.
- No file descriptors SHOULD be passed in the ancillary data. For
- compatibility with existing incorrect implementations, the back-end MAY
- accept messages with one file descriptor. If a file descriptor is
- passed, the back-end MUST close it without using it otherwise.
- ``VHOST_USER_SET_STATUS``
- :id: 39
- :equivalent ioctl: VHOST_VDPA_SET_STATUS
- :request payload: ``u64``
- :reply payload: N/A
- When the ``VHOST_USER_PROTOCOL_F_STATUS`` protocol feature has been
- successfully negotiated, this message is submitted by the front-end to
- notify the back-end with updated device status as defined in the Virtio
- specification.
- ``VHOST_USER_GET_STATUS``
- :id: 40
- :equivalent ioctl: VHOST_VDPA_GET_STATUS
- :request payload: N/A
- :reply payload: ``u64``
- When the ``VHOST_USER_PROTOCOL_F_STATUS`` protocol feature has been
- successfully negotiated, this message is submitted by the front-end to
- query the back-end for its device status as defined in the Virtio
- specification.
- ``VHOST_USER_GET_SHARED_OBJECT``
- :id: 41
- :equivalent ioctl: N/A
- :request payload: ``struct VhostUserShared``
- :reply payload: dmabuf fd
- When the ``VHOST_USER_PROTOCOL_F_SHARED_OBJECT`` protocol
- feature has been successfully negotiated, and the UUID is found
- in the exporters cache, this message is submitted by the front-end
- to retrieve a given dma-buf fd from a given back-end, determined by
- the requested UUID. Back-end will reply passing the fd when the operation
- is successful, or no fd otherwise.
- ``VHOST_USER_SET_DEVICE_STATE_FD``
- :id: 42
- :equivalent ioctl: N/A
- :request payload: device state transfer parameters
- :reply payload: ``u64``
- Front-end and back-end negotiate a channel over which to transfer the
- back-end's internal state during migration. Either side (front-end or
- back-end) may create the channel. The nature of this channel is not
- restricted or defined in this document, but whichever side creates it
- must create a file descriptor that is provided to the respectively
- other side, allowing access to the channel. This FD must behave as
- follows:
- * For the writing end, it must allow writing the whole back-end state
- sequentially. Closing the file descriptor signals the end of
- transfer.
- * For the reading end, it must allow reading the whole back-end state
- sequentially. The end of file signals the end of the transfer.
- For example, the channel may be a pipe, in which case the two ends of
- the pipe fulfill these requirements respectively.
- Initially, the front-end creates a channel along with such an FD. It
- passes the FD to the back-end as ancillary data of a
- ``VHOST_USER_SET_DEVICE_STATE_FD`` message. The back-end may create a
- different transfer channel, passing the respective FD back to the
- front-end as ancillary data of the reply. If so, the front-end must
- then discard its channel and use the one provided by the back-end.
- Whether the back-end should decide to use its own channel is decided
- based on efficiency: If the channel is a pipe, both ends will most
- likely need to copy data into and out of it. Any channel that allows
- for more efficient processing on at least one end, e.g. through
- zero-copy, is considered more efficient and thus preferred. If the
- back-end can provide such a channel, it should decide to use it.
- The request payload contains parameters for the subsequent data
- transfer, as described in the :ref:`Migrating back-end state
- <migrating_backend_state>` section.
- The value returned is both an indication for success, and whether a
- file descriptor for a back-end-provided channel is returned: Bits 0–7
- are 0 on success, and non-zero on error. Bit 8 is the invalid FD
- flag; this flag is set when there is no file descriptor returned.
- When this flag is not set, the front-end must use the returned file
- descriptor as its end of the transfer channel. The back-end must not
- both indicate an error and return a file descriptor.
- Using this function requires prior negotiation of the
- ``VHOST_USER_PROTOCOL_F_DEVICE_STATE`` feature.
- ``VHOST_USER_CHECK_DEVICE_STATE``
- :id: 43
- :equivalent ioctl: N/A
- :request payload: N/A
- :reply payload: ``u64``
- After transferring the back-end's internal state during migration (see
- the :ref:`Migrating back-end state <migrating_backend_state>`
- section), check whether the back-end was able to successfully fully
- process the state.
- The value returned indicates success or error; 0 is success, any
- non-zero value is an error.
- Using this function requires prior negotiation of the
- ``VHOST_USER_PROTOCOL_F_DEVICE_STATE`` feature.
- Back-end message types
- ----------------------
- For this type of message, the request is sent by the back-end and the reply
- is sent by the front-end.
- ``VHOST_USER_BACKEND_IOTLB_MSG`` (previous name ``VHOST_USER_SLAVE_IOTLB_MSG``)
- :id: 1
- :equivalent ioctl: N/A (equivalent to ``VHOST_IOTLB_MSG`` message type)
- :request payload: ``struct vhost_iotlb_msg``
- :reply payload: N/A
- Send IOTLB messages with ``struct vhost_iotlb_msg`` as payload.
- The back-end sends such requests to notify of an IOTLB miss, or an IOTLB
- access failure. If ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` is
- negotiated, and back-end set the ``VHOST_USER_NEED_REPLY`` flag, the front-end
- must respond with zero when operation is successfully completed, or
- non-zero otherwise. This request should be send only when
- ``VIRTIO_F_IOMMU_PLATFORM`` feature has been successfully
- negotiated.
- ``VHOST_USER_BACKEND_CONFIG_CHANGE_MSG`` (previous name ``VHOST_USER_SLAVE_CONFIG_CHANGE_MSG``)
- :id: 2
- :equivalent ioctl: N/A
- :request payload: N/A
- :reply payload: N/A
- When ``VHOST_USER_PROTOCOL_F_CONFIG`` is negotiated, vhost-user
- back-end sends such messages to notify that the virtio device's
- configuration space has changed, for those host devices which can
- support such feature, host driver can send ``VHOST_USER_GET_CONFIG``
- message to the back-end to get the latest content. If
- ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` is negotiated, and the back-end sets the
- ``VHOST_USER_NEED_REPLY`` flag, the front-end must respond with zero when
- operation is successfully completed, or non-zero otherwise.
- ``VHOST_USER_BACKEND_VRING_HOST_NOTIFIER_MSG`` (previous name ``VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG``)
- :id: 3
- :equivalent ioctl: N/A
- :request payload: vring area description
- :reply payload: N/A
- Sets host notifier for a specified queue. The queue index is
- contained in the ``u64`` field of the vring area description. The
- host notifier is described by the file descriptor (typically it's a
- VFIO device fd) which is passed as ancillary data and the size
- (which is mmap size and should be the same as host page size) and
- offset (which is mmap offset) carried in the vring area
- description. QEMU can mmap the file descriptor based on the size and
- offset to get a memory range. Registering a host notifier means
- mapping this memory range to the VM as the specified queue's notify
- MMIO region. The back-end sends this request to tell QEMU to de-register
- the existing notifier if any and register the new notifier if the
- request is sent with a file descriptor.
- This request should be sent only when
- ``VHOST_USER_PROTOCOL_F_HOST_NOTIFIER`` protocol feature has been
- successfully negotiated.
- ``VHOST_USER_BACKEND_VRING_CALL`` (previous name ``VHOST_USER_SLAVE_VRING_CALL``)
- :id: 4
- :equivalent ioctl: N/A
- :request payload: vring state description
- :reply payload: N/A
- When the ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` protocol
- feature has been successfully negotiated, this message may be
- submitted by the back-end to indicate that a buffer was used from
- the vring instead of signalling this using the vring's call file
- descriptor or having the front-end relying on polling.
- The state.num field is currently reserved and must be set to 0.
- ``VHOST_USER_BACKEND_VRING_ERR`` (previous name ``VHOST_USER_SLAVE_VRING_ERR``)
- :id: 5
- :equivalent ioctl: N/A
- :request payload: vring state description
- :reply payload: N/A
- When the ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` protocol
- feature has been successfully negotiated, this message may be
- submitted by the back-end to indicate that an error occurred on the
- specific vring, instead of signalling the error file descriptor
- set by the front-end via ``VHOST_USER_SET_VRING_ERR``.
- The state.num field is currently reserved and must be set to 0.
- ``VHOST_USER_BACKEND_SHARED_OBJECT_ADD``
- :id: 6
- :equivalent ioctl: N/A
- :request payload: ``struct VhostUserShared``
- :reply payload: N/A
- When the ``VHOST_USER_PROTOCOL_F_SHARED_OBJECT`` protocol
- feature has been successfully negotiated, this message can be submitted
- by the backends to add themselves as exporters to the virtio shared lookup
- table. The back-end device gets associated with a UUID in the shared table.
- The back-end is responsible of keeping its own table with exported dma-buf fds.
- When another back-end tries to import the resource associated with the UUID,
- it will send a message to the front-end, which will act as a proxy to the
- exporter back-end. If ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` is negotiated, and
- the back-end sets the ``VHOST_USER_NEED_REPLY`` flag, the front-end must
- respond with zero when operation is successfully completed, or non-zero
- otherwise.
- ``VHOST_USER_BACKEND_SHARED_OBJECT_REMOVE``
- :id: 7
- :equivalent ioctl: N/A
- :request payload: ``struct VhostUserShared``
- :reply payload: N/A
- When the ``VHOST_USER_PROTOCOL_F_SHARED_OBJECT`` protocol
- feature has been successfully negotiated, this message can be submitted
- by the backend to remove themselves from to the virtio-dmabuf shared
- table API. Only the back-end owning the entry (i.e., the one that first added
- it) will have permission to remove it. Otherwise, the message is ignored.
- The shared table will remove the back-end device associated with
- the UUID. If ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` is negotiated, and the
- back-end sets the ``VHOST_USER_NEED_REPLY`` flag, the front-end must respond
- with zero when operation is successfully completed, or non-zero otherwise.
- ``VHOST_USER_BACKEND_SHARED_OBJECT_LOOKUP``
- :id: 8
- :equivalent ioctl: N/A
- :request payload: ``struct VhostUserShared``
- :reply payload: dmabuf fd and ``u64``
- When the ``VHOST_USER_PROTOCOL_F_SHARED_OBJECT`` protocol
- feature has been successfully negotiated, this message can be submitted
- by the backends to retrieve a given dma-buf fd from the virtio-dmabuf
- shared table given a UUID. Frontend will reply passing the fd and a zero
- when the operation is successful, or non-zero otherwise. Note that if the
- operation fails, no fd is sent to the backend.
- .. _reply_ack:
- VHOST_USER_PROTOCOL_F_REPLY_ACK
- -------------------------------
- The original vhost-user specification only demands replies for certain
- commands. This differs from the vhost protocol implementation where
- commands are sent over an ``ioctl()`` call and block until the back-end
- has completed.
- With this protocol extension negotiated, the sender (QEMU) can set the
- ``need_reply`` [Bit 3] flag to any command. This indicates that the
- back-end MUST respond with a Payload ``VhostUserMsg`` indicating success
- or failure. The payload should be set to zero on success or non-zero
- on failure, unless the message already has an explicit reply body.
- The reply payload gives QEMU a deterministic indication of the result
- of the command. Today, QEMU is expected to terminate the main vhost-user
- loop upon receiving such errors. In future, qemu could be taught to be more
- resilient for selective requests.
- For the message types that already solicit a reply from the back-end,
- the presence of ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` or need_reply bit
- being set brings no behavioural change. (See the Communication_
- section for details.)
- .. _backend_conventions:
- Backend program conventions
- ===========================
- vhost-user back-ends can provide various devices & services and may
- need to be configured manually depending on the use case. However, it
- is a good idea to follow the conventions listed here when
- possible. Users, QEMU or libvirt, can then rely on some common
- behaviour to avoid heterogeneous configuration and management of the
- back-end programs and facilitate interoperability.
- Each back-end installed on a host system should come with at least one
- JSON file that conforms to the vhost-user.json schema. Each file
- informs the management applications about the back-end type, and binary
- location. In addition, it defines rules for management apps for
- picking the highest priority back-end when multiple match the search
- criteria (see ``@VhostUserBackend`` documentation in the schema file).
- If the back-end is not capable of enabling a requested feature on the
- host (such as 3D acceleration with virgl), or the initialization
- failed, the back-end should fail to start early and exit with a status
- != 0. It may also print a message to stderr for further details.
- The back-end program must not daemonize itself, but it may be
- daemonized by the management layer. It may also have a restricted
- access to the system.
- File descriptors 0, 1 and 2 will exist, and have regular
- stdin/stdout/stderr usage (they may have been redirected to /dev/null
- by the management layer, or to a log handler).
- The back-end program must end (as quickly and cleanly as possible) when
- the SIGTERM signal is received. Eventually, it may receive SIGKILL by
- the management layer after a few seconds.
- The following command line options have an expected behaviour. They
- are mandatory, unless explicitly said differently:
- --socket-path=PATH
- This option specify the location of the vhost-user Unix domain socket.
- It is incompatible with --fd.
- --fd=FDNUM
- When this argument is given, the back-end program is started with the
- vhost-user socket as file descriptor FDNUM. It is incompatible with
- --socket-path.
- --print-capabilities
- Output to stdout the back-end capabilities in JSON format, and then
- exit successfully. Other options and arguments should be ignored, and
- the back-end program should not perform its normal function. The
- capabilities can be reported dynamically depending on the host
- capabilities.
- The JSON output is described in the ``vhost-user.json`` schema, by
- ```@VHostUserBackendCapabilities``. Example:
- .. code:: json
- {
- "type": "foo",
- "features": [
- "feature-a",
- "feature-b"
- ]
- }
- vhost-user-input
- ----------------
- Command line options:
- --evdev-path=PATH
- Specify the linux input device.
- (optional)
- --no-grab
- Do no request exclusive access to the input device.
- (optional)
- vhost-user-gpu
- --------------
- Command line options:
- --render-node=PATH
- Specify the GPU DRM render node.
- (optional)
- --virgl
- Enable virgl rendering support.
- (optional)
- vhost-user-blk
- --------------
- Command line options:
- --blk-file=PATH
- Specify block device or file path.
- (optional)
- --read-only
- Enable read-only.
- (optional)
|