5 年之前 · c02c112a2c
--- a/docs/system/index.rst
+++ b/docs/system/index.rst
@@ -14,4 +14,5 @@ Contents:
 
				 .. toctree::
			
 
				    :maxdepth: 2
			
 
				 
			
 
				+   security
			
 
				    vfio-ap
			
--- a/docs/system/security.rst
+++ b/docs/system/security.rst
@@ -0,0 +1,173 @@
 
				+Security
			
 
				+========
			
 
				+
			
 
				+Overview
			
 
				+--------
			
 
				+
			
 
				+This chapter explains the security requirements that QEMU is designed to meet
			
 
				+and principles for securely deploying QEMU.
			
 
				+
			
 
				+Security Requirements
			
 
				+---------------------
			
 
				+
			
 
				+QEMU supports many different use cases, some of which have stricter security
			
 
				+requirements than others.  The community has agreed on the overall security
			
 
				+requirements that users may depend on.  These requirements define what is
			
 
				+considered supported from a security perspective.
			
 
				+
			
 
				+Virtualization Use Case
			
 
				+'''''''''''''''''''''''
			
 
				+
			
 
				+The virtualization use case covers cloud and virtual private server (VPS)
			
 
				+hosting, as well as traditional data center and desktop virtualization.  These
			
 
				+use cases rely on hardware virtualization extensions to execute guest code
			
 
				+safely on the physical CPU at close-to-native speed.
			
 
				+
			
 
				+The following entities are untrusted, meaning that they may be buggy or
			
 
				+malicious:
			
 
				+
			
 
				+- Guest
			
 
				+- User-facing interfaces (e.g. VNC, SPICE, WebSocket)
			
 
				+- Network protocols (e.g. NBD, live migration)
			
 
				+- User-supplied files (e.g. disk images, kernels, device trees)
			
 
				+- Passthrough devices (e.g. PCI, USB)
			
 
				+
			
 
				+Bugs affecting these entities are evaluated on whether they can cause damage in
			
 
				+real-world use cases and treated as security bugs if this is the case.
			
 
				+
			
 
				+Non-virtualization Use Case
			
 
				+'''''''''''''''''''''''''''
			
 
				+
			
 
				+The non-virtualization use case covers emulation using the Tiny Code Generator
			
 
				+(TCG).  In principle the TCG and device emulation code used in conjunction with
			
 
				+the non-virtualization use case should meet the same security requirements as
			
 
				+the virtualization use case.  However, for historical reasons much of the
			
 
				+non-virtualization use case code was not written with these security
			
 
				+requirements in mind.
			
 
				+
			
 
				+Bugs affecting the non-virtualization use case are not considered security
			
 
				+bugs at this time.  Users with non-virtualization use cases must not rely on
			
 
				+QEMU to provide guest isolation or any security guarantees.
			
 
				+
			
 
				+Architecture
			
 
				+------------
			
 
				+
			
 
				+This section describes the design principles that ensure the security
			
 
				+requirements are met.
			
 
				+
			
 
				+Guest Isolation
			
 
				+'''''''''''''''
			
 
				+
			
 
				+Guest isolation is the confinement of guest code to the virtual machine.  When
			
 
				+guest code gains control of execution on the host this is called escaping the
			
 
				+virtual machine.  Isolation also includes resource limits such as throttling of
			
 
				+CPU, memory, disk, or network.  Guests must be unable to exceed their resource
			
 
				+limits.
			
 
				+
			
 
				+QEMU presents an attack surface to the guest in the form of emulated devices.
			
 
				+The guest must not be able to gain control of QEMU.  Bugs in emulated devices
			
 
				+could allow malicious guests to gain code execution in QEMU.  At this point the
			
 
				+guest has escaped the virtual machine and is able to act in the context of the
			
 
				+QEMU process on the host.
			
 
				+
			
 
				+Guests often interact with other guests and share resources with them.  A
			
 
				+malicious guest must not gain control of other guests or access their data.
			
 
				+Disk image files and network traffic must be protected from other guests unless
			
 
				+explicitly shared between them by the user.
			
 
				+
			
 
				+Principle of Least Privilege
			
 
				+''''''''''''''''''''''''''''
			
 
				+
			
 
				+The principle of least privilege states that each component only has access to
			
 
				+the privileges necessary for its function.  In the case of QEMU this means that
			
 
				+each process only has access to resources belonging to the guest.
			
 
				+
			
 
				+The QEMU process should not have access to any resources that are inaccessible
			
 
				+to the guest.  This way the guest does not gain anything by escaping into the
			
 
				+QEMU process since it already has access to those same resources from within
			
 
				+the guest.
			
 
				+
			
 
				+Following the principle of least privilege immediately fulfills guest isolation
			
 
				+requirements.  For example, guest A only has access to its own disk image file
			
 
				+``a.img`` and not guest B's disk image file ``b.img``.
			
 
				+
			
 
				+In reality certain resources are inaccessible to the guest but must be
			
 
				+available to QEMU to perform its function.  For example, host system calls are
			
 
				+necessary for QEMU but are not exposed to guests.  A guest that escapes into
			
 
				+the QEMU process can then begin invoking host system calls.
			
 
				+
			
 
				+New features must be designed to follow the principle of least privilege.
			
 
				+Should this not be possible for technical reasons, the security risk must be
			
 
				+clearly documented so users are aware of the trade-off of enabling the feature.
			
 
				+
			
 
				+Isolation mechanisms
			
 
				+''''''''''''''''''''
			
 
				+
			
 
				+Several isolation mechanisms are available to realize this architecture of
			
 
				+guest isolation and the principle of least privilege.  With the exception of
			
 
				+Linux seccomp, these mechanisms are all deployed by management tools that
			
 
				+launch QEMU, such as libvirt.  They are also platform-specific so they are only
			
 
				+described briefly for Linux here.
			
 
				+
			
 
				+The fundamental isolation mechanism is that QEMU processes must run as
			
 
				+unprivileged users.  Sometimes it seems more convenient to launch QEMU as
			
 
				+root to give it access to host devices (e.g. ``/dev/net/tun``) but this poses a
			
 
				+huge security risk.  File descriptor passing can be used to give an otherwise
			
 
				+unprivileged QEMU process access to host devices without running QEMU as root.
			
 
				+It is also possible to launch QEMU as a non-root user and configure UNIX groups
			
 
				+for access to ``/dev/kvm``, ``/dev/net/tun``, and other device nodes.
			
 
				+Some Linux distros already ship with UNIX groups for these devices by default.
			
 
				+
			
 
				+- SELinux and AppArmor make it possible to confine processes beyond the
			
 
				+  traditional UNIX process and file permissions model.  They restrict the QEMU
			
 
				+  process from accessing processes and files on the host system that are not
			
 
				+  needed by QEMU.
			
 
				+
			
 
				+- Resource limits and cgroup controllers provide throughput and utilization
			
 
				+  limits on key resources such as CPU time, memory, and I/O bandwidth.
			
 
				+
			
 
				+- Linux namespaces can be used to make process, file system, and other system
			
 
				+  resources unavailable to QEMU.  A namespaced QEMU process is restricted to only
			
 
				+  those resources that were granted to it.
			
 
				+
			
 
				+- Linux seccomp is available via the QEMU ``--sandbox`` option.  It disables
			
 
				+  system calls that are not needed by QEMU, thereby reducing the host kernel
			
 
				+  attack surface.
			
 
				+
			
 
				+Sensitive configurations
			
 
				+------------------------
			
 
				+
			
 
				+There are aspects of QEMU that can have security implications which users &
			
 
				+management applications must be aware of.
			
 
				+
			
 
				+Monitor console (QMP and HMP)
			
 
				+'''''''''''''''''''''''''''''
			
 
				+
			
 
				+The monitor console (whether used with QMP or HMP) provides an interface
			
 
				+to dynamically control many aspects of QEMU's runtime operation. Many of the
			
 
				+commands exposed will instruct QEMU to access content on the host file system
			
 
				+and/or trigger spawning of external processes.
			
 
				+
			
 
				+For example, the ``migrate`` command allows for the spawning of arbitrary
			
 
				+processes for the purpose of tunnelling the migration data stream. The
			
 
				+``blockdev-add`` command instructs QEMU to open arbitrary files, exposing
			
 
				+their content to the guest as a virtual disk.
			
 
				+
			
 
				+Unless QEMU is otherwise confined using technologies such as SELinux, AppArmor,
			
 
				+or Linux namespaces, the monitor console should be considered to have privileges
			
 
				+equivalent to those of the user account QEMU is running under.
			
 
				+
			
 
				+It is further important to consider the security of the character device backend
			
 
				+over which the monitor console is exposed. It needs to have protection against
			
 
				+malicious third parties which might try to make unauthorized connections, or
			
 
				+perform man-in-the-middle attacks. Many of the character device backends do not
			
 
				+satisfy this requirement and so must not be used for the monitor console.
			
 
				+
			
 
				+The general recommendation is that the monitor console should be exposed over
			
 
				+a UNIX domain socket backend to the local host only. Use of the TCP based
			
 
				+character device backend is inappropriate unless configured to use both TLS
			
 
				+encryption and authorization control policy on client connections.
			
 
				+
			
 
				+In summary, the monitor console is considered a privileged control interface to
			
 
				+QEMU and as such should only be made accessible to a trusted management
			
 
				+application or user.