- @node Security
- @chapter Security
- @section Overview
- This chapter explains the security requirements that QEMU is designed to meet
- and principles for securely deploying QEMU.
- @section Security Requirements
- QEMU supports many different use cases, some of which have stricter security
- requirements than others. The community has agreed on the overall security
- requirements that users may depend on. These requirements define what is
- considered supported from a security perspective.
- @subsection Virtualization Use Case
- The virtualization use case covers cloud and virtual private server (VPS)
- hosting, as well as traditional data center and desktop virtualization. These
- use cases rely on hardware virtualization extensions to execute guest code
- safely on the physical CPU at close-to-native speed.
- The following entities are untrusted, meaning that they may be buggy or
- malicious:
- @itemize
- @item Guest
- @item User-facing interfaces (e.g. VNC, SPICE, WebSocket)
- @item Network protocols (e.g. NBD, live migration)
- @item User-supplied files (e.g. disk images, kernels, device trees)
- @item Passthrough devices (e.g. PCI, USB)
- @end itemize
- Bugs affecting these entities are evaluated on whether they can cause damage in
- real-world use cases, and are treated as security bugs if they can.
- @subsection Non-virtualization Use Case
- The non-virtualization use case covers emulation using the Tiny Code Generator
- (TCG). In principle the TCG and device emulation code used in conjunction with
- the non-virtualization use case should meet the same security requirements as
- the virtualization use case. However, for historical reasons much of the
- non-virtualization use case code was not written with these security
- requirements in mind.
- Bugs affecting the non-virtualization use case are not considered security
- bugs at this time. Users with non-virtualization use cases must not rely on
- QEMU to provide guest isolation or any security guarantees.
- @section Architecture
- This section describes the design principles that ensure the security
- requirements are met.
- @subsection Guest Isolation
- Guest isolation is the confinement of guest code to the virtual machine. When
- guest code gains control of execution on the host, this is called escaping the
- virtual machine. Isolation also includes resource limits such as throttling of
- CPU, memory, disk, or network. Guests must be unable to exceed their resource
- limits.
- QEMU presents an attack surface to the guest in the form of emulated devices.
- The guest must not be able to gain control of QEMU. Bugs in emulated devices
- could allow malicious guests to gain code execution in QEMU. At this point the
- guest has escaped the virtual machine and is able to act in the context of the
- QEMU process on the host.
- Guests often interact with other guests and share resources with them. A
- malicious guest must not gain control of other guests or access their data.
- Disk image files and network traffic must be protected from other guests unless
- explicitly shared between them by the user.
- @subsection Principle of Least Privilege
- The principle of least privilege states that each component only has access to
- the privileges necessary for its function. In the case of QEMU this means that
- each process only has access to resources belonging to the guest.
- The QEMU process should not have access to any resources that are inaccessible
- to the guest. This way the guest does not gain anything by escaping into the
- QEMU process since it already has access to those same resources from within
- the guest.
- Following the principle of least privilege immediately fulfills guest isolation
- requirements. For example, guest A only has access to its own disk image file
- @code{a.img} and not guest B's disk image file @code{b.img}.
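One way to picture the @code{a.img} / @code{b.img} example is a check that a disk image is owned by the account its guest runs under and is closed to everyone else. The sketch below assumes each guest runs under its own UNIX user, which this chapter does not state. The helper name @code{accessible_only_by_owner} is hypothetical. Deployments that rely on SELinux or AppArmor labels would express the same idea differently.

```python
import os
import stat


def accessible_only_by_owner(path, owner_uid):
    """Return True if `path` is owned by `owner_uid` and closed to group/other.

    Assumption for illustration: each guest's QEMU process runs under its
    own UNIX user, so classic file permissions separate a.img from b.img.
    """
    info = os.stat(path)
    mode = stat.S_IMODE(info.st_mode)   # permission bits only
    # 0o077 selects every group and other permission bit.
    return info.st_uid == owner_uid and (mode & 0o077) == 0
```

A management tool could run such a check before launching a guest and refuse to start if the image is readable by other accounts.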
- In reality certain resources are inaccessible to the guest but must be
- available to QEMU to perform its function. For example, host system calls are
- necessary for QEMU but are not exposed to guests. A guest that escapes into
- the QEMU process can then begin invoking host system calls.
- New features must be designed to follow the principle of least privilege.
- Should this not be possible for technical reasons, the security risk must be
- clearly documented so users are aware of the trade-off of enabling the feature.
- @subsection Isolation mechanisms
- Several isolation mechanisms are available to realize this architecture of
- guest isolation and the principle of least privilege. With the exception of
- Linux seccomp, these mechanisms are all deployed by management tools that
- launch QEMU, such as libvirt. They are also platform-specific so they are only
- described briefly for Linux here.
- The fundamental isolation mechanism is that QEMU processes must run as
- unprivileged users. It can seem more convenient to launch QEMU as root to
- give it access to host devices (e.g. @code{/dev/net/tun}), but this poses a
- serious security risk: a guest that escapes into a root QEMU process acts as
- root on the host. File descriptor passing can be used to give an otherwise
- unprivileged QEMU process access to host devices without running QEMU as root.
- It is also possible to launch QEMU as a non-root user and configure UNIX groups
- for access to @code{/dev/kvm}, @code{/dev/net/tun}, and other device nodes.
- Some Linux distros already ship with UNIX groups for these devices by default.
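The file descriptor passing mentioned above can be sketched as follows: a launcher opens a resource and hands only the open descriptor to a child process, which never opens the path itself. This is a minimal illustration, not how any particular management tool is implemented. An ordinary file and a small Python child stand in for a host device and for QEMU, and the helper name @code{run_child_with_fd} is hypothetical. With a real QEMU, the descriptor number would be given to an option that accepts one, for example the @code{fd=} parameter of a tap @code{-netdev}.

```python
import os
import subprocess
import sys


def run_child_with_fd(path):
    """Open `path` in the parent and hand only the open descriptor to a child.

    The child never opens `path` itself; it reads from the inherited
    descriptor whose number it receives on its command line.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        # Stand-in for the unprivileged process: read from the inherited fd.
        child_code = (
            "import os, sys\n"
            "fd = int(sys.argv[1])\n"
            "sys.stdout.write(os.read(fd, 4096).decode())\n"
        )
        result = subprocess.run(
            [sys.executable, "-c", child_code, str(fd)],
            pass_fds=(fd,),        # keep this one descriptor open across exec
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout
    finally:
        os.close(fd)
```

The design point is that the privilege needed to open the resource stays with the launcher. The child holds only an already-open descriptor.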
- @itemize
- @item SELinux and AppArmor make it possible to confine processes beyond the
- traditional UNIX process and file permissions model. They restrict the QEMU
- process from accessing processes and files on the host system that are not
- needed by QEMU.
- @item Resource limits and cgroup controllers provide throughput and utilization
- limits on key resources such as CPU time, memory, and I/O bandwidth.
- @item Linux namespaces can be used to make process, file system, and other system
- resources unavailable to QEMU. A namespaced QEMU process is restricted to only
- those resources that were granted to it.
- @item Linux seccomp is available via the QEMU @option{--sandbox} option. It disables
- system calls that are not needed by QEMU, thereby reducing the host kernel
- attack surface.
- @end itemize
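Of the mechanisms listed above, per-process resource limits are the simplest to show in a few lines. The sketch below applies an open-file limit in the child between fork and exec, so the launcher itself is unaffected. The helper name @code{run_with_limits} and the default of 64 are assumptions made for illustration. Cgroup controllers, SELinux, AppArmor, and namespaces are configured through other interfaces and are not shown.

```python
import resource
import subprocess


def run_with_limits(argv, max_open_files=64):
    """Run `argv` with a POSIX open-file limit applied before exec."""

    def apply_limits():
        # Runs in the child after fork and before exec.
        _soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
        limit = max_open_files
        if hard != resource.RLIM_INFINITY:
            # An unprivileged process cannot raise its hard limit.
            limit = min(limit, hard)
        resource.setrlimit(resource.RLIMIT_NOFILE, (limit, limit))

    return subprocess.run(
        argv,
        preexec_fn=apply_limits,
        capture_output=True,
        text=True,
        check=True,
    )
```

Because the hard limit is lowered as well as the soft limit, the launched process cannot raise it again without additional privileges.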
- @section Sensitive configurations
- There are aspects of QEMU that can have security implications which users and
- management applications must be aware of.
- @subsection Monitor console (QMP and HMP)
- The monitor console (whether used with QMP or HMP) provides an interface
- to dynamically control many aspects of QEMU's runtime operation. Many of the
- commands exposed will instruct QEMU to access content on the host file system
- and/or trigger spawning of external processes.
- For example, the @code{migrate} command allows for the spawning of arbitrary
- processes for the purpose of tunnelling the migration data stream. The
- @code{blockdev-add} command instructs QEMU to open arbitrary files, exposing
- their content to the guest as a virtual disk.
- Unless QEMU is otherwise confined using technologies such as SELinux, AppArmor,
- or Linux namespaces, the monitor console should be considered to have privileges
- equivalent to those of the user account QEMU is running under.
- It is further important to consider the security of the character device backend
- over which the monitor console is exposed. It needs to have protection against
- malicious third parties which might try to make unauthorized connections, or
- perform man-in-the-middle attacks. Many of the character device backends do not
- satisfy this requirement and so must not be used for the monitor console.
- The general recommendation is that the monitor console should be exposed over
- a UNIX domain socket backend to the local host only. Use of the TCP-based
- character device backend is inappropriate unless it is configured to use both
- TLS encryption and an authorization control policy on client connections.
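A client for a monitor exposed over a UNIX domain socket, as recommended above, can be sketched as follows. It is a minimal illustration and assumes that QEMU was started with a QMP monitor on a UNIX socket at a path of the operator's choosing, and that each QMP message arrives as a single line of JSON (the default, non-pretty output). The function names are hypothetical. The greeting, the @code{qmp_capabilities} negotiation, and @code{query-status} are part of QMP itself.

```python
import json
import socket


def connect_monitor(path):
    """Connect to a monitor socket that is reachable only on the local host."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.connect(path)
    return sock


def _send(sock, message):
    sock.sendall(json.dumps(message).encode("utf-8") + b"\n")


def _read_reply(reader):
    """Return the next message that is not an asynchronous event."""
    while True:
        line = reader.readline()
        if not line:
            raise ConnectionError("QMP peer closed the connection")
        message = json.loads(line)
        if "event" not in message:
            return message


def qmp_query(sock, command):
    """Negotiate capabilities on a connected socket, then run one command."""
    reader = sock.makefile("r", encoding="utf-8")
    greeting = _read_reply(reader)
    if "QMP" not in greeting:
        raise RuntimeError("unexpected greeting from monitor")
    _send(sock, {"execute": "qmp_capabilities"})
    if "error" in _read_reply(reader):
        raise RuntimeError("capability negotiation failed")
    _send(sock, {"execute": command})
    reply = _read_reply(reader)
    if "error" in reply:
        raise RuntimeError(reply["error"])
    return reply["return"]
```

A caller would combine the two, for example @code{qmp_query(connect_monitor(path), "query-status")}. Access control then rests on the file system permissions of the socket and its directory, which should admit only the trusted management application or user.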
- In summary, the monitor console is considered a privileged control interface to
- QEMU and as such should only be made accessible to a trusted management
- application or user.