123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371 |
- @node Implementation notes
- @appendix Implementation notes
- @menu
- * CPU emulation::
- * Translator Internals::
- * QEMU compared to other emulators::
- * Bibliography::
- @end menu
- @node CPU emulation
- @section CPU emulation
- @menu
- * x86:: x86 and x86-64 emulation
- * ARM:: ARM emulation
- * MIPS:: MIPS emulation
- * PPC:: PowerPC emulation
- * SPARC:: Sparc32 and Sparc64 emulation
- * Xtensa:: Xtensa emulation
- @end menu
- @node x86
- @subsection x86 and x86-64 emulation
- QEMU x86 target features:
- @itemize
- @item The virtual x86 CPU supports 16 bit and 32 bit addressing with segmentation.
- LDT/GDT and IDT are emulated. VM86 mode is also supported to run
- DOSEMU. There is some support for MMX/3DNow!, SSE, SSE2, SSE3, SSSE3,
- and SSE4 as well as x86-64 SVM.
- @item Support of host page sizes bigger than 4KB in user mode emulation.
- @item QEMU can emulate itself on x86.
- @item An extensive Linux x86 CPU test program is included @file{tests/test-i386}.
- It can be used to test other x86 virtual CPUs.
- @end itemize
- Current QEMU limitations:
- @itemize
- @item Limited x86-64 support.
- @item IPC syscalls are missing.
- @item The x86 segment limits and access rights are not tested at every
- memory access (yet). Hopefully, very few OSes seem to rely on that for
- normal use.
- @end itemize
- @node ARM
- @subsection ARM emulation
- @itemize
- @item Full ARM 7 user emulation.
- @item NWFPE FPU support included in user Linux emulation.
- @item Can run most ARM Linux binaries.
- @end itemize
- @node MIPS
- @subsection MIPS emulation
- @itemize
- @item The system emulation allows full MIPS32/MIPS64 Release 2 emulation,
- including privileged instructions, FPU and MMU, in both little and big
- endian modes.
- @item The Linux userland emulation can run many 32 bit MIPS Linux binaries.
- @end itemize
- Current QEMU limitations:
- @itemize
- @item Self-modifying code is not always handled correctly.
- @item 64 bit userland emulation is not implemented.
- @item The system emulation is not complete enough to run real firmware.
- @item The watchpoint debug facility is not implemented.
- @end itemize
- @node PPC
- @subsection PowerPC emulation
- @itemize
- @item Full PowerPC 32 bit emulation, including privileged instructions,
- FPU and MMU.
- @item Can run most PowerPC Linux binaries.
- @end itemize
- @node SPARC
- @subsection Sparc32 and Sparc64 emulation
- @itemize
- @item Full SPARC V8 emulation, including privileged
- instructions, FPU and MMU. SPARC V9 emulation includes most privileged
- and VIS instructions, FPU and I/D MMU. Alignment is fully enforced.
- @item Can run most 32-bit SPARC Linux binaries, SPARC32PLUS Linux binaries and
- some 64-bit SPARC Linux binaries.
- @end itemize
- Current QEMU limitations:
- @itemize
- @item IPC syscalls are missing.
- @item Floating point exception support is buggy.
- @item Atomic instructions are not correctly implemented.
- @item There are still some problems with Sparc64 emulators.
- @end itemize
- @node Xtensa
- @subsection Xtensa emulation
- @itemize
- @item Core Xtensa ISA emulation, including most options: code density,
- loop, extended L32R, 16- and 32-bit multiplication, 32-bit division,
- MAC16, miscellaneous operations, boolean, FP coprocessor, coprocessor
- context, debug, multiprocessor synchronization,
- conditional store, exceptions, relocatable vectors, unaligned exception,
- interrupts (including high priority and timer), hardware alignment,
- region protection, region translation, MMU, windowed registers, thread
- pointer, processor ID.
- @item Not implemented options: data/instruction cache (including cache
- prefetch and locking), XLMI, processor interface. Also options not
- covered by the core ISA (e.g. FLIX, wide branches) are not implemented.
- @item Can run most Xtensa Linux binaries.
- @item New core configuration that requires no additional instructions
- may be created from overlay with minimal amount of hand-written code.
- @end itemize
- @node Translator Internals
- @section Translator Internals
- QEMU is a dynamic translator. When it first encounters a piece of code,
- it converts it to the host instruction set. Usually dynamic translators
- are very complicated and highly CPU dependent. QEMU uses some tricks
- which make it relatively easily portable and simple while achieving good
- performances.
- QEMU's dynamic translation backend is called TCG, for "Tiny Code
- Generator". For more information, please take a look at @code{tcg/README}.
- Some notable features of QEMU's dynamic translator are:
- @table @strong
- @item CPU state optimisations:
- The target CPUs have many internal states which change the way it
- evaluates instructions. In order to achieve a good speed, the
- translation phase considers that some state information of the virtual
- CPU cannot change in it. The state is recorded in the Translation
- Block (TB). If the state changes (e.g. privilege level), a new TB will
- be generated and the previous TB won't be used anymore until the state
- matches the state recorded in the previous TB. The same idea can be applied
- to other aspects of the CPU state. For example, on x86, if the SS,
- DS and ES segments have a zero base, then the translator does not even
- generate an addition for the segment base.
- @item Direct block chaining:
- After each translated basic block is executed, QEMU uses the simulated
- Program Counter (PC) and other cpu state information (such as the CS
- segment base value) to find the next basic block.
- In order to accelerate the most common cases where the new simulated PC
- is known, QEMU can patch a basic block so that it jumps directly to the
- next one.
- The most portable code uses an indirect jump. An indirect jump makes
- it easier to make the jump target modification atomic. On some host
- architectures (such as x86 or PowerPC), the @code{JUMP} opcode is
- directly patched so that the block chaining has no overhead.
- @item Self-modifying code and translated code invalidation:
- Self-modifying code is a special challenge in x86 emulation because no
- instruction cache invalidation is signaled by the application when code
- is modified.
- User-mode emulation marks a host page as write-protected (if it is
- not already read-only) every time translated code is generated for a
- basic block. Then, if a write access is done to the page, Linux raises
- a SEGV signal. QEMU then invalidates all the translated code in the page
- and enables write accesses to the page. For system emulation, write
- protection is achieved through the software MMU.
- Correct translated code invalidation is done efficiently by maintaining
- a linked list of every translated block contained in a given page. Other
- linked lists are also maintained to undo direct block chaining.
- On RISC targets, correctly written software uses memory barriers and
- cache flushes, so some of the protection above would not be
- necessary. However, QEMU still requires that the generated code always
- matches the target instructions in memory in order to handle
- exceptions correctly.
- @item Exception support:
- longjmp() is used when an exception such as division by zero is
- encountered.
- The host SIGSEGV and SIGBUS signal handlers are used to get invalid
- memory accesses. QEMU keeps a map from host program counter to
- target program counter, and looks up where the exception happened
- based on the host program counter at the exception point.
- On some targets, some bits of the virtual CPU's state are not flushed to the
- memory until the end of the translation block. This is done for internal
- emulation state that is rarely accessed directly by the program and/or changes
- very often throughout the execution of a translation block---this includes
- condition codes on x86, delay slots on SPARC, conditional execution on
- ARM, and so on. This state is stored for each target instruction, and
- looked up on exceptions.
- @item MMU emulation:
- For system emulation QEMU uses a software MMU. In that mode, the MMU
- virtual to physical address translation is done at every memory
- access.
- QEMU uses an address translation cache (TLB) to speed up the translation.
- In order to avoid flushing the translated code each time the MMU
- mappings change, all caches in QEMU are physically indexed. This
- means that each basic block is indexed with its physical address.
- In order to avoid invalidating the basic block chain when MMU mappings
- change, chaining is only performed when the destination of the jump
- shares a page with the basic block that is performing the jump.
- The MMU can also distinguish RAM and ROM memory areas from MMIO memory
- areas. Access is faster for RAM and ROM because the translation cache also
- hosts the offset between guest address and host memory. Accessing MMIO
- memory areas instead calls out to C code for device emulation.
- Finally, the MMU helps tracking dirty pages and pages pointed to by
- translation blocks.
- @end table
- @node QEMU compared to other emulators
- @section QEMU compared to other emulators
- Like bochs [1], QEMU emulates an x86 CPU. But QEMU is much faster than
- bochs as it uses dynamic compilation. Bochs is closely tied to x86 PC
- emulation while QEMU can emulate several processors.
- Like Valgrind [2], QEMU does user space emulation and dynamic
- translation. Valgrind is mainly a memory debugger while QEMU has no
- support for it (QEMU could be used to detect out of bound memory
- accesses as Valgrind, but it has no support to track uninitialised data
- as Valgrind does). The Valgrind dynamic translator generates better code
- than QEMU (in particular it does register allocation) but it is closely
- tied to an x86 host and target and has no support for precise exceptions
- and system emulation.
- EM86 [3] is the closest project to user space QEMU (and QEMU still uses
- some of its code, in particular the ELF file loader). EM86 was limited
- to an alpha host and used a proprietary and slow interpreter (the
- interpreter part of the FX!32 Digital Win32 code translator [4]).
- TWIN from Willows Software was a Windows API emulator like Wine. It is less
- accurate than Wine but includes a protected mode x86 interpreter to launch
- x86 Windows executables. Such an approach has greater potential because most
- of the Windows API is executed natively but it is far more difficult to
- develop because all the data structures and function parameters exchanged
- between the API and the x86 code must be converted.
- User mode Linux [5] was the only solution before QEMU to launch a
- Linux kernel as a process while not needing any host kernel
- patches. However, user mode Linux requires heavy kernel patches while
- QEMU accepts unpatched Linux kernels. The price to pay is that QEMU is
- slower.
- The Plex86 [6] PC virtualizer is done in the same spirit as the now
- obsolete qemu-fast system emulator. It requires a patched Linux kernel
- to work (you cannot launch the same kernel on your PC), but the
- patches are really small. As it is a PC virtualizer (no emulation is
- done except for some privileged instructions), it has the potential of
- being faster than QEMU. The downside is that a complicated (and
- potentially unsafe) host kernel patch is needed.
- The commercial PC Virtualizers (VMWare [7], VirtualPC [8]) are faster
- than QEMU (without virtualization), but they all need specific, proprietary
- and potentially unsafe host drivers. Moreover, they are unable to
- provide cycle exact simulation as an emulator can.
- VirtualBox [9], Xen [10] and KVM [11] are based on QEMU. QEMU-SystemC
- [12] uses QEMU to simulate a system where some hardware devices are
- developed in SystemC.
- @node Bibliography
- @section Bibliography
- @table @asis
- @item [1]
- @url{http://bochs.sourceforge.net/}, the Bochs IA-32 Emulator Project,
- by Kevin Lawton et al.
- @item [2]
- @url{http://www.valgrind.org/}, Valgrind, an open-source memory debugger
- for GNU/Linux.
- @item [3]
- @url{http://ftp.dreamtime.org/pub/linux/Linux-Alpha/em86/v0.2/docs/em86.html},
- the EM86 x86 emulator on Alpha-Linux.
- @item [4]
- @url{http://www.usenix.org/publications/library/proceedings/usenix-nt97/@/full_papers/chernoff/chernoff.pdf},
- DIGITAL FX!32: Running 32-Bit x86 Applications on Alpha NT, by Anton
- Chernoff and Ray Hookway.
- @item [5]
- @url{http://user-mode-linux.sourceforge.net/},
- The User-mode Linux Kernel.
- @item [6]
- @url{http://www.plex86.org/},
- The new Plex86 project.
- @item [7]
- @url{http://www.vmware.com/},
- The VMWare PC virtualizer.
- @item [8]
- @url{https://www.microsoft.com/download/details.aspx?id=3702},
- The VirtualPC PC virtualizer.
- @item [9]
- @url{http://virtualbox.org/},
- The VirtualBox PC virtualizer.
- @item [10]
- @url{http://www.xen.org/},
- The Xen hypervisor.
- @item [11]
- @url{http://www.linux-kvm.org/},
- Kernel Based Virtual Machine (KVM).
- @item [12]
- @url{http://www.greensocs.com/projects/QEMUSystemC},
- QEMU-SystemC, a hardware co-simulator.
- @end table
|