Original Xbox Emulator for Windows, macOS, and Linux (Active Development)
Alex Williamson 89d5202edc vfio/pci: Allow relocating MSI-X MMIO
Recently proposed vfio-pci kernel changes (v4.16) remove the
restriction preventing userspace from mmap'ing PCI BARs in areas
overlapping the MSI-X vector table.  This change is primarily intended
to benefit host platforms which make use of system page sizes larger
than the PCI spec recommendation for alignment of MSI-X data
structures (i.e. not x86_64).  In the case of POWER systems, the SPAPR
spec requires the VM to program MSI-X using hypercalls, rendering the
MSI-X vector table unused in the VM view of the device.  However,
ARM64 platforms also support 64KB pages and rely on QEMU emulation of
MSI-X.  Regardless of the kernel driver allowing mmaps overlapping
the MSI-X vector table, emulation of the MSI-X vector table also
prevents direct mapping of device MMIO spaces overlapping this page.
Thanks to the fact that PCI devices have a standard self discovery
mechanism, we can try to resolve this by relocating the MSI-X data
structures, either by creating a new PCI BAR or extending an existing
BAR and updating the MSI-X capability for the new location.  There's
even a very slim chance that this could benefit devices which do not
adhere to the PCI spec alignment guidelines on x86_64 systems.

This new x-msix-relocation option accepts the following choices:

  off: Disable MSI-X relocation, use native device config (default)
  auto: Use a known good combination for the platform/device (none yet)
  bar0..bar5: Specify the target BAR for MSI-X data structures

If compatible, the target BAR will either be created or extended and
the new portion will be used for MSI-X emulation.
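
As a rough illustration of the accepted values, the following sketch
validates an option string.  It is not QEMU's property parser; the
function name and return convention are hypothetical.

```python
def parse_msix_relocation(value):
    """Map an x-msix-relocation string to None (off), "auto", or a BAR index.

    Hypothetical helper for illustration only; QEMU implements this as a
    device property, not as this function.
    """
    if value == "off":
        return None          # keep the native device config (the default)
    if value == "auto":
        return "auto"        # platform/device lookup, no entries yet
    # bar0..bar5 are the only valid explicit targets
    if len(value) == 4 and value.startswith("bar") and value[3] in "012345":
        return int(value[3])
    raise ValueError(f"invalid x-msix-relocation value: {value!r}")
```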

The first obvious user question with this option is how to determine
whether a given platform and device might benefit from this option.
In most cases, the answer is that it won't, especially on x86_64.
Devices often dedicate an entire BAR to MSI-X and therefore no
performance sensitive registers overlap the MSI-X area.  Take for
example:

# lspci -vvvs 0a:00.0
0a:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network Connection
	...
	Region 0: Memory at db680000 (32-bit, non-prefetchable) [size=512K]
	Region 3: Memory at db7f8000 (32-bit, non-prefetchable) [size=16K]
	...
	Capabilities: [70] MSI-X: Enable+ Count=10 Masked-
		Vector table: BAR=3 offset=00000000
		PBA: BAR=3 offset=00002000

This device uses the 16K bar3 for MSI-X with the vector table at
offset zero and the pending bits array at offset 8K, fully honoring
the PCI spec alignment guidance.  The data sheet specifically refers
to this as an MSI-X BAR.  This device would not see a benefit from
MSI-X relocation regardless of the platform, regardless of the page
size.

However, here's another example:

# lspci -vvvs 02:00.0
02:00.0 Serial Attached SCSI controller: xxxxxxxx
	...
	Region 0: I/O ports at c000 [size=256]
	Region 1: Memory at ef640000 (64-bit, non-prefetchable) [size=64K]
	Region 3: Memory at ef600000 (64-bit, non-prefetchable) [size=256K]
	...
	Capabilities: [c0] MSI-X: Enable+ Count=16 Masked-
		Vector table: BAR=1 offset=0000e000
		PBA: BAR=1 offset=0000f000

Here the MSI-X data structures are placed on separate 4K pages at the
end of a 64KB BAR.  If our host page size is 4K, we're likely fine,
but at 64KB page size, MSI-X emulation at that location prevents the
entire BAR from being directly mapped into the VM address space.
Overlapping performance sensitive registers then starts to be a very
likely scenario on such a platform.  At this point, the user could
enable tracing on vfio_region_read and vfio_region_write to determine
more conclusively if device accesses are being trapped through QEMU.
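
The page-size reasoning in the two examples can be checked with a
short calculation.  The sketch below is only an estimate under stated
assumptions: it treats every host page touched by the vector table or
the PBA as trapped, uses the 16-byte vector table entry and
one-pending-bit-per-vector layout from the PCI spec, and the function
name is hypothetical.

```python
def msix_trapped_bytes(bar_size, table_off, pba_off, count, page_size):
    """Return (trapped, mappable) bytes of a BAR that hosts MSI-X.

    Assumes any host page overlapping the vector table or PBA must be
    emulated, so nothing else on that page can be directly mapped.
    """
    table_len = count * 16                  # 16 bytes per vector table entry
    pba_len = ((count + 63) // 64) * 8      # one bit per vector, in 64-bit words
    trapped_pages = set()
    for off, length in ((table_off, table_len), (pba_off, pba_len)):
        first = off // page_size
        last = (off + length - 1) // page_size
        trapped_pages.update(range(first, last + 1))
    # A page may extend past the end of a BAR smaller than the page size.
    trapped = sum(min(page_size, bar_size - p * page_size)
                  for p in trapped_pages)
    return trapped, bar_size - trapped


# Second example above: 64K bar1, table at 0xe000, PBA at 0xf000, 16 vectors
print(msix_trapped_bytes(64 * 1024, 0xe000, 0xf000, 16, 4 * 1024))   # (8192, 57344)
print(msix_trapped_bytes(64 * 1024, 0xe000, 0xf000, 16, 64 * 1024))  # (65536, 0)
```

With 4K pages only the last two pages of bar1 are trapped; with 64KB
pages nothing in the BAR remains directly mappable.  The same
calculation for the I350 bar3 also reports the whole BAR as trapped on
64KB pages, which is harmless there because the BAR holds nothing but
MSI-X.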

Upon finding a device and platform in need of MSI-X relocation, the
next problem is how to choose the target PCI BAR to host the MSI-X data
structures.  A few key rules to keep in mind for this selection
include:

 * There are only 6 BAR slots, bar0..bar5
 * 64-bit BARs occupy two BAR slots, 'lspci -vvv' lists the first slot
 * PCI BARs are always a power of 2 in size, extending == doubling
 * The maximum size of a 32-bit BAR is 2GB
 * MSI-X data structures must reside in an MMIO BAR

Using these rules, we can evaluate each BAR of the second example
device above as follows:

 bar0: I/O port BAR, incompatible with MSI-X tables
 bar1: BAR could be extended, incurring another 64KB of MMIO
 bar2: Unavailable, bar1 is 64-bit, this register is used by bar1
 bar3: BAR could be extended, incurring another 256KB of MMIO
 bar4: Unavailable, bar3 is 64-bit, this register is used by bar3
 bar5: Available, empty BAR, minimum additional MMIO

A secondary optimization we might wish to make in relocating MSI-X
is to minimize the additional MMIO required for the device, therefore
we might test the available choices in order of preference as bar5,
bar1, and finally bar3.  The original proposal for this feature
included an 'auto' option which would choose bar5 in this case, but
various drivers have been found that make assumptions about the
properties of the "first" BAR or the size of BARs such that there
appears to be no foolproof automatic selection available, requiring
known good combinations to be sourced from users.  This patch is
pre-enabled for an 'auto' selection making use of a validated lookup
table, but no entries are yet identified.
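
The BAR selection rules and the preference ordering described above
can be sketched as follows.  This is an illustration, not the logic in
hw/vfio/pci.c: the function names and the input format are
hypothetical, and the size of a newly created BAR is left unspecified
(None) because it depends on the MSI-X structures themselves.

```python
MAX_32BIT_BAR = 2 * 1024 ** 3  # 2GB limit from the rules above


def evaluate_bars(bars):
    """Classify bar0..bar5 as MSI-X relocation targets.

    bars maps a slot number to (kind, size, is_64bit) for each BAR that
    lspci lists; kind is "io" or "mem".  Returns slot -> (verdict, extra)
    where extra is the additional MMIO in bytes when extending.
    """
    # A 64-bit BAR also consumes the following slot.
    upper_halves = {slot + 1 for slot, (_, _, is64) in bars.items() if is64}
    verdicts = {}
    for slot in range(6):
        if slot in upper_halves:
            verdicts[slot] = ("unavailable", None)
        elif slot not in bars:
            verdicts[slot] = ("create", None)
        else:
            kind, size, is64 = bars[slot]
            if kind != "mem":
                verdicts[slot] = ("incompatible", None)
            elif not is64 and size * 2 > MAX_32BIT_BAR:
                verdicts[slot] = ("unavailable", None)
            else:
                verdicts[slot] = ("extend", size)  # extending == doubling
    return verdicts


def preference_order(verdicts):
    """Order usable targets by least additional MMIO, new BARs first."""
    usable = [(extra or 0, slot) for slot, (verdict, extra) in verdicts.items()
              if verdict in ("create", "extend")]
    return [f"bar{slot}" for _, slot in sorted(usable)]


# Second example device: I/O bar0, 64-bit 64K bar1, 64-bit 256K bar3
sas = {0: ("io", 256, False),
       1: ("mem", 64 * 1024, True),
       3: ("mem", 256 * 1024, True)}
print(preference_order(evaluate_bars(sas)))  # ['bar5', 'bar1', 'bar3']
```

As noted above, least additional MMIO is only a secondary
optimization; driver assumptions about BAR layout can still rule out
the first choice.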

Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2018-02-06 11:08:26 -07:00
accel accel/tcg: add size paremeter in tlb_fill() 2018-01-25 16:02:24 +01:00
audio maint: Fix macros with broken 'do/while(0); ' usage 2018-01-16 14:54:52 +01:00
backends tpm: report backend request error 2018-01-29 14:22:43 -05:00
block block/ssh: fix possible segmentation fault when .desc is not null-terminated 2018-01-31 22:37:00 -05:00
bsd-user misc: remove headers implicitly included 2017-12-18 17:07:02 +03:00
capstone@22ead3e0bf disas: Add capstone as submodule 2017-10-26 11:56:20 +02:00
chardev chardev: Clean up previous patch indentation 2018-01-16 14:54:52 +01:00
contrib contrib/vhost-user-blk: introduce a vhost-user-blk sample application 2018-01-18 21:52:37 +02:00
crypto crypto: fix stack-buffer-overflow error 2018-01-16 14:54:50 +01:00
default-configs hw/hppa: Implement DINO system board 2018-01-31 05:30:50 -08:00
disas target/xtensa updates: 2018-01-24 16:59:36 +00:00
docs Pull request 2018-01-24 15:28:36 +00:00
dtc@e54388015a Update dtc to fix compilation problem on Mac OS 10.6 2018-01-10 12:53:00 +11:00
fpu softfloat: define floatx80_round() 2017-06-29 20:27:39 +02:00
fsdev fsdev: improve error handling of backend init 2018-01-08 11:18:23 +01:00
gdb-xml s390x/gdb: add gs registers 2017-07-14 12:29:49 +02:00
hw vfio/pci: Allow relocating MSI-X MMIO 2018-02-06 11:08:26 -07:00
include qapi: Create DEFINE_PROP_OFF_AUTO_PCIBAR 2018-02-06 11:08:26 -07:00
io io: introduce a network socket listener API 2017-12-15 15:07:26 +00:00
libdecnumber build: remove CONFIG_LIBDECNUMBER 2017-10-16 18:03:52 +02:00
linux-headers linux-headers: update 2018-01-22 11:07:47 +01:00
linux-user target/hppa: Add control registers 2018-01-30 10:08:18 -08:00
migration Pull request for various patches that have been reviewed and 2018-01-23 10:15:09 +00:00
nbd qapi: add nbd-server-remove 2018-01-26 09:37:20 -06:00
net net: Allow netdevs to be used with 'hostfwd_add' and 'hostfwd_remove' 2018-01-29 16:05:37 +08:00
pc-bios roms/seabios-hppa: Update submodule and image 2018-02-04 14:11:18 -08:00
po po: add missing translations in de, fr, it, zh 2016-12-14 18:47:19 +00:00
qapi qapi: Create DEFINE_PROP_OFF_AUTO_PCIBAR 2018-02-06 11:08:26 -07:00
qga sockets: remove obsolete code that updated listen address 2017-12-21 09:22:44 +01:00
qobject qapi: Add qobject_is_equal() 2017-11-17 18:21:30 +01:00
qom tcg: Add CPUState cflags_next_tb 2017-10-24 13:53:41 -07:00
replay migration: pre_save return int 2017-09-27 11:35:59 +01:00
roms roms/seabios-hppa: Update submodule and image 2018-02-04 14:11:18 -08:00
scripts dump-guest-memory.py: skip vmcoreinfo section if not available 2018-02-01 12:13:52 +01:00
scsi scsi: fix scsi_convert_sense crash when in_buf == NULL && in_len == 0 2018-01-12 09:54:13 +01:00
slirp slirp: add in6_dhcp_multicast() 2018-01-14 18:16:13 +01:00
stubs tpm: add stubs 2017-10-25 01:05:04 -04:00
target spapr/iommu: Enable in-kernel TCE acceleration via VFIO KVM device 2018-02-06 11:08:24 -07:00
tcg tcg/ppc: Allow a 32-bit offset to the constant pool 2018-01-16 08:21:56 -08:00
tests tests: Enable boot-serial-test for hppa 2018-02-04 14:11:07 -08:00
trace trace: Try using tracefs first 2017-12-18 14:37:36 +00:00
ui ui: correctly advance output buffer when writing SASL data 2018-02-02 07:48:18 +01:00
util Block layer patches 2018-01-24 22:55:57 +00:00
.dir-locals.el Add .dir-locals.el file to configure emacs coding style 2015-10-08 19:46:01 +03:00
.editorconfig add editorconfig 2017-07-20 09:56:56 +02:00
.exrc qemu: add .exrc 2012-09-07 09:02:44 +03:00
.gdbinit .gdbinit: load QEMU sub-commands when gdb starts 2017-06-07 14:38:45 +01:00
.gitignore contrib/vhost-user-blk: introduce a vhost-user-blk sample application 2018-01-18 21:52:37 +02:00
.gitmodules pc-bios: Add hppa-firmware.img and git submodule 2018-01-31 05:30:50 -08:00
.mailmap MAINTAINERS: Update Paul Burton's email address 2017-11-06 07:36:43 -08:00
.shippable.yml shippable: add win32/64 targets 2017-07-18 10:58:36 +01:00
.travis.yml travis: move make -j flag out of script 2017-07-18 09:39:19 +01:00
CODING_STYLE coding_style: add point about 0x in trace-events 2017-08-01 12:13:07 +01:00
COPYING COPYING: update from FSF 2008-10-12 17:54:42 +00:00
COPYING.LIB Update FSF address in GPL/LGPL boilerplate 2009-01-04 22:05:52 +00:00
COPYING.PYTHON scripts: add argparse module for Python 2.6 compatibility 2017-08-30 12:02:11 +01:00
Changelog Use HTTPS for qemu.org and other domains 2017-11-21 13:34:13 +00:00
HACKING HACKING: document #include order 2017-01-03 16:38:47 +00:00
LICENSE vfio: move hw/misc/vfio.c to hw/vfio/pci.c Move vfio.h into include/hw/vfio 2014-12-19 15:24:06 -07:00
MAINTAINERS Merge tpm 2018/02/03 v1 2018-02-05 09:31:37 +00:00
Makefile pc-bios: Add hppa-firmware.img and git submodule 2018-01-31 05:30:50 -08:00
Makefile.objs hw/hppa: Implement DINO system board 2018-01-31 05:30:50 -08:00
Makefile.target Fix build of console and GUI executables for Windows 2017-11-23 10:46:42 +00:00
README Use HTTPS for qemu.org and other domains 2017-11-21 13:34:13 +00:00
VERSION Open 2.12 development tree 2017-12-13 17:05:59 +00:00
arch_init.c target/hppa: Skeleton support for hppa-softmmu 2018-01-30 10:08:18 -08:00
balloon.c trace: switch to modular code generation for sub-directories 2017-01-31 17:11:18 +00:00
block.c block: Keep nodes drained between reopen_queue/multiple 2017-12-22 15:05:32 +01:00
blockdev-nbd.c qapi: add nbd-server-remove 2018-01-26 09:37:20 -06:00
blockdev.c blockdev: Mark BD-{remove,insert}-medium stable 2018-01-23 12:34:42 +01:00
blockjob.c blockjob: Pause job on draining any job BDS 2017-12-22 15:05:32 +01:00
bootdevice.c Makefile: Move bootdevice.o to common-obj-y 2017-07-04 14:39:27 +02:00
bt-host.c all: Clean up includes 2016-02-04 17:41:30 +00:00
bt-vhci.c all: Clean up includes 2016-02-04 17:41:30 +00:00
configure target/hppa: Enable MTTCG 2018-01-31 05:30:50 -08:00
cpus-common.c *_run_on_cpu: introduce run_on_cpu_data type 2016-10-31 15:00:25 +01:00
cpus.c cpus: unify qemu_*_wait_io_event 2018-01-16 14:54:51 +01:00
device-hotplug.c blockdev: Split monitor reference from BB creation 2016-03-17 15:47:56 +01:00
device_tree.c device_tree: fix compiler warnings (clang 5) 2017-05-07 09:57:51 +03:00
disas.c disas: Dump insn bytes along with capstone disassembly 2017-11-09 08:46:38 +01:00
dma-helpers.c block: explicitly acquire aiocontext in bottom halves that need it 2017-02-21 11:39:39 +00:00
dump.c dump: fix note_name_equal() 2018-01-02 14:49:54 +01:00
exec.c hostmem-file: add "align" option 2018-01-19 11:18:51 -02:00
gdbstub.c gdbstub: add tracing 2017-12-18 14:37:36 +00:00
hmp-commands-info.hx target/m68k: add HMP command "info tlb" 2018-01-25 16:02:25 +01:00
hmp-commands.hx -----BEGIN PGP SIGNATURE----- 2018-01-29 14:29:17 +00:00
hmp.c hmp: Add nbd_server_remove to mirror QMP command 2018-01-26 09:56:12 -06:00
hmp.h hmp: Add nbd_server_remove to mirror QMP command 2018-01-26 09:56:12 -06:00
ioport.c trace: switch to modular code generation for sub-directories 2017-01-31 17:11:18 +00:00
iothread.c iothread: fix iothread_stop() race condition 2017-12-19 10:25:09 +00:00
memory.c memory/iommu: Add get_attr() 2018-02-06 11:08:24 -07:00
memory_ldst.inc.c exec: introduce memory_ldst.inc.c 2016-12-22 16:00:23 +01:00
memory_mapping.c Replace all occurances of __FUNCTION__ with __func__ 2018-01-22 09:46:18 +01:00
module-common.c all: Clean up includes 2016-02-04 17:41:30 +00:00
monitor.c readline: add a free function 2018-01-16 14:54:50 +01:00
numa.c hostmem-file: add "align" option 2018-01-19 11:18:51 -02:00
os-posix.c os-posix: Drop misleading comment 2017-10-16 21:01:37 +03:00
os-win32.c shutdown: Add source information to SHUTDOWN and RESET 2017-05-23 13:28:17 +02:00
qapi-schema.json qmp: remove qmp_cpu 2017-12-20 19:18:33 +01:00
qdev-monitor.c qdev: Check for the availability of a hotplug controller before adding a device 2018-01-19 11:18:51 -02:00
qdict-test-data.txt Introduce QDict test data file 2009-09-04 09:37:34 -05:00
qemu-bridge-helper.c all: Remove unnecessary glib.h includes 2016-06-07 18:19:24 +03:00
qemu-doc.texi ppc: Deprecate qemu-system-ppcemb 2018-01-27 17:25:27 +11:00
qemu-ga.texi qemu-ga: Remove stray 'q' in documentation 2016-10-28 18:17:23 +03:00
qemu-img-cmds.hx qemu-img: add --shrink flag for resize 2017-09-26 15:00:32 +02:00
qemu-img.c block: Add errp to bdrv_snapshot_goto() 2017-11-21 14:48:22 +01:00
qemu-img.texi qemu-img.1: Image invalidation on qemu-img commit 2017-10-26 14:59:18 +02:00
qemu-io-cmds.c block: Keep nodes drained between reopen_queue/multiple 2017-12-22 15:05:32 +01:00
qemu-io.c qemu-io: Add -C for opening with copy-on-read 2017-10-06 16:28:58 +02:00
qemu-keymap.c tools: add qemu-keymap 2017-10-16 14:50:54 +02:00
qemu-nbd.c blockdev: convert qemu-nbd server to QIONetListener 2017-12-21 09:30:32 +01:00
qemu-nbd.texi nbd: Add qemu-nbd -D for human-readable description 2016-11-02 09:28:55 +01:00
qemu-option-trace.texi docs: update manpage for stderr->log rename 2017-02-13 13:38:31 +00:00
qemu-options-wrapper.h qemu-options: Remove stray colons from output of --help 2017-12-20 09:04:27 +01:00
qemu-options.h Clean up ill-advised or unusual header guards 2016-07-12 16:20:46 +02:00
qemu-options.hx qemu-doc: Get rid of "vlan=X" example in the documentation 2018-01-29 16:05:38 +08:00
qemu-seccomp.c seccomp: add resourcecontrol argument to command line 2017-09-15 10:15:06 +02:00
qemu-tech.texi qemu-doc: merge qemu-tech and qemu-doc 2016-10-07 10:05:54 +02:00
qemu.nsi Use HTTPS for qemu.org and other domains 2017-11-21 13:34:13 +00:00
qemu.sasl Default to GSSAPI (Kerberos) instead of DIGEST-MD5 for SASL 2017-05-09 14:41:47 +01:00
qmp.c qmp: remove qmp_cpu 2017-12-20 19:18:33 +01:00
qtest.c qtest: Don't perform side effects inside assertion 2017-09-15 09:05:19 +02:00
replication.c replication: Introduce new APIs to do replication operation 2016-09-13 11:00:56 +01:00
replication.h replication: Introduce new APIs to do replication operation 2016-09-13 11:00:56 +01:00
rules.mak build-sys: silence make by default or V=0 2018-01-12 13:22:02 +01:00
thunk.c thunk: assert nb_fields is valid 2017-07-31 13:06:39 +03:00
tpm.c tpm: remove tpm_register_model() 2017-12-14 23:39:15 -05:00
trace-events find_ram_offset: Add comments and tracing 2018-01-16 14:54:52 +01:00
version.rc Use HTTPS for qemu.org and other domains 2017-11-21 13:34:13 +00:00
vl.c usb: -usbdevice cleanups, storage fix, QOMify ccid. 2018-01-26 13:29:28 +00:00

README

         QEMU README
         ===========

QEMU is a generic and open source machine & userspace emulator and
virtualizer.

QEMU is capable of emulating a complete machine in software without any
need for hardware virtualization support. By using dynamic translation,
it achieves very good performance. QEMU can also integrate with the Xen
and KVM hypervisors to provide emulated hardware while allowing the
hypervisor to manage the CPU. With hypervisor support, QEMU can achieve
near native performance for CPUs. When QEMU emulates CPUs directly it is
capable of running operating systems made for one machine (e.g. an ARMv7
board) on a different machine (e.g. an x86_64 PC board).

QEMU is also capable of providing userspace API virtualization for Linux
and BSD kernel interfaces. This allows binaries compiled against one
architecture ABI (e.g. the Linux PPC64 ABI) to be run on a host using a
different architecture ABI (e.g. the Linux x86_64 ABI). This does not
involve any hardware emulation, simply CPU and syscall emulation.

QEMU aims to fit into a variety of use cases. It can be invoked directly
by users wishing to have full control over its behaviour and settings.
It also aims to facilitate integration into higher level management
layers, by providing a stable command line interface and monitor API.
It is commonly invoked indirectly via the libvirt library when using
open source applications such as oVirt, OpenStack and virt-manager.

QEMU as a whole is released under the GNU General Public License,
version 2. For full licensing details, consult the LICENSE file.


Building
========

QEMU is multi-platform software intended to be buildable on all modern
Linux platforms, OS-X, Win32 (via the Mingw64 toolchain) and a variety
of other UNIX targets. The simple steps to build QEMU are:

  mkdir build
  cd build
  ../configure
  make

Additional information can also be found online via the QEMU website:

  https://qemu.org/Hosts/Linux
  https://qemu.org/Hosts/Mac
  https://qemu.org/Hosts/W32


Submitting patches
==================

The QEMU source code is maintained under the GIT version control system.

   git clone git://git.qemu.org/qemu.git

When submitting patches, the preferred approach is to use 'git
format-patch' and/or 'git send-email' to format & send the mail to the
qemu-devel@nongnu.org mailing list. All patches submitted must contain
a 'Signed-off-by' line from the author. Patches should follow the
guidelines set out in the HACKING and CODING_STYLE files.

Additional information on submitting patches can be found online via
the QEMU website:

  https://qemu.org/Contribute/SubmitAPatch
  https://qemu.org/Contribute/TrivialPatches


Bug reporting
=============

The QEMU project uses Launchpad as its primary upstream bug tracker. Bugs
found when running code built from QEMU git or upstream released sources
should be reported via:

  https://bugs.launchpad.net/qemu/

If using QEMU via an operating system vendor pre-built binary package, it
is preferable to report bugs to the vendor's own bug tracker first. If
the bug is also known to affect the latest upstream code, it can also be
reported via Launchpad.

For additional information on bug reporting consult:

  https://qemu.org/Contribute/ReportABug


Contact
=======

The QEMU community can be contacted in a number of ways, with the two
main methods being email and IRC:

 - qemu-devel@nongnu.org
   https://lists.nongnu.org/mailman/listinfo/qemu-devel
 - #qemu on irc.oftc.net

Information on additional methods of contacting the community can be
found online via the QEMU website:

  https://qemu.org/Contribute/StartHere

-- End