mirror of https://github.com/xemu-project/xemu.git
![]() Recently proposed vfio-pci kernel changes (v4.16) remove the restriction preventing userspace from mmap'ing PCI BARs in areas overlapping the MSI-X vector table. This change is primarily intended to benefit host platforms which make use of system page sizes larger than the PCI spec recommendation for alignment of MSI-X data structures (ie. not x86_64). In the case of POWER systems, the SPAPR spec requires the VM to program MSI-X using hypercalls, rendering the MSI-X vector table unused in the VM view of the device. However, ARM64 platforms also support 64KB pages and rely on QEMU emulation of MSI-X. Regardless of the kernel driver allowing mmaps overlapping the MSI-X vector table, emulation of the MSI-X vector table also prevents direct mapping of device MMIO spaces overlapping this page. Thanks to the fact that PCI devices have a standard self discovery mechanism, we can try to resolve this by relocating the MSI-X data structures, either by creating a new PCI BAR or extending an existing BAR and updating the MSI-X capability for the new location. There's even a very slim chance that this could benefit devices which do not adhere to the PCI spec alignment guidelines on x86_64 systems. This new x-msix-relocation option accepts the following choices: off: Disable MSI-X relocation, use native device config (default) auto: Use a known good combination for the platform/device (none yet) bar0..bar5: Specify the target BAR for MSI-X data structures If compatible, the target BAR will either be created or extended and the new portion will be used for MSI-X emulation. The first obvious user question with this option is how to determine whether a given platform and device might benefit from this option. In most cases, the answer is that it won't, especially on x86_64. Devices often dedicate an entire BAR to MSI-X and therefore no performance sensitive registers overlap the MSI-X area. Take for example: # lspci -vvvs 0a:00.0 0a:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network Connection ... Region 0: Memory at db680000 (32-bit, non-prefetchable) [size=512K] Region 3: Memory at db7f8000 (32-bit, non-prefetchable) [size=16K] ... Capabilities: [70] MSI-X: Enable+ Count=10 Masked- Vector table: BAR=3 offset=00000000 PBA: BAR=3 offset=00002000 This device uses the 16K bar3 for MSI-X with the vector table at offset zero and the pending bits arrary at offset 8K, fully honoring the PCI spec alignment guidance. The data sheet specifically refers to this as an MSI-X BAR. This device would not see a benefit from MSI-X relocation regardless of the platform, regardless of the page size. However, here's another example: # lspci -vvvs 02:00.0 02:00.0 Serial Attached SCSI controller: xxxxxxxx ... Region 0: I/O ports at c000 [size=256] Region 1: Memory at ef640000 (64-bit, non-prefetchable) [size=64K] Region 3: Memory at ef600000 (64-bit, non-prefetchable) [size=256K] ... Capabilities: [c0] MSI-X: Enable+ Count=16 Masked- Vector table: BAR=1 offset=0000e000 PBA: BAR=1 offset=0000f000 Here the MSI-X data structures are placed on separate 4K pages at the end of a 64KB BAR. If our host page size is 4K, we're likely fine, but at 64KB page size, MSI-X emulation at that location prevents the entire BAR from being directly mapped into the VM address space. Overlapping performance sensitive registers then starts to be a very likely scenario on such a platform. At this point, the user could enable tracing on vfio_region_read and vfio_region_write to determine more conclusively if device accesses are being trapped through QEMU. Upon finding a device and platform in need of MSI-X relocation, the next problem is how to choose target PCI BAR to host the MSI-X data structures. A few key rules to keep in mind for this selection include: * There are only 6 BAR slots, bar0..bar5 * 64-bit BARs occupy two BAR slots, 'lspci -vvv' lists the first slot * PCI BARs are always a power of 2 in size, extending == doubling * The maximum size of a 32-bit BAR is 2GB * MSI-X data structures must reside in an MMIO BAR Using these rules, we can evaluate each BAR of the second example device above as follows: bar0: I/O port BAR, incompatible with MSI-X tables bar1: BAR could be extended, incurring another 64KB of MMIO bar2: Unavailable, bar1 is 64-bit, this register is used by bar1 bar3: BAR could be extended, incurring another 256KB of MMIO bar4: Unavailable, bar3 is 64bit, this register is used by bar3 bar5: Available, empty BAR, minimum additional MMIO A secondary optimization we might wish to make in relocating MSI-X is to minimize the additional MMIO required for the device, therefore we might test the available choices in order of preference as bar5, bar1, and finally bar3. The original proposal for this feature included an 'auto' option which would choose bar5 in this case, but various drivers have been found that make assumptions about the properties of the "first" BAR or the size of BARs such that there appears to be no foolproof automatic selection available, requiring known good combinations to be sourced from users. This patch is pre-enabled for an 'auto' selection making use of a validated lookup table, but no entries are yet identified. Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Eric Auger <eric.auger@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> |
||
---|---|---|
accel | ||
audio | ||
backends | ||
block | ||
bsd-user | ||
capstone@22ead3e0bf | ||
chardev | ||
contrib | ||
crypto | ||
default-configs | ||
disas | ||
docs | ||
dtc@e54388015a | ||
fpu | ||
fsdev | ||
gdb-xml | ||
hw | ||
include | ||
io | ||
libdecnumber | ||
linux-headers | ||
linux-user | ||
migration | ||
nbd | ||
net | ||
pc-bios | ||
po | ||
qapi | ||
qga | ||
qobject | ||
qom | ||
replay | ||
roms | ||
scripts | ||
scsi | ||
slirp | ||
stubs | ||
target | ||
tcg | ||
tests | ||
trace | ||
ui | ||
util | ||
.dir-locals.el | ||
.editorconfig | ||
.exrc | ||
.gdbinit | ||
.gitignore | ||
.gitmodules | ||
.mailmap | ||
.shippable.yml | ||
.travis.yml | ||
CODING_STYLE | ||
COPYING | ||
COPYING.LIB | ||
COPYING.PYTHON | ||
Changelog | ||
HACKING | ||
LICENSE | ||
MAINTAINERS | ||
Makefile | ||
Makefile.objs | ||
Makefile.target | ||
README | ||
VERSION | ||
arch_init.c | ||
balloon.c | ||
block.c | ||
blockdev-nbd.c | ||
blockdev.c | ||
blockjob.c | ||
bootdevice.c | ||
bt-host.c | ||
bt-vhci.c | ||
configure | ||
cpus-common.c | ||
cpus.c | ||
device-hotplug.c | ||
device_tree.c | ||
disas.c | ||
dma-helpers.c | ||
dump.c | ||
exec.c | ||
gdbstub.c | ||
hmp-commands-info.hx | ||
hmp-commands.hx | ||
hmp.c | ||
hmp.h | ||
ioport.c | ||
iothread.c | ||
memory.c | ||
memory_ldst.inc.c | ||
memory_mapping.c | ||
module-common.c | ||
monitor.c | ||
numa.c | ||
os-posix.c | ||
os-win32.c | ||
qapi-schema.json | ||
qdev-monitor.c | ||
qdict-test-data.txt | ||
qemu-bridge-helper.c | ||
qemu-doc.texi | ||
qemu-ga.texi | ||
qemu-img-cmds.hx | ||
qemu-img.c | ||
qemu-img.texi | ||
qemu-io-cmds.c | ||
qemu-io.c | ||
qemu-keymap.c | ||
qemu-nbd.c | ||
qemu-nbd.texi | ||
qemu-option-trace.texi | ||
qemu-options-wrapper.h | ||
qemu-options.h | ||
qemu-options.hx | ||
qemu-seccomp.c | ||
qemu-tech.texi | ||
qemu.nsi | ||
qemu.sasl | ||
qmp.c | ||
qtest.c | ||
replication.c | ||
replication.h | ||
rules.mak | ||
thunk.c | ||
tpm.c | ||
trace-events | ||
version.rc | ||
vl.c |
README
QEMU README =========== QEMU is a generic and open source machine & userspace emulator and virtualizer. QEMU is capable of emulating a complete machine in software without any need for hardware virtualization support. By using dynamic translation, it achieves very good performance. QEMU can also integrate with the Xen and KVM hypervisors to provide emulated hardware while allowing the hypervisor to manage the CPU. With hypervisor support, QEMU can achieve near native performance for CPUs. When QEMU emulates CPUs directly it is capable of running operating systems made for one machine (e.g. an ARMv7 board) on a different machine (e.g. an x86_64 PC board). QEMU is also capable of providing userspace API virtualization for Linux and BSD kernel interfaces. This allows binaries compiled against one architecture ABI (e.g. the Linux PPC64 ABI) to be run on a host using a different architecture ABI (e.g. the Linux x86_64 ABI). This does not involve any hardware emulation, simply CPU and syscall emulation. QEMU aims to fit into a variety of use cases. It can be invoked directly by users wishing to have full control over its behaviour and settings. It also aims to facilitate integration into higher level management layers, by providing a stable command line interface and monitor API. It is commonly invoked indirectly via the libvirt library when using open source applications such as oVirt, OpenStack and virt-manager. QEMU as a whole is released under the GNU General Public License, version 2. For full licensing details, consult the LICENSE file. Building ======== QEMU is multi-platform software intended to be buildable on all modern Linux platforms, OS-X, Win32 (via the Mingw64 toolchain) and a variety of other UNIX targets. The simple steps to build QEMU are: mkdir build cd build ../configure make Additional information can also be found online via the QEMU website: https://qemu.org/Hosts/Linux https://qemu.org/Hosts/Mac https://qemu.org/Hosts/W32 Submitting patches ================== The QEMU source code is maintained under the GIT version control system. git clone git://git.qemu.org/qemu.git When submitting patches, the preferred approach is to use 'git format-patch' and/or 'git send-email' to format & send the mail to the qemu-devel@nongnu.org mailing list. All patches submitted must contain a 'Signed-off-by' line from the author. Patches should follow the guidelines set out in the HACKING and CODING_STYLE files. Additional information on submitting patches can be found online via the QEMU website https://qemu.org/Contribute/SubmitAPatch https://qemu.org/Contribute/TrivialPatches Bug reporting ============= The QEMU project uses Launchpad as its primary upstream bug tracker. Bugs found when running code built from QEMU git or upstream released sources should be reported via: https://bugs.launchpad.net/qemu/ If using QEMU via an operating system vendor pre-built binary package, it is preferable to report bugs to the vendor's own bug tracker first. If the bug is also known to affect latest upstream code, it can also be reported via launchpad. For additional information on bug reporting consult: https://qemu.org/Contribute/ReportABug Contact ======= The QEMU community can be contacted in a number of ways, with the two main methods being email and IRC - qemu-devel@nongnu.org https://lists.nongnu.org/mailman/listinfo/qemu-devel - #qemu on irc.oftc.net Information on additional methods of contacting the community can be found online via the QEMU website: https://qemu.org/Contribute/StartHere -- End