xemu/hw
Peter Xu 2cc9ddcceb intel_iommu: better handling of dmar state switch
QEMU does not handle the global DMAR switch well, especially when it
goes from "on" to "off".

Let's first take the example of system reset.

Assume that a guest has the IOMMU enabled.  When it reboots, we drop
all the existing DMAR mappings to handle the system reset, but we
still keep the existing memory layout, which has the IOMMU memory region
enabled.  So after the reboot and before the kernel is loaded again,
there is no mapping at all for the host device.  That's problematic
since any software (for example, SeaBIOS) that runs before the kernel
after the reboot will assume the IOMMU is disabled, so any DMA from that
software will fail.

For example, a guest that boots from an assigned NVMe device might fail
to find the boot device after a system reboot/reset, and we'll be able
to observe SeaBIOS errors if we capture the debugging log:

  WARNING - Timeout at nvme_wait:144!

Meanwhile, we should see DMAR errors on the host for that NVMe device.
It is the DMA fault that causes the NVMe driver timeout.

The correct fix is to properly switch the device DMA address spaces
when the system resets, which sets up the correct memory regions and
notifies the device backends.  This might not matter much for
non-assigned devices, since QEMU VT-d emulation assumes a default
passthrough mapping if DMAR is not enabled in the GCMD register (please
refer to vtd_iommu_translate).  However, it is required for assigned
devices, since it rebuilds the correct GPA to HPA mapping that is
needed for any DMA operation during guest bootstrap.
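
The reset-time switch described above can be illustrated with a small
toy model.  This is only a sketch under stated assumptions, not the
code from this patch: every type and function name below (toy_iommu,
toy_device, toy_switch_all, toy_reset) is hypothetical, and a single
enum stands in for the choice between the IOMMU memory region and the
default passthrough layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model; none of these types are QEMU's. */
enum toy_region { TOY_REGION_PASSTHROUGH, TOY_REGION_IOMMU };

struct toy_device {
    enum toy_region active;   /* region currently backing the device's DMA */
};

struct toy_iommu {
    bool dmar_enabled;        /* models the global translation-enable state */
    struct toy_device *devs;
    size_t ndevs;
};

/* Point every device at the region that matches the current DMAR state. */
static void toy_switch_all(struct toy_iommu *s)
{
    assert(s != NULL);
    for (size_t i = 0; i < s->ndevs; i++) {
        s->devs[i].active = s->dmar_enabled ? TOY_REGION_IOMMU
                                            : TOY_REGION_PASSTHROUGH;
    }
}

/* System reset: translation goes off, and the layout must follow it. */
static void toy_reset(struct toy_iommu *s)
{
    assert(s != NULL);
    s->dmar_enabled = false;
    toy_switch_all(s);        /* without this call the stale layout remains */
}
```

In this model, leaving out the toy_switch_all() call in toy_reset()
reproduces the stale state: translation is off, yet the devices still
point at the IOMMU region, which no longer has any mappings.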

Besides system reset, there are other places that might change the
global DMAR status, and we'd better do the same thing there: for
example, when we change the state of the GCMD register or the DMAR root
pointer.  Do the same refresh in all these places.  For these two
places we also need to explicitly invalidate the context entry cache
and the IOTLB cache.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1625173
CC: QEMU Stable <qemu-stable@nongnu.org>
Reported-by: Cong Li <coli@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
--
v2:
- do the same for GCMD write, or root pointer update [Alex]
- test is carried out by me this time, by observing the
  vtd_switch_address_space tracepoint after system reboot
v3:
- rewrite commit message as suggested by Alex
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-11-05 13:24:02 -05:00
9pfs fsdev: Clean up error reporting in qemu_fsdev_add() 2018-10-19 14:51:34 +02:00
acpi pci, pc, virtio: fixes, features 2018-09-24 18:49:11 +01:00
adc Include qapi/error.h exactly where needed 2018-02-09 13:50:17 +01:00
alpha hw/alpha/typhoon: Remove unuseful code 2018-10-24 06:44:59 -03:00
arm hw/arm: versal: Add a virtual Xilinx Versal board 2018-11-02 14:11:31 +00:00
audio audio: use TYPE_MV88W8618_AUDIO instead of hardcoded string 2018-10-29 13:50:15 +01:00
block virtio-blk: fix comment for virtio_blk_rw_complete 2018-11-05 13:24:02 -05:00
bt hw/bt: Replace fprintf(stderr, "*\n" with error_report() 2018-01-22 09:51:00 +01:00
char hw/char: Implement nRF51 SoC UART 2018-11-02 14:03:33 +00:00
core Machine queue, 2018-10-25 2018-10-25 20:17:12 +01:00
cpu hw/cpu/a15mpcore: If CPU has EL2, enable it on the GIC and wire it up 2018-08-24 13:17:34 +01:00
cris hw/cris: Use the IEC binary prefix definitions 2018-07-02 15:41:15 +02:00
display vga: two fixes. 2018-10-29 12:59:15 +00:00
dma hw/dma/pl080: Remove hw_error() if DMA is enabled 2018-08-20 11:24:33 +01:00
gpio hw/i2c: Use DeviceClass::realize instead of I2CSlaveClass::init 2018-06-01 15:14:31 +02:00
hppa hw/hppa/dino: Remove unuseful code 2018-10-24 06:44:59 -03:00
hyperv hyperv_testdev: add SynIC message and event testmodes 2018-10-19 13:44:14 +02:00
i2c i2c: switch ddc to use the new edid generator 2018-10-15 09:57:33 +02:00
i386 intel_iommu: better handling of dmar state switch 2018-11-05 13:24:02 -05:00
ide replay: replay BH for IDE trim operation 2018-10-02 19:09:13 +02:00
input ps2: prevent changing irq state on save and load 2018-10-02 18:47:55 +02:00
intc target/arm: Move some system registers into a substructure 2018-10-24 07:50:16 +01:00
ipack hw/ipack: Use the IEC binary prefix definitions 2018-07-02 15:41:12 +02:00
ipmi ipmi: Use proper struct reference for BT vmstate 2018-08-23 18:46:25 +02:00
isa configs: Add a CONFIG_SMC37C669 switch for the "smc37c669-superio" device 2018-10-24 07:33:44 +01:00
lm32 hw/lm32: Use the IEC binary prefix definitions 2018-07-02 15:41:15 +02:00
m68k hw/m68k: Use the IEC binary prefix definitions 2018-07-02 15:41:14 +02:00
mem memory-device: trace when pre_plugging/plugging/unplugging 2018-10-24 06:44:59 -03:00
microblaze hw/microblaze/xlnx-zynqmp-pmu: Fix introspection problem in 'xlnx, zynqmp-pmu-soc' 2018-07-23 15:21:25 +01:00
mips hw/mips/malta: Remove unuseful code 2018-10-24 06:44:59 -03:00
misc Error reporting patches for 2018-10-22 2018-10-23 17:20:23 +01:00
moxie change get_image_size return type to int64_t 2018-10-02 19:08:49 +02:00
net QEMU trivial patches collected between June and October 2018 2018-10-30 15:49:55 +00:00
nios2 hw/nios2: Use the IEC binary prefix definitions 2018-07-02 15:41:15 +02:00
nvram ppc: move at24c to its own CONFIG_ symbol 2018-10-30 09:12:09 +01:00
openrisc Change references to serial_hds[] to serial_hd() 2018-04-26 13:57:00 +01:00
pci qmp, hmp: make subsystem/system-vendor identities optional 2018-10-11 19:58:26 +01:00
pci-bridge hw/pci: add PCI resource reserve capability to legacy PCI bridge 2018-09-07 17:05:18 -04:00
pci-host QEMU trivial patches collected between June and October 2018 2018-10-30 15:49:55 +00:00
pcmcia hw: Clean up includes 2016-01-29 15:07:25 +00:00
ppc memory-device: add and use memory_device_get_region_size() 2018-10-24 06:44:59 -03:00
rdma config: split PVRDMA from RDMA 2018-08-18 18:01:34 +03:00
riscv RISC-V: Don't add NULL bootargs to device-tree 2018-10-17 13:02:30 -07:00
s390x hw/s390x: Include the tod-qemu also for builds with --disable-tcg 2018-10-12 11:32:19 +02:00
scsi scsi-disk: fix rerror/werror=ignore 2018-10-19 13:44:13 +02:00
sd ssi-sd: Make devices picking up backends unavailable with -device 2018-10-24 07:50:16 +01:00
sh4 hw/sh4/sh_pci: Use DeviceState::realize rather than SysBusDevice::init 2018-10-24 06:44:59 -03:00
smbios smbios: Clean up error handling in smbios_add() 2018-10-19 14:51:34 +02:00
sparc sun4m: don't use legacy fw_cfg_init_mem() function 2018-08-20 19:18:31 +01:00
sparc64 hw/sparc64/niagara: Model the I/O Bridge with the 'unimplemented_device' 2018-10-24 06:44:59 -03:00
ssi hw/ssi/xilinx_spi: Use DeviceState::realize rather than SysBusDevice::init 2018-10-24 06:44:59 -03:00
timer hw/timer/sun4v-rtc: Use DeviceState::realize rather than SysBusDevice::init 2018-10-24 06:44:59 -03:00
tpm tpm: Zero-init structure to avoid uninitialized variables in valgrind log 2018-10-30 17:34:22 -04:00
tricore hw/tricore: Use the IEC binary prefix definitions 2018-07-02 15:41:14 +02:00
unicore32 hw/input/i8042: Extract declarations from i386/pc.h into input/i8042.h 2018-03-12 16:12:48 +01:00
usb usb: fixes for ohci and smart card emulation. 2018-10-30 13:32:38 +00:00
vfio vfio: Clean up error reporting after previous commit 2018-10-19 14:51:34 +02:00
virtio Error reporting patches for 2018-10-22 2018-10-23 17:20:23 +01:00
watchdog qapi: Drop qapi_event_send_FOO()'s Error ** argument 2018-08-28 18:21:38 +02:00
xen xen: Use the PCI_DEVICE macro 2018-10-26 17:17:32 +02:00
xenpv hw/xen: Use the IEC binary prefix definitions 2018-07-02 15:41:13 +02:00
xtensa hw/xtensa: Use the IEC binary prefix definitions 2018-07-02 15:41:14 +02:00
Makefile.objs memory-device: introduce separate config option 2018-10-24 06:44:59 -03:00