ShuriZma/xemu - xemu

Commit Graph

Author	SHA1	Message	Date
Jagannathan Raman	7ee3f82384	multi-process: PCI BAR read/write handling for proxy & remote endpoints Proxy device object implements handler for PCI BAR writes and reads. The handler uses BAR_WRITE/BAR_READ message to communicate to the remote process with the BAR address and value to be written/read. The remote process implements handler for BAR_WRITE/BAR_READ message. Signed-off-by: Jagannathan Raman <jag.raman@oracle.com> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com> Signed-off-by: John G Johnson <john.g.johnson@oracle.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: a8b76714a9688be5552c4c92d089bc9e8a4707ff.1611938319.git.jag.raman@oracle.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2021-02-10 09:23:28 +00:00
Elena Ufimtseva	11ab872588	multi-process: Forward PCI config space acceses to the remote process The Proxy Object sends the PCI config space accesses as messages to the remote process over the communication channel Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com> Signed-off-by: John G Johnson <john.g.johnson@oracle.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: d3c94f4618813234655356c60e6f0d0362ff42d6.1611938319.git.jag.raman@oracle.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2021-02-10 09:23:28 +00:00
Elena Ufimtseva	e7b2c9eaa2	multi-process: add proxy communication functions Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com> Signed-off-by: John G Johnson <john.g.johnson@oracle.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: d54edb4176361eed86b903e8f27058363b6c83b3.1611938319.git.jag.raman@oracle.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2021-02-10 09:23:28 +00:00
Elena Ufimtseva	9f8112073a	multi-process: introduce proxy object Defines a PCI Device proxy object as a child of TYPE_PCI_DEVICE. Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com> Signed-off-by: John G Johnson <john.g.johnson@oracle.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: b5186ebfedf8e557044d09a768846c59230ad3a7.1611938319.git.jag.raman@oracle.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2021-02-10 09:23:28 +00:00
Jagannathan Raman	ed5d001916	multi-process: setup memory manager for remote device SyncSysMemMsg message format is defined. It is used to send file descriptors of the RAM regions to remote device. RAM on the remote device is configured with a set of file descriptors. Old RAM regions are deleted and new regions, each with an fd, is added to the RAM. Signed-off-by: Jagannathan Raman <jag.raman@oracle.com> Signed-off-by: John G Johnson <john.g.johnson@oracle.com> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 7d2d1831d812e85f681e7a8ab99e032cf4704689.1611938319.git.jag.raman@oracle.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2021-02-10 09:23:28 +00:00
Jagannathan Raman	c7d80c7c1d	multi-process: Associate fd of a PCIDevice with its object Associate the file descriptor for a PCIDevice in remote process with DeviceState object. Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com> Signed-off-by: John G Johnson <john.g.johnson@oracle.com> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: f405a2ed5d7518b87bea7c59cfdf334d67e5ee51.1611938319.git.jag.raman@oracle.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2021-02-10 09:23:28 +00:00
Jagannathan Raman	48b06f50d8	multi-process: Initialize message handler in remote device Initializes the message handler function in the remote process. It is called whenever there's an event pending on QIOChannel that registers this function. Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com> Signed-off-by: John G Johnson <john.g.johnson@oracle.com> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 99d38d8b93753a6409ac2340e858858cda59ab1b.1611938319.git.jag.raman@oracle.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2021-02-10 09:23:28 +00:00
Elena Ufimtseva	ad22c3088b	multi-process: define MPQemuMsg format and transmission functions Defines MPQemuMsg, which is the message that is sent to the remote process. This message is sent over QIOChannel and is used to command the remote process to perform various tasks. Define transmission functions used by proxy and by remote. Signed-off-by: Jagannathan Raman <jag.raman@oracle.com> Signed-off-by: John G Johnson <john.g.johnson@oracle.com> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 56ca8bcf95195b2b195b08f6b9565b6d7410bce5.1611938319.git.jag.raman@oracle.com [Replace struct iovec send[2] = {0} with {} to make clang happy as suggested by Peter Maydell <peter.maydell@linaro.org>. --Stefan] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2021-02-10 09:23:28 +00:00
Jagannathan Raman	3f0e7e57a3	multi-process: setup a machine object for remote device process x-remote-machine object sets up various subsystems of the remote device process. Instantiate PCI host bridge object and initialize RAM, IO & PCI memory regions. Signed-off-by: John G Johnson <john.g.johnson@oracle.com> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: c537f38d17f90453ca610c6b70cf3480274e0ba1.1611938319.git.jag.raman@oracle.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2021-02-10 09:23:28 +00:00
Jagannathan Raman	6fbd84d632	multi-process: setup PCI host bridge for remote device PCI host bridge is setup for the remote device process. It is implemented using remote-pcihost object. It is an extension of the PCI host bridge setup by QEMU. Remote-pcihost configures a PCI bus which could be used by the remote PCI device to latch on to. Signed-off-by: Jagannathan Raman <jag.raman@oracle.com> Signed-off-by: John G Johnson <john.g.johnson@oracle.com> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 0871ba857abb2eafacde07e7fe66a3f12415bfb2.1611938319.git.jag.raman@oracle.com [Added PCI_EXPRESS condition in hw/remote/Kconfig since remote-pcihost needs PCIe. This solves "make check" failure on s390x. Fix suggested by Philippe Mathieu-Daudé <philmd@redhat.com> and Thomas Huth <thuth@redhat.com>. --Stefan] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2021-02-10 09:23:22 +00:00
Bin Meng	ce8e43760e	hw/net: fsl_etsec: Reverse the RCTRL.RSF logic Per MPC8548ERM [1] chapter 14.5.3.4.1: When RCTRL.RSF is 1, frames less than 64 bytes are accepted upon a DA match. But currently QEMU does the opposite. This commit reverses the RCTRL.RSF testing logic to match the manual. Due to the reverse of the logic, certain guests may potentially break if they don't program eTSEC to have RCTRL.RSF bit set. When RCTRL.RSF is 0, short frames are silently dropped, however as of today both slirp and tap networking do not pad short frames (e.g.: an ARP packet) to the minimum frame size of 60 bytes. So ARP requests will be dropped, preventing the guest from becoming visible on the network. The same issue was reported on e1000 and vmxenet3 before, see: commit `78aeb23ede` ("e1000: Pad short frames to minimum size (60 bytes)") commit `40a87c6c9b` ("vmxnet3: Pad short frames to minimum size (60 bytes)") [1] https://www.nxp.com/docs/en/reference-manual/MPC8548ERM.pdf Fixes: `eb1e7c3e51` ("Add Enhanced Three-Speed Ethernet Controller (eTSEC)") Signed-off-by: Bin Meng <bin.meng@windriver.com> Message-Id: <1612923021-19746-1-git-send-email-bmeng.cn@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-02-10 14:50:11 +11:00
Bin Meng	11dbcc70c6	hw/ppc: e500: Fill in correct <clock-frequency> for the serial nodes At present the <clock-frequency> property of the serial node is populated with value zero. U-Boot's ns16550 driver is not happy about this, so let's fill in a meaningful value. Signed-off-by: Bin Meng <bin.meng@windriver.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-Id: <1612362288-22216-2-git-send-email-bmeng.cn@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-02-10 14:50:11 +11:00
Bin Meng	0c36ab7114	hw/ppc: e500: Use a macro for the platform clock frequency At present the platform clock frequency is using a magic number. Convert it to a macro and use it everywhere. Signed-off-by: Bin Meng <bin.meng@windriver.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-Id: <1612362288-22216-1-git-send-email-bmeng.cn@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-02-10 14:50:11 +11:00
Cédric Le Goater	dd7ef911b3	ppc/pnv: Set default RAM size to 1 GB The memory layout of the PowerNV machine is defined as : #define KERNEL_LOAD_BASE ((void )0x20000000) #define KERNEL_LOAD_SIZE 0x08000000 #define INITRAMFS_LOAD_BASE KERNEL_LOAD_BASE + KERNEL_LOAD_SIZE #define INITRAMFS_LOAD_SIZE 0x08000000 #define SKIBOOT_BASE 0x30000000 #define SKIBOOT_SIZE 0x01c10000 #define CPU_STACKS_BASE (SKIBOOT_BASE + SKIBOOT_SIZE) #define STACK_SHIFT 15 #define STACK_SIZE (1 << STACK_SHIFT) The overall size of the CPU stacks is (max PIR + 1) 32K and the machine easily reaches 800MB of minimum required RAM. Any value below will result in a skiboot crash : [ 0.034949905,3] MEM: Partial overlap detected between regions: [ 0.034959039,3] MEM: ibm,firmware-stacks [0x31c10000-0x3a450000] (new) [ 0.034968576,3] MEM: ibm,firmware-allocs-memory@0 [0x31c10000-0x38400000] [ 0.034980367,3] Out of memory adding skiboot reserved areas [ 0.035074945,3] *********************************************** [ 0.035093627,3] < assert failed at core/mem_region.c:1129 > [ 0.035104247,3] . [ 0.035108025,3] . [ 0.035111651,3] . [ 0.035115231,3] OO__) [ 0.035119198,3] <"__/ [ 0.035122980,3] ^ ^ Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20210129111719.790692-1-clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-02-10 14:50:11 +11:00
Daniel Henrique Barboza	b01fec3659	spapr_numa.c: fix ibm,max-associativity-domains calculation The current logic for calculating 'maxdomain' making it a sum of numa_state->num_nodes with spapr->gpu_numa_id. spapr->gpu_numa_id is used as a index to determine the next available NUMA id that a given NVGPU can use. The problem is that the initial value of gpu_numa_id, for any topology that has more than one NUMA node, is equal to numa_state->num_nodes. This means that our maxdomain will always be, at least, twice the amount of existing NUMA nodes. This means that a guest with 4 NUMA nodes will end up with the following max-associativity-domains: rtas/ibm,max-associativity-domains 00000004 00000008 00000008 00000008 00000008 This overtuning of maxdomains doesn't go unnoticed in the guest, being detected in SLUB during boot: dmesg \| grep SLUB [ 0.000000] SLUB: HWalign=128, Order=0-3, MinObjects=0, CPUs=4, Nodes=8 SLUB is detecting 8 total nodes, with 4 nodes being online. This patch fixes ibm,max-associativity-domains by considering the amount of NVGPUs NUMA nodes presented in the guest, instead of just spapr->gpu_numa_id. Reported-by: Cédric Le Goater <clg@kaod.org> Tested-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20210128174213.1349181-4-danielhb413@gmail.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-02-10 10:43:50 +11:00
Daniel Henrique Barboza	6640706972	spapr_numa.c: create spapr_numa_initial_nvgpu_numa_id() helper We'll need to check the initial value given to spapr->gpu_numa_id when building the rtas DT, so put it in a helper for easier access and to avoid repetition. Tested-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20210128174213.1349181-3-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-02-10 10:43:50 +11:00
Daniel Henrique Barboza	3b880445e6	spapr: move spapr_machine_using_legacy_numa() to spapr_numa.c This function is used only in spapr_numa.c. Tested-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20210128174213.1349181-2-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-02-10 10:43:50 +11:00
Cédric Le Goater	032c226bc6	ppc/pnv: Introduce a LPC FW memory region attribute to map the PNOR This to map the PNOR from the machine init handler directly and finish the cleanup of the LPC model. Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20210126171059.307867-8-clg@kaod.org> Reviewed-by: Joel Stanley <joel@jms.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-02-10 10:43:50 +11:00
Cédric Le Goater	8304ab7905	ppc/pnv: Remove default disablement of the PNOR contents On PowerNV systems, the BMC is in charge of mapping the PNOR contents on the LPC FW address space using the HIOMAP protocol. Under QEMU, we emulate this behavior and we also add an extra control on the flash accesses by letting the HIOMAP command handler decide whether the memory region is accessible or not depending on the firmware requests. However, this behavior is not compatible with hostboot like firmwares which need this mapping to be always available. For this reason, the PNOR memory region is initially disabled for skiboot mode only. This is badly placed under the LPC model and requires the use of the machine. Since it doesn't add much, simply remove the initial setting. The extra control in the HIOMAP command handler will still be performed. Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20210126171059.307867-7-clg@kaod.org> Reviewed-by: Joel Stanley <joel@jms.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-02-10 10:43:50 +11:00
Cédric Le Goater	50ae2452b5	ppc/pnv: Discard internal BMC initialization when BMC is external The PowerNV machine can be run with an external IPMI BMC device connected to a remote QEMU machine acting as BMC, using these options : -chardev socket,id=ipmi0,host=localhost,port=9002,reconnect=10 \ -device ipmi-bmc-extern,id=bmc0,chardev=ipmi0 \ -device isa-ipmi-bt,bmc=bmc0,irq=10 \ -nodefaults In that case, some aspects of the BMC initialization should be skipped, since they rely on the simulator interface. Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20210126171059.307867-6-clg@kaod.org> Reviewed-by: Joel Stanley <joel@jms.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-02-10 10:43:50 +11:00
Cédric Le Goater	60ef80101e	ppc/pnv: Simplify pnv_bmc_create() and reuse pnv_bmc_set_pnor() to share the setting of the PNOR. Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20210126171059.307867-5-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-02-10 10:43:50 +11:00
Cédric Le Goater	05ce9b73b8	ppc/pnv: Use skiboot addresses to load kernel and ramfs The current settings are useful to load large kernels (with debug) but it moves the initrd image in a memory region not protected by skiboot. If skiboot is compiled with DEBUG=1, memory poisoning will corrupt the initrd. Cc: Murilo Opsfelder Araujo <muriloo@linux.ibm.com> Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20210126171059.307867-4-clg@kaod.org> Reviewed-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com> Reviewed-by: Joel Stanley <joel@jms.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-02-10 10:43:50 +11:00
Cédric Le Goater	cb9428642e	ppc/xive: Add firmware bit when dumping the ENDs ENDs allocated by OPAL for the HW thread VPs are tagged as owned by FW. Dump the state in 'info pic'. Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20210126171059.307867-3-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-02-10 10:43:50 +11:00
Cédric Le Goater	2cfc9f1a96	ppc/pnv: Add trace events for PCI event notification On POWER9 systems, PHB controllers signal the XIVE interrupt controller of a source interrupt notification using a store on a MMIO region. Add traces for such events. Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20210126171059.307867-2-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-02-10 10:43:50 +11:00
Greg Kurz	040bdafce1	spapr: Adjust firmware path of PCI devices It is currently not possible to perform a strict boot from USB storage: $ qemu-system-ppc64 -accel kvm -nodefaults -nographic -serial stdio \ -boot strict=on \ -device qemu-xhci \ -device usb-storage,drive=disk,bootindex=0 \ -blockdev driver=file,node-name=disk,filename=fedora-ppc64le.qcow2 SLOF ********************************************************************** QEMU Starting Build Date = Jul 17 2020 11:15:24 FW Version = git-e18ddad8516ff2cf Press "s" to enter Open Firmware. Populating /vdevice methods Populating /vdevice/vty@71000000 Populating /vdevice/nvram@71000001 Populating /pci@800000020000000 00 0000 (D) : 1b36 000d serial bus [ usb-xhci ] No NVRAM common partition, re-initializing... Scanning USB XHCI: Initializing USB Storage SCSI: Looking for devices 101000000000000 DISK : "QEMU QEMU HARDDISK 2.5+" Using default console: /vdevice/vty@71000000 Welcome to Open Firmware Copyright (c) 2004, 2017 IBM Corporation All rights reserved. This program and the accompanying materials are made available under the terms of the BSD License available at http://www.opensource.org/licenses/bsd-license.php Trying to load: from: /pci@800000020000000/usb@0/storage@1/disk@101000000000000 ... E3405: No such device E3407: Load failed Type 'boot' and press return to continue booting the system. Type 'reset-all' and press return to reboot the system. Ready! 0 > The device tree handed over by QEMU to SLOF indeed contains: qemu,boot-list = "/pci@800000020000000/usb@0/storage@1/disk@101000000000000 HALT"; but the device node is named usb-xhci@0, not usb@0. This happens because the firmware names of PCI devices returned by get_boot_devices_list() come from pcibus_get_fw_dev_path(), while the sPAPR PHB code uses a different naming scheme for device nodes. This inconsistency has always been there but it was hidden for a long time because SLOF used to rename USB device nodes, until this commit, merged in QEMU 4.2.0 : commit `85164ad4ed` Author: Alexey Kardashevskiy <aik@ozlabs.ru> Date: Wed Sep 11 16:24:32 2019 +1000 pseries: Update SLOF firmware image This fixes USB host bus adapter name in the device tree to match QEMU's one. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Fortunately, sPAPR implements the firmware path provider interface. This provides a way to override the default firmware paths. Just factor out the sPAPR PHB naming logic from spapr_dt_pci_device() to a helper, and use it in the sPAPR firmware path provider hook. Fixes: `85164ad4ed` ("pseries: Update SLOF firmware image") Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <20210122170157.246374-1-groug@kaod.org> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-02-10 10:43:50 +11:00
Daniel Henrique Barboza	a85bb34e1c	spapr.c: add 'name' property for hotplugged CPUs nodes In the CPU hotunplug bug [1] the guest kernel throws a scary message in dmesg: pseries-hotplug-cpu: Failed to offline CPU <NULL>, rc: -16 The reason isn't related to the bug though. This happens because the kernel file arch/powerpc/platform/pseries/hotplug-cpu.c, function dlpar_cpu_remove(), is not finding the device_node.name of the offending CPU. We're not populating the 'name' property for hotplugged CPUs. Since the kernel relies on device_node.name for identifying CPU nodes, and the CPUs that are coldplugged has the 'name' property filled by SLOF, this is creating an unneeded inconsistency between hotplug and coldplug CPUs in the kernel. Let's fill the 'name' property for hotplugged CPUs as well. This will make the guest dmesg throws a less intimidating message when we try to unplug the last online CPU: pseries-hotplug-cpu: Failed to offline CPU PowerPC,POWER9@1, rc: -16 [1] https://bugzilla.redhat.com/1911414 Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20210120232305.241521-3-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-02-10 10:43:49 +11:00
Daniel Henrique Barboza	7265bc3e54	spapr.c: use g_auto* with 'nodename' in CPU DT functions Next patch will use the 'nodename' string in spapr_core_dt_populate() after the point it's being freed today. Instead of moving 'g_free(nodename)' around, let's do a QoL change in both CPU DT functions where 'nodename' is being freed, and use g_autofree to avoid the 'g_free()' call altogether. Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20210120232305.241521-2-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-02-10 10:43:49 +11:00
Jagannathan Raman	3090de695b	multi-process: Add config option for multi-process QEMU Add configuration options to enable or disable multiprocess QEMU code Signed-off-by: John G Johnson <john.g.johnson@oracle.com> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 6cc37253e35418ebd7b675a31a3df6e3c7a12dc1.1611938319.git.jag.raman@oracle.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2021-02-09 20:53:56 +00:00
Jagannathan Raman	44a4ff31c0	memory: alloc RAM from file at offset Allow RAM MemoryRegion to be created from an offset in a file, instead of allocating at offset of 0 by default. This is needed to synchronize RAM between QEMU & remote process. Signed-off-by: Jagannathan Raman <jag.raman@oracle.com> Signed-off-by: John G Johnson <john.g.johnson@oracle.com> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 609996697ad8617e3b01df38accc5c208c24d74e.1611938319.git.jag.raman@oracle.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2021-02-09 20:53:56 +00:00
Peter Maydell	1214d55d1c	Emulated NVMe device updates * deallocate or unwritten logical block error feature (me) * dataset management command (me) * compare command (Gollu Appalanaidu) * namespace types (Niklas Cassel) * zoned namespaces (Dmitry Fomichev) * smart critical warning toggle (Zhenwei Pi) * allow cmb and pmr to coexist (me) * pmr rds/wds support (Naveen Nagar) * cmb v1.4 logic (Padmakar Kalghatgi) And a lot of smaller fixes from Gollu Appalanaidu and Minwoo Im. -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEUigzqnXi3OaiR2bATeGvMW1PDekFAmAiON4ACgkQTeGvMW1P DenhxQf/WzoiO5bXmvZeRv+IHoIn5GhJ1NxLRYO5MX0diswh0BBwUmIscbEDMe1b GsD6rpd9YfEO80L/sEGqgV09HT+6e8YwDsNNZBTAsUNRVx0WxckgcGWcrzNNA9Nc WEl0Q8si5USSQ1C7djplLXdR6p4pbA/gIk6AjNIo3q2VK1ZqCBhQGESGEfGgrAXW xWo8C1V8dnKdxUYI2blbti44sElHZJ6jcF5N3Xmv0UUa1WL0hh0u6qr7IbCZe1kO SUFWMIGLF+1C35MUyWpgCjCn5cUdnTA0s/SLEuWtDlNYRhRRh0D6LZviVTZi38Wx 6Cxg/bRkSlcKo1/jswwYcAaH7qQ4Eg== =NB9D -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/nvme/tags/nvme-next-pull-request' into staging Emulated NVMe device updates * deallocate or unwritten logical block error feature (me) * dataset management command (me) * compare command (Gollu Appalanaidu) * namespace types (Niklas Cassel) * zoned namespaces (Dmitry Fomichev) * smart critical warning toggle (Zhenwei Pi) * allow cmb and pmr to coexist (me) * pmr rds/wds support (Naveen Nagar) * cmb v1.4 logic (Padmakar Kalghatgi) And a lot of smaller fixes from Gollu Appalanaidu and Minwoo Im. # gpg: Signature made Tue 09 Feb 2021 07:25:18 GMT # gpg: using RSA key 522833AA75E2DCE6A24766C04DE1AF316D4F0DE9 # gpg: Good signature from "Klaus Jensen <its@irrelevant.dk>" [unknown] # gpg: aka "Klaus Jensen <k.jensen@samsung.com>" [unknown] # gpg: WARNING: This key is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: DDCA 4D9C 9EF9 31CC 3468 4272 63D5 6FC5 E55D A838 # Subkey fingerprint: 5228 33AA 75E2 DCE6 A247 66C0 4DE1 AF31 6D4F 0DE9 * remotes/nvme/tags/nvme-next-pull-request: (56 commits) hw/block/nvme: refactor the logic for zone write checks hw/block/nvme: fix zone boundary check for append hw/block/nvme: fix wrong parameter name 'cross_read' hw/block/nvme: align with existing style hw/block/nvme: fix set feature save field check hw/block/nvme: fix set feature for error recovery hw/block/nvme: error if drive less than a zone size hw/block/nvme: lift cmb restrictions hw/block/nvme: bump to v1.4 hw/block/nvme: move cmb logic to v1.4 hw/block/nvme: add PMR RDS/WDS support hw/block/nvme: disable PMR at boot up hw/block/nvme: remove redundant zeroing of PMR registers hw/block/nvme: rename PMR/CMB shift/mask fields hw/block/nvme: allow cmb and pmr to coexist hw/block/nvme: move msix table and pba to BAR 0 hw/block/nvme: indicate CMB support through controller capabilities register hw/block/nvme: fix 64 bit register hi/lo split writes hw/block/nvme: add size to mmio read/write trace events hw/block/nvme: trigger async event during injecting smart warning ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2021-02-09 13:24:37 +00:00
Peter Maydell	41d306ec7d	* Fuzzing improvements (Qiuhao, Alexander) * i386: Fix BMI decoding for instructions with the 0x66 prefix (David) * initial attempt at fixing event_notifier emulation (Maxim) * i386: PKS emulation, fix for "qemu-system-i386 -cpu host" (myself) * meson: RBD test fixes (myself) * meson: TCI warnings (Philippe) * Leaner build for --disable-guest-agent, --disable-system and --disable-tools (Philippe, Stefan) * --enable-tcg-interpreter fix (Richard) * i386: SVM feature bits (Wei) * KVM bugfix (Thomas H.) * Add missing MemoryRegionOps callbacks (PJP) -----BEGIN PGP SIGNATURE----- iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmAhR4cUHHBib256aW5p QHJlZGhhdC5jb20ACgkQv/vSX3jHroOviAf/Ymk/KwHZKySVqHOjDZnvP5PXMAru p1zlMRLAorXK+CTbshkIliaQyD8ggzT4HCinJ2NisdfTWMmlWbgr8gahNqKyZ5UG HlL28va3dvGhelswh/CNso1ZhVb2Q+aAYn/c6LXQva2r0xi26ohJTkIkSCPP/bnI +73dGzwAilBOsBVbn4cCm/70XtwDpPkw41IZIDoy/4lhL8ZdpHMz8oOjNIlOdlcU aEDfM8vYE4C70OtUlRZ1OwVxzcjS1Bf6dQYcpg5gAKy/jAAqR+v2PStxXiUuj5D3 cAzd03Goh78Wcre+CbWxDKGcGtiooUT+J09wmvDPYVUHcpQMbumf4MufrQ== =INB5 -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/bonzini-gitlab/tags/for-upstream' into staging * Fuzzing improvements (Qiuhao, Alexander) * i386: Fix BMI decoding for instructions with the 0x66 prefix (David) * initial attempt at fixing event_notifier emulation (Maxim) * i386: PKS emulation, fix for "qemu-system-i386 -cpu host" (myself) * meson: RBD test fixes (myself) * meson: TCI warnings (Philippe) * Leaner build for --disable-guest-agent, --disable-system and --disable-tools (Philippe, Stefan) * --enable-tcg-interpreter fix (Richard) * i386: SVM feature bits (Wei) * KVM bugfix (Thomas H.) * Add missing MemoryRegionOps callbacks (PJP) # gpg: Signature made Mon 08 Feb 2021 14:15:35 GMT # gpg: using RSA key F13338574B662389866C7682BFFBD25F78C7AE83 # gpg: issuer "pbonzini@redhat.com" # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full] # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [full] # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * remotes/bonzini-gitlab/tags/for-upstream: (46 commits) target/i386: Expose VMX entry/exit load pkrs control bits target/i386: Add support for save/load IA32_PKRS MSR imx7-ccm: add digprog mmio write method tz-ppc: add dummy read/write methods spapr_pci: add spapr msi read method nvram: add nrf51_soc flash read method prep: add ppc-parity write method vfio: add quirk device write method pci-host: designware: add pcie-msi read method hw/pci-host: add pci-intack write method cpu-throttle: Remove timer_mod() from cpu_throttle_set() replay: rng-builtin support pc-bios/descriptors: fix paths in json files replay: fix replay of the interrupts accel/kvm/kvm-all: Fix wrong return code handling in dirty log code qapi/meson: Restrict UI module to system emulation and tools qapi/meson: Restrict system-mode specific modules qapi/meson: Remove QMP from user-mode emulation qapi/meson: Restrict qdev code to system-mode emulation meson: Restrict emulation code ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2021-02-09 10:04:51 +00:00
Klaus Jensen	3e22762edc	hw/block/nvme: refactor the logic for zone write checks Refactor the zone write check logic such that the most "meaningful" error is returned first. That is, first, if the zone is not writable, return an appropriate status code for that. Then, make sure we are actually writing at the write pointer and finally check that we do not cross the zone write boundary. This aligns with the "priority" of status codes for zone read checks. Also add a couple of additional descriptive trace events and remove an always true assert. Cc: Dmitry Fomichev <dmitry.fomichev@wdc.com> Tested-by: Niklas Cassel <niklas.cassel@wdc.com> Tested-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2021-02-08 21:15:54 +01:00
Klaus Jensen	a679dc3efd	hw/block/nvme: fix zone boundary check for append When a zone append is processed the controller checks that validity of the write before assigning the LBA to the append command. This causes the boundary check to be wrong. Fix this by checking the write after assigning the LBA. Remove the append special case from the nvme_check_zone_write and open code it in nvme_do_write, assigning the slba when basic sanity checks have been performed. Then check the validity of the resulting write like any other write command. In the process, also fix a missing endianness conversion for the zone append ALBA. Reported-by: Niklas Cassel <Niklas.Cassel@wdc.com> Cc: Dmitry Fomichev <dmitry.fomichev@wdc.com> Tested-by: Niklas Cassel <niklas.cassel@wdc.com> Tested-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2021-02-08 21:15:54 +01:00
Minwoo Im	74cbbf3031	hw/block/nvme: fix wrong parameter name 'cross_read' The actual parameter name is 'cross_read' rather than 'cross_zone_read'. Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2021-02-08 21:15:54 +01:00
Gollu Appalanaidu	74eb89219e	hw/block/nvme: align with existing style Change status checks to align with the existing style and remove the explicit check against NVME_SUCCESS. Cc: Dmitry Fomichev <dmitry.fomichev@wdc.com> Signed-off-by: Gollu Appalanaidu <anaidu.gollu@samsung.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2021-02-08 21:15:54 +01:00
Gollu Appalanaidu	0065f42ef1	hw/block/nvme: fix set feature save field check Currently, no features are saveable, so the current check is not wrong, but add a check against the feature capabilities to make sure this will not regress if saveable features are added later. Signed-off-by: Gollu Appalanaidu <anaidu.gollu@samsung.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2021-02-08 21:15:54 +01:00
Gollu Appalanaidu	56990c777a	hw/block/nvme: fix set feature for error recovery Only enable DULBE if the namespace supports it. Signed-off-by: Gollu Appalanaidu <anaidu.gollu@samsung.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2021-02-08 21:15:54 +01:00
Minwoo Im	044f1876b0	hw/block/nvme: error if drive less than a zone size If a user assigns a backing device with less capacity than the size of a single zone, the namespace capacity will be reported as zero and the kernel will silently fail to allocate the namespace. This patch errors out in case that the backing device cannot accomodate at least a single zone. Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com> [k.jensen: small fixup in the error and commit message] Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2021-02-08 21:15:54 +01:00
Klaus Jensen	38001f7340	hw/block/nvme: lift cmb restrictions The controller now implements v1.4 and we can lift the restrictions on CMB Data Pointer and Command Independent Locations Support (CDPCILS) and CMB Data Pointer Mixed Locations Support (CDPMLS) since the device really does not care about mixed host/cmb pointers in those cases. Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2021-02-08 21:15:54 +01:00
Klaus Jensen	c2a3640de8	hw/block/nvme: bump to v1.4 With the new CMB logic in place, bump the implemented specification version to v1.4 by default. This requires adding the setting the CNTRLTYPE field and modifying the VWC field since 0x00 is no longer a valid value for bits 2:1. Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2021-02-08 21:15:54 +01:00
Padmakar Kalghatgi	f4319477b4	hw/block/nvme: move cmb logic to v1.4 Implement v1.4 logic for configuring the Controller Memory Buffer. By default, the v1.4 scheme will be used (CMB must be explicitly enabled by the host), so drivers that only support v1.3 will not be able to use the CMB anymore. To retain the v1.3 behavior, set the boolean 'legacy-cmb' nvme device parameter. Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com> Signed-off-by: Padmakar Kalghatgi <p.kalghatgi@samsung.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2021-02-08 21:15:54 +01:00
Naveen Nagar	7ec9f2eef9	hw/block/nvme: add PMR RDS/WDS support Add support for the PMRMSCL and PMRMSCU MMIO registers. This allows adding RDS/WDS support for PMR as well. Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Naveen Nagar <naveen.n1@samsung.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2021-02-08 21:15:53 +01:00
Klaus Jensen	75c3c9de96	hw/block/nvme: disable PMR at boot up The PMR should not be enabled at boot up. Disable the PMR MemoryRegion initially and implement MMIO for PMRCTL, allowing the host to enable the PMR explicitly. Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2021-02-08 21:15:53 +01:00
Klaus Jensen	b78b9bb0ee	hw/block/nvme: remove redundant zeroing of PMR registers The controller registers are initially zero. Remove the redundant zeroing. Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2021-02-08 21:15:53 +01:00
Klaus Jensen	8e9e8b4821	hw/block/nvme: rename PMR/CMB shift/mask fields Use the correct field names. Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2021-02-08 21:15:53 +01:00
Klaus Jensen	709cc8fc68	hw/block/nvme: allow cmb and pmr to coexist With BAR 4 now free to use, allow PMR and CMB to be enabled simultaneously. Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2021-02-08 21:15:53 +01:00
Klaus Jensen	1901b4967c	hw/block/nvme: move msix table and pba to BAR 0 In the interest of supporting both CMB and PMR to be enabled on the same device, move the MSI-X table and pending bit array out of BAR 4 and into BAR 0. This is a simplified version of the patch contributed by Andrzej Jakowski (see [1]). Leaving the CMB at offset 0 removes the need for changes to CMB address mapping code. [1]: https://lore.kernel.org/qemu-devel/20200729220107.37758-3-andrzej.jakowski@linux.intel.com/ Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com> Tested-by: Minwoo Im <minwoo.im.dev@gmail.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2021-02-08 21:15:53 +01:00
Andrzej Jakowski	c705063129	hw/block/nvme: indicate CMB support through controller capabilities register This patch sets CMBS bit in controller capabilities register when user configures NVMe driver with CMB support, so capabilites are correctly reported to guest OS. Signed-off-by: Andrzej Jakowski <andrzej.jakowski@linux.intel.com> Reviewed-by: Maxim Levitsky <mlevitsky@gmail.com> Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2021-02-08 21:15:53 +01:00
Klaus Jensen	0d3d5da2cc	hw/block/nvme: fix 64 bit register hi/lo split writes 64 bit registers like ASQ and ACQ should be writable by both a hi/lo 32 bit write combination as well as a plain 64 bit write. The spec does not define ordering on the hi/lo split, but the code currently assumes that the low order bits are written first. Additionally, the code does not consider that another address might already have been written into the register, causing the OR'ing to result in a bad address. Fix this by explicitly overwriting only the low or high order bits for 32 bit writes. Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Keith Busch <kbusch@kernel.org>	2021-02-08 21:15:53 +01:00
Klaus Jensen	ffacaf0908	hw/block/nvme: add size to mmio read/write trace events Add the size of the mmio read/write to the trace event. Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2021-02-08 21:15:53 +01:00
zhenwei pi	c62720f137	hw/block/nvme: trigger async event during injecting smart warning During smart critical warning injection by setting property from QMP command, also try to trigger asynchronous event. Suggested by Keith, if a event has already been raised, there is no need to enqueue the duplicate event any more. Signed-off-by: zhenwei pi <pizhenwei@bytedance.com> [k.jensen: fix typo in commit message] Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2021-02-08 21:15:53 +01:00
zhenwei pi	4714791b66	hw/block/nvme: add smart_critical_warning property There is a very low probability that hitting physical NVMe disk hardware critical warning case, it's hard to write & test a monitor agent service. For debugging purposes, add a new 'smart_critical_warning' property to emulate this situation. The orignal version of this change is implemented by adding a fixed property which could be initialized by QEMU command line. Suggested by Philippe & Klaus, rework like current version. Test with this patch: 1, change smart_critical_warning property for a running VM: #virsh qemu-monitor-command nvme-upstream '{ "execute": "qom-set", "arguments": { "path": "/machine/peripheral-anon/device[0]", "property": "smart_critical_warning", "value":16 } }' 2, run smartctl in guest #smartctl -H -l error /dev/nvme0n1 === START OF SMART DATA SECTION === SMART overall-health self-assessment test result: FAILED! - volatile memory backup device has failed Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: zhenwei pi <pizhenwei@bytedance.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2021-02-08 21:15:53 +01:00
Klaus Jensen	635b23ad43	hw/block/nvme: fix zone write finalize The zone write pointer is unconditionally advanced, even for write faults. Make sure that the zone is always transitioned to Full if the write pointer reaches zone capacity. Cc: Dmitry Fomichev <dmitry.fomichev@wdc.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2021-02-08 21:15:53 +01:00
Minwoo Im	24ec776a5a	hw/block/nvme: remove unused argument in nvme_ns_setup nvme_ns_setup() finally does not have nothing to do with NvmeCtrl instance. Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2021-02-08 21:15:53 +01:00
Minwoo Im	15d024d4aa	hw/block/nvme: split setup and register for namespace In NVMe, namespace is being attached to process I/O. We register NVMe namespace to a controller via nvme_register_namespace() during nvme_ns_setup(). This is main reason of receiving NvmeCtrl object instance to this function to map the namespace to a controller. To make namespace instance more independent, it should be split into two parts: setup and register. This patch split them into two differnt parts, and finally nvme_ns_setup() does not have nothing to do with NvmeCtrl instance at all. This patch is a former patch to introduce NVMe subsystem scheme to the existing design especially for multi-path. In that case, it should be split into two to make namespace independent from a controller. Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2021-02-08 21:15:53 +01:00
Minwoo Im	337ccd7650	hw/block/nvme: remove unused argument in nvme_ns_init_blk Removed no longer used aregument NvmeCtrl object in nvme_ns_init_blk(). Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2021-02-08 21:15:53 +01:00
Minwoo Im	aa5e55e3b0	hw/block/nvme: open code for volatile write cache Volatile Write Cache(VWC) feature is set in nvme_ns_setup() in the initial time. This feature is related to block device backed, but this feature is controlled in controller level via Set/Get Features command. This patch removed dependency between nvme and nvme-ns to manage the VWC flag value. Also, it open coded the Get Features for VWC to check all namespaces attached to the controller, and if false detected, return directly false. Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com> [k.jensen: report write cache preset if present on ANY namespace] Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2021-02-08 21:15:53 +01:00
Minwoo Im	1490be5a8a	hw/block/nvme: remove unused argument in nvme_ns_init_zoned nvme_ns_init_zoned() has no use for given NvmeCtrl object. Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2021-02-08 21:15:53 +01:00
Dmitry Fomichev	add961300c	hw/block/nvme: Correct error status for unaligned ZA TP 4053 says (in section 2.3.1.1) - ... if a Zone Append command specifies a ZSLBA that is not the lowest logical block address in that zone, then the controller shall abort that command with a status code of Invalid Field In Command. In the code, Zone Invalid Write is returned instead, fix this. Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2021-02-08 21:15:53 +01:00
Klaus Jensen	521ea778b2	hw/block/nvme: remove unnecessary check for append nvme_io_cmd already checks if the namespace supports the Zone Append command, so the removed check is dead code. Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Tested-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>	2021-02-08 21:15:53 +01:00
Klaus Jensen	cd42771a33	hw/block/nvme: add missing string representations for commands Add missing string representations for a couple of new commands. Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Tested-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>	2021-02-08 21:15:53 +01:00
Klaus Jensen	5f5dc4c6a9	hw/block/nvme: zero out zones on reset The zoned command set specification states that "All logical blocks in a zone shall be marked as deallocated when [the zone is reset]". Since the device guarantees 0x00 to be read from deallocated blocks we have to issue a pwrite_zeroes since we cannot be sure that a discard will do anything. But typically, this will be achieved with an efficient unmap/discard operation. Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Tested-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>	2021-02-08 21:15:53 +01:00
Klaus Jensen	b05fde2881	hw/block/nvme: enum style fix Align with existing style and use a typedef for header-file enums. Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Tested-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>	2021-02-08 21:15:53 +01:00
Klaus Jensen	5720669605	hw/block/nvme: merge implicitly/explicitly opened processing masks Implicitly and explicitly opended zones are always bulk processed together, so merge the two processing masks. Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Tested-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>	2021-02-08 21:15:53 +01:00
Klaus Jensen	165f134f3d	hw/block/nvme: fix shutdown/reset logic A shutdown is only about flushing stuff. It is the host that should delete any queues, so do not perform a reset here. Also, on shutdown, make sure that the PMR is flushed if in use. Fixes: 368f4e752cf9 ("hw/block/nvme: Process controller reset and shutdown differently") Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Tested-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>	2021-02-08 21:15:53 +01:00
Klaus Jensen	1b5804a80d	hw/block/nvme: conditionally enable DULBE for zoned namespaces The device uses the BDRV_BLOCK_ZERO flag to determine the "deallocated" status of logical blocks. Since the zoned namespaces command set specification defines that logical blocks SHALL be marked as deallocated when the zone is in the Empty or Offline states, DULBE can only be supported if the zone size is a multiple of the calculated deallocation granularity (reported in NPDG) which depends on the underlying block device cluster size (if applicable) or the configured discard_granularity. Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2021-02-08 21:15:49 +01:00
Klaus Jensen	55886345d0	hw/block/nvme: fix for non-msix machines Commit `1c0c2163aa` ("hw/block/nvme: verify msix_init_exclusive_bar() return value") had the unintended effect of breaking support on several platforms not supporting MSI-X. Still check for errors, but only report that MSI-X is unsupported instead of bailing out. Fixes: `1c0c2163aa` ("hw/block/nvme: verify msix_init_exclusive_bar() return value") Fixes: `fbf2e5375e` ("hw/block/nvme: Verify msix_vector_use() returned value") Reported-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2021-02-08 21:05:28 +01:00
Dmitry Fomichev	00dd640dff	hw/block/nvme: Document zoned parameters in usage text Added brief descriptions of the new device properties that are now available to users to configure features of Zoned Namespace Command Set in the emulator. This patch is for documentation only, no functionality change. Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Reviewed-by: Niklas Cassel <Niklas.Cassel@wdc.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2021-02-08 21:05:28 +01:00
Dmitry Fomichev	1a9290ade3	hw/block/nvme: Support Zone Descriptor Extensions Zone Descriptor Extension is a label that can be assigned to a zone. It can be set to an Empty zone and it stays assigned until the zone is reset. This commit adds a new optional module property, "zoned.descr_ext_size". Its value must be a multiple of 64 bytes. If this value is non-zero, it becomes possible to assign extensions of that size to any Empty zones. The default value for this property is 0, therefore setting extensions is disabled by default. Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com> Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Reviewed-by: Niklas Cassel <Niklas.Cassel@wdc.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2021-02-08 21:05:28 +01:00
Dmitry Fomichev	8d18ddcd22	hw/block/nvme: Introduce max active and open zone limits Add two module properties, "zoned.max_active" and "zoned.max_open" to control the maximum number of zones that can be active or open. Once these variables are set to non-default values, these limits are checked during I/O and Too Many Active or Too Many Open command status is returned if they are exceeded. Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com> Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Reviewed-by: Niklas Cassel <Niklas.Cassel@wdc.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2021-02-08 21:05:28 +01:00
Dmitry Fomichev	a479335bfa	hw/block/nvme: Support Zoned Namespace Command Set The emulation code has been changed to advertise NVM Command Set when "zoned" device property is not set (default) and Zoned Namespace Command Set otherwise. Define values and structures that are needed to support Zoned Namespace Command Set (NVMe TP 4053) in PCI NVMe controller emulator. Define trace events where needed in newly introduced code. In order to improve scalability, all open, closed and full zones are organized in separate linked lists. Consequently, almost all zone operations don't require scanning of the entire zone array (which potentially can be quite large) - it is only necessary to enumerate one or more zone lists. Handlers for three new NVMe commands introduced in Zoned Namespace Command Set specification are added, namely for Zone Management Receive, Zone Management Send and Zone Append. Device initialization code has been extended to create a proper configuration for zoned operation using device properties. Read/Write command handler is modified to only allow writes at the write pointer if the namespace is zoned. For Zone Append command, writes implicitly happen at the write pointer and the starting write pointer value is returned as the result of the command. Write Zeroes handler is modified to add zoned checks that are identical to those done as a part of Write flow. Subsequent commits in this series add ZDE support and checks for active and open zone limits. Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com> Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com> Signed-off-by: Ajay Joshi <ajay.joshi@wdc.com> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Matias Bjorling <matias.bjorling@wdc.com> Signed-off-by: Aravind Ramesh <aravind.ramesh@wdc.com> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Signed-off-by: Adam Manzanares <adam.manzanares@wdc.com> Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Reviewed-by: Niklas Cassel <Niklas.Cassel@wdc.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2021-02-08 21:05:27 +01:00
Niklas Cassel	922e6f4ebd	hw/block/nvme: Support allocated CNS command variants Many CNS commands have "allocated" command variants. These include a namespace as long as it is allocated, that is a namespace is included regardless if it is active (attached) or not. While these commands are optional (they are mandatory for controllers supporting the namespace attachment command), our QEMU implementation is more complete by actually providing support for these CNS values. However, since our QEMU model currently does not support the namespace attachment command, these new allocated CNS commands will return the same result as the active CNS command variants. The reason for not hooking up this command completely is because the NVMe specification requires the namespace management command to be supported if the namespace attachment command is supported. Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com> Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2021-02-08 20:58:34 +01:00
Niklas Cassel	141354d55b	hw/block/nvme: Add support for Namespace Types Define the structures and constants required to implement Namespace Types support. Namespace Types introduce a new command set, "I/O Command Sets", that allows the host to retrieve the command sets associated with a namespace. Introduce support for the command set and enable detection for the NVM Command Set. The new workflows for identify commands rely heavily on zero-filled identify structs. E.g., certain CNS commands are defined to return a zero-filled identify struct when an inactive namespace NSID is supplied. Add a helper function in order to avoid code duplication when reporting zero-filled identify structures. Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com> Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2021-02-08 20:58:34 +01:00
Dmitry Fomichev	62e8faa468	hw/block/nvme: Add Commands Supported and Effects log This log page becomes necessary to implement to allow checking for Zone Append command support in Zoned Namespace Command Set. This commit adds the code to report this log page for NVM Command Set only. The parts that are specific to zoned operation will be added later in the series. All incoming admin and i/o commands are now only processed if their corresponding support bits are set in this log. This provides an easy way to control what commands to support and what not to depending on set CC.CSS. Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Reviewed-by: Niklas Cassel <Niklas.Cassel@wdc.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2021-02-08 20:58:32 +01:00
Peter Maydell	2436651b26	Migration pull 2021-02-08 v2 Dropped vmstate: Fix memory leak in vmstate_handle_alloc Broke on Power Added migration: only check page size match if RAM postcopy is enabled -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEERfXHG0oMt/uXep+pBRYzHrxb/ecFAmAhIE4ACgkQBRYzHrxb /ecPuA/+Pgo++1ZSseJUgbLePwyTVc0jahdcvYEDmLUn8UM6ikBcBXBgUKHdkFW3 bjSSVgB/xxvXSiafBK4xFNrCqSgqMSr3DJcHmvWgv2wVARcYf6Z26Da53LZq1Qru 0tvRyb40Od1f9zb8Zj7e2Y3pjQ9ybLLbjfNhgnOBbQivqWkjZI31oV2KUCWY2+eV T1BEwr6mgYepqhmeB6OvQZtaQVC5toirS6NajNF4nt0vZEIGIvK6/A9erCVU8Tze 5ch1J0MUqgc3q6ZSE/I9BHEy6MaL0X8G6H+ezjxdoRQtbt1iM/YqZJCSrXkAxiLC ROohryb6qVk26+UYuana79faLwrw359WlkwNEE6SEIRSENu+6p7bgN3LZuCILCO7 xJEkeTgy6r40IGCkDC9aWa8pyLHpNX9gyLpGBHdIRD6zEOWaKNtzh7E2uo/T0ann BpcfgQOsYN25hIHiiXnxozUREbx71VDfMq7GqGB6eC3u2+a3U6jpSJb1nNq5NB89 FJYLZy5Rbuy7OStMwfMsxRs7E63XvGgnwrN8FczU/pumCPX4lDYIpnocqinUmP8p XubRQQVaVDSKIq1mvzw7iR/1NsP9vfYvnrAIv941f38NBmDKqdPuMOXR/qB/Kp2Y jB7b1L5/JcXbWsQmK7fda9jmPzFwSO2cTeTiUonk9RfuuDEws0A= =4tbe -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/dgilbert/tags/pull-migration-20210208a' into staging Migration pull 2021-02-08 v2 Dropped vmstate: Fix memory leak in vmstate_handle_alloc Broke on Power Added migration: only check page size match if RAM postcopy is enabled # gpg: Signature made Mon 08 Feb 2021 11:28:14 GMT # gpg: using RSA key 45F5C71B4A0CB7FB977A9FA90516331EBC5BFDE7 # gpg: Good signature from "Dr. David Alan Gilbert (RH2) <dgilbert@redhat.com>" [full] # Primary key fingerprint: 45F5 C71B 4A0C B7FB 977A 9FA9 0516 331E BC5B FDE7 * remotes/dgilbert/tags/pull-migration-20210208a: (27 commits) migration: only check page size match if RAM postcopy is enabled migration: introduce snapshot-{save, load, delete} QMP commands iotests: fix loading of common.config from tests/ subdir iotests: add support for capturing and matching QMP events migration: introduce a delete_snapshot wrapper migration: wire up support for snapshot device selection migration: control whether snapshots are ovewritten block: rename and alter bdrv_all_find_snapshot semantics block: allow specifying name of block device for vmstate storage block: add ability to specify list of blockdevs during snapshot migration: stop returning errno from load_snapshot() migration: Make save_snapshot() return bool, not 0/-1 block: push error reporting into bdrv_all_*_snapshot functions migration: Display the migration blockers migration: Add blocker information migration: Fix a few absurdly defective error messages migration: Fix cache_init()'s "Failed to allocate" error messages migration: Clean up signed vs. unsigned XBZRLE cache-size migration: Fix migrate-set-parameters argument validation migration: introduce 'userfaultfd-wrlat.py' script ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2021-02-08 18:23:47 +00:00
Dmitry Fomichev	3ec1d547a5	hw/block/nvme: Combine nvme_write_zeroes() and nvme_write() Move write processing to nvme_do_write() that now handles both WRITE and WRITE ZEROES. Both nvme_write() and nvme_write_zeroes() become inline helper functions. Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Reviewed-by: Niklas Cassel <Niklas.Cassel@wdc.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2021-02-08 18:55:48 +01:00
Dmitry Fomichev	13a7b6539d	hw/block/nvme: Separate read and write handlers The majority of code in nvme_rw() is becoming read- or write-specific. Move these parts to two separate handlers, nvme_read() and nvme_write() to make the code more readable and to remove multiple is_write checks that has been present in the i/o path. This is a refactoring patch, no change in functionality. Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Reviewed-by: Niklas Cassel <Niklas.Cassel@wdc.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2021-02-08 18:55:48 +01:00
Dmitry Fomichev	b52f26cd1f	hw/block/nvme: Generate namespace UUIDs In NVMe 1.4, a namespace must report an ID descriptor of UUID type if it doesn't support EUI64 or NGUID. Add a new namespace property, "uuid", that provides the user the option to either specify the UUID explicitly or have a UUID generated automatically every time a namespace is initialized. Suggested-by: Klaus Jensen <k.jensen@samsung.com> Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Niklas Cassel <Niklas.Cassel@wdc.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2021-02-08 18:55:48 +01:00
Dmitry Fomichev	ba69f22481	hw/block/nvme: Process controller reset and shutdown differently Controller reset ans subsystem shutdown are handled very much the same in the current code, but some of the steps should be different in these two cases. Introduce two new functions, nvme_reset_ctrl() and nvme_shutdown_ctrl(), to separate some portions of the code from nvme_clear_ctrl(). The steps that are made different between reset and shutdown are that BAR.CC is not reset to zero upon the shutdown and namespace data is flushed to backing storage as a part of shutdown handling, but not upon reset. Suggested-by: Klaus Jensen <k.jensen@samsung.com> Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2021-02-08 18:55:48 +01:00
Klaus Jensen	e1f81c1478	hw/block/nvme: fix bad clearing of CAP Commit `37712e00b1` ("hw/block/nvme: factor out pmr setup") changed the control flow such that the CAP register is erronously cleared after nvme_init_pmr() has configured it. Since the entire NvmeCtrl structure is zero-filled initially, there is no need for the explicit clearing, so just remove it. Fixes: `37712e00b1` ("hw/block/nvme: factor out pmr setup") Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>	2021-02-08 18:55:48 +01:00
Gollu Appalanaidu	0a384f923f	hw/block/nvme: add compare command Add the Compare command. This implementation uses a bounce buffer to read in the data from storage and then compare with the host supplied buffer. Signed-off-by: Gollu Appalanaidu <anaidu.gollu@samsung.com> [k.jensen: rebased] Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com> Reviewed-by: Keith Busch <kbusch@kernel.org>	2021-02-08 18:55:48 +01:00
Klaus Jensen	2605257a26	hw/block/nvme: add the dataset management command Add support for the Dataset Management command and the Deallocate attribute. Deallocation results in discards being sent to the underlying block device. Whether of not the blocks are actually deallocated is affected by the same factors as Write Zeroes (see previous commit). format \| discard \| dsm (512B) dsm (4KiB) dsm (64KiB) -------------------------------------------------------- qcow2 ignore n n n qcow2 unmap n n y raw ignore n n n raw unmap n y y Again, a raw format and 4KiB LBAs are preferable. In order to set the Namespace Preferred Deallocate Granularity and Alignment fields (NPDG and NPDA), choose a sane minimum discard granularity of 4KiB. If we are using a passthru device supporting discard at a 512B granularity, user should set the discard_granularity property explicitly. NPDG and NPDA will also account for the cluster_size of the block driver if required (i.e. for QCOW2). See NVM Express 1.3d, Section 6.7 ("Dataset Management command"). Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Keith Busch <kbusch@kernel.org>	2021-02-08 18:55:48 +01:00
Klaus Jensen	54064e51d1	hw/block/nvme: add dulbe support Add support for reporting the Deallocated or Unwritten Logical Block Error (DULBE). Rely on the block status flags reported by the block layer and consider any block with the BDRV_BLOCK_ZERO flag to be deallocated. Multiple factors affect when a Write Zeroes command result in deallocation of blocks. * the underlying file system block size * the blockdev format * the 'discard' and 'logical_block_size' parameters format \| discard \| wz (512B) wz (4KiB) wz (64KiB) ----------------------------------------------------- qcow2 ignore n n y qcow2 unmap n n y raw ignore n y y raw unmap n y y So, this works best with an image in raw format and 4KiB LBAs, since holes can then be punched on a per-block basis (this assumes a file system with a 4kb block size, YMMV). A qcow2 image, uses a cluster size of 64KiB by default and blocks will only be marked deallocated if a full cluster is zeroed or discarded. However, this is consistent with the spec since Write Zeroes "should" deallocate the block if the Deallocate attribute is set and "may" deallocate if the Deallocate attribute is not set. Thus, we always try to deallocate (the BDRV_REQ_MAY_UNMAP flag is always set). Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Keith Busch <kbusch@kernel.org>	2021-02-08 18:55:48 +01:00
Klaus Jensen	54eea8d947	hw/block/nvme: pull aio error handling Add a new function, nvme_aio_err, to handle errors resulting from AIOs and use this from the callbacks. Signed-off-by: Klaus Jensen <k.jensen@samsung.com>	2021-02-08 18:55:48 +01:00
Klaus Jensen	c519d9d55e	hw/block/nvme: remove superfluous NvmeCtrl parameter nvme_check_bounds has no use of the NvmeCtrl parameter; remove it. Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>	2021-02-08 18:55:47 +01:00
Prasad J Pandit	735754aaa1	imx7-ccm: add digprog mmio write method Add digprog mmio write method to avoid assert failure during initialisation. Reviewed-by: Li Qiang <liq3ea@gmail.com> Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org> Message-Id: <20200811114133.672647-9-ppandit@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-08 15:15:32 +01:00
Prasad J Pandit	2c9fb3b784	tz-ppc: add dummy read/write methods Add tz-ppc-dummy mmio read/write methods to avoid assert failure during initialisation. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org> Reviewed-by: Li Qiang <liq3ea@gmail.com> Message-Id: <20200811114133.672647-8-ppandit@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-08 15:15:32 +01:00
Prasad J Pandit	921604e175	spapr_pci: add spapr msi read method Add spapr msi mmio read method to avoid NULL pointer dereference issue. Reported-by: Lei Sun <slei.casper@gmail.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Li Qiang <liq3ea@gmail.com> Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org> Message-Id: <20200811114133.672647-7-ppandit@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-08 15:15:32 +01:00
Prasad J Pandit	b5bf601f36	nvram: add nrf51_soc flash read method Add nrf51_soc mmio read method to avoid NULL pointer dereference issue. Reported-by: Lei Sun <slei.casper@gmail.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org> Reviewed-by: Li Qiang <liq3ea@gmail.com> Message-Id: <20200811114133.672647-6-ppandit@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-08 15:15:32 +01:00
Prasad J Pandit	f867cebaed	prep: add ppc-parity write method Add ppc-parity mmio write method to avoid NULL pointer dereference issue. Reported-by: Lei Sun <slei.casper@gmail.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org> Reviewed-by: Li Qiang <liq3ea@gmail.com> Message-Id: <20200811114133.672647-5-ppandit@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-08 15:15:32 +01:00
Prasad J Pandit	24202d2b56	vfio: add quirk device write method Add vfio quirk device mmio write method to avoid NULL pointer dereference issue. Reported-by: Lei Sun <slei.casper@gmail.com> Reviewed-by: Li Qiang <liq3ea@gmail.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org> Message-Id: <20200811114133.672647-4-ppandit@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-08 15:15:32 +01:00
Prasad J Pandit	4f2a5202a0	pci-host: designware: add pcie-msi read method Add pcie-msi mmio read method to avoid NULL pointer dereference issue. Reported-by: Lei Sun <slei.casper@gmail.com> Reviewed-by: Li Qiang <liq3ea@gmail.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org> Message-Id: <20200811114133.672647-3-ppandit@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-08 15:15:32 +01:00
Prasad J Pandit	520f26fc6d	hw/pci-host: add pci-intack write method Add pci-intack mmio write method to avoid NULL pointer dereference issue. Reported-by: Lei Sun <slei.casper@gmail.com> Reviewed-by: Li Qiang <liq3ea@gmail.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org> Message-Id: <20200811114133.672647-2-ppandit@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-08 15:15:32 +01:00
Maxim Levitsky	dec2bb14b8	virtio-scsi: don't uninitialize queues that we didn't initialize Count number of queues that we initialized and only deinitialize these that we initialized successfully. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20201217150040.906961-3-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-08 14:43:55 +01:00
Jinhao Gao	e6ddad1fd5	spapr_pci: Fix memory leak of vmstate_spapr_pci When VM migrate VMState of spapr_pci, the field(msi_devs) of spapr_pci having a flag of VMS_ALLOC need to allocate memory. If the src doesn't free memory of msi_devs in SaveStateEntry of spapr_pci after QEMUFile save VMState of spapr_pci, it may result in memory leak of msi_devs. We add the post_save func to free memory, which prevents memory leak. Reported-by: Euler Robot <euler.robot@huawei.com> Signed-off-by: Jinhao Gao <gaojinhao@huawei.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-Id: <20201231061020.828-2-gaojinhao@huawei.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-02-08 11:19:51 +00:00
David Gibson	651615d92d	s390: Recognize confidential-guest-support option At least some s390 cpu models support "Protected Virtualization" (PV), a mechanism to protect guests from eavesdropping by a compromised hypervisor. This is similar in function to other mechanisms like AMD's SEV and POWER's PEF, which are controlled by the "confidential-guest-support" machine option. s390 is a slightly special case, because we already supported PV, simply by using a CPU model with the required feature (S390_FEAT_UNPACK). To integrate this with the option used by other platforms, we implement the following compromise: - When the confidential-guest-support option is set, s390 will recognize it, verify that the CPU can support PV (failing if not) and set virtio default options necessary for encrypted or protected guests, as on other platforms. i.e. if confidential-guest-support is set, we will either create a guest capable of entering PV mode, or fail outright. - If confidential-guest-support is not set, guests might still be able to enter PV mode, if the CPU has the right model. This may be a little surprising, but shouldn't actually be harmful. To start a guest supporting Protected Virtualization using the new option use the command line arguments: -object s390-pv-guest,id=pv0 -machine confidential-guest-support=pv0 Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>	2021-02-08 16:57:38 +11:00
David Gibson	9f88a7a3df	confidential guest support: Alter virtio default properties for protected guests The default behaviour for virtio devices is not to use the platforms normal DMA paths, but instead to use the fact that it's running in a hypervisor to directly access guest memory. That doesn't work if the guest's memory is protected from hypervisor access, such as with AMD's SEV or POWER's PEF. So, if a confidential guest mechanism is enabled, then apply the iommu_platform=on option so it will go through normal DMA mechanisms. Those will presumably have some way of marking memory as shared with the hypervisor or hardware so that DMA will work. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org>	2021-02-08 16:57:38 +11:00
David Gibson	6742eefc93	spapr: PEF: prevent migration We haven't yet implemented the fairly involved handshaking that will be needed to migrate PEF protected guests. For now, just use a migration blocker so we get a meaningful error if someone attempts this (this is the same approach used by AMD SEV). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org>	2021-02-08 16:57:38 +11:00
David Gibson	6c8ebe30ea	spapr: Add PEF based confidential guest support Some upcoming POWER machines have a system called PEF (Protected Execution Facility) which uses a small ultravisor to allow guests to run in a way that they can't be eavesdropped by the hypervisor. The effect is roughly similar to AMD SEV, although the mechanisms are quite different. Most of the work of this is done between the guest, KVM and the ultravisor, with little need for involvement by qemu. However qemu does need to tell KVM to allow secure VMs. Because the availability of secure mode is a guest visible difference which depends on having the right hardware and firmware, we don't enable this by default. In order to run a secure guest you need to create a "pef-guest" object and set the confidential-guest-support property to point to it. Note that this just allows secure guests, the architecture of PEF is such that the guest still needs to talk to the ultravisor to enter secure mode. Qemu has no direct way of knowing if the guest is in secure mode, and certainly can't know until well after machine creation time. To start a PEF-capable guest, use the command line options: -object pef-guest,id=pef0 -machine confidential-guest-support=pef0 Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org>	2021-02-08 16:57:38 +11:00
David Gibson	e0292d7c62	confidential guest support: Rework the "memory-encryption" property Currently the "memory-encryption" property is only looked at once we get to kvm_init(). Although protection of guest memory from the hypervisor isn't something that could really ever work with TCG, it's not conceptually tied to the KVM accelerator. In addition, the way the string property is resolved to an object is almost identical to how a QOM link property is handled. So, create a new "confidential-guest-support" link property which sets this QOM interface link directly in the machine. For compatibility we keep the "memory-encryption" property, but now implemented in terms of the new property. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cornelia Huck <cohuck@redhat.com>	2021-02-08 16:57:38 +11:00

1 2 3 4 5 ...

28161 Commits