ShuriZma/xemu - xemu

Commit Graph

Author	SHA1	Message	Date
Richard Henderson	65c123fdf5	target/arm: Implement FEAT_HAFDBS, dirty bit portion Perform the atomic update for hardware management of the dirty bit. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20221024051851.3074715-14-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2022-10-27 11:34:31 +01:00
Richard Henderson	71943a1e90	target/arm: Implement FEAT_HAFDBS, access flag portion Perform the atomic update for hardware management of the access flag. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20221024051851.3074715-13-richard.henderson@linaro.org [PMM: Fix accidental PROT_WRITE to PAGE_WRITE; add missing main-loop.h include] Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2022-10-27 10:27:24 +01:00
Richard Henderson	34a57faeab	target/arm: Tidy merging of attributes from descriptor and table Replace some gotos with some nested if statements. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Message-id: 20221024051851.3074715-12-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2022-10-27 10:27:24 +01:00
Richard Henderson	0e8df0fe24	target/arm: Consider GP an attribute in get_phys_addr_lpae Both GP and DBM are in the upper attribute block. Extend the computation of attrs to include them, then simplify the setting of guarded. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Message-id: 20221024051851.3074715-11-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2022-10-27 10:27:24 +01:00
Richard Henderson	4566609176	target/arm: Don't shift attrs in get_phys_addr_lpae Leave the upper and lower attributes in the place they originate from in the descriptor. Shifting them around is confusing, since one cannot read the bit numbers out of the manual. Also, new attributes have been added which would alter the shifts. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-id: 20221024051851.3074715-10-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2022-10-27 10:27:24 +01:00
Richard Henderson	27c1b81d61	target/arm: Fix fault reporting in get_phys_addr_lpae Always overriding fi->type was incorrect, as we would not properly propagate the fault type from S1_ptw_translate, or arm_ldq_ptw. Simplify things by providing a new label for a translation fault. For other faults, store into fi directly. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Message-id: 20221024051851.3074715-9-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2022-10-27 10:27:24 +01:00
Richard Henderson	fe4ddc151b	target/arm: Remove loop from get_phys_addr_lpae The unconditional loop was used both to iterate over levels and to control parsing of attributes. Use an explicit goto in both cases. While this appears less clean for iterating over levels, we will need to jump back into the middle of this loop for atomic updates, which is even uglier. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20221024051851.3074715-8-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2022-10-27 10:27:23 +01:00
Richard Henderson	f0a398a249	target/arm: Add ARMFault_UnsuppAtomicUpdate This fault type is to be used with FEAT_HAFDBS when the guest enables hw updates, but places the tables in memory where atomic updates are unsupported. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Message-id: 20221024051851.3074715-7-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2022-10-27 10:27:23 +01:00
Richard Henderson	93e5b3a6f9	target/arm: Move S1_ptw_translate outside arm_ld[lq]_ptw Separate S1 translation from the actual lookup. Will enable lpae hardware updates. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20221024051851.3074715-6-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2022-10-27 10:27:23 +01:00
Richard Henderson	8973922783	target/arm: Extract HA and HD in aa64_va_parameters Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Message-id: 20221024051851.3074715-5-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2022-10-27 10:27:23 +01:00
Richard Henderson	980a68925c	target/arm: Add isar predicates for FEAT_HAFDBS The MMFR1 field may indicate support for hardware update of access flag alone, or access flag and dirty bit. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20221024051851.3074715-4-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2022-10-27 10:27:23 +01:00
Richard Henderson	48da29e485	target/arm: Add ptw_idx to S1Translate Hoist the computation of the mmu_idx for the ptw up to get_phys_addr_with_struct and get_phys_addr_twostage. This removes the duplicate check for stage2 disabled from the middle of the walk, performing it only once. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Tested-by: Alex Bennée <alex.bennee@linaro.org> Message-id: 20221024051851.3074715-3-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2022-10-27 10:27:23 +01:00
Richard Henderson	edc05dd43a	target/arm: Introduce regime_is_stage2 Reduce the amount of typing required for this check. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20221024051851.3074715-2-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2022-10-27 10:27:23 +01:00
Axel Heider	7719419deb	target/imx: reload cmp timer outside of the reload ptimer transaction When running seL4 tests (https://docs.sel4.systems/projects/sel4test) on the sabrelight platform, the timer tests fail. The arm/imx6 EPIT timer interrupt does not fire properly; instead of e.g. a second, it can take up to a minute to finally see the interrupt. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1263 Signed-off-by: Axel Heider <axel.heider@hensoldt.net> Message-id: 166663118138.13362.1229967229046092876-0@git.sr.ht Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2022-10-27 10:27:23 +01:00
Peter Maydell	7764963b92	hw/hyperv/hyperv.c: Use device_cold_reset() instead of device_legacy_reset() The semantic difference between the deprecated device_legacy_reset() function and the newer device_cold_reset() function is that the new function resets both the device itself and any qbuses it owns, whereas the legacy function resets just the device itself and nothing else. In hyperv_synic_reset() we reset a SynICState, which has no qbuses, so for this purpose the two functions behave identically and we can stop using the deprecated one. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Message-id: 20221013171817.1447562-1-peter.maydell@linaro.org	2022-10-27 10:27:23 +01:00
Damien Hedde	310616d367	hw/core/resettable: fix reset level counting The code for handling the reset level count in the Resettable code has two issues: The reset count is only decremented for the 1->0 case. This means that if there's ever a nested reset that takes the count to 2 then it will never again be decremented. Eventually the count will exceed the '50' limit in resettable_phase_enter() and QEMU will trip over the assertion failure. The repro case in issue 1266 is an example of this that happens now the SCSI subsystem uses three-phase reset. Secondly, the count is decremented only after the exit phase handler is called. Moving the reset count decrement from "just after" to "just before" calling the exit phase handler allows resettable_is_in_reset() to return false during the handler execution. This simplifies reset handling in resettable devices. Typically, a function that updates the device state will just need to read the current reset state and not anymore treat the "in a reset-exit transition" as a special case. Note that the semantics change to the *_is_in_reset() functions will have no effect on the current codebase, because only two devices (hw/char/cadence_uart.c and hw/misc/zynq_sclr.c) currently call those functions, and in neither case do they do it from the device's exit phase method. Fixes: `4a5fc890` ("scsi: Use device_cold_reset() and bus_cold_reset()") Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1266 Signed-off-by: Damien Hedde <damien.hedde@greensocs.com> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reported-by: Michael Peter <michael.peter@hensoldt-cyber.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-id: 20221020142749.3357951-1-peter.maydell@linaro.org Buglink: https://bugs.launchpad.net/qemu/+bug/1905297 Reported-by: Michael Peter <michael.peter@hensoldt-cyber.com> [PMM: adjust the docs paragraph changed to get the name of the 'enter' phase right and to clarify exactly when the count is adjusted; rewrite the commit message] Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2022-10-27 10:27:23 +01:00
Ake Koomsin	c939a7c7b9	target/arm: honor HCR_E2H and HCR_TGE in arm_excp_unmasked() An exception targeting EL2 from lower EL is actually maskable when HCR_E2H and HCR_TGE are both set. This applies to both secure and non-secure Security state. We can remove the conditions that try to suppress masking of interrupts when we are Secure and the exception targets EL2 and Secure EL2 is disabled. This is OK because in that situation arm_phys_excp_target_el() will never return 2 as the target EL. The 'not if secure' check in this function was originally written before arm_hcr_el2_eff(), and back then the target EL returned by arm_phys_excp_target_el() could be 2 even if we were in Secure EL0/EL1; but it is no longer needed. Signed-off-by: Ake Koomsin <ake@igel.co.jp> Message-id: 20221017092432.546881-1-ake@igel.co.jp [PMM: Add commit message paragraph explaining why it's OK to remove the checks on secure and SCR_EEL2] Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2022-10-27 10:27:23 +01:00
Jean-Philippe Brucker	7cd5d384bb	hw/arm/virt: Fix devicetree warnings about the virtio-iommu node The "PCI Bus Binding to: IEEE Std 1275-1994" defines the compatible string for a PCIe bus or endpoint as "pci<vendorid>,<deviceid>" or similar. Since the initial binding for PCI virtio-iommu didn't follow this rule, it was modified to accept both strings and ensure backward compatibility. Also, the unit-name for the node should be "device,function". Fix corresponding dt-validate and dtc warnings: pcie@10000000: virtio_iommu@16:compatible: ['virtio,pci-iommu'] does not contain items matching the given schema pcie@10000000: Unevaluated properties are not allowed (... 'virtio_iommu@16' were unexpected) From schema: linux/Documentation/devicetree/bindings/pci/host-generic-pci.yaml virtio_iommu@16: compatible: 'oneOf' conditional failed, one must be fixed: ['virtio,pci-iommu'] is too short 'pci1af4,1057' was expected From schema: dtschema/schemas/pci/pci-bus.yaml Warning (pci_device_reg): /pcie@10000000/virtio_iommu@16: PCI unit address format error, expected "2,0" Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2022-10-27 10:27:23 +01:00
Peter Maydell	e4c93e44ab	target/arm: Implement FEAT_E0PD FEAT_E0PD adds new bits E0PD0 and E0PD1 to TCR_EL1, which allow the OS to forbid EL0 access to half of the address space. Since this is an EL0-specific variation on the existing TCR_ELx.{EPD0,EPD1}, we can implement it entirely in aa64_va_parameters(). This requires moving the existing regime_is_user() to internals.h so that the code in helper.c can get at it. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 20221021160131.3531787-1-peter.maydell@linaro.org	2022-10-27 10:27:23 +01:00
David Hildenbrand	bd77c30df9	vl: Allow ThreadContext objects to be created before the sandbox option Currently, there is no way to configure a CPU affinity inside QEMU when the sandbox option disables it for QEMU as a whole, for example, via: -sandbox enable=on,resourcecontrol=deny While ThreadContext objects can be created on the QEMU commandline and the CPU affinity can be configured externally via the thread-id, this is insufficient if a ThreadContext with a certain CPU affinity is already required during QEMU startup, before we can intercept QEMU and configure the CPU affinity. Blocking sched_setaffinity() was introduced in `24f8cdc572` ("seccomp: add resourcecontrol argument to command line"), "to avoid any bigger of the process". However, we only care about once QEMU is running, not when the instance starting QEMU explicitly requests a certain CPU affinity on the QEMU commandline. Right now, for NUMA-aware preallocation of memory backends used for initial machine RAM, one has to: 1) Start QEMU with the memory-backend with "prealloc=off" 2) Pause QEMU before it starts the guest (-S) 3) Create ThreadContext, configure the CPU affinity using the thread-id 4) Configure the ThreadContext as "prealloc-context" of the memory backend 5) Trigger preallocation by setting "prealloc=on" To simplify this handling especially for initial machine RAM, allow creation of ThreadContext objects before parsing sandbox options, such that the CPU affinity requested on the QEMU commandline alongside the sandbox option can be set. As ThreadContext objects essentially only create a persistent context thread and set the CPU affinity, this is easily possible. With this change, we can create a ThreadContext with a CPU affinity on the QEMU commandline and use it for preallocation of memory backends glued to the machine (simplified example): To make "-name debug-threads=on" keep working as expected for the context threads, perform earlier parsing of "-name". qemu-system-x86_64 -m 1G \ -object thread-context,id=tc1,cpu-affinity=3-4 \ -object memory-backend-ram,id=pc.ram,size=1G,prealloc=on,prealloc-threads=2,prealloc-context=tc1 \ -machine memory-backend=pc.ram \ -S -monitor stdio -sandbox enable=on,resourcecontrol=deny And while we can query the current CPU affinity: (qemu) qom-get tc1 cpu-affinity [ 3, 4 ] We can no longer change it from QEMU directly: (qemu) qom-set tc1 cpu-affinity 1-2 Error: Setting CPU affinity failed: Operation not permitted Reviewed-by: Michal Privoznik <mprivozn@redhat.com> Message-Id: <20221014134720.168738-8-david@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com>	2022-10-27 11:01:09 +02:00
David Hildenbrand	e681645862	hostmem: Allow for specifying a ThreadContext for preallocation Let's allow for specifying a thread context via the "prealloc-context" property. When set, preallocation threads will be created via the thread context -- inheriting the same CPU affinity as the thread context. Pinning preallocation threads to CPUs can heavily increase performance in NUMA setups, because preallocation from a CPU close to the target NUMA node(s) is faster than preallocation from a CPU further remote, simply because of memory bandwidth for initializing memory with zeroes. This is especially relevant for very large VMs backed by huge/gigantic pages, whereby preallocation is mandatory. Reviewed-by: Michal Privoznik <mprivozn@redhat.com> Message-Id: <20221014134720.168738-7-david@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com>	2022-10-27 11:01:03 +02:00
David Hildenbrand	e04a34e55c	util: Make qemu_prealloc_mem() optionally consume a ThreadContext ... and implement it under POSIX. When a ThreadContext is provided, create new threads via the context such that these new threads obtain a properly configured CPU affinity. Reviewed-by: Michal Privoznik <mprivozn@redhat.com> Message-Id: <20221014134720.168738-6-david@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com>	2022-10-27 11:00:56 +02:00
David Hildenbrand	10218ae6d0	util: Add write-only "node-affinity" property for ThreadContext Let's make it easier to pin threads created via a ThreadContext to all host CPUs currently belonging to a given set of host NUMA nodes -- which is the common case. "node-affinity" is simply a shortcut for setting "cpu-affinity" manually to the list of host CPUs belonging to the set of host nodes. This property can only be written. A simple QEMU example to set the CPU affinity to host node 1 on a system with two nodes, 24 CPUs each, whereby odd-numbered host CPUs belong to host node 1: qemu-system-x86_64 -S \ -object thread-context,id=tc1,node-affinity=1 And we can query the cpu-affinity via HMP/QMP: (qemu) qom-get tc1 cpu-affinity [ 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47 ] We cannot query the node-affinity: (qemu) qom-get tc1 node-affinity Error: Insufficient permission to perform this operation But note that due to dynamic library loading this example will not work before we actually make use of thread_context_create_thread() in QEMU code, because the type will otherwise not get registered. We'll wire this up next to make it work. Note that if the host CPUs for a host node change due to CPU hot(un)plug CPU onlining/offlining (i.e., lscpu output changes) after the ThreadContext was started, the CPU affinity will not get updated. Reviewed-by: Michal Privoznik <mprivozn@redhat.com> Acked-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20221014134720.168738-5-david@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com>	2022-10-27 11:00:50 +02:00
David Hildenbrand	e2de2c497e	util: Introduce ThreadContext user-creatable object Setting the CPU affinity of QEMU threads is a bit problematic, because QEMU doesn't always have permissions to set the CPU affinity itself, for example, with seccomp after initialized by QEMU: -sandbox enable=on,resourcecontrol=deny General information about CPU affinities can be found in the man page of taskset: CPU affinity is a scheduler property that "bonds" a process to a given set of CPUs on the system. The Linux scheduler will honor the given CPU affinity and the process will not run on any other CPUs. While upper layers are already aware of how to handle CPU affinities for long-lived threads like iothreads or vcpu threads, especially short-lived threads, as used for memory-backend preallocation, are more involved to handle. These threads are created on demand and upper layers are not even able to identify and configure them. Introduce the concept of a ThreadContext, that is essentially a thread used for creating new threads. All threads created via that context thread inherit the configured CPU affinity. Consequently, it's sufficient to create a ThreadContext and configure it once, and have all threads created via that ThreadContext inherit the same CPU affinity. The CPU affinity of a ThreadContext can be configured two ways: (1) Obtaining the thread id via the "thread-id" property and setting the CPU affinity manually (e.g., via taskset). (2) Setting the "cpu-affinity" property and letting QEMU try to set the CPU affinity itself. This will fail if QEMU doesn't have permissions to do so anymore after seccomp was initialized. A simple QEMU example to set the CPU affinity to host CPU 0,1,6,7 would be: qemu-system-x86_64 -S \ -object thread-context,id=tc1,cpu-affinity=0-1,cpu-affinity=6-7 And we can query it via HMP/QMP: (qemu) qom-get tc1 cpu-affinity [ 0, 1, 6, 7 ] But note that due to dynamic library loading this example will not work before we actually make use of thread_context_create_thread() in QEMU code, because the type will otherwise not get registered. We'll wire this up next to make it work. In general, the interface behaves like pthread_setaffinity_np(): host CPU numbers that are currently not available are ignored; only host CPU numbers that are impossible with the current kernel will fail. If the list of host CPU numbers does not include a single CPU that is available, setting the CPU affinity will fail. A ThreadContext can be reused, simply by reconfiguring the CPU affinity. Note that the CPU affinity of previously created threads will not get adjusted. Reviewed-by: Michal Privoznik <mprivozn@redhat.com> Acked-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20221014134720.168738-4-david@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com>	2022-10-27 11:00:43 +02:00
David Hildenbrand	7730f32c28	util: Introduce qemu_thread_set_affinity() and qemu_thread_get_affinity() Usually, we let upper layers handle CPU pinning, because pthread_setaffinity_np() (-> sched_setaffinity()) is blocked via seccomp when starting QEMU with -sandbox enable=on,resourcecontrol=deny However, we want to configure and observe the CPU affinity of threads from QEMU directly in some cases when the sandbox option is either not enabled or not active yet. So let's add a way to configure CPU pinning via qemu_thread_set_affinity() and obtain CPU affinity via qemu_thread_get_affinity() and implement them under POSIX using pthread_setaffinity_np() + pthread_getaffinity_np(). Implementation under Windows is possible using SetProcessAffinityMask() + GetProcessAffinityMask(), however, that is left as future work. Reviewed-by: Michal Privoznik <mprivozn@redhat.com> Message-Id: <20221014134720.168738-3-david@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com>	2022-10-27 11:00:36 +02:00
David Hildenbrand	6556aadc18	util: Cleanup and rename os_mem_prealloc() Let's * give the function a "qemu_" style name * make sure the parameters in the implementation match the prototype * rename smp_cpus to max_threads, which makes the semantics of that parameter clearer ... and add a function documentation. Reviewed-by: Michal Privoznik <mprivozn@redhat.com> Message-Id: <20221014134720.168738-2-david@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com>	2022-10-27 11:00:28 +02:00
Thomas Huth	f7d81a351d	target/s390x: Fix emulation of the VISTR instruction The element size is encoded in the M3 field, not in the M4 field. Fixes: `be6324c6b7` ("s390x/tcg: Implement VECTOR ISOLATE STRING") Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1248 Message-Id: <20221012182755.1014853-3-thuth@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Thomas Huth <thuth@redhat.com>	2022-10-27 09:09:50 +02:00
Thomas Huth	117ea96089	tests/tcg/s390x: Test compiler flags only once, not every time This is common practice, see the Makefile.target in the aarch64 folder for example. Suggested-by: Alex Bennée <alex.bennee@linaro.org> Message-Id: <20221012182755.1014853-2-thuth@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Thomas Huth <thuth@redhat.com>	2022-10-27 09:09:50 +02:00
Nico Boehr	38621181ae	s390x/tod-kvm: don't save/restore the TOD in PV guests Under PV, the guest's TOD clock is under control of the ultravisor and the hypervisor cannot change it. With upcoming kernel changes[1], the Linux kernel will reject QEMU's request to adjust the guest's clock in this case, so don't attempt to set the clock. This avoids the following warning message on save/restore of a PV guest: warning: Unable to set KVM guest TOD clock: Operation not supported [1] https://lore.kernel.org/all/20221011160712.928239-2-nrb@linux.ibm.com/ Fixes: `c3347ed0d2` ("s390x: protvirt: Support unpack facility") Signed-off-by: Nico Boehr <nrb@linux.ibm.com> Message-Id: <20221012123229.1196007-1-nrb@linux.ibm.com> [thuth: Add curly braces] Signed-off-by: Thomas Huth <thuth@redhat.com>	2022-10-27 09:09:50 +02:00
Cornelia Huck	d001a81256	s390x: step down as general arch maintainer I haven't really been working on s390x for some time now, and in practice, I don't have time for it, either. So let's remove myself from this entry. Signed-off-by: Cornelia Huck <cohuck@redhat.com> Message-Id: <20221010160957.40779-1-cohuck@redhat.com> Signed-off-by: Thomas Huth <thuth@redhat.com>	2022-10-27 09:09:50 +02:00
Claudio Imbrenda	36c182bbe6	s390x/pv: remove semicolon from macro definition Remove spurious semicolon at the end of the macro s390_pv_cmd Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Acked-by: Cornelia Huck <cohuck@redhat.com> Message-Id: <20221010151041.89071-1-imbrenda@linux.ibm.com> Signed-off-by: Thomas Huth <thuth@redhat.com>	2022-10-27 09:09:50 +02:00
Markus Armbruster	0dddb0fc80	qerror: QERR_PERMISSION_DENIED is no longer used, drop Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20221012153801.2604340-5-armbru@redhat.com>	2022-10-27 07:57:18 +02:00
Markus Armbruster	8d09593314	qtest: Improve error messages when property can not be set right now When you try to set qtest property "log" while the qtest object is active, the error message blames "insufficient permission": $ qemu-system-x86_64 -S -display none -nodefaults -monitor stdio -chardev socket,id=chrqt0,path=qtest.socket,server=on,wait=off -object qtest,id=qt0,chardev=chrqt0,log=/dev/null QEMU 7.1.50 monitor - type 'help' for more information (qemu) qom-set /objects/qt0 log qtest.log Error: Insufficient permission to perform this operation This implies it could work with "sufficient permission". It can't. Change the error message to: Error: Property 'log' can not be set now Same for property "chardev". Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20221012153801.2604340-4-armbru@redhat.com>	2022-10-27 07:57:09 +02:00
Markus Armbruster	ff92444884	backends: Improve error messages when property can no longer be set When you try to set virtio-rng property "filename" after the backend has been completed with user_creatable_complete(), the error message blames "insufficient permission": $ qemu-system-x86_64 -S -display none -nodefaults -monitor stdio -object rng-random,id=rng0 -device virtio-rng,id=vrng0,rng=rng0 QEMU 7.1.50 monitor - type 'help' for more information (qemu) qom-set /objects/rng0 filename /dev/random Error: Insufficient permission to perform this operation This implies it could work with "sufficient permission". It can't. Change the error message to: Error: Property 'filename' can no longer be set Same for cryptodev-vhost-user property "chardev", rng-egd property "chardev", and vhost-user-backend property "chardev". Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20221012153801.2604340-3-armbru@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> [Commit message tidied up]	2022-10-27 07:56:56 +02:00
Markus Armbruster	3f7febc937	qom: Improve error messages when property has no getter or setter When you try to set a property that has no setter, the error message blames "insufficient permission": $ qemu-system-x86_64 -S -display none -nodefaults -monitor stdio QEMU 7.1.50 monitor - type 'help' for more information (qemu) qom-set /machine type q35 Error: Insufficient permission to perform this operation This implies it could work with "sufficient permission". It can't. Change the error message to: Error: Property 'pc-i440fx-7.2-machine.type' is not writable Do the same for getting a property that has no getter. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20221012153801.2604340-2-armbru@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com>	2022-10-27 07:54:45 +02:00
Muhammad Moinur Rahman	41bf9322a0	bsd-user: Catch up with sys/param.h requirement for machine/pmap.h Some versions of FreeBSD now require sys/param.h for machine/pmap.h on x86. Include them here to meet that requirement. It does no harm on older versions, so there's no need to #ifdef it. Signed-off-by: Muhammad Moinur Rahman <bofh@FreeBSD.org> Reviewed-by: John Baldwin <jhb@FreeBSD.org> Signed-off-by: Warner Losh <imp@bsdimp.com>	2022-10-26 14:09:17 -06:00
Stefan Hajnoczi	baf422684d	virtio-blk: use BDRV_REQ_REGISTERED_BUF optimization hint Register guest RAM using BlockRAMRegistrar and set the BDRV_REQ_REGISTERED_BUF flag so block drivers can optimize memory accesses in I/O requests. This is for vdpa-blk, vhost-user-blk, and other I/O interfaces that rely on DMA mapping/unmapping. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Message-id: 20221013185908.1297568-14-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2022-10-26 14:56:42 -04:00
Stefan Hajnoczi	c5640b3e2f	blkio: implement BDRV_REQ_REGISTERED_BUF optimization Avoid bounce buffers when QEMUIOVector elements are within previously registered bdrv_register_buf() buffers. The idea is that emulated storage controllers will register guest RAM using bdrv_register_buf() and set the BDRV_REQ_REGISTERED_BUF on I/O requests. Therefore no blkio_map_mem_region() calls are necessary in the performance-critical I/O code path. This optimization doesn't apply if the I/O buffer is internally allocated by QEMU (e.g. qcow2 metadata). There we still take the slow path because BDRV_REQ_REGISTERED_BUF is not set. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Message-id: 20221013185908.1297568-13-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2022-10-26 14:56:42 -04:00
Stefan Hajnoczi	701bff24de	stubs: add qemu_ram_block_from_host() and qemu_ram_get_fd() The blkio block driver will need to look up the file descriptor for a given pointer. This is possible in softmmu builds where the RAMBlock API is available for querying guest RAM. Add stubs so tools like qemu-img that link the block layer still build successfully. In this case there is no guest RAM but that is fine. Bounce buffers and their file descriptors will be allocated with libblkio's blkio_alloc_mem_region() so we won't rely on QEMU's qemu_ram_get_fd() in that case. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20221013185908.1297568-12-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2022-10-26 14:56:42 -04:00
Stefan Hajnoczi	6d998f3cbf	exec/cpu-common: add qemu_ram_get_fd() Add a function to get the file descriptor for a RAMBlock. Device emulation code typically uses the MemoryRegion APIs but vhost-style code may use RAMBlock directly for sharing guest memory with another process. This new API will be used by the libblkio block driver so it can share guest memory via .bdrv_register_buf(). Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20221013185908.1297568-11-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2022-10-26 14:56:42 -04:00
Stefan Hajnoczi	7f9241d805	block: add BlockRAMRegistrar Emulated devices and other BlockBackend users wishing to take advantage of blk_register_buf() all have the same repetitive job: register RAMBlocks with the BlockBackend using RAMBlockNotifier. Add a BlockRAMRegistrar API to do this. A later commit will use this from hw/block/virtio-blk.c. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Message-id: 20221013185908.1297568-10-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2022-10-26 14:56:42 -04:00
Stefan Hajnoczi	4fdd0a1a7e	numa: use QLIST_FOREACH_SAFE() for RAM block notifiers Make list traversal work when a callback removes a notifier mid-traversal. This is a cleanup to prevent bugs in the future. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Message-id: 20221013185908.1297568-9-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2022-10-26 14:56:42 -04:00
Stefan Hajnoczi	f4ec04bae9	block: return errors from bdrv_register_buf() Registering an I/O buffer is only a performance optimization hint but it is still necessary to return errors when it fails. Later patches will need to detect errors when registering buffers but an immediate advantage is that error_report() calls are no longer needed in block driver .bdrv_register_buf() functions. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20221013185908.1297568-8-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2022-10-26 14:56:42 -04:00
Stefan Hajnoczi	e8b6535533	block: add BDRV_REQ_REGISTERED_BUF request flag Block drivers may optimize I/O requests accessing buffers previously registered with bdrv_register_buf(). Checking whether all elements of a request's QEMUIOVector are within previously registered buffers is expensive, so we need a hint from the user to avoid costly checks. Add a BDRV_REQ_REGISTERED_BUF request flag to indicate that all QEMUIOVector elements in an I/O request are known to be within previously registered buffers. Always pass the flag through to driver read/write functions. There is little harm in passing the flag to a driver that does not use it. Passing the flag to drivers avoids changes across many block drivers. Filter drivers would need to explicitly support the flag and pass through to their children when the children support it. That's a lot of code changes and it's hard to remember to do that everywhere, leading to silent reduced performance when the flag is accidentally dropped. The only problematic scenario with the approach in this patch is when a driver passes the flag through to internal I/O requests that don't use the same I/O buffer. In that case the hint may be set when it should actually be clear. This is a rare case though so the risk is low. Some drivers have assert(!flags), which no longer works when BDRV_REQ_REGISTERED_BUF is passed in. These assertions aren't very useful anyway since the functions are called almost exclusively by bdrv_driver_preadv/pwritev() so if we get flags handling right there then the assertion is not needed. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20221013185908.1297568-7-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2022-10-26 14:56:42 -04:00
Stefan Hajnoczi	98b3ddc78b	block: use BdrvRequestFlags type for supported flag fields Use the enum type so GDB displays the enum members instead of printing a numeric constant. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Message-id: 20221013185908.1297568-6-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2022-10-26 14:56:42 -04:00
Stefan Hajnoczi	4f384011c5	block: pass size to bdrv_unregister_buf() The only implementor of bdrv_register_buf() is block/nvme.c, where the size is not needed when unregistering a buffer. This is because util/vfio-helpers.c can look up mappings by address. Future block drivers that implement bdrv_register_buf() may not be able to do their job given only the buffer address. Add a size argument to bdrv_unregister_buf(). Also document the assumptions about bdrv_register_buf()/bdrv_unregister_buf() calls. The same <host, size> values that were given to bdrv_register_buf() must be given to bdrv_unregister_buf(). gcc 11.2.1 emits a spurious warning that img_bench()'s buf_size local variable might be uninitialized, so it's necessary to silence the compiler. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Message-id: 20221013185908.1297568-5-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2022-10-26 14:56:42 -04:00
Stefan Hajnoczi	1f0fea38f4	numa: call ->ram_block_removed() in ram_block_notifer_remove() When a RAMBlockNotifier is added, ->ram_block_added() is called with all existing RAMBlocks. There is no equivalent ->ram_block_removed() call when a RAMBlockNotifier is removed. The util/vfio-helpers.c code (the sole user of RAMBlockNotifier) is fine with this asymmetry because it does not rely on RAMBlockNotifier for cleanup. It walks its internal list of DMA mappings and unmaps them by itself. Future users of RAMBlockNotifier may not have an internal data structure that records added RAMBlocks so they will need ->ram_block_removed() callbacks. This patch makes ram_block_notifier_remove() symmetric with respect to callbacks. Now util/vfio-helpers.c needs to unmap remaining DMA mappings after ram_block_notifier_remove() has been called. This is necessary since users like block/nvme.c may create additional DMA mappings that do not originate from the RAMBlockNotifier. Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20221013185908.1297568-4-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2022-10-26 14:56:42 -04:00
Stefan Hajnoczi	fd66dbd424	blkio: add libblkio block driver libblkio (https://gitlab.com/libblkio/libblkio/) is a library for high-performance disk I/O. It currently supports io_uring, virtio-blk-vhost-user, and virtio-blk-vhost-vdpa with additional drivers under development. One of the reasons for developing libblkio is that other applications besides QEMU can use it. This will be particularly useful for virtio-blk-vhost-user which applications may wish to use for connecting to qemu-storage-daemon. libblkio also gives us an opportunity to develop in Rust behind a C API that is easy to consume from QEMU. This commit adds io_uring, nvme-io_uring, virtio-blk-vhost-user, and virtio-blk-vhost-vdpa BlockDrivers to QEMU using libblkio. It will be easy to add other libblkio drivers since they will share the majority of code. For now I/O buffers are copied through bounce buffers if the libblkio driver requires it. Later commits add an optimization for pre-registering guest RAM to avoid bounce buffers. The syntax is: --blockdev io_uring,node-name=drive0,filename=test.img,readonly=on|off,cache.direct=on|off --blockdev nvme-io_uring,node-name=drive0,filename=/dev/ng0n1,readonly=on|off,cache.direct=on --blockdev virtio-blk-vhost-vdpa,node-name=drive0,path=/dev/vdpa...,readonly=on|off,cache.direct=on --blockdev virtio-blk-vhost-user,node-name=drive0,path=vhost-user-blk.sock,readonly=on|off,cache.direct=on Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Message-id: 20221013185908.1297568-3-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2022-10-26 14:56:42 -04:00
Stefan Hajnoczi	0421b563ab	coroutine: add flag to re-queue at front of CoQueue When a coroutine wakes up it may determine that it must re-queue. Normally coroutines are pushed onto the back of the CoQueue, but for fairness it may be necessary to push it onto the front of the CoQueue. Add a flag to specify that the coroutine should be pushed onto the front of the CoQueue. A later patch will use this to ensure fairness in the bounce buffer CoQueue used by the blkio BlockDriver. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20221013185908.1297568-2-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2022-10-26 14:56:42 -04:00
Bjørn Forsman	3845ffff8b	qga: add channel path to error messages It's useful to know which device was used if/when it fails. channel-win32.c had this since 2015, with `c69403fcd4` ("qemu-ga: debug printouts to help troubleshoot installation"), this brings channel-posix.c up to speed. Signed-off-by: Bjørn Forsman <bjorn.forsman@gmail.com> Reviewed-by: Konstantin Kostiuk <kkostiuk@redhat.com> Signed-off-by: Konstantin Kostiuk <kkostiuk@redhat.com>	2022-10-26 20:35:20 +03:00

99319 Commits All Branches Search