Commit Graph

143 Commits

Author SHA1 Message Date
David Woodhouse b46f9745b1 hw/xen: Implement GNTTABOP_query_size
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
2023-03-01 09:07:52 +00:00
David Woodhouse 28b7ae94a2 i386/xen: Implement HYPERVISOR_grant_table_op and GNTTABOP_[gs]et_verson
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
2023-03-01 09:07:52 +00:00
David Woodhouse a28b0fc034 hw/xen: Add xen_gnttab device for grant table emulation
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
2023-03-01 09:07:52 +00:00
David Woodhouse 6f43f2ee49 kvm/i386: Add xen-gnttab-max-frames property
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
2023-03-01 09:07:52 +00:00
David Woodhouse 2aff696b10 hw/xen: Support HVM_PARAM_CALLBACK_TYPE_PCI_INTX callback
The guest is permitted to specify an arbitrary domain/bus/device/function
and INTX pin from which the callback IRQ shall appear to have come.

In QEMU we can only easily do this for devices that actually exist, and
even that requires us "knowing" that it's a PCMachine in order to find
the PCI root bus — although that's OK really because it's always true.

We also don't get to get notified of INTX routing changes, because we
can't do that as a passive observer; if we try to register a notifier
it will overwrite any existing notifier callback on the device.

But in practice, guests using PCI_INTX will only ever use pin A on the
Xen platform device, and won't swizzle the INTX routing after they set
it up. So this is just fine.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
2023-03-01 09:07:50 +00:00
David Woodhouse ddf0fd9ae1 hw/xen: Support HVM_PARAM_CALLBACK_TYPE_GSI callback
The GSI callback (and later PCI_INTX) is a level triggered interrupt. It
is asserted when an event channel is delivered to vCPU0, and is supposed
to be cleared when the vcpu_info->evtchn_upcall_pending field for vCPU0
is cleared again.

Thankfully, Xen does *not* assert the GSI if the guest sets its own
evtchn_upcall_pending field; we only need to assert the GSI when we
have delivered an event for ourselves. So that's the easy part, kind of.

There's a slight complexity in that we need to hold the BQL before we
can call qemu_set_irq(), and we definitely can't do that while holding
our own port_lock (because we'll need to take that from the qemu-side
functions that the PV backend drivers will call). So if we end up
wanting to set the IRQ in a context where we *don't* already hold the
BQL, defer to a BH.

However, we *do* need to poll for the evtchn_upcall_pending flag being
cleared. In an ideal world we would poll that when the EOI happens on
the PIC/IOAPIC. That's how it works in the kernel with the VFIO eventfd
pairs — one is used to trigger the interrupt, and the other works in the
other direction to 'resample' on EOI, and trigger the first eventfd
again if the line is still active.

However, QEMU doesn't seem to do that. Even VFIO level interrupts seem
to be supported by temporarily unmapping the device's BARs from the
guest when an interrupt happens, then trapping *all* MMIO to the device
and sending the 'resample' event on *every* MMIO access until the IRQ
is cleared! Maybe in future we'll plumb the 'resample' concept through
QEMU's irq framework but for now we'll do what Xen itself does: just
check the flag on every vmexit if the upcall GSI is known to be
asserted.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
2023-03-01 09:06:44 +00:00
David Woodhouse a15b10978f hw/xen: Implement EVTCHNOP_reset
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
2023-03-01 08:22:50 +00:00
David Woodhouse 306670461b hw/xen: Implement EVTCHNOP_bind_vcpu
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
2023-03-01 08:22:50 +00:00
David Woodhouse 8432788104 hw/xen: Implement EVTCHNOP_bind_interdomain
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
2023-03-01 08:22:50 +00:00
David Woodhouse e1db61b87b hw/xen: Implement EVTCHNOP_alloc_unbound
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
2023-03-01 08:22:50 +00:00
David Woodhouse cf7679abdd hw/xen: Implement EVTCHNOP_send
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
2023-03-01 08:22:50 +00:00
David Woodhouse f5417856d2 hw/xen: Implement EVTCHNOP_bind_ipi
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
2023-03-01 08:22:50 +00:00
David Woodhouse c723d4c15e hw/xen: Implement EVTCHNOP_bind_virq
Add the array of virq ports to each vCPU so that we can deliver timers,
debug ports, etc. Global virqs are allocated against vCPU 0 initially,
but can be migrated to other vCPUs (when we implement that).

The kernel needs to know about VIRQ_TIMER in order to accelerate timers,
so tell it via KVM_XEN_VCPU_ATTR_TYPE_TIMER. Also save/restore the value
of the singleshot timer across migration, as the kernel will handle the
hypercalls automatically now.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
2023-03-01 08:22:50 +00:00
David Woodhouse 190cc3c0ed hw/xen: Implement EVTCHNOP_unmask
This finally comes with a mechanism for actually injecting events into
the guest vCPU, with all the atomic-test-and-set that's involved in
setting the bit in the shinfo, then the index in the vcpu_info, and
injecting either the lapic vector as MSI, or letting KVM inject the
bare vector.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
2023-03-01 08:22:50 +00:00
David Woodhouse 83eb581134 hw/xen: Implement EVTCHNOP_close
It calls an internal close_port() helper which will also be used from
EVTCHNOP_reset and will actually do the work to disconnect/unbind a port
once any of that is actually implemented in the first place.

That in turn calls a free_port() internal function which will be in
error paths after allocation.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
2023-03-01 08:22:50 +00:00
David Woodhouse 4858ba2065 hw/xen: Implement EVTCHNOP_status
This adds the basic structure for maintaining the port table and reporting
the status of ports therein.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
2023-03-01 08:22:50 +00:00
David Woodhouse 27d4075dd8 i386/xen: Add support for Xen event channel delivery to vCPU
The kvm_xen_inject_vcpu_callback_vector() function will either deliver
the per-vCPU local APIC vector (as an MSI), or just kick the vCPU out
of the kernel to trigger KVM's automatic delivery of the global vector.
Support for asserting the GSI/PCI_INTX callbacks will come later.

Also add kvm_xen_get_vcpu_info_hva() which returns the vcpu_info of
a given vCPU.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
2023-03-01 08:22:50 +00:00
David Woodhouse 91cce75617 hw/xen: Add xen_evtchn device for event channel emulation
Include basic support for setting HVM_PARAM_CALLBACK_IRQ to the global
vector method HVM_PARAM_CALLBACK_TYPE_VECTOR, which is handled in-kernel
by raising the vector whenever the vCPU's vcpu_info->evtchn_upcall_pending
flag is set.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
2023-03-01 08:22:50 +00:00
Ankur Arora 5dbcd01a8d i386/xen: implement HVMOP_set_param
This is the hook for adding the HVM_PARAM_CALLBACK_IRQ parameter in a
subsequent commit.

Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
[dwmw2: Split out from another commit]
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
2023-03-01 08:22:50 +00:00
Ankur Arora 105b47fdf2 i386/xen: implement HVMOP_set_evtchn_upcall_vector
The HVMOP_set_evtchn_upcall_vector hypercall sets the per-vCPU upcall
vector, to be delivered to the local APIC just like an MSI (with an EOI).

This takes precedence over the system-wide delivery method set by the
HVMOP_set_param hypercall with HVM_PARAM_CALLBACK_IRQ. It's used by
Windows and Xen (PV shim) guests but normally not by Linux.

Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
[dwmw2: Rework for upstream kernel changes and split from HVMOP_set_param]
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
2023-03-01 08:22:50 +00:00
Joao Martins 3b06f29b24 i386/xen: implement HYPERVISOR_event_channel_op
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
[dwmw2: Ditch event_channel_op_compat which was never available to HVM guests]
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
2023-03-01 08:22:50 +00:00
Joao Martins 5092db87e4 i386/xen: handle VCPUOP_register_runstate_memory_area
Allow guest to setup the vcpu runstates which is used as
steal clock.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
2023-03-01 08:22:50 +00:00
Joao Martins f068930277 i386/xen: handle VCPUOP_register_vcpu_time_info
In order to support Linux vdso in Xen.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
2023-03-01 08:22:49 +00:00
Joao Martins c345104cd1 i386/xen: handle VCPUOP_register_vcpu_info
Handle the hypercall to set a per vcpu info, and also wire up the default
vcpu_info in the shared_info page for the first 32 vCPUs.

To avoid deadlock within KVM a vCPU thread must set its *own* vcpu_info
rather than it being set from the context in which the hypercall is
invoked.

Add the vcpu_info (and default) GPA to the vmstate_x86_cpu for migration,
and restore it in kvm_arch_put_registers() appropriately.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
2023-03-01 08:22:49 +00:00
Joao Martins d70bd6a485 i386/xen: implement HYPERVISOR_vcpu_op
This is simply when guest tries to register a vcpu_info
and since vcpu_info placement is optional in the minimum ABI
therefore we can just fail with -ENOSYS

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
2023-03-01 08:22:49 +00:00
Joao Martins 671bfdcd47 i386/xen: implement HYPERVISOR_hvm_op
This is when guest queries for support for HVMOP_pagetable_dying.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
2023-03-01 08:22:49 +00:00
David Woodhouse 782a79601e i386/xen: implement XENMEM_add_to_physmap_batch
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
2023-03-01 08:22:49 +00:00
Joao Martins fb0fd2ce38 i386/xen: implement HYPERVISOR_memory_op
Specifically XENMEM_add_to_physmap with space XENMAPSPACE_shared_info to
allow the guest to set its shared_info page.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
[dwmw2: Use the xen_overlay device, add compat support]
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
2023-03-01 08:22:49 +00:00
David Woodhouse 110a0ea59f i386/xen: manage and save/restore Xen guest long_mode setting
Xen will "latch" the guest's 32-bit or 64-bit ("long mode") setting when
the guest writes the MSR to fill in the hypercall page, or when the guest
sets the event channel callback in HVM_PARAM_CALLBACK_IRQ.

KVM handles the former and sets the kernel's long_mode flag accordingly.
The latter will be handled in userspace. Keep them in sync by noticing
when a hypercall is made in a mode that doesn't match qemu's idea of
the guest mode, and resyncing from the kernel. Do that same sync right
before serialization too, in case the guest has set the hypercall page
but hasn't yet made a system call.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
2023-03-01 08:22:49 +00:00
David Woodhouse c789b9ef5f i386/xen: Implement SCHEDOP_poll and SCHEDOP_yield
They both do the same thing and just call sched_yield. This is enough to
stop the Linux guest panicking when running on a host kernel which doesn't
intercept SCHEDOP_poll and lets it reach userspace.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
2023-03-01 08:22:49 +00:00
Joao Martins 79b7067dc6 i386/xen: implement HYPERVISOR_sched_op, SCHEDOP_shutdown
It allows to shutdown itself via hypercall with any of the 3 reasons:
  1) self-reboot
  2) shutdown
  3) crash

Implementing SCHEDOP_shutdown sub op let us handle crashes gracefully rather
than leading to triple faults if it remains unimplemented.

In addition, the SHUTDOWN_soft_reset reason is used for kexec, to reset
Xen shared pages and other enlightenments and leave a clean slate for the
new kernel without the hypervisor helpfully writing information at
unexpected addresses.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
[dwmw2: Ditch sched_op_compat which was never available for HVM guests,
        Add SCHEDOP_soft_reset]
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
2023-03-01 08:22:49 +00:00
Joao Martins bedcc13924 i386/xen: implement HYPERVISOR_xen_version
This is just meant to serve as an example on how we can implement
hypercalls. xen_version specifically since Qemu does all kind of
feature controllability. So handling that here seems appropriate.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
[dwmw2: Implement kvm_gva_rw() safely]
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
2023-03-01 08:22:49 +00:00
Joao Martins 55a3f666b4 i386/xen: handle guest hypercalls
This means handling the new exit reason for Xen but still
crashing on purpose. As we implement each of the hypercalls
we will then return the right return code.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
[dwmw2: Add CPL to hypercall tracing, disallow hypercalls from CPL > 0]
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
2023-03-01 08:22:49 +00:00
David Woodhouse 5e691a955a i386/kvm: Set Xen vCPU ID in KVM
There are (at least) three different vCPU ID number spaces. One is the
internal KVM vCPU index, based purely on which vCPU was chronologically
created in the kernel first. If userspace threads are all spawned and
create their KVM vCPUs in essentially random order, then the KVM indices
are basically random too.

The second number space is the APIC ID space, which is consistent and
useful for referencing vCPUs. MSIs will specify the target vCPU using
the APIC ID, for example, and the KVM Xen APIs also take an APIC ID
from userspace whenever a vCPU needs to be specified (as opposed to
just using the appropriate vCPU fd).

The third number space is not normally relevant to the kernel, and is
the ACPI/MADT/Xen CPU number which corresponds to cs->cpu_index. But
Xen timer hypercalls use it, and Xen timer hypercalls *really* want
to be accelerated in the kernel rather than handled in userspace, so
the kernel needs to be told.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
2023-03-01 08:22:49 +00:00
Joao Martins f66b8a83c5 i386/kvm: handle Xen HVM cpuid leaves
Introduce support for emulating CPUID for Xen HVM guests. It doesn't make
sense to advertise the KVM leaves to a Xen guest, so do Xen unconditionally
when the xen-version machine property is set.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
[dwmw2: Obtain xen_version from KVM property, make it automatic]
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
2023-03-01 08:22:49 +00:00
David Woodhouse 61491cf441 i386/kvm: Add xen-version KVM accelerator property and init KVM Xen support
This just initializes the basic Xen support in KVM for now. Only permitted
on TYPE_PC_MACHINE because that's where the sysbus devices for Xen heap
overlay, event channel, grant tables and other stuff will exist. There's
no point having the basic hypercall support if nothing else works.

Provide sysemu/kvm_xen.h and a kvm_xen_get_caps() which will be used
later by support devices.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
2023-03-01 08:22:49 +00:00
Paolo Bonzini 3023c9b4d1 target/i386: KVM: allow fast string operations if host supports them
These are just a flag that documents the performance characteristic of
an instruction; it needs no hypervisor support.  So include them even
if KVM does not show them.  In particular, FZRM/FSRS/FSRC have only
been added very recently, but they are available on Sapphire Rapids
processors.

Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-02-27 18:44:57 +01:00
Markus Armbruster e2c1c34f13 include/block: Untangle inclusion loops
We have two inclusion loops:

       block/block.h
    -> block/block-global-state.h
    -> block/block-common.h
    -> block/blockjob.h
    -> block/block.h

       block/block.h
    -> block/block-io.h
    -> block/block-common.h
    -> block/blockjob.h
    -> block/block.h

I believe these go back to Emanuele's reorganization of the block API,
merged a few months ago in commit d7e2fe4aac.

Fortunately, breaking them is merely a matter of deleting unnecessary
includes from headers, and adding them back in places where they are
now missing.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20221221133551.3967339-2-armbru@redhat.com>
2023-01-20 07:24:28 +01:00
Markus Armbruster d1c81c3496 qapi: Use returned bool to check for failure (again)
Commit 012d4c96e2 changed the visitor functions taking Error ** to
return bool instead of void, and the commits following it used the new
return value to simplify error checking.  Since then a few more uses
in need of the same treatment crept in.  Do that.  All pretty
mechanical except for

* balloon_stats_get_all()

  This is basically the same transformation commit 012d4c96e2 applied
  to the virtual walk example in include/qapi/visitor.h.

* set_max_queue_size()

  Additionally replace "goto end of function" by return.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20221121085054.683122-10-armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2022-12-14 16:19:35 +01:00
Zeng Guang 19e2a9fb9d target/i386: Set maximum APIC ID to KVM prior to vCPU creation
Specify maximum possible APIC ID assigned for current VM session to KVM
prior to the creation of vCPUs. By this setting, KVM can set up VM-scoped
data structure indexed by the APIC ID, e.g. Posted-Interrupt Descriptor
pointer table to support Intel IPI virtualization, with the most optimal
memory footprint.

It can be achieved by calling KVM_ENABLE_CAP for KVM_CAP_MAX_VCPU_ID
capability once KVM has enabled it. Ignoring the return error if KVM
doesn't support this capability yet.

Signed-off-by: Zeng Guang <guang.zeng@intel.com>
Acked-by: Peter Xu <peterx@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20220825025246.26618-1-guang.zeng@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-31 09:46:34 +01:00
Markus Armbruster 0a553c12c7 Drop useless casts from g_malloc() & friends to pointer
These memory allocation functions return void *, and casting to
another pointer type is useless clutter.  Drop these casts.

If you really want another pointer type, consider g_new().

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20220923120025.448759-3-armbru@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2022-10-22 23:15:40 +02:00
Maciej S. Szmigiero ec19444a53 hyperv: fix SynIC SINT assertion failure on guest reset
Resetting a guest that has Hyper-V VMBus support enabled triggers a QEMU
assertion failure:
hw/hyperv/hyperv.c:131: synic_reset: Assertion `QLIST_EMPTY(&synic->sint_routes)' failed.

This happens both on normal guest reboot or when using "system_reset" HMP
command.

The failing assertion was introduced by commit 64ddecc88b ("hyperv: SControl is optional to enable SynIc")
to catch dangling SINT routes on SynIC reset.

The root cause of this problem is that the SynIC itself is reset before
devices using SINT routes have chance to clean up these routes.

Since there seems to be no existing mechanism to force reset callbacks (or
methods) to be executed in specific order let's use a similar method that
is already used to reset another interrupt controller (APIC) after devices
have been reset - by invoking the SynIC reset from the machine reset
handler via a new x86_cpu_after_reset() function co-located with
the existing x86_cpu_reset() in target/i386/cpu.c.
Opportunistically move the APIC reset handler there, too.

Fixes: 64ddecc88b ("hyperv: SControl is optional to enable SynIc") # exposed the bug
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Message-Id: <cb57cee2e29b20d06f81dce054cbcea8b5d497e8.1664552976.git.maciej.szmigiero@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18 13:58:04 +02:00
Alexander Graf 37656470f6 KVM: x86: Implement MSR_CORE_THREAD_COUNT MSR
The MSR_CORE_THREAD_COUNT MSR describes CPU package topology, such as number
of threads and cores for a given package. This is information that QEMU has
readily available and can provide through the new user space MSR deflection
interface.

This patch propagates the existing hvf logic from patch 027ac0cb51
("target/i386/hvf: add rdmsr 35H MSR_CORE_THREAD_COUNT") to KVM.

Signed-off-by: Alexander Graf <agraf@csgraf.de>
Message-Id: <20221004225643.65036-4-agraf@csgraf.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-11 09:36:01 +02:00
Alexander Graf 860054d8ce i386: kvm: Add support for MSR filtering
KVM has grown support to deflect arbitrary MSRs to user space since
Linux 5.10. For now we don't expect to make a lot of use of this
feature, so let's expose it the easiest way possible: With up to 16
individually maskable MSRs.

This patch adds a kvm_filter_msr() function that other code can call
to install a hook on KVM MSR reads or writes.

Signed-off-by: Alexander Graf <agraf@csgraf.de>
Message-Id: <20221004225643.65036-3-agraf@csgraf.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-11 09:36:01 +02:00
Chenyi Qiang e2e69f6bb9 i386: add notify VM exit support
There are cases that malicious virtual machine can cause CPU stuck (due
to event windows don't open up), e.g., infinite loop in microcode when
nested #AC (CVE-2015-5307). No event window means no event (NMI, SMI and
IRQ) can be delivered. It leads the CPU to be unavailable to host or
other VMs. Notify VM exit is introduced to mitigate such kind of
attacks, which will generate a VM exit if no event window occurs in VM
non-root mode for a specified amount of time (notify window).

A new KVM capability KVM_CAP_X86_NOTIFY_VMEXIT is exposed to user space
so that the user can query the capability and set the expected notify
window when creating VMs. The format of the argument when enabling this
capability is as follows:
  Bit 63:32 - notify window specified in qemu command
  Bit 31:0  - some flags (e.g. KVM_X86_NOTIFY_VMEXIT_ENABLED is set to
              enable the feature.)

Users can configure the feature by a new (x86 only) accel property:
    qemu -accel kvm,notify-vmexit=run|internal-error|disable,notify-window=n

The default option of notify-vmexit is run, which will enable the
capability and do nothing if the exit happens. The internal-error option
raises a KVM internal error if it happens. The disable option does not
enable the capability. The default value of notify-window is 0. It is valid
only when notify-vmexit is not disabled. The valid range of notify-window
is non-negative. It is even safe to set it to zero since there's an
internal hardware threshold to be added to ensure no false positive.

Because a notify VM exit may happen with VM_CONTEXT_INVALID set in exit
qualification (no cases are anticipated that would set this bit), which
means VM context is corrupted. It would be reflected in the flags of
KVM_EXIT_NOTIFY exit. If KVM_NOTIFY_CONTEXT_INVALID bit is set, raise a KVM
internal error unconditionally.

Acked-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
Message-Id: <20220929072014.20705-5-chenyi.qiang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-11 09:36:00 +02:00
Paolo Bonzini 3dba0a335c kvm: allow target-specific accelerator properties
Several hypervisor capabilities in KVM are target-specific.  When exposed
to QEMU users as accelerator properties (i.e. -accel kvm,prop=value), they
should not be available for all targets.

Add a hook for targets to add their own properties to -accel kvm, for
now no such property is defined.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220929072014.20705-3-chenyi.qiang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-10 09:23:16 +02:00
Chenyi Qiang 12f89a39cf i386: kvm: extend kvm_{get, put}_vcpu_events to support pending triple fault
For the direct triple faults, i.e. hardware detected and KVM morphed
to VM-Exit, KVM will never lose them. But for triple faults sythesized
by KVM, e.g. the RSM path, if KVM exits to userspace before the request
is serviced, userspace could migrate the VM and lose the triple fault.

A new flag KVM_VCPUEVENT_VALID_TRIPLE_FAULT is defined to signal that
the event.triple_fault_pending field contains a valid state if the
KVM_CAP_X86_TRIPLE_FAULT_EVENT capability is enabled.

Acked-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
Message-Id: <20220929072014.20705-2-chenyi.qiang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-10 09:23:16 +02:00
Stefan Hajnoczi fafd35a6da Pull request trivial patches branch 20220930-v2
-----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEEzS913cjjpNwuT1Fz8ww4vT8vvjwFAmM7XoISHGxhdXJlbnRA
 dml2aWVyLmV1AAoJEPMMOL0/L748D/0QAKbYtTWjhFPeapjZVoTv13YrTvczWrcF
 omL6IZivVq0t7hun4iem0DwmvXJELMGexEOTvEJOzM19IIlvvwvOsI8xnxpcMnEY
 6GKVbs53Ba0bg2yh7Dll2W9jkou9eX27DwUHMVF8KX7qqsbU+WyD/vdGZitgGt+T
 8yna7kzVvNVsdB3+DbIatI5RzzHeu4OqeuH/WCtAyzCaLB64UYTcHprskxIp4+wp
 dR+EUSoDEr9Qx4PC+uVEsTFK1zZjyAYNoNIkh6fhlkRvDJ1uA75m3EJ57P8xPPqe
 VbVkPMKi0d4c52m6XvLsQhyYryLx/qLLUAkJWVpY66aHcapYbZAEAfZmNGTQLrOJ
 qIOJzIkOdU6l3pRgXVdVCgkHRc2HETwET2LyVbNkUz/vBlW2wOZQbZFbezComael
 bQ/gNBYqP+eOGnZzeWbKBGHr/9QDBClNufidIMC+sOiUw0iSifzjkFwvH7IElx6K
 EQCOSV6pOhKVlinTpmBbk1XD3xDkQ7ZidiLT9g+P1c8dExrXBhWOnfUHueISb8+s
 KKMozuxQ/6/3c/DP5hwI9cKPEWEbqJfq1kMuxIvEivKGwUIqX2yq4VJ+hSlYJ+CW
 nGjXZldtf4KwH+cTsxyPmdZRR5Q7+ODr5Xo7GNvEKBuDsHs7uUl1c3vvOykQgje9
 +dyJR6TfbQWn
 =aK29
 -----END PGP SIGNATURE-----

Merge tag 'trivial-branch-for-7.2-pull-request' of https://gitlab.com/laurent_vivier/qemu into staging

Pull request trivial patches branch 20220930-v2

# -----BEGIN PGP SIGNATURE-----
#
# iQJGBAABCAAwFiEEzS913cjjpNwuT1Fz8ww4vT8vvjwFAmM7XoISHGxhdXJlbnRA
# dml2aWVyLmV1AAoJEPMMOL0/L748D/0QAKbYtTWjhFPeapjZVoTv13YrTvczWrcF
# omL6IZivVq0t7hun4iem0DwmvXJELMGexEOTvEJOzM19IIlvvwvOsI8xnxpcMnEY
# 6GKVbs53Ba0bg2yh7Dll2W9jkou9eX27DwUHMVF8KX7qqsbU+WyD/vdGZitgGt+T
# 8yna7kzVvNVsdB3+DbIatI5RzzHeu4OqeuH/WCtAyzCaLB64UYTcHprskxIp4+wp
# dR+EUSoDEr9Qx4PC+uVEsTFK1zZjyAYNoNIkh6fhlkRvDJ1uA75m3EJ57P8xPPqe
# VbVkPMKi0d4c52m6XvLsQhyYryLx/qLLUAkJWVpY66aHcapYbZAEAfZmNGTQLrOJ
# qIOJzIkOdU6l3pRgXVdVCgkHRc2HETwET2LyVbNkUz/vBlW2wOZQbZFbezComael
# bQ/gNBYqP+eOGnZzeWbKBGHr/9QDBClNufidIMC+sOiUw0iSifzjkFwvH7IElx6K
# EQCOSV6pOhKVlinTpmBbk1XD3xDkQ7ZidiLT9g+P1c8dExrXBhWOnfUHueISb8+s
# KKMozuxQ/6/3c/DP5hwI9cKPEWEbqJfq1kMuxIvEivKGwUIqX2yq4VJ+hSlYJ+CW
# nGjXZldtf4KwH+cTsxyPmdZRR5Q7+ODr5Xo7GNvEKBuDsHs7uUl1c3vvOykQgje9
# +dyJR6TfbQWn
# =aK29
# -----END PGP SIGNATURE-----
# gpg: Signature made Mon 03 Oct 2022 18:13:22 EDT
# gpg:                using RSA key CD2F75DDC8E3A4DC2E4F5173F30C38BD3F2FBE3C
# gpg:                issuer "laurent@vivier.eu"
# gpg: Good signature from "Laurent Vivier <lvivier@redhat.com>" [full]
# gpg:                 aka "Laurent Vivier <laurent@vivier.eu>" [full]
# gpg:                 aka "Laurent Vivier (Red Hat) <lvivier@redhat.com>" [full]
# Primary key fingerprint: CD2F 75DD C8E3 A4DC 2E4F  5173 F30C 38BD 3F2F BE3C

* tag 'trivial-branch-for-7.2-pull-request' of https://gitlab.com/laurent_vivier/qemu:
  docs: Update TPM documentation for usage of a TPM 2
  Use g_new() & friends where that makes obvious sense
  Drop superfluous conditionals around g_free()
  block/qcow2-bitmap: Add missing cast to silent GCC error
  checkpatch: ignore target/hexagon/imported/* files
  mem/cxl_type3: fix GPF DVSEC
  .gitignore: add .cache/ to .gitignore
  hw/virtio/vhost-shadow-virtqueue: Silence GCC error "maybe-uninitialized"

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2022-10-04 14:04:18 -04:00
Markus Armbruster 76eb88b12b Drop superfluous conditionals around g_free()
There is no need to guard g_free(P) with if (P): g_free(NULL) is safe.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20220923090428.93529-1-armbru@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2022-10-04 00:10:11 +02:00
Ray Zhang c4ef867f29 target/i386/kvm: fix kvmclock_current_nsec: Assertion `time.tsc_timestamp <= migration_tsc' failed
New KVM_CLOCK flags were added in the kernel.(c68dc1b577eabd5605c6c7c08f3e07ae18d30d5d)
```
+ #define KVM_CLOCK_VALID_FLAGS						\
+	(KVM_CLOCK_TSC_STABLE | KVM_CLOCK_REALTIME | KVM_CLOCK_HOST_TSC)

	case KVM_CAP_ADJUST_CLOCK:
-		r = KVM_CLOCK_TSC_STABLE;
+		r = KVM_CLOCK_VALID_FLAGS;
```

kvm_has_adjust_clock_stable needs to handle additional flags,
so that s->clock_is_reliable can be true and kvmclock_current_nsec doesn't need to be called.

Signed-off-by: Ray Zhang <zhanglei002@gmail.com>
Message-Id: <20220922100523.2362205-1-zhanglei002@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-01 21:16:36 +02:00