Stack accesses should be explicit and use the privilege level of the
target stack. This ensures that SMAP is not applied when the target
stack is in ring 3.
This fixes a bug wherein i386/tcg assumed that an interrupt return, or a
far call using the CALL or JMP instruction, was always going from kernel
or user mode to kernel mode when using a call gate. This assumption is
violated if the call gate has a DPL that is greater than 0.
Analyzed-by: Robert R. Henry <rrh.henry@gmail.com>
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/249
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Now all decoding has been done before any code generation.
There is no need anymore to save and restore cc_op* and
pc_save but, for the time being, assert that this is indeed
the case.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Group them so that it is easier to figure out which two-byte opcodes to
tackle together.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
It is already checked before getting there.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The gen_cmpxchg8b and gen_cmpxchg16b functions even have the correct
prototype already; the only thing that needs to be done is removing the
gen_lea_modrm() call.
This moves the last LOCK-enabled instructions to the new decoder. It is
now possible to assume that gen_multi0F is called only after checking
that PREFIX_LOCK was not specified.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
There are now relatively few unconverted opcodes in translate.c (there
are 13 of them including 8 for x87), and all of them have the same
format with a mod/rm byte and no immediate. A good next step is
to remove the early bail out to disas_insn_x87/disas_insn_old,
instead giving these legacy translator functions the same prototype
as the other gen_* functions.
To do this, the X86DecodeInsn can be passed down to the places that
used to fetch address bytes from the instruction stream. To make
sure that everything is done cleanly, the CPUX86State* argument is
removed.
As part of the unification, the gen_lea_modrm() name is now free,
so rename gen_load_ea() to gen_lea_modrm(). This is as good a name
and it makes the changes to translate.c easier to review.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Code generation was rewritten; it reuses the same trick to use the
CC_OP_SAR values for cc_op, but it tries to use CC_OP_ADCX or CC_OP_ADCOX
instead of CC_OP_EFLAGS. This is a tiny bit more efficient in the
common case where only CF is checked in the resulting flags.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
'hyperv_synic' test from KVM unittests was observed to be flaky on certain
hardware (hangs sometimes). Debugging shows that the problem happens in
hyperv_sint_route_new() when the test tries to set up a new SynIC
route. The function bails out on:
if (!synic->sctl_enabled) {
goto cleanup;
}
but the test writes to HV_X64_MSR_SCONTROL just before it starts
establishing SINT routes. Further investigation shows that
synic_update() (called from async_synic_update()) happens after the SINT
setup attempt and not before. Apparently, the comment before
async_safe_run_on_cpu() in kvm_hv_handle_exit() does not correctly describe
the guarantees async_safe_run_on_cpu() gives. In particular, async worked
added to a CPU is actually processed from qemu_wait_io_event() which is not
always called before KVM_RUN, i.e. kvm_cpu_exec() checks whether an exit
request is pending for a CPU and if not, keeps running the vCPU until it
meets an exit it can't handle internally. Hyper-V specific MSR writes are
not automatically trigger an exit.
Fix the issue by simply raising an exit request for the vCPU where SynIC
update was queued. This is not a performance critical path as SynIC state
does not get updated so often (and async_safe_run_on_cpu() is a big hammer
anyways).
Reported-by: Jan Richter <jarichte@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Link: https://lore.kernel.org/r/20240917160051.2637594-4-vkuznets@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Windows with Hyper-V role enabled doesn't boot with 'hv-passthrough' when
no debugger is configured, this significantly limits the usefulness of the
feature as there's no support for subtracting Hyper-V features from CPU
flags at this moment (e.g. "-cpu host,hv-passthrough,-hv-syndbg" does not
work). While this is also theoretically fixable, 'hv-syndbg' is likely
very special and unneeded in the default set. Genuine Hyper-V doesn't seem
to enable it either.
Introduce 'skip_passthrough' flag to 'kvm_hyperv_properties' and use it as
one-off to skip 'hv-syndbg' when enabling features in 'hv-passthrough'
mode. Note, "-cpu host,hv-passthrough,hv-syndbg" can still be used if
needed.
As both 'hv-passthrough' and 'hv-syndbg' are debug features, the change
should not have any effect on production environments.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Link: https://lore.kernel.org/r/20240917160051.2637594-3-vkuznets@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Putting HYPERV_FEAT_SYNDBG entry under "#ifdef CONFIG_SYNDBG" in
'kvm_hyperv_properties' array is wrong: as HYPERV_FEAT_SYNDBG is not
the highest feature number, the result is an empty (zeroed) entry in
the array (and not a skipped entry!). hyperv_feature_supported() is
designed to check that all CPUID bits are set but for a zeroed
feature in 'kvm_hyperv_properties' it returns 'true' so QEMU considers
HYPERV_FEAT_SYNDBG as always supported, regardless of whether KVM host
actually supports it.
To fix the issue, leave HYPERV_FEAT_SYNDBG's definition in
'kvm_hyperv_properties' array, there's nothing wrong in having it defined
even when 'CONFIG_SYNDBG' is not set. Instead, put "hv-syndbg" CPU property
under '#ifdef CONFIG_SYNDBG' to alter the existing behavior when the flag
is silently skipped in !CONFIG_SYNDBG builds.
Leave an 'assert' sentinel in hyperv_feature_supported() making sure there
are no 'holes' or improperly defined features in 'kvm_hyperv_properties'.
Fixes: d8701185f4 ("hw: hyperv: Initial commit for Synthetic Debugging device")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Link: https://lore.kernel.org/r/20240917160051.2637594-2-vkuznets@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
KVM commit 191c8137a939 ("x86/kvm: Implement HWCR support")
introduced support for emulating HWCR MSR.
Add support for QEMU to save/load this MSR for migration purposes.
Signed-off-by: Gao Shiyuan <gaoshiyuan@baidu.com>
Signed-off-by: Wang Liang <wangliang44@baidu.com>
Link: https://lore.kernel.org/r/20241009095109.66843-1-gaoshiyuan@baidu.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Following 5 bits in CPUID.7.2.EDX are supported by KVM. Add their
supports in QEMU. Each of them indicates certain bits of IA32_SPEC_CTRL
are supported. Those bits can control CPU speculation behavior which can
be used to defend against side-channel attacks.
bit0: intel-psfd
if 1, indicates bit 7 of the IA32_SPEC_CTRL MSR is supported. Bit 7 of
this MSR disables Fast Store Forwarding Predictor without disabling
Speculative Store Bypass
bit1: ipred-ctrl
If 1, indicates bits 3 and 4 of the IA32_SPEC_CTRL MSR are supported.
Bit 3 of this MSR enables IPRED_DIS control for CPL3. Bit 4 of this
MSR enables IPRED_DIS control for CPL0/1/2
bit2: rrsba-ctrl
If 1, indicates bits 5 and 6 of the IA32_SPEC_CTRL MSR are supported.
Bit 5 of this MSR disables RRSBA behavior for CPL3. Bit 6 of this MSR
disables RRSBA behavior for CPL0/1/2
bit3: ddpd-u
If 1, indicates bit 8 of the IA32_SPEC_CTRL MSR is supported. Bit 8 of
this MSR disables Data Dependent Prefetcher.
bit4: bhi-ctrl
if 1, indicates bit 10 of the IA32_SPEC_CTRL MSR is supported. Bit 10
of this MSR enables BHI_DIS_S behavior.
Signed-off-by: Chao Gao <chao.gao@intel.com>
Link: https://lore.kernel.org/r/20240919051011.118309-1-chao.gao@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When user sets tsc-frequency explicitly, the invtsc feature is actually
migratable because the tsc-frequency is supposed to be fixed during the
migration.
See commit d99569d9d8 ("kvm: Allow invtsc migration if tsc-khz
is set explicitly") for referrence.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Link: https://lore.kernel.org/r/20240814075431.339209-10-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
- CPUID.(EAX=07H,ECX=0H):EBX[bit 6]: x87 FPU Data Pointer updated only
on x87 exceptions if 1.
- CPUID.(EAX=07H,ECX=0H):EBX[bit 13]: Deprecates FPU CS and FPU DS
values if 1. i.e., X87 FCS and FDS are always zero.
Define names for them so that they can be exposed to guest with -cpu host.
Also define the bit field MACROs so that named cpu models can add it as
well in the future.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Link: https://lore.kernel.org/r/20240814075431.339209-3-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Currently, QEMU always constructs a all-zero CPUID entry for
CPUID[0xD 0x3f].
It's meaningless to construct such a leaf as the end of leaf 0xD. Rework
the logic of how subleaves of 0xD are constructed to get rid of such
all-zero value of subleaf 0x3f.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Link: https://lore.kernel.org/r/20240814075431.339209-2-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Copy XML files describing orig_ax from GDB and glue them with
CPUX86State.orig_ax.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Message-ID: <20240912093012.402366-5-iii@linux.ibm.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
i386 gdbstub handles both i386 and x86_64. Factor out two functions
for reading and writing registers without knowing their bitness.
While at it, simplify the TARGET_LONG_BITS == 32 case.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Message-ID: <20240912093012.402366-4-iii@linux.ibm.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
This is necessary to provide discernible error messages to the caller.
Signed-off-by: Julia Suvorova <jusual@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/20240927104743.218468-2-jusual@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
According to AMD's Speculative Return Stack Overflow whitepaper (link
below), the hypervisor should synthesize the value of IBPB_BRTYPE and
SBPB CPUID bits to the guest.
Support for this is already present in the kernel with commit
e47d86083c66 ("KVM: x86: Add SBPB support") and commit 6f0f23ef76be
("KVM: x86: Add IBPB_BRTYPE support").
Add support in QEMU to expose the bits to the guest OS.
host:
# cat /sys/devices/system/cpu/vulnerabilities/spec_rstack_overflow
Mitigation: Safe RET
before (guest):
$ cpuid -l 0x80000021 -1 -r
0x80000021 0x00: eax=0x00000045 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
^
$ cat /sys/devices/system/cpu/vulnerabilities/spec_rstack_overflow
Vulnerable: Safe RET, no microcode
after (guest):
$ cpuid -l 0x80000021 -1 -r
0x80000021 0x00: eax=0x18000045 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
^
$ cat /sys/devices/system/cpu/vulnerabilities/spec_rstack_overflow
Mitigation: Safe RET
Reported-by: Fabian Vogt <fvogt@suse.de>
Link: https://www.amd.com/content/dam/amd/en/documents/corporate/cr/speculative-return-stack-overflow-whitepaper.pdf
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240805202041.5936-1-farosas@suse.de
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
identity_base variable is first initialzied to address 0xfffbc000 and then
kvm_vm_set_identity_map_addr() overrides this value to address 0xfeffc000.
The initial address to which the variable was initialized was never used. Clean
everything up, placing 0xfeffc000 in a preprocessor constant.
Reported-by: Ani Sinha <anisinha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
kvm_arch_init() enables a lot of vm capabilities. Refactor them into separate
smaller functions. Energy MSR related operations also moved to its own
function. There should be no functional impact.
Signed-off-by: Ani Sinha <anisinha@redhat.com>
Link: https://lore.kernel.org/r/20240903124143.39345-2-anisinha@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
is_host_cpu_intel() should return TRUE if the host cpu in Intel based, otherwise
it should return FALSE. Currently, it returns zero (FALSE) when the host CPU
is INTEL and non-zero otherwise. Fix the function so that it agrees more with
the semantics. Adjust the calling logic accordingly. RAPL needs Intel host cpus.
If the host CPU is not Intel baseed, we should report error.
Signed-off-by: Ani Sinha <anisinha@redhat.com>
Link: https://lore.kernel.org/r/20240903080004.33746-1-anisinha@redhat.com
[While touching the code remove too many spaces from the second part of the
error. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
kvm_filer_msr() is only used from i386 kvm module. Make it static so that its
easy for developers to understand that its not used anywhere else.
Same for QEMURDMSRHandler, QEMUWRMSRHandler and KVMMSRHandlers definitions.
CC: philmd@linaro.org
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Ani Sinha <anisinha@redhat.com>
Link: https://lore.kernel.org/r/20240903140045.41167-1-anisinha@redhat.com
[Make struct unnamed. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Because the index value of the VMCS field encoding of FRED injected-event
data (one of the newly added VMCS fields for FRED transitions), 0x52, is
larger than any existing index value, raise the highest index value used
for any VMCS encoding to 0x52.
Because the index value of the VMCS field encoding of Secondary VM-exit
controls, 0x44, is larger than any existing index value, raise the highest
index value used for any VMCS encoding to 0x44.
Co-developed-by: Xin Li <xin3.li@intel.com>
Signed-off-by: Xin Li <xin3.li@intel.com>
Signed-off-by: Lei Wang <lei4.wang@intel.com>
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
Link: https://lore.kernel.org/r/20240807081813.735158-4-xin@zytor.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add definitions of
1) VM-exit activate secondary controls bit
2) VM-entry load FRED bit
which are required to enable nested FRED.
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
Link: https://lore.kernel.org/r/20240807081813.735158-3-xin@zytor.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This patch is part of a series that moves towards a consistent use of
g_assert_not_reached() rather than an ad hoc mix of different
assertion mechanisms.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-ID: <20240919044641.386068-15-pierrick.bouvier@linaro.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>
The 'LGPL-2.0+' license identifier has been deprecated since license
list version 2.0rc2 [1] and replaced by the 'LGPL-2.0-or-later' [2]
tag.
[1] https://spdx.org/licenses/LGPL-2.0+.html
[2] https://spdx.org/licenses/LGPL-2.0-or-later.html
Mechanical patch running:
$ sed -i -e s/LGPL-2.0+/LGPL-2.0-or-later/ \
$(git grep -l 'SPDX-License-Identifier: LGPL-2.0+$')
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
This is preliminary work to split up hv_vm_create
logic per platform so we can support creating VMs
with > 64GB of RAM on Apple Silicon machines. This
is done via ARM HVF's hv_vm_config_create() (and
other APIs that modify this config that will be
coming in future patches). This should have no
behavioral difference at all as hv_vm_config_create()
just assigns the same default values as if you just
passed NULL to the function.
Signed-off-by: Danny Canter <danny_canter@apple.com>
Message-id: 20240828111552.93482-3-danny_canter@apple.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Change the data type of the ioctl _request_ argument from 'int' to
'unsigned long' for the various accel/kvm functions which are
essentially wrappers around the ioctl() syscall.
The correct type for ioctl()'s 'request' argument is confused:
* POSIX defines the request argument as 'int'
* glibc uses 'unsigned long' in the prototype in sys/ioctl.h
* the glibc info documentation uses 'int'
* the Linux manpage uses 'unsigned long'
* the Linux implementation of the syscall uses 'unsigned int'
If we wrap ioctl() with another function which uses 'int' as the
type for the request argument, then requests with the 0x8000_0000
bit set will be sign-extended when the 'int' is cast to
'unsigned long' for the call to ioctl().
On x86_64 one such example is the KVM_IRQ_LINE_STATUS request.
Bit requests with the _IOC_READ direction bit set, will have the high
bit set.
Fortunately the Linux Kernel truncates the upper 32bit of the request
on 64bit machines (because it uses 'unsigned int', and see also Linus
Torvalds' comments in
https://sourceware.org/bugzilla/show_bug.cgi?id=14362 )
so this doesn't cause active problems for us. However it is more
consistent to follow the glibc ioctl() prototype when we define
functions that are essentially wrappers around ioctl().
This resolves a Coverity issue where it points out that in
kvm_get_xsave() we assign a value (KVM_GET_XSAVE or KVM_GET_XSAVE2)
to an 'int' variable which can't hold it without overflow.
Resolves: Coverity CID 1547759
Signed-off-by: Johannes Stoelp <johannes.stoelp@gmail.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20240815122747.3053871-1-peter.maydell@linaro.org
[PMM: Rebased patch, adjusted commit message, included note about
Coverity fix, updated the type of the local var in kvm_get_xsave,
updated the comment in the KVMState struct definition]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
QAPI's 'prefix' feature can make the connection between enumeration
type and its constants less than obvious. It's best used with
restraint.
QCryptoHashAlgorithm has a 'prefix' that overrides the generated
enumeration constants' prefix to QCRYPTO_HASH_ALG.
We could simply drop 'prefix', but then the prefix becomes
QCRYPTO_HASH_ALGORITHM, which is rather long.
We could additionally rename the type to QCryptoHashAlg, but I think
the abbreviation "alg" is less than clear.
Rename the type to QCryptoHashAlgo instead. The prefix becomes to
QCRYPTO_HASH_ALGO.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Daniel P. Berrangé <berrange@redhat.com>
Message-ID: <20240904111836.3273842-12-armbru@redhat.com>
[Conflicts with merge commit 7bbadc60b5 resolved]
The two limit_max variables represent size - 1, just like the
encoding in the GDT, thus the 'old' access was off by one.
Access the minimal size of the new tss: the complete tss contains
the iopb, which may be a larger block than the access api expects,
and irrelevant because the iopb is not accessed during the
switch itself.
Fixes: 8b13106508 ("target/i386/tcg: use X86Access for TSS access")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2511
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20240819074052.207783-1-richard.henderson@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
BLSI has inverted semantics for C as compared to the other two
BMI1 instructions, BLSMSK and BLSR. Introduce CC_OP_BLSI* for
this purpose.
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2175
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20240801075845.573075-3-richard.henderson@linaro.org>
Split out the TCG_COND_TSTEQ logic from gen_prepare_eflags_z,
and use it for CC_OP_BMILG* as well. Prepare for requiring
both zero and non-zero senses.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20240801075845.573075-2-richard.henderson@linaro.org>
When we are using TCG plugin memory callbacks probe_access_internal
will return TLB_MMIO to force the slow path for memory access. This
results in probe_access returning NULL but the x86 access_ptr function
happily accepts an empty haddr resulting in segfault hilarity.
Check for an empty haddr to prevent the segfault and enable plugins to
track all the memory operations for the x86 save/restore helpers. As
we also want to run the slow path when instrumenting *-user we should
also not have the short cutting test_ptr macro.
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2489
Fixes: 6d03226b42 (plugins: force slow path when plugins instrument memory ops)
Reviewed-by: Alexandre Iooss <erdnaxe@crans.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20240813202329.1237572-8-alex.bennee@linaro.org>
Snapshot of the stat utime and stime for each thread, taken before and
after the pause, must be stored in separate locations
Signed-off-by: Anthony Harivel <aharivel@redhat.com>
Link: https://lore.kernel.org/r/20240807124320.1741124-2-aharivel@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Rather that enumerating the types that can produce
MMX operands, examine the unit. No functional change.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Link: https://lore.kernel.org/r/20240812025844.58956-3-richard.henderson@linaro.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The vcek-disabled property of the sev-snp-guest object is misspelled
vcek-required (which I suppose would use the opposite polarity) in
the call to object_class_property_add_bool(). Fix it.
Reported-by: Zixi Chen <zixchen@redhat.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Fix leaking memory of file handle in case of error
Erase unused "pid = -1"
Add clearer error_report
Should fix Coverity CID 1558557.
Signed-off-by: Anthony Harivel <aharivel@redhat.com>
Link: https://lore.kernel.org/r/20240726102632.1324432-3-aharivel@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
As SDM stated, CPUID 0x12 leaves depend on CPUID_7_0_EBX_SGX (SGX
feature word).
Since FEAT_SGX_12_0_EAX, FEAT_SGX_12_0_EBX and FEAT_SGX_12_1_EAX define
multiple feature words, add the dependencies of those registers to
report the warning to user if SGX is absent.
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20240730045544.2516284-4-zhao1.liu@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
At present, cpu_x86_cpuid() silently masks off SGX_LC if SGX is absent.
This is not proper because the user is not told about the dependency
between the two.
So explicitly define the dependency between SGX_LC and SGX feature
words, so that user could get a warning when SGX_LC is enabled but
SGX is absent.
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20240730045544.2516284-3-zhao1.liu@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
CPUID.0x7.0.ebx and CPUID.0x7.0.ecx leaves have been expressed as the
feature word lists, and the Host capability support has been checked
in x86_cpu_filter_features().
Therefore, such checks on SGX feature "words" are redundant, and
the follow-up adjustments to those feature "words" will not actually
take effect.
Remove unnecessary SGX feature words related checks.
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20240730045544.2516284-2-zhao1.liu@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The feature word 'r' is a u64, and "unavail" is a u32, the operation
'r &= ~unavail' clears the high 32 bits of 'r'. This causes many vmx cases
in kvm-unit-tests to fail. Changing 'unavail' from u32 to u64 fixes this
issue.
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2442
Fixes: 0b2757412c ("target/i386: drop AMD machine check bits from Intel CPUID")
Signed-off-by: Xiong Zhang <xiong.y.zhang@linux.intel.com>
Link: https://lore.kernel.org/r/20240730082927.250180-1-xiong.y.zhang@linux.intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Coverity points out that in do_interrupt64() in the "to inner
privilege" codepath we set "ss = 0", but because we also set
"new_stack = 1" there, later in the function we will always override
that value of ss with "ss = 0 | dpl".
Remove the unnecessary initialization of ss, which allows us to
reduce the scope of the variable to only where it is used. Borrow a
comment from helper_lcall_protected() that explains what "0 | dpl"
means here.
Resolves: Coverity CID 1527395
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20240723162525.1585743-1-peter.maydell@linaro.org
Starting with the "Sandy Bridge" generation, Intel CPUs provide a RAPL
interface (Running Average Power Limit) for advertising the accumulated
energy consumption of various power domains (e.g. CPU packages, DRAM,
etc.).
The consumption is reported via MSRs (model specific registers) like
MSR_PKG_ENERGY_STATUS for the CPU package power domain. These MSRs are
64 bits registers that represent the accumulated energy consumption in
micro Joules. They are updated by microcode every ~1ms.
For now, KVM always returns 0 when the guest requests the value of
these MSRs. Use the KVM MSR filtering mechanism to allow QEMU handle
these MSRs dynamically in userspace.
To limit the amount of system calls for every MSR call, create a new
thread in QEMU that updates the "virtual" MSR values asynchronously.
Each vCPU has its own vMSR to reflect the independence of vCPUs. The
thread updates the vMSR values with the ratio of energy consumed of
the whole physical CPU package the vCPU thread runs on and the
thread's utime and stime values.
All other non-vCPU threads are also taken into account. Their energy
consumption is evenly distributed among all vCPUs threads running on
the same physical CPU package.
To overcome the problem that reading the RAPL MSR requires priviliged
access, a socket communication between QEMU and the qemu-vmsr-helper is
mandatory. You can specified the socket path in the parameter.
This feature is activated with -accel kvm,rapl=true,path=/path/sock.sock
Actual limitation:
- Works only on Intel host CPU because AMD CPUs are using different MSR
adresses.
- Only the Package Power-Plane (MSR_PKG_ENERGY_STATUS) is reported at
the moment.
Signed-off-by: Anthony Harivel <aharivel@redhat.com>
Link: https://lore.kernel.org/r/20240522153453.1230389-4-aharivel@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This is how the steps are ordered in the manual. EFLAGS.NT is
overwritten after the fact in the saved image.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This takes care of probing the vaddr range in advance, and is also faster
because it avoids repeated TLB lookups. It also matches the Intel manual
better, as it says "Checks that the current (old) TSS, new TSS, and all
segment descriptors used in the task switch are paged into system memory";
note however that it's not clear how the processor checks for segment
descriptors, and this check is not included in the AMD manual.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This step is listed in the Intel manual: "Checks that the new task is available
(call, jump, exception, or interrupt) or busy (IRET return)".
The AMD manual lists the same operation under the "Preventing recursion"
paragraph of "12.3.4 Nesting Tasks", though it is not clear if the processor
checks the busy bit in the IRET case.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add the MMU index to the StackAccess struct, so that it can be cached
or (in the next patch) computed from information that is not in
CPUX86State.
Co-developed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Interrupts and call gates should use accesses with the DPL as
the privilege level. While computing the applicable MMU index
is easy, the harder thing is how to plumb it in the code.
One possibility could be to add a single argument to the PUSH* macros
for the privilege level, but this is repetitive and risks confusion
between the involved privilege levels.
Another possibility is to pass both CPL and DPL, and adjusting both
PUSH* and POP* to use specific privilege levels (instead of using
cpu_{ld,st}*_data). This makes the code more symmetric.
However, a more complicated but much nicer approach is to use a structure
to contain the stack parameters, env, unwind return address, and rewrite
the macros into functions. The struct provides an easy home for the MMU
index as well.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Link: https://lore.kernel.org/r/20240617161210.4639-4-richard.henderson@linaro.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Do not pre-decrement esp, let the macros subtract the appropriate
operand size.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This fixes a bug wherein i386/tcg assumed an interrupt return using
the IRET instruction was always returning from kernel mode to either
kernel mode or user mode. This assumption is violated when IRET is used
as a clever way to restore thread state, as for example in the dotnet
runtime. There, IRET returns from user mode to user mode.
This bug is that stack accesses from IRET and RETF, as well as accesses
to the parameters in a call gate, are normal data accesses using the
current CPL. This manifested itself as a page fault in the guest Linux
kernel due to SMAP preventing the access.
This bug appears to have been in QEMU since the beginning.
Analyzed-by: Robert R. Henry <rrh.henry@gmail.com>
Co-developed-by: Robert R. Henry <rrh.henry@gmail.com>
Signed-off-by: Robert R. Henry <rrh.henry@gmail.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This truncation is now handled by MMU_*32_IDX. The introduction of
MMU_*32_IDX in fact applied correct 32-bit wraparound to 16-bit accesses
with a high segment base (e.g. big real mode or vm86 mode), which did
not use SEG_ADDL.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Link: https://lore.kernel.org/r/20240617161210.4639-3-richard.henderson@linaro.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
In long mode, POP to memory will write a full 64-bit value. However,
the call to gen_writeback() in gen_POP will use MO_32 because the
decoding table is incorrect.
The bug was latent until commit aea49fbb01 ("target/i386: use gen_writeback()
within gen_POP()", 2024-06-08), and then became visible because gen_op_st_v
now receives op->ot instead of the "ot" returned by gen_pop_T0.
Analyzed-by: Clément Chigot <chigot@adacore.com>
Fixes: 5e9e21bcc4 ("target/i386: move 60-BF opcodes to new decoder", 2024-05-07)
Tested-by: Clément Chigot <chigot@adacore.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Currently if the 'legacy-vm-type' property of the sev-guest object is
'on', QEMU will attempt to use the newer KVM_SEV_INIT2 kernel
interface in conjunction with the newer KVM_X86_SEV_VM and
KVM_X86_SEV_ES_VM KVM VM types.
This can lead to measurement changes if, for instance, an SEV guest was
created on a host that originally had an older kernel that didn't
support KVM_SEV_INIT2, but is booted on the same host later on after the
host kernel was upgraded.
Instead, if legacy-vm-type is 'off', QEMU should fail if the
KVM_SEV_INIT2 interface is not provided by the current host kernel.
Modify the fallback handling accordingly.
In the future, VMSA features and other flags might be added to QEMU
which will require legacy-vm-type to be 'off' because they will rely
on the newer KVM_SEV_INIT2 interface. It may be difficult to convey to
users what values of legacy-vm-type are compatible with which
features/options, so as part of this rework, switch legacy-vm-type to a
tri-state OnOffAuto option. 'auto' in this case will automatically
switch to using the newer KVM_SEV_INIT2, but only if it is required to
make use of new VMSA features or other options only available via
KVM_SEV_INIT2.
Defining 'auto' in this way would avoid inadvertantly breaking
compatibility with older kernels since it would only be used in cases
where users opt into newer features that are only available via
KVM_SEV_INIT2 and newer kernels, and provide better default behavior
than the legacy-vm-type=off behavior that was previously in place, so
make it the default for 9.1+ machine types.
Cc: Daniel P. Berrangé <berrange@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
cc: kvm@vger.kernel.org
Signed-off-by: Michael Roth <michael.roth@amd.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Link: https://lore.kernel.org/r/20240710041005.83720-1-michael.roth@amd.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Drop features that are listed as "BitMask" in the PPR and currently
not supported by AMD processors. The only ones that may become useful
in the future are TSC deadline timer and x2APIC, everything else is
not needed for SEV-SNP guests (e.g. VIRT_SSBD) or would require
processor support (e.g. TSC_ADJUST).
This allows running SEV-SNP guests with "-cpu host".
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Some CPUID features may be provided by KVM for some guests, independent of
processor support, for example TSC deadline or TSC adjust. If these are
not supported by the confidential computing firmware, however, the guest
will fail to start. Add support for removing unsupported features from
"-cpu host".
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
A bunch of improvements:
- vhost dirty log is now only scanned once, not once per device
- virtio and vhost now support VIRTIO_F_NOTIFICATION_DATA
- cxl gained DCD emulation support
- pvpanic gained shutdown support
- beginning of patchset for Generic Port Affinity Structure
- s3 support
- friendlier error messages when boot fails on some illegal configs
- for vhost-user, VHOST_USER_SET_LOG_BASE is now only sent once
- part of vhost-user support for any POSIX system -
not yet enabled due to qtest failures
- sr-iov VF setup code has been reworked significantly
- new tests, particularly for risc-v ACPI
- bugfixes
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmaF068PHG1zdEByZWRo
YXQuY29tAAoJECgfDbjSjVRp+DMIAMC//mBXIZlPprfhb5cuZklxYi31Acgu5TUr
njqjCkN+mFhXXZuc3B67xmrQ066IEPtsbzCjSnzuU41YK4tjvO1g+LgYJBv41G16
va2k8vFM5pdvRA+UC9li1CCIPxiEcszxOdzZemj3szWLVLLUmwsc5OZLWWeFA5m8
vXrrT9miODUz3z8/Xn/TVpxnmD6glKYIRK/IJRzzC4Qqqwb5H3ji/BJV27cDUtdC
w6ns5RYIj5j4uAiG8wQNDggA1bMsTxFxThRDUwxlxaIwAcexrf1oRnxGRePA7PVG
BXrt5yodrZYR2sR6svmOOIF3wPMUDKdlAItTcEgYyxaVo5rAdpc=
=p9h4
-----END PGP SIGNATURE-----
Merge tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu into staging
virtio: features,fixes
A bunch of improvements:
- vhost dirty log is now only scanned once, not once per device
- virtio and vhost now support VIRTIO_F_NOTIFICATION_DATA
- cxl gained DCD emulation support
- pvpanic gained shutdown support
- beginning of patchset for Generic Port Affinity Structure
- s3 support
- friendlier error messages when boot fails on some illegal configs
- for vhost-user, VHOST_USER_SET_LOG_BASE is now only sent once
- part of vhost-user support for any POSIX system -
not yet enabled due to qtest failures
- sr-iov VF setup code has been reworked significantly
- new tests, particularly for risc-v ACPI
- bugfixes
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
# -----BEGIN PGP SIGNATURE-----
#
# iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmaF068PHG1zdEByZWRo
# YXQuY29tAAoJECgfDbjSjVRp+DMIAMC//mBXIZlPprfhb5cuZklxYi31Acgu5TUr
# njqjCkN+mFhXXZuc3B67xmrQ066IEPtsbzCjSnzuU41YK4tjvO1g+LgYJBv41G16
# va2k8vFM5pdvRA+UC9li1CCIPxiEcszxOdzZemj3szWLVLLUmwsc5OZLWWeFA5m8
# vXrrT9miODUz3z8/Xn/TVpxnmD6glKYIRK/IJRzzC4Qqqwb5H3ji/BJV27cDUtdC
# w6ns5RYIj5j4uAiG8wQNDggA1bMsTxFxThRDUwxlxaIwAcexrf1oRnxGRePA7PVG
# BXrt5yodrZYR2sR6svmOOIF3wPMUDKdlAItTcEgYyxaVo5rAdpc=
# =p9h4
# -----END PGP SIGNATURE-----
# gpg: Signature made Wed 03 Jul 2024 03:41:51 PM PDT
# gpg: using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469
# gpg: issuer "mst@redhat.com"
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [undefined]
# gpg: aka "Michael S. Tsirkin <mst@redhat.com>" [undefined]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67
# Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469
* tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu: (85 commits)
hw/pci: Replace -1 with UINT32_MAX for romsize
pcie_sriov: Register VFs after migration
pcie_sriov: Remove num_vfs from PCIESriovPF
pcie_sriov: Release VFs failed to realize
pcie_sriov: Reuse SR-IOV VF device instances
pcie_sriov: Ensure VF function number does not overflow
pcie_sriov: Do not manually unrealize
hw/ppc/spapr_pci: Do not reject VFs created after a PF
hw/ppc/spapr_pci: Do not create DT for disabled PCI device
hw/pci: Rename has_power to enabled
virtio-iommu: Clear IOMMUDevice when VFIO device is unplugged
virtio: remove virtio_tswap16s() call in vring_packed_event_read()
hw/cxl/events: Mark cxl-add-dynamic-capacity and cxl-release-dynamic-capcity unstable
hw/cxl/events: Improve QMP interfaces and documentation for add/release dynamic capacity.
tests/data/acpi/rebuild-expected-aml.sh: Add RISC-V
pc-bios/meson.build: Add support for RISC-V in unpack_edk2_blobs
meson.build: Add RISC-V to the edk2-target list
tests/data/acpi/virt: Move ARM64 ACPI tables under aarch64/${machine} path
tests/data/acpi: Move x86 ACPI tables under x86/${machine} path
tests/qtest/bios-tables-test.c: Set "arch" for x86 tests
...
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
In e820_add_entry() the e820_table is reallocated with g_renew() to make
space for a new entry. However, fw_cfg_arch_create() just uses the
existing e820_table pointer. This leads to a use-after-free if anything
adds a new entry after fw_cfg is set up.
Shift the addition of the etc/e820 file to the machine done notifier, via
a new fw_cfg_add_e820() function.
Also make e820_table private and use an e820_get_table() accessor function
for it, which sets a flag that will trigger an assert() for any *later*
attempts to add to the table.
Make e820_add_entry() return void, as most callers don't check for error
anyway.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <a2708734f004b224f33d3b4824e9a5a262431568.camel@infradead.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
AVX-VNNI-INT16 (CPUID[EAX=7,ECX=1).EDX[10]) is supported by Clearwater
Forest processor, add it to QEMU as it does not need any specific
enablement.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When management tools (e.g. libvirt) query QEMU capabilities,
they start QEMU with a minimalistic configuration and issue
various commands on monitor. One of the command issued is/might
be "query-sev-capabilities" to learn values like cbitpos or
reduced-phys-bits. But as of v9.0.0-1145-g16dcf200dc the monitor
command returns an error instead.
This creates a chicken-egg problem because in order to query
those aforementioned values QEMU needs to be started with a
'sev-guest' object. But to start QEMU with the values must be
known.
I think it's safe to assume that the default path ("/dev/sev")
provides the same data as user provided one. So fall back to it.
Fixes: 16dcf200dc
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Link: https://lore.kernel.org/r/157f93712c23818be193ce785f648f0060b33dee.1719218926.git.mprivozn@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When a custom path is provided to sev-guest object and opening
the path fails an error message is reported. But the error
message still mentions DEFAULT_SEV_DEVICE ("/dev/sev") instead of
the custom path.
Fixes: 16dcf200dc
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Link: https://lore.kernel.org/r/b4648905d399780063dc70851d3d6a3cd28719a5.1719218926.git.mprivozn@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Commit d7c72735f6 ("target/i386: Add new EPYC CPU versions with updated
cache_info", 2023-05-08) ensured that AMD-defined CPU models did not
have the 'complex_indexing' bit set, but left it set in "-cpu host"
which uses the default ("legacy") cache information.
Reimplement that commit using a CPU feature, so that it can be applied
to all guests using a new machine type, independent of the CPU model.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The recent addition of the SUCCOR bit to kvm_arch_get_supported_cpuid()
causes the bit to be visible when "-cpu host" VMs are started on Intel
processors.
While this should in principle be harmless, it's not tidy and we don't
even know for sure that it doesn't cause any guest OS to take unexpected
paths. Since x86_cpu_get_supported_feature_word() can return different
different values depending on the guest, adjust it to hide the SUCCOR
bit if the guest has non-AMD vendor.
Suggested-by: Xiaoyao Li <xiaoyao.li@intel.com>
Cc: John Allen <john.allen@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This allows modifying the bits in "-cpu max"/"-cpu host" depending on
the guest CPU vendor (which, at least by default, is the host vendor in
the case of KVM).
For example, machine check architecture differs between Intel and AMD,
and bits from AMD should be dropped when configuring the guest for
an Intel model.
Cc: Xiaoyao Li <xiaoyao.li@intel.com>
Cc: John Allen <john.allen@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
macOS versions older than 12.0 are no longer supported.
docs/about/build-platforms.rst says:
> Support for the previous major version will be dropped 2 years after
> the new major version is released or when the vendor itself drops
> support, whichever comes first.
macOS 12.0 was released 2021:
https://www.apple.com/newsroom/2021/10/macos-monterey-is-now-available/
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20240629-macos-v1-1-6e70a6b700a0@daynix.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
host_cpu_realizefn() sets CPUID_EXT_MONITOR without consulting host/KVM
capabilities. This may cause problems:
- If MWAIT/MONITOR is not available on the host, advertising this
feature to the guest and executing MWAIT/MONITOR from the guest
triggers #UD and the guest doesn't boot. This is because typically
#UD takes priority over VM-Exit interception checks and KVM doesn't
emulate MONITOR/MWAIT on #UD.
- If KVM doesn't support KVM_X86_DISABLE_EXITS_MWAIT, MWAIT/MONITOR
from the guest are intercepted by KVM, which is not what cpu-pm=on
intends to do.
In these cases, MWAIT/MONITOR should not be exposed to the guest.
The logic in kvm_arch_get_supported_cpuid() to handle CPUID_EXT_MONITOR
is correct and sufficient, and we can't set CPUID_EXT_MONITOR after
x86_cpu_filter_features().
This was not an issue before commit 662175b91f ("i386: reorder call to
cpu_exec_realizefn") because the feature added in the accel-specific
realizefn could be checked against host availability and filtered out.
Additionally, it seems not a good idea to handle guest CPUID leaves in
host_cpu_realizefn(), and this patch merges host_cpu_enable_cpu_pm()
into kvm_cpu_realizefn().
Fixes: f5cc5a5c16 ("i386: split cpu accelerators from cpu.c, using AccelCPUClass")
Fixes: 662175b91f ("i386: reorder call to cpu_exec_realizefn")
Signed-off-by: Zide Chen <zide.chen@intel.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
This code was using both uint32_t and uint64_t for len.
Consistently use size_t instead.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Link: https://lore.kernel.org/r/20240626194950.1725800-3-richard.henderson@linaro.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Do not rely on finish->id_auth_uaddr, so that there are no casts from
pointer to uint64_t. They break on 32-bit hosts.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Free the "id_auth" name for the binary version of the data.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Do not rely on finish->id_block_uaddr, so that there are no casts from
pointer to uint64_t. They break on 32-bit hosts.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Free the "id_block" name for the binary version of the data.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Handle it like the other arithmetic cc_ops. This simplifies a
bit the implementation of bit test instructions.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
It is the only CCOp, among those that compute ZF from one of the cc_op_*
registers, that uses cpu_cc_src. Do not make it the odd one off,
instead use cpu_cc_dst like the others.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
POPCNT was missing, and the entries were all out of order after
ADCX/ADOX/ADCOX were moved close to EFLAGS. Just use designated
initializers.
Fixes: 4885c3c495 ("target-i386: Use ctpop helper", 2017-01-10)
Fixes: cc155f1971 ("target/i386: rewrite flags writeback for ADCX/ADOX", 2024-06-11)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This is an experiment to further reduce the amount we throw into the
exec headers. It might not be as useful as I initially thought because
just under half of the users also need gdbserver_start().
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20240620152220.2192768-3-alex.bennee@linaro.org>
X86CPU::kvm_no_smi_migration was only used by the
pc-i440fx-2.3 machine, which got removed. Remove it
and simplify kvm_put_vcpu_events().
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20240617071118.60464-23-philmd@linaro.org>
x86_cpu_change_kvm_default() was only used out of kvm-cpu.c by
the pc-i440fx-2.1 machine, which got removed. Make it static,
and remove its declaration. "kvm-cpu.h" is now empty, remove it.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20240617071118.60464-10-philmd@linaro.org>
There can be other confidential computing classes that are not derived
from sev-common. Avoid aborting when encountering them.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Use the same flag generation code as SHL and SHR, but use
the existing gen_shiftd_rm_T1 function to compute the result
as well as CC_SRC.
Decoding-wise, SHLD/SHRD by immediate count as a 4 operand
instruction because s->T0 and s->T1 actually occupy three op
slots. The infrastructure used by opcodes in the 0F 3A table
works fine.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
SHLD/SHRD can have 3 register operands - s->T0, s->T1 and either
1 or CL - and therefore decode->op[2] is taken by the low part
of the register being shifted. Pass X86_OP_* to gen_shift_count
from its current callers and hardcode cpu_regs[R_ECX] as the
shift count.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Use gen_ld_modrm/gen_st_modrm, moving them and gen_shift_flags to the
caller. This way, gen_shiftd_rm_T1 becomes something that the new
decoder can call.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
These have very simple generators and no need for complex group
decoding. Apart from LAR/LSL which are simplified to use
gen_op_deposit_reg_v and movcond, the code is generally lifted
from translate.c into the generators.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>