Other accelerators have their own headers: sysemu/hax.h, sysemu/hvf.h,
sysemu/kvm.h, sysemu/whpx.h. Only tcg_enabled() & friends sit in
qemu-common.h. This necessitates inclusion of qemu-common.h into
headers, which is against the rules spelled out in qemu-common.h's
file comment.
Move tcg_enabled() & friends into their own header sysemu/tcg.h, and
adjust #include directives.
Cc: Richard Henderson <rth@twiddle.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20190523143508.25387-2-armbru@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
[Rebased with conflicts resolved automatically, except for
accel/tcg/tcg-all.c]
Cleanup in the boilerplate that each target must define.
Replace x86_env_get_cpu with env_archcpu. The combination
CPU(x86_env_get_cpu) should have used ENV_GET_CPU to begin;
use env_cpu now.
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Commit b2fc91db84 ("q35: set split kernel irqchip as default") changed
the default for the pc-q35-4.0 machine type to use split irqchip, which
turned out to have disasterous effects on vfio-pci INTx support. KVM
resampling irqfds are registered for handling these interrupts, but
these are non-functional in split irqchip mode. We can't simply test
for split irqchip in QEMU as userspace handling of this interrupt is a
significant performance regression versus KVM handling (GeForce GPUs
assigned to Windows VMs are non-functional without forcing MSI mode or
re-enabling kernel irqchip).
The resolution is to revert the change in default irqchip mode in the
pc-q35-4.1 machine and create a pc-q35-4.0.1 machine for the 4.0-stable
branch. The qemu-q35-4.0 machine type should not be used in vfio-pci
configurations for devices requiring legacy INTx support without
explicitly modifying the VM configuration to use kernel irqchip.
Link: https://bugs.launchpad.net/qemu/+bug/1826422
Fixes: b2fc91db84 ("q35: set split kernel irqchip as default")
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <155786484688.13873.6037015630912983760.stgit@gimli.home>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
QEMU currently crashes when you try to hot-plug an "nvdimm" device
on older machine types:
$ qemu-system-x86_64 -monitor stdio -M pc-1.1
QEMU 3.1.92 monitor - type 'help' for more information
(qemu) device_add nvdimm,id=nvdimmn1
qemu-system-x86_64: /home/thuth/devel/qemu/util/error.c:57: error_setv:
Assertion `*errp == ((void *)0)' failed.
Aborted (core dumped)
The call to hotplug_handler_pre_plug() in pc_memory_pre_plug() has been
added recently before the check whether nvdimm is enabled. It should
be done after the check. And while we're at it, also check the errp
after the hotplug_handler_pre_plug(), otherwise errors are silently
ignored here.
Fixes: 9040e6dfa8
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20190407092314.11066-1-thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Currently we do device realization like below:
hotplug_handler_pre_plug()
dc->realize()
hotplug_handler_plug()
Before we do device realization and plug, we should allocate necessary
resources and check if memory-hotplug-support property is enabled.
At the piix4 and ich9, the memory-hotplug-support property is checked at
plug stage. This means that device has been realized and mapped into guest
address space 'pc_dimm_plug()' by the time acpi plug handler is called,
where it might fail and crash QEMU due to reaching g_assert_not_reached()
(piix4) or error_abort (ich9).
Fix it by checking if memory hotplug is enabled at pre_plug stage
where we can gracefully abort hotplug request.
Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
CC: Igor Mammedov <imammedo@redhat.com>
CC: Eric Blake <eblake@redhat.com>
Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
Message-Id: <20190301033548.6691-1-richardw.yang@linux.intel.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The PC machines put firmware in ROM by default. To get it put into
flash memory (required by OVMF), you have to use -drive
if=pflash,unit=0,... and optionally -drive if=pflash,unit=1,...
Why two -drive? This permits setting up one part of the flash memory
read-only, and the other part read/write. It also makes upgrading
firmware on the host easier. Below the hood, it creates two separate
flash devices, because we were too lazy to improve our flash device
models to support sector protection.
The problem at hand is to do the same with -blockdev somehow, as one
more step towards deprecating -drive.
Mapping -drive if=none,... to -blockdev is a solved problem. With
if=T other than if=none, -drive additionally configures a block device
frontend. For non-onboard devices, that part maps to -device. Also a
solved problem. For onboard devices such as PC flash memory, we have
an unsolved problem.
This is actually an instance of a wider problem: our general device
configuration interface doesn't cover onboard devices. Instead, we have
a zoo of ad hoc interfaces that are much more limited. One of them is
-drive, which we'd rather deprecate, but can't until we have suitable
replacements for all its uses.
Sadly, I can't attack the wider problem today. So back to the narrow
problem.
My first idea was to reduce it to its solved buddy by using pluggable
instead of onboard devices for the flash memory. Workable, but it
requires some extra smarts in firmware descriptors and libvirt. Paolo
had an idea that is simpler for libvirt: keep the devices onboard, and
add machine properties for their block backends.
The implementation is less than straightforward, I'm afraid.
First, block backend properties are *qdev* properties. Machines can't
have those, as they're not devices. I could duplicate these qdev
properties as QOM properties, but I hate that.
More seriously, the properties do not belong to the machine, they
belong to the onboard flash devices. Adding them to the machine would
then require bad magic to somehow transfer them to the flash devices.
Fortunately, QOM provides the means to handle exactly this case: add
alias properties to the machine that forward to the onboard devices'
properties.
Properties need to be created in .instance_init() methods. For PC
machines, that's pc_machine_initfn(). To make alias properties work,
we need to create the onboard flash devices there, too. Requires
several bug fixes, in the previous commits. We also have to realize
the devices. More on that below.
If the user sets pflash0, firmware resides in flash memory.
pc_system_firmware_init() maps and realizes the flash devices.
Else, firmware resides in ROM. The onboard flash devices aren't used
then. pc_system_firmware_init() destroys them unrealized, along with
the alias properties.
The existing code to pick up drives defined with -drive if=pflash is
replaced by code to desugar into the machine properties.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Acked-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <87ftrtux81.fsf@dusky.pond.sub.org>
pc_system_firmware_init() parameter @isapc_ram_fw is PCMachineState
member pci_enabled negated. The next commit will need more of
PCMachineState. To prepare for that, pass a PCMachineState *, and
drop the now redundant parameter @isapc_ram_fw.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Message-Id: <20190308131445.17502-11-armbru@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Intel Processor Trace required CPUID[0x14] but the cpuid_level
have no change when create a kvm guest with
e.g. "-cpu qemu64,+intel-pt".
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Luwei Kang <luwei.kang@intel.com>
Message-Id: <1548805979-12321-1-git-send-email-luwei.kang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Some multiboot images could be in the ELF format. In the current
implementation QEMU fails because we try to load these images
as a PVH image.
In order to fix this issue, we should try multiboot first (we
already check the multiboot magic header before to load it).
If it is not a multiboot image, we can try the PVH loader.
Fixes: ab969087da ("pvh: Boot uncompressed kernel using direct boot ABI", 2019-01-15)
Reported-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20190214180216.246707-1-sgarzare@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
As NVDIMM support is looming for ARM and SPAPR, let's
move the acpi_nvdimm_state to the generic machine struct
instead of duplicating the same code in several machines.
It is also renamed into nvdimms_state and becomes a pointer.
nvdimm and nvdimm-persistence become generic machine options.
They become guarded by a nvdimm_supported machine class member.
We also add a description for those options.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Suggested-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20190308182053.5487-3-eric.auger@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
As we intend to migrate the acpi_nvdimm_state into
the base machine with a new dimms_state name, let's
also rename the datatype.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Suggested-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20190308182053.5487-2-eric.auger@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
When unplugging a device, at one point the device will be destroyed
via object_unparent(). This will, one the one hand, unrealize the
removed device hierarchy, and on the other hand, destroy/free the
device hierarchy.
When chaining hotplug handlers, we want to overwrite a bus hotplug
handler by the machine hotplug handler, to be able to perform
some part of the plug/unplug and to forward the calls to the bus hotplug
handler.
For now, the bus hotplug handler would trigger an object_unparent(), not
allowing us to perform some unplug action on a device after we forwarded
the call to the bus hotplug handler. The device would be gone at that
point.
machine_unplug_handler(dev)
/* eventually do unplug stuff */
bus_unplug_handler(dev)
/* dev is gone, we can't do more unplug stuff */
So move the object_unparent() to the original caller of the unplug. For
now, keep the unrealize() at the original places of the
object_unparent(). For implicitly chained hotplug handlers (e.g. pc
code calling acpi hotplug handlers), the object_unparent() has to be
done by the outermost caller. So when calling hotplug_handler_unplug()
from inside an unplug handler, nothing is to be done.
hotplug_handler_unplug(dev) -> calls machine_unplug_handler()
machine_unplug_handler(dev) {
/* eventually do unplug stuff */
bus_unplug_handler(dev) -> calls unrealize(dev)
/* we can do more unplug stuff but device already unrealized */
}
object_unparent(dev)
In the long run, every unplug action should be factored out of the
unrealize() function into the unplug handler (especially for PCI). Then
we can get rid of the additonal unrealize() calls and object_unparent()
will properly unrealize the device hierarchy after the device has been
unplugged.
hotplug_handler_unplug(dev) -> calls machine_unplug_handler()
machine_unplug_handler(dev) {
/* eventually do unplug stuff */
bus_unplug_handler(dev) -> only unplugs, does not unrealize
/* we can do more unplug stuff */
}
object_unparent(dev) -> will unrealize
The original approach was suggested by Igor Mammedov for the PCI
part, but I extended it to all hotplug handlers. I consider this one
step into the right direction.
To summarize:
- object_unparent() on synchronous unplugs is done by common code
-- "Caller of hotplug_handler_unplug"
- object_unparent() on asynchronous unplugs ("unplug requests") has to
be done manually
-- "Caller of hotplug_handler_unplug"
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20190228122849.4296-2-david@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Function pc_acpi_init() is not used anymore.
Remove the definition and declaration.
Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20190214084939.20640-2-richardw.yang@linux.intel.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Let's avoid manually looking up the hotplug handler class. Use the
existing wrappers instead.
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20181212095707.19358-1-david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Since linux commit: cf8fa920cb42 ("i386: handle an initrd in highmem (version 2)")
linux has supported initrd up to 4 GB, but the header field
ramdisk_max is still set to 2 GB to avoid "possible bootloader bugs".
When use '-kernel vmlinux -initrd initrd.cgz' to launch a VM,
the firmware(it could be linuxboot_dma.bin) helps to read initrd
contents into guest memory(below ramdisk_max) and jump to kernel.
that's similar with what bootloader does, like grub.
In addition, initrd_max is uint32_t simply because QEMU doesn't support
the 64-bit boot protocol (specifically the ext_ramdisk_image field).
Therefore here just limit initrd_max to UINT32_MAX simply as well to
allow initrd to be loaded below 4 GB.
NOTE: it's possible that linux protocol within [0x208, 0x20c]
supports up to 4 GB initrd as well.
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Richard Henderson <rth@twiddle.net>
CC: Eduardo Habkost <ehabkost@redhat.com>
CC: "Michael S. Tsirkin" <mst@redhat.com>
CC: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
it's from v4.20-rc5.
CC: Stefano Garzarella <sgarzare@redhat.com>
CC: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
In order to avoid migration issues, we enable PVH only for
machine type >= 4.0
Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Use pvh.bin option rom when we are booting an uncompressed
kernel using the x86/HVM direct boot ABI.
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Based-on: <1547554687-12687-1-git-send-email-liam.merwick@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When initrd is specified, load and expose it to the guest firmware
through fw_cfg. The firmware will fill the hvm_start_info for the
kernel.
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Based-on: <1545422632-24444-5-git-send-email-liam.merwick@oracle.com>
Signed-off-by: Liam Merwick <Liam.Merwick@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
These changes (along with corresponding Linux kernel and qboot changes)
enable a guest to be booted using the x86/HVM direct boot ABI.
This commit adds a load_elfboot() routine to pass the size and
location of the kernel entry point to qboot (which will fill in
the start_info struct information needed to to boot the guest).
Having loaded the ELF binary, load_linux() will run qboot
which continues the boot.
The address for the kernel entry point is read from an ELF Note
in the uncompressed kernel binary by a helper routine passed
to load_elf().
Co-developed-by: George Kennedy <George.Kennedy@oracle.com>
Signed-off-by: George Kennedy <George.Kennedy@oracle.com>
Signed-off-by: Liam Merwick <liam.merwick@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Modern AMD CPUs support NPT and NRIPSAVE features and KVM exposes these
when present. NRIPSAVE apeared somewhere in Opteron_G3 lifetime (e.g.
QuadCore AMD Opteron 2378 has is but QuadCore AMD Opteron HE 2344 doesn't),
NPT was introduced a bit earlier.
Add the FEAT_SVM leaf to Opteron_G4/G5 and EPYC/EPYC-IBPB cpu models.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20190121155051.5628-1-vkuznets@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Update the stepping from 5 to 6, in order that
the Cascadelake-Server CPU model can support AVX512VNNI
and MSR based features exposed by ARCH_CAPABILITIES.
Signed-off-by: Tao Xu <tao3.xu@intel.com>
Message-Id: <20181227024304.12182-2-tao3.xu@intel.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
MPX support is being phased out by Intel; GCC has dropped it, Linux
is also going to do that. Even though KVM will have special code
to support MPX after the kernel proper stops enabling it in XCR0,
we probably also want to deprecate that in a few years. As a start,
do not enable it by default for any named CPU model starting with
the 4.0 machine types; this include Skylake, Icelake and Cascadelake.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20181220121100.21554-1-pbonzini@redhat.com>
Reviewed-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
The missing functionality was added ~3 years ago with the Linux commit
46896c73c1a4 ("KVM: svm: add support for RDTSCP")
so reenable RDTSCP support on those CPU models.
Opteron_G2 - being family 15, model 6, doesn't have RDTSCP support
(the real hardware doesn't have it. K8 got RDTSCP support with the NPT
models, i.e., models >= 0x40).
Document the host's minimum required kernel version, while at it.
Signed-off-by: Borislav Petkov <bp@suse.de>
Message-ID: <20181212200803.GG6653@zn.tnic>
[ehabkost: moved compat properties code to pc.c]
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Instead of verbose arrays with 4 lines for each entry, make each
entry take only one line. This makes long arrays that couldn't
fit in the screen become short and readable.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20190107193020.21744-4-ehabkost@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
stringify() is useful when we need to use macros in compat_props
(like when we set virtio-baloon-pci.class=PCI_CLASS_MEMORY_RAM at
pc_i440fx_1_0_machine_options()), but it is pointless when we are
already providing a number literal.
Replace stringify() with string literals when appropriate.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20190107193020.21744-3-ehabkost@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Make them more QOMConventional.
Cc:qemu-trivial@nongnu.org
Signed-off-by: Li Qiang <liq3ea@163.com>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20190105023831.66910-1-liq3ea@163.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Use static arrays instead. I decided to rename the conflicting
pc_compat_2_1() function with pc_compat_2_1_fn().
Suggested-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Acked-by: Eduardo Habkost <ehabkost@redhat.com>
Use static arrays instead. I decided to rename the conflicting
pc_compat_2_1() function with pc_compat_2_1_fn().
Suggested-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Acked-by: Eduardo Habkost <ehabkost@redhat.com>
Use static arrays instead. I decided to rename the conflicting
pc_compat_2_2() function with pc_compat_2_2_fn().
Suggested-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Acked-by: Eduardo Habkost <ehabkost@redhat.com>
Use static arrays instead. I decided to rename the conflicting
pc_compat_2_3() function with pc_compat_2_3_fn().
Suggested-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Acked-by: Eduardo Habkost <ehabkost@redhat.com>
Switch the intr_supported variable from a boolean to OnOffAuto type so
that we can know whether the user specified it or not. With that
we'll have a chance to help the user to choose more wisely where
possible. Introduce x86_iommu_ir_supported() to mask these changes.
No functional change at all.
Signed-off-by: Peter Xu <peterx@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
SMBIOS is just another firmware interface used by some QEMU models.
We will later introduce more firmware interfaces in this subdirectory.
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The load_image() function is deprecated, as it does not let the
caller specify how large the buffer to read the file into is.
Use the glib g_file_get_contents() function instead, which does
the whole "allocate memory for the file and read it in" operation.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20181130151712.2312-6-peter.maydell@linaro.org
This makes their function more clear and prevents conflicts when adding
the actual devices to the machine state, if necessary.
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20181107152434.22219-1-minyard@acm.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
We're plugging/unplugging a PCDIMMDevice, so directly pass this type
instead of a more generic DeviceState.
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20181005092024.14344-5-david@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Previously, if the size of initrd >=2G, qemu exits with error:
root@haswell-OptiPlex-9020:/home/lizj# /home/lizhijian/lkp/qemu-colo/x86_64-softmmu/qemu-system-x86_64 -kernel ./vmlinuz-4.16.0-rc4 -initrd large.cgz -nographic
qemu: error reading initrd large.cgz: No such file or directory
root@haswell-OptiPlex-9020:/home/lizj# du -sh large.cgz
2.5G large.cgz
this patch changes the caller side that use this function to calculate
size of initrd file as well.
v2: update error message and int64_t printing format
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Message-Id: <1536833233-14121-1-git-send-email-lizhijian@cn.fujitsu.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
We can assign and verify the address before realizing and trying to plug.
reading/writing the address property should never fail for DIMMs, so let's
reduce error handling a bit by using &error_abort. Getting access to the
memory region now might however fail. So forward errors from
get_memory_region() properly.
As all memory devices should use the alignment of the underlying memory
region for guest physical address asignment, do detection of the
alignment in pc_dimm_pre_plug(), but allow pc.c to overwrite the
alignment for compatibility handling.
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20180801133444.11269-5-david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
All applicable memory regions always have an alignment > 0. All memory
backends result in file_ram_alloc() or qemu_anon_ram_alloc() getting
called, setting the alignment to > 0.
So a PCDIMM memory region always has an alignment > 0. NVDIMM copy the
alignment of the original memory memory region into the handcrafted memory
region that will be used at this place.
So the check for 0 can be dropped and we can reduce the special
handling.
Dropping this check makes factoring out of alignment handling easier as
compat handling only has to look at pcmc->enforce_aligned_dimm and not
care about the alignment of the memory region.
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20180801133444.11269-4-david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
We can assign and verify the slot before realizing and trying to plug.
reading/writing the slot property should never fail, so let's reduce
error handling a bit by using &error_abort.
To do this during pre_plug, add and use (x86, ppc) pc_dimm_pre_plug().
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20180801133444.11269-2-david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Hyper-V identifies vCPUs by Virtual Processor (VP) index which can be
queried by the guest via HV_X64_MSR_VP_INDEX msr. It is defined by the
spec as a sequential number which can't exceed the maximum number of
vCPUs per VM.
It has to be owned by QEMU in order to preserve it across migration.
However, the initial implementation in KVM didn't allow to set this
msr, and KVM used its own notion of VP index. Fortunately, the way
vCPUs are created in QEMU/KVM makes it likely that the KVM value is
equal to QEMU cpu_index.
So choose cpu_index as the value for vp_index, and push that to KVM on
kernels that support setting the msr. On older ones that don't, query
the kernel value and assert that it's in sync with QEMU.
Besides, since handling errors from vCPU init at hotplug time is
impossible, disable vCPU hotplug.
This patch also introduces accessor functions to encapsulate the mapping
between a vCPU and its vp_index.
Signed-off-by: Roman Kagan <rkagan@virtuozzo.com>
Message-Id: <20180702134156.13404-3-rkagan@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
It eases code review, unit is explicit.
Patch generated using:
$ git grep -E '[<>][<>]=? ?[1-5]0' hw/ include/hw/
and modified manually.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Let's try to reduce error handling a bit. In the plug/unplug case, the
device was realized and therefore we can assume that getting access to
the memory region will not fail.
For get_vmstate_memory_region() this is already handled that way.
Document both cases.
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20180619134141.29478-13-david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
We can perform these checks before the device is actually realized.
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20180619134141.29478-6-david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Let's rename it to make it look more consistent.
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20180619134141.29478-4-david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Use a similar naming scheme as spapr. This way, we can go ahead and
rename e.g. pc_dimm_memory_plug to pc_dimm_plug, which avoids
confusion.
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20180619134141.29478-3-david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
A link property can be set during creation, with
object_property_add_link() and later with object_property_set_link().
add_link() doesn't add a reference to the target object, while
set_link() does.
Furthemore, OBJ_PROP_LINK_UNREF_ON_RELEASE flags, set during add_link,
says whether a reference must be released when the property is destroyed.
This can lead to leaks if the property was later set_link(), as the
added reference is never released.
Instead, rename OBJ_PROP_LINK_UNREF_ON_RELEASE to OBJ_PROP_LINK_STRONG
and use that has an indication on how the link handle reference
management in set_link().
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 20180531195119.22021-3-marcandre.lureau@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Replace the "nvdimm-cap" option which took numeric arguments such as "2"
with a more user friendly "nvdimm-persistence" option which takes symbolic
arguments "cpu" or "mem-ctrl".
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
vDPA support, fix to vhost blk RO bit handling, some include path
cleanups, NFIT ACPI table.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJbEXNvAAoJECgfDbjSjVRpc8gH/R8xrcFrV+k9wwbgYcOcGb6Y
LWjseE31pqJcxRV80vLOdzYEuLStZQKQQY7xBDMlA5vdyvZxIA6FLO2IsiJSbFAk
EK8pclwhpwQAahr8BfzenabohBv2UO7zu5+dqSvuJCiMWF3jGtPAIMxInfjXaOZY
odc1zY2D2EgsC7wZZ1hfraRbISBOiRaez9BoGDKPOyBY9G1ASEgxJgleFgoBLfsK
a1XU+fDM6hAVdxftfkTm0nibyf7PWPDyzqghLqjR9WXLvZP3Cqud4p8N29mY51pR
KSTjA4FYk6Z9EVMltyBHfdJs6RQzglKjxcNGdlrvacDfyFi79fGdiosVllrjfJM=
=3+V0
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging
acpi, vhost, misc: fixes, features
vDPA support, fix to vhost blk RO bit handling, some include path
cleanups, NFIT ACPI table.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
# gpg: Signature made Fri 01 Jun 2018 17:25:19 BST
# gpg: using RSA key 281F0DB8D28D5469
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>"
# gpg: aka "Michael S. Tsirkin <mst@redhat.com>"
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67
# Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469
* remotes/mst/tags/for_upstream: (31 commits)
vhost-blk: turn on pre-defined RO feature bit
ACPI testing: test NFIT platform capabilities
nvdimm, acpi: support NFIT platform capabilities
tests/.gitignore: add entry for generated file
arch_init: sort architectures
ui: use local path for local headers
qga: use local path for local headers
colo: use local path for local headers
migration: use local path for local headers
usb: use local path for local headers
sd: fix up include
vhost-scsi: drop an unused include
ppc: use local path for local headers
rocker: drop an unused include
e1000e: use local path for local headers
ioapic: fix up includes
ide: use local path for local headers
display: use local path for local headers
trace: use local path for local headers
migration: drop an unused include
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Add a machine command line option to allow the user to control the Platform
Capabilities Structure in the virtualized NFIT. This Platform Capabilities
Structure was added in ACPI 6.2 Errata A.
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20180528232719.4721-20-f4bug@amsat.org>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Marcel Apfelbaum<marcel.apfelbaum@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
By default MachineClass::get_hotplug_handler is NULL and concrete board
should set it to it's own handler.
Considering there isn't any default handler, drop saving empty
MachineClass::get_hotplug_handler in child class and make PC code
consistent with spapr/s390x boards.
We can bring this back when actual usecase surfaces and do it
consistently across boards that use get_hotplug_handler().
Suggested-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Message-id: 1525691524-32265-2-git-send-email-imammedo@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Let's make it clear that we are dealing with device memory. That it can
be used for memory hotplug is just a special case.
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20180423165126.15441-10-david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
We use the machine internally either way, so let's just pass it in then.
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20180423165126.15441-5-david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
We can just query it ourselves. When unplugging, we should always be
able to the region (as it was previously plugged). E.g. PPC already
assumed that and used &error_abort.
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20180423165126.15441-4-david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Let's allow to query the MemoryHotplugState directly from the machine.
If the pointer is NULL, the machine does not support memory devices. If
the pointer is !NULL, the machine supports memory devices and the
data structure contains information about the applicable physical
guest address space region.
This allows us to generically detect if a certain machine has support
for memory devices, and to generically manage it (find free address
range, plug/unplug a memory region).
We will rename "MemoryHotplugState" to something more meaningful
("DeviceMemory") after we completed factoring out the pc-dimm code into
MemoryDevice code.
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20180423165126.15441-3-david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
[ehabkost: rebased series, solved conflicts at spapr.c]
[ehabkost: squashed fix to use g_malloc0()]
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
The ISA serial port handling in serial-isa.c imposes a limit
of 4 serial ports. This is because we only know of 4 IO port
and IRQ settings for them, and is unrelated to the generic
MAX_SERIAL_PORTS limit, though they happen to both be set at
4 currently.
Use a new MAX_ISA_SERIAL_PORTS wherever that is the correct
limit to be checking against.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20180420145249.32435-11-peter.maydell@linaro.org
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20180308223946.26784-26-f4bug@amsat.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au> (hw/ppc)
Message-Id: <20180308223946.26784-4-f4bug@amsat.org>
Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
- Move the header from hw/isa/ to hw/dma/
- Remove the old i386/pc dependency
- use a bool type for the high_page_enable argument
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20180308223946.26784-3-f4bug@amsat.org>
Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Again... (after 07dc788054 and 9157eee1b1).
We now extract the ISA bus specific helpers.
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20180308223946.26784-2-f4bug@amsat.org>
Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
After reviewing a patch from Philippe that removes block-backend.h
from hw/lm32/milkymist.c, I noticed that this header is included
unnecessarily in a lot of other files, too. Remove those unneeded
includes to speed up the compilation process a little bit.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <1518684912-31637-1-git-send-email-thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The e1000 NIC is getting old and is not a very good default for a
PCIe machine type. Change it to e1000e, which should be supported
by a good number of guests.
In particular, drivers for 82574 were added first to Linux 2.6.27 (2008)
and Windows 2008 R2. This does mean that Windows 2008 will not work
anymore with Q35 machine types and a default "-net nic -net xxx" network
configuration; it did work before because it does have an AHCI driver.
However, Windows 2008 has been declared out of main stream support
in 2015. It will get out of extended support in 2020. Windows 2008
R2 has the same end of support dates and, since the two are basically
Vista vs. Windows 7, R2 probably is more popular.
Reviewed-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Automatic creation of SCSI controllers for "-drive if=scsi" for x86
machines was quite a bad idea (see description of commit f778a82f0c
for details). This is marked as deprecated since QEMU v2.9.0, and as
far as I know, nobody complained that this is still urgently required
anymore. Time to remove this now.
Suggested-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <1519123357-13225-1-git-send-email-thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
In my "build everything" tree, a change to the types in
qapi-schema.json triggers a recompile of about 4800 out of 5100
objects.
The previous commit split up qmp-commands.h, qmp-event.h, qmp-visit.h,
qapi-types.h. Each of these headers still includes all its shards.
Reduce compile time by including just the shards we actually need.
To illustrate the benefits: adding a type to qapi/migration.json now
recompiles some 2300 instead of 4800 objects. The next commit will
improve it further.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20180211093607.27351-24-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
[eblake: rebase to master]
Signed-off-by: Eric Blake <eblake@redhat.com>
qemu-common.h includes qemu/option.h, but most places that include the
former don't actually need the latter. Drop the include, and add it
to the places that actually need it.
While there, drop superfluous includes of both headers, and
separate #include from file comment with a blank line.
This cleanup makes the number of objects depending on qemu/option.h
drop from 4545 (out of 4743) to 284 in my "build everything" tree.
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20180201111846.21846-20-armbru@redhat.com>
[Semantic conflict with commit bdd6a90a9e in block/nvme.c resolved]
This cleanup makes the number of objects depending on qapi/error.h
drop from 1910 (out of 4743) to 1612 in my "build everything" tree.
While there, separate #include from file comment with a blank line,
and drop a useless comment on why qemu/osdep.h is included first.
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20180201111846.21846-5-armbru@redhat.com>
[Semantic conflict with commit 34e304e975 resolved, OSX breakage fixed]
Remove dependency of possible_cpus on 1st CPU instance,
which decouples configuration data from CPU instances that
are created using that data.
Also later it would be used for enabling early cpu to numa node
configuration at runtime qmp_query_hotpluggable_cpus() should
provide a list of available cpu slots at early stage,
before machine_init() is called and the 1st cpu is created,
so that mgmt might be able to call it and use output to set
numa mapping.
Use MachineClass::possible_cpu_arch_ids() callback to set
cpu type info, along with the rest of possible cpu properties,
to let machine define which cpu type* will be used.
* for SPAPR it will be a spapr core type and for ARM/s390x/x86
a respective descendant of CPUClass.
Move parse_numa_opts() in vl.c after cpu_model is parsed into
cpu_type so that possible_cpu_arch_ids() would know which
cpu_type to use during layout initialization.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <1515597770-268979-1-git-send-email-imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
When -no-acpi option is used with Q35 machine type, no guest ACPI is
built, but the ACPI device is still created, so only checking the
presence of ACPI device before memory plug/unplug is not enough in
such cases. Check whether ACPI is disabled globally in addition and
fail memory plug/unplug if it's disabled.
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Message-Id: <20171222015120.31730-1-haozhong.zhang@intel.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
and remove the old i386/pc dependency
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Hervé Poussineau <hpoussin@reactos.org>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
when qemu is started with '-no-acpi' CLI option, an attempt
to unplug a CPU using device_del results in null pointer
dereference at:
#0 object_get_class
#1 pc_machine_device_unplug_request_cb
#2 qmp_marshal_device_del
which is caused by pcms->acpi_dev == NULL due to ACPI support
being disabled.
Considering that ACPI support is necessary for unplug to work,
check that it's enabled and fail unplug request gracefully
if no acpi device were found.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Linux and Windows need ACPI SRAT table to make memory hotplug work properly,
however currently QEMU doesn't create SRAT table if numa options aren't present
on CLI.
Which breaks both linux and windows guests in certain conditions:
* Windows: won't enable memory hotplug without SRAT table at all
* Linux: if QEMU is started with initial memory all below 4Gb and no SRAT table
present, guest kernel will use nommu DMA ops, which breaks 32bit hw drivers
when memory is hotplugged and guest tries to use it with that drivers.
Fix above issues by automatically creating a numa node when QEMU is started with
memory hotplug enabled but without '-numa' options on CLI.
(PS: auto-create numa node only for new machine types so not to break migration).
Which would provide SRAT table to guests without explicit -numa options on CLI
and would allow:
* Windows: to enable memory hotplug
* Linux: switch to SWIOTLB DMA ops, to bounce DMA transfers to 32bit allocated
buffers that legacy drivers/hw can handle.
[Rewritten by Igor]
Reported-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Suggested-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Marcel Apfelbaum <marcel@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Thomas Huth <thuth@redhat.com>
Cc: Alistair Francis <alistair23@gmail.com>
Cc: Takao Indoh <indou.takao@jp.fujitsu.com>
Cc: Izumi Taku <izumi.taku@jp.fujitsu.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Currently there is no MMIO range over 4G
reserved for PCI hotplug. Since the 32bit PCI hole
depends on the number of cold-plugged PCI devices
and other factors, it is very possible is too small
to hotplug PCI devices with large BARs.
Fix it by reserving 2G for I4400FX chipset
in order to comply with older Win32 Guest OSes
and 32G for Q35 chipset.
Even if the new defaults of pci-hole64-size will appear in
"info qtree" also for older machines, the property was
not implemented so no changes will be visible to guests.
Note this is a regression since prev QEMU versions had
some range reserved for 64bit PCI hotplug.
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Xen vIOMMU device model will be in Xen hypervisor. Skip vIOMMU
check for Xen here when vcpu number is more than 255.
Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
Message-Id: <1502842933-8323-1-git-send-email-tianyu.lan@intel.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
heterogeneous cpus are not supported and hotplugging different
cpu model crashes QEMU:
qemu-system-x86_64 -cpu qemu64 -smp 1,maxcpus=2
(qemu) device_add host-x86_64-cpu,socket-id=1,core-id=0,thread-id=0,id=foo
(qemu) info cpus
error: failed to get MSR 0x38d
qemu-system-x86_64: target/i386/kvm.c:2121: kvm_get_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.
Aborted (core dumped)
Gracefully fail hotplug process in case of user mistake.
Reported-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <1507638879-200718-1-git-send-email-imammedo@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQIcBAABCAAGBQJZwXs9AAoJECgHk2+YTcWmS18P/1OceEmetatwBQ6YWURA0VfU
CB+nCHxijlWXU8UiwoTVDOXc11P00V5VO0mMFouUbv+o+/80qyAfNl2DhDVq7ZXo
nhIFHKXUR1BX3YbcqfIonH8xCMGtrAexghhhaGPQbx450PqmGyyjBsZsXttNge1S
Yt+jzgD9drsZPYw4ZLJjT/tnWI/+8kDPn37jVujzRzApTD9/fq+77ZZq7q25RzQH
ISa5OXOQk6pq+tPHvaIFXfOqfhILcpM7u/X7MYVwiA1oBqcLXTDCjDX3cKavKade
+i3yrH7ahNgUO1DyT2bMZX6NAFoZDBrwlYgpkw8n+Yf+EUcdPDOHHEcEeaMpdTGx
wgWbQrs+xzIg/ulRb8Qqe9FwdXGQbehfFGofd+gnGQ0XuxekT82in+ucMOivQO3x
W/azGnzoz6D83stJFIZ93S69SRswqBuj2R8mu821yzqx1EUSNXgfKXz9OPwwFed5
El9YN127F/VAyp0av1CsOg1XgqIujMUGRxGf7eQBfkh1R3C/g2XNPTvR3yaY9L5B
zuMJfWLF6r6zL53ymt7/9UVEim295Lia3mNGS5/Min5QGms7edphsDsrubzXsZGq
2owWfAU/KeDH9gNVNNkZdLcEcS5TEz+2oGPR5oeDeB/QlzVdNQ3FeTVlzFavxNQa
8nrzeFcw7VNrIx2gdvsY
=LQ8U
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/ehabkost/tags/machine-next-pull-request' into staging
Machine/CPU/NUMA queue, 2017-09-19
# gpg: Signature made Tue 19 Sep 2017 21:17:01 BST
# gpg: using RSA key 0x2807936F984DC5A6
# gpg: Good signature from "Eduardo Habkost <ehabkost@redhat.com>"
# Primary key fingerprint: 5A32 2FD5 ABC4 D3DB ACCF D1AA 2807 936F 984D C5A6
* remotes/ehabkost/tags/machine-next-pull-request:
MAINTAINERS: Update git URLs for my trees
hw/acpi-build: Fix SRAT memory building in case of node 0 without RAM
NUMA: Replace MAX_NODES with nb_numa_nodes in for loop
numa: cpu: calculate/set default node-ids after all -numa CLI options are parsed
arm: drop intermediate cpu_model -> cpu type parsing and use cpu type directly
pc: use generic cpu_model parsing
vl.c: convert cpu_model to cpu type and set of global properties before machine_init()
cpu: make cpu_generic_init() abort QEMU on error
qom: cpus: split cpu_generic_init() on feature parsing and cpu creation parts
hostmem-file: Add "discard-data" option
osdep: Define QEMU_MADV_REMOVE
vl: Clean up user-creatable objects when exiting
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Calculating default node-ids for CPUs in possible_cpu_arch_ids()
is rather fragile since defaults calculation uses nb_numa_nodes but
callback might be potentially called early before all -numa CLI
options are parsed, which would lead to cpus assigned only upto
nb_numa_nodes at the time possible_cpu_arch_ids() is called.
Issue was introduced by
(7c88e65 numa: mirror cpu to node mapping in MachineState::possible_cpus)
and for example CLI:
-smp 4 -numa node,cpus=0 -numa node
would set props.node-id in possible_cpus array for every non
explicitly mapped CPU to the first node.
Issue is not visible to guest nor to mgmt interface due to
1) implictly mapped cpus are forced to the first node in
case of partial mapping
2) in case of default mapping possible_cpu_arch_ids() is
called after all -numa options are parsed (resulting
in correct mapping).
However it's fragile to rely on late execution of
possible_cpu_arch_ids(), therefore add machine specific
callback that returns node-id for CPU and use it to calculate/
set defaults at machine_numa_finish_init() time when all -numa
options are parsed.
Reported-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <1496314408-163972-1-git-send-email-imammedo@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Convert all the single line uses of fprintf(stderr, "warning:"..."\n"...
to use warn_report() instead. This helps standardise on a single
method of printing warnings to the user.
All of the warnings were changed using this command:
find ./* -type f -exec sed -i \
's|fprintf(.*".*warning[,:] \(.*\)\\n"\(.*\));|warn_report("\1"\2);|Ig' \
{} +
Some of the lines were manually edited to reduce the line length to below
80 charecters.
The #include lines were manually updated to allow the code to compile.
Signed-off-by: Alistair Francis <alistair.francis@xilinx.com>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Max Reitz <mreitz@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Michael Roth <mdroth@linux.vnet.ibm.com>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Aurelien Jarno <aurelien@aurel32.net>
Cc: Yongbok Kim <yongbok.kim@imgtec.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: James Hogan <james.hogan@imgtec.com> [mips]
Message-Id: <ae8f8a7f0a88ded61743dff2adade21f8122a9e7.1505158760.git.alistair.francis@xilinx.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>