ShuriZma/xemu - xemu

Commit Graph

Author	SHA1	Message	Date
Greg Kurz	dba95ebbf8	spapr_pci: parent the MSI memory region to the PHB This memory region should be owned by the PHB. This ensures the PHB cannot be finalized as long as the the region is guest visible, or used by a CPU or a device. Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-09-08 09:30:54 +10:00
Greg Kurz	a931ad137a	spapr_iommu: convert TCE table object to realize() Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-09-08 09:30:54 +10:00
Greg Kurz	f5babeacc4	spapr_drc: use g_strdup_printf() instead of snprintf() Passing a stack allocated buffer of arbitrary length to snprintf() without checking the return value can cause the resultant strings to be silently truncated. Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-09-08 09:30:54 +10:00
Greg Kurz	a205a053dc	spapr_iommu: use g_strdup_printf() instead of snprintf() Passing a stack allocated buffer of arbitrary length to snprintf() without checking the return value can cause the resultant strings to be silently truncated. Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-09-08 09:30:54 +10:00
Greg Kurz	5c3d70e970	spapr_pci: use memory_region_add_subregion() with DMA windows Passing a null priority to memory_region_add_subregion_overlap() is strictly equivalent to calling memory_region_add_subregion(). Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-09-08 09:30:54 +10:00
Daniel Henrique Barboza	10f12e6450	hw/ppc: CAS reset on early device hotplug This patch is a follow up on the discussions made in patch "hw/ppc: disable hotplug before CAS is completed" that can be found at [1]. At this moment, we do not support CPU/memory hotplug in early boot stages, before CAS. When a hotplug occurs, the event is logged in an internal RTAS event log queue and an IRQ pulse is fired. In regular conditions, the guest handles the interrupt by executing check_exception, fetching the generated hotplug event and enabling the device for use. In early boot, this IRQ isn't caught (SLOF does not handle hotplug events), leaving the event in the rtas event log queue. If the guest executes check_exception due to another hotplug event, the re-assertion of the IRQ ends up de-queuing the first hotplug event as well. In short, a device hotplugged before CAS is considered coldplugged by SLOF. This leads to device misbehavior and, in some cases, guest kernel Ooops when trying to unplug the device. A proper fix would be to turn every device hotplugged before CAS as a colplugged device. This is not trivial to do with the current code base though - the FDT is written in the guest memory at ppc_spapr_reset and can't be retrieved without adding extra state (fdt_size for example) that will need to managed and migrated. Adding the hotplugged DT in the middle of CAS negotiation via the updated DT tree works with CPU devs, but panics the guest kernel at boot. Additional analysis would be necessary for LMBs and PCI devices. There are questions to be made in QEMU/SLOF/kernel level about how we can make this change in a sustainable way. With Linux guests, a fix would be the kernel executing check_exception at boot time, de-queueing the events that happened in early boot and processing them. However, even if/when the newer kernels start fetching these events at boot time, we need to take care of older kernels that won't be doing that. This patch works around the situation by issuing a CAS reset if a hotplugged device is detected during CAS: - the DRC conditions that warrant a CAS reset is the same as those that triggers a DRC migration - the DRC must have a device attached and the DRC state is not equal to its ready_state. With that in mind, this patch makes use of 'spapr_drc_needed' to determine if a CAS reset is needed. - In the middle of CAS negotiations, the function 'spapr_hotplugged_dev_before_cas' goes through all the DRCs to see if there are any DRC that requires a reset, using spapr_drc_needed. If that happens, returns '1' in 'spapr_h_cas_compose_response' which will set spapr->cas_reboot to true, causing the machine to reboot. No changes are made for coldplug devices. [1] http://lists.nongnu.org/archive/html/qemu-devel/2017-08/msg02855.html Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-09-08 09:30:54 +10:00
Daniel Henrique Barboza	5625817423	hw/ppc: clear pending_events on machine reset The sPAPR machine isn't clearing up the pending events QTAILQ on machine reboot. This allows for unprocessed hotplug/epow events to persist in the queue after reset and, when reasserting the IRQs in check_exception later on, these will be being processed by the OS. This patch implements a new function called 'spapr_clear_pending_events' that clears up the pending_events QTAILQ. This helper is then called inside ppc_spapr_reset to clear up the events queue, preventing old/deprecated events from persisting after a reset. Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-09-08 09:30:54 +10:00
Daniel Henrique Barboza	c618e300eb	hw/ppc/spapr_drc.c: change spapr_drc_needed to use drc->dev This patch makes a small fix in 'spapr_drc_needed' to change how we detect if a DRC has a device attached. Previously it used dr_entity_sense for this, which works for physical DRCs. However, for logical DRCs, it didn't cover the case where a logical DRC has a drc->dev but the state is LOGICAL_UNUSABLE (e.g. a hotplugged CPU before CAS). In this case, the dr_entity_sense of this DRC returns UNUSABLE and the code was considering that there were no dev attached, making spapr_drc_needed return 'false' when in fact we would like to migrate the DRC. Changing it to check for drc->dev instead works for all DRC types. Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-09-08 09:30:54 +10:00
Igor Mammedov	84efa64c60	ppc: replace cpu_ppc_init() with cpu_generic_init() it's just a wrapper, drop it and use cpu_generic_init() directly Signed-off-by: Igor Mammedov <imammedo@redhat.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Hervé Poussineau <hpoussin@reactos.org> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <1503592308-93913-26-git-send-email-imammedo@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2017-09-01 11:54:25 -03:00
Thomas Huth	1f98e55385	hw/ppc/spapr_iommu: Fix crash when removing the "spapr-tce-table" device QEMU currently aborts unexpectedly when the user tries to add and remove a "spapr-tce-table" device: $ qemu-system-ppc64 -nographic -S -nodefaults -monitor stdio QEMU 2.9.92 monitor - type 'help' for more information (qemu) device_add spapr-tce-table,id=x (qemu) device_del x ** ERROR:qemu/qdev-monitor.c:872:qdev_unplug: assertion failed: (hotplug_ctrl) Aborted (core dumped) The device should not be accessable for the users at all, it's just used internally, so mark it with user_creatable = false. Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-08-22 21:26:46 +10:00
Thomas Huth	8ccccff9dd	hw/ppc/spapr_rtc: Mark the RTC device with user_creatable = false QEMU currently aborts unexpectedly when a user tries to do something like this: $ qemu-system-ppc64 -nographic -S -nodefaults -monitor stdio QEMU 2.9.92 monitor - type 'help' for more information (qemu) device_add spapr-rtc,id=spapr-rtc (qemu) device_del spapr-rtc ** ERROR:qemu/qdev-monitor.c:872:qdev_unplug: assertion failed: (hotplug_ctrl) Aborted (core dumped) The RTC device is not meant to be hot-pluggable - it's an internal device only and it even should not be possible to create it a second time with the "-device" parameter, so let's mark this with "user_creatable = false". Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-08-22 21:26:46 +10:00
Thomas Huth	0479097859	hw/ppc/spapr: Fix segfault when instantiating a 'pc-dimm' without 'memdev' QEMU currently crashes when trying to use a 'pc-dimm' on the pseries machine without specifying its 'memdev' property. This happens because pc_dimm_get_memory_region() does not check whether the 'memdev' property has properly been set by the user. Looking closer at this function, it's also obvious that it is using &error_abort to call another function - and this is bad in a function that is used in the hot-plugging calling chain since this can also cause QEMU to exit unexpectedly. So let's fix these issues in a proper way now: Add a "Error **errp" parameter to pc_dimm_get_memory_region() which we use in case the 'memdev' property has not been set by the user, and which we can use instead of the &error_abort, and change the callers of get_memory_region() to make use of this "errp" parameter for proper error checking. Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-08-22 21:26:46 +10:00
Bharata B Rao	188bfe1b00	spapr: Allow configure-connector to be called multiple times In case of in-kernel memory hot unplug, when the guest is not able to remove all the LMBs that are requested for removal, it will add back any LMBs that have been successfully removed. The DR Connectors of these LMBs wouldn't have been unconfigured and hence the addition of these LMBs will result in configure-connector call being issued on LMB DR connectors that are already in configured state. Such configure-connector calls will fail resulting in a DIMM which is partially unplugged. This however worked till recently before we overhauled the DRC implementation in QEMU. Commit 9d4c0f4f0a71e: "spapr: Consolidate DRC state variables" is the first commit where this problem shows up as per git bisect. Ideally guest shouldn't be issuing configure-connector call on an already configured DR connector. However for now, work around this in QEMU by allowing configure-connector to be called multiple times for all types of DR connectors. Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> [dwg: Corrected buglet that would have initialized fdt pointers ready for reading on a device not present at reset] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-08-22 21:26:46 +10:00
Sam Bobroff	f57467e3b3	spapr: Fix bug in h_signal_sys_reset() The unicast case in h_signal_sys_reset() seems to be broken: rather than selecting the target CPU, it looks like it will pick either the first CPU or fail to find one at all. Fix it by using the search function rather than open coding the search. This was found by inspection; the code appears to be unused because the Linux kernel only uses the broadcast target. Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-08-09 14:04:28 +10:00
Greg Kurz	325837ca38	spapr_drc: abort if object_property_add_child() fails object_property_add_child() can only fail in two cases: - the child already has a parent, which shouldn't happen since the DRC was allocated a few lines above - the parent already has a child with the same name, which would mean the caller tries to create a DRC that already exists In both case, this is a QEMU bug and we should abort. Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-08-09 11:46:44 +10:00
Vladimir Sementsov-Ogievskiy	8908eb1a4a	trace-events: fix code style: print 0x before hex numbers The only exception are groups of numers separated by symbols '.', ' ', ':', '/', like 'ab.09.7d'. This patch is made by the following: > find . -name trace-events \| xargs python script.py where script.py is the following python script: ========================= #!/usr/bin/env python import sys import re import fileinput rhex = '%[-+ .0-9](?:[hljztL]\|ll\|hh)?(?:x\|X\|"\sPRI[xX][^"]"?)' rgroup = re.compile('((?:' + rhex + '[.:/ ])+' + rhex + ')') rbad = re.compile('(?<!0x)' + rhex) files = sys.argv[1:] for fname in files: for line in fileinput.input(fname, inplace=True): arr = re.split(rgroup, line) for i in range(0, len(arr), 2): arr[i] = re.sub(rbad, '0x\g<0>', arr[i]) sys.stdout.write(''.join(arr)) ========================= Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Cornelia Huck <cohuck@redhat.com> Message-id: 20170731160135.12101-5-vsementsov@virtuozzo.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-08-01 12:13:07 +01:00
Philippe Mathieu-Daudé	87e0331c5a	docs: fix broken paths to docs/devel/tracing.txt With the move of some docs/ to docs/devel/ on `ac06724a71`, no references were updated. Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2017-07-31 13:12:53 +03:00
David Gibson	fc7e0765fc	Revert "spapr: populate device tree depending on XIVE_EXPLOIT option" This reverts commit `b87680427e`. I thought this was a harmless preliminary for XIVE enablement patches we expect later on. However, due to some subtle interactions between qemu and SLOF (guest firmware) this breaks some things. Revert it for now, we'll work out how to fix it when the rest of the XIVE patches are ready. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-07-29 16:22:14 +10:00
Greg Kurz	bf26ae32a9	spapr_drc: fix realize and unrealize If object_property_add_alias() returns an error in realize(), we should propagate it to the caller and certainly not unref the DRC. Same thing goes for unrealize(). Since object_property_del() is the last call, we can even get rid of the intermediate Error *. And finally, unrealize() should undo all registrations performed by realize(). Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-07-29 16:22:14 +10:00
Bharata B Rao	8d5981c4fc	spapr: Fix QEMU abort during memory unplug Commit `0cffce56` (hw/ppc/spapr.c: adding pending_dimm_unplugs to sPAPRMachineState) introduced a new way to track pending LMBs of DIMM device that is marked for removal. Since this commit we can hit the assert in spapr_pending_dimm_unplugs_add() in the following situation: - DIMM device removal fails as the guest doesn't allow the removal. - Subsequent attempt to remove the same DIMM would hit the assert as the corresponding sPAPRDIMMState is still part of the pending_dimm_unplugs list. Fix this by removing the assert and conditionally adding the sPAPRDIMMState to pending_dimm_unplugs list only when it is not already present. Fixes: `0cffce56ae` Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> [dwg: Tweaked to avoid returning NULL when spapr_pending_dimm_unplugs_add() does find an existing entry] Reviewed-by: Daniel Barboza <danielhb@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-07-25 11:14:25 +10:00
Laurent Vivier	e8cd4247e9	spapr/htab: fix savevm Commit `3a38429` ("spapr: Add a "no HPT" encoding to HTAB migration stream") allows to migrate an empty HPT, but doesn't mark correctly the end of the migration stream. The end condition (value returned by htab_save_iterate()) should be 1, whereas in `3a38429` it returns 0. The problem can be reproduced with QEMU monitor command "savevm": the command never stops and the disk image grows without limit. Fixes: `3a38429748` Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-07-25 11:14:25 +10:00
Alexey Kardashevskiy	18f2330ef5	spapr_pci: Fix obsolete comment about MSIX encoding in addr/data `f1c2dc7c86` "spapr-pci: rework MSI/MSIX" (07/2013) changed MSIX encoding but forgot to change the comment so this changes it. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-07-25 11:14:25 +10:00
Markus Armbruster	d2f95f4d48	qapi: Use QNull for a more regular visit_type_null() Make visit_type_null() take an @obj argument like its buddies. This helps keep the next commit simple. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com>	2017-07-24 13:35:11 +02:00
Peter Maydell	77031ee1ce	ppc patch queue 2017-07-17 This pull requests supersedes the one from 2017-07-14. That one had a couple of subtle regressions: there was a build error for mingw32, and an instance_size which was theoretically wrong everywhere, but only actually bit on the Travis OSX build. There are two major batches in this set, rather than the usual collection of assorted fixes. * More DRC cleanup. This gets the state management into a state which should fix many of the hotplug+migration problems we've had. Plus it gets the migration stream format into something well defined and pretty minimal which we can reasonably support into the future. * Hashed Page Table resizing. It's been a while since this was posted, but it's been through several previous rounds of review. The kernel parts (both guest and host) are merged in 4.11, so this is the only remaining piece left to allow resizing of the HPT in a running guest. There are also a handful of unrelated fixes. -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEdfRlhq5hpmzETofcbDjKyiDZs5IFAllsWwQACgkQbDjKyiDZ s5LMnA//dpoqWrTPiEmx2DsXMkjLefn/2Yl1dkQDzhyb7v+tNGFYmxpbb7nPRfJE tfvcKu1Tz23NPOp6+1VC9eTyTO1YOXTgvQrNSbF1MmIg4PGN6s2DHrLviAqCS15M 29x6+RdRaeLUSCsk8elsViiWb8h7cISDuN0SMA0WWjWP3bO/drz5nq5z5dRgdVFe Z5O0qwDNoN0NypJ68Cld+riP1uDAYMONPxA0QOWCLx8qowoJ3hYMuyNnqBQU5OJn PpAA3EfdxkN6rtaBjDt7xHkJfm9Xkm9SsT8qTcj/R2JjkENef8EbzrdjFE+pSVz0 7c9C4evgYgmhUCUFvnZfgN+VBL1lS/p5UGnFPyNQ7KbSXDE71OAgWH/f/7kzsJPy MxbJWM6eUN9Ny0APxM8olLV1FM4GzEoCSLfDVhStrdJ6P5wBmjLSugqSOLB8aMtd 8NwBY06nTpmo9xXGz9enLUWlpSeoReKU3TxvQvY+JcOWWpasDZOO4zD8B3bdLbA/ I8jdkH5Vs0pyPLaWD+1FxlQvlF45CuwpwoiAz00V2XkkMu8jKCGsQ0iuqXorSqvs /7tQ1pHlUybAX+5W9raaJmphgc4gk33P3PlQCjhgYzxRu4yzRsEzS9hahoO/TAmq Y70CooZaaeGNOBEDcKLZEzJdBr52cqW4MM8t1xHWTg3VCHJGeYI= =O6NQ -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.10-20170717' into staging ppc patch queue 2017-07-17 This pull requests supersedes the one from 2017-07-14. That one had a couple of subtle regressions: there was a build error for mingw32, and an instance_size which was theoretically wrong everywhere, but only actually bit on the Travis OSX build. There are two major batches in this set, rather than the usual collection of assorted fixes. * More DRC cleanup. This gets the state management into a state which should fix many of the hotplug+migration problems we've had. Plus it gets the migration stream format into something well defined and pretty minimal which we can reasonably support into the future. * Hashed Page Table resizing. It's been a while since this was posted, but it's been through several previous rounds of review. The kernel parts (both guest and host) are merged in 4.11, so this is the only remaining piece left to allow resizing of the HPT in a running guest. There are also a handful of unrelated fixes. # gpg: Signature made Mon 17 Jul 2017 07:36:52 BST # gpg: using RSA key 0x6C38CACA20D9B392 # gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>" # gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>" # gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>" # gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>" # Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392 * remotes/dgibson/tags/ppc-for-2.10-20170717: (21 commits) target/ppc: fix CPU hotplug when radix is enabled (TCG) spapr: fix memory leak in spapr_core_pre_plug() pseries: Allow HPT resizing with KVM pseries: Use smaller default hash page tables when guest can resize pseries: Enable HPT resizing for 2.10 pseries: Implement HPT resizing pseries: Stubs for HPT resizing ppc/pnv: Remove unused XICSState reference spapr: fix potential memory leak in spapr_core_plug() spapr: Implement DR-indicator for physical DRCs only spapr: Remove sPAPRConfigureConnectorState sub-structure spapr: Consolidate DRC state variables spapr: Cleanups relating to DRC awaiting_release field spapr: Refactor spapr_drc_detach() spapr: Abort on delete failure in spapr_drc_release() spapr: Simplify unplug path spapr: Remove 'awaiting_allocation' DRC flag spapr: Treat devices added before inbound migration as coldplugged spapr: Minor cleanups to events handling spapr: migrate pending_events of spapr state ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2017-07-17 12:52:59 +01:00
Greg Kurz	df8658de43	spapr: fix memory leak in spapr_core_pre_plug() In case of error, we must ensure the dynamically allocated base_core_type is freed, like it is done everywhere else in this function. This is a regression introduced in QEMU 2.9 by commit `8149e2992f`. Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-07-17 15:07:05 +10:00
David Gibson	b55d295e3e	pseries: Allow HPT resizing with KVM So far, qemu implements the PAPR Hash Page Table (HPT) resizing extension with TCG. The same implementation will work with KVM PR, but we don't currently allow that. For KVM HV we can only implement resizing with the assistance of the host kernel, which needs a new capability and ioctl()s. This patch adds support for testing the new KVM capability and implementing the resize in terms of KVM facilities when necessary. If we're running on a kernel which doesn't have the new capability flag at all, we fall back to testing for PR vs. HV KVM using the same hack that we already use in a number of places for older kernels. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-07-17 15:07:05 +10:00
David Gibson	2772cf6be9	pseries: Use smaller default hash page tables when guest can resize We've now implemented a PAPR extension allowing PAPR guest to resize their hash page table (HPT) during runtime. This patch makes use of that facility to allocate smaller HPTs by default. Specifically when a guest is aware of the HPT resize facility, qemu sizes the HPT to the initial memory size, rather than the maximum memory size on the assumption that the guest will resize its HPT if necessary for hot plugged memory. When the initial memory size is much smaller than the maximum memory size (a common configuration with e.g. oVirt / RHEV) then this can save significant memory on the HPT. If the guest does not advertise HPT resize awareness when it makes the ibm,client-architecture-support call, qemu resizes the HPT for maxmimum memory size (unless it's been configured not to allow such guests at all). For now we make that reallocation assuming the guest has not yet used the HPT at all. That's true in practice, but not, strictly, an architectural or PAPR requirement. If we need to in future we can fix this by having the client-architecture-support call reboot the guest with the revised HPT size (the client-architecture-support call is explicitly permitted to trigger a reboot in this way). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>	2017-07-17 15:07:05 +10:00
David Gibson	52b81ab5e9	pseries: Enable HPT resizing for 2.10 We've now implemented a PAPR extensions which allows PAPR guests (i.e. "pseries" machine type) to resize their hash page table during runtime. However, that extension is only enabled if explicitly chosen on the command line. This patch enables it by default for spapr-2.10, but leaves it disabled (by default) for older machine types. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com>	2017-07-17 15:07:05 +10:00
David Gibson	0b0b831016	pseries: Implement HPT resizing This patch implements hypercalls allowing a PAPR guest to resize its own hash page table. This will eventually allow for more flexible memory hotplug. The implementation is partially asynchronous, handled in a special thread running the hpt_prepare_thread() function. The state of a pending resize is stored in SPAPR_MACHINE->pending_hpt. The H_RESIZE_HPT_PREPARE hypercall will kick off creation of a new HPT, or, if one is already in progress, monitor it for completion. If there is an existing HPT resize in progress that doesn't match the size specified in the call, it will cancel it, replacing it with a new one matching the given size. The H_RESIZE_HPT_COMMIT completes transition to a resized HPT, and can only be called successfully once H_RESIZE_HPT_PREPARE has successfully completed initialization of a new HPT. The guest must ensure that there are no concurrent accesses to the existing HPT while this is called (this effectively means stop_machine() for Linux guests). For now H_RESIZE_HPT_COMMIT goes through the whole old HPT, rehashing each HPTE into the new HPT. This can have quite high latency, but it seems to be of the order of typical migration downtime latencies for HPTs of size up to ~2GiB (which would be used in a 256GiB guest). In future we probably want to move more of the rehashing to the "prepare" phase, by having H_ENTER and other hcalls update both current and pending HPTs. That's a project for another day, but should be possible without any changes to the guest interface. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-07-17 15:07:05 +10:00
David Gibson	30f4b05bd0	pseries: Stubs for HPT resizing This introduces stub implementations of the H_RESIZE_HPT_PREPARE and H_RESIZE_HPT_COMMIT hypercalls which we hope to add in a PAPR extension to allow run time resizing of a guest's hash page table. It also adds a new machine property for controlling whether this new facility is available. For now we only allow resizing with TCG, allowing it with KVM will require kernel changes as well. Finally, it adds a new string to the hypertas property in the device tree, advertising to the guest the availability of the HPT resizing hypercalls. This is a tentative suggested value, and would need to be standardized by PAPR before being merged. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Reviewed-by: Laurent Vivier <lvivier@redhat.com>	2017-07-17 15:07:05 +10:00
Greg Kurz	e49c63d5b3	spapr: fix potential memory leak in spapr_core_plug() Since commit `5c1da81215` ("spapr: Remove unnecessary differences between hotplug and coldplug paths"), the CPU DT for the DRC is always allocated. This causes a memory leak for pseries-2.6 and older machine types, that don't support CPU hotplug and don't allocate DRCs for CPUs. Reported-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-07-17 15:07:05 +10:00
David Gibson	67fea71bf3	spapr: Implement DR-indicator for physical DRCs only According to PAPR, the DR-indicator should only be valid for physical DRCs, not logical DRCs. At the moment we implement it for all DRCs, so restrict it to physical ones only. We move the state to the physical DRC subclass, which means adding some QOM boilerplate to handle the newly distinct type. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Daniel Barboza <danielhb@linux.vnet.ibm.com> Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>	2017-07-17 15:07:05 +10:00
David Gibson	4445b1d27e	spapr: Remove sPAPRConfigureConnectorState sub-structure Most of the time, the state of a DRC object is contained in the single 'state' variable. However, during the transition from UNISOLATE to CONFIGURED state requires multiple calls to the ibm,configure-connector RTAS call to retrieve the device tree for the attached device. We need some extra state to keep track of where we're up to in delivering the device tree information to the guest. Currently that extra state is in a sPAPRConfigureConnectorState substructure which is only allocated when we're in the middle of the configure connector process. That sounds like a good idea, but the extra state is only two integers - on many platforms that will take up the same room as the (maybe NULL) ccs pointer even before malloc() overhead. Plus it's another object whose lifetime we need to manage. In short, it's not worth it. So, fold the sPAPRConfigureConnectorState substructure directly into the DRC object. Previously the structure was allocated lazily when the configure-connector call discovers it's not there. Now, we need to initialize the subfields pre-emptively, as soon as we enter UNISOLATE state. Although it's not strictly necessary (the field values should only ever be consulted when in UNISOLATE state), we try to keep them at -1 when in other states, as a debugging aid. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Daniel Barboza <danielhb@linux.vnet.ibm.com> Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>	2017-07-17 15:07:05 +10:00
David Gibson	9d4c0f4f0a	spapr: Consolidate DRC state variables Each DRC has three fields describing its state: isolation_state, allocation_state and configured. At first this seems like a reasonable representation, since its based directly on the PAPR defined isolation-state and allocation-state indicators. However: * Only a few combinations of the two fields' values are permitted * allocation_state isn't used at all for physical DRCs * The indicators are write only so they don't really have a well defined current value independent of each other This replaces these variables with a single state variable, whose names and numbers are based on the diagram in LoPAPR section 13.4. Along with this we add code to check the current state on various operations and make sure the requested transition is permitted. Strictly speaking, this makes guest visible changes to behaviour (since we probably allowed some transitions we shouldn't have before). However, a hypothetical guest broken by that wasn't PAPR compliant, and probably wouldn't have worked under PowerVM. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Daniel Barboza <danielhb@linux.vnet.ibm.com> Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>	2017-07-17 15:07:05 +10:00
David Gibson	f1c52354e5	spapr: Cleanups relating to DRC awaiting_release field 'awaiting_release' indicates that the host has requested an unplug of the device attached to the DRC, but the guest has not (yet) put the device into a state where it is safe to complete removal. 1. Rename it to 'unplug_requested' which to me at least is clearer 2. Remove the ->release_pending() method used to check this from outside spapr_drc.c. The method only plausibly has one implementation, so use a plain function (spapr_drc_unplug_requested()) instead. 3. Remove it from the migration stream. Attempting to migrate mid-unplug is broken not just for spapr - in general management has no good way to determine if the device should be present on the destination or not. So, until that's fixed, there's no point adding extra things to the stream. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>	2017-07-17 15:07:05 +10:00
David Gibson	a8dc47fd82	spapr: Refactor spapr_drc_detach() This function has two unused parameters - remove them. It also sets awaiting_release on all paths, except one. On that path setting it is harmless, since it will be immediately cleared by spapr_drc_release(). So factor it out of the if statements. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>	2017-07-17 15:07:05 +10:00
David Gibson	ba50822ff8	spapr: Abort on delete failure in spapr_drc_release() We currently ignore errors from the object_property_del() in spapr_drc_release(). But the only way that could fail is if the property doesn't exist, in which case it's a bug that we're in spapr_drc_release() at all. So change from ignoring to abort()ing on errors. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-07-17 15:07:05 +10:00
David Gibson	765d1bdda5	spapr: Simplify unplug path spapr_lmb_release() and spapr_core_release() call hotplug_handler_unplug() which after a bunch of indirection calls spapr_memory_unplug() or spapr_core_unplug(). But we already know which is the appropriate thing to call here, so we can just fold it directly into the release function. Once that's done, there's no need for an hc->unplug method in the spapr machine at all: since we also have an hc->unplug_request method, the hotplug core will never use ->unplug. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>	2017-07-17 15:07:05 +10:00
David Gibson	82a93a1d30	spapr: Remove 'awaiting_allocation' DRC flag The awaiting_allocation flag in the DRC was introduced by `aab9913` "spapr_drc: Prevent detach racing against attach for CPU DR", allegedly to prevent a guest crash on racing attach and detach. Except.. information from the BZ actually suggests a qemu crash, not a guest crash. And there shouldn't be a problem here anyway: if the guest has already moved the DRC away from UNUSABLE state, the detach would already be deferred, and if it hadn't it should be safe to detach it (the guest should fail gracefully when it attempts to change the allocation state). I think this was probably just a bandaid for some other problem in the state management. So, remove awaiting_allocation and associated code. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Tested-by: Greg Kurz <groug@kaod.org> Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>	2017-07-17 15:07:05 +10:00
Laurent Vivier	94fd9cbaa3	spapr: Treat devices added before inbound migration as coldplugged When migrating a guest which has already had devices hotplugged, libvirt typically starts the destination qemu with -incoming defer, adds those hotplugged devices with qmp, then initiates the incoming migration. This causes problems for the management of spapr DRC state. Because the device is treated as hotplugged, it goes into a DRC state for a device immediately after it's plugged, but before the guest has acknowledged its presence. However, chances are the guest on the source machine has acknowledged the device's presence and configured it. If the source has fully configured the device, then DRC state won't be sent in the migration stream: for maximum migration compatibility with earlier versions we don't migrate DRCs in coldplug-equivalent state. That means that the DRC effectively changes state over the migrate, causing problems later on. In addition, logging hotplug events for these devices isn't what we want because a) those events should already have been issued on the source host and b) the event queue should get wiped out by the incoming state anyway. In short, what we really want is to treat devices added before an incoming migration as if they were coldplugged. To do this, we first add a spapr_drc_hotplugged() helper which determines if the device is hotplugged in the sense relevant for DRC state management. We only send hotplug events when this is true. Second, when we add a device which isn't hotplugged in this sense, we force a reset of the DRC state - this ensures the DRC is in a coldplug-equivalent state (there isn't usually a system reset between these device adds and the incoming migration). This is based on an earlier patch by Laurent Vivier, cleaned up and extended. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>	2017-07-17 15:07:05 +10:00
David Gibson	5341258e86	spapr: Minor cleanups to events handling The rtas_error_log structure is marked packed, which strongly suggests its precise layout is important to match an external interface. Along with that one could expect it to have a fixed endianness to match the same interface. That used to be the case - matching the layout of PAPR RTAS event format and requiring BE fields. Now, however, it's only used embedded within sPAPREventLogEntry with the fields in native order, since they're processed internally. Clear that up by removing the nested structure in sPAPREventLogEntry. struct rtas_error_log is moved back to spapr_events.c where it is used as a temporary to help convert the fields in sPAPREventLogEntry to the correct in memory format when delivering an event to the guest. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-07-17 15:07:05 +10:00
Daniel Henrique Barboza	fd38804b38	spapr: migrate pending_events of spapr state In racing situations between hotplug events and migration operation, a rtas hotplug event could have not yet be delivered to the source guest when migration is started. In this case the pending_events of spapr state need be transmitted to the target so that the hotplug event can be finished on the target. To achieve the minimal VMSD possible to migrate the pending_events list, this patch makes the changes in spapr_events.c: - 'log_type' of sPAPREventLogEntry struct deleted. This information can be derived by inspecting the rtas_error_log summary field. A new function called 'spapr_event_log_entry_type' was added to retrieve the type of a given sPAPREventLogEntry. - sPAPREventLogEntry, epow_log_full and hp_log_full were redesigned. The only data we're going to migrate in the VMSD is the event log data itself, which can be divided in two parts: a rtas_error_log header and an extended event log field. The rtas_error_log header contains information about the size of the extended log field, which can be used inside VMSD as the size parameter of the VBUFFER_ALOC field that will store it. To allow this use, the header.extended_length field must be exposed inline to the VMSD instead of embedded into a 'data' field that holds everything. With this in mind, the following changes were done: * a new 'header' field was added to sPAPREventLogEntry. This field holds a a struct rtas_error_log inline. * the declaration of the 'rtas_error_log' struct was moved to spapr.h to be visible to the VMSD macros. * 'data' field of sPAPREventLogEntry was renamed to 'extended_log' and now holds only the contents of the extended event log. * 'struct rtas_error_log hdr' were taken away from both epow_log_full and hp_log_full. This information is now available at the header field of sPAPREventLogEntry. * epow_log_full and hp_log_full were renamed to epow_extended_log and hp_extended_log respectively. This rename makes it clearer to understand the new purpose of both structures: hold the information of an extended event log field. * spapr_powerdown_req and spapr_hotplug_req_event now creates a sPAPREventLogEntry structure that contains the full rtas log entry. * rtas_event_log_queue and rtas_event_log_dequeue now receives a sPAPREventLogEntry pointer as a parameter instead of a void pointer. - the endianess of the sPAPREventLogEntry header is now native instead of be32. We can use the fields in native endianess internally and write them in be32 in the guest physical memory inside 'check_exception'. This allows the VMSD inside spapr.c to read the correct size of the entended_log field. - inside spapr.c, pending_events is put in a subsection in the spapr state VMSD to make sure migration across different versions is not broken. A small change in rtas_event_log_queue and rtas_event_log_dequeue were also made: instead of calling qdev_get_machine(), both functions now receive a pointer to the sPAPRMachineState. This pointer is already available in the callers of these functions and we don't need to waste resources calling qdev() again. Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-07-17 15:07:05 +10:00
David Gibson	3579d606a0	spapr: Remove unnecessary instance_size specifications from DRC subtypes All the DRC subtypes explicitly list instance_size in TypeInfo (all as sizeof(sPAPRDRConnector). This isn't necessary, since if it's not listed it will be derived from the parent type. Worse, this is dangerous, because if a subtype is changed in future to have a larger structure, then subtypes of that subtype also need to have instance_size changed, or it will lead to hard to track memory corruption bugs. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-07-17 15:06:08 +10:00
Peter Maydell	98a99ce084	hw: Use new memory_region_init_{ram, rom, rom_device}() functions Use the new functions memory_region_init_{ram,rom,rom_device}() instead of manually calling the _nomigrate() version and then vmstate_register_ram_global(). Patch automatically created using coccinelle script: spatch --in-place -sp_file scripts/coccinelle/memory-region-init-ram.cocci -dir hw (As it turns out, there are no instances of the rom and rom_device functions that are caught by this script.) Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 1499438577-7674-8-git-send-email-peter.maydell@linaro.org	2017-07-14 17:59:42 +01:00
Peter Maydell	1cfe48c1ce	memory: Rename memory_region_init_ram() to memory_region_init_ram_nomigrate() Rename memory_region_init_ram() to memory_region_init_ram_nomigrate(). This leaves the way clear for us to provide a memory_region_init_ram() which does handle migration. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 1499438577-7674-4-git-send-email-peter.maydell@linaro.org	2017-07-14 17:59:42 +01:00
Peter Maydell	6c6076662d	* gdbstub fixes (Alex) * IOMMU MemoryRegion subclass (Alexey) * Chardev hotswap (Anton) * NBD_OPT_GO support (Eric) * Misc bugfixes * DEFINE_PROP_LINK (minus the ARM patches - Fam) * MAINTAINERS updates (Philippe) -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQEcBAABAgAGBQJZaJejAAoJEL/70l94x66DwQ4H/0NUvh/Zfs64wE1iuZJACc24 1za02fFaB50vFDwQKWbM0GkHzDxoXBHk4Rvn92p+VSxpKtaAX4GRwCvxRA5GeUtm GAYbdIJUe0UELepKExrlUVzQcK9VfljoJpK3dZkP5Zzx83L2PAI/SexrZRibN2Uf yRI60uvlsMWU12nenzdVnYORd+TWDNKele7BhMrX/FX9wxaS1PlnsnKZggy6CU7G 8dwZJAZJ/s5tRGXyXyAQzLm5JZQCLnA6jxya540TbPeciFgbvvS2ydIitZ54vSPO VtmZ1rSWfTEbNF5xGD1Ztu8aAENr5/I05l6IjxZd45BdUCW3HxeJkc+7lE0K4uk= =wnVs -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging * gdbstub fixes (Alex) * IOMMU MemoryRegion subclass (Alexey) * Chardev hotswap (Anton) * NBD_OPT_GO support (Eric) * Misc bugfixes * DEFINE_PROP_LINK (minus the ARM patches - Fam) * MAINTAINERS updates (Philippe) # gpg: Signature made Fri 14 Jul 2017 11:06:27 BST # gpg: using RSA key 0xBFFBD25F78C7AE83 # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * remotes/bonzini/tags/for-upstream: (55 commits) spapr_rng: Convert to DEFINE_PROP_LINK cpu: Convert to DEFINE_PROP_LINK mips_cmgcr: Convert to DEFINE_PROP_LINK ivshmem: Convert to DEFINE_PROP_LINK dimm: Convert to DEFINE_PROP_LINK virtio-crypto: Convert to DEFINE_PROP_LINK virtio-rng: Convert to DEFINE_PROP_LINK virtio-scsi: Convert to DEFINE_PROP_LINK virtio-blk: Convert to DEFINE_PROP_LINK qdev: Add const qualifier to PropertyInfo definitions qmp: Use ObjectProperty.type if present qdev: Introduce DEFINE_PROP_LINK qdev: Introduce PropertyInfo.create qom: enforce readonly nature of link's check callback translate-all: remove redundant !tcg_enabled check in dump_exec_info vl: fix breakage of -tb-size nbd: Implement NBD_INFO_BLOCK_SIZE on client nbd: Implement NBD_INFO_BLOCK_SIZE on server nbd: Implement NBD_OPT_GO on client nbd: Implement NBD_OPT_GO on server ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2017-07-14 12:16:09 +01:00
Fam Zheng	68c761e19c	spapr_rng: Convert to DEFINE_PROP_LINK Signed-off-by: Fam Zheng <famz@redhat.com> Message-Id: <20170714021509.23681-21-famz@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-07-14 12:04:43 +02:00
Alexey Kardashevskiy	1221a47467	memory/iommu: introduce IOMMUMemoryRegionClass This finishes QOM'fication of IOMMUMemoryRegion by introducing a IOMMUMemoryRegionClass. This also provides a fastpath analog for IOMMU_MEMORY_REGION_GET_CLASS(). This makes IOMMUMemoryRegion an abstract class. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Message-Id: <20170711035620.4232-3-aik@ozlabs.ru> Acked-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-07-14 12:04:41 +02:00
Alexey Kardashevskiy	3df9d74806	memory/iommu: QOM'fy IOMMU MemoryRegion This defines new QOM object - IOMMUMemoryRegion - with MemoryRegion as a parent. This moves IOMMU-related fields from MR to IOMMU MR. However to avoid dymanic QOM casting in fast path (address_space_translate, etc), this adds an @is_iommu boolean flag to MR and provides new helper to do simple cast to IOMMU MR - memory_region_get_iommu. The flag is set in the instance init callback. This defines memory_region_is_iommu as memory_region_get_iommu()!=NULL. This switches MemoryRegion to IOMMUMemoryRegion in most places except the ones where MemoryRegion may be an alias. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <20170711035620.4232-2-aik@ozlabs.ru> Acked-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-07-14 12:04:41 +02:00
Alistair Francis	3dc6f86936	Convert error_report() to warn_report() Convert all uses of error_report("warning:"... to use warn_report() instead. This helps standardise on a single method of printing warnings to the user. All of the warnings were changed using these two commands: find ./* -type f -exec sed -i \ 's\|error_report(".*warning[,:] \|warn_report("\|Ig' {} + Indentation fixed up manually afterwards. The test-qdev-global-props test case was manually updated to ensure that this patch passes make check (as the test cases are case sensitive). Signed-off-by: Alistair Francis <alistair.francis@xilinx.com> Suggested-by: Thomas Huth <thuth@redhat.com> Cc: Jeff Cody <jcody@redhat.com> Cc: Kevin Wolf <kwolf@redhat.com> Cc: Max Reitz <mreitz@redhat.com> Cc: Ronnie Sahlberg <ronniesahlberg@gmail.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Lieven <pl@kamp.de> Cc: Josh Durgin <jdurgin@redhat.com> Cc: "Richard W.M. Jones" <rjones@redhat.com> Cc: Markus Armbruster <armbru@redhat.com> Cc: Peter Crosthwaite <crosthwaite.peter@gmail.com> Cc: Richard Henderson <rth@twiddle.net> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Greg Kurz <groug@kaod.org> Cc: Rob Herring <robh@kernel.org> Cc: Peter Maydell <peter.maydell@linaro.org> Cc: Peter Chubb <peter.chubb@nicta.com.au> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Marcel Apfelbaum <marcel@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: Alexander Graf <agraf@suse.de> Cc: Gerd Hoffmann <kraxel@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Cornelia Huck <cohuck@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Greg Kurz <groug@kaod.org> Acked-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed by: Peter Chubb <peter.chubb@data61.csiro.au> Acked-by: Max Reitz <mreitz@redhat.com> Acked-by: Marcel Apfelbaum <marcel@redhat.com> Message-Id: <e1cfa2cd47087c248dd24caca9c33d9af0c499b0.1499866456.git.alistair.francis@xilinx.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>	2017-07-13 13:49:58 +02:00
Peter Maydell	aa916e409c	ppc patch queue 2017-07-11 * Several minor cleanups from Greg Kurz * Fix for migration of pseries-2.7 and earlier machine types * More reworking of the DRC hotplug code, fixing several problems though there are still more to go * Fixes for CPU family / alias handling on POWER9 * Preliminary patches for POWER9 XIVE (new interrupt controller) support * Assorted other fixes -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABCAAGBQJZZFWEAAoJEGw4ysog2bOSxgAQAI85Vv8RuK1mgN0w0aIguP09 JIM+iZ3zJwSFM3A/D8CnWxMGEQkjkVfKWT8cB97v5vPGTu21WD2hdQ26ZrcjC8Do Y5sPuCGRRSZvz+tnz17HU2aZMQwteNNgdes9MGr61kdVUk+1uvcyqTdhqxka5rF7 SYcIEf95+Fcu00+bhwGaGg0ZXHer4rSTjDXbT3CcxT64sgQW8X36SceFBkFH0P40 tX1bn9gdQgBNOT11O0MNeq6ewxHhSSusTwyYXpHTvK6p0EXPqfm+vM9dQSmXeKsk T7/yDmKplutVnWlfbxrdG+wp+ObE1h7KljGdWLx4jIX58dHVvjDJ+kZ+OJbcb6Xj oEV947tYkZaDC7q7TkwXjYltbq+A6HFFKEwxJ59L4zYgVYVkTUMRJ3Apl66sq5a1 SHEBXAA5SDq8jxdKKqvwzh4ZtkkxIelOO8lTVjOAg8ffcNfEwbJOuom2h0kgzOgz Sn2PxC/jwk2RZZ4T+qe1KNpVbV3RYpGanMXYDMFUnTRw2RAU2io0R2bBwOlm/0I7 ZUrjD2xCFrMPuthxr5/5/w0P1StALVN50S5YqWvDuQYIbMYhSjSh3tDgAHVrqL4W Yc1Zr5X9X91qgUjAkejBuirvWLvgofiw8jlqAZ6K2zTUcvtn0KdQGe7eiK+wostA PhLW9tYrkpt/BmzEMi1X =8Wy2 -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.10-20170711' into staging ppc patch queue 2017-07-11 * Several minor cleanups from Greg Kurz * Fix for migration of pseries-2.7 and earlier machine types * More reworking of the DRC hotplug code, fixing several problems though there are still more to go * Fixes for CPU family / alias handling on POWER9 * Preliminary patches for POWER9 XIVE (new interrupt controller) support * Assorted other fixes # gpg: Signature made Tue 11 Jul 2017 05:35:16 BST # gpg: using RSA key 0x6C38CACA20D9B392 # gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>" # gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>" # gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>" # gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>" # Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392 * remotes/dgibson/tags/ppc-for-2.10-20170711: spapr: populate device tree depending on XIVE_EXPLOIT option spapr: introduce the XIVE_EXPLOIT option in CAS ppc/kvm: have the "family" CPU alias to point to TYPE_HOST_POWERPC_CPU spapr: Only report host/guest IOMMU page size mismatches on KVM spapr: fix memory hotplug error path target/ppc: Add debug function for radix mmu translation target/ppc: Refactor tcg radix mmu code spapr: Use unplug_request for PCI hot unplug spapr: Remove unnecessary differences between hotplug and coldplug paths spapr: Add DRC release method spapr: Uniform DRC reset paths spapr: Leave DR-indicator management to the guest target-ppc: SPR_BOOKE_ESR not set on FP exceptions spapr: fix migration to pseries machine < 2.8 spapr: fix bogus function name in comment spapr: refresh "platform-specific" hcalls comment spapr: make spapr_populate_hotplug_cpu_dt() static Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2017-07-11 16:34:09 +01:00
Cédric Le Goater	b87680427e	spapr: populate device tree depending on XIVE_EXPLOIT option When XIVE is supported, the device tree should be populated accordingly and the XIVE memory regions mapped to activate MMIOs. Depending on the design we choose, we could also allocate different ICS and ICP objects, or switch between objects. This needs to be discussed. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-07-11 11:04:02 +10:00
Cédric Le Goater	f2b14e3a9f	spapr: introduce the XIVE_EXPLOIT option in CAS On POWER9, the Client Architecture Support (CAS) negotiation process determines whether the guest operates in XIVE Legacy compatibility (the former POWER8 interrupt model) or in XIVE exploitation mode (the newer POWER9 interrupt model). Bit 7 of Byte 23 of vector 5 is used for this purpose. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-07-11 11:04:02 +10:00
David Gibson	2a0d90fed5	spapr: Only report host/guest IOMMU page size mismatches on KVM We print a warning if the spapr IOMMU isn't configured to support a page size matching the host page size backing RAM. When that's the case we need more complex logic to translate VFIO mappings, which is slower. But, it's not so slow that it would be at all noticeable against the general slowness of TCG. So, only warn when using KVM. This removes some noisy and unhelpful warnings from make check on hosts with page sizes which typically differ from those on POWER (e.g. Sparc). Reported-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Thomas Huth <thuth@redhat.com>	2017-07-11 11:04:02 +10:00
Greg Kurz	160bb67885	spapr: fix memory hotplug error path QEMU shouldn't abort if spapr_add_lmbs()->spapr_drc_attach() fails. Let's propagate the error instead, like it is done everywhere else where spapr_drc_attach() is called. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Daniel Barboza <danielhb@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-07-11 11:04:02 +10:00
David Gibson	3340e5c4f2	spapr: Use unplug_request for PCI hot unplug AIUI, ->unplug_request in the HotplugHandler is used for "soft" unplug, where acknowledgement from the guest is required before completing the unplug, whereas ->unplug is used for "hard" unplug where qemu unilaterally removes the device, and the guest just has to cope with its sudden absence. For spapr we (correctly) use ->unplug_request for CPU and memory hot unplug but we use ->unplug for PCI. While I think it might be possible to support "hard" PCI unplug within the PAPR model, that's not how it actually works now. Although it's called from ->unplug, the PCI unplug path will usually just mark the device for removal, with completion of the unplug delayed until userspace responds to the unplug notification. If the guest doesn't respond as expected, that could delay the unplug completion arbitrarily long. To reflect that, change the PCI unplug path to be called from ->unplug_request. We also rename spapr_phb_hot_plug_child() and spapr_phb_hot_unplug_child() to spapr_pci_plug() and spapr_pci_unplug_request() to more obviously reflect the callbacks they're implementing. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org>	2017-07-11 11:04:02 +10:00
David Gibson	5c1da81215	spapr: Remove unnecessary differences between hotplug and coldplug paths spapr_drc_attach() has a 'coldplug' parameter which sets the DRC into configured state initially, instead of the usual ISOLATED/UNUSABLE state. It turns out this is unnecessary: although coldplugged devices do need to be in CONFIGURED state once the guest starts, that will already be accomplished by the reset code which will move DRCs for already plugged devices into a coldplug equivalent state. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org>	2017-07-11 11:04:01 +10:00
David Gibson	6b762f29a8	spapr: Add DRC release method At the moment, spapr_drc_release() has an ugly switch on the DRC type to call the right, device-specific release function. This cleans it up by doing that via a proper QOM method. It's still arguably an abstraction violation for the DRC code to call into the specific device code, but one mess at a time. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org>	2017-07-11 11:04:01 +10:00
David Gibson	6caf3ac613	spapr: Uniform DRC reset paths DRC objects have a regular device reset method. However, it only gets called in the usual way for PCI DRCs. Because of where CPU and LMB DRCs are in the QOM tree, their device reset method isn't automatically called. So, the machine manually registers reset handlers to call device_reset(). This patch removes the device reset method, and instead always explicitly registers the reset handler from realize(). This means the callers don't have to worry about the two cases, and we always get proper resets. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Laurent Vivier <lvivier@redhat.com>	2017-07-11 11:04:01 +10:00
David Gibson	f8dc29834c	spapr: Leave DR-indicator management to the guest The DR-indicator is essentially a "virtual LED" attached to a hotpluggable device, which the guest can set to various states for the attention of the operator or management layers. It's mostly guest managed, except that we once-off set it to ACTIVE/INACTIVE in the attach/detach path. While that makes certain sense, there's no indication in PAPR that the hypervisor should do this, and the drmgr code on the guest side doesn't appear to need it (it will already set the indicator to ACTIVE on hotplug, and INACTIVE on remove). So, leave the DR-indicator entirely to the guest; the only thing we need to do is ensure it's in a sane state on reset. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org>	2017-07-11 11:04:01 +10:00
Laurent Vivier	e806b4db14	spapr: fix migration to pseries machine < 2.8 since commit `5c4537bd` ("spapr: Fix 2.7<->2.8 migration of PCI host bridge"), some migration fields are forged from the new ones in spapr_pci_pre_save(). It works well, except when the number of MSI devices is 0, because in this case the function exits immediately. This fix moves the migration code before the exit code. The problem can be reproduced with these commands: source qemu-2.9: qemu-system-ppc64 -monitor stdio -M pseries-2.6 -nodefaults -S destination qemu-2.6: qemu-system-ppc64 -monitor stdio -M pseries-2.6 -nodefaults \ -incoming tcp:0:4444 on the source: migrate tcp:localhost:4444 Destination fails with the following error: qemu-system-ppc64: error while loading state for instance 0x0 of device 'spapr_pci' qemu-system-ppc64: load of migration failed: Invalid argument Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-07-11 11:04:01 +10:00
Greg Kurz	f3728f9cbb	spapr: fix bogus function name in comment $ git grep spapr_ppc_reset hw/ppc/spapr.c: * as part of spapr_ppc_reset(). $ git grep ppc_spapr_reset hw/ppc/spapr.c:static void ppc_spapr_reset(void) hw/ppc/spapr.c: mc->reset = ppc_spapr_reset; hw/ppc/spapr_hcall.c: /* If ppc_spapr_reset() did not set up a HPT but one is necessary Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-07-11 11:04:01 +10:00
Greg Kurz	04d0ffbd52	spapr: make spapr_populate_hotplug_cpu_dt() static Since commit `ff9006ddbf` ("spapr: move spapr_core_[foo]plug() callbacks close to machine code in spapr.c"), this function doesn't need to be extern anymore. Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-07-11 11:04:01 +10:00
Juan Quintela	70f794fcfa	migration: Rename cleanup() to save_cleanup() We need a cleanup for loads, so we rename here to be consistent. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> -- Rename htab_cleanup to htap_save_cleanup as dave suggestion Message-Id: <20170628095228.4661-3-quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-07-10 17:52:21 +01:00
Juan Quintela	9907e842d7	migration: Rename save_live_setup() to save_setup() We are going to use it now for more than save live regions. Once there rename qemu_savevm_state_begin() to qemu_savevm_state_setup(). Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20170628095228.4661-2-quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-07-10 17:52:21 +01:00
David Gibson	0dfabd39d5	spapr: Clean up DRC set_isolation_state() path There are substantial differences in the various paths through set_isolation_state(), both for setting to ISOLATED versus UNISOLATED state and for logical versus physical DRCs. So, split the set_isolation_state() method into isolate() and unisolate() methods, and give it different implementations for the two DRC types. Factor some minimal common checks, including for valid indicator values (which we weren't previously checking) into rtas_set_isolation_state(). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>	2017-06-30 14:03:32 +10:00
David Gibson	617367321e	spapr: Clean up DRC set_allocation_state path The allocation-state indicator should only actually be implemented for "logical" DRCs, not physical ones. Factor a check for this, and also for valid indicator state values into rtas_set_allocation_state(). Because they don't exist for physical DRCs, there's no reason that we'd ever want more than one method implementation, so it can just be a plain function. In addition, the setting to USABLE and setting to UNUSABLE paths in set_allocation_state() don't actually have much in common. So, split the method separate functions for each parameter value (drc_set_usable() and drc_set_unusable()). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>	2017-06-30 14:03:32 +10:00
David Gibson	4f9242fc93	spapr: Make DRC reset force DRC into known state The reset handler for DRCs attempts several state transitions which are subject to various checks and restrictions. But at reset time we know there is no guest, so we can ignore most of the usual sequencing rules and just set the DRC back to a known state. In fact, it's safer to do so. The existing code also has several redundant checks for drc->awaiting_release inside a block which has already tested that. This patch removes those and sets the DRC to a fixed initial state based only on whether a device is currently plugged or not. With DRCs correctly reset to a state based on device presence, we don't need to force state transitions as cold plugged devices are processed. This allows us to remove all the callers of the set_*_state() methods from outside spapr_drc.c. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>	2017-06-30 14:03:32 +10:00
David Gibson	9c914e5370	spapr: Split DRC release from DRC detach spapr_drc_detach() is called when qemu generic code requests a device be unplugged. It makes a number of tests, which could well delay further action until later, before actually detach the device from the DRC. This splits out the part which actually removes the device from the DRC into spapr_drc_release(). This will be useful for further cleanups. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>	2017-06-30 14:03:32 +10:00
David Gibson	307b7715d0	spapr: Eliminate DRC 'signalled' state variable The 'signalled' field in the DRC appears to be entirely a torturous workaround for the fact that PCI devices were started in UNISOLATED state for unclear reasons. 1) 'signalled' is already meaningless for logical (so far, all non PCI) DRCs. It's always set to true (at least at any point it might be tested), and can't be assigned any real meaning due to the way signalling works for logical DRCs. 2) For PCI DRCs, the only time signalled would be false is when non-zero functions of a multifunction device are hotplugged, followed by function zero (the other way around is explicitly not permitted). In that case the secondary function DRCs are attached, but the notification isn't sent to the guest until function 0 is plugged. 3) signalled being false is used to allow a DRC detach to switch mode back to ISOLATED state, which allows a secondary function to be hotplugged then unplugged with function 0 never inserted. Without this a secondary function starting in UNISOLATED state couldn't be detached again without function 0 being inserted, all the functions configured by the guest, then sent back to ISOLATED state. 4) But now that PCI DRCs start in ISOLATED state, there's nothing to be done. If the guest doesn't get the notification, it won't switch the device to UNISOLATED state, so nothing prevents it from being unplugged. If the guest does move it to UNISOLATED state without the signal (due to a manual drmgr call, for instance) then it really isn't safe to unplug it. So, this patch removes the signalled variable and all code related to it. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>	2017-06-30 14:03:32 +10:00
David Gibson	af8ad96bd0	spapr: Start hotplugged PCI devices in ISOLATED state PCI DRCs, and only PCI DRCs, are immediately moved to UNISOLATED isolation state once the device is attached. This has been there from the initial implementation, and it's not clear why. The state diagram in PAPR 13.4 suggests PCI devices should start in ISOLATED state until the guest moves them into UNISOLATED, and the code in the guest-side drmgr tool seems to work that way too. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com> Reviewed-by: Greg Kurz <groug@kaod.org>	2017-06-30 14:03:32 +10:00
Daniel Henrique Barboza	aca8bf9f1c	hw/ppc/spapr.c: consecutive 'spapr->patb_entry = 0' statements In ppc_spapr_reset(), if the guest is using HPT, the code was executing: } else { spapr->patb_entry = 0; spapr_setup_hpt_and_vrma(spapr); } And, at the end of spapr_setup_hpt_and_vrma: /* We're setting up a hash table, so that means we're not radix */ spapr->patb_entry = 0; Resulting in spapr->patb_entry being assigned to 0 twice in a row. Given that 'spapr_setup_hpt_and_vrma' is also called inside 'spapr_check_setup_free_hpt' of spapr_hcall.c, this trivial patch removes the 'patb_entry = 0' assignment from the 'else' clause inside ppc_spapr_reset to avoid this behavior. Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-06-30 14:03:31 +10:00
Bharata B Rao	6595ab3158	spapr: prevent QEMU crash when CPU realization fails ICPState objects were being allocated before CPU thread realization. However commit `9ed656631d` (xics: setup cpu at realize time) reversed it by allocating ICPState objects after CPU thread is realized. But it didn't take care to fix the error path because of which we observe a SIGSEGV when CPU thread realization fails during cold/hotplug. Fix this by ensuring that we do object_unparent() of ICPState object only in case when is was created earlier. Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-06-30 14:03:31 +10:00
Greg Kurz	46f7afa370	spapr: fix migration of ICPState objects from/to older QEMU Commit `5bc8d26de2` ("spapr: allocate the ICPState object from under sPAPRCPUCore") moved ICPState objects from the machine to CPU cores. This is an improvement since we no longer allocate ICPState objects that will never be used. But it has the side-effect of breaking migration of older machine types from older QEMU versions. This patch allows spapr to register dummy "icp/server" entries to vmstate. These entries use a dedicated VMStateDescription that can swallow and discard state of an incoming migration stream, and that don't send anything on outgoing migration. As for real ICPState objects, the instance_id is the cpu_index of the corresponding vCPU, which happens to be equal to the generated instance_id of older machine types. The machine can unregister/register these entries when CPUs are dynamically plugged/unplugged. This is only available for pseries-2.9 and older machines, thanks to a compat property. Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-06-30 14:03:31 +10:00
Bharata B Rao	d39c90f5f3	spapr: Fix migration of Radix guests Fix migration of radix guests by ensuring that we issue KVM_PPC_CONFIGURE_V3_MMU for radix case post migration. Reported-by: Nageswara R Sastry <rnsastry@linux.vnet.ibm.com> Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-06-30 14:03:31 +10:00
Bharata B Rao	3a38429748	spapr: Add a "no HPT" encoding to HTAB migration stream Add a "no HPT" encoding (using value -1) to the HTAB migration stream (in the place of HPT size) when the guest doesn't allocate HPT. This will help the target side to match target HPT with the source HPT and thus enable successful migration. Suggested-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-06-30 14:03:31 +10:00
David Gibson	d5fc133eed	ppc: Rework CPU compatibility testing across migration Migrating between different CPU versions is a bit complicated for ppc. A long time ago, we ensured identical CPU versions at either end by checking the PVR had the same value. However, this breaks under KVM HV, because we always have to use the host's PVR - it's not virtualized. That would mean we couldn't migrate between hosts with different PVRs, even if the CPUs are close enough to compatible in practice (sometimes identical cores with different surrounding logic have different PVRs, so this happens in practice quite often). So, we removed the PVR check, but instead checked that several flags indicating supported instructions matched. This turns out to be a bad idea, because those instruction masks are not architected information, but essentially a TCG implementation detail. So changes to qemu internal CPU modelling can break migration - this happened between qemu-2.6 and qemu-2.7. That was addressed by `146c11f1` "target-ppc: Allow eventual removal of old migration mistakes". Now, verification of CPU compatibility across a migration basically doesn't happen. We simply ignore the PVR of the incoming migration, and hope the cpu on the destination is close enough to work. Now that we've cleaned up handling of processor compatibility modes for pseries machine type, we can do better. For new machine types (pseries-2.10+) We allow migration if: * The source and destination PVRs are for the same type of CPU, as determined by CPU class's pvr_match function OR * When the source was in a compatibility mode, and the destination CPU supports the same compatibility mode For older machine types we retain the existing behaviour - current CAS code will usually set a compat mode which would break backwards migration if we made them use the new behaviour. [Fixed from an earlier version by Greg Kurz]. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Tested-by: Andrea Bolognani <abologna@redhat.com>	2017-06-30 14:03:31 +10:00
David Gibson	66d5c492dd	pseries: Reset CPU compatibility mode Currently, the CPU compatibility mode is set when the cpu is initialized, then again when the guest negotiates features. This means if a guest negotiates a compatibility mode, then reboots, that compatibility mode will be retained across the reset. Usually that will get overridden when features are negotiated on the next boot, but it's still not really correct. This patch moves the initial set up of the compatibility mode from cpu init to reset time. The mode is retained if the reboot was caused by the feature negotiation (it might be important in that case, though it's unlikely). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com> Tested-by: Andrea Bolognani <abologna@redhat.com>	2017-06-30 14:03:31 +10:00
David Gibson	7843c0d60d	pseries: Move CPU compatibility property to machine Server class POWER CPUs have a "compat" property, which is used to set the backwards compatibility mode for the processor. However, this only makes sense for machine types which don't give the guest access to hypervisor privilege - otherwise the compatibility level is under the guest's control. To reflect this, this removes the CPU 'compat' property and instead creates a 'max-cpu-compat' property on the pseries machine. Strictly speaking this breaks compatibility, but AFAIK the 'compat' option was never (directly) used with -device or device_add. The option was used with -cpu. So, to maintain compatibility, this patch adds a hack to the cpu option parsing to strip out any compat options supplied with -cpu and set them on the machine property instead of the now deprecated cpu property. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Tested-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Reviewed-by: Greg Kurz <groug@kaod.org> Tested-by: Greg Kurz <groug@kaod.org> Tested-by: Andrea Bolognani <abologna@redhat.com>	2017-06-30 14:03:31 +10:00
Thomas Huth	6d034b7bf8	hw/ppc/prep: Remove superfluous call to soundhw_init() When using the 40p machine, soundhw_init() is currently called twice, one time from vl.c and one time from ibm_40p_init(). The call in ibm_40p_init() was likely just a copy-and-paste from a old version of the prep machine - but there the call to audio_init() (which was the previous name of this function) has been removed many years ago already, with commit `b3e6d591b0` ("audio: enable PCI audio cards for all PCI-enabled targets"), so we certainly also do not need the soundhw_init() in the 40p function anymore nowadays. Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Sahid Ferdjaoui <sferdjao@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Reviewed-by: Hervé Poussineau <hpoussin@reactos.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-06-30 14:03:31 +10:00
Halil Pasic	d2164ad35c	vmstate: error hint for failed equal checks In some cases a failing VMSTATE_*_EQUAL does not mean we detected a bug, but it's actually the best we can do. Especially in these cases a verbose error message is required. Let's introduce infrastructure for specifying a error hint to be used if equal check fails. Let's do this by adding a parameter to the _EQUAL macros called _err_hint. Also change all current users to pass NULL as last parameter so nothing changes for them. Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com> Message-Id: <20170623144823.42936-1-pasic@linux.vnet.ibm.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-06-28 11:18:44 +02:00
Peter Xu	15c3850325	migration: move skip_section_footers Move it into MigrationState, revert its meaning and renaming it to send_section_footer, with a property bound to it. Same trick is played like previous patches. Removing savevm_skip_section_footers(). Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <1498536619-14548-9-git-send-email-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-06-28 11:18:39 +02:00
Peter Xu	71dd4c1a56	migration: move skip_configuration out It was in SaveState but now moved to MigrationState altogether, reverted its meaning, then renamed to "send_configuration". Again, using HW_COMPAT_2_3 for old PC/SPAPR machines, and accel_register_prop() for xen_init(). Removing savevm_skip_configuration(). Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <1498536619-14548-8-git-send-email-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-06-28 11:18:38 +02:00
Peter Xu	5272298c48	migration: move global_state.optional out Put it into MigrationState then we can use the properties to specify whether to enable storing global state. Removing global_state_set_optional() since now we can use HW_COMPAT_2_3 for x86/power, and AccelClass.global_props for Xen. Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <1498536619-14548-6-git-send-email-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-06-28 11:18:38 +02:00
Marc-André Lureau	9848619a3b	pnv-core: use get_uint() for "core-pir" property This is an alias of TYPE_PNV_CORE's property "pir", which is defined with DEFINE_PROP_UINT32() Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20170607163635.17635-38-marcandre.lureau@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>	2017-06-20 14:31:33 +02:00
Marc-André Lureau	9ed442b8ae	pc-dimm: use get_uint() for dimm properties TYPE_PC_DIMM's property PC_DIMM_ADDR_PROP is defined with DEFINE_PROP_UINT64(). TYPE_PC_DIMM's property PC_DIMM_NODE_PROP is defined with DEFINE_PROP_UINT32(). Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20170607163635.17635-22-marcandre.lureau@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>	2017-06-20 14:31:32 +02:00
Marc-André Lureau	1e507bb0fd	object: use more specific property type names Use the actual unsigned integer type name. The type name change impacts the following externally visible area: * vl.c's machine_help_func() puts it in help for -machine NAME,help. * QMP command qom-list exposes it in ObjectPropertyInfo member @type. * QMP command device-list-properties exposes it in DevicePropertyInfo member @type. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20170607163635.17635-15-marcandre.lureau@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>	2017-06-20 14:31:32 +02:00
Peter Maydell	735286a4f8	migration/next for 20170613 -----BEGIN PGP SIGNATURE----- iQIcBAABCAAGBQJZP6n5AAoJEPSH7xhYctcj04oQAJczMfc2X8vTwII6lN9klf+T Cy32B4WB8FBO9M7oJYD/yytJ3ibcLuMwKwTy/GGfaTspuYDI/HrplUD3Pt+trDPc fUxmTNjK9vE9foPAwOTSwTGsdOp5ICoZuDjHTj8gtHmfFLclDxxJMojtthMJ1Csc qn9oJzjLn3izn8C6CY6oXGnqOt6gy2lz+RqNKlve/bwxaVdQIXTXCVsLWwQZuj48 VI9qAFw9TsgSBi9dlTYpVfdMvItO73SVYd2c1ETzL0YSNK3S/Yhpww7fyK8TQNpO Y8xXMMBMybHZej1ixHXh01CRmEnBZXpjLCIXnWwxQGXxTH8p7F+W1+lhDTL4IIXR Py0EwiPUj4sPyTW2htSnDBRtE1uHcJlDtsFAAmsEqfeASet7ueE2bkfKwWUftqTs GZ7ikseIb9F0eQKjecYcEfaLtYNn+0UflgVkimW1gXIeuO58VYLpa8vdiUV3eKJn UCDDHGYKf7QJQLpSzYWXGRT4HJOQvaCbJ0a03hKceYyLB6rJv96khajirbczKZ92 cja0EJfDy5S9fBulWRveHKLUAFMrR3zA4DhlK0pb591uIs4iMcKH3egHQZpv0uf0 iifWNI+AFuorhQfdhV2G4Zg1g/fwI2RRJK7HdBOklulUrcr0caPvjjGdbA3Q0Hf6 u61pWdr+Yb3XPaqlC2AH =EFHC -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/juanquintela/tags/migration/20170613' into staging migration/next for 20170613 # gpg: Signature made Tue 13 Jun 2017 10:01:45 BST # gpg: using RSA key 0xF487EF185872D723 # gpg: Good signature from "Juan Quintela <quintela@redhat.com>" # gpg: aka "Juan Quintela <quintela@trasno.org>" # Primary key fingerprint: 1899 FF8E DEBF 58CC EE03 4B82 F487 EF18 5872 D723 * remotes/juanquintela/tags/migration/20170613: migration: Move migration.h to migration/ migration: Move remaining exported functions to migration/misc.h migration: create global_state.c migration: ram_control_* are implemented in qemu_file migration: Commands are only used inside migration.c migration: Move constants to savevm.h migration: Move dump_vmsate_json_to_file() to misc.h migration: Split registration functions from vmstate.h migration: Move self_announce_delay() to misc.h migration: Remove MigrationState from migration_channel_incomming() ram: Now POSTCOPY_ACTIVE is the same that STATUS_ACTIVE ram: Print block stats also in the complete case migration: Don't try to set *errp directly migration: isolate return path on src Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2017-06-13 13:51:29 +01:00
Juan Quintela	c4b63b7cc5	migration: Move remaining exported functions to migration/misc.h Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Peter Xu <peterx@redhat.com>	2017-06-13 11:00:45 +02:00
Juan Quintela	84a899de8c	migration: create global_state.c It don't belong anywhere else, just the global state where everybody can stick other things. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Laurent Vivier <lvivier@redhat.com>	2017-06-13 11:00:45 +02:00
Juan Quintela	f2a8f0a631	migration: Split registration functions from vmstate.h They are indpendent, and nowadays almost every device register things with qdev->vmsd. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Peter Xu <peterx@redhat.com>	2017-06-13 11:00:44 +02:00
Laurent Vivier	593080936a	Revert "spapr: fix memory hot-unplugging" This reverts commit `fe6824d126`. Conflicts hw/ppc/spapr_drc.c, because get_index() has been renamed spapr_get_index(). This didn't fix the problem. Once the hotplug has been started some memory is allocated and some structures are allocated. We don't free it when we ignore the unplug, and we can't because they can be in use by the kernel. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-06-09 12:35:46 +10:00
Greg Kurz	9ed656631d	xics: setup cpu at realize time Until recently, spapr used to allocate ICPState objects for the lifetime of the machine. They would only be associated to vCPUs in xics_cpu_setup() when plugging a CPU core. Now that ICPState objects have the same lifecycle as vCPUs, it is possible to associate them during realization. This patch hence open-codes xics_cpu_setup() in icp_realize(). The vCPU is passed as a property. Note that vCPU now needs to be realized first for the IRQs to be allocated. It also needs to resetted before ICPState realization in order to synchronize with KVM. Since ICPState objects are freed when unrealized, xics_cpu_destroy() isn't needed anymore and can be safely dropped. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-06-09 12:15:57 +10:00
Greg Kurz	ad265631c0	xics: introduce macros for ICP/ICS link properties These properties are part of the XICS API. They deserve to appear explicitely in the XICS header file. Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-06-09 12:12:34 +10:00
Thomas Huth	4871dd4c3f	hw/ppc/spapr: Adjust firmware name for PCI bridges SLOF uses "pci" as name for PCI bridges nodes in the device tree instead of "pci-bridges", so booting via bootindex from a device behind a PCI bridge currently does not work since QEMU passes the wrong name in the "qemu,boot-list" property. Fix it by changing the name of the PCI bridge nodes to "pci" instead. Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1459170 Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-06-08 14:38:27 +10:00
Greg Kurz	67b544d65f	pnv_core: drop reference on ICPState object during CPU realization Similarly to what was done to spapr with commit `249127d0df`, this patch ensures that we don't keep an extra reference on the ICPState object. Also since the object was just created and not reparented yet, the call to object_property_add_child() should never fail: let's pass &error_abort to make this clear. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-06-08 14:38:27 +10:00
David Gibson	7980833619	spapr: Rework DRC name handling DRC objects have a get_name method which returns the DRC name generated when the DRC is created. Replace that with a fixed spapr_drc_name() function which generates the name on the fly from other information. This means: * We get rid of a method with only one implementation, and only local callers * We don't have to carry the name string around for the lifetime of the DRC * We use information added to the class structure to generate the name in standard format, so we don't need an explicit switch on drc type any more We also eliminate the 'name' property; it's basically useless since the only information in it can easily be deduced from other things. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com> Acked-by: Michael Roth <mdroth@linux.vnet.ibm.com>	2017-06-08 14:38:27 +10:00
David Gibson	6304fd27ef	spapr: Fold spapr_phb_{add,remove}_pci_device() into their only callers Both functions are fairly short, and so are their callers. There's no particular logical distinction between them, so fold them together. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com> Acked-by: Michael Roth <mdroth@linux.vnet.ibm.com>	2017-06-08 14:38:27 +10:00
David Gibson	0be4e88621	spapr: Change DRC attach & detach methods to functions DRC objects have attach & detach methods, but there's only one implementation. Although there are some differences in its behaviour for different DRC types, the overall structure is the same, so while we might want different method implementations for some parts, we're unlikely to want them for the top-level functions. So, replace them with direct function calls. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com> Acked-by: Michael Roth <mdroth@linux.vnet.ibm.com>	2017-06-08 14:38:26 +10:00
David Gibson	cd74d27e42	spapr: Clean up handling of DR-indicator There are 3 types of "indicator" associated with hotplug in the PAPR spec the "allocation state", "isolation state" and "DR-indicator". The first two are intimately tied to the various state transitions associated with hotplug. The DR-indicator, however, is different and simpler. It's basically just a guest controlled variable which can be used by the guest to flag state or problems associated with a device. The idea is that the hypervisor can use it to present information back on management consoles (on some machines with PowerVM it may even control physical LEDs on the machine case associated with the relevant device). For that reason, there's only ever likely to be a single update implementation so the set_indicator_state method isn't useful. Replace it with a direct function call. While we're there, make some small associated cleanups: * PAPR doesn't use the term "indicator state", just "DR-indicator" and the allocation state and isolation state are also considered "indicators". Rename things to be less confusing * Fold set_indicator_state() and rtas_set_indicator_state() into a single rtas_set_dr_indicator() function. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com> Acked-by: Michael Roth <mdroth@linux.vnet.ibm.com>	2017-06-08 14:38:26 +10:00
David Gibson	7b7258f810	spapr: Clean up RTAS set-indicator In theory the RTAS set-indicator call can be used for a number of "indicators" defined by PAPR. In practice the only ones we're ever likely to implement are those used for Dynamic Reconfiguration (i.e. hotplug). Because of this, the current implementation determines the associated DRC object, before dispatching based on the type of indicator. However, this means we also need a check that we're dealing with a DR related indicator at all, which duplicates some of the logic from the switch further down. Even though it means a bit of code duplication, things work out cleaner if we delegate the DRC lookup to the individual indicator type functions - and it also allows some further cleanups. While we're there, remove references to "sensor", a copy/paste artefact from the related, but distinct "get-sensor" call. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com> Acked-by: Michael Roth <mdroth@linux.vnet.ibm.com>	2017-06-08 14:38:26 +10:00
David Gibson	454b580ae9	spapr: Don't misuse DR-indicator in spapr_recover_pending_dimm_state() With some combinations of migration and hotplug we can lost temporary state indicating how many DRCs (guest side hotplug handles) are still connected to a DIMM object in the process of removal. When we hit that situation spapr_recover_pending_dimm_state() is used to scan more extensively and work out the right number. It does this using drc->indicator state to determine what state of disconnection the DRC is in. However, this is not safe, because the indicator state is guest settable - in fact it's more-or-less a purely guest->host notification mechanism which should have no bearing on the internals of hotplug state management. So, replace the test for this with a test on drc->dev, which is a purely qemu side managed variable, and updated the same BQL critical section as the indicator state. This does introduce an off-by-one change, because the indicator state was updated before the call to spapr_lmb_release() on the current DRC, whereas drc->dev is updated afterwards. That's corrected by always decrementing the nr_lmbs value instead of only doing so in the case where we didn't have to recover information. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com> Acked-by: Michael Roth <mdroth@linux.vnet.ibm.com>	2017-06-08 14:38:26 +10:00
David Gibson	f224d35be9	spapr: Clean up DR entity sense handling DRC classes have an entity_sense method to determine (in a specific PAPR sense) the presence or absence of a device plugged into a DRC. However, we only have one implementation of the method, which explicitly tests for different DRC types. This changes it to instead have different method implementations for the two cases: "logical" and "physical" DRCs. While we're at it, the entity sense method always returns RTAS_OUT_SUCCESS, and the interesting value is returned via pass-by-reference. Simplify this to directly return the value we care about Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com> Acked-by: Michael Roth <mdroth@linux.vnet.ibm.com>	2017-06-08 14:38:26 +10:00
David Gibson	2c5534776b	pseries: Correct panic behaviour for pseries machine type The pseries machine type doesn't usually use the 'pvpanic' device as such, because it has a firmware/hypervisor facility with roughly the same purpose. The 'ibm,os-term' RTAS call notifies the hypervisor that the guest has crashed. Our implementation of this call was sending a GUEST_PANICKED qmp event; however, it was not doing the other usual panic actions, making its behaviour different from pvpanic for no good reason. To correct this, we should call qemu_system_guest_panicked() rather than directly sending the panic event. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Thomas Huth <thuth@redhat.com>	2017-06-08 14:38:18 +10:00
Greg Kurz	8a9e0e7b89	spapr: fix memory leak in spapr_memory_pre_plug() The string returned by object_property_get_str() is dynamically allocated. (Spotted by Coverity, CID 1375942) Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-06-08 11:05:31 +10:00
Peter Maydell	e02bbe1956	ppc patch queue 2017-06-06 Accumulated patches for ppc targets and the pseries machine type. The big thing in this batch is a start on a substantial cleanup of the pseries hotplug mechanisms, which were pretty confusing. For now these shouldn't cause substantial behavioural changes, but I am hoping these lead to clearer code and eventually to fixes for the bugs we have in hotplug handling, particularly when hotplug and migration are combined. The remaining patches are mostly bugfixes. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABCAAGBQJZNhgSAAoJEGw4ysog2bOSERwP/A7T7UJ8XXWit9QXGCi+G83w +RUuHxjA9qFqrg1zYqFyLg3ctGl93Sxu7mzI5MOIKwAVXlTsE6+84TH7zBc18DPB fekPWmzJ6jfiVO+1Zg1JPWorMfIHDDc2v6Q6qPfD8KWbt02yPfrXbKlivQB4hVZ4 Qb4VJdjZgBDcVy79xhcW5k6v8dVw8PdSyDmkQrBhccI0noLerhI41Mgt7QQaWQRH Le3ziexUpWelVCRQB0FqE/PIWo2+NY/e0pumX7Aqtjs/G35KjOXy0ja3yKLjfeUW Z4NugIO2I2hncERa68YFar/BqG26DX8KCErNMDkn7LyZcoDAQWhcDH+65G1BNuf2 jW+KApMNm+N1vXabbz8P9BbLjuZpRQQhyPOxB3I8UGaTYGtCPe/lUCe2/V8EbKNa VFavc1UuLftOZuJj/rYGJeU/4JBU6srbAKCO3VVK4Tnd8DyiT3QCpUWEkjv+J6jo co35oYBavLfQPMr+rsX15lgbmZwg7iBV+dgKLa2+cwmKXzCf7aYe38aJy7nRBmhb ivhH3bKtdysy0qq4UYaCgW06qQcVF0QMJaxFQ0X7I+GBNwHA7wdZD/i6IMcO6Z7H 7gQdavBTdukgKb2+pVjR58H13ieHXuBxktonhOz70rvEDVa4xx8pxhnZlpSiH2ha RzpkhanrwEeECG6Lke/3 =QDWB -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.10-20170606' into staging ppc patch queue 2017-06-06 Accumulated patches for ppc targets and the pseries machine type. The big thing in this batch is a start on a substantial cleanup of the pseries hotplug mechanisms, which were pretty confusing. For now these shouldn't cause substantial behavioural changes, but I am hoping these lead to clearer code and eventually to fixes for the bugs we have in hotplug handling, particularly when hotplug and migration are combined. The remaining patches are mostly bugfixes. # gpg: Signature made Tue 06 Jun 2017 03:48:50 BST # gpg: using RSA key 0x6C38CACA20D9B392 # gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>" # gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>" # gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>" # gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>" # Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392 * remotes/dgibson/tags/ppc-for-2.10-20170606: spapr: Remove some non-useful properties on DRC objects spapr: Eliminate spapr_drc_get_type_str() spapr: Move configure-connector state into DRC spapr: Clean up spapr_dr_connector_by_() spapr: Introduce DRC subclasses spapr/drc: don't migrate DRC of cold-plugged CPUs and LMBs spapr: Allow boot from vhost--scsi backends ppc/pnv: check the return value of fdt_setprop() spapr_nvram: Check return value from blk_getlength() target/ppc: Fixup set_spr error in h_register_process_table target-ppc: Fix openpic timer read register offset spapr: Make DRC get_index and get_type methods into plain functions spapr: Abolish DRC set_configured method spapr: Abolish DRC get_fdt method spapr: Move DRC RTAS calls into spapr_drc.c migration: Mark CPU states dirty before incoming migration/loadvm migration: remove register_savevm() Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2017-06-06 14:30:06 +01:00
David Gibson	91dcb1ffa6	spapr: Remove some non-useful properties on DRC objects * 'connector_type' is easily derived from the 'index' property, so there's no point to it (it's also implicit in the QOM type of the DRC) * 'isolation-state', 'indicator-state' and 'allocation-state' are part of the transaction between qemu and guest during PAPR hotplug operations, and outside tools really have no business looking at it (especially not changing, and these were RW properties) * 'entity-sense' is basically just a weird PAPR encoding of whether there is a device connected to this DRC Strictly speaking removing these properties is breaking the qemu interface. However, I'm pretty sure no management tools have ever used these. For debugging there are better alternatives. Therefore, I think removing these broken interfaces is the better option. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com> Acked-by: Michael Roth <mdroth@linux.vnet.ibm.com>	2017-06-06 09:24:25 +10:00
David Gibson	1693ea1685	spapr: Eliminate spapr_drc_get_type_str() This function was used in generating the device tree. However, now that we have different QOM types for different DRC types we can easily store the information we need in the class structure and avoid this specialized lookup function. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com> Acked-by: Michael Roth <mdroth@linux.vnet.ibm.com>	2017-06-06 09:24:21 +10:00
David Gibson	b8fdd530be	spapr: Move configure-connector state into DRC Currently the sPAPRMachineState contains a list of sPAPRConfigureConnector structures which store intermediate state for the ibm,configure-connector RTAS call. This was an attempt to separate this state from the core of the DRC state. However the configure connector process is intimately tied to the DRC model, so there's really no point trying to have two levels of interface here. Moving the configure-connector state into its corresponding DRC allows removal of a number of helpers for maintaining the anciliary list. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com> Acked-by: Michael Roth <mdroth@linux.vnet.ibm.com>	2017-06-06 09:24:17 +10:00
David Gibson	fbf5539718	spapr: Clean up spapr_dr_connector_by_() Change names to something less ludicrously verbose * Now that we have QOM subclasses for the different DRC types, use a QOM typename instead of a PAPR type value parameter The latter allows removal of the get_type_shift() helper. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com> Acked-by: Michael Roth <mdroth@linux.vnet.ibm.com>	2017-06-06 09:24:08 +10:00
David Gibson	2d33581899	spapr: Introduce DRC subclasses Currently we only have a single QOM type for all DRCs, but lots of places where we switch behaviour based on the DRC's PAPR defined type. This is a poor use of our existing type system. So, instead create QOM subclasses for each PAPR defined DRC type. We also introduce intermediate subclasses for physical and logical DRCs, a division which will be useful later on. Instead of being stored in the DRC object itself, the PAPR type is now stored in the class structure. There are still many places where we switch directly on the PAPR type value, but this at least provides the basis to start to remove those. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com> Acked-by: Michael Roth <mdroth@linux.vnet.ibm.com>	2017-06-06 09:23:46 +10:00
Greg Kurz	a32e900b8a	spapr/drc: don't migrate DRC of cold-plugged CPUs and LMBs As explained in commit `5c0139a8c2` ("spapr: fix default DRC state for coldplugged LMBs"), guests expect cold-plugged LMBs to be pre-allocated and unisolated. The same goes for cold-plugged CPUs. While here, let's convert g_assert(false) to the better self documenting g_assert_not_reached(). Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-06-06 09:22:02 +10:00
Felipe Franciosi	c4e13492af	spapr: Allow boot from vhost--scsi backends The current implementation of spapr_get_fw_dev_path() doesn't take into consideration vhost--scsi devices. This makes said devices unbootable on PPC as SLOF is unable to work out the path to scan boot disks. This makes VMs bootable on spapr when using vhost-*-scsi by implementing a disk path for VHostSCSICommon (which currently includes both vhost-user-scsi and vhost-scsi). Signed-off-by: Felipe Franciosi <felipe@nutanix.com> Signed-off-by: Mike Cui <cui@nutanix.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-06-06 09:19:01 +10:00
Cédric Le Goater	7032d92ac8	ppc/pnv: check the return value of fdt_setprop() Signed-off-by: Cédric Le Goater <clg@kaod.org> [dwg: Correct typo in commit message] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-06-06 09:18:46 +10:00
Suraj Jitindar Singh	60694bc678	target/ppc: Fixup set_spr error in h_register_process_table set_spr is used in the function h_register_process_table() to update the LPCR_GTSE and LPCR_UPRT values based on the flags passed by the guest. The set_spr function takes the last two arguments mask and value used to mask and set the value of the spr respectively. The current call site passes these arguments in the wrong order and thus bot GTSE and UPRT will be set irrespective, which is obviously incorrect. Rearrange the function call so that these arguments are passed in the correct order and the correct behaviour is exhibited. It is worth noting that this wasn't detected earlier since these were always both set in all cases where this H_CALL was made. Fixes: `6de833070c` ("target/ppc: Set UPRT and GTSE on all cpus in H_REGISTER_PROCESS_TABLE") Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-06-06 08:53:24 +10:00
David Gibson	0b55aa91c9	spapr: Make DRC get_index and get_type methods into plain functions These two methods only have one implementation, and the spec they're implementing means any other implementation is unlikely, verging on impossible. So replace them with simple functions. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>	2017-06-06 08:53:24 +10:00
David Gibson	4f65ce00ab	spapr: Abolish DRC set_configured method DRConnectorClass has a set_configured method, however: * There is only one implementation, and only ever likely to be one * There's exactly one caller, and that's (now) local * The implementation is very straightforward So abolish the method entirely, and just open-code what we need. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>	2017-06-06 08:53:24 +10:00
David Gibson	88af6ea568	spapr: Abolish DRC get_fdt method The DRConnectorClass includes a get_fdt method. However * There's only one implementation, and there's only likely to ever be one * Both callers are local to spapr_drc * Each caller only uses one half of the actual implementation So abolish get_fdt() entirely, and just open-code what we need. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>	2017-06-06 08:53:24 +10:00
David Gibson	b89b3d3929	spapr: Move DRC RTAS calls into spapr_drc.c Currently implementations of the RTAS calls related to DRCs are in spapr_rtas.c. They belong better in spapr_drc.c - that way they're closer to related code, and we'll be able to make some more things local. spapr_rtas.c was intended to contain the RTAS infrastructure and core calls that don't belong anywhere else, not every RTAS implementation. Code motion only. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>	2017-06-06 08:53:24 +10:00
Igor Mammedov	99861ecbc5	spapr: cleanup spapr_fixup_cpu_numa_dt() usage even though spapr_fixup_cpu_numa_dt() has no effect on FDT if numa is disabled, don't call it uselessly. It makes it obvious at call sites that function is needed only when numa is enabled. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <1496161442-96665-7-git-send-email-imammedo@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2017-06-05 14:59:09 -03:00
Igor Mammedov	15f8b14228	numa: move numa_node from CPUState into target specific classes Move vcpu's associated numa_node field out of generic CPUState into inherited classes that actually care about cpu<->numa mapping, i.e: ARMCPU, PowerPCCPU, X86CPU. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <1496161442-96665-6-git-send-email-imammedo@redhat.com> [ehabkost: s/CPU is belonging to/CPU belongs to/ on comments] Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2017-06-05 14:59:09 -03:00
Igor Mammedov	a0ceb640d0	numa: consolidate cpu_preplug fixups/checks for pc/arm/spapr Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <1496161442-96665-2-git-send-email-imammedo@redhat.com> [ehabkost: Fix indentation] Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2017-06-05 14:59:08 -03:00
Marc-André Lureau	f664b88247	Remove/replace sysemu/char.h inclusion Those are apparently unnecessary includes. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>	2017-06-02 11:33:52 +04:00
Stefan Hajnoczi	a3203e7dd3	pci, virtio, vhost: fixes A bunch of fixes all over the place. Most notably this fixes the new MTU feature when using vhost. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> -----BEGIN PGP SIGNATURE----- iQEcBAABAgAGBQJZK2bwAAoJECgfDbjSjVRpNBgIALmNG7VaixhNUlnfX1n1JBnh +HBP2zNfvi0q5roBuPFmlziKa3IBHb2Fcte4nb6QxmPg+uoaj39AOzfrrvz210kR h2j5Qk2bCdMeWBpxI+xDDScwi/Im23Y6KN1eZyMekFr2CaSGiqOHZPPdbsyEcHPB VylM0uHqSTZL5JAAzEuYlH+LLfPu91HoxMsIAdNuQX+qKyM2DZ4eICBQ0zA73USt OduZltcRMk7UpvQMqY+2iaEXapXQQEUGrP2Mo8ZyqeIl2ItC33GspqBQIKjuZdrr tpr/T1VWsLdZnURZXyELrFqrErDXvKaP9HROwvyLyYPXZF+pJ3LA7TopS5UmfNQ= =Z4xG -----END PGP SIGNATURE----- Merge remote-tracking branch 'mst/tags/for_upstream' into staging pci, virtio, vhost: fixes A bunch of fixes all over the place. Most notably this fixes the new MTU feature when using vhost. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> # gpg: Signature made Mon 29 May 2017 01:10:24 AM BST # gpg: using RSA key 0x281F0DB8D28D5469 # gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" # gpg: aka "Michael S. Tsirkin <mst@redhat.com>" # Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67 # Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469 * mst/tags/for_upstream: acpi-test: update expected files pc: ACPI BIOS: use highest NUMA node for hotplug mem hole SRAT entry vhost-user: pass message as a pointer to process_message_reply() virtio_net: Bypass backends for MTU feature negotiation intel_iommu: turn off pt before 2.9 intel_iommu: support passthrough (PT) intel_iommu: allow dev-iotlb context entry conditionally intel_iommu: use IOMMU_ACCESS_FLAG() intel_iommu: provide vtd_ce_get_type() intel_iommu: renaming context entry helpers x86-iommu: use DeviceClass properties memory: remove the last param in memory_region_iommu_replay() memory: tune last param of iommu_ops.translate() Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-05-30 14:15:04 +01:00
Stefan Hajnoczi	5bb0d22cb4	ppc patch queue 2017-05-25 Assorted accumulated patches. These are nearly all bugfixes at one level or another - some for longstanding problems, others for some regressions caused by more recent cleanups. This includes preliminary patches towards fixing migration for Radix Page Table guests under POWER9 and also fixing some migration regressions due to the re-organization of the interrupt controller code. Not all the pieces are there yet, so those still won't quite work, but the preliminary changes make sense on their own. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABCAAGBQJZJlRoAAoJEGw4ysog2bOS4m0P/0fm0k9znGQ8jpbGDJ18PF4g Z7rhEcz5Ab1f5xn+ujYSc23ViJ0wgonhQB0F2d02O50Br0Gu2zN1XMrstysUEN/6 qg7nngsDqe+mGFMXASNb+YIzK4mYZQXmW8qscVm6fdaGXq/tZ13zMRPoRHdJQpsg uN/uDWvQqwZO4RizKFbXlosoeNS1Q4c+Bm5MszV+B6TfVvgNd81Od7rjY/ucj4tr 9e8oG3lx1YpRjg6XN3uT/AEtPxgUe6hAS5RlsAWk/B0FBUK6JvRSaDAS8ojg8UIg 8cPWix5OrHQSpjcTsNW3X2FRb31O8YvExPYFHrVZeVhaB5HzVLPXEudeSIMiuqjn CfZxRz6+IToWUJWFn30NozfJUwgQlJ2sf92CHcmMKHu2Zd/hUWdApIukmEFY43Y5 jyhDkubrRtSsCcR6wd4mGeAg2iQWubSOPFdM/TAGzlbGWoT4qXBK1Ol03DaiF971 fkxWaHrmgiKhe8G1sUIZXfDDxpTIvFv1bcmGOnhGmsELFh65bMXVLmwjNvVK9fdE hTuWibRPPE3btyI4eOMbtVdooliCfp+0XvraACnuOXQlgD1bqCPSrnsS2HLPiDS+ npRKlHGlf4cYSVCeTCjmsAVIqzsDfyvpd67qP3xPsaX/pxI/i+I2H9usZWWJBXMp I5M78EL5NCkMnZgYIFad =nlnV -----END PGP SIGNATURE----- Merge remote-tracking branch 'dgibson/tags/ppc-for-2.10-20170525' into staging ppc patch queue 2017-05-25 Assorted accumulated patches. These are nearly all bugfixes at one level or another - some for longstanding problems, others for some regressions caused by more recent cleanups. This includes preliminary patches towards fixing migration for Radix Page Table guests under POWER9 and also fixing some migration regressions due to the re-organization of the interrupt controller code. Not all the pieces are there yet, so those still won't quite work, but the preliminary changes make sense on their own. # gpg: Signature made Thu 25 May 2017 04:50:00 AM BST # gpg: using RSA key 0x6C38CACA20D9B392 # gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>" # gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>" # gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>" # gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>" # Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392 * dgibson/tags/ppc-for-2.10-20170525: xics: add unrealize handler hw/ppc/spapr.c: recover pending LMB unplug info in spapr_lmb_release hw/ppc: migrating the DRC state of hotplugged devices hw/ppc: removing drc->detach_cb and drc->detach_cb_opaque hw/ppc/spapr.c: adding pending_dimm_unplugs to sPAPRMachineState spapr: add pre_plug function for memory pseries: Restore support for total vcpus not a multiple of threads-per-core for old machine types pseries: Split CAS PVR negotiation out into a separate function spapr: fix error reporting in xics_system_init() spapr_cpu_core: drop reference on ICP object during CPU realization hw/ppc/spapr_events.c: removing 'exception' from sPAPREventLogEntry spapr: ensure core_slot isn't NULL in spapr_core_unplug() xics_kvm: cache already enabled vCPU ids spapr: Consolidate HPT freeing code into a routine spapr-cpu-core: release ICP object when realization fails spapr: sanitize error handling in spapr_ics_create() ppc/xics: simplify prototype of xics_spapr_init() target/ppc: reset reservation in do_rfi() Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-05-30 09:44:58 +01:00
Stefan Hajnoczi	d0eda02938	QAPI patches for 2017-05-23 -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJZJB4MAAoJEDhwtADrkYZTjhkP+wRaiZj9h4IJvcWoNEzfyuA1 kd7+Kx6QgfCmZE9vL2/mlOFddWL0fPtPffL/ZRu5UNgIILaCSPFsGkOGvXLZhaUW he5sqLCqMc2mxgB98HpbT0dzt0cOSCjdM5BxkFXeq/yPoDa0IiZiD8cpvj+FVwKi D0qGdrKKGCR3RteL4gr/kaXY/LXAZfuEjbAtylQx1aMHJ6CKmdSIVVVU2JJVIYhQ +dT/Xst0PSkJYk90wgmwpzPCqKR/N5zHFe8CyUoE67FxBhegdw19O3wlzU9DJ3N5 8Az+fbEjifWoMytTZR4H3snPJGwl6wxsh2UVj9SMCvebc0y278UPlGqiszvWBepa 1iZHHULH+yygHyUmX6CxjHOUW498ES2KGHx7qJJe8ebeJ4XuU7JcE+Sf4GQEAm8Q p6P5s3qXpuVjekCjmerUAtybr+hxEQC9fbAGqPq+r489jwjvUiETrMLbmEHyy/Xa fSUaW+f5kGI0GJS9FYcbcMy9w2130lTK2k4bZM0mSVlSsHA7W0GBDnzxUDtxo6uH oqMQgKIFWOBU5GkRUiL43vpiTIpiLCuG6PbQlgefQRPWdoODVxykuu2bq5hVaax8 8XMkkq7isG/J5esFc55L1qEUyrUDtVYx/LiHj0XXJikkGirXtp7b7l/TmFLZGsex UWWzFRbZnCVf2CKwdV6h =DNqn -----END PGP SIGNATURE----- Merge remote-tracking branch 'armbru/tags/pull-qapi-2017-05-23' into staging QAPI patches for 2017-05-23 # gpg: Signature made Tue 23 May 2017 12:33:32 PM BST # gpg: using RSA key 0x3870B400EB918653 # gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" # gpg: aka "Markus Armbruster <armbru@pond.sub.org>" # Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867 4E5F 3870 B400 EB91 8653 * armbru/tags/pull-qapi-2017-05-23: qapi-schema: Remove obsolete note from ObjectTypeInfo block: Use QDict helpers for --force-share shutdown: Expose bool cause in SHUTDOWN and RESET events shutdown: Add source information to SHUTDOWN and RESET shutdown: Preserve shutdown cause through replay shutdown: Prepare for use of an enum in reset/shutdown_request shutdown: Simplify shutdown_signal sockets: Plug memory leak in socket_address_flatten() scripts/qmp/qom-set: fix the value argument passed to srv.command() Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-05-30 09:33:40 +01:00
Peter Xu	bf55b7afce	memory: tune last param of iommu_ops.translate() This patch converts the old "is_write" bool into IOMMUAccessFlags. The difference is that "is_write" can only express either read/write, but sometimes what we really want is "none" here (neither read nor write). Replay is an good example - during replay, we should not check any RW permission bits since thats not an actual IO at all. CC: Paolo Bonzini <pbonzini@redhat.com> CC: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Jason Wang <jasowang@redhat.com>	2017-05-25 21:25:27 +03:00
Daniel Henrique Barboza	16ee99805e	hw/ppc/spapr.c: recover pending LMB unplug info in spapr_lmb_release When a LMB hot unplug starts, the current DRC LMB status is stored at spapr->pending_dimm_unplugs QTAILQ. This queue isn't migrated, thus if a migration occurs in the middle of a LMB unplug the spapr_lmb_release callback will lost track of the LMB unplug progress. This patch implements a new recover function spapr_recover_pending_dimm_state that is used inside spapr_lmb_release to recover this DRC LMB release status that is lost during the migration. Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com> [dwg: Minor stylistic changes, simplify error handling] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-05-25 11:31:33 +10:00
Daniel Henrique Barboza	a50919dddf	hw/ppc: migrating the DRC state of hotplugged devices In pseries, a firmware abstraction called Dynamic Reconfiguration Connector (DRC) is used to assign a particular dynamic resource to the guest and provide an interface to manage configuration/removal of the resource associated with it. In other words, DRC is the 'plugged state' of a device. Before this patch, DRC wasn't being migrated. This causes post-migration problems due to DRC state mismatch between source and target. The DRC state of a device X in the source might change, while in the target the DRC state of X is still fresh. When migrating the guest, X will not have the same hotplugged state as it did in the source. This means that we can't hot unplug X in the target after migration is completed because its DRC state is not consistent. https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1677552 is one bug that is caused by this DRC state mismatch between source and target. To migrate the DRC state, we defined the VMStateDescription struct for spapr_drc to enable the transmission of spapr_drc state in migration. Not all the elements in the DRC state are migrated - only those that can be modified by guest actions or device add/remove operations: - 'isolation_state', 'allocation_state' and 'indicator_state' are involved in the DR state transition diagram from PAPR+ 2.7, 13.4; - 'configured', 'signalled', 'awaiting_release' and 'awaiting_allocation' are needed in attaching and detaching devices; - 'indicator_state' provides users with hardware state information. These are the DRC elements that are migrated. In this patch the DRC state is migrated for PCI, LMB and CPU connector types. At this moment there is no support to migrate DRC for the PHB (PCI Host Bridge) type. In the 'realize' function the DRC is registered using vmstate_register, similar to what hw/ppc/spapr_iommu.c does in 'spapr_tce_table_realize'. This approach works because DRCs are bus-less and do not sit on a BusClass that implements bc->get_dev_path, so as a fallback the VMSD gets identified via "spapr_drc"/get_index(drc). Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-05-25 11:31:33 +10:00
Daniel Henrique Barboza	318347234d	hw/ppc: removing drc->detach_cb and drc->detach_cb_opaque The pointer drc->detach_cb is being used as a way of informing the detach() function inside spapr_drc.c which cb to execute. This information can also be retrieved simply by checking drc->type and choosing the right callback based on it. In this context, detach_cb is redundant information that must be managed. After the previous spapr_lmb_release change, no detach_cb_opaques are being used by any of the three callbacks functions. This is yet another information that is now unused and, on top of that, can't be migrated either. This patch makes the following changes: - removal of detach_cb_opaque. the 'opaque' argument was removed from the callbacks and from the detach() function of sPAPRConnectorClass. The attribute detach_cb_opaque of sPAPRConnector was removed. - removal of detach_cb from the detach() call. The function pointer detach_cb of sPAPRConnector was removed. detach() now uses a switch(drc->type) to execute the apropriate callback. To achieve this, spapr_core_release, spapr_lmb_release and spapr_phb_remove_pci_device_cb callbacks were made public to be visible inside detach(). Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-05-25 11:31:33 +10:00
David Gibson	0cffce56ae	hw/ppc/spapr.c: adding pending_dimm_unplugs to sPAPRMachineState The LMB DRC release callback, spapr_lmb_release(), uses an opaque parameter, a sPAPRDIMMState struct that stores the current LMBs that are allocated to a DIMM (nr_lmbs). After each call to this callback, the nr_lmbs is decremented by one and, when it reaches zero, the callback proceeds with the qdev calls to hot unplug the LMB. Using drc->detach_cb_opaque is problematic because it can't be migrated in the future DRC migration work. This patch makes the following changes to eliminate the usage of this opaque callback inside spapr_lmb_release: - sPAPRDIMMState was moved from spapr.c and added to spapr.h. A new attribute called 'addr' was added to it. This is used as an unique identifier to associate a sPAPRDIMMState to a PCDIMM element. - sPAPRMachineState now hosts a new QTAILQ called 'pending_dimm_unplugs'. This queue of sPAPRDIMMState elements will store the DIMM state of DIMMs that are currently going under an unplug process. - spapr_lmb_release() will now retrieve the nr_lmbs value by getting the correspondent sPAPRDIMMState. A helper function called spapr_dimm_get_address was created to fetch the address of a PCDIMM device inside spapr_lmb_release. When nr_lmbs reaches zero and the callback proceeds with the qdev hot unplug calls, the sPAPRDIMMState struct is removed from spapr->pending_dimm_unplugs. After these changes, the opaque argument for spapr_lmb_release is now unused and is passed as NULL inside spapr_del_lmbs. This and the other opaque arguments can now be safely removed from the code. As an additional cleanup made by this patch, the spapr_del_lmbs function was merged with spapr_memory_unplug_request. The former was being called only by the latter and both were small enough to fit one single function. Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com> [dwg: Minor stylistic cleanups] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-05-25 11:31:28 +10:00
Laurent Vivier	c871bc70bb	spapr: add pre_plug function for memory This allows to manage errors before the memory has started to be hotplugged. We already have the function for the CPU cores. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> [dwg: Fixed a couple of style nits] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-05-24 17:27:39 +10:00
David Gibson	459264ef24	pseries: Restore support for total vcpus not a multiple of threads-per-core for old machine types As of pseries-2.7 and later, we require the total number of guest vcpus to be a multiple of the threads-per-core. pseries-2.6 and earlier machine types, however, are supposed to allow this for the sake of migration from old qemu versions which allowed this. Unfortunately, `8149e29` "pseries: Enforce homogeneous threads-per-core" broke this by not considering the old machine type case. This fixes it by only applying the check when the machine type supports hotpluggable cpus. By not-entirely-coincidence, that corresponds to the same time when we started enforcing total threads being a multiple of threads-per-core. Fixes: `8149e2992f` Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Tested-by: Greg Kurz <groug@kaod.org>	2017-05-24 11:39:53 +10:00
David Gibson	80c33d343f	pseries: Split CAS PVR negotiation out into a separate function Guests of the qemu machine type go through a feature negotiation process known as "client architecture support" (CAS) during early boot. This does a number of things, one of which is finding a CPU compatibility mode which can be supported by both guest and host. In fact the CPU negotiation is probably the single most complex part of the CAS process, so this splits it out into a helper function. We've recently made some mistakes in maintaining backward compatibility for old machine types here. Splitting this out will also make it easier to fix this. This also adds a possibly useful error message if the negotiation fails (i.e. if there isn't a CPU mode that's suitable for both guest and host). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org>	2017-05-24 11:39:53 +10:00
Greg Kurz	3d85885a1b	spapr: fix error reporting in xics_system_init() If the user explicitely asked for kernel-irqchip support and "xics-kvm" initialization fails, we shouldn't fallback to emulated "xics" as we do now. It is also awkward to print an error message when we have an errp pointer argument. Let's use the errp argument to report the error and let the caller decide. This simplifies the code as we don't need a local Error * here. Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-05-24 11:39:53 +10:00
Greg Kurz	249127d0df	spapr_cpu_core: drop reference on ICP object during CPU realization When a piece of code allocates an object, it implicitely gets a reference on it. If it then makes that object a child property of another object, it should drop its own reference at some point otherwise the child object can never be finalized. The current code hence leaks one ICP object per CPU when hot-removing a core. Failing to add a newly allocated ICP object to the CPU is a bug. While here, let's ensure QEMU aborts if this ever happens. Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-05-24 11:39:53 +10:00
Daniel Henrique Barboza	bff3063837	hw/ppc/spapr_events.c: removing 'exception' from sPAPREventLogEntry Currenty we do not have any RTAS event that is reported by the event-scan interface. The existing events, RTAS_LOG_TYPE_EPOW and RTAS_LOG_TYPE_HOTPLUG, are being reported by the check-exception interface and, as such, marked as 'exception=true'. Commit `79853e18d9`, 'spapr_events: event-scan RTAS interface', added the event_scan interface because the guest kernel requires it to initialize other required interfaces. It is acting since then as a stub because no events that would be reported by it were added since then. However, the existence of the 'exception' boolean adds an unnecessary load in the future migration of the pending_events, sPAPREventLogEntry QTAILQ that hosts the pending RTAS events. To make the code cleaner and ease the future migration changes, this patch makes the following changes: - remove the 'exception' boolean that filter these events. There is nothing to filter since all events are reported by check-exception; - functions rtas_event_log_queue, rtas_event_log_dequeue and rtas_event_log_contains don't receive the 'exception' boolean as parameter; - event_scan function was simplified. It was calling 'rtas_event_log_dequeue(mask, false)' that was always returning 'NULL' because we have no events that are created with exception=false, thus in the end it would execute a jump to 'out_no_events' all the time. The function now assumes that this will always be the case and all the remaining logic were deleted. In the future, when or if we add new RTAS events that should be reported with the event_scan interface, we can refer to the changes made in this patch to add the event_scan logic back. Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-05-24 11:39:53 +10:00
Greg Kurz	07572c0653	spapr: ensure core_slot isn't NULL in spapr_core_unplug() If we go that far on the path of hot-removing a core and we find out that the core-id is invalid, then we have a serious bug. Let's make it explicit with an assert() instead of dereferencing a NULL pointer. This fixes Coverity issue CID 1375404. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-05-24 11:39:53 +10:00
Bharata B Rao	06ec79e865	spapr: Consolidate HPT freeing code into a routine Consolidate the code that frees HPT into a separate routine spapr_free_hpt() as the same chunk of code is called from two places. Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-05-24 11:39:52 +10:00
Greg Kurz	c8a98293f7	spapr-cpu-core: release ICP object when realization fails While here we introduce a single error path to avoid code duplication. Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-05-24 11:39:52 +10:00
Greg Kurz	175d2aa038	spapr: sanitize error handling in spapr_ics_create() The spapr_ics_create() function handles errors in a rather convoluted way, with two local Error * variables. Moreover, failing to parent the ICS object to the machine should be considered as a bug but it is currently ignored. This patch addresses both issues. Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-05-24 11:39:52 +10:00
Greg Kurz	f63ebfe0ac	ppc/xics: simplify prototype of xics_spapr_init() This function only does hypercall and RTAS-call registration, and thus never returns an error. This patch adapt the prototype to reflect that. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-05-24 11:39:52 +10:00
Eric Blake	cf83f14005	shutdown: Add source information to SHUTDOWN and RESET Time to wire up all the call sites that request a shutdown or reset to use the enum added in the previous patch. It would have been less churn to keep the common case with no arguments as meaning guest-triggered, and only modified the host-triggered code paths, via a wrapper function, but then we'd still have to audit that I didn't miss any host-triggered spots; changing the signature forces us to double-check that I correctly categorized all callers. Since command line options can change whether a guest reset request causes an actual reset vs. a shutdown, it's easy to also add the information to reset requests. Signed-off-by: Eric Blake <eblake@redhat.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> [ppc parts] Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> [SPARC part] Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> [s390x parts] Message-Id: <20170515214114.15442-5-eblake@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>	2017-05-23 13:28:17 +02:00
Eduardo Habkost	8a824e4d74	audio: Rename hw/audio/audio.h to hw/audio/soundhw.h All the functions in hw/audio/audio.h are called "soundhw_*()" and live in hw/audio/audiohw.c. Rename the header file for consistency. Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Hervé Poussineau <hpoussin@reactos.org> Message-id: 20170508205735.23444-4-ehabkost@redhat.com Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>	2017-05-19 10:48:54 +02:00
Eduardo Habkost	4c565674a2	audio: Rename audio_init() to soundhw_init() To make it consistent with the remaining soundhw.c functions and avoid confusion with the audio_init() function in audio/audio.c, rename audio_init() to soundhw_init(). Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Message-id: 20170508205735.23444-3-ehabkost@redhat.com Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>	2017-05-19 10:48:53 +02:00
Eduardo Habkost	ca89f72092	audio: Move arch_init audio code to hw/audio/soundhw.c There's no reason to keep the soundhw table in arch_init.c. Move that code to a new hw/audio/soundhw.c file. While moving the code, trivial coding style issues were fixed. Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-id: 20170508205735.23444-2-ehabkost@redhat.com Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>	2017-05-19 10:48:53 +02:00
Eduardo Habkost	e4f4fb1eca	sysbus: Set user_creatable=false by default on TYPE_SYS_BUS_DEVICE commit `33cd52b5d7` unset cannot_instantiate_with_device_add_yet in TYPE_SYSBUS, making all sysbus devices appear on "-device help" and lack the "no-user" flag in "info qdm". To fix this, we can set user_creatable=false by default on TYPE_SYS_BUS_DEVICE, but this requires setting user_creatable=true explicitly on the sysbus devices that actually work with -device. Fortunately today we have just a few has_dynamic_sysbus=1 machines: virt, pc-q35-, ppce500, and spapr. virt, ppce500, and spapr have extra checks to ensure just a few device types can be instantiated: virt supports only TYPE_VFIO_CALXEDA_XGMAC, TYPE_VFIO_AMD_XGBE. * ppce500 supports only TYPE_ETSEC_COMMON. * spapr supports only TYPE_SPAPR_PCI_HOST_BRIDGE. This patch sets user_creatable=true explicitly on those 4 device classes. Now, the more complex cases: pc-q35-: q35 has no sysbus device whitelist yet (which is a separate bug). We are in the process of fixing it and building a sysbus whitelist on q35, but in the meantime we can fix the "-device help" and "info qdm" bugs mentioned above. Also, despite not being strictly necessary for fixing the q35 bug, reducing the list of user_creatable=true devices will help us be more confident when building the q35 whitelist. xen: We also have a hack at xen_set_dynamic_sysbus(), that sets has_dynamic_sysbus=true at runtime when using the Xen accelerator. This hack is only used to allow xen-backend devices to be dynamically plugged/unplugged. This means today we can use -device with the following 22 device types, that are the ones compiled into the qemu-system-x86_64 and qemu-system-i386 binaries: allwinner-ahci * amd-iommu * cfi.pflash01 * esp * fw_cfg_io * fw_cfg_mem * generic-sdhci * hpet * intel-iommu * ioapic * isabus-bridge * kvmclock * kvm-ioapic * kvmvapic * SUNW,fdtwo * sysbus-ahci * sysbus-fdc * sysbus-ohci * unimplemented-device * virtio-mmio * xen-backend * xen-sysdev This patch adds user_creatable=true explicitly to those devices, temporarily, just to keep 100% compatibility with existing behavior of q35. Subsequent patches will remove user_creatable=true from the devices that are really not meant to user-creatable on any machine, and remove the FIXME comment from the ones that are really supposed to be user-creatable. This is being done in separate patches because we still don't have an obvious list of devices that will be whitelisted by q35, and I would like to get each device reviewed individually. Cc: Alexander Graf <agraf@suse.de> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Alistair Francis <alistair.francis@xilinx.com> Cc: Beniamino Galvani <b.galvani@gmail.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Cornelia Huck <cornelia.huck@de.ibm.com> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: "Edgar E. Iglesias" <edgar.iglesias@gmail.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Frank Blaschka <frank.blaschka@de.ibm.com> Cc: Gabriel L. Somlo <somlo@cmu.edu> Cc: Gerd Hoffmann <kraxel@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: John Snow <jsnow@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kevin Wolf <kwolf@redhat.com> Cc: Laszlo Ersek <lersek@redhat.com> Cc: Marcel Apfelbaum <marcel@redhat.com> Cc: Markus Armbruster <armbru@redhat.com> Cc: Max Reitz <mreitz@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Maydell <peter.maydell@linaro.org> Cc: Pierre Morel <pmorel@linux.vnet.ibm.com> Cc: Prasad J Pandit <pjp@fedoraproject.org> Cc: qemu-arm@nongnu.org Cc: qemu-block@nongnu.org Cc: qemu-ppc@nongnu.org Cc: Richard Henderson <rth@twiddle.net> Cc: Rob Herring <robh@kernel.org> Cc: Shannon Zhao <zhaoshenglong@huawei.com> Cc: sstabellini@kernel.org Cc: Thomas Huth <thuth@redhat.com> Cc: Yi Min Zhao <zyimin@linux.vnet.ibm.com> Acked-by: John Snow <jsnow@redhat.com> Acked-by: Juergen Gross <jgross@suse.com> Acked-by: Marcel Apfelbaum <marcel@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <20170503203604.31462-3-ehabkost@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> [ehabkost: Small changes at sysbus_device_class_init() comments] Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2017-05-17 10:37:01 -03:00
Eduardo Habkost	e90f2a8c3e	qdev: Replace cannot_instantiate_with_device_add_yet with !user_creatable cannot_instantiate_with_device_add_yet was introduced by commit `efec3dd631` to replace no_user. It was supposed to be a temporary measure. When it was introduced, we had 54 cannot_instantiate_with_device_add_yet=true lines in the code. Today (3 years later) this number has not shrunk: we now have 57 cannot_instantiate_with_device_add_yet=true lines. I think it is safe to say it is not a temporary measure, and we won't see the flag go away soon. Instead of a long field name that misleads people to believe it is temporary, replace it a shorter and less misleading field: user_creatable. Except for code comments, changes were generated using the following Coccinelle patch: @@ expression DC; @@ ( -DC->cannot_instantiate_with_device_add_yet = false; +DC->user_creatable = true; \| -DC->cannot_instantiate_with_device_add_yet = true; +DC->user_creatable = false; ) @@ typedef ObjectClass; expression dc; identifier class, data; @@ static void device_class_init(ObjectClass class, void data) { ... dc->hotpluggable = true; +dc->user_creatable = true; ... } @@ @@ struct DeviceClass { ... -bool cannot_instantiate_with_device_add_yet; +bool user_creatable; ... } @@ expression DC; @@ ( -!DC->cannot_instantiate_with_device_add_yet +DC->user_creatable \| -DC->cannot_instantiate_with_device_add_yet +!DC->user_creatable ) Cc: Alistair Francis <alistair.francis@xilinx.com> Cc: Laszlo Ersek <lersek@redhat.com> Cc: Marcel Apfelbaum <marcel@redhat.com> Cc: Markus Armbruster <armbru@redhat.com> Cc: Peter Maydell <peter.maydell@linaro.org> Cc: Thomas Huth <thuth@redhat.com> Acked-by: Alistair Francis <alistair.francis@xilinx.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Marcel Apfelbaum <marcel@redhat.com> Acked-by: Marcel Apfelbaum <marcel@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <20170503203604.31462-2-ehabkost@redhat.com> [ehabkost: kept "TODO remove once we're there" comment] Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2017-05-17 10:37:00 -03:00
Stefan Hajnoczi	ba9915e1f8	x86 and machine queue, 2017-05-11 Highlights: * New "-numa cpu" option * NUMA distance configuration * migration/i386 vmstatification -----BEGIN PGP SIGNATURE----- iQIcBAABCAAGBQJZFLh3AAoJECgHk2+YTcWmppQQAJe9Y5a3VwHXqvHbwBHX2ysn RZDAUPd9DWpbM+UUydyKOVIZ7u5RXbbVq4E0NeCD8VYYd+grZB5Wo1cAzy3b4U2j 2s+MDqaPMtZtGoqxTsyQOVoVxazT5Kf1zglK+iUEzik44J7LGdro+ty2Z7Ut2c11 q9rE/GNS78czBm7c4lxgkxXW4N95K/tEGlLtDQ7uct//3U/ZimF+mO6GcbVFlOWT 4iEbOz2sqvBVv22nLJRufiPgFNIW4hizAz5KBWxwGFCCKvT3N6yYNKKjzEpCw+jE lpjIRODU02yIZZZY841fLRtyrk7p4zORS8jRaHTdEJgb5bGc/YazxxVL8nzRQT1W VxFwAMd+UNrDkV24hpN++Ln2O+b3kwcGZ7uA/qu9d5WvSYUKXlHqcMJ35q6zuhAI /ecfYO7EZfVP86VjIt5IH04iV8RChA9Q6de+kQEFa6wHUxufeCOwCFqukGo8zj07 plX8NcjnzYmSXKnYjHOHao4rKT+DiJhRB60rFiMeKP/qvKbZPjtgsIeonhHm53qZ /QwkhowahHKkpAnetIl0QHm8KS4YudAofMi/Fl+he4gRkEbSQVAo6iQb2L4cjcLC LNSDDsIVWGem4gCR+vcsFqB3lggRDfltHXm15JKh92UMpOr6RI6s8pD55T7EdnPC CfdxWB5kYM6/lLbOHj94 =48wH -----END PGP SIGNATURE----- Merge remote-tracking branch 'ehabkost/tags/x86-and-machine-pull-request' into staging x86 and machine queue, 2017-05-11 Highlights: * New "-numa cpu" option * NUMA distance configuration * migration/i386 vmstatification # gpg: Signature made Thu 11 May 2017 08:16:07 PM BST # gpg: using RSA key 0x2807936F984DC5A6 # gpg: Good signature from "Eduardo Habkost <ehabkost@redhat.com>" # gpg: Note: This key has expired! # Primary key fingerprint: 5A32 2FD5 ABC4 D3DB ACCF D1AA 2807 936F 984D C5A6 * ehabkost/tags/x86-and-machine-pull-request: (29 commits) migration/i386: Remove support for pre-0.12 formats vmstatification: i386 FPReg migration/i386: Remove old non-softfloat 64bit FP support tests: check -numa node,cpu=props_list usecase numa: add '-numa cpu,...' option for property based node mapping numa: remove node_cpu bitmaps as they are no longer used numa: use possible_cpus for not mapped CPUs check machine: call machine init from wrapper numa: remove no longer need numa_post_machine_init() tests: numa: add case for QMP command query-cpus QMP: include CpuInstanceProperties into query_cpus output output virt-arm: get numa node mapping from possible_cpus instead of numa_get_node_for_cpu() spapr: get numa node mapping from possible_cpus instead of numa_get_node_for_cpu() pc: get numa node mapping from possible_cpus instead of numa_get_node_for_cpu() numa: do default mapping based on possible_cpus instead of node_cpu bitmaps numa: mirror cpu to node mapping in MachineState::possible_cpus numa: add check that board supports cpu_index to node mapping virt-arm: add node-id property to CPU pc: add node-id property to CPU spapr: add node-id property to sPAPR core ... Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-05-15 14:12:03 +01:00
Stefan Hajnoczi	2f77ec7390	ppc patch queue for 2017-05-11 This pull request supersedes the one from yesterday (20170510), fixing an important style bug in one patch, and adding an extra couple of simple patches. Highlights of this set: * Some fixes for POWER9 * TCG support for POWER9 radix MMU * VGA rom for Mac machine types * Fixes for the XICS interrupt controller * MTTCG support for ppc targets As suggested by Paolo, I've tried to add the Docker tests to my standard pre-pull-request tests. I haven't wholly suceeded; this has been tested with some of the Docker images, but others I haven't managed due to problems that as best I can tell are not due to problems in this patch series. I'll continue working on this for future pull requests. Specifically, 'travis', 'fedora', and 'centos6' seem to work. 'min-glib' jammed while gtesting moxie, which seems very unlikely to be caused by this series. 'ubuntu', 'debian' and 'debian-bootstrap' hit build errors almost immediately that look like problems with the container configuration, and 'debian--cross' hit build errors later on which also look like missing dependencies from the container. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABCAAGBQJZE+T6AAoJEGw4ysog2bOSgSYQAMszDZ+HCYlp6iVlJqDoy55S u8krYwkS9MnzrmbMjPVzGiFmH6IEuOd3zAx0aM1wZtcXjprUKsr6jHZOEtk7Frv5 vzIwzcP85vkVegPX+fNUAo+1+T3OHix9RAI3BF5rHdyCC2OmJriPyvyOQ76uVORJ 9ouSAeG/dCyjkVRYAlTQPidGqc/OQUaMFwZdLvhTJHeDqcdlqziCzP1YnDjN78UQ BRpsYOYFnGSzaqjNj16edF/yM4NiW/4tLd700mvGkvPUHrFEiyQur0Lm0bc1iZs5 JZwcgAxhivI6CiWt57y/OpC6pWsasVhlBY00aWBcEExAh6j+Kp20g0C6MYB4JdwX jVJUOzWGWuMFkS65S/nHmdngUWvrSpn1xzPr0KQihLRFpoYK3btaS2TcekQocnZc mF3NFvKXeS/F6ZYLDWkLF/9VVEjz2mJNRvimhMWljuFyLmxlQxQSvzNXZ7Lt/maj D4nFaOWf5eG0O0Em54hLizM6r6vnt2qkLVSjPmOFO2gQvDsu10G/5ociqkYNRYvz srJUfo2xMjzaM0lvJTJT3VOWfbX1Wq4A8zyjLuoi1xpqI1Yb95zEycFvc0Eszzh1 OByIuHLNynu+W2w08dAA8vU1tlh+Yf0yld2LOgY3Cn2gjgHQRFtmyrIsPgYsptC1 63Y0PbTnGdbaEdlAJu8I =D0Hk -----END PGP SIGNATURE----- Merge remote-tracking branch 'dgibson/tags/ppc-for-2.10-20170511' into staging ppc patch queue for 2017-05-11 This pull request supersedes the one from yesterday (20170510), fixing an important style bug in one patch, and adding an extra couple of simple patches. Highlights of this set: Some fixes for POWER9 * TCG support for POWER9 radix MMU * VGA rom for Mac machine types * Fixes for the XICS interrupt controller * MTTCG support for ppc targets As suggested by Paolo, I've tried to add the Docker tests to my standard pre-pull-request tests. I haven't wholly suceeded; this has been tested with some of the Docker images, but others I haven't managed due to problems that as best I can tell are not due to problems in this patch series. I'll continue working on this for future pull requests. Specifically, 'travis', 'fedora', and 'centos6' seem to work. 'min-glib' jammed while gtesting moxie, which seems very unlikely to be caused by this series. 'ubuntu', 'debian' and 'debian-bootstrap' hit build errors almost immediately that look like problems with the container configuration, and 'debian--cross' hit build errors later on which also look like missing dependencies from the container. # gpg: Signature made Thu 11 May 2017 05:13:46 AM BST # gpg: using RSA key 0x6C38CACA20D9B392 # gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>" # gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>" # gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>" # gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>" # Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392 dgibson/tags/ppc-for-2.10-20170511: (23 commits) target/ppc: Avoid printing wrong aliases in CPU help text pnv: Fix build failures on some host platforms target/ppc: Allow workarounds for POWER9 DD1 spapr: Don't accidentally advertise HTM support on POWER9 ppc: xics: fix compilation with CentOS 6 target/ppc: Enable RADIX mmu mode for pseries TCG guest target/ppc: Implement ISA V3.00 radix page fault handler target/ppc: Change tlbie invalid fields for POWER9 support target/ppc: Update tlbie to check privilege level based on GTSE target/ppc: Set UPRT and GTSE on all cpus in H_REGISTER_PROCESS_TABLE ppc: add qemu_vga.ndrv ROM to fw_cfg interface for NewWorld Macs ppc: add qemu_vga.ndrv ROM to fw_cfg interface for OldWorld Macs Add QemuMacDrivers qemu_vga.ndrv revision d4e7d7a built as submodule Add QemuMacDrivers as submodule ppc/xics: preserve P and Q bits for KVM IRQs ppc/xics: Fix stale irq->status bits after get target/ppc: do not reset reserve_addr in exec_enter tcg: enable MTTCG by default for PPC64 on x86 cpus: Fix CPU unplug for MTTCG target/ppc: Generate fence operations ... Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-05-15 14:00:15 +01:00
Igor Mammedov	722387e78d	spapr: get numa node mapping from possible_cpus instead of numa_get_node_for_cpu() it's safe to remove thread node_id != core node_id error branch as machine_set_cpu_numa_node() also does mismatch check and is called even before any CPU is created. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <1494415802-227633-10-git-send-email-imammedo@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2017-05-11 16:08:49 -03:00
Igor Mammedov	0b8497f08c	spapr: add node-id property to sPAPR core it will allow switching from cpu_index to core based numa mapping in follow up patches. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <1494415802-227633-3-git-send-email-imammedo@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2017-05-11 16:08:48 -03:00
Igor Mammedov	ea089eebbd	numa: move source of default CPUs to NUMA node mapping into boards Originally CPU threads were by default assigned in round-robin fashion. However it was causing issues in guest since CPU threads from the same socket/core could be placed on different NUMA nodes. Commit `fb43b73b` (pc: fix default VCPU to NUMA node mapping) fixed it by grouping threads within a socket on the same node introducing cpu_index_to_socket_id() callback and commit `20bb648d` (spapr: Fix default NUMA node allocation for threads) reused callback to fix similar issues for SPAPR machine even though socket doesn't make much sense there. As result QEMU ended up having 3 default distribution rules used by 3 targets /virt-arm, spapr, pc/. In effort of moving NUMA mapping for CPUs into possible_cpus, generalize default mapping in numa.c by making boards decide on default mapping and let them explicitly tell generic numa code to which node a CPU thread belongs to by replacing cpu_index_to_socket_id() with @cpu_index_to_instance_props() which provides default node_id assigned by board to specified cpu_index. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <1494415802-227633-2-git-send-email-imammedo@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2017-05-11 16:08:48 -03:00
Laurent Vivier	3bfe57165b	numa: equally distribute memory on nodes When there are more nodes than available memory to put the minimum allowed memory by node, all the memory is put on the last node. This is because we put (ram_size / nb_numa_nodes) & ~((1 << mc->numa_mem_align_shift) - 1); on each node, and in this case the value is 0. This is particularly true with pseries, as the memory must be aligned to 256MB. To avoid this problem, this patch uses an error diffusion algorithm [1] to distribute equally the memory on nodes. We introduce numa_auto_assign_ram() function in MachineClass to keep compatibility between machine type versions. The legacy function is used with pseries-2.9, pc-q35-2.9 and pc-i440fx-2.9 (and previous), the new one with all others. Example: qemu-system-ppc64 -S -nographic -nodefaults -monitor stdio -m 1G -smp 8 \ -numa node -numa node -numa node \ -numa node -numa node -numa node Before: (qemu) info numa 6 nodes node 0 cpus: 0 6 node 0 size: 0 MB node 1 cpus: 1 7 node 1 size: 0 MB node 2 cpus: 2 node 2 size: 0 MB node 3 cpus: 3 node 3 size: 0 MB node 4 cpus: 4 node 4 size: 0 MB node 5 cpus: 5 node 5 size: 1024 MB After: (qemu) info numa 6 nodes node 0 cpus: 0 6 node 0 size: 0 MB node 1 cpus: 1 7 node 1 size: 256 MB node 2 cpus: 2 node 2 size: 0 MB node 3 cpus: 3 node 3 size: 256 MB node 4 cpus: 4 node 4 size: 256 MB node 5 cpus: 5 node 5 size: 256 MB [1] https://en.wikipedia.org/wiki/Error_diffusion Signed-off-by: Laurent Vivier <lvivier@redhat.com> Message-Id: <20170502162955.1610-2-lvivier@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> [ehabkost: s/ram_size/size/ at numa_default_auto_assign_ram()] Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2017-05-11 16:08:47 -03:00
David Gibson	9bf502fe12	spapr: Don't accidentally advertise HTM support on POWER9 Logic in spapr_populate_pa_features() enables the bit advertising Hardware Transactional Memory (HTM) in the guest's device tree only when KVM advertises its availability with the KVM_CAP_PPC_HTM feature. However, this assumes that the HTM bit is off in the base template used for the device tree value. That is true for POWER8, but not for POWER9. It looks like that was accidentally changed in `9fb4541` "spapr: Enable ISA 3.0 MMU mode selection via CAS". Fixes: `9fb4541f58` Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Thomas Huth <thuth@redhat.com>	2017-05-11 09:45:15 +10:00
Suraj Jitindar Singh	545d6e2b5c	target/ppc: Enable RADIX mmu mode for pseries TCG guest Now that we have added all the infrastructure we can enable a pseries TCG guest to use radix. In order to do this we have to add the appropriate bits to the ibm,arch-vec-5-platform-support vector to represent that we support both hash and radix mmu models. A radix guest can now be booted in pseries tcg mode by specifying: -cpu POWER9 Note that we assume hash, that is we allocate a hpt, until a guest tells us otherwise via a H_REGISTER_PROCESS_TABLE call with radix specified - in which case we free the hpt. If we were right and the guest is hash then there's nothing for us to do. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-05-11 09:45:15 +10:00
Suraj Jitindar Singh	6de833070c	target/ppc: Set UPRT and GTSE on all cpus in H_REGISTER_PROCESS_TABLE The UPRT and GTSE bits are set when a guest calls H_REGISTER_PROCESS_TABLE to choose determine how address translation is performed. Currently these bits in the LPCR are only set for the cpu which handles the H_CALL, however they need to be set for all cpus for that guest as address translation cannot be performed differently on a per cpu basis. Update the H_CALL handler to set these bits in the LPCR correctly for all cpus of the guest. Note it is the reponsibility of the guest to ensure that any secondary cpus are suspended when the H_CALL is made and thus we can safely update these values here. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-05-11 09:45:15 +10:00
Mark Cave-Ayland	53ecf09df3	ppc: add qemu_vga.ndrv ROM to fw_cfg interface for NewWorld Macs Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-05-11 09:45:15 +10:00
Mark Cave-Ayland	b50de5cd77	ppc: add qemu_vga.ndrv ROM to fw_cfg interface for OldWorld Macs Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-05-11 09:45:15 +10:00
Cédric Le Goater	a1a636b8b4	ppc/pnv: restrict BMC object to the BMC simulator Today, when a PowerNV guest runs, it uses the sensor definitions of the BMC simulator to populate the device tree. But an external IPMI BMC could also be used and, in that case, it is not (yet) possible to retrieve the sensor list. Generating the OEM SEL event for shutdown or reboot also does not make sense as it should be generated on the BMC side. This change allows a guest to use an 'ipmi-bmc-extern' backend to the 'isa-ipmi-bt' device and a 'chardev' for transport such as : -chardev socket,id=ipmi0,host=localhost,port=9002,reconnect=10 \ -device ipmi-bmc-extern,id=bmc0,chardev=ipmi0 \ -device isa-ipmi-bt,bmc=bmc0,irq=10 and connect to a BMC simulator, the OpenIPMI ipmi_sim simulator for instance. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-05-11 09:45:14 +10:00
KONRAD Frederic	2d812d6dff	ppc_booke: drop useless assignment The tb_env variable is set two lines above. So just drop the double assignment. Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com> Reviewed-by: Laurent Vivier <laurent@vivier.eu> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2017-05-07 09:57:51 +03:00
Ishani Chugh	d0e31a105e	Remove reduntant qemu: from error functions This patch removes redundant "qemu:" from error functions. The link to the bitesized task is: http://wiki.qemu-project.org/Contribute/BiteSizedTasks#Error_checking Signed-off-by: Ishani Chugh <chugh.ishani@research.iiit.ac.in> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2017-05-07 09:57:51 +03:00
Bharata B Rao	8f37e54e5b	spapr-cpu-core: Release ICPState object during CPU unrealization Recent commits that re-organized ICPState object missed to destroy the object when CPU is unrealized. Fix this so that CPU unplug doesn't abort QEMU. Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-04-26 12:41:56 +10:00
Cédric Le Goater	bce0b69159	ppc/pnv: generate an OEM SEL event on shutdown OpenPOWER systems expect to be notified with such an event before a shutdown or a reboot. An OEM SEL message is sent with specific identifiers and a user data containing the request : OFF or REBOOT. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-04-26 12:41:56 +10:00
Cédric Le Goater	aeaef83dab	ppc/pnv: add initial IPMI sensors for the BMC simulator Skiboot, the firmware for the PowerNV platform, expects the BMC to provide some specific IPMI sensors. These sensors are exposed in the device tree and their values are updated by the firmware at boot time. Sensors of interest are : "FW Boot Progress" "Boot Count" As such a device is defined on the command line, we can only detect its presence at reset time. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-04-26 12:41:56 +10:00
Cédric Le Goater	04f6c8b2c0	ppc/pnv: populate device tree for IPMI BT devices When an ipmi-bt device [1] is defined on the ISA bus, we need to populate the device tree with the object properties. Such devices are created with the command line options : -device ipmi-bmc-sim,id=bmc0 -device isa-ipmi-bt,bmc=bmc0,irq=10 [1] https://lists.gnu.org/archive/html/qemu-devel/2015-11/msg03168.html Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-04-26 12:41:56 +10:00
Cédric Le Goater	cb228f5a00	ppc/pnv: populate device tree for serial devices Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-04-26 12:41:56 +10:00
Cédric Le Goater	c5ffdcaea5	ppc/pnv: populate device tree for RTC devices The code could be common to any ISA device but we are missing the IO length. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-04-26 12:41:56 +10:00
Cédric Le Goater	e7a3fee340	ppc/pnv: scan ISA bus to populate device tree This is an empty shell that we will use to include nodes in the device tree for ISA devices. We expect RTC, UART and IPMI BT devices. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-04-26 12:41:55 +10:00
Cédric Le Goater	5a7e14a274	ppc/pnv: enable only one LPC bus The default LPC bus of a multichip system is on chip 0. It's recognized by the firmware (skiboot) using a "primary" property in the device tree. We introduce a pnv_chip_lpc_offset() routine to locate the LPC node of a chip and set the property directly from the machine level. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-04-26 12:41:55 +10:00
Benjamin Herrenschmidt	4d1df88b63	ppc/pnv: Add support for POWER8+ LPC Controller It adds the Naples chip which supports proper LPC interrupts via the LPC controller rather than via an external CPLD. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [clg: - updated for qemu-2.9 - ported on latest PowerNV patchset - moved the IRQ handler in pnv_lpc.c - introduced pnv_lpc_isa_irq_create() to create the ISA IRQs ] Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-04-26 12:41:55 +10:00
Cédric Le Goater	71cd4dace9	spapr: remove the 'nr_servers' field from the machine xics_system_init() does not need 'nr_servers' anymore as it is only used to define the 'interrupt-controller' node in the device tree. So let's just compute the value when calling spapr_dt_xics(). This also gives us an opportunity to simplify the xics_system_init() routine and introduce a specific spapr_ics_create() helper to create the sPAPR ICS object. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-04-26 12:41:55 +10:00
Benjamin Herrenschmidt	0722d05ad8	ppc/pnv: Add OCC model stub with interrupt support The OCC is an on-chip microcontroller based on a ppc405 core used for various power management tasks. It comes with a pile of additional hardware sitting on the PIB (aka XSCOM bus). At this point we don't emulate it (nor plan to do so). However there is one facility which is provided by the surrounding hardware that we do need, which is the interrupt generation facility. OPAL uses it to send itself interrupts under some circumstances and there are other uses around the corner. So this implement just enough to support this. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [clg: - updated for qemu-2.9 - changed the XSCOM interface to fit new model - QOMified the model ] Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-04-26 12:00:42 +10:00
Cédric Le Goater	54f59d786c	ppc/pnv: Add cut down PSI bridge model and hookup external interrupt The Processor Service Interface (PSI) Controller is one of the engines of the "Bridge" unit which connects the different interfaces to the Power Processor. This adds just enough of the PSI bridge to handle various on-chip and the one external interrupt. The rest of PSI has to do with the link to the IBM FSP service processor which we don't plan to emulate (not used on OpenPower machines). The ics_get() and ics_resend() handlers of the XICSFabric interface of the PowerNV machine are now defined to handle the Interrupt Control Source of PSI. The InterruptStatsProvider interface is also modified to dump the new ICS. Originally from Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-04-26 12:00:42 +10:00
Cédric Le Goater	bf5615e77c	ppc/pnv: add memory regions for the ICP registers This provides to a PowerNV chip (POWER8) access to the Interrupt Management area, which contains the registers of the Interrupt Control Presenters of each thread. These are used to accept, return, forward interrupts in the system. This area is modeled with a per-chip container memory region holding all the ICP registers. Each thread of a chip is then associated with its ICP registers using a memory subregion indexed by its PIR number in the overall region. The device tree is populated accordingly. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-04-26 12:00:42 +10:00
Cédric Le Goater	960fbd29e5	ppc/pnv: create the ICP object under PnvCore Each thread of a core is linked to an ICP. This allocates a PnvICPState object before the PowerPCCPU object is realized and lets the XICSFabric do the store under the 'intc' backlink when xics_cpu_setup() is called. This modeling removes the need of maintaining an array of ICP objects under the PowerNV machine and also simplifies the XICSFabric icp_get() handler. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-04-26 12:00:42 +10:00
Cédric Le Goater	47fea43aa3	ppc/pnv: extend the machine with a InterruptStatsProvider interface Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-04-26 12:00:42 +10:00
Cédric Le Goater	36fc6f0800	ppc/pnv: extend the machine with a XICSFabric interface A XICSFabric QOM interface is used by the XICS layer to manipulate the ICP and ICS objects. Let's define the associated handlers for the PowerNV machine. All handlers should be defined even if there is no ICS under the PowerNV machine yet. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-04-26 12:00:42 +10:00
Cédric Le Goater	5bc8d26de2	spapr: allocate the ICPState object from under sPAPRCPUCore Today, all the ICPs are created before the CPUs, stored in an array under the sPAPR machine and linked to the CPU when the core threads are realized. This modeling brings some complexity when a lookup in the array is required and it can be simplified by allocating the ICPs when the CPUs are. This is the purpose of this proposal which introduces a new 'icp_type' field under the machine and creates the ICP objects of the right type (KVM or not) before the PowerPCCPU object are. This change allows more cleanups : the removal of the icps array under the sPAPR machine and the removal of the xics_get_cpu_index_by_dt_id() helper. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-04-26 12:00:42 +10:00
Cédric Le Goater	06747ba6d4	spapr: move the IRQ server number mapping under the machine This is the second step to abstract the IRQ 'server' number of the XICS layer. Now that the prereq cleanups have been done in the previous patch, we can move down the 'cpu_dt_id' to 'cpu_index' mapping in the sPAPR machine handler. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-04-26 12:00:42 +10:00
Cédric Le Goater	ad5d1add86	ppc/xics: introduce an 'intc' backlink under PowerPCCPU Today, the ICPState array of the sPAPR machine is indexed with 'cpu_index' of the CPUState. This numbering of CPUs is internal to QEMU and the guest only knows about what is exposed in the device tree, that is the 'cpu_dt_id'. This is why sPAPR uses the helper xics_get_cpu_index_by_dt_id() to do the mapping in a couple of places. To provide a more generic XICS layer, we need to abstract the IRQ 'server' number and remove any assumption made on its nature. It should not be used as a 'cpu_index' for lookups like xics_cpu_setup() and xics_cpu_destroy() do. To reach that goal, we choose to introduce a generic 'intc' backlink under PowerPCCPU, and let the machine core init routine do the ICPState lookup. The resulting object is passed on to xics_cpu_setup() which does the store under PowerPCCPU. The IRQ 'server' number in XICS is now generic. sPAPR uses 'cpu_dt_id' and PowerNV will use 'PIR' number. This also has the benefit of simplifying the sPAPR hcall routines which do not need to do any ICPState lookups anymore. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-04-26 12:00:42 +10:00
Alexey Kardashevskiy	c88fa6dd4a	spapr_pci: Removed unused include Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-04-26 12:00:41 +10:00
Alexey Kardashevskiy	a01f3432dd	spapr_pci: Warn when RAM page size is not enabled in IOMMU page mask If a page size used by QEMU is not enabled in the PHB IOMMU page mask, in-kernel acceleration of TCE handling won't be enabled and performance might be slower than expected. This prints a warning if system page size is not enabled. This should print a warning if huge pages are enabled but sphb.pgsz still uses the default value of 4K\|64K. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-04-26 12:00:41 +10:00
Alexey Kardashevskiy	3dc410ae83	target-ppc/kvm: Enable in-kernel TCE acceleration for multi-tce This enables in-kernel handling of H_PUT_TCE_INDIRECT and H_STUFF_TCE hypercalls. The host kernel support is there since v4.6, in particular d3695aa4f452 ("KVM: PPC: Add support for multiple-TCE hcalls"). H_PUT_TCE is already accelerated and does not need any special enablement. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-04-26 12:00:41 +10:00
Sam Bobroff	e957f6a9b9	spapr: Workaround for broken radix guests For a little while around 4.9, Linux kernels that saw the radix bit in ibm,pa-features would attempt to set up the MMU as if they were a hypervisor, even if they were a guest, which would cause them to crash. Work around this by detecting pre-ISA 3.0 guests by their lack of that bit in option vector 1, and then removing the radix bit from ibm,pa-features. Note: This now requires regeneration of that node after CAS negotiation. Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com> [dwg: Fix style nits] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-04-26 12:00:41 +10:00
Sam Bobroff	9fb4541f58	spapr: Enable ISA 3.0 MMU mode selection via CAS Add the new node, /chosen/ibm,arch-vec-5-platform-support to the device tree. This allows the guest to determine which modes are supported by the hypervisor. Update the option vector processing in h_client_architecture_support() to handle the new MMU bits. This allows guests to request hash or radix mode and QEMU to create the guest's HPT at this time if it is necessary but hasn't yet been done. QEMU will terminate the guest if it requests an unavailable mode, as required by the architecture. Extend the ibm,pa-features node with the new ISA 3.0 values and set the radix bit if KVM supports radix mode. This probably won't be used directly by guests to determine the availability of radix mode (that is indicated by the new node added above) but the architecture requires that it be set when the hardware supports it. If QEMU is using KVM, and KVM is capable of running in radix mode, guests can be run in real-mode without allocating a HPT (because KVM will use a minimal RPT). So in this case, we avoid creating the HPT at reset time and later (during CAS) create it if it is necessary. ISA 3.0 guests will now begin to call h_register_process_table(), which has been added previously. Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com> [dwg: Strip some unneeded prefix from error messages] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-04-26 12:00:41 +10:00
Sam Bobroff	86d5771a5a	spapr: move spapr_populate_pa_features() In the next patch, spapr_fixup_cpu_dt() will need to call spapr_populate_pa_features() so move it's definition up without making any other changes. Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-04-26 12:00:41 +10:00
Suraj Jitindar Singh	b4db54132f	target/ppc: Implement H_REGISTER_PROCESS_TABLE H_CALL The H_REGISTER_PROCESS_TABLE H_CALL is used by a guest to indicate to the hypervisor where in memory its process table is and how translation should be performed using this process table. Provide the implementation of this H_CALL for a guest. We first check for invalid flags, then parse the flags to determine the operation, and then check the other parameters for valid values based on the operation (register new table/deregister table/maintain registration). The process table is then stored in the appropriate location and registered with the hypervisor (if running under KVM), and the LPCR_[UPRT/GTSE] bits are updated as required. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com> [dwg: Correct missing prototype and uninitialized variable] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-04-26 12:00:41 +10:00
Suraj Jitindar Singh	d77a98b015	target/ppc: Add new H-CALL shells for in memory table translation The use of the new in memory tables introduced in ISAv3.00 for translation, also referred to as process tables, requires the introduction of 3 new H-CALLs; H_REGISTER_PROCESS_TABLE, H_CLEAN_SLB, and H_INVALIDATE_PID. Add shells for each of these and register them as the hypercall handlers. Currently they all log an unimplemented hypercall and return H_FUNCTION. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> [dwg: Fix style nits] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-04-26 12:00:41 +10:00
Sam Bobroff	c64abd1f9c	spapr: Add ibm,processor-radix-AP-encodings to the device tree Use the new ioctl, KVM_PPC_GET_RMMU_INFO, to fetch radix MMU information from KVM and present the page encodings in the device tree under ibm,processor-radix-AP-encodings. This provides page size information to the guest which is necessary for it to use radix mode. Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com> [dwg: Compile fix for 32-bit targets, style nit fix] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-04-26 12:00:41 +10:00
Alexey Kardashevskiy	d6ee2a7c85	target-ppc: kvm: make use of KVM_CREATE_SPAPR_TCE_64 KVM_CAP_SPAPR_TCE capability allows creating TCE tables in KVM which allows having in-kernel acceleration for H_PUT_TCE_xxx hypercalls. However it only supports 32bit DMA windows at zero bus offset. There is a new KVM_CAP_SPAPR_TCE_64 capability which supports 64bit window size, variable page size and bus offset. This makes use of the new capability. The kernel headers are already updated as the kernel support went in to v4.6. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-04-26 12:00:41 +10:00
Thomas Huth	9d169fb3c8	hw/ppc/pnv: Classify the "PowerNV Chip" devices as CPU devices The devices that are derived from TYPE_PNV_CHIP currently show up as "uncategorized" devices in the help text of "-device ?". Since they obviously are related to the CPU, let's put them into the CPU category instead. Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-04-26 12:00:41 +10:00
Cédric Le Goater	147ff8079e	ppc/spapr: QOM'ify sPAPRRTCState Also use an 'sPAPRRTCState' attribute under the sPAPR machine to hold the RTC object. Overall, these changes remove an unnecessary and implicit dependency on SysBus. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-04-26 12:00:41 +10:00
David Gibson	3fa14fbe13	pseries: Add pseries-2.10 machine type Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-04-26 12:00:41 +10:00
David Gibson	8149e2992f	pseries: Enforce homogeneous threads-per-core For reasons that may be useful in future, CPU core objects, as used on the pseries machine type have their own nr-threads property, potentially allowing cores with different numbers of threads in the same system. If the user/management uses the values specified in query-hotpluggable-cpus as they're expected to do, this will never matter in pratice. But that's not actually enforced - it's possible to manually specify a core with a different number of threads from that in -smp. That will confuse the platform - most immediately, this can be used to create a CPU thread with index above max_cpus which leads to an assertion failure in spapr_cpu_core_realize(). For now, enforce that all cores must have the same, standard, number of threads. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Bharata B Rao <bharata@linux.vnet.ibm.com>	2017-04-03 13:46:18 +10:00
Laurent Vivier	fe6824d126	spapr: fix memory hot-unplugging If, once the kernel has booted, we try to remove a memory hotplugged while the kernel was not started, QEMU crashes on an assert: qemu-system-ppc64: hw/virtio/vhost.c:651: vhost_commit: Assertion `r >= 0' failed. ... #4 in vhost_commit #5 in memory_region_transaction_commit #6 in pc_dimm_memory_unplug #7 in spapr_memory_unplug #8 spapr_machine_device_unplug #9 in hotplug_handler_unplug #10 in spapr_lmb_release #11 in detach #12 in set_allocation_state #13 in rtas_set_indicator ... If we take a closer look to the guest kernel log, we can see when we try to unplug the memory: pseries-hotplug-mem: Attempting to hot-add 4 LMB(s) What happens: 1- The kernel has ignored the memory hotplug event because it was not started when it was generated. 2- When we hot-unplug the memory, QEMU starts to remove the memory, generates an hot-unplug event, and signals the kernel of the incoming new event 3- as the kernel is started, on the QEMU signal, it reads the event list, decodes the hotplug event and tries to finish the hotplugging. 4- QEMU receive the the hotplug notification while it is trying to hot-unplug the memory. This moves the memory DRC to an invalid state This patch prevents this by not allowing to set the allocation state to USABLE while the DRC is awaiting release. RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1432382 Signed-off-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-03-29 11:35:16 +11:00
Marc-André Lureau	24ec2863b1	spapr: fix buffer-overflow Running postcopy-test with ASAN produces the following error: QTEST_QEMU_BINARY=ppc64-softmmu/qemu-system-ppc64 tests/postcopy-test ... ================================================================= ==23641==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7f1556600000 at pc 0x55b8e9d28208 bp 0x7f1555f4d3c0 sp 0x7f1555f4d3b0 READ of size 8 at 0x7f1556600000 thread T6 #0 0x55b8e9d28207 in htab_save_first_pass /home/elmarco/src/qq/hw/ppc/spapr.c:1528 #1 0x55b8e9d2939c in htab_save_iterate /home/elmarco/src/qq/hw/ppc/spapr.c:1665 #2 0x55b8e9beae3a in qemu_savevm_state_iterate /home/elmarco/src/qq/migration/savevm.c:1044 #3 0x55b8ea677733 in migration_thread /home/elmarco/src/qq/migration/migration.c:1976 #4 0x7f15845f46c9 in start_thread (/lib64/libpthread.so.0+0x76c9) #5 0x7f157d9d0f7e in clone (/lib64/libc.so.6+0x107f7e) 0x7f1556600000 is located 0 bytes to the right of 2097152-byte region [0x7f1556400000,0x7f1556600000) allocated by thread T0 here: #0 0x7f159bb76980 in posix_memalign (/lib64/libasan.so.3+0xc7980) #1 0x55b8eab185b2 in qemu_try_memalign /home/elmarco/src/qq/util/oslib-posix.c:106 #2 0x55b8eab186c8 in qemu_memalign /home/elmarco/src/qq/util/oslib-posix.c:122 #3 0x55b8e9d268a8 in spapr_reallocate_hpt /home/elmarco/src/qq/hw/ppc/spapr.c:1214 #4 0x55b8e9d26e04 in ppc_spapr_reset /home/elmarco/src/qq/hw/ppc/spapr.c:1261 #5 0x55b8ea12e913 in qemu_system_reset /home/elmarco/src/qq/vl.c:1697 #6 0x55b8ea13fa40 in main /home/elmarco/src/qq/vl.c:4679 #7 0x7f157d8e9400 in __libc_start_main (/lib64/libc.so.6+0x20400) Thread T6 created by T0 here: #0 0x7f159bae0488 in __interceptor_pthread_create (/lib64/libasan.so.3+0x31488) #1 0x55b8eab1d9cb in qemu_thread_create /home/elmarco/src/qq/util/qemu-thread-posix.c:465 #2 0x55b8ea67874c in migrate_fd_connect /home/elmarco/src/qq/migration/migration.c:2096 #3 0x55b8ea66cbb0 in migration_channel_connect /home/elmarco/src/qq/migration/migration.c:500 #4 0x55b8ea678f38 in socket_outgoing_migration /home/elmarco/src/qq/migration/socket.c:87 #5 0x55b8eaa5a03a in qio_task_complete /home/elmarco/src/qq/io/task.c:142 #6 0x55b8eaa599cc in gio_task_thread_result /home/elmarco/src/qq/io/task.c:88 #7 0x7f15823e38e6 (/lib64/libglib-2.0.so.0+0x468e6) SUMMARY: AddressSanitizer: heap-buffer-overflow /home/elmarco/src/qq/hw/ppc/spapr.c:1528 in htab_save_first_pass index seems to be wrongly incremented, unless I miss something that would be worth a comment. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-03-29 11:35:02 +11:00
Laurent Vivier	55641213fc	numa,spapr: align default numa node memory size to 256MB Since commit `224245b` ("spapr: Add LMB DR connectors"), NUMA node memory size must be aligned to 256MB (SPAPR_MEMORY_BLOCK_SIZE). But when "-numa" option is provided without "mem" parameter, the memory is equally divided between nodes, but 8MB aligned. This can be not valid for pseries. In that case we can have: $ ./ppc64-softmmu/qemu-system-ppc64 -m 4G -numa node -numa node -numa node qemu-system-ppc64: Node 0 memory size 0x55000000 is not aligned to 256 MiB With this patch, we have: (qemu) info numa 3 nodes node 0 cpus: 0 node 0 size: 1280 MB node 1 cpus: node 1 size: 1280 MB node 2 cpus: node 2 size: 1536 MB Signed-off-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-03-22 11:32:42 +11:00
Paolo Bonzini	d2528bdc19	qemu-timer: do not include sysemu/cpus.h from util/qemu-timer.h This dependency is the wrong way, and we will need util/qemu-timer.h from sysemu/cpus.h in the next patch. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-03-14 13:28:18 +01:00
David Gibson	82516263ce	pseries: Don't expose PCIe extended config space on older machine types `bb9986452` "spapr_pci: Advertise access to PCIe extended config space" allowed guests to access the extended config space of PCI Express devices via the PAPR interfaces, even though the paravirtualized bus mostly acts like plain PCI. However, that patch enabled access unconditionally, including for existing machine types, which is an unwise change in behaviour. This patch limits the change to pseries-2.9 (and later) machine types. Suggested-by: Andrea Bolognani <abologna@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-03-14 11:54:17 +11:00
Peter Maydell	56b51708e9	ppc patch queue for 2017-03-06 Looks like my previous batch wasn't quite the last before hard freeze. This has a handful of bugfixes to go in. They're all genuine bugfixes, though not regressions in some cases. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABCAAGBQJYvOCUAAoJEGw4ysog2bOSQWgQAKzPeIqz8I/1eXL+zmZCUaiU J2gyjzfaKkQ/AVGPtT45ZjJsihxSFbZT6koxXtEaxwq5DD87yXQOqA/d+BH7jr5d 75FGjVzKOA0IKQymySztwoC2j/ftWmmSx0N6YUmL0QcXCISS1YHRvdQkdXf6j4I/ XtK1FA34wmCsTK1AgZ9WDxjABdkHP+7FDRBpVmr01Nv1TeK2Xms2MqJ5Wku/lOX/ 6bg1KbC8pVHy5YZhIpRFzgGxaMr2UcJ0Q3YR9fD/4UW/k518sJk+i2xlagVsFxyG gqfPolv0wjwuGpYt42UyFG4IouCbKN+MChU5MBIaqU10VouOw+0/W+p+1ZOHgdB8 GoaBGyfuJ6/i4EQL0/+FL4hPOI5vHLliWxPfMJxDL5ujP0cFaPm2XbK5Yqxksu3m uYp3yYIbiSaF8QUxbBjAAoKPdVpP5dsgHjAlxecwCUGlIo0Ur3uphnU5lPoNlvS4 5ZcDDlMGjPb0oIHfdPt2ai8g+32uAsD7X7pi+qI0x+srSnjisRpOT2wKv0otMbGx U4j01/Na2DjFjhGW+vNm9UYsE/QgKr6pU9z3jUXOIplX1HBXirtfv5C/OypCN7Zj LgqsmiMWMJFjSLk8N8cxeM1w839B3wEM+2+46su7/qpW9sd0jKvHk0cJDyZPzn29 zQ52CbQQiewXM8y+mffe =/RZL -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.9-20170306' into staging ppc patch queue for 2017-03-06 Looks like my previous batch wasn't quite the last before hard freeze. This has a handful of bugfixes to go in. They're all genuine bugfixes, though not regressions in some cases. # gpg: Signature made Mon 06 Mar 2017 04:07:48 GMT # gpg: using RSA key 0x6C38CACA20D9B392 # gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>" # gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>" # gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>" # gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>" # Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392 * remotes/dgibson/tags/ppc-for-2.9-20170306: target/ppc: use helper for excp handling target/ppc: fmadd: add macro for updating flags target/ppc: fmadd check for excp independently spapr: ensure that all threads within core are on the same NUMA node ppc/xics: register reset handlers for the ICP and ICS objects Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2017-03-06 13:06:30 +00:00
Igor Mammedov	17b7c39e27	spapr: ensure that all threads within core are on the same NUMA node Threads within a core shouldn't be on different NUMA nodes, so if user has misconfgured command line, fail QEMU at start up to force user fix it. For now use the first thread on the core as source of core's node-id. Later when cpu-numa refactoring lands it will be switched to core's node-id from possible_cpus[]. This prevents the same problems as commit `20bb648d` "spapr: Fix default NUMA node allocation for threads", but for the case of manually configured NUMA node mappings, instead of just the default case. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-03-06 10:32:53 +11:00
Cédric Le Goater	7ea6e06717	ppc/xics: register reset handlers for the ICP and ICS objects The recent changes on the XICS layer removed the XICSState object to let the sPAPR machine handle the ICP and ICS directly. The reset of these objects was previously handled by XICSState, which was a SysBus device, and to keep the same behavior, the ICP and ICS were assigned to SysbBus. But that broke the 'info qtree' command in the monitor. 'qtree' performs a loop on the children of a bus to print their properties and SysBus devices are expected to be found under SysBus, which is not the case anymore. The fix for this problem is to register reset handlers for the ICP and ICS objects and stop using SysBus for such devices. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Thomas Huth <thuth@redhat.com> Tested-by: Thomas Huth <thuth@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-03-06 10:07:38 +11:00
Markus Armbruster	a4a1c70dc7	qapi: Make input visitors detect unvisited list tails Fix the design flaw demonstrated in the previous commit: new method check_list() lets input visitors report that unvisited input remains for a list, exactly like check_struct() lets them report that unvisited input remains for a struct or union. Implement the method for the qobject input visitor (straightforward), and the string input visitor (less so, due to the magic list syntax there). The opts visitor's list magic is even more impenetrable, and all I can do there today is a stub with a FIXME comment. No worse than before. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <1488544368-30622-26-git-send-email-armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2017-03-05 09:14:20 +01:00
Sam Bobroff	ec975e839c	spapr: Small cleanup of PPC MMU enums The PPC MMU types are sometimes treated as if they were a bit field and sometime as if they were an enum which causes maintenance problems: flipping bits in the MMU type (which is done on both the 1TB segment and 64K segment bits) currently produces new MMU type values that are not handled in every "switch" on it, sometimes causing an abort(). This patch provides some macros that can be used to filter out the "bit field-like" bits so that the remainder of the value can be switched on, like an enum. This allows removal of all of the "degraded" types from the list and should ease maintenance. Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-03-03 11:30:59 +11:00
David Gibson	bb99864528	spapr_pci: Advertise access to PCIe extended config space The (paravirtual) PCI host bridge on the 'pseries' machine in most regards acts like a regular PCI bus, rather than a PCIe bus. Despite this, though, it does allow access to the PCIe extended config space. We already implemented the RTAS methods to allow this access.. but forgot to put the markers into the device tree so that guest's know it is there. This adds them in. With this, a pseries guest is able to view extended config space on (for example an e1000e device. This should be enough to allow guests to use at least some PCIe devices. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-03-03 11:30:59 +11:00
Suraj Jitindar Singh	24d8e5655f	hw/ppc/spapr: Add POWER9 to pseries cpu models Add POWER9 cpu to list of spapr core models which allows it to be specified as the cpu model for a pseries guest (e.g. -machine pseries -cpu POWER9). This now allows a POWER9 cpu to boot to userspace in tcg emulation for a pseries machine with a legacy kernel. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-03-03 11:30:59 +11:00
Suraj Jitindar Singh	4975c098c9	target/ppc/POWER9: Add POWER9 pa-features definition Add a pa-features definition which includes all of the new fields which have been added, note we don't claim support for any of these new features at this stage. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-03-03 11:30:59 +11:00
Suraj Jitindar Singh	9861bb3efd	target/ppc: Add patb_entry to sPAPRMachineState ISA v3.00 adds the idea of a partition table which is used to store the address translation details for all partitions on the system. The partition table consists of double word entries indexed by partition id where the second double word contains the location of the process table in guest memory. The process table is registered by the guest via a h-call. We need somewhere to store the address of the process table so we add an entry to the sPAPRMachineState struct called patb_entry to represent the second doubleword of a single partition table entry corresponding to the current guest. We need to store this value so we know if the guest is using radix or hash translation and the location of the corresponding process table in guest memory. Since we only have a single guest per qemu instance, we only need one entry. Since the partition table is technically a hypervisor resource we require that access to it is abstracted by the virtual hypervisor through the get_patbe() call. Currently the value of the entry is never set (and thus defaults to 0 indicating hash), but it will be required to both implement POWER9 kvm support and tcg radix support. We also add this field to be migrated as part of the sPAPRMachineState as we will need it on the receiving side as the guest will never tell us this information again and we need it to perform translation. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-03-03 11:30:59 +11:00
Cédric Le Goater	6449da4545	ppc/xics: move InterruptStatsProvider to the sPAPR machine It provides a better monitor output of the ICP and ICS objects, else the objects are printed out of order. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-03-01 11:23:40 +11:00
Cédric Le Goater	a7ff1212e9	ppc/xics: move ics-simple post_load under the machine The ICS object uses a post_load() handler which is implicitly relying on the fact that the internal state of the ICS and ICP objects has been restored but this is not guaranteed. So, let's move the code under the post_load() handler of the machine where we know the objects have been fully restored. The icp_resend() handler of the XICSFabric QOM interface is also removed as it is now obsolete. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-03-01 11:23:40 +11:00
Cédric Le Goater	e6f7e110ee	ppc/xics: remove the XICSState classes The XICSState classes are not used anymore. They have now been fully deprecated by the XICSFabric QOM interface. Do the cleanups. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-03-01 11:23:40 +11:00
Cédric Le Goater	2192a9303d	ppc/xics: export the XICS init routines There is nothing left related to the XICS object in the realize functions of the KVMXICSState and XICSState class. So adapt the interfaces to call these routines directly from the sPAPR machine init sequence. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-03-01 11:23:40 +11:00
Cédric Le Goater	852ad27e14	ppc/xics: move the ICP array under the sPAPR machine This is the last step to remove the XICSState abstraction and have the machine hold all the objects related to interrupts : ICSs and ICPs. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-03-01 11:23:40 +11:00
Cédric Le Goater	20147f2fce	ppc/xics: register the reset handler of ICP objects The reset of the ICP objects is currently handled by XICS but this can be done for each individual ICP. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-03-01 11:23:40 +11:00
Cédric Le Goater	b0ec31290c	ppc/xics: simplify spapr_dt_xics() interface spapr_dt_xics() only needs the number of servers to build the device tree nodes. Let's change the routine interface to reflect that. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-03-01 11:23:39 +11:00
Cédric Le Goater	b4f27d71e3	ppc/xics: use the QOM interface to grab an ICP Also introduce a xics_icp_get() helper to simplify the changes. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-03-01 11:23:39 +11:00
Cédric Le Goater	b2fc59aaf9	ppc/xics: extend the QOM interface to handle ICPs Let's add two new handlers for ICPs. One is to get an ICP object from a server number and a second is to resend the irqs when needed. The icp_resend() handler is a temporary workaround needed by the ics-simple post_load() handler. It will be removed when the post_load portion can be done at the machine level. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-03-01 11:23:39 +11:00
Cédric Le Goater	d114a66225	ppc/xics: remove the XICS list of ICS This is not used anymore. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-03-01 11:23:39 +11:00
Cédric Le Goater	c79b2fdd7b	ppc/xics: register the reset handler of ICS objects The reset of the ICS objects is currently handled by XICS but this can be done for each individual ICS. This also reduces the use of the XICS list of ICS. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-03-01 11:23:39 +11:00
Cédric Le Goater	2cd908d0ad	ppc/xics: use the QOM interface to resend irqs Also change the ICPState 'xics' backlink to be a XICSFabric, this removes the need of using qdev_get_machine() to get the QOM interface in some of the routines. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-03-01 11:23:39 +11:00
Cédric Le Goater	f7759e4331	ppc/xics: use the QOM interface to get irqs Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-03-01 11:23:39 +11:00
Cédric Le Goater	7844e12b28	ppc/xics: use the QOM interface under the sPAPR machine Add 'ics_get' and 'ics_resend' handlers to the sPAPR machine. These are relatively simple for a single ICS. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-03-01 11:23:39 +11:00
Cédric Le Goater	681bfaded6	ppc/xics: store the ICS object under the sPAPR machine A list of ICS objects was introduced under the XICS object for the PowerNV machine but, for the sPAPR machine, it brings extra complexity as there is only a single ICS. To simplify the code, let's add the ICS pointer under the sPAPR machine and try to reduce the use of this list where possible. Also, change the xics_spapr_*() routines to use an ICS object instead of an XICSState and change their name to reflect that these are specific to the sPAPR ICS object. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-03-01 11:23:39 +11:00
Cédric Le Goater	817bb6a446	ppc/xics: remove set_nr_servers() handler from XICSStateClass Today, the ICP (Interrupt Controller Presenter) objects are created by the 'nr_servers' property handler of the XICS object and a class handler. They are realized in the XICS object realize routine. Let's simplify the process by creating the ICP objects along with the XICS object at the machine level. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-03-01 11:23:39 +11:00
Cédric Le Goater	4e4169f7a2	ppc/xics: remove set_nr_irqs() handler from XICSStateClass Today, the ICS (Interrupt Controller Source) object is created and realized by the init and realize routines of the XICS object, but some of the parameters are only known at the machine level. These parameters are passed from the sPAPR machine to the ICS object in a rather convoluted way using property handlers and a class handler of the XICS object. The number of irqs required to allocate the IRQ state objects in the ICS realize routine is one of them. Let's simplify the process by creating the ICS object along with the XICS object at the machine level and link the ICS into the XICS list of ICSs at this level also. In the sPAPR machine, there is only a single ICS but that will change with the PowerNV machine. Also, QOMify the creation of the objects and get rid of the superfluous code. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-03-01 11:23:39 +11:00
David Gibson	738d5db824	xics: XICS should not be a SysBusDevice Currently xics - the component of the IBM POWER interrupt controller representing the overall interrupt fabric / architecture is represented as a descendent of SysBusDevice. However, this is not really correct - the xics presents nothing in MMIO space so it should be an "unattached" device in the current QOM model. Since this device will always be created by the machine type, not created specifically from the command line, and because it has no migrated state it should be safe to move it around the device composition tree. Therefore this patch changes it to a descendent of TYPE_DEVICE, and makes it an unattached device. So that its reset handler still gets called correctly, we add a qdev_set_parent_bus() to attach it to sysbus. It's not really clear that's correct (instead of using register_reset()) but it appears to a common technique. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> [clg corrected problems with reset] Signed-off-by: Cédric Le Goater <clg@kaod.org> [dwg folded together and updated commit message] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-03-01 11:23:39 +11:00
Greg Kurz	a8eeafda19	spapr/pci: populate PCI DT in reverse order Since commit `1d2d974244` "spapr_pci: enumerate and add PCI device tree", QEMU populates the PCI device tree in the opposite order compared to SLOF. Before 1d2d974244c6: Populating /pci@800000020000000 00 0000 (D) : 1af4 1000 virtio [ net ] 00 0800 (D) : 1af4 1001 virtio [ block ] 00 1000 (D) : 1af4 1009 virtio [ network ] Populating /pci@800000020000000/unknown-legacy-device@2 7e5294b8 : /pci@800000020000000 7e52b998 : \|-- ethernet@0 7e52c0c8 : \|-- scsi@1 7e52c7e8 : +-- unknown-legacy-device@2 ok Since 1d2d974244c6: Populating /pci@800000020000000 00 1000 (D) : 1af4 1009 virtio [ network ] Populating /pci@800000020000000/unknown-legacy-device@2 00 0800 (D) : 1af4 1001 virtio [ block ] 00 0000 (D) : 1af4 1000 virtio [ net ] 7e5e8118 : /pci@800000020000000 7e5ea6a0 : \|-- unknown-legacy-device@2 7e5eadb8 : \|-- scsi@1 7e5eb4d8 : +-- ethernet@0 ok This behaviour change is not actually a bug since no assumptions should be made on DT ordering. But it has no real justification either, other than being the consequence of the way fdt_add_subnode() inserts new elements to the front of the FDT rather than adding them to the tail. This patch reverts to the historical SLOF ordering by walking PCI devices in reverse order. This reconciles pseries with x86 machine types behavior. It is expected to make things easier when porting existing applications to power. Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com> Tested-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com> (slight update to the changelog) Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-03-01 11:23:39 +11:00
David Gibson	e57ca75ce3	target/ppc: Manage external HPT via virtual hypervisor The pseries machine type implements the behaviour of a PAPR compliant hypervisor, without actually executing such a hypervisor on the virtual CPU. To do this we need some hooks in the CPU code to make hypervisor facilities get redirected to the machine instead of emulated internally. For hypercalls this is managed through the cpu->vhyp field, which points to a QOM interface with a method implementing the hypercall. For the hashed page table (HPT) - also a hypervisor resource - we use an older hack. CPUPPCState has an 'external_htab' field which when non-NULL indicates that the HPT is stored in qemu memory, rather than within the guest's address space. For consistency - and to make some future extensions easier - this merges the external HPT mechanism into the vhyp mechanism. Methods are added to vhyp for the basic operations the core hash MMU code needs: map_hptes() and unmap_hptes() for reading the HPT, store_hpte() for updating it and hpt_mask() to retrieve its size. To match this, the pseries machine now sets these vhyp fields in its existing vhyp class, rather than reaching into the cpu object to set the external_htab field. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>	2017-03-01 11:23:39 +11:00
David Gibson	36778660d7	target/ppc: Eliminate htab_base and htab_mask variables CPUPPCState includes fields htab_base and htab_mask which store the base address (GPA) and size (as a mask) of the guest's hashed page table (HPT). These are set when the SDR1 register is updated. Keeping these in sync with the SDR1 is actually a little bit fiddly, and probably not useful for performance, since keeping them expands the size of CPUPPCState. It also makes some upcoming changes harder to implement. This patch removes these fields, in favour of calculating them directly from the SDR1 contents when necessary. This does make a change to the behaviour of attempting to write a bad value (invalid HPT size) to the SDR1 with an mtspr instruction. Previously, the bad value would be stored in SDR1 and could be retrieved with a later mfspr, but the HPT size as used by the softmmu would be, clamped to the allowed values. Now, writing a bad value is treated as a no-op. An error message is printed in both new and old versions. I'm not sure which behaviour, if either, matches real hardware. I don't think it matters that much, since it's pretty clear that if an OS writes a bad value to SDR1, it's not going to boot. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>	2017-03-01 11:23:39 +11:00
David Gibson	7222b94a83	target/ppc: Cleanup HPTE accessors for 64-bit hash MMU Accesses to the hashed page table (HPT) are complicated by the fact that the HPT could be in one of three places: 1) Within guest memory - when we're emulating a full guest CPU at the hardware level (e.g. powernv, mac99, g3beige) 2) Within qemu, but outside guest memory - when we're emulating user and supervisor instructions within TCG, but instead of emulating the CPU's hypervisor mode, we just emulate a hypervisor's behaviour (pseries in TCG or KVM-PR) 3) Within the host kernel - a pseries machine using KVM-HV acceleration. Mostly accesses to the HPT are handled by KVM, but there are a few cases where qemu needs to access it via a special fd for the purpose. In order to batch accesses to the fd in case (3), we use a somewhat awkward ppc_hash64_start_access() / ppc_hash64_stop_access() pair, which for case (3) reads / releases several HPTEs from the kernel as a batch (usually a whole PTEG). For cases (1) & (2) it just returns an address value. The actual HPTE load helpers then need to interpret the returned token differently in the 3 cases. This patch keeps the same basic structure, but simplfiies the details. First start_access() / stop_access() are renamed to map_hptes() and unmap_hptes() to make their operation more obvious. Second, map_hptes() now always returns a qemu pointer, which can always be used in the same way by the load_hpte() helpers. In case (1) it comes from address_space_map() in case (2) directly from qemu's HPT buffer and in case (3) from a temporary buffer read from the KVM fd. While we're at it, make things a bit more consistent in terms of types and variable names: avoid variables named 'index' (it shadows index(3) which can lead to confusing results), use 'hwaddr ptex' for HPTE indices and uint64_t for each of the HPTE words, use ptex throughout the call stack instead of pte_offset in some places (we still need that at the bottom layer, but nowhere else). Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-03-01 11:23:39 +11:00
David Gibson	b7b0b1f13a	target/ppc: Merge cpu_ppc_set_vhyp() with cpu_ppc_set_papr() cpu_ppc_set_papr() sets up various aspects of CPU state for use with PAPR paravirtualized guests. However, it doesn't set the virtual hypervisor, so callers must also call cpu_ppc_set_vhyp() so that PAPR hypercalls are handled properly. This is a bit silly, so fold setting the virtual hypervisor into cpu_ppc_set_papr(). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>	2017-03-01 11:23:39 +11:00
David Gibson	c6404adebf	pseries: Minor cleanups to HPT management hypercalls * Standardize on 'ptex' instead of 'pte_index' for HPTE index variables for consistency and brevity * Avoid variables named 'index'; shadowing index(3) from libc can lead to surprising bugs if the variable is removed, because compiler errors might not appear for remaining references * Clarify index calculations in h_enter() - we have two cases, H_EXACT where the exact HPTE slot is given, and !H_EXACT where we search for an empty slot within the hash bucket. Make the calculation more consistent between the cases. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>	2017-03-01 11:23:39 +11:00
Greg Kurz	6244bb7e58	sysemu: support up to 1024 vCPUs Some systems can already provide more than 255 hardware threads. Bumping the QEMU limit to 1024 seems reasonable: - it has no visible overhead in top; - the limit itself has no effect on hot paths. Cc: Greg Kurz <gkurz@linux.vnet.ibm.com> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-03-01 11:23:39 +11:00
Laurent Vivier	2530a1a5cf	spapr: generate DT node names When DT node names for PCI devices are generated by SLOF, they are generated according to the type of the device (for instance, ethernet for virtio-net-pci device). Node name for hotplugged devices is generated by QEMU. This patch adds the mechanic to QEMU to create the node name according to the device type too. The data structure has been roughly copied from OpenBIOS/OpenHackware, node names from SLOF. Example: Hotplugging some PCI cards with QEMU monitor: device_add virtio-tablet-pci device_add virtio-serial-pci device_add virtio-mouse-pci device_add virtio-scsi-pci device_add virtio-gpu-pci device_add ne2k_pci device_add nec-usb-xhci device_add intel-hda What we can see in linux device tree: for dir in /proc/device-tree/pci@800000020000000/@/; do echo $dir cat $dir/name echo done WITHOUT this patch: /proc/device-tree/pci@800000020000000/pci@0/ pci /proc/device-tree/pci@800000020000000/pci@1/ pci /proc/device-tree/pci@800000020000000/pci@2/ pci /proc/device-tree/pci@800000020000000/pci@3/ pci /proc/device-tree/pci@800000020000000/pci@4/ pci /proc/device-tree/pci@800000020000000/pci@5/ pci /proc/device-tree/pci@800000020000000/pci@6/ pci /proc/device-tree/pci@800000020000000/pci@7/ pci WITH this patch: /proc/device-tree/pci@800000020000000/communication-controller@1/ communication-controller /proc/device-tree/pci@800000020000000/display@4/ display /proc/device-tree/pci@800000020000000/ethernet@5/ ethernet /proc/device-tree/pci@800000020000000/input-controller@0/ input-controller /proc/device-tree/pci@800000020000000/mouse@2/ mouse /proc/device-tree/pci@800000020000000/multimedia-device@7/ multimedia-device /proc/device-tree/pci@800000020000000/scsi@3/ scsi /proc/device-tree/pci@800000020000000/usb-xhci@6/ usb-xhci Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-03-01 11:23:39 +11:00
Peter Maydell	28f997a82c	This is the MTTCG pull-request as posted yesterday. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJYsBZfAAoJEPvQ2wlanipElJ0H+QGoStSPeHrvKu7Q07v4F9zM Pvf05gRsaxvXl7UbwmXC4oKhvZf9rVJ6ITk0x/y0WvmK0mHCmNBWkC0nn5UFL5IH cdxetLz21Q+Ghpc36tZvqn2HYwRQFoEznge2LdtBDG0TyVA4jwquHU3HCG2D51zi BaImI6lYW1e4ejjZHw8cEInSxsj/HJZE4pPas2Tkci+uAnrJroErwBVRRcE/y/Tn aupl9TJFs2JdyJFNDibIm0kjB+i+jvCiLgYjbKZ/dR/+GZt73TtiBk/q9ZOFjdmT 7YFPI3F46QbGHoZahtzh0Xt7WMj94SlQgQ9OJ3zmNMfpXrze6Yc78xo/nbQ33U0= =wR0/ -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/stsquad/tags/pull-mttcg-240217-1' into staging This is the MTTCG pull-request as posted yesterday. # gpg: Signature made Fri 24 Feb 2017 11:17:51 GMT # gpg: using RSA key 0xFBD0DB095A9E2A44 # gpg: Good signature from "Alex Bennée (Master Work Key) <alex.bennee@linaro.org>" # Primary key fingerprint: 6685 AE99 E751 67BC AFC8 DF35 FBD0 DB09 5A9E 2A44 * remotes/stsquad/tags/pull-mttcg-240217-1: (24 commits) tcg: enable MTTCG by default for ARM on x86 hosts hw/misc/imx6_src: defer clearing of SRC_SCR reset bits target-arm: ensure all cross vCPUs TLB flushes complete target-arm: don't generate WFE/YIELD calls for MTTCG target-arm/powerctl: defer cpu reset work to CPU context cputlb: introduce tlb_flush__all_cpus[_synced] cputlb: atomically update tlb fields used by tlb_reset_dirty cputlb: add tlb_flush_by_mmuidx async routines cputlb and arm/sparc targets: convert mmuidx flushes from varg to bitmap cputlb: introduce tlb_flush_ async work. cputlb: tweak qemu_ram_addr_from_host_nofail reporting cputlb: add assert_cpu_is_self checks tcg: handle EXCP_ATOMIC exception for system emulation tcg: enable thread-per-vCPU tcg: enable tb_lock() for SoftMMU tcg: remove global exit_request tcg: drop global lock during TCG code execution tcg: rename tcg_current_cpu to tcg_current_rr_cpu tcg: add kick timer for single-threaded vCPU emulation tcg: add options for enabling MTTCG ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2017-02-25 18:43:52 +00:00
Jan Kiszka	8d04fb55de	tcg: drop global lock during TCG code execution This finally allows TCG to benefit from the iothread introduction: Drop the global mutex while running pure TCG CPU code. Reacquire the lock when entering MMIO or PIO emulation, or when leaving the TCG loop. We have to revert a few optimization for the current TCG threading model, namely kicking the TCG thread in qemu_mutex_lock_iothread and not kicking it in qemu_cpu_kick. We also need to disable RAM block reordering until we have a more efficient locking mechanism at hand. Still, a Linux x86 UP guest and my Musicpal ARM model boot fine here. These numbers demonstrate where we gain something: 20338 jan 20 0 331m 75m 6904 R 99 0.9 0:50.95 qemu-system-arm 20337 jan 20 0 331m 75m 6904 S 20 0.9 0:26.50 qemu-system-arm The guest CPU was fully loaded, but the iothread could still run mostly independent on a second core. Without the patch we don't get beyond 32206 jan 20 0 330m 73m 7036 R 82 0.9 1:06.00 qemu-system-arm 32204 jan 20 0 330m 73m 7036 S 21 0.9 0:17.03 qemu-system-arm We don't benefit significantly, though, when the guest is not fully loading a host CPU. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Message-Id: <1439220437-23957-10-git-send-email-fred.konrad@greensocs.com> [FK: Rebase, fix qemu_devices_reset deadlock, rm address_space_* mutex] Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com> [EGC: fixed iothread lock for cpu-exec IRQ handling] Signed-off-by: Emilio G. Cota <cota@braap.org> [AJB: -smp single-threaded fix, clean commit msg, BQL fixes] Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net> Reviewed-by: Pranith Kumar <bobby.prani@gmail.com> [PM: target-arm changes] Acked-by: Peter Maydell <peter.maydell@linaro.org>	2017-02-24 10:32:45 +00:00
Peter Maydell	fb6971c110	hw/ppc/ppc405_uc.c: Avoid integer overflows When performing clock calculations, the ppc405_uc code has several places where it multiplies together two 32-bit variables and assigns the result to a 64-bit variable. This doesn't quite do what is intended because C will compute a 32-bit multiply result. Add casts to ensure we don't truncate the result. (Spotted by Coverity, CID 1005504, 1005505.) Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-02-22 14:28:53 +11:00
Thomas Huth	df58713396	hw/ppc/spapr: Check for valid page size when hot plugging memory On POWER, the valid page sizes that the guest can use are bound to the CPU and not to the memory region. QEMU already has some fancy logic to find out the right maximum memory size to tell it to the guest during boot (see getrampagesize() in the file target/ppc/kvm.c for more information). However, once we're booted and the guest is using huge pages already, it is currently still possible to hot-plug memory regions that does not support huge pages - which of course does not work on POWER, since the guest thinks that it is possible to use huge pages everywhere. The KVM_RUN ioctl will then abort with -EFAULT, QEMU spills out a not very helpful error message together with a register dump and the user is annoyed that the VM unexpectedly died. To avoid this situation, we should check the page size of hot-plugged DIMMs to see whether it is possible to use it in the current VM. If it does not fit, we can print out a better error message and refuse to add it, so that the VM does not die unexpectely and the user has a second chance to plug a DIMM with a matching memory backend instead. Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1419466 Signed-off-by: Thomas Huth <thuth@redhat.com> [dwg: Fix a build error on 32-bit builds with KVM] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-02-22 14:28:53 +11:00
Igor Mammedov	c5514d0e4b	machine: replace query_hotpluggable_cpus() callback with has_hotpluggable_cpus flag Generic helper machine_query_hotpluggable_cpus() replaced target specific query_hotpluggable_cpus() callbacks so there is no need in it anymore. However inon NULL callback value is used to detect/report hotpluggable cpus support, therefore it can be removed completely. Replace it with MachineClass.has_hotpluggable_cpus boolean which is sufficient for the task. Suggested-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-02-22 11:28:28 +11:00
Igor Mammedov	f2d672c248	machine: unify [pc_\|spapr_]query_hotpluggable_cpus() callbacks All callbacks FOO_query_hotpluggable_cpus() are practically the same except of setting vcpus_count to different values. Convert them to a generic machine_query_hotpluggable_cpus() callback by moving vcpus_count initialization to per machine specific callback possible_cpu_arch_ids(). Signed-off-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-02-22 11:28:28 +11:00
Igor Mammedov	535455fdee	spapr: reuse machine->possible_cpus instead of cores[] Replace SPAPR specific cores[] array with generic machine->possible_cpus and store core objects there. It makes cores bookkeeping similar to x86 cpus and will allow to unify similar code. It would allow to replace cpu_index based NUMA node mapping with iproperty based one (for -device created cores) since possible_cpus carries board defined topology/layout. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-02-22 11:28:28 +11:00
Laurent Vivier	5b929608b9	spapr: replace debug printf with trace points Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-02-22 11:28:28 +11:00
Laurent Vivier	f4af7d4438	ppc4xx: replace debug printf with trace points Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-02-22 11:28:28 +11:00
Laurent Vivier	5283c27fc5	mac99: replace debug printf with trace points Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-02-22 11:28:28 +11:00
Sam Bobroff	fe93e3e6ec	spapr: fix off-by-one error in spapr_ovec_populate_dt() The last byte of the option vector was missing due to an off-by-one error. Without this fix, client architecture support negotiation will fail because the last byte of option vector 5, which contains the MMU support, will be missed. Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-02-22 11:28:27 +11:00
Thomas Huth	802fc7abd0	hw/ppc/pnv: Remove superfluous "qemu" prefix from error strings error_report() already puts a prefix with the program name in front of the error strings, so the "qemu:" prefix is not necessary here anymore. Reported-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-02-22 11:28:27 +11:00
Igor Mammedov	115debf26c	spapr: make cpu core unplug follow expected hotunplug call flow spapr_core_unplug() were essentially spapr_core_unplug_request() handler that requested CPU removal and registered callback which did actual cpu core removali but it was called from spapr_machine_device_unplug() which is intended for actual object removal. Commit (`cf632463` spapr: Memory hot-unplug support) sort of fixed it introducing spapr_machine_device_unplug_request() and calling spapr_core_unplug() but it hasn't renamed callback and by mistake calls it from spapr_machine_device_unplug(). However spapr_machine_device_unplug() isn't ever called for cpu core since spapr_core_release() doesn't follow expected hotunplug call flow which is: 1: device_del() -> hotplug_handler_unplug_request() -> set destroy_cb() 2: destroy_cb() -> hotplug_handler_unplug() -> object_unparent // actual device removal Fix it by renaming spapr_core_unplug() to spapr_core_unplug_request() which is called from spapr_machine_device_unplug_request() and making spapr_core_release() call hotplug_handler_unplug() which will call spapr_machine_device_unplug() -> spapr_core_unplug() to remove cpu core. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reveiwed-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-02-22 11:28:27 +11:00
Igor Mammedov	ff9006ddbf	spapr: move spapr_core_[foo]plug() callbacks close to machine code in spapr.c spapr_core_pre_plug/spapr_core_plug/spapr_core_unplug() are managing wiring CPU core into spapr machine state and not internal CPU core state. So move them from spapr_cpu_core.c to spapr.c where other similar (spapr_memory_[foo]plug()) callbacks are located, which also matches x86 target practice. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-02-22 11:28:27 +11:00
Igor Mammedov	f844616bf6	spapr: cpu core: separate child threads destruction from machine state operations Split off destroying VCPU threads from drc callback spapr_core_release() into new spapr_cpu_core_unrealizefn() which takes care of internal cpu core state cleanup (i.e. VCPU threads) and is called when object_unparent(core) is called. That leaves spapr_core_release() only with board mgmt code, which will be moved to board related file in follow up patch along with the rest on hotplug callbacks. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-02-22 11:28:27 +11:00
Markus Armbruster	2059839baa	hw: Default -drive to if=ide explicitly where it works Block backends defined with -drive if=ide are meant to be picked up by machine initialization code: a suitable frontend gets created and wired up automatically. if=ide drives not picked up that way can still be used with -device as if they had if=none, but that's unclean and best avoided. Unused ones produce an "Orphaned drive without device" warning. -drive parameter "if" is optional, and the default depends on the machine type. If a machine type doesn't specify a default, the default is "ide". Many machine types default to if=ide, even though they don't actually have an IDE controller. A future patch will change these defaults to something more sensible. To prepare for it, this patch makes default "ide" explicit for the machines that actually pick up if=ide drives: * alpha: clipper * arm/aarch64: spitz borzoi terrier tosa * i386/x86_64: generic-pc-machine (with concrete subtypes pc-q35-* pc-i440fx-* pc-* isapc xenfv) * mips64el: fulong2e * mips/mipsel/mips64el: malta mips * ppc/ppc64: mac99 g3beige prep * sh4/sh4eb: r2d * sparc64: sun4u sun4v Note that ppc64 machine powernv already sets an "ide" default explicitly. Its IDE controller isn't implemented, yet. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Message-Id: <1487153147-11530-2-git-send-email-armbru@redhat.com>	2017-02-21 13:10:53 +01:00
Anton Nefedov	c86f106b85	report guest crash information in GUEST_PANICKED event it's not very convenient to use the crash-information property interface, so provide a CPU class callback to get the guest crash information, and pass that information in the event Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Message-Id: <1487053524-18674-3-git-send-email-den@openvz.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-02-16 15:30:49 +01:00
Thomas Huth	7c6e879733	hw/ppc/pnv: Use error_report instead of hw_error if a ROM file can't be found hw_error() is for CPU related errors only (it dumps the CPU registers and calls abort()!), so using error_report() is the better choice of reporting an error in case we simply did not find a file. Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-02-02 09:30:07 +11:00
Valentin Plotkin	00469dc373	target-ppc: Add MMU model check for booke machines Machines bamboo, e500 and virtex-ml507 assume a certain MMU model, otherwise resulting in unpredictable behavior. Add apropriate checks into *_init functions. Signed-off-by: Valentin Plotkin <caliborn@sdf.org> [regarding virtex parts] Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com> Tested-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-02-02 09:30:06 +11:00
Michael S. Tsirkin	25e6a11832	ppc: switch to constants within BUILD_BUG_ON We are switching BUILD_BUG_ON to verify that it's parameter is a compile-time constant, and it turns out that some gcc versions (specifically gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609) are not smart enough to figure it out for expressions involving local variables. This is harmless but means that the check is ineffective for these platforms. To fix, replace the variable with macros. Reported-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> [dwg: Correct a printf format warning] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-01-31 14:04:06 +11:00
Laurent Vivier	42043e4f12	spapr: clock should count only if vm is running This is a port to ppc of the i386 commit: `00f4d64` kvmclock: clock should count only if vm is running We remove timebase_post_load function, and use the VM state change handler to save and restore the guest_timebase (on stop and continue). We keep timebase_pre_save to reduce the clock difference on migration like in: `6053a86` kvmclock: reduce kvmclock difference on migration Time base offset has originally been introduced by commit `98a8b52` spapr: Add support for time base offset migration So while VM is paused, the time is stopped. This allows to have the same result with date (based on Time Base Register) and hwclock (based on "get-time-of-day" RTAS call). Moreover in TCG mode, the Time Base is always paused, so this patch also adjust the behavior between TCG and KVM. VM state field "time_of_the_day_ns" is now useless but we keep it to be able to migrate to older version of the machine. As vmstate_ppc_timebase structure (with timebase_pre_save() and timebase_post_load() functions) was only used by vmstate_spapr, we register the VM state change handler only in ppc_spapr_init(). Signed-off-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-01-31 10:10:14 +11:00
Thomas Huth	d9d6e78ea8	ppc: Remove unused function cpu_ppc601_rtc_init() It is completely unused, thus it can be removed without problems. Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-01-31 10:10:14 +11:00
Roman Kapl	0dfe952dc5	ppc: Prevent inifnite loop in decrementer auto-reload. If the DECAR register is set to 0, QEMU tries to reload the decrementer with zero in an inifinite loop. According to PPC documentation, the decrementer is triggered on 1->0 transition, so avoid reloading the decrementer if if is already zero. The problem does not manifest under Linux, but it is valid to set DECAR to zero (and may make sense as part of decrementer initialization when interrupts are disabled). Signed-off-by: Roman Kapl <rka@sysgo.com> [dwg: Fixed style nit] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-01-31 10:10:14 +11:00
David Gibson	f6f242c757	ppc: Add ppc_set_compat_all() Once a compatiblity mode is negotiated with the guest, h_client_architecture_support() uses run_on_cpu() to update each CPU to the new mode. We're going to want this logic somewhere else shortly, so make a helper function to do this global update. We put it in target-ppc/compat.c - it makes as much sense at the CPU level as it does at the machine level. We also move the cpu_synchronize_state() into ppc_set_compat(), since it doesn't really make any sense to call that without synchronizing state. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-01-31 10:10:14 +11:00
David Gibson	152ef803ce	pseries: Rewrite CAS PVR compatibility logic During boot, PAPR guests negotiate CPU model support with the ibm,client-architecture-support mechanism. The logic to implement this in qemu is very convoluted. This cleans it up to be cleaner, using the new ppc_check_compat() call. The new logic for choosing a compatibility mode is: 1. Usually, use the most recent compatibility mode that is a) supported by the guest b) supported by the CPU and c) no later than the maximum allowed (if specified) 2. If no suitable compatibility mode was found, the guest does support this CPU explicitly, and no maximum compatibility mode is specified, then use "raw" mode for the current CPU 3. Otherwise, fail the boot. This differs from the results of the old code: the old code preferred using "raw" mode to a compatibility mode, whereas the new code prefers a compatibility mode if available. Using compatibility mode preferentially means that we're more likely to be able to migrate the guest to a similar but not identical host. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-01-31 10:10:14 +11:00
Hervé Poussineau	34b9b5575b	prep: add IBM RS/6000 7020 (40p) machine emulation Machine supports both Open Hack'Ware and OpenBIOS. Open Hack'Ware is the default because OpenBIOS is currently unable to boot PReP boot partitions or PReP kernels. Signed-off-by: Hervé Poussineau <hpoussin@reactos.org> [dwg: Correct compile failure with KVM located by Thomas Huth] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-01-31 10:10:13 +11:00
Hervé Poussineau	79623312c6	prep: add IBM RS/6000 7020 (40p) memory controller Signed-off-by: Hervé Poussineau <hpoussin@reactos.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> [dwg: Added CONFIG_RS6000_MC to ppc64 or it breaks testcases] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-01-31 10:10:13 +11:00
Hervé Poussineau	d2f8415226	prep: add PReP System I/O This device is a partial duplicate of System I/O device available in hw/ppc/prep.c This new one doesn't have all the Motorola-specific registers. The old one should be deprecated and removed with the 'prep' machine. Partial documentation available at ftp://ftp.software.ibm.com/rs6000/technology/spec/srp1_1.exe section 6.1.5 (I/O Device Mapping) Signed-off-by: Hervé Poussineau <hpoussin@reactos.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-01-31 10:10:13 +11:00
xiaoqiang zhao	0f358a0710	hw/ppc: QOM'ify spapr_vio.c Drop the old and empty SysBus init Signed-off-by: xiaoqiang zhao <zxq_yx_007@163.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-01-31 10:10:13 +11:00
xiaoqiang zhao	09a7eb978f	hw/ppc: QOM'ify ppce500_spin.c Drop the old SysBus init function and use instance_init Signed-off-by: xiaoqiang zhao <zxq_yx_007@163.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-01-31 10:10:13 +11:00
xiaoqiang zhao	d0c2b0d089	hw/ppc: QOM'ify e500.c Drop the old SysBus init function and use instance_init Signed-off-by: xiaoqiang zhao <zxq_yx_007@163.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-01-31 10:10:13 +11:00
David Gibson	12dbeb16d0	ppc: Rewrite ppc_get_compat_smt_threads() To continue consolidation of compatibility mode information, this rewrites the ppc_get_compat_smt_threads() function using the table of compatiblity modes in target-ppc/compat.c. It's not a direct replacement, the new ppc_compat_max_threads() function has simpler semantics - it just returns the number of threads the cpu model has, taking into account any compatiblity mode it is in. This no longer takes into account kvmppc_smt_threads() as the previous version did. That check wasn't useful because we check in ppc_cpu_realizefn() that CPUs aren't instantiated with more threads than kvm allows (or if we didn't things will already be broken and this won't make it any worse). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>	2017-01-31 10:10:13 +11:00
David Gibson	fa325e6cbf	pseries: Add pseries-2.9 machine type Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Laurent Vivier <lvivier@redhat.com>	2017-01-31 10:10:13 +11:00
Hervé Poussineau	5904bca84e	prep: do not use global variable to access nvram Signed-off-by: Hervé Poussineau <hpoussin@reactos.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-01-31 10:10:13 +11:00
Thomas Huth	b99260ebbb	hw/ppc/spapr: Fix boot path of usb-host storage devices When passing through an USB storage device to a pseries guest, it is currently not possible to automatically boot from the device if the "bootindex" property has been specified, too (e.g. when using "-device nec-usb-xhci -device usb-host,hostbus=1,hostaddr=2,bootindex=0" at the command line). The problem is that QEMU builds a device tree path like "/pci@800000020000000/usb@0/usb-host@1" and passes it to SLOF in the /chosen/qemu,boot-list property. SLOF, however, probes the USB device, recognizes that it is a storage device and thus changes its name to "storage", and additionally adds a child node for the SCSI LUN, so the correct boot path in SLOF is something like "/pci@800000020000000/usb@0/storage@1/disk@101000000000000" instead. So when we detect an USB mass storage device with SCSI interface, we've got to adjust the firmware boot-device path properly that SLOF can automatically boot from the device. Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1354177 Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-01-31 10:10:13 +11:00
Nicholas Piggin	1c7ad77e56	ppc/spapr: implement H_SIGNAL_SYS_RESET The H_SIGNAL_SYS_RESET hcall allows a guest CPU to raise a system reset exception on CPUs within the same guest -- all CPUs, all-but-self, or a specific CPU (including self). This has not made its way to a PAPR release yet, but we have an hcall number assigned. H_SIGNAL_SYS_RESET = 0x380 Syntax: hcall(uint64 H_SIGNAL_SYS_RESET, int64 target); Generate a system reset NMI on the threads indicated by target. Values for target: -1 = target all online threads including the caller -2 = target all online threads except for the caller All other negative values: reserved Positive values: The thread to be targeted, obtained from the value of the "ibm,ppc-interrupt-server#s" property of the CPU in the OF device tree. Semantics: - Invalid target: return H_Parameter. - Otherwise: Generate a system reset NMI on target thread(s), return H_Success. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-01-31 10:10:13 +11:00
David Gibson	d6e166c082	ppc: Rename cpu_version to compat_pvr The 'cpu_version' field in PowerPCCPU is badly named. It's named after the 'cpu-version' device tree property where it is advertised, but that meaning may not be obvious in most places it appears. Worse, it doesn't even really correspond to that device tree property. The property contains either the processor's PVR, or, if the CPU is running in a compatibility mode, a special "logical PVR" representing which mode. Rename the cpu_version field, and a number of related variables to compat_pvr to make this clearer. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Thomas Huth <thuth@redhat.com>	2017-01-31 10:10:13 +11:00
David Gibson	1d1be34d26	ppc: Clean up and QOMify hypercall emulation The pseries machine type is a bit unusual in that it runs a paravirtualized guest. The guest expects to interact with a hypervisor, and qemu emulates the functions of that hypervisor directly, rather than executing hypervisor code within the emulated system. To implement this in TCG, we need to intercept hypercall instructions and direct them to the machine's hypercall handlers, rather than attempting to perform a privilege change within TCG. This is controlled by a global hook - cpu_ppc_hypercall. This cleanup makes the handling a little cleaner and more extensible than a single global variable. Instead, each CPU to have hypercalls intercepted has a pointer set to a QOM object implementing a new virtual hypervisor interface. A method in that interface is called by TCG when it sees a hypercall instruction. It's possible we may want to add other methods in future. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>	2017-01-31 10:10:13 +11:00
David Gibson	5b120785e7	pseries: Make cpu_update during CAS unconditional spapr_h_cas_compose_response() includes a cpu_update parameter which controls whether it includes updated information on the CPUs in the device tree fragment returned from the ibm,client-architecture-support (CAS) call. Providing the updated information is essential when CAS has negotiated compatibility options which require different cpu information to be presented to the guest. However, it should be safe to provide in other cases (it will just override the existing data in the device tree with identical data). This simplifies the code by removing the parameter and always providing the cpu update information. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>	2017-01-31 10:10:13 +11:00
David Gibson	0c86d0fd92	pseries: Always use core objects for CPU construction Currently the pseries machine has two paths for constructing CPUs. On newer machine type versions, which support cpu hotplug, it constructs cpu core objects, which in turn construct CPU threads. For older machine versions it individually constructs the CPU threads. This division is going to make some future changes to the cpu construction harder, so this patch unifies them. Now cpu core objects are always created. This requires some updates to allow core objects to be created without a full complement of threads (since older versions allowed a number of cpus not a multiple of the threads-per-core). Likewise it needs some changes to the cpu core hot/cold plug path so as not to choke on the old machine types without hotplug support. For good measure, we move the cpu construction to its own subfunction, spapr_init_cpus(). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org>	2017-01-31 10:10:13 +11:00
Stefan Weil	b12227afb1	hw: Fix typos found by codespell Signed-off-by: Stefan Weil <sw@weilnetz.de> Acked-by: Alistair Francis <alistair.francis@xilinx.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2017-01-24 23:26:52 +03:00
Vincent Palatin	b39466269b	kvm: move cpu synchronization code Move the generic cpu_synchronize_ functions to the common hw_accel.h header, in order to prepare for the addition of a second hardware accelerator. Signed-off-by: Stefan Weil <sw@weilnetz.de> Signed-off-by: Vincent Palatin <vpalatin@chromium.org> Message-Id: <f5c3cffe8d520011df1c2e5437bb814989b48332.1484045952.git.vpalatin@chromium.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-01-19 22:07:46 +01:00
Thomas Huth	fcf5ef2ab5	Move target-* CPU file into a target/ folder We've currently got 18 architectures in QEMU, and thus 18 target-xxx folders in the root folder of the QEMU source tree. More architectures (e.g. RISC-V, AVR) are likely to be included soon, too, so the main folder of the QEMU sources slowly gets quite overcrowded with the target-xxx folders. To disburden the main folder a little bit, let's move the target-xxx folders into a dedicated target/ folder, so that target-xxx/ simply becomes target/xxx/ instead. Acked-by: Laurent Vivier <laurent@vivier.eu> [m68k part] Acked-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de> [tricore part] Acked-by: Michael Walle <michael@walle.cc> [lm32 part] Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> [s390x part] Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> [s390x part] Acked-by: Eduardo Habkost <ehabkost@redhat.com> [i386 part] Acked-by: Artyom Tarasenko <atar4qemu@gmail.com> [sparc part] Acked-by: Richard Henderson <rth@twiddle.net> [alpha part] Acked-by: Max Filippov <jcmvbkbc@gmail.com> [xtensa part] Reviewed-by: David Gibson <david@gibson.dropbear.id.au> [ppc part] Acked-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com> [cris&microblaze part] Acked-by: Guan Xuetao <gxt@mprc.pku.edu.cn> [unicore32 part] Signed-off-by: Thomas Huth <thuth@redhat.com>	2016-12-20 21:52:12 +01:00
Michael Roth	5c0139a8c2	spapr: fix default DRC state for coldplugged LMBs Currently we set the initial isolation/allocation state for DRCs associated with coldplugged LMBs to ISOLATED/UNUSABLE, respectively, under the assumption that the guest will move this state to UNISOLATED/USABLE. In fact, this is only the case for LMBs added via hotplug. For coldplugged LMBs, the guest actually assumes the initial state to be UNISOLATED/USABLE. In practice, this only becomes an issue when we attempt to unplug one of these LMBs, where the guest kernel will issue an rtas-get-sensor-state call to check that the corresponding DRC is in an USABLE state before it will release the LMB back to QEMU. If the returned state is otherwise, the guest will assume no further action is needed, which bypasses the QEMU-side cleanup that occurs during the USABLE->UNUSABLE transition. This results in LMBs and their corresponding pc-dimm devices to stick around indefinitely. This patch fixes the issue by manually setting DRCs associated with cold-plugged LMBs to UNISOLATED/ALLOCATED, but leaving the hotplug state untouched. As it turns out, this is analogous to the handling for cold-plugged CPUs in spapr_core_plug(). Cc: qemu-ppc@nongnu.org Cc: David Gibson <david@gibson.dropbear.id.au> Cc: Bharata B Rao <bharata@linux.vnet.ibm.com> Cc: Greg Kurz <gkurz@linux.vnet.ibm.com> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-12-01 13:41:00 +11:00
David Gibson	5c4537bded	spapr: Fix 2.7<->2.8 migration of PCI host bridge `daa2369` "spapr_pci: Add a 64-bit MMIO window" subtly broke migration from qemu-2.7 to the current version. It split the device's MMIO window into two pieces for 32-bit and 64-bit MMIO. The patch included backwards compatibility code to convert the old property into the new format. However, the property value was also transferred in the migration stream and compared with a (probably unwise) VMSTATE_EQUAL. So, the "raw" value from 2.7 is compared to the new style converted value from (pre-)2.8 giving a mismatch and migration failure. Along with the actual field that caused the breakage, there are several other ill-advised VMSTATE_EQUAL()s. To fix forwards migration, we read the values in the stream into scratch variables and ignore them, instead of comparing for equality. To fix backwards migration, we populate those scratch variables in pre_save() with adjusted values to match the old behaviour. To permit the eventual possibility of removing this cruft from the stream, we only include these compatibility fields if a new 'pre-2.8-migration' property is set. We clear it on the pseries-2.8 machine type, which obviously can't be migrated backwards, but set it on earlier machine type versions. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>	2016-11-23 12:00:48 +11:00
David Gibson	5a78b821eb	Revert "spapr: Fix migration of PCI host bridges from qemu-2.7" This reverts commit `9b54ca0ba7`. The commit above corrected a migration breakage between qemu-2.7 and qemu-2.8. However it did so by advancing the migration version for the PCI host bridge, which obviously breaks migration backwards to earlier qemu versions. Although it's not totally essential, we'd like to maintain the possibility for backwards migration, so revert the change in preparation for a better fix. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>	2016-11-23 12:00:48 +11:00
David Gibson	146c11f16f	target-ppc: Allow eventual removal of old migration mistakes Until very recently, the vmstate for ppc cpus included some poorly thought out VMSTATE_EQUAL() components, that can easily break migration compatibility, and did so between qemu-2.6 and later versions. A hack was recently added which fixes this migration breakage, but it leaves the unhelpful cruft of these fields in the migration stream. This patch adds a new cpu property allowing these fields to be removed from the stream entirely. For the pseries-2.8 machine type - which comes after the fix - and for all non-pseries machine types - which aren't mature enough to care about cross-version migration - we remove the fields from the stream. For pseries-2.7 and earlier, The migration hack remains in place, allowing backwards and forwards migration with the older machine types. This restricts the migration compatibility cruft to older machine types, and at least opens the possibility of eventually deprecating and removing it entirely. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>	2016-11-23 12:00:48 +11:00
Michael Roth	62ef3760d4	spapr: migration support for CAS-negotiated option vectors With the additional of the OV5_HP_EVT option vector, we now have certain functionality (namely, memory unplug) that checks at run-time for whether or not the guest negotiated the option via CAS. Because we don't currently migrate these negotiated values, we are unable to unplug memory from a guest after it's been migrated until after the guest is rebooted and CAS-negotiation is repeated. This patch fixes this by adding CAS-negotiated options to the migration stream. We do this using a subsection, since the negotiated value of OV5_HP_EVT is the only option currently needed to maintain proper functionality for a running guest. Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-11-23 12:00:48 +11:00
Igor Mammedov	5836d16812	fw_cfg: move FW_CFG_NB_CPUS out of fw_cfg_init1() PC will use this field in other way, so move it outside the common code so PC could set a different value, i.e. all CPUs regardless of where they are coming from (-smp X \| -device cpu...). It's quick and dirty hack as it could be implemented in more generic way in MashineClass. But do it in simple way since only PC is affected so far. Later we can generalize it when another affected target gets support for -device cpu. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <1479212236-183810-3-git-send-email-imammedo@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2016-11-16 12:09:58 -02:00
David Gibson	27d9ffd4b3	ppc/pnv: Fix fatal bug on 32-bit hosts If the pnv machine type is compiled on a 32-bit host, the unsigned long (host) type is 32-bit. This means that the hweight_long() used to calculate the number of allowed cores only considers the low 32 bits of the cores_mask variable, and can thus return 0 in some circumstances. This corrects the bug. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Suggested-by: Richard Henderson <rth@twiddle.net> [clg: replaced hweight_long() by ctpop64() ] Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-11-15 10:08:43 +11:00
Cédric Le Goater	f81e551229	ppc/pnv: fix xscom address translation for POWER9 High addresses can overflow the uint32_t pcba variable after the 8byte shift. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-11-15 10:08:43 +11:00
Cédric Le Goater	ad521238b4	ppc/pnv: add a 'xscom_core_base' field to PnvChipClass The XSCOM addresses for the core registers are encoded in a slightly different way on POWER8 and POWER9. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-11-15 10:08:43 +11:00
David Gibson	9b54ca0ba7	spapr: Fix migration of PCI host bridges from qemu-2.7 `daa2369` "spapr_pci: Add a 64-bit MMIO window" subtly broke migration from qemu-2.7 to the current version. It split the device's MMIO window into two pieces for 32-bit and 64-bit MMIO. The patch included backwards compatibility code to convert the old property into the new format. However, the property value was also transferred in the migration stream and compared with a (probably unwise) VMSTATE_EQUAL. So, the "raw" value from 2.7 is compared to the new style converted value from (pre-)2.8 giving a mismatch and migration failure. Although it would be technically possible to fix this in a way allowing backwards migration, that would leave an ugly legacy around indefinitely. This patch takes the simpler approach of bumping the migration version, dropping the unwise VMSTATE_EQUAL (and some equally unwise ones around it) and ignoring them on an incoming migration. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>	2016-11-15 10:08:42 +11:00
Cédric Le Goater	ec575aa0ae	ppc/pnv: fix compile breakage on old gcc PnvChip is defined twice and this can confuse old compilers : CC ppc64-softmmu/hw/ppc/pnv_xscom.o In file included from qemu.git/hw/ppc/pnv.c:29: qemu.git/include/hw/ppc/pnv.h:60: error: redefinition of typedef ‘PnvChip’ qemu.git/include/hw/ppc/pnv_xscom.h:24: note: previous declaration of ‘PnvChip’ was here make[1]: * [hw/ppc/pnv.o] Error 1 make[1]: * Waiting for unfinished jobs.... Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-11-15 10:05:51 +11:00
David Gibson	8bd9530e13	powernv: CPU compatibility modes don't make sense for powernv powernv has some code (derived from the spapr equivalent) used in device tree generation which depends on the CPU's compatibility mode / logical PVR. However, compatibility modes don't make sense on powernv - at least not as a property controlled by the host - because the guest in powernv has full hypervisor level access to the virtual system, and so owns the PCR (Processor Compatibility Register) which implements compatiblity modes. Note: the new logic doesn't take into account kvmppc_smt_threads() like the old version did. However, if core->nr_threads exceeds kvmppc_smt_threads() then things will already be broken and clamping the value in the device tree isn't going to save us. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Thomas Huth <thuth@redhat.com>	2016-11-15 10:05:51 +11:00
Peter Maydell	6bc56d317f	Base patches for MTTCG enablement. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQExBAABCAAbBQJYF07FFBxwYm9uemluaUByZWRoYXQuY29tAAoJEL/70l94x66D ppoIAI4AxWocso5WIUH6uEHjOAxw9ZNhZ92nF8VtcbvGtN/eh8Qk4jfRX+W/Jl0q D13Rm3m8ynNHqh8YFs+O6i/WSgxHGxKwb75mNr36HDnYnMFluTvRQkvYJUXRyRuL CVtNgy8+q8FbbWo+NiJ5I7gfk2Si4BQfZN0uCLqGuCwqvvA/spN13xUcpeBXEKhL TeDGZBT/atDnT2bRcve8E8g5/0RKjTL9EB0jwfJjHocT5bs+toPe6js9VnZDRNWN ZldcONgEHj3zAj9j7hTkVWFTGPSCx/tt6y6JeORq1oxk0mCCswEk0U9A3hLzLjc/ 94XHsLaEoZ7HNAKtkLc07NYhkQM= =+6Sj -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream-mttcg' into staging Base patches for MTTCG enablement. # gpg: Signature made Mon 31 Oct 2016 14:01:41 GMT # gpg: using RSA key 0xBFFBD25F78C7AE83 # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * remotes/bonzini/tags/for-upstream-mttcg: tcg: move locking for tb_invalidate_phys_page_range up *_run_on_cpu: introduce run_on_cpu_data type cpus: re-factor out handle_icount_deadline tcg: cpus rm tcg_exec_all() tcg: move tcg_exec_all and helpers above thread fn target-arm/arm-powerctl: wake up sleeping CPUs tcg: protect translation related stuff with tb_lock. translate-all: Add assert_(memory\|tb)_lock annotations linux-user/elfload: ensure mmap_lock() held while setting up tcg: comment on which functions have to be called with tb_lock held cpu-exec: include cpu_index in CPU_LOG_EXEC messages translate-all: add DEBUG_LOCKING asserts translate_all: DEBUG_FLUSH -> DEBUG_TB_FLUSH cpus: make all_vcpus_paused() return bool Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2016-10-31 15:29:12 +00:00
Paolo Bonzini	14e6fe12a7	_run_on_cpu: introduce run_on_cpu_data type This changes the _run_on_cpu APIs (and helpers) to pass data in a run_on_cpu_data type instead of a plain void *. This is because we sometimes want to pass a target address (target_ulong) and this fails on 32 bit hosts emulating 64 bit guests. Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Message-Id: <20161027151030.20863-24-alex.bennee@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-10-31 15:00:25 +01:00
Peter Maydell	277d44f5a6	trivial patches for 2016-10-28 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABCAAGBQJYE2wfAAoJEHAbT2saaT5ZGYUH/3QWJ4OFWbqGo1YYN5AIAheF v1bQGTh1HGbLk46ajhUvzB0bMHb1FC1KoOruU2wFYuKK/J5zQ+4X9EmaC/fD7hyx nGTcPWAyxKOlqOq3In9ro+xWQNzEhfoypKCQQVC4Y3quzub48wAro8fuFSNXLyBq ERvAsjgj0TrLEHoWtJl2bPYiqSd6KAHZAKPFW3Jw8MmsBcTLmnF2PVW3LBfdcHe7 6vlhqX7lPzVlHRaUsaxRkFxYd2YGisbe3bPRDw2fTxrtOYyEkopQq7xi2Q6Yq5N0 z0yM2oJ7o1QtUOXYa7KBf03WZ7e119HimaUkGLg+0LVhQNbeG3hd3gNwApXa5og= =tYml -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/mjt/tags/trivial-patches-fetch' into staging trivial patches for 2016-10-28 # gpg: Signature made Fri 28 Oct 2016 16:17:51 BST # gpg: using RSA key 0x701B4F6B1A693E59 # gpg: Good signature from "Michael Tokarev <mjt@tls.msk.ru>" # gpg: aka "Michael Tokarev <mjt@corpit.ru>" # gpg: aka "Michael Tokarev <mjt@debian.org>" # Primary key fingerprint: 6EE1 95D1 886E 8FFB 810D 4324 457C E0A0 8044 65C5 # Subkey fingerprint: 7B73 BAD6 8BE7 A2C2 8931 4B22 701B 4F6B 1A69 3E59 * remotes/mjt/tags/trivial-patches-fetch: (23 commits) Fix build for less common build directories names clean-up: removed duplicate #includes scripts/clean-includes: added duplicate #include check monitor: deprecate 'default' option qemu-ga: Remove stray 'q' in documentation Makefile: Fix help text for target 'installer' s390: avoid always-true comparison in s390_pci_generate_fid() migration: Remove unneeded NULL check from migrate_fd_error() scripts/hxtool: fix undefined behavour of echo qemu-options.hx: set: fix copy-paste error usb: Change *_exitfn return type from int to void MAINTAINERS: qemu-trivial information colo-compare: remove unused struct CompareChardevProps and 'props' variable milkymist-pfpu: fix potential integer overflow hw/block/nvme: Simplify if-statements a little bit target-lm32: rewrite gen_compare() lm32: milkymist-tmu2: fix integer overflow target-lm32: disable asm logging via LOG_DIS() target-lm32: swap operand of wcsr in LOG_DIS() target-lm32: fix LOG_DIS operand order ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2016-10-31 11:58:30 +00:00
Anand J	814bb12a56	clean-up: removed duplicate #includes Some files contain multiple #includes of the same header file. Removed most of those unnecessary duplicate entries using scripts/clean-includes. Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Anand J <anand.indukala@gmail.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2016-10-28 18:17:24 +03:00
Bharata B Rao	cf63246319	spapr: Memory hot-unplug support Add support to hot remove pc-dimm memory devices. Since we're introducing a machine-level unplug_request hook, we also had handling for CPU unplug there as well to ensure CPU unplug continues to work as it did before. Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> * add hooks to CAS/cmdline enablement of hotplug ACR support * add hook for CPU unplug Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-10-28 11:17:35 +11:00
Michael Roth	79b78a6bd4	spapr: use count+index for memory hotplug Commit 0a417869: spapr: Move memory hotplug to RTAS_LOG_V6_HP_ID_DRC_COUNT type dropped per-DRC/per-LMB hotplugs event in favor of a bulk add via a single LMB count value. This was to avoid overrunning the guest EPOW event queue with hotplug events. This works fine, but relies on the guest exhaustively scanning for pluggable LMBs to satisfy the requested count by issuing rtas-get-sensor(DR_ENTITY_SENSE, ...) calls until all the LMBs associated with the DIMM are identified. With newer support for dedicated hotplug event source, this queue exhaustion is no longer as much of an issue due to implementation details on the guest side, but we still try to avoid excessive hotplug events by now supporting both a count and a starting index to avoid unecessary work. This patch makes use of that approach when the capability is available. Cc: bharata@linux.vnet.ibm.com Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-10-28 11:17:35 +11:00
Bharata B Rao	afdbd40356	spapr: Add DRC count indexed hotplug identifier type Add support for DRC count indexed hotplug ID type which is primarily needed for memory hot unplug. This type allows for specifying the number of DRs that should be plugged/unplugged starting from a given DRC index. Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> * updated rtas_event_log_v6_hp to reflect count/index field ordering used in PAPR hotplug ACR Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-10-28 11:17:35 +11:00
Michael Roth	f622921430	spapr: add hotplug interrupt machine options This adds machine options of the form: -machine pseries,modern-hotplug-events=true -machine pseries,modern-hotplug-events=false If false, QEMU will force the use of "legacy" style hotplug events, which are surfaced through EPOW events instead of a dedicated hot plug event source, and lack certain features necessary, mainly, for memory unplug support. If true, QEMU will enable support for "modern" dedicated hot plug event source. Note that we will still default to "legacy" style unless the guest advertises support for the "modern" hotplug events via ibm,client-architecture-support hcall during early boot. For pseries-2.7 and earlier we default to false, for newer machine types we default to true. Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-10-28 11:17:35 +11:00
Michael Roth	ffbb1705a3	spapr_events: add support for dedicated hotplug event source Hotplug events were previously delivered using an EPOW interrupt and were queued by linux guests into a circular buffer. For traditional EPOW events like shutdown/resets, this isn't an issue, but for hotplug events there are cases where this buffer can be exhausted, resulting in the loss of hotplug events, resets, etc. Newer-style hotplug event are delivered using a dedicated event source. We enable this in supported guests by adding standard an additional event source in the guest device-tree via /event-sources, and, if the guest advertises support for the newer-style hotplug events, using the corresponding interrupt to signal the available of hotplug/unplug events. Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-10-28 11:17:35 +11:00
Michael Roth	417ece33fc	spapr: improve ibm,architecture-vec-5 property handling ibm,architecture-vec-5 is supposed to encode all option vector 5 bits negotiated between platform/guest. Currently we hardcode this property in the boot-time device tree to advertise a single negotiated capability, "Form 1" NUMA Affinity, regardless of whether or not CAS has been invoked or that capability has actually been negotiated. Improve this by generating ibm,architecture-vec-5 based on the full set of option vector 5 capabilities negotiated via CAS. Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-10-28 09:38:26 +11:00
Michael Roth	6787d27b04	spapr: add option vector handling in CAS-generated resets In some cases, ibm,client-architecture-support calls can fail. This could happen in the current code for situations where the modified device tree segment exceeds the buffer size provided by the guest via the call parameters. In these cases, QEMU will reset, allowing an opportunity to regenerate the device tree from scratch via boot-time handling. There are potentially other scenarios as well, not currently reachable in the current code, but possible in theory, such as cases where device-tree properties or nodes need to be removed. We currently don't handle either of these properly for option vector capabilities however. Instead of carrying the negotiated capability beyond the reset and creating the boot-time device tree accordingly, we start from scratch, generating the same boot-time device tree as we did prior to the CAS-generated and the same device tree updates as we did before. This could (in theory) cause us to get stuck in a reset loop. This hasn't been observed, but depending on the extensiveness of CAS-induced device tree updates in the future, could eventually become an issue. Address this by pulling capability-related device tree updates resulting from CAS calls into a common routine, spapr_dt_cas_updates(), and adding an sPAPROptionVector* parameter that allows us to test for newly-negotiated capabilities. We invoke it as follows: 1) When ibm,client-architecture-support gets called, we call spapr_dt_cas_updates() with the set of capabilities added since the previous call to ibm,client-architecture-support. For the initial boot, or a system reset generated by something other than the CAS call itself, this set will consist of all options supported both the platform and the guest. For calls to ibm,client-architecture-support immediately after a CAS-induced reset, we call spapr_dt_cas_updates() with only the set of capabilities added since the previous call, since the other capabilities will have already been addressed by the boot-time device-tree this time around. In the unlikely event that capabilities are removed since the previous CAS, we will generate a CAS-induced reset. In the unlikely event that we cannot fit the device-tree updates into the buffer provided by the guest, well generate a CAS-induced reset. 2) When a CAS update results in the need to reset the machine and include the updates in the boot-time device tree, we call the spapr_dt_cas_updates() using the full set of negotiated capabilities as part of the reset path. At initial boot, or after a reset generated by something other than the CAS call itself, this set will be empty, resulting in what should be the same boot-time device-tree as we generated prior to this patch. For CAS-induced reset, this routine will be called with the full set of capabilities negotiated by the platform/guest in the previous CAS call, which should result in CAS updates from previous call being accounted for in the initial boot-time device tree. Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> [dwg: Changed an int -> bool conversion to be more explicit] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-10-28 09:38:26 +11:00
Michael Roth	facdb8b63b	spapr_hcall: use spapr_ovec_* interfaces for CAS options Currently we access individual bytes of an option vector via ldub_phys() to test for the presence of a particular capability within that byte. Currently this is only done for the "dynamic reconfiguration memory" capability bit. If that bit is present, we pass a boolean value to spapr_h_cas_compose_response() to generate a modified device tree segment with the additional properties required to enable this functionality. As more capability bits are added, will would need to modify the code to add additional option vector accesses and extend the param list for spapr_h_cas_compose_response() to include similar boolean values for these parameters. Avoid this by switching to spapr_ovec_* helpers so we can do all the parsing in one shot and then test for these additional bits within spapr_h_cas_compose_response() directly. Cc: Bharata B Rao <bharata@linux.vnet.ibm.com> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-10-28 09:38:26 +11:00
Michael Roth	b20b7b7add	spapr_ovec: initial implementation of option vector helpers PAPR guests advertise their capabilities to the platform by passing an ibm,architecture-vec structure via an ibm,client-architecture-support hcall as described by LoPAPR v11, B.6.2.3. during early boot. Using this information, the platform enables the capabilities it supports, then encodes a subset of those enabled capabilities (the 5th option vector of the ibm,architecture-vec structure passed to ibm,client-architecture-support) into the guest device tree via "/chosen/ibm,architecture-vec-5". The logical format of these these option vectors is a bit-vector, where individual bits are addressed/documented based on the byte-wise offset from the beginning of the bit-vector, followed by the bit-wise index starting from the byte-wise offset. Thus the bits of each of these bytes are stored in reverse order. Additionally, the first byte of each option vector is encodes the length of the option vector, so byte offsets begin at 1, and bit offset at 0. This is not very intuitive for the purposes of mapping these bits to a particular documented capability, so this patch introduces a set of abstractions that encapsulate the work of parsing/encoding these options vectors and testing for individual capabilities. Cc: Bharata B Rao <bharata@linux.vnet.ibm.com> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> [dwg: Tweaked double-include protection to not trigger a checkpatch false positive] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-10-28 09:38:26 +11:00
David Gibson	398a0bd5ae	pseries: Remove spapr_create_fdt_skel() For historical reasons construction of the guest device tree in spapr is divided between spapr_create_fdt_skel() which is called at init time, and spapr_build_fdt() which runs at reset time. Over time, more and more things have needed to be moved to reset time. Previous cleanups mean the only things left in spapr_create_fdt_skel() are the properties of the root node itself. Finish consolidating these two parts of device tree construction, by moving this to the start of spapr_build_fdt(), and removing spapr_create_fdt_skel() entirely. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>	2016-10-28 09:38:26 +11:00
David Gibson	bf5a6696ba	pseries: Consolidate construction of /vdevice device tree node Construction of the /vdevice node (and its children) is divided between spapr_create_fdt_skel() (at init time), which creates the base node, and spapr_populate_vdevice() (at reset time) which creates the nodes for each individual virtual device. This consolidates both into a single function called from spapr_build_fdt(). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>	2016-10-28 09:38:26 +11:00
David Gibson	fca5f2dc6c	pseries: Move /hypervisor node construction to fdt_build_fdt() Currently the /hypervisor device tree node is constructed in spapr_create_fdt_skel(). As part of consolidating device tree construction to reset time, move it to a function called from spapr_build_fdt(). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>	2016-10-28 09:38:26 +11:00
David Gibson	ffb1e275a6	pseries: Move /event-sources construction to spapr_build_fdt() The /event-sources device tree node is built from spapr_create_fdt_skel(). As part of consolidating device tree construction to reset time, this moves it to spapr_build_fdt(). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>	2016-10-28 09:38:26 +11:00
David Gibson	3f5dabceba	pseries: Consolidate construction of /rtas device tree node For historical reasons construction of the /rtas node in the device tree (amongst others) is split into several places. In particular it's split between spapr_create_fdt_skel(), spapr_build_fdt() and spapr_rtas_device_tree_setup(). In fact, as well as adding the actual RTAS tokens to the device tree, spapr_rtas_device_tree_setup() just adds the ibm,lrdr-capacity property, which despite going in the /rtas node, doesn't have a lot to do with RTAS. This patch consolidates the code constructing /rtas together into a new spapr_dt_rtas() function. spapr_rtas_device_tree_setup() is renamed to spapr_dt_rtas_tokens() and now only adds the token properties. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>	2016-10-28 09:38:26 +11:00
David Gibson	7c866c6a60	pseries: Consolidate construction of /chosen device tree node For historical reasons, building the /chosen node in the guest device tree is split across several places and includes both parts which write the DT sequentially and others which use random access functions. This patch consolidates construction of the node into one place, using random access functions throughout. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>	2016-10-28 09:38:26 +11:00
David Gibson	9b9a19080a	pseries: Move construction of /interrupt-controller fdt node Currently the device tree node for the XICS interrupt controller is in spapr_create_fdt_skel(). As part of consolidating device tree construction to reset time, this moves it to a function called from spapr_build_fdt(). In addition we move the actual code into hw/intc/xics_spapr.c with the rest of the PAPR specific interrupt controller code. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>	2016-10-28 09:38:26 +11:00
David Gibson	2cac78c12a	pseries: Consolidate RTAS loading At each system reset, the pseries machine needs to load RTAS (the runtime portion of the guest firmware) into the VM. This means copying the actual RTAS code into guest memory, and also updating the device tree so that the guest OS and boot firmware can locate it. For historical reasons the copy and update to the device tree were in different parts of the code. This cleanup brings them both together in an spapr_load_rtas() function. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>	2016-10-28 09:38:26 +11:00
David Gibson	cf6e522390	pseries: Move adding of fdt reserve map entries The flattened device tree passed to pseries guests contains a list of reserved memory areas. Currently we construct this list early in spapr_create_fdt_skel() as we sequentially write the fdt. This will be inconvenient for upcoming cleanups, so this patch moves the reserve map changes to the end of fdt construction. This changes fdt_add_reservemap_entry() calls - which work when writing the fdt sequentially to fdt_add_mem_rsv() calls used when altering the fdt in random access mode. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>	2016-10-28 09:38:25 +11:00
David Gibson	a19f7fb045	pseries: Make spapr_create_fdt_skel() get information from machine state Currently spapr_create_fdt_skel() takes a bunch of individual parameters for various things it will put in the device tree. Some of these can already be taken directly from sPAPRMachineState. This patch alters it so that all of them can be taken from there, which will allow this code to be moved away from its current caller in future. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>	2016-10-28 09:38:25 +11:00
David Gibson	cae172ab6d	pseries: Remove rtas_addr and fdt_addr fields from machinestate These values are used only within ppc_spapr_reset(), so just change them to local variables. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>	2016-10-28 09:38:25 +11:00
David Gibson	997b6cfc3d	pseries: Split device tree construction from device tree load spapr_finalize_fdt() both finishes building the device tree for the guest and loads it into guest memory. For future cleanups, it's going to be more convenient to do these two things separately. The loading portion is pretty trivial, so we move it inline into the caller, ppc_spapr_reset(). We also rename spapr_finalize_fdt(), because the current name is going to become inaccurate. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>	2016-10-28 09:38:25 +11:00
Cédric Le Goater	3495b6b610	ppc/pnv: add a ISA bus As Qemu only supports a single instance of the ISA bus, we use the LPC controller of chip 0 to create one and plug in a couple of useful devices, like an UART and RTC. An IPMI BT device, which is also an ISA device, can be defined on the command line to connect an external BMC. That is for later. The PowerNV machine now has a console. Skiboot should load a kernel and jump into it but execution will stop quite early because we lack a model for the native XICS controller for the moment : [ 0.000000] NR_IRQS:512 nr_irqs:512 16 [ 0.000000] XICS: Cannot find a Presentation Controller ! [ 0.000000] ------------[ cut here ]------------ [ 0.000000] WARNING: at arch/powerpc/platforms/powernv/setup.c:81 ... [ 0.000000] NIP [c00000000079d65c] pnv_init_IRQ+0x30/0x44 You can still do a few things under xmon. Based on previous work from : Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> [dwg: Trivial fix for a change in the serial_hds_isa_init() interface] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-10-28 09:38:25 +11:00
Benjamin Herrenschmidt	a3980bf517	ppc/pnv: add a LPC controller The LPC (Low Pin Count) interface on a POWER8 is made accessible to the system through the ADU (XSCOM interface). This interface is part of set of units connected together via a local OPB (On-Chip Peripheral Bus) which act as a bridge between the ADU and the off chip LPC endpoints, like external flash modules. The most important units of this OPB are : - OPB Master: contains the ADU slave logic, a set of internal registers and the logic to control the OPB. - LPCHC (LPC HOST Controller): which implements a OPB Slave, a set of internal registers and the LPC HOST Controller to control the LPC interface. Four address spaces are provided to the ADU : - LPC Bus Firmware Memory - LPC Bus Memory - LPC Bus I/O (ISA bus) - and the registers for the OPB Master and the LPC Host Controller On POWER8, an intermediate hop is necessary to reach the OPB, through a unit called the ECCB. OPB commands are simply mangled in ECCB write commands. On POWER9, the OPB master address space can be accessed via MMIO. The logic is same but the code will be simpler as the XSCOM and ECCB hops are not necessary anymore. This version of the LPC controller model doesn't yet implement support for the SerIRQ deserializer present in the Naples version of the chip though some preliminary work is there. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [clg: - updated for qemu-2.7 - ported on latest PowerNV patchset - changed the XSCOM interface to fit new model - QOMified the model - moved the ISA hunks in another patch - removed printf logging - added a couple of UNIMP logging - rewrote commit log ] Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-10-28 09:38:25 +11:00
Cédric Le Goater	24ece07250	ppc/pnv: add XSCOM handlers to PnvCore Now that we are using real HW ids for the cores in PowerNV chips, we can route the XSCOM accesses to them. We just need to attach a specific XSCOM memory region to each core in the appropriate window for the core number. To start with, let's install the DTS (Digital Thermal Sensor) handlers which should return 38°C for each core. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-10-28 09:38:25 +11:00
Cédric Le Goater	967b75230b	ppc/pnv: add XSCOM infrastructure On a real POWER8 system, the Pervasive Interconnect Bus (PIB) serves as a backbone to connect different units of the system. The host firmware connects to the PIB through a bridge unit, the Alter-Display-Unit (ADU), which gives him access to all the chiplets on the PCB network (Pervasive Connect Bus), the PIB acting as the root of this network. XSCOM (serial communication) is the interface to the sideband bus provided by the POWER8 pervasive unit to read and write to chiplets resources. This is needed by the host firmware, OPAL and to a lesser extent, Linux. This is among others how the PCI Host bridges get configured at boot or how the LPC bus is accessed. To represent the ADU of a real system, we introduce a specific AddressSpace to dispatch XSCOM accesses to the targeted chiplets. The translation of an XSCOM address into a PCB register address is slightly different between the P9 and the P8. This is handled before the dispatch using a 8byte alignment for all. To customize the device tree, a QOM InterfaceClass, PnvXScomInterface, is provided with a populate() handler. The chip populates the device tree by simply looping on its children. Therefore, each model needing custom nodes should not forget to declare itself as a child at instantiation time. Based on previous work done by : Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Cédric Le Goater <clg@kaod.org> [dwg: Added cpu parameter to xscom_complete()] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-10-28 09:38:25 +11:00
Cédric Le Goater	d2fd9612ee	ppc/pnv: add a PnvCore object This is largy inspired by sPAPRCPUCore with some simplification, no hotplug for instance. A set of PnvCore objects is added to the PnvChip and the device tree is populated looping on these cores. Real HW cpu ids are now generated depending on the chip cpu model, the chip id and a core mask. The id is propagated to the CPU object, using properties, to set the SPR_PIR (Processor Identification Register) Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-10-28 09:38:25 +11:00
Cédric Le Goater	631adaff31	ppc/pnv: add a PIR handler to PnvChip The Processor Identification Register (PIR) is a register that holds a processor identifier which is used for bus transactions (XSCOM) and for processor differentiation in multiprocessor systems. It also used in the interrupt vector entries (IVE) to identify the thread serving the interrupts. P9 and P8 have some differences in the CPU PIR encoding. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-10-28 09:38:25 +11:00
Cédric Le Goater	397a79e757	ppc/pnv: add a core mask to PnvChip This will be used to build real HW ids for the cores and enforce some limits on the available cores per chip. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-10-28 09:38:25 +11:00
Cédric Le Goater	e997040e3f	ppc/pnv: add a PnvChip object This is is an abstraction of a POWER8 chip which is a set of cores plus other 'units', like the pervasive unit, the interrupt controller, the memory controller, the on-chip microcontroller, etc. The whole can be seen as a socket. It depends on a cpu model and its characteristics: max cores and specific inits are defined in a PnvChipClass. We start with an near empty PnvChip with only a few cpu constants which we will grow in the subsequent patches with the controllers required to run the system. The Chip CFAM (Common FRU Access Module) ID gives the model of the chip and its version number. It is generally the first thing firmwares fetch, available at XSCOM PCB address 0xf000f, to start initialization. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-10-28 09:38:25 +11:00
Benjamin Herrenschmidt	9e933f4a62	ppc/pnv: add skeleton PowerNV platform The goal is to emulate a PowerNV system at the level of the skiboot firmware, which loads the OS and provides some runtime services. Power Systems have a lower firmware (HostBoot) that does low level system initialization, like DRAM training. This is beyond the scope of what qemu will address in a PowerNV guest. No devices yet, not even an interrupt controller. Just to get started, some RAM to load the skiboot firmware, the kernel and initrd. The device tree is fully created in the machine reset op. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [clg: - updated for qemu-2.7 - replaced fprintf by error_report - used a common definition of _FDT macro - removed VMStateDescription as migration is not yet supported - added IBM Copyright statements - reworked kernel_filename handling - merged PnvSystem and sPowerNVMachineState - removed PHANDLE_XICP - added ppc_create_page_sizes_prop helper - removed nmi support - removed kvm support - updated powernv machine to version 2.8 - removed chips and cpus, They will be provided in another patches - added a machine reset routine to initialize the device tree (also) - french has a squelette and english a skeleton. - improved commit log. - reworked prototypes parameters - added a check on the ram size (thanks to Michael Ellerman) - fixed chip-id cell - changed MAX_CPUS to 2048 - simplified memory node creation to one node only - removed machine version - rewrote the device tree creation with the fdt "rw" routines - s/sPowerNVMachineState/PnvMachineState/ - etc.] Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-10-28 09:38:24 +11:00
Michael Roth	4bcfa56ca9	spapr_pci: advertise explicit numa IDs even when there's 1 node With the addition of "numa_node" properties for PHBs we began advertising NUMA affinity in cases where nb_numa_nodes > 1. Since the default on the guest side is to make no assumptions about PHB NUMA affinity (defaulting to -1), there is still a valid use-case for explicitly defining a PHB's NUMA affinity even when there's just one node. In particular, some workloads make faulty assumptions about /sys/bus/pci/<devid>/numa_node being >= 0, warranting the use of this property as a workaround even if there's just 1 PHB or NUMA node. Enable this use-case by always advertising the PHB's NUMA affinity if "numa_node" has been explicitly set. We could achieve this by relaxing the check to simply be nb_numa_nodes > 0, but even safer would be to check numa_info[nodeid].present explicitly, and to fail at start time for cases where it does not exist. This has an additional affect of no longer advertising PHB NUMA affinity unconditionally if nb_numa_nodes > 1 and "numa_node" property is unset/-1, but since the default value on the guest side for each PHB is also -1, the behavior should be the same for that situation. We could still retain the old behavior if desired, but the decision seems arbitrary, so we take the simpler route. Cc: Alexey Kardashevskiy <aik@ozlabs.ru> Cc: Shivaprasad G. Bhat <shivapbh@in.ibm.com> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-10-28 09:36:58 +11:00
Igor Mammedov	079019f2e3	Increase MAX_CPUMASK_BITS from 255 to 288 so that it would be possible to increase maxcpus limit for x86 target. Keep spapr/virt_arm at limit they used to have 255. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Andrew Jones <drjones@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2016-10-24 17:29:15 -02:00
David Gibson	357d1e3bc7	spapr: Improved placement of PCI host bridges in guest memory map Currently, the MMIO space for accessing PCI on pseries guests begins at 1 TiB in guest address space. Each PCI host bridge (PHB) has a 64 GiB chunk of address space in which it places its outbound PIO and 32-bit and 64-bit MMIO windows. This scheme as several problems: - It limits guest RAM to 1 TiB (though we have a limited fix for this now) - It limits the total MMIO window to 64 GiB. This is not always enough for some of the large nVidia GPGPU cards - Putting all the windows into a single 64 GiB area means that naturally aligning things within there will waste more address space. In addition there was a miscalculation in some of the defaults, which meant that the MMIO windows for each PHB actually slightly overran the 64 GiB region for that PHB. We got away without nasty consequences because the overrun fit within an unused area at the beginning of the next PHB's region, but it's not pretty. This patch implements a new scheme which addresses those problems, and is also closer to what bare metal hardware and pHyp guests generally use. Because some guest versions (including most current distro kernels) can't access PCI MMIO above 64 TiB, we put all the PCI windows between 32 TiB and 64 TiB. This is broken into 1 TiB chunks. The first 1 TiB contains the PIO (64 kiB) and 32-bit MMIO (2 GiB) windows for all of the PHBs. Each subsequent TiB chunk contains a naturally aligned 64-bit MMIO window for one PHB each. This reduces the number of allowed PHBs (without full manual configuration of all the windows) from 256 to 31, but this should still be plenty in practice. We also change some of the default window sizes for manually configured PHBs to saner values. Finally we adjust some tests and libqos so that it correctly uses the new default locations. Ideally it would parse the device tree given to the guest, but that's a more complex problem for another time. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com>	2016-10-16 12:04:15 +11:00
David Gibson	daa2369903	spapr_pci: Add a 64-bit MMIO window On real hardware, and under pHyp, the PCI host bridges on Power machines typically advertise two outbound MMIO windows from the guest's physical memory space to PCI memory space: - A 32-bit window which maps onto 2GiB..4GiB in the PCI address space - A 64-bit window which maps onto a large region somewhere high in PCI address space (traditionally this used an identity mapping from guest physical address to PCI address, but that's not always the case) The qemu implementation in spapr-pci-host-bridge, however, only supports a single outbound MMIO window, however. At least some Linux versions expect the two windows however, so we arranged this window to map onto the PCI memory space from 2 GiB..~64 GiB, then advertised it as two contiguous windows, the "32-bit" window from 2G..4G and the "64-bit" window from 4G..~64G. This approach means, however, that the 64G window is not naturally aligned. In turn this limits the size of the largest BAR we can map (which does have to be naturally aligned) to roughly half of the total window. With some large nVidia GPGPU cards which have huge memory BARs, this is starting to be a problem. This patch adds true support for separate 32-bit and 64-bit outbound MMIO windows to the spapr-pci-host-bridge implementation, each of which can be independently configured. The 32-bit window always maps to 2G.. in PCI space, but the PCI address of the 64-bit window can be configured (it defaults to the same as the guest physical address). So as not to break possible existing configurations, as long as a 64-bit window is not specified, a large single window can be specified. This will appear the same way to the guest as the old approach, although it's now implemented by two contiguous memory regions rather than a single one. For now, this only adds the possibility of 64-bit windows. The default configuration still uses the legacy mode. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com>	2016-10-16 12:03:09 +11:00
David Gibson	2efff1c0dd	spapr: Adjust placement of PCI host bridge to allow > 1TiB RAM Currently the default PCI host bridge for the 'pseries' machine type is constructed with its IO windows in the 1TiB..(1TiB + 64GiB) range in guest memory space. This means that if > 1TiB of guest RAM is specified, the RAM will collide with the PCI IO windows, causing serious problems. Problems won't be obvious until guest RAM goes a bit beyond 1TiB, because there's a little unused space at the bottom of the area reserved for PCI, but essentially this means that > 1TiB of RAM has never worked with the pseries machine type. This patch fixes this by altering the placement of PHBs on large-RAM VMs. Instead of always placing the first PHB at 1TiB, it is placed at the next 1 TiB boundary after the maximum RAM address. Technically, this changes behaviour in a migration-breaking way for existing machines with > 1TiB maximum memory, but since having > 1 TiB memory was broken anyway, this seems like a reasonable trade-off. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com>	2016-10-16 12:03:09 +11:00
David Gibson	6737d9ad79	spapr_pci: Delegate placement of PCI host bridges to machine type The 'spapr-pci-host-bridge' represents the virtual PCI host bridge (PHB) for a PAPR guest. Unlike on x86, it's routine on Power (both bare metal and PAPR guests) to have numerous independent PHBs, each controlling a separate PCI domain. There are two ways of configuring the spapr-pci-host-bridge device: first it can be done fully manually, specifying the locations and sizes of all the IO windows. This gives the most control, but is very awkward with 6 mandatory parameters. Alternatively just an "index" can be specified which essentially selects from an array of predefined PHB locations. The PHB at index 0 is automatically created as the default PHB. The current set of default locations causes some problems for guests with large RAM (> 1 TiB) or PCI devices with very large BARs (e.g. big nVidia GPGPU cards via VFIO). Obviously, for migration we can only change the locations on a new machine type, however. This is awkward, because the placement is currently decided within the spapr-pci-host-bridge code, so it breaks abstraction to look inside the machine type version. So, this patch delegates the "default mode" PHB placement from the spapr-pci-host-bridge device back to the machine type via a public method in sPAPRMachineClass. It's still a bit ugly, but it's about the best we can do. For now, this just changes where the calculation is done. It doesn't change the actual location of the host bridges, or any other behaviour. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com>	2016-10-16 12:03:09 +11:00
Benjamin Herrenschmidt	cc706a5305	ppc/xics: Make the ICSState a list Instead of an array of fixed sized blocks, use a list, as we will need to have sources with variable number of interrupts. SPAPR only uses a single entry. Native will create more. If performance becomes an issue we can add some hashed lookup but for now this will do fine. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [ move the initialization of list to xics_common_initfn, restore xirr_owner after migration and move restoring to icp_post_load] Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com> [ clg: removed the icp_post_load() changes from nikunj patchset v3: http://patchwork.ozlabs.org/patch/646008/ ] Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-10-14 16:31:02 +11:00
Michael Roth	672de881e9	spapr: fix inheritance chain for default machine options Rather than machine instances having backward-compatible option defaults that need to be repeatedly re-enabled for every new machine type we introduce, we set the defaults appropriate for newer machine types, then add code to explicitly disable instance options as needed to maintain compatibility with older machine types. Currently pseries-2.5 does not inherit from pseries-2.6 in this fashion, which is okay at the moment since we do not have any instance compatibility options for pseries-2.6+ currently. We will make use of this in future patches though, so fix it here. Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> [dwg: Extended to make 2.7 inherit from 2.8 as well] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-10-14 15:33:32 +11:00
Igor Mammedov	6bea1ddf8b	numa: reduce code duplication by adding helper numa_get_node_for_cpu() Replace repeated pattern for (i = 0; i < nb_numa_nodes; i++) { if (test_bit(idx, numa_info[i].node_cpu)) { ... break; with a helper function to lookup numa node index for cpu. Suggested-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Shannon Zhao <shannon.zhao@linaro.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2016-10-10 01:16:57 +03:00
Thomas Huth	3daa4a9f95	hw/ppc/spapr: Use POWER8 by default for the pseries-2.8 machine A couple of distributors are compiling their distributions with "-mcpu=power8" for ppc64le these days, so the user sooner or later runs into a crash there when not explicitely specifying the "-cpu POWER8" option to QEMU (which is currently using POWER7 for the "pseries" machine by default). Due to this reason, the linux-user target already switched to POWER8 a while ago (see commit `de3f1b9841`). Since the softmmu target of course has the same problem, we should switch there to POWER8 for the newer machine types, too. Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-10-06 16:15:53 +11:00
Greg Kurz	e17a87792d	spapr: fix check of cpu alias name in spapr_get_cpu_core_type() If the user passes an alias name and a property to -cpu, QEMU fails to find the CPU definition and exits. $ qemu-system-ppc64 -cpu POWER8E,compat=power7 qemu-system-ppc64: Unable to find sPAPR CPU Core definition This happens because spapr_get_cpu_core_type() passes the full string from the command line (i.e. "POWER8E,compat=power7") to ppc_cpu_lookup_alias(), instead of the alias name piece only (i.e. "POWER8E"). The fix is to pass model_pieces[0] to ppc_cpu_lookup_alias(). Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-10-06 16:15:53 +11:00
Thomas Huth	bac3bf287a	ppc: Check the availability of transactional memory KVM-PR currently does not support transactional memory, and the implementation in TCG is just a fake. We should not announce TM support in the ibm,pa-features property when running on such a system, so disable it by default and only enable it if the KVM implementation supports it (i.e. recent versions of KVM-HV). These changes are based on some earlier work from Anton Blanchard (thanks!). Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-10-05 11:05:28 +11:00
Thomas Huth	4cbec30d76	hw/ppc/spapr: Fix the selection of the processor features The current code uses pa_features_206 for POWERPC_MMU_2_06, and for everything else, it uses pa_features_207. This is bad in some cases because there is also a "degraded" MMU version of ISA 2.06, called POWERPC_MMU_2_06a, which should of course use the flags for 2.06 instead. And there is also the possibility that the user runs the pseries machine with a POWER5+ or even 970 processor. In that case we certainly do not want to set the flags for 2.07, and rather simply skip the setting of the pa-features property instead. Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-10-05 11:05:28 +11:00
Thomas Huth	230bf719d3	hw/ppc/spapr: Move code related to "ibm,pa-features" to a separate function The function spapr_populate_cpu_dt() has become quite big already, and since we likely have to extend the pa-features property for every new processor generation, it is nicer if we put the related code into a separate function. Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-10-05 11:05:28 +11:00
David Gibson	db800b21d8	pseries: Add 2.8 machine type, set up compatibility macros Now that 2.7 is released, create the pseries-2.8 machine type and add the boilerplate compatiblity macro stuff. There's nothing new to put into the 2.7 compatiliby properties yet, but we'll need something eventually, so we might as well get it ready now. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-10-05 11:05:28 +11:00
Peter Maydell	c640f2849e	* thread-safe tb_flush (Fred, Alex, Sergey, me, Richard, Emilio,... :-) * license clarification for compiler.h (Felipe) * glib cflags improvement (Marc-André) * checkpatch silencing (Paolo) * SMRAM migration fix (Paolo) * Replay improvements (Pavel) * IOMMU notifier improvements (Peter) * IOAPIC now defaults to version 0x20 (Peter) -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQExBAABCAAbBQJX6kKUFBxwYm9uemluaUByZWRoYXQuY29tAAoJEL/70l94x66D M1UIAKCQ7XfWDoClYd1TyGZ+Qj3K3TrjwLDIl/Z258euyeZ9p7PpqYQ64OCRsREJ fsGQOqkFYDe7gi4epJiJOuu4oAW7Xu8G6lB2RfBd7KWVMhsl3Che9AEom7amzyzh yoN+g9gwKfAmYwpKyjYWnlWOSjUvif6o0DaTCQCMTaAoEM3b4HKdgHfr6A2dA/E/ 47rtIVp/jNExmrZkaOjnCDS1DJ8XYT3aVeoTkuzRFQ3DBzrAiPABn6B4ExP8IBcJ YLFX/W8xG7F3qyXbKQOV/uYM25A55WS5B0G94ZfSlDtUGa/avzS7df9DFD/IWQT+ RpfiyDdeJueByiTw9R0ZYxFjhd8= =g7xm -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging * thread-safe tb_flush (Fred, Alex, Sergey, me, Richard, Emilio,... :-) * license clarification for compiler.h (Felipe) * glib cflags improvement (Marc-André) * checkpatch silencing (Paolo) * SMRAM migration fix (Paolo) * Replay improvements (Pavel) * IOMMU notifier improvements (Peter) * IOAPIC now defaults to version 0x20 (Peter) # gpg: Signature made Tue 27 Sep 2016 10:57:40 BST # gpg: using RSA key 0xBFFBD25F78C7AE83 # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * remotes/bonzini/tags/for-upstream: (28 commits) replay: allow replay stopping and restarting replay: vmstate for replay module replay: move internal data to the structure cpus-common: lock-free fast path for cpu_exec_start/end tcg: Make tb_flush() thread safe cpus-common: Introduce async_safe_run_on_cpu() cpus-common: simplify locking for start_exclusive/end_exclusive cpus-common: remove redundant call to exclusive_idle() cpus-common: always defer async_run_on_cpu work items docs: include formal model for TCG exclusive sections cpus-common: move exclusive work infrastructure from linux-user cpus-common: fix uninitialized variable use in run_on_cpu cpus-common: move CPU work item management to common code cpus-common: move CPU list management to common code linux-user: Add qemu_cpu_is_self() and qemu_cpu_kick() linux-user: Use QemuMutex and QemuCond cpus: Rename flush_queued_work() cpus: Move common code out of {async_, }run_on_cpu() cpus: pass CPUState to run_on_cpu helpers build-sys: put glib_cflags in QEMU_CFLAGS ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2016-09-28 23:02:56 +01:00
David Gibson	4f01a63779	sysbus: Remove ignored return value of FindSysbusDeviceFunc Functions of type FindSysbusDeviceFunc currently return an integer. However, this return value is always ignored by the caller in find_sysbus_device(). This changes the function type to return void, to avoid confusion over the function semantics. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2016-09-27 17:03:34 -03:00
Alex Bennée	e0eeb4a21a	cpus: pass CPUState to run_on_cpu helpers CPUState is a fairly common pointer to pass to these helpers. This means if you need other arguments for the async_run_on_cpu case you end up having to do a g_malloc to stuff additional data into the routine. For the current users this isn't a massive deal but for MTTCG this gets cumbersome when the only other parameter is often an address. This adds the typedef run_on_cpu_func for helper functions which has an explicit CPUState * passed as the first parameter. All the users of run_on_cpu and async_run_on_cpu have had their helpers updated to use CPUState where available. Signed-off-by: Alex Bennée <alex.bennee@linaro.org> [Sergey Fedorov: - eliminate more CPUState in user data; - remove unnecessary user data passing; - fix target-s390x/kvm.c and target-s390x/misc_helper.c] Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org> Acked-by: David Gibson <david@gibson.dropbear.id.au> (ppc parts) Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> (s390 parts) Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Message-Id: <1470158864-17651-3-git-send-email-alex.bennee@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-09-27 11:57:29 +02:00
Peter Xu	5bf3d31903	memory: introduce IOMMUOps.notify_flag_changed The new interface can be used to replace the old notify_started() and notify_stopped(). Meanwhile it provides explicit flags so that IOMMUs can know what kind of notifications it is requested for. Acked-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <1474606948-14391-3-git-send-email-peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-09-27 09:00:04 +02:00
Peter Maydell	c229472af0	ppc patch queue 2016-09-23 This pull request supersedes ppc-for-2.8-20160922. There was a clang build error in that, and I've also added one extra patch in the new pull. Included in this set of ppc and spapr patches are: * TCG implementations for more POWER9 instructions * Some preliminary XICS fixes in preparataion for the pnv machine type * A significant ADB (Macintosh kbd/mouse) cleanup * Some conversions to use trace instead of debug macros * Fixes to correctly handle global TLB flush synchronization in TCG. This is already a bug, but it will have much more impact when we get MTTCG * Add more qtest testcases for Power * Some MAINTAINERS updates * Assorted bugfixes * Add the basics of NUMA associativity to the spapr PCI host bridge This touches some test files and monitor.c which are technically outside the ppc code, but coming through this tree because the changes are primarily of interest to ppc. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABCAAGBQJX5NZnAAoJEGw4ysog2bOSoLEP/1YpRFG/6gmiT+T+Btz1QYcd eqrJkV63/rY/lvgZOvUBdqA/YKaBSWDOEByNFRZ+Grqz9h5zKrRcmM7IWdRWg+vG gyrZUm1pscFG20iGNcenxB8mD0VMk7C77gnUlv12bo+mK+1D1i8eUfKLFqxb0kOx JGIRQNG5orF5vZxsyjRPVpvMS9gNG90vrPIypux4ryozCVMWbrjXRZNsPQKz8wb9 UGcJIFB6R6JVbmBGchi434PEJkcdZzP/a0HvVSO51oGsFBnwYwQ7XVc3PyA4KCD7 tTbm6T2Rpdak3Pcd/nuzoXCMBCkh48XGKxZ+yPuLXGG5ZGIZ6rzlHPqBsEqqiLz5 DLzbsxKyLHX2Af87js4J9OXkoNQI4rVGurvNbkQ7IMQ2/Xt97kgUEgr3W0Vj+r82 bqIqWm4OdJ9cDzTGVlQ7l2vLv6RMe7DrkeWRNEKZZgfir7Hgj1gr79BOe96ETKBd 7r/1z0fBkZoWSq2OdjX8RouXMwd1Nq3FnqYv2BQ99rvM/AqpkY0HYsPIfUilHq6T ZXhvm/4LIEev0F/GiJvV5jHHg637QS4QqdyglF8ODC8vSMvOThhL9Gj7EMgJs7hj Ywt1B5y88//Zq4+IGVda98J5ynOZO1CArvzoYR5UMnWiq2K0Lxpq7wemE/finyIK 0jWLqlmCmYRzsS+oQEg/ =et1C -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.8-20160923' into staging ppc patch queue 2016-09-23 This pull request supersedes ppc-for-2.8-20160922. There was a clang build error in that, and I've also added one extra patch in the new pull. Included in this set of ppc and spapr patches are: * TCG implementations for more POWER9 instructions * Some preliminary XICS fixes in preparataion for the pnv machine type * A significant ADB (Macintosh kbd/mouse) cleanup * Some conversions to use trace instead of debug macros * Fixes to correctly handle global TLB flush synchronization in TCG. This is already a bug, but it will have much more impact when we get MTTCG * Add more qtest testcases for Power * Some MAINTAINERS updates * Assorted bugfixes * Add the basics of NUMA associativity to the spapr PCI host bridge This touches some test files and monitor.c which are technically outside the ppc code, but coming through this tree because the changes are primarily of interest to ppc. # gpg: Signature made Fri 23 Sep 2016 08:14:47 BST # gpg: using RSA key 0x6C38CACA20D9B392 # gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>" # gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>" # gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>" # gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>" # Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392 * remotes/dgibson/tags/ppc-for-2.8-20160923: (45 commits) spapr_pci: Add numa node id monitor: fix crash for platforms without a CPU 0 linux-user: ppc64: fix ARCH_206 bit in AT_HWCAP ppc/kvm: Mark 64kB page size support as disabled if not available ppc/xics: An ICS with offset 0 is assumed to be uninitialized ppc/xics: account correct irq status Enable H_CLEAR_MOD and H_CLEAR_REF hypercalls on KVM/PPC64. target-ppc: tlbie/tlbivax should have global effect target-ppc: add flag in check_tlb_flush() target-ppc: add TLB_NEED_LOCAL_FLUSH flag spapr: Introduce sPAPRCPUCoreClass target-ppc: implement darn instruction target-ppc: add stxsi[bh]x instruction target-ppc: add lxsi[bw]zx instruction target-ppc: add xxspltib instruction target-ppc: consolidate store conditional target-ppc: move out stqcx impementation target-ppc: consolidate load with reservation target-ppc: convert st[16,32,64]r to use new macro target-ppc: convert st64 to use new macro ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2016-09-23 14:26:12 +01:00
Fam Zheng	9c5ce8db2e	vl: Switch qemu_uuid to QemuUUID Update all qemu_uuid users as well, especially get rid of the duplicated low level g_strdup_printf, sscanf and snprintf calls with QEMU UUID API. Since qemu_uuid_parse is quite tangled with qemu_uuid, its switching to QemuUUID is done here too to keep everything in sync and avoid code churn. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-Id: <1474432046-325-10-git-send-email-famz@redhat.com>	2016-09-23 11:42:52 +08:00
Alexey Kardashevskiy	4814401fa0	spapr_pci: Add numa node id This adds a numa id property to a PHB to allow linking passed PCI device to CPU/memory. It is up to the management stack to do CPU/memory pinning to the node with the actual PCI device. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> [dwg: Renamed property from "node" to "numa_node" to match the similar one in the pxb device] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-09-23 12:39:07 +10:00
Nathan Whitehorn	5145ad4fad	Enable H_CLEAR_MOD and H_CLEAR_REF hypercalls on KVM/PPC64. These are mandatory per PAPR and available on Linux 4.3 and newer kernels. The calls in question are required to run FreeBSD guests with reasonable performance, so enable them if possible. Signed-off-by: Nathan Whitehorn <nwhitehorn@freebsd.org> [dwg: Added a stub to fix compile without KVM (e.g. on x86 host)] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-09-23 12:39:07 +10:00
Nikunj A Dadhania	d76ab5e1c7	target-ppc: tlbie/tlbivax should have global effect tlbie (BookS) and tlbivax (BookE) plus the H_CALLs(pseries) should have a global effect. Introduces TLB_NEED_GLOBAL_FLUSH flag. During lazy tlb flush, after taking care of pending local flushes, check broadcast flush(at context synchronizing event ptesync/tlbsync, etc) is needed. Depending on the bitmask state of the tlb_need_flush, tlb is flushed from other cpus if needed and the flags are cleared. Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> [dwg: Use 'true' instead of '1' for call to check_tlb_flush()] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-09-23 12:39:07 +10:00
Nikunj A Dadhania	e3cffe6fad	target-ppc: add flag in check_tlb_flush() We flush the qemu TLB lazily. check_tlb_flush is called whenever we hit a context synchronizing event or instruction that requires a pending flush to be performed. However, we fail to handle broadcast TLB flush operations. In order to fix that efficiently, we want to differentiate whether check_tlb_flush() needs to only apply pending local flushes (isync instructions, interrupts, ...) or also global pending flush operations. The latter is only needed when executing instructions that are defined architecturally as synchronizing global TLB flush operations. This in our case is ptesync on BookS and tlbsync on BookE along with the paravirtualized hypervisor calls. Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com> [dwg: Changed gen_check_tlb_flush() to also take a bool, and fixed some spelling errors in commit message] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-09-23 12:39:07 +10:00
Bharata B Rao	7ebaf79556	spapr: Introduce sPAPRCPUCoreClass Each spapr cpu core type defines an instance_init routine which just populates the CPU class name. This can be done in the class_init commonly for all core types which simplifies the registration. This is inspired by how PowerNV core types are registered. Certain types of spapr cpu cores ('host' and generic type based on host CPU) are initialized in target-ppc/kvm.c. To convert these type registrations to use class_init, we need to expose spapr_cpu_core_class_init() outside of spapr_cpu_core.c. Commit `d11b268e17` added a generic sPAPR CPU core family type to support cases like POWER8 CPU type on POWER8E host CPU. Switching to class_init would fix such scenarios to use the right CPU thread type instead of defaulting to host-powerpc64-cpu. In an unrelated cleanup, fix a typo in .get_hotplug_handler routine. Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-09-23 12:39:06 +10:00
Laurent Vivier	7ab6a501c6	spapr_vio: convert to trace framework instead of DPRINTF Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-09-23 10:29:40 +10:00
Laurent Vivier	028ec3cee3	spapr_rtas: convert to trace framework instead of DPRINTF Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-09-23 10:29:40 +10:00
Laurent Vivier	24ac7755d7	spapr_drc: convert to trace framework instead of DPRINTF Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-09-23 10:29:40 +10:00
Laurent Vivier	eeddd59f59	tests: add RTAS command in the protocol Add a first test to validate the protocol: - rtas/get-time-of-day compares the time from the guest with the time from the host. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-09-23 10:29:40 +10:00
Ladi Prosek	d4b84d564e	Remove unused function declarations Unused function declarations were found using a simple gcc plugin and manually verified by grepping the sources. Signed-off-by: Ladi Prosek <lprosek@redhat.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2016-09-15 15:32:22 +03:00
Cédric Le Goater	3654fa95bc	hw/ppc: add a ppc_create_page_sizes_prop() helper routine The exact same routine will be used in PowerNV. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-09-07 12:40:12 +10:00
Cédric Le Goater	ce9863b797	hw/ppc: use error_report instead of fprintf Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-09-07 12:40:12 +10:00
Cédric Le Goater	7804c353a9	hw/ppc: include fdt helper routine in a common file spapr_pci would also be a good candidate but the macro _FDT is slightly different. It returns and does not exit. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-09-07 09:52:14 +10:00
Peter Maydell	f3b9e787ae	ppc patch queue for 2016-08-15 Just a single patch here, I hope this is the last ppc / spapr fix to squeeze into qemu-2.7. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJXsWVMAAoJEGw4ysog2bOSxpkQAKCybBBMbQ6viEeqZBNtrleC whKm6WhN5AZvxb1W/NzacrpwXPHCM8C9+jZRIpea3ucHn5ijyRPCE73gBZLcyV6h CRFisJQ2NT9gq4iCw0Iw1TwxL+tt6xw2dPr3+mKQpJuUHbcKK8hO5EhZLe/dr+u7 54j2l+EgqhokTjLJuD7GEa/qca1qSsae/Q0HvIThcA4h4jX5RtpMHNSpbh6PJ8fI dxlcHnjtfei75ptMMqrP+YZ+HPEuiqOqLSVKmcEsjJblKABk7SW7RjbW4Jk8dKYo Z8VA+MOP+eLrbjYOPJHROHK80Ik6hg3NH/4/tduZM0hsOeFV2i9AyMR1n/Qhkpyu xEi8Ld+wcVun8NFWV2dj/m/RAE/BgZ1non3wddxVIog8W2R/+PMIfMdVOWt3pRMj KS/1kkCzKYHWFO18FTpxGfFLsdiNo1szjtJydjfAGd5RvectDm6bBguz0ZwgDPSo 338I7uIFB7h4L/DwMFcPSYTRTSyrvE5MsxcwpQoS4OB5ZKrKGLrqLG9cy0XvO9sO ImHRMT/YMnD9qiXXnuzmHCg8XgRPyfbxdml6EkxcIDJn9wsINDRdvN9GZ33vDUgT CBy7xqxRlYJ+MXFJP5S6dyzM6mqtwy8MFDqlcDvIzNDl5GEAyVJHjQdtUu/t3cRx OzQ0bArG7WeIK2norvwL =Jm4E -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.7-20160815' into staging ppc patch queue for 2016-08-15 Just a single patch here, I hope this is the last ppc / spapr fix to squeeze into qemu-2.7. # gpg: Signature made Mon 15 Aug 2016 07:46:36 BST # gpg: using RSA key 0x6C38CACA20D9B392 # gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>" # gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>" # gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>" # gpg: WARNING: This key is not certified with sufficiently trusted signatures! # gpg: It is not certain that the signature belongs to the owner. # Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392 * remotes/dgibson/tags/ppc-for-2.7-20160815: ppc: parse cpu features once Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2016-08-15 21:48:03 +01:00
Greg Kurz	e703d2f71c	ppc: parse cpu features once Considering that features are converted to global properties and global properties are automatically applied to every new instance of created CPU (at object_new() time), there is no point in parsing cpu_model string every time a CPU created. So move parsing outside CPU creation loop and do it only once. Parsing also should be done before any CPU is created so that features would affect the first CPU a well. This patch does that for all PowerPC machine types. It is based on previous work from Bharata: https://lists.nongnu.org/archive/html/qemu-devel/2016-06/msg07564.html Signed-off-by: Greg Kurz <groug@kaod.org> [clg: only kept the fix for the spapr platform. support for other platform will be added in 2.8 ] Signed-off-by: Cédric Le Goater <clg@kaod.org> Tested-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-08-13 17:32:58 +10:00
Laurent Vivier	e723b87103	trace-events: fix first line comment in trace-events Documentation is docs/tracing.txt instead of docs/trace-events.txt. find . -name trace-events -exec \ sed -i "s?See docs/trace-events.txt for syntax documentation.?See docs/tracing.txt for syntax documentation.?" \ {} \; Signed-off-by: Laurent Vivier <lvivier@redhat.com> Message-id: 1470669081-17860-1-git-send-email-lvivier@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-08-12 10:36:01 +01:00
Thomas Huth	4babfaf05d	hw/ppc/spapr: Look up CPU alias names instead of hard-coding the aliases Hard-coding the CPU alias names in the spapr_cores[] array has two big disadvantages: 1) We register a real type with the CPU alias name in spapr_cpu_core_register_types() - this prevents us from registering a CPU family name in kvm_ppc_register_host_cpu_type() with the same name (as we do it for the non-hotpluggable CPU types). 2) It's quite cumbersome to maintain the aliases here in sync with the ppc_cpu_aliases list from target-ppc/cpu-models.c. So let's simply add proper alias lookup to the spapr cpu core code, too (by checking whether the given model can be used directly, and if not by trying to look up the given model as an alias name instead). Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-08-10 13:12:20 +10:00
Cédric Le Goater	caebf37859	spapr: remove extra type variable The sPAPR CPU core typename is already available in the upper block. Let's use it and move the check upward also. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-08-10 13:12:20 +10:00
Peter Maydell	f5edfcfafb	Error reporting patches for 2016-08-08 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJXqDFpAAoJEDhwtADrkYZTlQMQALZDzjoYJQlmcLQu92O52a3L XlluF82W4Y6jOLR6u/eRsP4uok/C3FA23SMtw7CfPLJZbet/PDKLS4N7J0m4mrqa mGmBT/9ZY7jVeISJz4X7WW7chgFR0JF2rOUpEmQPvzrEYYY7cTd4DwHpb0UB1f7W /H3i55vkVUCpSeib8Ah/MgzYGdgv1ZVmh0X+IsEwd42J8f4nv8y3YSPO8J/DPooY hfHVikObX/LIx1yItFkKWzA2JW+nSLvBMXYtbvVUkVkDXwQYcHJcAKhYPzdiE6Iy GTSrnwXCW/4ckic/AumZ1WNTbcK5tp9FtdI/li4JzZHoJ/pWo0lt+BWCTmQOFCvs f0Vqza5Ux3B+hvCYM+ulmydnEGZVopc51u8cqEKGzYE2VrxJ0A63lqWCzm5F9gQj cE/546oiTa9pm4DDTfB064+Chzq1ao4AWga2yol7IWBvljkQZ7j+I620l5xv5Xaa WLhIDZg4e6EwViNtta73Fo3y8HqlvHTiPh3Gpfgvrnc7hocL7im3yh8O1RSOUCdY 4aUmWonDg4zKPb2u9nkerWBCDM4s0p5rNTYmntJtoVIlsFvcUm/3yzVipdWyz5AX y9xLc3FqVfE2Kfw1qJHlw5fx7FegFJCfGzsa1xBZfL1qC9bfU1XGqj4fnyIbQ8pE WWrWL7bGuzSWZsQ2+JBT =FNBu -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/armbru/tags/pull-error-2016-08-08' into staging Error reporting patches for 2016-08-08 # gpg: Signature made Mon 08 Aug 2016 08:14:49 BST # gpg: using RSA key 0x3870B400EB918653 # gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" # gpg: aka "Markus Armbruster <armbru@pond.sub.org>" # Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867 4E5F 3870 B400 EB91 8653 * remotes/armbru/tags/pull-error-2016-08-08: error: Fix error_printf() calls lacking newlines vfio: Use error_report() instead of error_printf() for errors checkpatch: Fix newline detection in error_setg() & friends error: Strip trailing '\n' from error string arguments (again) Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2016-08-08 13:25:35 +01:00
Markus Armbruster	df3c286c53	error: Strip trailing '\n' from error string arguments (again) Commit `9af9e0f`, `6daf194d`, `be62a2eb` and `312fd5f` got rid of a bunch, but they keep coming back. checkpatch.pl tries to flag them since commit `5d596c2`, but it's not very good at it. Offenders tracked down with Coccinelle script scripts/coccinelle/err-bad-newline.cocci, an updated version of the script from commit `312fd5f`. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <1470224274-31522-2-git-send-email-armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2016-08-08 09:00:44 +02:00
David Gibson	57c0eb1e0d	spapr: Fix undefined behaviour in spapr_tce_reset() When a TCE table (sPAPR IOMMU context) is in disabled state (which is true by default for the 64-bit window), it has tcet->nb_table == 0 and tcet->table == NULL. However, on system reset, spapr_tce_reset() executes, which unconditionally calls memset(tcet->table, 0, table_size); We get away with this in practice, because it's a zero length memset(), but memset() on a NULL pointer is undefined behaviour, so we should not call it in this case. Reported-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-08-08 10:06:25 +10:00
David Gibson	3c0c47e346	spapr: Correctly set query_hotpluggable_cpus hook based on machine version Prior to `c8721d3` "spapr: Error out when CPU hotplug is attempted on older pseries machines", attempting to use query-hotpluggable-cpus on pseries-2.6 and earlier machine types would SEGV. That change fixed that, but due to some unexpected interactions in init order and a brown-paper-bag worthy failure to test, it accidentally disabled query-hotpluggable-cpus for all pseries machine types, including the current one which should allow it. In fact, query_hotpluggable_cpus needs to be non-NULL when and only when the dr_cpu_enabled flag in sPAPRMachineClass is set, which makes dr_cpu_enabled itself redundant. This patch removes dr_cpu_enabled, instead directly setting query_hotpluggable_cpus from the machine class_init functions, and using that to determine the availability of CPU hotplug when necessary. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-08-08 09:45:03 +10:00
Bharata B Rao	c8721d3599	spapr: Error out when CPU hotplug is attempted on older pseries machines CPU hotplug and coldplug aren't supported prior to pseries-2.7. Further, earlier machine types don't use CPU core objects at all. These mean that query-hotpluggable-cpus and coldplug on older pseries machines will crash QEMU. It also means that hotpluggable_cpus flag in query-machines will be incorrectly set to true for pseries < 2.7, since it is based on the presence of the query_hotpluggable_cpus hook. - Don't assign the query_hotpluggable_cpus hook for pseries < 2.7 - query_hotpluggable_cpus should therefore never be called on pseries < 2.7, so add an assert - spapr_core_pre_plug() should fail hot/cold plug attempts for pseries < 2.7, since core objects are never used there - spapr_core_plug() should therefore never be called for pseries < 2.7, so add an assert. Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> [dwg: Change from query_hotpluggable_cpus returning NULL for pseries < 2.7 to not being called at all, reword commit message for accuracy] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-08-03 13:08:54 +10:00
Bharata B Rao	62be8b044a	spapr: Prevent boot CPU core removal Boot CPU is assumed to be always present in QEMU code. So until that assumptions are gone, deny removal request. In another words, QEMU won't support boot CPU core hot-unplug. Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> [dwg: Tweaked error message for clarity] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-07-29 12:02:31 +10:00
David Gibson	7cdd76132a	Revert "spapr: Ensure CPU cores are added contiguously and removed in LIFO order" This reverts commit `5cbc64de25`. Now that we have stable cpu_index values for pseries-2.7 (and future) machine types, we can now safely allow hotplug and unplug in any order. Conflicts: hw/ppc/spapr_cpu_core.c Some conflicts on revert due to some small changes in the inserted code since the original commit. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-07-29 12:02:31 +10:00
Igor Mammedov	b63578bdb5	spapr: init CPUState->cpu_index with index relative to core-id It will enshure that cpu_index for a given cpu stays the same regardless of the order cpus has been created/deleted and so it would be possible to migrate QEMU instance with out of order created CPU. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-07-29 12:02:31 +10:00
Greg Kurz	12bf2d33fe	spapr: disintricate core-id from DT semantics The goal of this patch is to have a stable core-id which does not depend on any DT related semantics, which involve non-obvious computations on modern PowerPC server cpus. With this patch, the DT core id is computed on-demand as: (core-id / smp_threads) * smt where smt is the number of threads per core in the host. This formula should be consolidated in a helper since it is needed in several places. Other uses for core-id includes: compute a stable cpu_index (which allows random order hotplug/unplug without breaking migration) and NUMA. Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-07-25 15:43:41 +10:00
Thomas Huth	c573fc03da	hw/ppc/spapr: Make sure to close the htab_fd when migration is canceled When canceling a migration process, we currently do not close the HTAB migration file descriptor since htab_save_complete() is never called in that case. So we leave the migration process with a dangling htab_fd value around, and this causes any further migration attempts to fail. To fix this issue, simply make sure that the htab_fd is closed during the migration cleanup stage. And since the cleanup() function is also called when migration succeeds, we can also remove the call to close_htab_fd() from the htab_save_complete() function. Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1354341 Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-07-25 10:19:30 +10:00
Bharata B Rao	5cbc64de25	spapr: Ensure CPU cores are added contiguously and removed in LIFO order If CPU core addition or removal is allowed in random order leading to holes in the core id range (and hence in the cpu_index range), migration can fail as migration with holes in cpu_index range isn't yet handled correctly. Prevent this situation by enforcing the addition in contiguous order and removal in LIFO order so that we never end up with holes in cpu_index range. Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-07-18 10:40:27 +10:00
Greg Kurz	44d691f7d9	spapr: fix core unplug crash If the host has 8 threads/core and the guest is started with: -smp cores=1,threads=4,maxcpus=12 It is possible to crash QEMU by doing: (qemu) device_add host-spapr-cpu-core,core-id=16,id=foo (qemu) device_del foo Segmentation fault This happens because spapr_core_unplug() assumes cpu_dt_id == core_id. As long as cpu_dt_id is derived from the non-table cpu_index, this is only true when you plug cores with contiguous ids. It is safer to be consistent: the DR connector was created with an index that is immediately written to cc->core_id, and spapr_core_plug() also relies on cc->core_id. Let's use it also in spapr_core_unplug(). Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-07-18 10:40:27 +10:00
Markus Armbruster	2a6a4076e1	Clean up ill-advised or unusual header guards Cleaned up with scripts/clean-header-guards.pl. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Richard Henderson <rth@twiddle.net>	2016-07-12 16:20:46 +02:00
Markus Armbruster	121d07125b	Clean up header guards that don't match their file name Header guard symbols should match their file name to make guard collisions less likely. Offenders found with scripts/clean-header-guards.pl -vn. Cleaned up with scripts/clean-header-guards.pl, followed by some renaming of new guard symbols picked by the script to better ones. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Richard Henderson <rth@twiddle.net>	2016-07-12 16:19:16 +02:00
Markus Armbruster	a9c94277f0	Use #include "..." for our own headers, <...> for others Tracked down with an ugly, brittle and probably buggy Perl script. Also move includes converted to <...> up so they get included before ours where that's obviously okay. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Tested-by: Eric Blake <eblake@redhat.com> Reviewed-by: Richard Henderson <rth@twiddle.net>	2016-07-12 16:19:16 +02:00
Eric Blake	1158bb2a05	qapi: Add parameter to visit_end_* Rather than making the dealloc visitor track of stack of pointers remembered during visit_start_* in order to free them during visit_end_, it's a lot easier to just make all callers pass the same pointer to visit_end_. The generated code has access to the same pointer, while all other users are doing virtual walks and can pass NULL. The dealloc visitor is then greatly simplified. All three visit_end_() functions intentionally take a void, even though the visit_start_() functions differ between void, GenericList, and GenericAlternate*. This is done for several reasons: when doing a virtual walk, passing NULL doesn't care what the type is, but when doing a generated walk, we already have to cast the caller's specific FOO to call visit_start, while using void** lets us use visit_end without a cast. Also, an upcoming patch will add a clone visitor that wants to use the same implementation for all three visit_end callbacks, which is made easier if all three share the same signature. For visitors with already track per-object state (the QMP visitors via a stack, and the string visitors which do not allow nesting), add an assertion that the caller is indeed passing the same pointer to paired calls. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <1465490926-28625-4-git-send-email-eblake@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>	2016-07-06 10:52:04 +02:00
Peter Maydell	791b7d2340	pc, pci, virtio: new features, cleanups, fixes iommus can not be added with -device. cleanups and fixes all over the place Signed-off-by: Michael S. Tsirkin <mst@redhat.com> -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJXe4l4AAoJECgfDbjSjVRpIz4IALye7mKG61/POA4Gqmhalc3d HnlNSZ2YcKAuvPg7WWkBuRacrQvVY/MbW1mLloG1lY0tdFgZG8Cy+CY6wJg1NE4c cXd+77vHkIyrnl+Nil+QOgTFiAsMnD+mXHHsnCDw2jGn3JbgVNuCMi7V34fGkQd2 PDkZyYfwTqO3HytuG0/j2Somc9du1gjYdn+9qigfZVgP96jGDojBuJWuuU5flCB3 Kj5xrOuI01XlbdTk71tVjBJBektQurWr6r7GECDqZIpUfc+BI70FU9jPh+OlLTO/ 92yi29ncjyStz4tRnf18xoQ8uBgH/tD1xigEUPRtnm1+0i/tgONBL8cAdBF9FBE= =ABGE -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging pc, pci, virtio: new features, cleanups, fixes iommus can not be added with -device. cleanups and fixes all over the place Signed-off-by: Michael S. Tsirkin <mst@redhat.com> # gpg: Signature made Tue 05 Jul 2016 11:18:32 BST # gpg: using RSA key 0x281F0DB8D28D5469 # gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" # gpg: aka "Michael S. Tsirkin <mst@redhat.com>" # Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67 # Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469 * remotes/mst/tags/for_upstream: (30 commits) vmw_pvscsi: remove unnecessary internal msi state flag e1000e: remove unnecessary internal msi state flag vmxnet3: remove unnecessary internal msi state flag mptsas: remove unnecessary internal msi state flag megasas: remove unnecessary megasas_use_msi() pci: Convert msi_init() to Error and fix callers to check it pci bridge dev: change msi property type megasas: change msi/msix property type mptsas: change msi property type intel-hda: change msi property type usb xhci: change msi/msix property type change pvscsi_init_msi() type to void tests: add APIC.cphp and DSDT.cphp blobs tests: acpi: add CPU hotplug testcase log: Permit -dfilter 0..0xffffffffffffffff range: Replace internal representation of Range range: Eliminate direct Range member access log: Clean up misuse of Range for -dfilter pci_register_bar: cleanup Revert "virtio-net: unbreak self announcement and guest offloads after migration" ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2016-07-05 16:48:24 +01:00
Benjamin Herrenschmidt	912acdf487	ppc/hash64: Add proper real mode translation support This adds proper support for translating real mode addresses based on the combination of HV and LPCR bits. This handles HRMOR offset for hypervisor real mode, and both RMA and VRMA modes for guest real mode. PAPR mode adjusts the offsets appropriately to match the RMA used in TCG, but we need to limit to the max supported by the implementation (16G). This includes some fixes by Cédric Le Goater <clg@kaod.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [dwg: Adjusted for differences in my version of the prereq patches] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-07-05 14:31:08 +10:00
Cédric Le Goater	1f0252e66e	ppc: simplify ppc_hash64_hpte_page_shift_noslb() The segment page shift parameter is never used. Let's remove it. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-07-05 14:31:08 +10:00
Alexey Kardashevskiy	ae4de14cd3	spapr_pci/spapr_pci_vfio: Support Dynamic DMA Windows (DDW) This adds support for Dynamic DMA Windows (DDW) option defined by the SPAPR specification which allows to have additional DMA window(s) The "ddw" property is enabled by default on a PHB but for compatibility the pseries-2.6 machine and older disable it. This also creates a single DMA window for the older machines to maintain backward migration. This implements DDW for PHB with emulated and VFIO devices. The host kernel support is required. The advertised IOMMU page sizes are 4K and 64K; 16M pages are supported but not advertised by default, in order to enable them, the user has to specify "pgsz" property for PHB and enable huge pages for RAM. The existing linux guests try creating one additional huge DMA window with 64K or 16MB pages and map the entire guest RAM to. If succeeded, the guest switches to dma_direct_ops and never calls TCE hypercalls (H_PUT_TCE,...) again. This enables VFIO devices to use the entire RAM and not waste time on map/unmap later. This adds a "dma64_win_addr" property which is a bus address for the 64bit window and by default set to 0x800.0000.0000.0000 as this is what the modern POWER8 hardware uses and this allows having emulated and VFIO devices on the same bus. This adds 4 RTAS handlers: * ibm,query-pe-dma-window * ibm,create-pe-dma-window * ibm,remove-pe-dma-window * ibm,reset-pe-dma-window These are registered from type_init() callback. These RTAS handlers are implemented in a separate file to avoid polluting spapr_iommu.c with PCI. This changes sPAPRPHBState::dma_liobn to an array to allow 2 LIOBNs and updates all references to dma_liobn. However this does not add 64bit LIOBN to the migration stream as in fact even 32bit LIOBN is rather pointless there (as it is a PHB property and the management software can/should pass LIOBNs via CLI) but we keep it for the backward migration support. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-07-05 14:31:08 +10:00
Alexey Kardashevskiy	606b54986d	spapr_iommu: Realloc guest visible TCE table when starting/stopping listening The sPAPR TCE tables manage 2 copies when VFIO is using an IOMMU - a guest view of the table and a hardware TCE table. If there is no VFIO presense in the address space, then just the guest view is used, if this is the case, it is allocated in the KVM. However since there is no support yet for VFIO in KVM TCE hypercalls, when we start using VFIO, we need to move the guest view from KVM to the userspace; and we need to do this for every IOMMU on a bus with VFIO devices. This implements the callbacks for the sPAPR IOMMU - notify_started() reallocated the guest view to the user space, notify_stopped() does the opposite. This removes explicit spapr_tce_set_need_vfio() call from PCI hotplug path as the new callbacks do this better - they notify IOMMU at the exact moment when the configuration is changed, and this also includes the case of PCI hot unplug. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-07-05 10:43:02 +10:00
Bharata B Rao	7093645a84	spapr: Ensure thread0 of CPU core is always realized first During CPU core realization, we create all the thread objects and parent them to the core object in a loop. However, the realization of thread objects is done separately by walking the threads of a core using object_child_foreach(). With this, there is no guarantee on the order in which the child thread objects get realized. Since CPU device tree properties are currently derived from the CPU thread object, we assume thread0 of the core to be the representative thread of the core when creating device tree properties for the core. If thread0 is not the first thread that gets realized, then we would end up having an incorrect dt_id for the core and this causes hotplug failures from the guest. Fix this by realizing each thread object by walking the core's thread object list thereby ensuring that thread0 and other threads are always realized in the correct order. Future TODO: CPU DT nodes are per-core properties and we should ideally base the creation of CPU DT nodes on core objects rather than the thread objects. Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-07-05 10:43:02 +10:00
Marcel Apfelbaum	1b04cc801a	hw/ppc: realize the PCI root bus as part of mac99 init Mac99's PCI root bus is not part of a host bridge, realize it manually. Signed-off-by: Marcel Apfelbaum <marcel@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2016-07-04 14:50:01 +03:00
Greg Kurz	8a1eb71bd8	spapr: drop duplicate variable in spapr_core_release() Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-07-01 13:41:47 +10:00
Greg Kurz	f11235b920	spapr: do proper error propagation in spapr_cpu_core_realize_child() This patch changes spapr_cpu_core_realize_child() to have a local error pointer and use error_propagate() as it is supposed to be done. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-07-01 13:41:47 +10:00
Greg Kurz	8e758dee66	spapr: drop reference on child object during core realization When a core is being realized, we create a child object for each thread of the core. The child is first initialized with object_initialize() which sets its ref count to 1, and then added to the core with object_property_add_child() which bumps the ref count to 2. When the core gets released, object_unparent() decreases the ref count to 1, and we g_free() the object: we hence loose the reference on an unfinalized object. This is likely to cause random crashes. Let's drop the extra reference as soon as we don't need it, after the thread is added to the core. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-07-01 13:41:47 +10:00
Bharata B Rao	470f215787	spapr: Restore support for 970MP and POWER8NVL CPU cores Introduction of core based CPU hotplug for PowerPC sPAPR didn't add support for 970MP and POWER8NVL based core types. Add support for the same. While we are here, add support for explicit specification of POWER5+_v2.1 core type. Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-07-01 13:41:47 +10:00
Benjamin Herrenschmidt	27f2458245	ppc/xics: Replace "icp" with "xics" in most places The "ICP" is a different object than the "XICS". For historical reasons, we have a number of places where we name a variable "icp" while it contains a XICSState pointer. There is an ICPState structure too so this makes the code really confusing. This is a mechanical replacement of all those instances to use the name "xics" instead. There should be no functional change. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [spapr_cpu_init has been moved to spapr_cpu_core.c, change there] Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-07-01 13:41:47 +10:00
Benjamin Herrenschmidt	161deaf225	ppc/xics: Rename existing xics to xics_spapr The common class doesn't change, the KVM one is sPAPR specific. Rename variables and functions to xics_spapr. Retain the type name as "xics" to preserve migration for existing sPAPR guests. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-07-01 13:41:46 +10:00
Aaron Larson	a36848ff7c	target-ppc: Eliminate redundant and incorrect function booke206_page_size_to_tlb Eliminate redundant and incorrect booke206_page_size_to_tlb function from ppce500_spin.c in preference to previously existing but newly exported definition from e500.c Defect analysis: The booke206_page_size_to_tlb function in e500.c was updated in commit `2bd9543` "ppc: booke206: use MAV=2.0 TSIZE definition, fix 4G pages" to reflect a change in the definition of MAS1_TSIZE_SHIFT from 8 (corresponding to a min TLB page size of 4kb) to a value of 7 (TLB page size 2k). The booke206_page_size_to_tlb() function defined in ppce500_spin.c was never updated to reflect the change in MAS1_TSIZE_SHIFT. In http://lists.nongnu.org/archive/html/qemu-ppc/2016-06/msg00533.html, Scott Wood suggested this "root cause" explanation: SW> The patch that changed MAS1_TSIZE_SHIFT from 8 to 7 was around the SW> same time as the patch that added this code, which is probably why SW> adjusting it got missed. Commit `2bd9543cd3` did update the SW> equivalent code in ppce500_mpc8544ds.c, which now resides in SW> hw/ppc/e500.c and has been changed to not assume a power-of-2 SW> size. The ppce500_spin version should be eliminated. Signed-off-by: Aaron Larson <alarson@ddci.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-07-01 09:57:01 +10:00
Bharata B Rao	ff461b8da9	spapr: Restore support for older PowerPC CPU cores Introduction of core based CPU hotplug for PowerPC sPAPR didn't add support for 970 and POWER5+ based core types. Add support for the same. Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-07-01 09:57:01 +10:00
Greg Kurz	dde35bc966	spapr: fix write-past-end-of-array error in cpu core device init code This fixes a potential QEMU crash introduced by commit `3b54254966`. Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-07-01 09:57:01 +10:00
Thomas Huth	6cc09e261b	hw/ppc/spapr: Add some missing hcall function set strings Add "hcall-sprg0" (for H_SET_SPRG0), "hcall-copy" (for H_PAGE_INIT) and "hcall-debug" (for H_LOGICAL_CI_LOAD/STORE) to the property "ibm,hypertas-functions" to indicate that we support these hypercalls. Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-07-01 09:57:01 +10:00
Benjamin Herrenschmidt	4b236b621b	ppc: Initial HDEC support The current behaviour isn't completely right, as for the DEC, we don't properly re-arm when wrapping around, but I will fix this in a separate patch. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [clg: fixed checkpatch.pl errors ] Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-07-01 09:57:01 +10:00
Peter Krempa	27393c33d8	qapi: keep names in 'CpuInstanceProperties' in sync with struct CPUCore struct CPUCore uses 'id' suffix in the property name. As docs for query-hotpluggable-cpus state that the cpu core properties should be passed back to device_add by management in case new members are added and thus the names for the fields should be kept in sync. Signed-off-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> [dwg: Removed a duplicated word in comment] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-06-27 13:15:06 +10:00
Aaron Larson	6d18a7a1ff	target-ppc: ppce500_spin.c uses SPR_PIR, should use SPR_BOOKE_PIR ppce500_spin.c uses SPR_PIR to initialize the spin table, however on Book E processors the correct SPR is SPR_BOOKE_PIR. Signed-off-by: Aaron Larson <alarson@ddci.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-06-27 13:12:22 +10:00
Alexey Kardashevskiy	f682e9c244	memory: Add reporting of supported page sizes Every IOMMU has some granularity which MemoryRegionIOMMUOps::translate uses when translating, however this information is not available outside the translate context for various checks. This adds a get_min_page_size callback to MemoryRegionIOMMUOps and a wrapper for it so IOMMU users (such as VFIO) can know the minimum actual page size supported by an IOMMU. As IOMMU MR represents a guest IOMMU, this uses TARGET_PAGE_SIZE as fallback. This removes vfio_container_granularity() and uses new helper in memory_region_iommu_replay() when replaying IOMMU mappings on added IOMMU memory region. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Alex Williamson <alex.williamson@redhat.com> [dwg: Removed an unnecessary calculation] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2016-06-22 11:13:09 +10:00

... 6 7 8 9 10 ...

1474 Commits