Commit Graph

913 Commits

Author SHA1 Message Date
Daniel Henrique Barboza 07b10bc42c spapr.c: set a 'kvm-type' default value instead of relying on NULL
spapr_kvm_type() is considering 'vm_type=NULL' as a valid input, where
the function returns 0. This is relying on the current QEMU machine
options handling logic, where the absence of the 'kvm-type' option
will be reflected as 'vm_type=NULL' in this function.

This is not robust, and will break if QEMU options code decides to propagate
something else in the case mentioned above (e.g. an empty string instead
of NULL).

Let's avoid this entirely by setting a non-NULL default value in case of
no user input for 'kvm-type'. spapr_kvm_type() was changed to handle 3 fixed
values of kvm-type: "auto", "hv", and "pr", with "auto" being the default
if no kvm-type was set by the user. This allows us to always be predictable
regardless of any enhancements/changes made in QEMU options mechanics.

While we're at it, let's also document in 'kvm-type' description the
already existing default mode, now named 'auto'. The information provided
about it is based on how the pseries kernel handles the KVM_CREATE_VM
ioctl(), where the default value '0' makes the kernel choose an available
KVM module to use, giving precedence to kvm_hv. This logic is described in
the kernel source file arch/powerpc/kvm/powerpc.c, function kvm_arch_init_vm().

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20201210145517.1532269-2-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
2020-12-14 15:54:12 +11:00
Greg Kurz bc370a659a spapr: spapr_drc_attach() cannot fail
All users are passing &error_abort already. Document the fact
that spapr_drc_attach() should only be passed a free DRC, which
is supposedly the case if appropriate checking is done earlier.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20201201113728.885700-5-groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-12-14 15:54:12 +11:00
Greg Kurz f9b43958b9 spapr: Simplify error path of spapr_core_plug()
spapr_core_pre_plug() already guarantees that the slot for the given core
ID is available. It is thus safe to assume that spapr_find_cpu_slot()
returns a slot during plug. Turn the error path into an assertion.
It is also safe to assume that no device is attached to the corresponding
DRC and that spapr_drc_attach() shouldn't fail.

Pass &error_abort to spapr_drc_attach() and simplify error handling.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20201201113728.885700-4-groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-12-14 15:54:12 +11:00
Greg Kurz 376412135d spapr: Abort if ppc_set_compat() fails for hot-plugged CPUs
When a CPU is hot-plugged, we set its compat mode to match the boot
CPU, which was either set by machine reset or by CAS. This is currently
handled in the plug handler after the core got realized. Potential errors
of ppc_set_compat() are propagated to the hot-plug logic.

Handling errors this late in the hot-plug sequence is generally frown
upon. Ideally, we should do sanity checks in a pre-plug handler and pass
&error_abort to ppc_set_compat() in the plug handler.

We can filter out some error cases of ppc_set_compat() by calling
ppc_check_compat() at pre-plug. But ppc_set_compat() also sets the
compat register in KVM, and KVM doesn't provide any API that would
allow to check valid compat mode settings beforehand.

However, at this point we know that the compat mode was already
successfully set for the boot CPU. Since this all boils down to
setting a register with the very same value that was valid
for the boot CPU, it should definitely not fail for hot-plugged
CPUS.

Pass &error_abort to ppc_set_compat().

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20201201113728.885700-3-groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-12-14 15:54:12 +11:00
Greg Kurz 1b4ab51493 spapr: Fix pre-2.10 dummy ICP hack
This hack registers dummy VMState entries of ICPs in order to
support migration of old pseries machine types that used to
create all smp.max_cpus possible ICPs at machine init.

Part of the work is to unregister the dummy entries when plugging
an actual vCPU core, and to register them back when unplugging the
core. The code that unregisters the dummy ICPs in spapr_core_plug()
is misplaced: if ppc_set_compat() fails afterwards, the hotplug
operation will be cancelled and the dummy ICPs won't be registered
back since the unplug handler isn't called.

Unregister the dummy ICPs at the end of spapr_core_plug().

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20201201113728.885700-2-groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-12-14 15:54:12 +11:00
Greg Kurz ac96807b02 spapr: Do TPM proxy hotplug sanity checks at pre-plug
There can be only one TPM proxy at a time. This is currently
checked at plug time. But this can be detected at pre-plug in
order to error out earlier.

This allows to get rid of error handling in the plug handler.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20201120234208.683521-9-groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-12-14 15:50:55 +11:00
Greg Kurz 9a07069958 spapr: Do PHB hoplug sanity check at pre-plug
We currently detect that a PHB index is already in use at plug time.
But this can be decteted at pre-plug in order to error out earlier.

This allows to pass &error_abort to spapr_drc_attach() and to end
up with a plug handler that doesn't need to report errors anymore.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20201120234208.683521-8-groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-12-14 15:50:55 +11:00
Greg Kurz f5598c92b8 spapr: Make PHB placement functions and spapr_pre_plug_phb() return status
Read documentation in "qapi/error.h" and changelog of commit
e3fe3988d7 ("error: Document Error API usage rules") for
rationale.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20201120234208.683521-7-groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-12-14 15:50:55 +11:00
Greg Kurz ea042c53f4 spapr: Do NVDIMM/PC-DIMM device hotplug sanity checks at pre-plug only
Pre-plug of a memory device, be it an NVDIMM or a PC-DIMM, ensures
that the memory slot is available and that addresses don't overlap
with existing memory regions. The corresponding DRCs in the LMB
and PMEM namespaces are thus necessarily attachable at plug time.

Pass &error_abort to spapr_drc_attach() in spapr_add_lmbs() and
spapr_add_nvdimm(). This allows to greatly simplify error handling
on the plug path.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20201120234208.683521-3-groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-12-14 15:50:55 +11:00
Paolo Bonzini 46ee119fb6 vl: remove serial_max_hds
serial_hd(i) is NULL if and only if i >= serial_max_hds().  Test
serial_hd(i) instead of bounding the loop at serial_max_hds(),
thus removing one more function that vl.c is expected to export.

Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-10 12:15:19 -05:00
Paolo Bonzini 2c65db5e58 vl: extract softmmu/datadir.c
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-10 12:15:18 -05:00
Paolo Bonzini cd7b94989a ppc: remove bios_name
Cc: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20201026143028.3034018-11-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-10 12:15:06 -05:00
Cornelia Huck 576a00bdeb hw: add compat machines for 6.0
Add 6.0 machine types for arm/i440fx/q35/s390x/spapr.

Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <20201109173928.1001764-1-cohuck@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-08 13:48:58 -05:00
Greg Kurz 184b813e7b spapr: Drop dead code in spapr_reallocate_hpt()
Sometimes QEMU needs to allocate the HPT in userspace, namely with TCG
or PR KVM. This is performed with qemu_memalign() because of alignment
requirements. Like glib's allocators, its behaviour is to abort on OOM
instead of returning NULL.

This could be changed to qemu_try_memalign(), but in the specific case
of spapr_reallocate_hpt(), the outcome would be to terminate QEMU anyway
since no HPT means no MMU for the guest. Drop the dead code instead.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <160398562892.32380.15006707861753544263.stgit@bahia.lan>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-11-05 12:18:48 +11:00
Greg Kurz a4e3a7c02b spapr: Improve spapr_reallocate_hpt() error reporting
spapr_reallocate_hpt() has three users, two of which pass &error_fatal
and the third one, htab_load(), passes &local_err, uses it to detect
failures and simply propagates -EINVAL up to vmstate_load(), which will
cause QEMU to exit. It is thus confusing that spapr_reallocate_hpt()
doesn't return right away when an error is detected in some cases. Also,
the comment suggesting that the caller is welcome to try to carry on
seems like a remnant in this respect.

This can be improved:
- change spapr_reallocate_hpt() to always report a negative errno on
  failure, either as reported by KVM or -ENOSPC if the HPT is smaller
  than what was asked,
- use that to detect failures in htab_load() which is preferred over
  checking &local_err,
- propagate this negative errno to vmstate_load() because it is more
  accurate than propagating -EINVAL for all possible errors.

[dwg: Fix compile error due to omitted prelim patch]
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <160371605460.305923.5890143959901241157.stgit@bahia.lan>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-10-28 01:08:53 +11:00
Greg Kurz 0a06e4d626 target/ppc: Fix kvmppc_load_htab_chunk() error reporting
If kvmppc_load_htab_chunk() fails, its return value is propagated up
to vmstate_load(). It should thus be a negative errno, not -1 (which
maps to EPERM and would lure the user into thinking that the problem
is necessarily related to a lack of privilege).

Return the error reported by KVM or ENOSPC in case of short write.
While here, propagate the error message through an @errp argument
and have the caller to print it with error_report_err() instead
of relying on fprintf().

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <160371604713.305923.5264900354159029580.stgit@bahia.lan>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-10-28 01:08:53 +11:00
Greg Kurz c3e051ed6d spapr: Use error_append_hint() in spapr_reallocate_hpt()
Hints should be added with the dedicated error_append_hint() API
because we don't want to print them when using QMP. This requires
to insert ERRP_GUARD as explained in "qapi/error.h".

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <160371604030.305923.17464161378167312662.stgit@bahia.lan>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-10-28 01:08:53 +11:00
Greg Kurz 6e837f98ba spapr: Simplify error handling in spapr_memory_plug()
As recommended in "qapi/error.h", add a bool return value to
spapr_add_lmbs() and spapr_add_nvdimm(), and use them instead
of local_err in spapr_memory_plug().

This allows to get rid of the error propagation overhead.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <160309734178.2739814.3488437759887793902.stgit@bahia.lan>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-10-28 01:08:53 +11:00
Greg Kurz 271ced1d62 spapr: Pass &error_abort when getting some PC DIMM properties
Both PC_DIMM_SLOT_PROP and PC_DIMM_ADDR_PROP are defined in the
default property list of the PC DIMM device class:

    DEFINE_PROP_UINT64(PC_DIMM_ADDR_PROP, PCDIMMDevice, addr, 0),

    DEFINE_PROP_INT32(PC_DIMM_SLOT_PROP, PCDIMMDevice, slot,
                      PC_DIMM_UNASSIGNED_SLOT),

They should thus be always gettable for both PC DIMMs and NVDIMMs.
An error in getting them can only be the result of a programming
error. It doesn't make much sense to propagate the error in this
case. Abort instead.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <160309732180.2739814.7243774674998010907.stgit@bahia.lan>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-10-28 01:08:53 +11:00
Greg Kurz 581778dd47 spapr: Use appropriate getter for PC_DIMM_SLOT_PROP
The PC_DIMM_SLOT_PROP property is defined as:

    DEFINE_PROP_INT32(PC_DIMM_SLOT_PROP, PCDIMMDevice, slot,
                      PC_DIMM_UNASSIGNED_SLOT),

Use object_property_get_int() instead of object_property_get_uint().
Since spapr_memory_plug() only gets called if pc_dimm_pre_plug()
succeeded, we expect to have a valid >= 0 slot number, either because
the user passed a valid slot number or because pc_dimm_get_free_slot()
picked one up for us.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <160309730758.2739814.15821922745424652642.stgit@bahia.lan>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-10-28 01:08:53 +11:00
Greg Kurz 65226afd90 spapr: Use appropriate getter for PC_DIMM_ADDR_PROP
The PC_DIMM_ADDR_PROP property is defined as:

    DEFINE_PROP_UINT64(PC_DIMM_ADDR_PROP, PCDIMMDevice, addr, 0),

Use object_property_get_uint() instead of object_property_get_int().

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <160309729609.2739814.4996614957953215591.stgit@bahia.lan>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-10-28 01:08:53 +11:00
Greg Kurz 84fd549619 pc-dimm: Drop @errp argument of pc_dimm_plug()
pc_dimm_plug() doesn't use it. It only aborts on error.

Drop @errp and adapt the callers accordingly.

[dwg: Removed unused label to fix compile]
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <160309728447.2739814.12831204841251148202.stgit@bahia.lan>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-10-28 01:08:53 +11:00
Greg Kurz ce316b5118 spapr: Move spapr_create_nvdimm_dr_connectors() to core machine code
The spapr_create_nvdimm_dr_connectors() function doesn't need to access
any internal details of the sPAPR NVDIMM implementation. Also, pretty
much like for the LMBs, only spapr_machine_init() is responsible for the
creation of DR connectors for NVDIMMs.

Make this clear by making this function static in hw/ppc/spapr.c.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <160249772183.757627.7396780936543977766.stgit@bahia.lan>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-10-28 01:08:53 +11:00
Daniel Henrique Barboza 29bfe52a52 spapr: add spapr_machine_using_legacy_numa() helper
The changes to come to NUMA support are all guest visible. In
theory we could just create a new 5_1 class option flag to
avoid the changes to cascade to 5.1 and under. The reality is that
these changes are only relevant if the machine has more than one
NUMA node. There is no need to change guest behavior that has
been around for years needlesly.

This new helper will be used by the next patches to determine
whether we should retain the (soon to be) legacy NUMA behavior
in the pSeries machine. The new behavior will only be exposed
if:

- machine is pseries-5.2 and newer;
- more than one NUMA node is declared in NUMA state.

Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20201007172849.302240-2-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-10-09 10:52:09 +11:00
Greg Kurz 35dce34fbc spapr: Add a return value to spapr_check_pagesize()
As recommended in "qapi/error.h", return true on success and false on
failure. This allows to reduce error propagation overhead in the callers.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20200914123505.612812-14-groug@kaod.org>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-10-09 10:15:06 +11:00
Greg Kurz 451c690589 spapr: Add a return value to spapr_nvdimm_validate()
As recommended in "qapi/error.h", return true on success and false on
failure. This allows to reduce error propagation overhead in the callers.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20200914123505.612812-13-groug@kaod.org>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-10-09 10:15:06 +11:00
Greg Kurz cfdc527473 spapr: Add a return value to spapr_set_vcpu_id()
As recommended in "qapi/error.h", return true on success and false on
failure. This allows to reduce error propagation overhead in the callers.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20200914123505.612812-11-groug@kaod.org>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-10-09 10:15:06 +11:00
Greg Kurz 17548fe64a spapr: Add a return value to spapr_drc_attach()
As recommended in "qapi/error.h", return true on success and false on
failure. This allows to reduce error propagation overhead in the callers.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20200914123505.612812-9-groug@kaod.org>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-10-09 10:15:06 +11:00
Greg Kurz a3114923d4 spapr: Simplify error handling in callers of ppc_set_compat()
Now that ppc_set_compat() indicates success/failure with a return
value, use it and reduce error propagation overhead.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20200914123505.612812-5-groug@kaod.org>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-10-09 10:15:06 +11:00
Fabiano Rosas f0638a0b6b spapr: Handle HPT allocation failure in nested guest
The nested KVM code does not yet support HPT guests. Calling the
KVM_CAP_PPC_ALLOC_HTAB ioctl currently leads to KVM setting the guest
as HPT and erroneously executing code in L1 that should only run in
hypervisor mode, leading to an exception in the L1 vcpu thread when it
enters the nested guest.

This can be reproduced with -machine max-cpu-compat=power8 in the L2
guest command line.

The KVM code has since been modified to fail the ioctl when running in
a nested environment so QEMU needs to be able to handle that. This
patch provides an error message informing the user about the lack of
support for HPT in nested guests.

Reported-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
Message-Id: <20200911043123.204162-1-farosas@linux.ibm.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-10-09 10:15:06 +11:00
Igor Mammedov b21aa7e01e numa: drop support for '-numa node' (without memory specified)
it was deprecated since 4.1
commit 4bb4a2732e (numa: deprecate implict memory distribution between nodes)

Users of existing VMs, wishing to preserve the same RAM distribution,
should configure it explicitly using ``-numa node,memdev`` options.
Current RAM distribution can be retrieved using HMP command
`info numa` and if separate memory devices (pc|nv-dimm) are present
use `info memory-device` and subtract device memory from output of
`info numa`.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20200911084410.788171-2-imammedo@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-30 19:09:20 +02:00
BALATON Zoltan 617160c9e1 load_elf: Remove unused address variables from callers
Several callers of load_elf() pass pointers for lowaddr and highaddr
parameters which are then not used for anything. This may stem from a
misunderstanding that load_elf need a value here but in fact it can
take NULL to ignore these values. Remove such unused variables and
pass NULL instead from callers that don't need these.

Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: Max Filippov <jcmvbkbc@gmail.com>
Message-Id: <20200705174020.BDD0174633F@zero.eik.bme.hu>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2020-09-25 16:52:08 -07:00
Daniel Henrique Barboza 0ee520126a spapr, spapr_numa: move lookup-arrays handling to spapr_numa.c
In a similar fashion as the previous patch, let's move the
handling of ibm,associativity-lookup-arrays from spapr.c to
spapr_numa.c. A spapr_numa_write_assoc_lookup_arrays() helper was
created, and spapr_dt_dynamic_reconfiguration_memory() can now
use it to advertise the lookup-arrays.

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20200903220639.563090-4-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-09-08 10:08:43 +10:00
Daniel Henrique Barboza 8f86a40824 spapr, spapr_numa: handle vcpu ibm,associativity
Vcpus have an additional paramenter to be appended, vcpu_id. This
also changes the size of the of property itself, which is being
represented in index 0 of numa_assoc_array[cpu->node_id],
and defaults to MAX_DISTANCE_REF_POINTS for all cases but
vcpus.

All this logic makes more sense in spapr_numa.c, where we handle
everything NUMA and associativity. A new helper spapr_numa_fixup_cpu_dt()
was added, and spapr.c uses it the same way as it was using the former
spapr_fixup_cpu_numa_dt().

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20200903220639.563090-3-danielhb413@gmail.com>
[dwg: Correct uint to int type, which can break windows builds]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-09-08 10:08:43 +10:00
Daniel Henrique Barboza f1aa45fffe spapr: introduce SpaprMachineState::numa_assoc_array
The next step to centralize all NUMA/associativity handling in
the spapr machine is to create a 'one stop place' for all
things ibm,associativity.

This patch introduces numa_assoc_array, a 2 dimensional array
that will store all ibm,associativity arrays of all NUMA nodes.
This array is initialized in a new spapr_numa_associativity_init()
function, called in spapr_machine_init(). It is being initialized
with the same values used in other ibm,associativity properties
around spapr files (i.e. all zeros, last value is node_id).
The idea is to remove all hardcoded definitions and FDT writes
of ibm,associativity arrays, doing instead a call to the new
helper spapr_numa_write_associativity_dt() helper, that will
be able to write the DT with the correct values.

We'll start small, handling the trivial cases first. The
remaining instances of ibm,associativity will be handled
next.

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20200903220639.563090-2-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-09-08 10:08:43 +10:00
Daniel Henrique Barboza 1eee995026 ppc: introducing spapr_numa.c NUMA code helper
We're going to make changes in how spapr handles all
ibm,associativity* related properties to enhance our current NUMA
support.

At this moment we have associativity code scattered all around
spapr_* files, with hardcoded values and array sizes. This
makes it harder to change any NUMA specific parameters in
the future. Having everything in the same place allows not
only for easier tuning, but also easier understanding since all
NUMA related code is on the same file.

This patch introduces a new file to gather all NUMA/associativity
handling code in spapr, spapr_numa.c. To get things started, let's
remove associativity-reference-points and max-associativity-domains
code from spapr_dt_rtas() to a new helper called spapr_numa_write_rtas_dt().
This will decouple spapr_dt_rtas() from the NUMA changes that
are going to happen in those two properties.

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20200901125645.118026-2-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-09-08 10:08:43 +10:00
Daniel Henrique Barboza beb6073fe7 spapr, spapr_nvdimm: fold NVDIMM validation in the same place
NVDIMM has different contraints and conditions than the regular
DIMM and we'll need to add at least one more.

Instead of relying on 'if (nvdimm)' conditionals in the body of
spapr_memory_pre_plug(), use the existing spapr_nvdimm_validate_opts()
and put all NVDIMM handling code there. Rename it to
spapr_nvdimm_validate() to reflect that the function is now checking
more than the nvdimm device options. This makes spapr_memory_pre_plug()
a bit easier to follow, and we can tune in NVDIMM parameters
and validation in the same place.

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20200825215749.213536-3-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-09-08 10:08:42 +10:00
Peter Maydell dd8014e4e9 ppc patch queue 2020-08-18
Here's my first pull request for qemu-5.2, which has quite a few
 accumulated things.  Highlights are:
 
  * Preliminary support for POWER10 (Power ISA 3.1) instruction emulation
  * Add documentation on the (very confusing) pseries NUMA configuration
  * Fix some bugs handling edge cases with XICS, XIVE and kernel_irqchip
  * Fix icount for a number of POWER registers
  * Many cleanups to error handling in XIVE code
  * Validate size of -prom-env data
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEdfRlhq5hpmzETofcbDjKyiDZs5IFAl87VpwACgkQbDjKyiDZ
 s5LjIxAAs8YAQe3uDRz1Wb9GftoMmEHdq7JQoO0FbXDQIVXzpTAXmFLSBtCWKl6p
 O1MEIy/o48b5ORXJqSDSA5LgxbHxYfHdIPEY5Tbn/TGvTvKyCukx9n11milUG8In
 JxRrOTQBnQAAHkLoyuZyrWKOauC0N1scNrnX9Geuid13GcmqHg1d2alXAUu8jEeC
 HSiVmtMqqyyqTx2xA4vfhaGuuwTthnKNfbGdg9ksVqBsCW+etn6ZKGImt8hBe3qO
 5iqbQZvFbkpzgbjkhDzUDM6tmUAFN55y/Y+y7I8Tz4/IX7d3WbdqpplwrXXVWkpq
 2gcBBjQ/9a1hPTBRVN9jn4CvHfhILBfeHIElUiLpSTQZQQALymTnnI2pLCgKoEFX
 LcchXbjiX+pZ2OJnAijpwBcknjgT2U/ZNyiqHJfSQ6jzlYx1YtUf4xGUsgloSiK8
 9QDK8o2k0Cm8Be+lPMBMmTctoi8bq+8SN5UUF710WQL235J58o9+z1vuGO2HVk3x
 flBtv/+B890wcCDpGU80DPs/LSzR0xTTbA5JsWft2fvO569mda0MoWkJH5w6jvSc
 ZLYqljCzFCVW+tKiGHzaBalJaMwn0+QMDTsxzP3yTt5LmmEeRXpBELgvrW64IobD
 xBeryH3nG4SwxFSJq+4ATfvUzjy/Eo58lTTl6c53Ji8/D3aFwsA=
 =L9Wi
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-5.2-20200818' into staging

ppc patch queue 2020-08-18

Here's my first pull request for qemu-5.2, which has quite a few
accumulated things.  Highlights are:

 * Preliminary support for POWER10 (Power ISA 3.1) instruction emulation
 * Add documentation on the (very confusing) pseries NUMA configuration
 * Fix some bugs handling edge cases with XICS, XIVE and kernel_irqchip
 * Fix icount for a number of POWER registers
 * Many cleanups to error handling in XIVE code
 * Validate size of -prom-env data

# gpg: Signature made Tue 18 Aug 2020 05:18:36 BST
# gpg:                using RSA key 75F46586AE61A66CC44E87DC6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>" [full]
# gpg:                 aka "David Gibson (Red Hat) <dgibson@redhat.com>" [full]
# gpg:                 aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>" [full]
# gpg:                 aka "David Gibson (kernel.org) <dwg@kernel.org>" [unknown]
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E  87DC 6C38 CACA 20D9 B392

* remotes/dgibson/tags/ppc-for-5.2-20200818: (40 commits)
  spapr/xive: Use xive_source_esb_len()
  nvram: Exit QEMU if NVRAM cannot contain all -prom-env data
  spapr/xive: Simplify error handling of kvmppc_xive_cpu_synchronize_state()
  ppc/xive: Simplify error handling in xive_tctx_realize()
  spapr/xive: Simplify error handling in kvmppc_xive_connect()
  ppc/xive: Fix error handling in vmstate_xive_tctx_*() callbacks
  spapr/xive: Fix error handling in kvmppc_xive_post_load()
  spapr/kvm: Fix error handling in kvmppc_xive_pre_save()
  spapr/xive: Rework error handling of kvmppc_xive_set_source_config()
  spapr/xive: Rework error handling in kvmppc_xive_get_queues()
  spapr/xive: Rework error handling of kvmppc_xive_[gs]et_queue_config()
  spapr/xive: Rework error handling of kvmppc_xive_cpu_[gs]et_state()
  spapr/xive: Rework error handling of kvmppc_xive_mmap()
  spapr/xive: Rework error handling of kvmppc_xive_source_reset()
  spapr/xive: Rework error handling of kvmppc_xive_cpu_connect()
  spapr: Simplify error handling in spapr_phb_realize()
  spapr/xive: Convert KVM device fd checks to assert()
  ppc/xive: Introduce dedicated kvm_irqchip_in_kernel() wrappers
  ppc/xive: Rework setup of XiveSource::esb_mmio
  target/ppc: Integrate icount to purr, vtb, and tbu40
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2020-08-24 09:35:21 +01:00
Cornelia Huck 3ff3c5d317 hw: add compat machines for 5.2
Add 5.2 machine types for arm/i440fx/q35/s390x/spapr.

Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <20200819144016.281156-1-cohuck@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2020-08-19 10:45:48 -04:00
Anton Blanchard 7abf979750 ppc/spapr: Fix 32 bit logical memory block size assumptions
When testing large LMB sizes (eg 4GB), I found a couple of places
that assume they are 32bit in size.

Signed-off-by: Anton Blanchard <anton@ozlabs.org>
Message-Id: <20200715004228.1262681-1-anton@ozlabs.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-08-12 13:16:27 +10:00
Reza Arbab a6030d7e0b spapr: Add a new level of NUMA for GPUs
NUMA nodes corresponding to GPU memory currently have the same
affinity/distance as normal memory nodes. Add a third NUMA associativity
reference point enabling us to give GPU nodes more distance.

This is guest visible information, which shouldn't change under a
running guest across migration between different qemu versions, so make
the change effective only in new (pseries > 5.0) machine types.

Before, `numactl -H` output in a guest with 4 GPUs (nodes 2-5):

node distances:
node   0   1   2   3   4   5
  0:  10  40  40  40  40  40
  1:  40  10  40  40  40  40
  2:  40  40  10  40  40  40
  3:  40  40  40  10  40  40
  4:  40  40  40  40  10  40
  5:  40  40  40  40  40  10

After:

node distances:
node   0   1   2   3   4   5
  0:  10  40  80  80  80  80
  1:  40  10  80  80  80  80
  2:  80  80  10  80  80  80
  3:  80  80  80  10  80  80
  4:  80  80  80  80  10  80
  5:  80  80  80  80  80  10

These are the same distances as on the host, mirroring the change made
to host firmware in skiboot commit f845a648b8cb ("numa/associativity:
Add a new level of NUMA for GPU's").

Signed-off-by: Reza Arbab <arbab@linux.ibm.com>
Message-Id: <20200716225655.24289-1-arbab@linux.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-07-20 09:21:39 +10:00
Markus Armbruster dcfe480544 error: Avoid unnecessary error_propagate() after error_setg()
Replace

    error_setg(&err, ...);
    error_propagate(errp, err);

by

    error_setg(errp, ...);

Related pattern:

    if (...) {
        error_setg(&err, ...);
        goto out;
    }
    ...
 out:
    error_propagate(errp, err);
    return;

When all paths to label out are that way, replace by

    if (...) {
        error_setg(errp, ...);
        return;
    }

and delete the label along with the error_propagate().

When we have at most one other path that actually needs to propagate,
and maybe one at the end that where propagation is unnecessary, e.g.

    foo(..., &err);
    if (err) {
        goto out;
    }
    ...
    bar(..., &err);
 out:
    error_propagate(errp, err);
    return;

move the error_propagate() to where it's needed, like

    if (...) {
        foo(..., &err);
        error_propagate(errp, err);
        return;
    }
    ...
    bar(..., errp);
    return;

and transform the error_setg() as above.

In some places, the transformation results in obviously unnecessary
error_propagate().  The next few commits will eliminate them.

Bonus: the elimination of gotos will make later patches in this series
easier to review.

Candidates for conversion tracked down with this Coccinelle script:

    @@
    identifier err, errp;
    expression list args;
    @@
    -    error_setg(&err, args);
    +    error_setg(errp, args);
         ... when != err
         error_propagate(errp, err);

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200707160613.848843-34-armbru@redhat.com>
2020-07-10 15:18:08 +02:00
Markus Armbruster 5325cc34a2 qom: Put name parameter before value / visitor parameter
The object_property_set_FOO() setters take property name and value in
an unusual order:

    void object_property_set_FOO(Object *obj, FOO_TYPE value,
                                 const char *name, Error **errp)

Having to pass value before name feels grating.  Swap them.

Same for object_property_set(), object_property_get(), and
object_property_parse().

Convert callers with this Coccinelle script:

    @@
    identifier fun = {
        object_property_get, object_property_parse, object_property_set_str,
        object_property_set_link, object_property_set_bool,
        object_property_set_int, object_property_set_uint, object_property_set,
        object_property_set_qobject
    };
    expression obj, v, name, errp;
    @@
    -    fun(obj, v, name, errp)
    +    fun(obj, name, v, errp)

Chokes on hw/arm/musicpal.c's lcd_refresh() with the unhelpful error
message "no position information".  Convert that one manually.

Fails to convert hw/arm/armsse.c, because Coccinelle gets confused by
ARMSSE being used both as typedef and function-like macro there.
Convert manually.

Fails to convert hw/rx/rx-gdbsim.c, because Coccinelle gets confused
by RXCPU being used both as typedef and function-like macro there.
Convert manually.  The other files using RXCPU that way don't need
conversion.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20200707160613.848843-27-armbru@redhat.com>
[Straightforwad conflict with commit 2336172d9b "audio: set default
value for pcspk.iobase property" resolved]
2020-07-10 15:18:08 +02:00
Markus Armbruster 9bc6bfdf67 qdev: Drop qbus_set_hotplug_handler() parameter @errp
qbus_set_hotplug_handler() is a simple wrapper around
object_property_set_link().

object_property_set_link() fails when the property doesn't exist, is
not settable, or its .check() method fails.  These are all programming
errors here, so passing &error_abort to qbus_set_hotplug_handler() is
appropriate.

Most of its callers do.  Exceptions:

* pcie_cap_slot_init(), shpc_init(), spapr_phb_realize() pass NULL,
  i.e. they ignore errors.

* spapr_machine_init() passes &error_fatal.

* s390_pcihost_realize(), virtio_serial_device_realize(),
  s390_pcihost_plug() pass the error to their callers.  The latter two
  keep going after the error, which looks wrong.

Drop the @errp parameter, and instead pass &error_abort to
object_property_set_link().

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Daniel P. Berrangé" <berrange@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20200630090351.1247703-15-armbru@redhat.com>
2020-07-02 06:25:29 +02:00
Markus Armbruster 14963c34b9 spapr: Plug minor memory leak in spapr_machine_init()
spapr_machine_init() leaks an Error object when
kvmppc_check_papr_resize_hpt() fails and spapr->resize_hpt is
SPAPR_RESIZE_HPT_DISABLED, i.e. when the host doesn't support hash
page table resizing, and the user didn't ask for it.  As harmless as
memory leaks can possibly be.  Plug it.

Fixes: 30f4b05bd0
Cc: David Gibson <dgibson@redhat.com>
Cc: qemu-ppc@nongnu.org
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20200630090351.1247703-8-armbru@redhat.com>
2020-07-02 06:25:29 +02:00
Igor Mammedov 32a354dc6c numa: forbid '-numa node, mem' for 5.1 and newer machine types
Deprecation period is run out and it's a time to flip the switch
introduced by cd5ff8333a.  Disable legacy option for new machine
types (since 5.1) and amend documentation.

'-numa node,memdev' shall be used instead of disabled option
with new machine types.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Message-Id: <20200609135635.761587-1-imammedo@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-26 09:39:39 -04:00
Markus Armbruster 934df91296 qdev: Make qdev_prop_set_drive() match the other helpers
qdev_prop_set_drive() can fail.  None of the other qdev_prop_set_FOO()
can; they abort on error.

To clean up this inconsistency, rename qdev_prop_set_drive() to
qdev_prop_set_drive_err(), and create a qdev_prop_set_drive() that
aborts on error.

Coccinelle script to update callers:

    @ depends on !(file in "hw/core/qdev-properties-system.c")@
    expression dev, name, value;
    symbol error_abort;
    @@
    -    qdev_prop_set_drive(dev, name, value, &error_abort);
    +    qdev_prop_set_drive(dev, name, value);

    @@
    expression dev, name, value, errp;
    @@
    -    qdev_prop_set_drive(dev, name, value, errp);
    +    qdev_prop_set_drive_err(dev, name, value, errp);

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200622094227.1271650-14-armbru@redhat.com>
2020-06-23 16:07:07 +02:00
Markus Armbruster ce189ab230 qdev: Convert bus-less devices to qdev_realize() with Coccinelle
All remaining conversions to qdev_realize() are for bus-less devices.
Coccinelle script:

    // only correct for bus-less @dev!

    @@
    expression errp;
    expression dev;
    @@
    -    qdev_init_nofail(dev);
    +    qdev_realize(dev, NULL, &error_fatal);

    @ depends on !(file in "hw/core/qdev.c") && !(file in "hw/core/bus.c")@
    expression errp;
    expression dev;
    symbol true;
    @@
    -    object_property_set_bool(OBJECT(dev), true, "realized", errp);
    +    qdev_realize(DEVICE(dev), NULL, errp);

    @ depends on !(file in "hw/core/qdev.c") && !(file in "hw/core/bus.c")@
    expression errp;
    expression dev;
    symbol true;
    @@
    -    object_property_set_bool(dev, true, "realized", errp);
    +    qdev_realize(DEVICE(dev), NULL, errp);

Note that Coccinelle chokes on ARMSSE typedef vs. macro in
hw/arm/armsse.c.  Worked around by temporarily renaming the macro for
the spatch run.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20200610053247.1583243-57-armbru@redhat.com>
2020-06-15 22:06:04 +02:00
Markus Armbruster 3c6ef471ee sysbus: Convert to sysbus_realize() etc. with Coccinelle
Convert from qdev_realize(), qdev_realize_and_unref() with null @bus
argument to sysbus_realize(), sysbus_realize_and_unref().

Coccinelle script:

    @@
    expression dev, errp;
    @@
    -    qdev_realize(DEVICE(dev), NULL, errp);
    +    sysbus_realize(SYS_BUS_DEVICE(dev), errp);

    @@
    expression sysbus_dev, dev, errp;
    @@
    +    sysbus_dev = SYS_BUS_DEVICE(dev);
    -    qdev_realize_and_unref(dev, NULL, errp);
    +    sysbus_realize_and_unref(sysbus_dev, errp);
    -    sysbus_dev = SYS_BUS_DEVICE(dev);

    @@
    expression sysbus_dev, dev, errp;
    expression expr;
    @@
         sysbus_dev = SYS_BUS_DEVICE(dev);
         ... when != dev = expr;
    -    qdev_realize_and_unref(dev, NULL, errp);
    +    sysbus_realize_and_unref(sysbus_dev, errp);

    @@
    expression dev, errp;
    @@
    -    qdev_realize_and_unref(DEVICE(dev), NULL, errp);
    +    sysbus_realize_and_unref(SYS_BUS_DEVICE(dev), errp);

    @@
    expression dev, errp;
    @@
    -    qdev_realize_and_unref(dev, NULL, errp);
    +    sysbus_realize_and_unref(SYS_BUS_DEVICE(dev), errp);

Whitespace changes minimized manually.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20200610053247.1583243-46-armbru@redhat.com>
[Conflicts in hw/misc/empty_slot.c and hw/sparc/leon3.c resolved]
2020-06-15 22:05:28 +02:00
Markus Armbruster 9fc7fc4d39 qom: Less verbose object_initialize_child()
All users of object_initialize_child() pass the obvious child size
argument.  Almost all pass &error_abort and no properties.  Tiresome.

Rename object_initialize_child() to
object_initialize_child_with_props() to free the name.  New
convenience wrapper object_initialize_child() automates the size
argument, and passes &error_abort and no properties.

Rename object_initialize_childv() to
object_initialize_child_with_propsv() for consistency.

Convert callers with this Coccinelle script:

    @@
    expression parent, propname, type;
    expression child, size;
    symbol error_abort;
    @@
    -    object_initialize_child(parent, propname, OBJECT(child), size, type, &error_abort, NULL)
    +    object_initialize_child(parent, propname, child, size, type, &error_abort, NULL)

    @@
    expression parent, propname, type;
    expression child;
    symbol error_abort;
    @@
    -    object_initialize_child(parent, propname, child, sizeof(*child), type, &error_abort, NULL)
    +    object_initialize_child(parent, propname, child, type)

    @@
    expression parent, propname, type;
    expression child;
    symbol error_abort;
    @@
    -    object_initialize_child(parent, propname, &child, sizeof(child), type, &error_abort, NULL)
    +    object_initialize_child(parent, propname, &child, type)

    @@
    expression parent, propname, type;
    expression child, size, err;
    expression list props;
    @@
    -    object_initialize_child(parent, propname, child, size, type, err, props)
    +    object_initialize_child_with_props(parent, propname, child, size, type, err, props)

Note that Coccinelle chokes on ARMSSE typedef vs. macro in
hw/arm/armsse.c.  Worked around by temporarily renaming the macro for
the spatch run.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
[Rebased: machine opentitan is new (commit fe0fe4735e)]
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20200610053247.1583243-37-armbru@redhat.com>
2020-06-15 22:05:28 +02:00
Markus Armbruster 3e80f6902c qdev: Convert uses of qdev_create() with Coccinelle
This is the transformation explained in the commit before previous.
Takes care of just one pattern that needs conversion.  More to come in
this series.

Coccinelle script:

    @ depends on !(file in "hw/arm/highbank.c")@
    expression bus, type_name, dev, expr;
    @@
    -    dev = qdev_create(bus, type_name);
    +    dev = qdev_new(type_name);
         ... when != dev = expr
    -    qdev_init_nofail(dev);
    +    qdev_realize_and_unref(dev, bus, &error_fatal);

    @@
    expression bus, type_name, dev, expr;
    identifier DOWN;
    @@
    -    dev = DOWN(qdev_create(bus, type_name));
    +    dev = DOWN(qdev_new(type_name));
         ... when != dev = expr
    -    qdev_init_nofail(DEVICE(dev));
    +    qdev_realize_and_unref(DEVICE(dev), bus, &error_fatal);

    @@
    expression bus, type_name, expr;
    identifier dev;
    @@
    -    DeviceState *dev = qdev_create(bus, type_name);
    +    DeviceState *dev = qdev_new(type_name);
         ... when != dev = expr
    -    qdev_init_nofail(dev);
    +    qdev_realize_and_unref(dev, bus, &error_fatal);

    @@
    expression bus, type_name, dev, expr, errp;
    symbol true;
    @@
    -    dev = qdev_create(bus, type_name);
    +    dev = qdev_new(type_name);
         ... when != dev = expr
    -    object_property_set_bool(OBJECT(dev), true, "realized", errp);
    +    qdev_realize_and_unref(dev, bus, errp);

    @@
    expression bus, type_name, expr, errp;
    identifier dev;
    symbol true;
    @@
    -    DeviceState *dev = qdev_create(bus, type_name);
    +    DeviceState *dev = qdev_new(type_name);
         ... when != dev = expr
    -    object_property_set_bool(OBJECT(dev), true, "realized", errp);
    +    qdev_realize_and_unref(dev, bus, errp);

The first rule exempts hw/arm/highbank.c, because it matches along two
control flow paths there, with different @type_name.  Covered by the
next commit's manual conversions.

Missing #include "qapi/error.h" added manually.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20200610053247.1583243-10-armbru@redhat.com>
[Conflicts in hw/misc/empty_slot.c and hw/sparc/leon3.c resolved]
2020-06-15 22:00:10 +02:00
Markus Armbruster 981c3dcd94 qdev: Convert to qdev_unrealize() with Coccinelle
For readability, and consistency with qbus_realize().

Coccinelle script:

    @ depends on !(file in "hw/core/qdev.c")@
    typedef DeviceState;
    DeviceState *dev;
    symbol false, error_abort;
    @@
    -    object_property_set_bool(OBJECT(dev), false, "realized", &error_abort);
    +    qdev_unrealize(dev);

    @ depends on !(file in "hw/core/qdev.c") && !(file in "hw/core/bus.c")@
    expression dev;
    symbol false, error_abort;
    @@
    -    object_property_set_bool(OBJECT(dev), false, "realized", &error_abort);
    +    qdev_unrealize(DEVICE(dev));

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20200610053247.1583243-8-armbru@redhat.com>
2020-06-15 21:36:30 +02:00
Leonardo Bras 0911a60c76 ppc/spapr: Add hotremovable flag on DIMM LMBs on drmem_v2
On reboot, all memory that was previously added using object_add and
device_add is placed in this DIMM area.

The new SPAPR_LMB_FLAGS_HOTREMOVABLE flag helps Linux to put this memory in
the correct memory zone, so no unmovable allocations are made there,
allowing the object to be easily hot-removed by device_del and
object_del.

This new flag was accepted in Power Architecture documentation.

Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
Reviewed-by: Bharata B Rao <bharata@linux.ibm.com>
Message-Id: <20200511200201.58537-1-leobras.c@gmail.com>
[dwg: Fixed syntax error spotted by Cédric Le Goater]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-05-27 15:29:36 +10:00
Philippe Mathieu-Daudé 8e5c952b37 hw: Remove unnecessary DEVICE() cast
The DEVICE() macro is defined as:

  #define DEVICE(obj) OBJECT_CHECK(DeviceState, (obj), TYPE_DEVICE)

which expands to:

  ((DeviceState *)object_dynamic_cast_assert((Object *)(obj), (name),
                                             __FILE__, __LINE__,
                                             __func__))

This assertion can only fail when @obj points to something other
than its stated type, i.e. when we're in undefined behavior country.

Remove the unnecessary DEVICE() casts when we already know the
pointer is of DeviceState type.

Patch created mechanically using spatch with this script:

  @@
  typedef DeviceState;
  DeviceState *s;
  @@
  -   DEVICE(s)
  +   s

Acked-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: Paul Durrant <paul@xen.org>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Acked-by: John Snow <jsnow@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20200512070020.22782-4-f4bug@amsat.org>
2020-05-15 07:08:52 +02:00
Markus Armbruster b69c3c21a5 qdev: Unrealize must not fail
Devices may have component devices and buses.

Device realization may fail.  Realization is recursive: a device's
realize() method realizes its components, and device_set_realized()
realizes its buses (which should in turn realize the devices on that
bus, except bus_set_realized() doesn't implement that, yet).

When realization of a component or bus fails, we need to roll back:
unrealize everything we realized so far.  If any of these unrealizes
failed, the device would be left in an inconsistent state.  Must not
happen.

device_set_realized() lets it happen: it ignores errors in the roll
back code starting at label child_realize_fail.

Since realization is recursive, unrealization must be recursive, too.
But how could a partly failed unrealize be rolled back?  We'd have to
re-realize, which can fail.  This design is fundamentally broken.

device_set_realized() does not roll back at all.  Instead, it keeps
unrealizing, ignoring further errors.

It can screw up even for a device with no buses: if the lone
dc->unrealize() fails, it still unregisters vmstate, and calls
listeners' unrealize() callback.

bus_set_realized() does not roll back either.  Instead, it stops
unrealizing.

Fortunately, no unrealize method can fail, as we'll see below.

To fix the design error, drop parameter @errp from all the unrealize
methods.

Any unrealize method that uses @errp now needs an update.  This leads
us to unrealize() methods that can fail.  Merely passing it to another
unrealize method cannot cause failure, though.  Here are the ones that
do other things with @errp:

* virtio_serial_device_unrealize()

  Fails when qbus_set_hotplug_handler() fails, but still does all the
  other work.  On failure, the device would stay realized with its
  resources completely gone.  Oops.  Can't happen, because
  qbus_set_hotplug_handler() can't actually fail here.  Pass
  &error_abort to qbus_set_hotplug_handler() instead.

* hw/ppc/spapr_drc.c's unrealize()

  Fails when object_property_del() fails, but all the other work is
  already done.  On failure, the device would stay realized with its
  vmstate registration gone.  Oops.  Can't happen, because
  object_property_del() can't actually fail here.  Pass &error_abort
  to object_property_del() instead.

* spapr_phb_unrealize()

  Fails and bails out when remove_drcs() fails, but other work is
  already done.  On failure, the device would stay realized with some
  of its resources gone.  Oops.  remove_drcs() fails only when
  chassis_from_bus()'s object_property_get_uint() fails, and it can't
  here.  Pass &error_abort to remove_drcs() instead.

Therefore, no unrealize method can fail before this patch.

device_set_realized()'s recursive unrealization via bus uses
object_property_set_bool().  Can't drop @errp there, so pass
&error_abort.

We similarly unrealize with object_property_set_bool() elsewhere,
always ignoring errors.  Pass &error_abort instead.

Several unrealize methods no longer handle errors from other unrealize
methods: virtio_9p_device_unrealize(),
virtio_input_device_unrealize(), scsi_qdev_unrealize(), ...
Much of the deleted error handling looks wrong anyway.

One unrealize methods no longer ignore such errors:
usb_ehci_pci_exit().

Several realize methods no longer ignore errors when rolling back:
v9fs_device_realize_common(), pci_qdev_unrealize(),
spapr_phb_realize(), usb_qdev_realize(), vfio_ccw_realize(),
virtio_device_realize().

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20200505152926.18877-17-armbru@redhat.com>
2020-05-15 07:08:14 +02:00
Markus Armbruster 40c2281cc3 Drop more @errp parameters after previous commit
Several functions can't fail anymore: ich9_pm_add_properties(),
device_add_bootindex_property(), ppc_compat_add_property(),
spapr_caps_add_properties(), PropertyInfo.create().  Drop their @errp
parameter.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20200505152926.18877-16-armbru@redhat.com>
2020-05-15 07:08:14 +02:00
Markus Armbruster d2623129a7 qom: Drop parameter @errp of object_property_add() & friends
The only way object_property_add() can fail is when a property with
the same name already exists.  Since our property names are all
hardcoded, failure is a programming error, and the appropriate way to
handle it is passing &error_abort.

Same for its variants, except for object_property_add_child(), which
additionally fails when the child already has a parent.  Parentage is
also under program control, so this is a programming error, too.

We have a bit over 500 callers.  Almost half of them pass
&error_abort, slightly fewer ignore errors, one test case handles
errors, and the remaining few callers pass them to their own callers.

The previous few commits demonstrated once again that ignoring
programming errors is a bad idea.

Of the few ones that pass on errors, several violate the Error API.
The Error ** argument must be NULL, &error_abort, &error_fatal, or a
pointer to a variable containing NULL.  Passing an argument of the
latter kind twice without clearing it in between is wrong: if the
first call sets an error, it no longer points to NULL for the second
call.  ich9_pm_add_properties(), sparc32_ledma_realize(),
sparc32_dma_realize(), xilinx_axidma_realize(), xilinx_enet_realize()
are wrong that way.

When the one appropriate choice of argument is &error_abort, letting
users pick the argument is a bad idea.

Drop parameter @errp and assert the preconditions instead.

There's one exception to "duplicate property name is a programming
error": the way object_property_add() implements the magic (and
undocumented) "automatic arrayification".  Don't drop @errp there.
Instead, rename object_property_add() to object_property_try_add(),
and add the obvious wrapper object_property_add().

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20200505152926.18877-15-armbru@redhat.com>
[Two semantic rebase conflicts resolved]
2020-05-15 07:07:58 +02:00
Markus Armbruster 7eecec7d12 qom: Drop object_property_set_description() parameter @errp
object_property_set_description() and
object_class_property_set_description() fail only when property @name
is not found.

There are 85 calls of object_property_set_description() and
object_class_property_set_description().  None of them can fail:

* 84 immediately follow the creation of the property.

* The one in spapr_rng_instance_init() refers to a property created in
  spapr_rng_class_init(), from spapr_rng_properties[].

Every one of them still gets to decide what to pass for @errp.

51 calls pass &error_abort, 32 calls pass NULL, one receives the error
and propagates it to &error_abort, and one propagates it to
&error_fatal.  I'm actually surprised none of them violates the Error
API.

What are we gaining by letting callers handle the "property not found"
error?  Use when the property is not known to exist is simpler: you
don't have to guard the call with a check.  We haven't found such a
use in 5+ years.  Until we do, let's make life a bit simpler and drop
the @errp parameter.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20200505152926.18877-8-armbru@redhat.com>
[One semantic rebase conflict resolved]
2020-05-15 07:06:49 +02:00
Peter Maydell b894c6ed4a ppc patch queue for 2020-04-07
First pull request for qemu-5.1.  This includes:
  * Removal of all remaining cases where we had CAS triggered reboots
  * A number of improvements to NMI injection
  * Support for partition scoped radix translation in softmmu
  * Some fixes for NVDIMM handling
  * A handful of other minor fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEdfRlhq5hpmzETofcbDjKyiDZs5IFAl6zlgcACgkQbDjKyiDZ
 s5LhIQ//YRqYuR9JIcIjcL4qFKqk93RrE8KFoxY4Qri7+o6Zru1ATqpVru4tixpd
 YN0ntF3oMDV/uveQAG771n5iAX7TgbKiOaqIP/qnL6aUEtG4t3KvPhEIZr9Z3kkW
 eGL8vzObGlkTHJUdGbUaMrpxJZDLW9MADqTVa1PfDGThk3jKCcMqAInBQwFwNifY
 lAoHJi0SkF8i7ib6dT1Vp+EPw1SYmnLEFyrQU6+jshvxsb9FGNot0widQeSGCJme
 uolBiO63gxc4AjAt/5PvtAHe1SY9UGUheHp9hMSGoNrFfrCaMgheE8bOsS3MmPJ0
 2kEIW4ZIq+CSqnlNlUciaPWn2X5INkXt+XAZyuTSbGC51yLGGpio5fn5CGdDL3wA
 +mefdJaYvfv5e5UuM38Lv6D7WyPczh2wIDvCOaJP4Lcr+yv0FOgSQOkd6LtnejqV
 tFqIAVpI7HeNUDmkt/dWRsje6L5gjfPzhA2c1Qm5r7pac4jQXu4POCFP964KXJ1W
 Ix7qaVOLVcNfSBbHKu79tRHRZjWDiK0SplrHfO6aSUJ/whJ2raT3O8DL9Rbj1M4/
 QDYdMvockuwZRWZeYs1+A0LJ3LcPYVpVRvOjGpZEex8DQZ05+Elys33DMEM9MXpK
 fOiRu/Op286QxEKAkv/xaMMsJpYZ2k+AJXA+7nOCq0SNj0YvF0c=
 =INvG
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-5.1-20200507' into staging

ppc patch queue for 2020-04-07

First pull request for qemu-5.1.  This includes:
 * Removal of all remaining cases where we had CAS triggered reboots
 * A number of improvements to NMI injection
 * Support for partition scoped radix translation in softmmu
 * Some fixes for NVDIMM handling
 * A handful of other minor fixes

# gpg: Signature made Thu 07 May 2020 06:00:55 BST
# gpg:                using RSA key 75F46586AE61A66CC44E87DC6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>" [full]
# gpg:                 aka "David Gibson (Red Hat) <dgibson@redhat.com>" [full]
# gpg:                 aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>" [full]
# gpg:                 aka "David Gibson (kernel.org) <dwg@kernel.org>" [unknown]
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E  87DC 6C38 CACA 20D9 B392

* remotes/dgibson/tags/ppc-for-5.1-20200507:
  target-ppc: fix rlwimi, rlwinm, rlwnm for Clang-9
  spapr_nvdimm: Tweak error messages
  spapr_nvdimm.c: make 'label-size' mandatory
  target/ppc: Add support for Radix partition-scoped translation
  target/ppc: Rework ppc_radix64_walk_tree() for partition-scoped translation
  target/ppc: Extend ppc_radix64_check_prot() with a 'partition_scoped' bool
  target/ppc: Introduce ppc_radix64_xlate() for Radix tree translation
  spapr: Don't allow unplug of NVLink2 devices
  target/ppc: Assert if HV mode is set when running under a pseries machine
  target/ppc: Introduce a relocation bool in ppc_radix64_handle_mmu_fault()
  target/ppc: Enforce that the root page directory size must be at least 5
  spapr: Drop CAS reboot flag
  spapr/cas: Separate CAS handling from rebuilding the FDT
  spapr: Simplify selection of radix/hash during CAS
  ppc/pnv: Add support for NMI interface
  ppc/spapr: tweak change system reset helper
  spapr: Don't check capabilities removed between CAS calls
  target/ppc: Improve syscall exception logging

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2020-05-07 10:55:12 +01:00
Greg Kurz 087820e37f spapr: Drop CAS reboot flag
The CAS reboot flag is false by default and all the locations that
could set it to true have been dropped. This means that all code
blocks depending on the flag being set is dead code and the other
code blocks should be executed always.

Just do that and drop the now uneeded CAS reboot flag. Fix a
comment on the way to make checkpatch happy.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <158514994893.478799.11772512888322840990.stgit@bahia.lan>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-05-07 11:10:50 +10:00
Alexey Kardashevskiy 91067db1ab spapr/cas: Separate CAS handling from rebuilding the FDT
At the moment "ibm,client-architecture-support" ("CAS") is implemented
in SLOF and QEMU assists via the custom H_CAS hypercall which copies
an updated flatten device tree (FDT) blob to the SLOF memory which
it then uses to update its internal tree.

When we enable the OpenFirmware client interface in QEMU, we won't need
to copy the FDT to the guest as the client is expected to fetch
the device tree using the client interface.

This moves FDT rebuild out to a separate helper which is going to be
called from the "ibm,client-architecture-support" handler and leaves
writing FDT to the guest in the H_CAS handler.

This should not cause any behavioral change.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20200310050733.29805-3-aik@ozlabs.ru>
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <158514994229.478799.2178881312094922324.stgit@bahia.lan>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-05-07 11:10:50 +10:00
Greg Kurz b4b83312e7 spapr: Simplify selection of radix/hash during CAS
The guest can select the MMU mode by setting bits 0-1 of byte 24
in OV5 to to 0b00 for hash or 0b01 for radix. As required by the
architecture, we terminate the boot process if any other value
is found there.

The usual way to negotiate features in OV5 is basically ANDing
the bitfield provided by the guest and the bitfield of features
supported by QEMU, previously populated at machine init.

For some not documented reason, MMU is treated differently : bit 1
of byte 24 (the radix/hash bit) is cleared from the guest OV5 and
explicitely set in the final negotiated OV5 if radix was requested.

Since the only expected input from the guest is the radix/hash bit
being set or not, it seems more appropriate to handle this like we
do for XIVE.

Set the radix bit in spapr->ov5 at machine init if it has a chance
to work (ie. power9, either TCG or a radix capable KVM) and rely
exclusively on spapr_ovec_intersect() to set the radix bit in
spapr->ov5_cas.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <158514993621.478799.4204740354545734293.stgit@bahia.lan>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-05-07 11:10:50 +10:00
Nicholas Piggin b5b7f39181 ppc/spapr: tweak change system reset helper
Rather than have the helper take an optional vector address
override, instead have its caller modify env->nip itself.
This is more consistent when adding pnv nmi support, and also
with mce injection added later.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Message-Id: <20200325144147.221875-2-npiggin@gmail.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-05-07 11:10:50 +10:00
Cornelia Huck 541aaa1df8 hw: add compat machines for 5.1
Add 5.1 machine types for arm/i440fx/q35/s390x/spapr.

Acked-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Message-id: 20200429144605.7262-1-cohuck@redhat.com
2020-05-06 10:12:16 -04:00
Peter Maydell b319df5537 ppc patch queue 2020-03-17
Here's my final pull request for the qemu-5.0 soft freeze.  Sorry this
 is just under the wire - I hit some last minute problems that took a
 while to fix up and retest.
 
 Highlights are:
  * Numerous fixes for the FWNMI feature
  * A handful of cleanups to the device tree construction code
  * Numerous fixes for the spapr-vscsi device
  * A number of fixes and cleanups for real mode (MMU off) softmmu
    handling
  * Fixes for handling of the PAPR RMA
  * Better handling of hotplug/unplug events during boot
  * Assorted other fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEdfRlhq5hpmzETofcbDjKyiDZs5IFAl5wnnsACgkQbDjKyiDZ
 s5JdpQ//eY/AOTs09UhvKxt8DN7lC2WyHGxYSncb2Tj2zaJyPPX9p296IDBMw+KX
 Cafr6LzwLjpcpOyf/EWzg7qYGbNYoYgRWoOkHI/9pHsrIH3ZvhmnyTVQI5CffeEb
 EDDXJUQo/2sFpAGeODr5zz+zAQUGzt6ZZUxAiQAF9RYc9ohUGD2x5c86Asx6ZTZo
 /14bd3qnrcy1x+TxDetb1idFxFr2DsdYqpHAi88zHm+UaWzxYrb7kakd+YbqI24N
 tYryf5SdtGrWAAdF/7nq2PQJFzskx+t0QearU+ruovRydxYbUtBpkr5HauoVuQXR
 LiV270sDYDS/D1vvQQKzLxkUuvWmbZ0rB+2BAtS1rwq2sOKqYyQEAkTWfGtSXcf8
 7fuZm2i1G78MuYGTOLCrF1u0owUB3QYHvt1NUW09GyWS8X3mahtj2fRe1RtPV/5d
 NL217bcd32fkMoGCg/lFvK9sCQzR6zJGKkJvOGMVW4ahHCLixpjIWabWtdXjfguT
 UahRPvlX7fzeVT+DISfjqyxwL+THnTvB3CTMWG2cktf0K1ke4SXcQ0mPyksN1NuC
 QocfPCr1TN2ri8g9dAPwQmOkojnNs9izpIWRYSl3avTJFNseNPxuHQALXj2Y3Y/O
 EoYxLN+cqPukQ1O3GxEj5QMKe8V/0986mxWnuS/dMohQOoy+zV4=
 =BPnR
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-5.0-20200317' into staging

ppc patch queue 2020-03-17

Here's my final pull request for the qemu-5.0 soft freeze.  Sorry this
is just under the wire - I hit some last minute problems that took a
while to fix up and retest.

Highlights are:
 * Numerous fixes for the FWNMI feature
 * A handful of cleanups to the device tree construction code
 * Numerous fixes for the spapr-vscsi device
 * A number of fixes and cleanups for real mode (MMU off) softmmu
   handling
 * Fixes for handling of the PAPR RMA
 * Better handling of hotplug/unplug events during boot
 * Assorted other fixes

# gpg: Signature made Tue 17 Mar 2020 09:55:07 GMT
# gpg:                using RSA key 75F46586AE61A66CC44E87DC6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>" [full]
# gpg:                 aka "David Gibson (Red Hat) <dgibson@redhat.com>" [full]
# gpg:                 aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>" [full]
# gpg:                 aka "David Gibson (kernel.org) <dwg@kernel.org>" [unknown]
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E  87DC 6C38 CACA 20D9 B392

* remotes/dgibson/tags/ppc-for-5.0-20200317: (45 commits)
  pseries: Update SLOF firmware image
  ppc/spapr: Ignore common "ibm,nmi-interlock" Linux bug
  ppc/spapr: Implement FWNMI System Reset delivery
  target/ppc: allow ppc_cpu_do_system_reset to take an alternate vector
  ppc/spapr: Allow FWNMI on TCG
  ppc/spapr: Fix FWNMI machine check interrupt delivery
  ppc/spapr: Add FWNMI System Reset state
  ppc/spapr: Change FWNMI names
  ppc/spapr: Fix FWNMI machine check failure handling
  spapr: Rename DT functions to newer naming convention
  spapr: Move creation of ibm,architecture-vec-5 property
  spapr: Move creation of ibm,dynamic-reconfiguration-memory dt node
  spapr/rtas: Reserve space for RTAS blob and log
  pseries: Update SLOF firmware image
  ppc/spapr: Move GPRs setup to one place
  target/ppc: Fix rlwinm on ppc64
  spapr/xive: use SPAPR_IRQ_IPI to define IPI ranges exposed to the guest
  hw/scsi/spapr_vscsi: Convert debug fprintf() to trace event
  hw/scsi/spapr_vscsi: Prevent buffer overflow
  hw/scsi/spapr_vscsi: Do not mix SRP IU size with DMA buffer size
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2020-03-18 15:07:57 +00:00
Nicholas Piggin 0e236d3477 ppc/spapr: Implement FWNMI System Reset delivery
PAPR requires that if "ibm,nmi-register" succeeds, then the hypervisor
delivers all system reset and machine check exceptions to the registered
addresses.

System Resets are delivered with registers set to the architected state,
and with no interlock.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Message-Id: <20200316142613.121089-8-npiggin@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-03-17 17:00:22 +11:00
Nicholas Piggin 9aa2528070 target/ppc: allow ppc_cpu_do_system_reset to take an alternate vector
Provide for an alternate delivery location, -1 defaults to the
architected address.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Message-Id: <20200316142613.121089-7-npiggin@gmail.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-03-17 17:00:22 +11:00
Nicholas Piggin edfdbf9c6b ppc/spapr: Add FWNMI System Reset state
The FWNMI option must deliver system reset interrupts to their
registered address, and there are a few constraints on the handler
addresses specified in PAPR. Add the system reset address state and
checks.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Message-Id: <20200316142613.121089-4-npiggin@gmail.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviwed-by: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-03-17 17:00:22 +11:00
Nicholas Piggin 8af7e1fe6f ppc/spapr: Change FWNMI names
The option is called "FWNMI", and it involves more than just machine
checks, also machine checks can be delivered without the FWNMI option,
so re-name various things to reflect that.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Message-Id: <20200316142613.121089-3-npiggin@gmail.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-03-17 17:00:22 +11:00
David Gibson 91335a5e15 spapr: Rename DT functions to newer naming convention
In the spapr code we've been gradually moving towards a convention that
functions which create pieces of the device tree are called spapr_dt_*().
This patch speeds that along by renaming most of the things that don't yet
match that so that they do.

For now we leave the *_dt_populate() functions which are actual methods
used in the DRCClass::dt_populate method.

While we're there we remove a few comments that don't really say anything
useful.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
2020-03-17 17:00:19 +11:00
David Gibson 1e0e11085a spapr: Move creation of ibm,architecture-vec-5 property
This is currently called from spapr_dt_cas_updates() which is a hang
over from when we created this only as a diff to the DT at CAS time.
Now that we fully rebuild the DT at CAS time, just create it along
with the rest of the properties in /chosen.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
2020-03-17 16:59:22 +11:00
David Gibson fa523f0dd3 spapr: Move creation of ibm,dynamic-reconfiguration-memory dt node
Currently this node with information about hotpluggable memory is created
from spapr_dt_cas_updates().  But that's just a hangover from when we
created it only as a diff to the device tree at CAS time.  Now that we
fully rebuild the DT as CAS time, it makes more sense to create this along
with the rest of the memory information in the device tree.

So, move it to spapr_populate_memory().  The patch is huge, but it's nearly
all just code motion.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
2020-03-17 15:08:50 +11:00
Alexey Kardashevskiy 4dba872219 spapr/rtas: Reserve space for RTAS blob and log
At the moment SLOF reserves space for RTAS and instantiates the RTAS blob
which is 20 bytes binary blob calling an hypercall. The rest of the RTAS
area is a log which SLOF has no idea about but QEMU does.

This moves RTAS sizing to QEMU and this overrides the size from SLOF.
The only remaining problem is that SLOF copies the number of bytes it
reserved (2KB for now) so QEMU needs to reserve at least this much;
SLOF will be fixed separately to check that rtas-size from QEMU is
enough for those 20 bytes for the H_RTAS hcall.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20200316011841.99970-1-aik@ozlabs.ru>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-03-17 15:08:50 +11:00
Alexey Kardashevskiy 395a20d3cc ppc/spapr: Move GPRs setup to one place
At the moment "pseries" starts in SLOF which only expects the FDT blob
pointer in r3. As we are going to introduce a OpenFirmware support in
QEMU, we will be booting OF clients directly and these expect a stack
pointer in r1, Linux looks at r3/r4 for the initramdisk location
(although vmlinux can find this from the device tree but zImage from
distro kernels cannot).

This extends spapr_cpu_set_entry_state() to take more registers. This
should cause no behavioral change.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20200310050733.29805-2-aik@ozlabs.ru>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-03-17 15:08:50 +11:00
David Gibson 425f0b7adb spapr: Clean up RMA size calculation
Move the calculation of the Real Mode Area (RMA) size into a helper
function.  While we're there clean it up and correct it in a few ways:
  * Add comments making it clearer where the various constraints come from
  * Remove a pointless check that the RMA fits within Node 0 (we've just
    clamped it so that it does)

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
2020-03-17 15:08:47 +11:00
David Gibson 1052ab67f4 spapr: Don't clamp RMA to 16GiB on new machine types
In spapr_machine_init() we clamp the size of the RMA to 16GiB and the
comment saying why doesn't make a whole lot of sense.  In fact, this was
done because the real mode handling code elsewhere limited the RMA in TCG
mode to the maximum value configurable in LPCR[RMLS], 16GiB.

But,
 * Actually LPCR[RMLS] has been able to encode a 256GiB size for a very
   long time, we just didn't implement it properly in the softmmu
 * LPCR[RMLS] shouldn't really be relevant anyway, it only was because we
   used to abuse the RMOR based translation mode in order to handle the
   fact that we're not modelling the hypervisor parts of the cpu

We've now removed those limitations in the modelling so the 16GiB clamp no
longer serves a function.  However, we can't just remove the limit
universally: that would break migration to earlier qemu versions, where
the 16GiB RMLS limit still applies, no matter how bad the reasons for it
are.

So, we replace the 16GiB clamp, with a clamp to a limit defined in the
machine type class.  We set it to 16 GiB for machine types 4.2 and earlier,
but set it to 0 meaning unlimited for the new 5.0 machine type.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
2020-03-17 09:41:15 +11:00
David Gibson 8897ea5a9f spapr: Don't attempt to clamp RMA to VRMA constraint
The Real Mode Area (RMA) is the part of memory which a guest can access
when in real (MMU off) mode.  Of course, for a guest under KVM, the MMU
isn't really turned off, it's just in a special translation mode - Virtual
Real Mode Area (VRMA) - which looks like real mode in guest mode.

The mechanics of how this works when using the hash MMU (HPT) put a
constraint on the size of the RMA, which depends on the size of the
HPT.  So, the latter part of spapr_setup_hpt_and_vrma() clamps the RMA
we advertise to the guest based on this VRMA limit.

There are several things wrong with this:
 1) spapr_setup_hpt_and_vrma() doesn't actually clamp, it takes the minimum
    of Node 0 memory size and the VRMA limit.  That will *often* work the
    same as clamping, but there can be other constraints on RMA size which
    supersede Node 0 memory size.  We have real bugs caused by this
    (currently worked around in the guest kernel)
 2) Some callers of spapr_setup_hpt_and_vrma() are in a situation where
    we're past the point that we can actually advertise an RMA limit to the
    guest
 3) But most fundamentally, the VRMA limit depends on host configuration
    (page size) which shouldn't be visible to the guest, but this partially
    exposes it.  This can cause problems with migration in certain edge
    cases, although we will mostly get away with it.

In practice, this clamping is almost never applied anyway.  With 64kiB
pages and the normal rules for sizing of the HPT, the theoretical VRMA
limit will be 4x(guest memory size) and so never hit.  It will hit with
4kiB pages, where it will be (guest memory size)/4.  However all mainstream
distro kernels for POWER have used a 64kiB page size for at least 10 years.

So, simply replace this logic with a check that the RMA we've calculated
based only on guest visible configuration will fit within the host implied
VRMA limit.  This can break if running HPT guests on a host kernel with
4kiB page size.  As noted that's very rare.  There also exist several
possible workarounds:
  * Change the host kernel to use 64kiB pages
  * Use radix MMU (RPT) guests instead of HPT
  * Use 64kiB hugepages on the host to back guest memory
  * Increase the guest memory size so that the RMA hits one of the fixed
    limits before the RMA limit.  This is relatively easy on POWER8 which
    has a 16GiB limit, harder on POWER9 which has a 1TiB limit.
  * Use a guest NUMA configuration which artificially constrains the RMA
    within the VRMA limit (the RMA must always fit within Node 0).

Previously, on KVM, we also temporarily reduced the rma_size to 256M so
that the we'd load the kernel and initrd safely, regardless of the VRMA
limit.  This was a) confusing, b) could significantly limit the size of
images we could load and c) introduced a behavioural difference between
KVM and TCG.  So we remove that as well.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Greg Kurz <groug@kaod.org>
2020-03-17 09:41:15 +11:00
David Gibson 6a84737c80 spapr,ppc: Simplify signature of kvmppc_rma_size()
This function calculates the maximum size of the RMA as implied by the
host's page size of structure of the VRMA (there are a number of other
constraints on the RMA size which will supersede this one in many
circumstances).

The current interface takes the current RMA size estimate, and clamps it
to the VRMA derived size.  The only current caller passes in an arguably
wrong value (it will match the current RMA estimate in some but not all
cases).

We want to fix that, but for now just keep concerns separated by having the
KVM helper function just return the VRMA derived limit, and let the caller
combine it with other constraints.  We call the new function
kvmppc_vrma_limit() to more clearly indicate its limited responsibility.

The helper should only ever be called in the KVM enabled case, so replace
its !CONFIG_KVM stub with an assert() rather than a dummy value.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cedric Le Goater <clg@fr.ibm.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
2020-03-17 09:41:15 +11:00
David Gibson 9943266ec3 spapr: Don't use weird units for MIN_RMA_SLOF
MIN_RMA_SLOF records the minimum about of RMA that the SLOF firmware
requires.  It lets us give a meaningful error if the RMA ends up too small,
rather than just letting SLOF crash.

It's currently stored as a number of megabytes, which is strange for global
constants.  Move that megabyte scaling into the definition of the constant
like most other things use.

Change from M to MiB in the associated message while we're at it.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
2020-03-17 09:41:15 +11:00
Felipe Franciosi 64a7b8de42 qom/object: Use common get/set uint helpers
Several objects implemented their own uint property getters and setters,
despite them being straightforward (without any checks/validations on
the values themselves) and identical across objects. This makes use of
an enhanced API for object_property_add_uintXX_ptr() which offers
default setters.

Some of these setters used to update the value even if the type visit
failed (eg. because the value being set overflowed over the given type).
The new setter introduces a check for these errors, not updating the
value if an error occurred. The error is propagated.

Signed-off-by: Felipe Franciosi <felipe@nutanix.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 23:02:24 +01:00
Philippe Mathieu-Daudé ea0ac7f6f8 hw: Make MachineClass::is_default a boolean type
There's no good reason for it to be type int, change it to bool.

Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200207161948.15972-3-philmd@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2020-02-28 14:57:19 -05:00
Paolo Bonzini ca6155c0f2 Merge tag 'patchew/20200219160953.13771-1-imammedo@redhat.com' of https://github.com/patchew-project/qemu into HEAD
This series removes ad hoc RAM allocation API (memory_region_allocate_system_memory)
and consolidates it around hostmem backend. It allows to

* resolve conflicts between global -mem-prealloc and hostmem's "policy" option,
  fixing premature allocation before binding policy is applied

* simplify complicated memory allocation routines which had to deal with 2 ways
  to allocate RAM.

* reuse hostmem backends of a choice for main RAM without adding extra CLI
  options to duplicate hostmem features.  A recent case was -mem-shared, to
  enable vhost-user on targets that don't support hostmem backends [1] (ex: s390)

* move RAM allocation from individual boards into generic machine code and
  provide them with prepared MemoryRegion.

* clean up deprecated NUMA features which were tied to the old API (see patches)
  - "numa: remove deprecated -mem-path fallback to anonymous RAM"
  - (POSTPONED, waiting on libvirt side) "forbid '-numa node,mem' for 5.0 and newer machine types"
  - (POSTPONED) "numa: remove deprecated implicit RAM distribution between nodes"

Introduce a new machine.memory-backend property and wrapper code that aliases
global -mem-path and -mem-alloc into automatically created hostmem backend
properties (provided memory-backend was not set explicitly given by user).
A bulk of trivial patches then follow to incrementally convert individual
boards to using machine.memory-backend provided MemoryRegion.

Board conversion typically involves:

* providing MachineClass::default_ram_size and MachineClass::default_ram_id
  so generic code could create default backend if user didn't explicitly provide
  memory-backend or -m options

* dropping memory_region_allocate_system_memory() call

* using convenience MachineState::ram MemoryRegion, which points to MemoryRegion
   allocated by ram-memdev

On top of that for some boards:

* missing ram_size checks are added (typically it were boards with fixed ram size)

* ram_size fixups are replaced by checks and hard errors, forcing user to
  provide correct "-m" values instead of ignoring it and continuing running.

After all boards are converted, the old API is removed and memory allocation
routines are cleaned up.
2020-02-25 09:19:00 +01:00
Alexey Kardashevskiy 87262806cb spapr: Allow changing offset for -kernel image
This allows moving the kernel in the guest memory. The option is useful
for step debugging (as Linux is linked at 0x0); it also allows loading
grub which is normally linked to run at 0x20000.

This uses the existing kernel address by default.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20200203032943.121178-6-aik@ozlabs.ru>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-02-21 09:15:04 +11:00
Shivaprasad G Bhat ee3a71e366 spapr: Add NVDIMM device support
Add support for NVDIMM devices for sPAPR. Piggyback on existing nvdimm
device interface in QEMU to support virtual NVDIMM devices for Power.
Create the required DT entries for the device (some entries have
dummy values right now).

The patch creates the required DT node and sends a hotplug
interrupt to the guest. Guest is expected to undertake the normal
DR resource add path in response and start issuing PAPR SCM hcalls.

The device support is verified based on the machine version unlike x86.

This is how it can be used ..
Ex :
For coldplug, the device to be added in qemu command line as shown below
-object memory-backend-file,id=memnvdimm0,prealloc=yes,mem-path=/tmp/nvdimm0,share=yes,size=1073872896
-device nvdimm,label-size=128k,uuid=75a3cdd7-6a2f-4791-8d15-fe0a920e8e9e,memdev=memnvdimm0,id=nvdimm0,slot=0

For hotplug, the device to be added from monitor as below
object_add memory-backend-file,id=memnvdimm0,prealloc=yes,mem-path=/tmp/nvdimm0,share=yes,size=1073872896
device_add nvdimm,label-size=128k,uuid=75a3cdd7-6a2f-4791-8d15-fe0a920e8e9e,memdev=memnvdimm0,id=nvdimm0,slot=0

Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
               [Early implementation]
Message-Id: <158131058078.2897.12767731856697459923.stgit@lep8c.aus.stglabs.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-02-21 09:15:04 +11:00
Michael S. Tsirkin a784926819 ppc: function to setup latest class options
We are going to add more init for the latest machine, so move the setup
to a function so we don't have to change the DEFINE_SPAPR_MACHINE macro
each time.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20200207064628.1196095-1-mst@redhat.com>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-02-21 09:15:03 +11:00
Igor Mammedov ab74e54311 ppc/spapr: use memdev for RAM
memory_region_allocate_system_memory() API is going away, so
replace it with memdev allocated MemoryRegion. The later is
initialized by generic code, so board only needs to opt in
to memdev scheme by providing
  MachineClass::default_ram_id
and using MachineState::ram instead of manually initializing
RAM memory region.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200219160953.13771-68-imammedo@redhat.com>
2020-02-19 16:50:01 +00:00
Aravinda Prasad e0aeef7a35 ppc: spapr: Activate the FWNMI functionality
This patch sets the default value of SPAPR_CAP_FWNMI_MCE
to SPAPR_CAP_ON for machine type 5.0.

Signed-off-by: Aravinda Prasad <arawinda.p@gmail.com>
Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
Message-Id: <20200130184423.20519-8-ganeshgr@linux.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-02-03 11:33:11 +11:00
Aravinda Prasad 2500fb423a migration: Include migration support for machine check handling
This patch includes migration support for machine check
handling. Especially this patch blocks VM migration
requests until the machine check error handling is
complete as these errors are specific to the source
hardware and is irrelevant on the target hardware.

Signed-off-by: Aravinda Prasad <arawinda.p@gmail.com>
[Do not set FWNMI cap in post_load, now its done in .apply hook]
Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
Message-Id: <20200130184423.20519-7-ganeshgr@linux.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-02-03 11:33:11 +11:00
Aravinda Prasad 9ac703ac5f target/ppc: Handle NMI guest exit
Memory error such as bit flips that cannot be corrected
by hardware are passed on to the kernel for handling.
If the memory address in error belongs to guest then
the guest kernel is responsible for taking suitable action.
Patch [1] enhances KVM to exit guest with exit reason
set to KVM_EXIT_NMI in such cases. This patch handles
KVM_EXIT_NMI exit.

[1] https://www.spinics.net/lists/kvm-ppc/msg12637.html
    (e20bbd3d and related commits)

Signed-off-by: Aravinda Prasad <arawinda.p@gmail.com>
Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Message-Id: <20200130184423.20519-4-ganeshgr@linux.ibm.com>
[dwg: #ifdefs to fix compile for 32-bit target]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-02-03 11:33:10 +11:00
Aravinda Prasad 9d953ce447 ppc: spapr: Introduce FWNMI capability
Introduce fwnmi an spapr capability and add a helper function
which tries to enable it, which would be used by following patch
of the series. This patch by itself does not change the existing
behavior.

Signed-off-by: Aravinda Prasad <arawinda.p@gmail.com>
[eliminate cap_ppc_fwnmi, add fwnmi cap to migration state
 and reprhase the commit message]
Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20200130184423.20519-3-ganeshgr@linux.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-02-03 11:33:10 +11:00
David Gibson 37965dfe4d spapr: Enable DD2.3 accelerated count cache flush in pseries-5.0 machine
For POWER9 DD2.2 cpus, the best current Spectre v2 indirect branch
mitigation is "count cache disabled", which is configured with:
    -machine cap-ibs=fixed-ccd
However, this option isn't available on DD2.3 CPUs with KVM, because they
don't have the count cache disabled.

For POWER9 DD2.3 cpus, it is "count cache flush with assist", configured
with:
    -machine cap-ibs=workaround,cap-ccf-assist=on
However this option isn't available on DD2.2 CPUs with KVM, because they
don't have the special CCF assist instruction this relies on.

On current machine types, we default to "count cache flush w/o assist",
that is:
    -machine cap-ibs=workaround,cap-ccf-assist=off
This runs, with mitigation on both DD2.2 and DD2.3 host cpus, but has a
fairly significant performance impact.

It turns out we can do better.  The special instruction that CCF assist
uses to trigger a count cache flush is a no-op on earlier CPUs, rather than
trapping or causing other badness.  It doesn't, of itself, implement the
mitigation, but *if* we have count-cache-disabled, then the count cache
flush is unnecessary, and so using the count cache flush mitigation is
harmless.

Therefore for the new pseries-5.0 machine type, enable cap-ccf-assist by
default.  Along with that, suppress throwing an error if cap-ccf-assist
is selected but KVM doesn't support it, as long as KVM *is* giving us
count-cache-disabled.  To allow TCG to work out of the box, even though it
doesn't implement the ccf flush assist, downgrade the error in that case to
a warning.  This matches several Spectre mitigations where we allow TCG
to operate for debugging, since we don't really make guarantees about TCG
security properties anyway.

While we're there, make the TCG warning for this case match that for other
mitigations.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Tested-by: Michael Ellerman <mpe@ellerman.id.au>
2020-02-03 11:33:02 +11:00
Aleksandar Markovic 6cdda0ff4b hw/core/loader: Let load_elf() populate a field with CPU-specific flags
While loading the executable, some platforms (like AVR) need to
detect CPU type that executable is built for - and, with this patch,
this is enabled by reading the field 'e_flags' of the ELF header of
the executable in question. The change expands functionality of
the following functions:

  - load_elf()
  - load_elf_as()
  - load_elf_ram()
  - load_elf_ram_sym()

The argument added to these functions is called 'pflags' and is of
type 'uint32_t*' (that matches 'pointer to 'elf_word'', 'elf_word'
being the type of the field 'e_flags', in both 32-bit and 64-bit
variants of ELF header). Callers are allowed to pass NULL as that
argument, and in such case no lookup to the field 'e_flags' will
happen, and no information will be returned, of course.

CC: Richard Henderson <rth@twiddle.net>
CC: Peter Maydell <peter.maydell@linaro.org>
CC: Edgar E. Iglesias <edgar.iglesias@gmail.com>
CC: Michael Walle <michael@walle.cc>
CC: Thomas Huth <huth@tuxfamily.org>
CC: Laurent Vivier <laurent@vivier.eu>
CC: Philippe Mathieu-Daudé <f4bug@amsat.org>
CC: Aleksandar Rikalo <aleksandar.rikalo@rt-rk.com>
CC: Aurelien Jarno <aurelien@aurel32.net>
CC: Jia Liu <proljc@gmail.com>
CC: David Gibson <david@gibson.dropbear.id.au>
CC: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
CC: BALATON Zoltan <balaton@eik.bme.hu>
CC: Christian Borntraeger <borntraeger@de.ibm.com>
CC: Thomas Huth <thuth@redhat.com>
CC: Artyom Tarasenko <atar4qemu@gmail.com>
CC: Fabien Chouteau <chouteau@adacore.com>
CC: KONRAD Frederic <frederic.konrad@adacore.com>
CC: Max Filippov <jcmvbkbc@gmail.com>

Reviewed-by: Aleksandar Rikalo <aleksandar.rikalo@rt-rk.com>
Signed-off-by: Michael Rolnik <mrolnik@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Aleksandar Markovic <amarkovic@wavecomp.com>
Message-Id: <1580079311-20447-24-git-send-email-aleksandar.markovic@rt-rk.com>
2020-01-29 19:28:52 +01:00
Peter Xu 1df2c9a26f migration: Define VMSTATE_INSTANCE_ID_ANY
Define the new macro VMSTATE_INSTANCE_ID_ANY for callers who wants to
auto-generate the vmstate instance ID.  Previously it was hard coded
as -1 instead of this macro.  It helps to change this default value in
the follow up patches.  No functional change.

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2020-01-20 09:10:23 +01:00
Cédric Le Goater baa45b1710 spapr/xive: remove redundant check in spapr_match_nvt()
spapr_match_nvt() is a XIVE operation and is used by the machine to
look for a matching target when an event notification is being
delivered. An assert checks that spapr_match_nvt() is called only when
the machine has selected the XIVE interrupt mode but it is redundant
with the XIVE_PRESENTER() dynamic cast.

Apply the cast to spapr->active_intc and remove the assert.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20200106163207.4608-1-clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-01-08 11:01:59 +11:00
Daniel Henrique Barboza 9b6c1da5e9 spapr.c: remove 'out' label in spapr_dt_cas_updates()
'out' can be replaced by 'return' with the appropriate
return value.

CC: David Gibson <david@gibson.dropbear.id.au>
CC: qemu-ppc@nongnu.org
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20200106182425.20312-2-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-01-08 11:01:59 +11:00
Bharata B Rao 905db91697 ppc/spapr: Support reboot of secure pseries guest
A pseries guest can be run as a secure guest on Ultravisor-enabled
POWER platforms. When such a secure guest is reset, we need to
release/reset a few resources both on ultravisor and hypervisor side.
This is achieved by invoking this new ioctl KVM_PPC_SVM_OFF from the
machine reset path.

As part of this ioctl, the secure guest is essentially transitioned
back to normal mode so that it can reboot like a regular guest and
become secure again.

This ioctl has no effect when invoked for a normal guest. If this ioctl
fails for a secure guest, the guest is terminated.

Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
Message-Id: <20191219031445.8949-3-bharata@linux.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-01-08 11:01:59 +11:00
David Gibson d1d32d6255 spapr: Simplify ovec diff
spapr_ovec_diff(ov, old, new) has somewhat complex semantics.  ov is set
to those bits which are in new but not old, and it returns as a boolean
whether or not there are any bits in old but not new.

It turns out that both callers only care about the second, not the first.
This is basically equivalent to a bitmap subset operation, which is easier
to understand and implement.  So replace spapr_ovec_diff() with
spapr_ovec_subset().

Cc: Mike Roth <mdroth@linux.vnet.ibm.com>

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cedric Le Goater <clg@fr.ibm.com>
2019-12-17 10:39:48 +11:00
David Gibson 0c21e07354 spapr: Fold h_cas_compose_response() into h_client_architecture_support()
spapr_h_cas_compose_response() handles the last piece of the PAPR feature
negotiation process invoked via the ibm,client-architecture-support OF
call.  Its only caller is h_client_architecture_support() which handles
most of the rest of that process.

I believe it was placed in a separate file originally to handle some
fiddly dependencies between functions, but mostly it's just confusing
to have the CAS process split into two pieces like this.  Now that
compose response is simplified (by just generating the whole device
tree anew), it's cleaner to just fold it into
h_client_architecture_support().

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cedric Le Goater <clg@fr.ibm.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
2019-12-17 10:39:48 +11:00
David Gibson 97b32a6afa spapr: Improve handling of fdt buffer size
Previously, spapr_build_fdt() constructed the device tree in a fixed
buffer of size FDT_MAX_SIZE.  This is a bit inflexible, but more
importantly it's awkward for the case where we use it during CAS.  In
that case the guest firmware supplies a buffer and we have to
awkwardly check that what we generated fits into it afterwards, after
doing a lot of size checks during spapr_build_fdt().

Simplify this by having spapr_build_fdt() take a 'space' parameter.
For the CAS case, we pass in the buffer size provided by SLOF, for the
machine init case, we continue to pass FDT_MAX_SIZE.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cedric Le Goater <clg@fr.ibm.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
2019-12-17 10:39:48 +11:00
Vladimir Sementsov-Ogievskiy cdcca22aab ppc: well form kvmppc_hint_smt_possible error hint helper
Make kvmppc_hint_smt_possible hint append helper well formed:
rename errp to errp_in, as it is IN-parameter here (which is unusual
for errp), rename function to be kvmppc_error_append_*_hint.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20191127191434.20945-1-vsementsov@virtuozzo.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-12-17 10:39:48 +11:00
Cédric Le Goater 932de7aef8 ppc/spapr: Implement the XiveFabric interface
The CAM line matching sequence in the pseries machine does not change
much apart from the use of the new QOM interfaces. There is an extra
indirection because of the sPAPR IRQ backend of the machine. Only the
XIVE backend implements the new 'match_nvt' handler.

Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20191125065820.927-11-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-12-17 10:39:48 +11:00
Cornelia Huck 3eb74d2087 hw: add compat machines for 5.0
Add 5.0 machine types for arm/i440fx/q35/s390x/spapr.

For i440fx and q35, unversioned cpu models are still translated
to -v1; I'll leave changing this (if desired) to the respective
maintainers.

Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <20191112104811.30323-1-cohuck@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
2019-12-14 10:25:50 +01:00
Evgeny Yakovlev 5f2585772f virtio-blk: advertise F_WCE (F_FLUSH) if F_CONFIG_WCE is advertised
Virtio spec 1.1 (and earlier), 5.2.5.2 Driver Requirements: Device
Initialization:

"Devices SHOULD always offer VIRTIO_BLK_F_FLUSH, and MUST offer it if
they offer VIRTIO_BLK_F_CONFIG_WCE"

Currently F_CONFIG_WCE and F_WCE are not connected to each other.
Qemu will advertise F_CONFIG_WCE if config-wce argument is
set for virtio-blk device. And F_WCE is advertised only if
underlying block backend actually has it's caching enabled.

Fix this by advertising F_WCE if F_CONFIG_WCE is also advertised.

To preserve backwards compatibility with newer machine types make this
behaviour governed by "x-enable-wce-if-config-wce" virtio-blk-device
property and introduce hw_compat_4_2 with new property being off by
default for all machine types <= 4.2 (but don't introduce 4.3
machine type itself yet).

Signed-off-by: Evgeny Yakovlev <wrfsh@yandex-team.ru>
Message-Id: <1572978137-189218-1-git-send-email-wrfsh@yandex-team.ru>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2019-12-13 11:22:06 +00:00
Alexey Kardashevskiy a49f62b9fd spapr: Add /chosen to FDT only at reset time to preserve kernel and initramdisk
Since "spapr: Render full FDT on ibm,client-architecture-support" we build
the entire flatten device tree (FDT) twice - at the reset time and
when "ibm,client-architecture-support" (CAS) is called. The full FDT from
CAS is then applied on top of the SLOF internal device tree.

This is mostly ok, however there is a case when the QEMU is started with
-initrd and for some reason the guest decided to move/unpack the init RAM
disk image - the guest correctly notifies SLOF about the change but
at CAS it is overridden with the QEMU initial location addresses and
the guest may fail to boot if the original initrd memory was changed.

This fixes the problem by only adding the /chosen node at the reset time
to prevent the original QEMU's linux,initrd-start/linux,initrd-end to
override the updated addresses.

This only treats /chosen differently as we know there is a special case
already and it is unlikely anything else will need to change /chosen at CAS
we are better off not touching /chosen after we handed it over to SLOF.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20191024041308.5673-1-aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
2019-11-18 11:50:33 +01:00
Greg Kurz 47c8c915b1 spapr: Don't request to unplug the same core twice
We must not call spapr_drc_detach() on a detached DRC otherwise bad things
can happen, ie. QEMU hangs or crashes. This is easily demonstrated with
a CPU hotplug/unplug loop using QMP.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <157185826035.3073024.1664101000438499392.stgit@bahia.lan>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-10-24 09:37:54 +11:00
David Gibson 54255c1f65 spapr: Move SpaprIrq::nr_xirqs to SpaprMachineClass
For the benefit of peripheral device allocation, the number of available
irqs really wants to be the same on a given machine type version,
regardless of what irq backends we are using.  That's the case now, but
only because we make sure the different SpaprIrq instances have the same
value except for the special legacy one.

Since this really only depends on machine type version, move the value to
SpaprMachineClass instead of SpaprIrq.  This also puts the code to set it
to the lower value on old machine types right next to setting
legacy_irq_allocation, which needs to go hand in hand.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
2019-10-24 09:36:55 +11:00
David Gibson 8cbe71ecb8 spapr: Remove SpaprIrq::nr_msis
The nr_msis value we use here has to line up with whether we're using
legacy or modern irq allocation.  Therefore it's safer to derive it based
on legacy_irq_allocation rather than having SpaprIrq contain a canned
value.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
2019-10-24 09:36:55 +11:00
David Gibson 05289273c0 spapr, xics, xive: Move dt_populate from SpaprIrq to SpaprInterruptController
This method depends only on the active irq controller.  Now that we've
formalized the notion of active controller we can dispatch directly
through that, rather than dispatching via SpaprIrq with the dual
version having to do a second conditional dispatch.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
2019-10-24 09:36:55 +11:00
David Gibson 328d8eb24d spapr, xics, xive: Move print_info from SpaprIrq to SpaprInterruptController
This method depends only on the active irq controller.  Now that we've
formalized the notion of active controller we can dispatch directly
through that, rather than dispatching via SpaprIrq with the dual
version having to do a second conditional dispatch.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
2019-10-24 09:36:55 +11:00
Greg Kurz 29cb418749 spapr: Set VSMT to smp_threads by default
Support for setting VSMT is available in KVM since linux-4.13. Most distros
that support KVM on POWER already have it. It thus seem reasonable enough
to have the default machine to set VSMT to smp_threads.

This brings contiguous VCPU ids and thus brings their upper bound down to
the machine's max_cpus. This is especially useful for XIVE KVM devices,
which may thus allocate only one VP descriptor per VCPU.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <157010411885.246126.12610015369068227139.stgit@bahia.lan>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-10-24 09:36:55 +11:00
Tao Xu 0533ef5f20 numa: Introduce MachineClass::auto_enable_numa for implicit NUMA node
Add MachineClass::auto_enable_numa field. When it is true, a NUMA node
is expected to be created implicitly.

Acked-by: David Gibson <david@gibson.dropbear.id.au>
Suggested-by: Igor Mammedov <imammedo@redhat.com>
Suggested-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Tao Xu <tao3.xu@intel.com>
Message-Id: <20190905083238.1799-1-tao3.xu@intel.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2019-10-15 18:18:08 -03:00
David Gibson ca62823b79 spapr: Use less cryptic representation of which irq backends are supported
SpaprIrq::ov5 stores the value for a particular byte in PAPR option vector
5 which indicates whether XICS, XIVE or both interrupt controllers are
available.  As usual for PAPR, the encoding is kind of overly complicated
and confusing (though to be fair there are some backwards compat things it
has to handle).

But to make our internal code clearer, have SpaprIrq encode more directly
which backends are available as two booleans, and derive the OV5 value from
that at the point we need it.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
2019-10-04 19:08:23 +10:00
Alexey Kardashevskiy e68cd0cb5c spapr: Render full FDT on ibm,client-architecture-support
The ibm,client-architecture-support call is a way for the guest to
negotiate capabilities with a hypervisor. It is implemented as:
- the guest calls SLOF via client interface;
- SLOF calls QEMU (H_CAS hypercall) with an options vector from the guest;
- QEMU returns a device tree diff (which uses FDT format with
an additional header before it);
- SLOF walks through the partial diff tree and updates its internal tree
with the values from the diff.

This changes QEMU to simply re-render the entire tree and send it as
an update. SLOF can handle this already mostly, [1] is needed before this
can be applied. This stores the resulting tree in the spapr machine to have
the latest valid FDT copy possible (this should not matter much as
H_UPDATE_DT happens right after that but nevertheless).

The benefit is reduced code size as there is no need for another set of
DT rendering helpers such as spapr_fixup_cpu_dt().

The downside is that the updates are bigger now (as they include all
nodes and properties) but the difference on a '-smp 256,threads=1' system
before/after is 2.35s vs. 2.5s.

[1] https://patchwork.ozlabs.org/patch/1152915/

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-10-04 19:08:21 +10:00
Alexey Kardashevskiy 744a928cce spapr: Stop providing RTAS blob
SLOF implements one itself so let's remove it from QEMU. It is one less
image and simpler setup as the RTAS blob never stays in its initial place
anyway as the guest OS always decides where to put it.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-10-04 10:25:23 +10:00
Alexey Kardashevskiy 5ced78955f spapr: Do not put empty properties for -kernel/-initrd/-append
We are going to use spapr_build_fdt() for the boot time FDT and as an
update for SLOF during handling of H_CAS. SLOF will apply all properties
from the QEMU's FDT which is usually ok unless there are properties
changed by grub or guest kernel. The properties are:
bootargs, linux,initrd-start, linux,initrd-end, linux,stdout-path,
linux,rtas-base, linux,rtas-entry. Resetting those during CAS will most
likely cause grub failure.

Don't create such properties if we're booting without "-kernel" and
"-initrd" so they won't get included into the DT update blob and
therefore the guest is more likely to boot successfully.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
[dwg: Tweaked commit message based on Greg Kurz's input]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-10-04 10:25:23 +10:00
Alexey Kardashevskiy 3a17e38f6e spapr: Skip leading zeroes from memory@ DT node names
The device tree build by QEMU at the machine reset time is used by SLOF
to build its internal device tree but the node names are not preserved
exactly so when QEMU provides a device tree update in response to H_CAS,
it might become tricky to match a node from the update blob to
the actual node in SLOF.

This removed leading zeroes from "memory@" nodes and makes
the DTC checker happy.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
2019-10-04 10:25:23 +10:00
Alexey Kardashevskiy f767b1ac57 spapr: Fixes a leak in CAS
Add a missing g_free(fdt) if the resulting tree is bigger
than the space allocated by SLOF.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
2019-10-04 10:25:23 +10:00
David Gibson db5127b28a spapr: Move handling of special NVLink numa node from reset to init
The number of NUMA nodes in the system is fixed from the command line.
Therefore, there's no need to recalculate it at reset time, and we can
determine the special gpu_numa_id value used for NVLink2 devices at init
time.

This simplifies the reset path a bit which will make further improvements
easier.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2019-10-04 10:25:23 +10:00
David Gibson daa36379ce spapr: Simplify handling of pre ISA 3.0 guest workaround handling
Certain old guest versions don't understand the radix MMU introduced with
POWER ISA 3.0, but incorrectly select it if presented with the option at
CAS time.  We workaround this in qemu by explicitly excluding the radix
(and other ISA 3.0 linked) options if the guest doesn't explicitly note
support for ISA 3.0.

This is handled by the 'cas_legacy_guest_workaround' flag, which is pretty
vague.  Rename it to 'cas_pre_isa3_guest' to be clearer about what it's for.

In addition, we unnecessarily call spapr_populate_pa_features() with
different options when initially constructing the device tree and when
adjusting it at CAS time.  At the initial construct time cas_pre_isa3_guest
is already false, so we can still use the flag, rather than explicitly
overriding it to be false at the callsite.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2019-10-04 10:25:23 +10:00
Greg Kurz f041d6af55 spapr: Report kvm_irqchip_in_kernel() in 'info pic'
Unless the machine was started with kernel-irqchip=on, we cannot easily
tell if we're actually using an in-kernel or an emulated irqchip. This
information is important enough that it is worth printing it in 'info
pic'.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <156829860985.2073005.5893493824873412773.stgit@bahia.tls.ibm.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-10-04 10:25:23 +10:00
Laurent Vivier 58c46efa45 pseries: do not allow memory-less/cpu-less NUMA node
When we hotplug a CPU on memory-less/cpu-less node, the linux kernel
crashes.

This happens because linux kernel needs to know the NUMA topology at
start to be able to initialize the distance lookup table.

On pseries, the topology is provided by the firmware via the existing
CPUs and memory information. Thus a node without memory and CPU cannot be
discovered by the kernel.

To avoid the kernel crash, do not allow to start pseries with empty
nodes.

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Message-Id: <20190830161345.22436-1-lvivier@redhat.com>
[dwg: Rework to cope with movement of numa state from globals to MachineState]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-10-04 10:25:23 +10:00
Dr. David Alan Gilbert ce62df5378 migration: register_savevm_live doesn't need dev
Commit 78dd48df3 removed the last caller of register_savevm_live for an
instantiable device (rather than a single system wide device);
so trim out the parameter.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20190822115433.12070-1-dgilbert@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2019-09-12 11:15:03 +01:00
Peter Maydell f884294bd7 Machine + x86 queue, 2019-09-03
Bug fixes:
 * Fix die-id validation regression (Eduardo Habkost)
 * vmmouse: Properly reset state (Jan Kiszka)
 * hostmem-file: fix pmem file size check (Stefan Hajnoczi)
 * Keep query-hotpluggable-cpus output compatible with older QEMU
   if '-smp dies' is not set (Igor Mammedov)
 * migration: Do not re-read the clock on pre_save in case of paused guest
   (Maxiwell S. Garcia)
 
 Cleanups:
 * NUMA code cleanups (Tao Xu)
 * Remove stale externs from includes (Alex Bennée)
 
 Features:
 * qapi: report the default CPU type for each machine (Daniel P. Berrangé)
 -----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEEWjIv1avE09usz9GqKAeTb5hNxaYFAl1u08EUHGVoYWJrb3N0
 QHJlZGhhdC5jb20ACgkQKAeTb5hNxaaKGQ//WQY+JQgXj2M7i5bAuz1lkR0QKJvh
 n++70ugqNmmlj1YH7LKmZNll0tz+auo25PLgEBOamPZPFQXxkRhPBxTUnOdQJ1UC
 bSwyRzHrFluVITXD/nGkIXgmP4rjXil5QBWTxneWb7zYsXDGBEnauZnC1YsXzc9T
 5LISvc5zEz6pEzz5s3LdUJ947jTui/dDHVHupeyK/5bPkiPoKVoymsd4p8rvAmFw
 4obMftjuFzklm8oLPKpHYAm7VvXj5yb92/FE/ZKdaahcLPGStWixiHJ7xJlGMBti
 GqcWca+2sdbsraOz4Pg05x//vbOgiwIECqgKJRlJSAnG7Roz7E6J/xXQIYIkhpkL
 Sn0+s181WtFeNFlQgEP056iTUCq81oBjek2XzgsXzuQyFip5IJGLLQox4E+w0ty6
 7houoCkJD70ddl3sEj/koXi6rBeswNStfuxVYxUgwYa7HecehNvVD5q9NlElRhev
 Lce4szuWJzHBbhW5ubGmN6rCbXNa+mPrBunrDwbjApl12DFkr163dj9DsyN/DUgy
 MmfsgqpKZ+g18VSajck2QtvTg+9Oqv0bv3SWtpDwzDxS9VULz0r2wfcN9TZDipV0
 qCZWg39BpCIgdd4s5L0q6bamC9+eSwoByFx54WrkoQT81odHJqUHNsCE9wnoNvmG
 aZlV3idjGmsTFiE=
 =u5HZ
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/ehabkost/tags/machine-next-pull-request' into staging

Machine + x86 queue, 2019-09-03

Bug fixes:
* Fix die-id validation regression (Eduardo Habkost)
* vmmouse: Properly reset state (Jan Kiszka)
* hostmem-file: fix pmem file size check (Stefan Hajnoczi)
* Keep query-hotpluggable-cpus output compatible with older QEMU
  if '-smp dies' is not set (Igor Mammedov)
* migration: Do not re-read the clock on pre_save in case of paused guest
  (Maxiwell S. Garcia)

Cleanups:
* NUMA code cleanups (Tao Xu)
* Remove stale externs from includes (Alex Bennée)

Features:
* qapi: report the default CPU type for each machine (Daniel P. Berrangé)

# gpg: Signature made Tue 03 Sep 2019 21:57:37 BST
# gpg:                using RSA key 5A322FD5ABC4D3DBACCFD1AA2807936F984DC5A6
# gpg:                issuer "ehabkost@redhat.com"
# gpg: Good signature from "Eduardo Habkost <ehabkost@redhat.com>" [full]
# Primary key fingerprint: 5A32 2FD5 ABC4 D3DB ACCF  D1AA 2807 936F 984D C5A6

* remotes/ehabkost/tags/machine-next-pull-request:
  migration: Do not re-read the clock on pre_save in case of paused guest
  x86: do not advertise die-id in query-hotpluggbale-cpus if '-smp dies' is not set
  i386/vmmouse: Properly reset state
  hostmem-file: fix pmem file size check
  qapi: report the default CPU type for each machine
  pc: Don't make die-id mandatory unless necessary
  pc: Improve error message when die-id is omitted
  pc: Fix error message on die-id validation
  numa: move numa global variable numa_info into MachineState
  numa: move numa global variable have_numa_distance into MachineState
  numa: move numa global variable nb_numa_nodes into MachineState
  hw/arm: simplify arm_load_dtb
  includes: remove stale [smp|max]_cpus externs

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-09-04 14:44:54 +01:00
Tao Xu 7e721e7b10 numa: move numa global variable numa_info into MachineState
Move existing numa global numa_info (renamed as "nodes") into NumaState.

Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Suggested-by: Igor Mammedov <imammedo@redhat.com>
Suggested-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Tao Xu <tao3.xu@intel.com>
Message-Id: <20190809065731.9097-5-tao3.xu@intel.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2019-09-03 11:26:55 -03:00
Tao Xu aa57020774 numa: move numa global variable nb_numa_nodes into MachineState
Add struct NumaState in MachineState and move existing numa global
nb_numa_nodes(renamed as "num_nodes") into NumaState. And add variable
numa_support into MachineClass to decide which submachines support NUMA.

Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Suggested-by: Igor Mammedov <imammedo@redhat.com>
Suggested-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Tao Xu <tao3.xu@intel.com>
Message-Id: <20190809065731.9097-3-tao3.xu@intel.com>
[ehabkost: include hw/boards.h again to fix build failures]
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2019-09-03 11:26:55 -03:00
Greg Kurz b1e8156743 spapr: Set compat mode in spapr_core_plug()
A recent change in spapr_machine_reset() showed that resetting the compat
mode in spapr_machine_reset() for the boot vCPU and in spapr_cpu_reset()
for all other vCPUs was fragile. The fix was thus to reset the compat mode
for all vCPUs in spapr_machine_reset(), but we still have to propagate
it to hot-plugged CPUs. This is still performed from spapr_cpu_reset(),
hence resulting in ppc_set_compat() being called twice for every vCPU at
machine reset. Apart from wasting cycles, which isn't really an issue
during machine reset, this seems to indicate that spapr_cpu_reset() isn't
the best place to set the compat mode.

A natural candidate for CPU-hotplug specific code is spapr_core_plug().
Also, it sits in the same file as spapr_machine_reset() : this makes
it easier for someone who wants to know when the compat PVR is set.

Call ppc_set_compat() from there. This doesn't need to be done for
initial vCPUs since the compat PVR is 0 and spapr_machine_reset() sets
the appropriate value later. No need to do this on manually added vCPUS
on the destination QEMU during migration since the compat PVR is
part of the migrated vCPU state. Both conditions can be checked with
spapr_drc_hotplugged().

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <156701285312.499757.7807417667750711711.stgit@bahia.lan>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-08-29 09:46:07 +10:00
Alexey Kardashevskiy 6c3829a265 spapr_pci: Advertise BAR reallocation capability
The pseries guests do not normally allocate PCI resources and rely on
the system firmware doing so. Furthermore at least at some point in
the past the pseries guests won't even allowed to change BARs, probably
it is still the case for phyp. So since the initial commit we have [1]
which prevents resource reallocation.

This is not a problem until we want specific BAR alignments, for example,
PAGE_SIZE==64k to make sure we can still map MMIO BARs directly. For
the boot time devices we handle this in SLOF [2] but since QEMU's RTAS
does not allocate BARs, the guest does this instead and does not align
BARs even if Linux is given pci=resource_alignment=16@pci:0:0 as
PCI_PROBE_ONLY makes Linux ignore alignment requests.

ARM folks added a dial to control PCI_PROBE_ONLY via the device tree [3].
This makes use of the dial to advertise to the guest that we can handle
BAR reassignments. This limits the change to the latest pseries machine
to avoid old guests explosion.

We do not remove the flag from [1] as pseries guests are still supported
under phyp so having that removed may cause problems.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/powerpc/platforms/pseries/setup.c?h=v5.1#n773
[2] https://git.qemu.org/?p=SLOF.git;a=blob;f=board-qemu/slof/pci-phb.fs;h=06729bcf77a0d4e900c527adcd9befe2a269f65d;hb=HEAD#l338
[3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f81c11af
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20190719043734.108462-1-aik@ozlabs.ru>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-08-29 09:46:07 +10:00
Laurent Vivier ce03a193e1 pseries: Fix compat_pvr on reset
If we a migrate P8 machine to a P9 machine, the migration fails on
destination with:

  error while loading state for instance 0x1 of device 'cpu'
  load of migration failed: Operation not permitted

This is caused because the compat_pvr field is only present for the first
CPU.
Originally, spapr_machine_reset() calls ppc_set_compat() to set the value
max_compat_pvr for the first cpu and this was propagated to all CPUs by
spapr_cpu_reset().  Now, as spapr_cpu_reset() is called before that, the
value is not propagated to all CPUs and the migration fails.

To fix that, propagate the new value to all CPUs in spapr_machine_reset().

Fixes: 25c9780d38 ("spapr: Reset CAS & IRQ subsystem after devices")
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Message-Id: <20190826090812.19080-1-lvivier@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-08-29 09:46:07 +10:00
Peter Maydell f3b8f18ebf Monitor patches for 2019-08-21
-----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEENUvIs9frKmtoZ05fOHC0AOuRhlMFAl1dZKsSHGFybWJydUBy
 ZWRoYXQuY29tAAoJEDhwtADrkYZTJ4QP/10izA+dSofQ9404GRq3TNzwRCKugU44
 nES9CqDh6x5emx+ADQWYkugblgfH9GOvUaAUNtY+uFaEr55yC/F+VWeVXvyjt5U6
 ZpPZqIRDOHo2+PZrddr/KcKmiomS6plz03m9bzb3pYN1yIl2ZzgClAhAqWQLk0WB
 wwiY+YsJ83YR4sdiRMZkuF+UL7N8fSqYvIIj0yzM8+8ONDor9n16PoPeFg3JSsyG
 aMxXIUnSBZAVtClaNkUPtS0Wf9XEuqoG1rvMRV4Vv+eeb7fwA414DqanRJdLlGMA
 yNRtFcVyztCfjgVEXnY9JJlFe6pDkoe8ycoimQ4YA60C9c1DIMHqyjFWXRHfDwk8
 bYMSX6CTpfoEvbTfmwqYR6KSkb/KuXiFDmcYlTYFvIt3grhhdHQbru9vy+E5sm/b
 j3CPV2DTCkeGY+oZFfKIaQT9yoWZOhmMY5doMTYyinXygPTGQROUrHtzUeRXKmJZ
 arqDRmh+mlEiGETNeYQCI45eYCSDYxO+UNrhszxhmv6B1+ixhIrV2oXhi61vVBeY
 yngY4EILbuA2Z/E4BevJk91ESWJTr3UP13c6p7yf21iN4BD1KkHy5HoXCgYfQDeV
 4kar49g6WQ/VQEiwhi65Xd0OwstynkcV69F+kMagVMgaLeRsdU5ikGJQzxTeWJRl
 SPpc7oDwuAS+
 =2F3E
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/armbru/tags/pull-monitor-2019-08-21' into staging

Monitor patches for 2019-08-21

# gpg: Signature made Wed 21 Aug 2019 16:35:07 BST
# gpg:                using RSA key 354BC8B3D7EB2A6B68674E5F3870B400EB918653
# gpg:                issuer "armbru@redhat.com"
# gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" [full]
# gpg:                 aka "Markus Armbruster <armbru@pond.sub.org>" [full]
# Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867  4E5F 3870 B400 EB91 8653

* remotes/armbru/tags/pull-monitor-2019-08-21:
  monitor/qmp: Update comment for commit 4eaca8de26
  qdev: Collect HMP handlers command handlers in qdev-monitor.c
  qapi: Move query-target from misc.json to machine.json
  hw/core: Move cpu.c, cpu.h from qom/ to hw/core/

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-08-22 10:31:21 +01:00
Markus Armbruster 2e5b09fd0e hw/core: Move cpu.c, cpu.h from qom/ to hw/core/
Suggested-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20190709152053.16670-2-armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
[Rebased onto merge commit 95a9457fd44; missed instances of qom/cpu.h
in comments replaced]
2019-08-21 13:24:01 +02:00
Greg Kurz e1588bcdd2 spapr/irq: Drop spapr_irq_msi_reset()
PHBs already take care of clearing the MSIs from the bitmap during reset
or unplug. No need to do this globally from the machine code. Rather add
an assert to ensure that PHBs have acted as expected.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <156415228966.1064338.190189424190233355.stgit@bahia.lan>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
[dwg: Fix crash in qtest case where spapr->irq_map can be NULL at the
 new assert()]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-08-21 17:17:39 +10:00
Nicholas Piggin 93eac7b8f4 spapr: Implement ibm,suspend-me
This has been useful to modify and test the Linux pseries suspend
code but it requires modification to the guest to call it (due to
being gated by other unimplemented features). It is not otherwise
used by Linux yet, but work is slowly progressing there.

This allows a (lightly modified) guest kernel to suspend with
`echo mem > /sys/power/state` and be resumed with system_wakeup
monitor command.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Message-Id: <20190722061752.22114-2-npiggin@gmail.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-08-21 17:17:39 +10:00
Michael Roth 0fb6bd0732 spapr: initial implementation for H_TPM_COMM/spapr-tpm-proxy
This implements the H_TPM_COMM hypercall, which is used by an
Ultravisor to pass TPM commands directly to the host's TPM device, or
a TPM Resource Manager associated with the device.

This also introduces a new virtual device, spapr-tpm-proxy, which
is used to configure the host TPM path to be used to service
requests sent by H_TPM_COMM hcalls, for example:

  -device spapr-tpm-proxy,id=tpmp0,host-path=/dev/tpmrm0

By default, no spapr-tpm-proxy will be created, and hcalls will return
H_FUNCTION.

The full specification for this hypercall can be found in
docs/specs/ppc-spapr-uv-hcalls.txt

Since SVM-related hcalls like H_TPM_COMM use a reserved range of
0xEF00-0xEF80, we introduce a separate hcall table here to handle
them.

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com
Message-Id: <20190717205842.17827-3-mdroth@linux.vnet.ibm.com>
[dwg: Corrected #include for upstream change]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-08-21 17:17:12 +10:00
Nicholas Piggin 107413142b spapr: Implement H_JOIN
This has been useful to modify and test the Linux pseries suspend
code but it requires modification to the guest to call it (due to
being gated by other unimplemented features). It is not otherwise
used by Linux yet, but work is slowly progressing there.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Message-Id: <20190718034214.14948-5-npiggin@gmail.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-08-21 17:17:12 +10:00
Nicholas Piggin 3a6e6224a9 spapr: Implement H_PROD
H_PROD is added, and H_CEDE is modified to test the prod bit
according to PAPR.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Message-Id: <20190718034214.14948-3-npiggin@gmail.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-08-21 17:17:12 +10:00
Nicholas Piggin 03ef074c04 spapr: Implement dispatch tracking for tcg
Implement cpu_exec_enter/exit on ppc which calls into new methods of
the same name in PPCVirtualHypervisorClass. These are used by spapr
to implement the splpar VPA dispatch counter initially.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Message-Id: <20190718034214.14948-2-npiggin@gmail.com>
[dwg: Removed unnecessary CONFIG_USER_ONLY checks as suggested by gkurz]
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-08-21 17:17:11 +10:00
David Gibson d15d4ad64f spapr_pci: Allow 2MiB and 16MiB IOMMU pagesizes by default
We've had the qemu and kernel KVM infrastructure to handle larger TCE
page sizes for a while, but forgot to update the defaults to actually
allow them.  This turns that change on.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-08-21 17:16:22 +10:00
Cornelia Huck 9aec2e52ce hw: add compat machines for 4.2
Add 4.2 machine types for arm/i440fx/q35/s390x/spapr.

For i440fx and q35, unversioned cpu models are still translated
to -v1, as 0788a56bd1 ("i386: Make unversioned CPU models be
aliases") states this should only transition to the latest cpu
model version in 4.3 (or later).

Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <20190724103524.20916-1-cohuck@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-08-21 11:32:11 +10:00
Markus Armbruster 54d31236b9 sysemu: Split sysemu/runstate.h off sysemu/sysemu.h
sysemu/sysemu.h is a rather unfocused dumping ground for stuff related
to the system-emulator.  Evidence:

* It's included widely: in my "build everything" tree, changing
  sysemu/sysemu.h still triggers a recompile of some 1100 out of 6600
  objects (not counting tests and objects that don't depend on
  qemu/osdep.h, down from 5400 due to the previous two commits).

* It pulls in more than a dozen additional headers.

Split stuff related to run state management into its own header
sysemu/runstate.h.

Touching sysemu/sysemu.h now recompiles some 850 objects.  qemu/uuid.h
also drops from 1100 to 850, and qapi/qapi-types-run-state.h from 4400
to 4200.  Touching new sysemu/runstate.h recompiles some 500 objects.

Since I'm touching MAINTAINERS to add sysemu/runstate.h anyway, also
add qemu/main-loop.h.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20190812052359.30071-30-armbru@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
[Unbreak OS-X build]
2019-08-16 13:37:36 +02:00
Markus Armbruster b58c5c2dd2 numa: Move remaining NUMA declarations from sysemu.h to numa.h
Commit e35704ba9c "numa: Move NUMA declarations from sysemu.h to
numa.h" left a few NUMA-related macros behind.  Move them now.

Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20190812052359.30071-26-armbru@redhat.com>
2019-08-16 13:31:53 +02:00
Markus Armbruster a27bd6c779 Include hw/qdev-properties.h less
In my "build everything" tree, changing hw/qdev-properties.h triggers
a recompile of some 2700 out of 6600 objects (not counting tests and
objects that don't depend on qemu/osdep.h).

Many places including hw/qdev-properties.h (directly or via hw/qdev.h)
actually need only hw/qdev-core.h.  Include hw/qdev-core.h there
instead.

hw/qdev.h is actually pointless: all it does is include hw/qdev-core.h
and hw/qdev-properties.h, which in turn includes hw/qdev-core.h.
Replace the remaining uses of hw/qdev.h by hw/qdev-properties.h.

While there, delete a few superfluous inclusions of hw/qdev-core.h.

Touching hw/qdev-properties.h now recompiles some 1200 objects.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Daniel P. Berrangé" <berrange@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20190812052359.30071-22-armbru@redhat.com>
2019-08-16 13:31:53 +02:00
Markus Armbruster 650d103d3e Include hw/hw.h exactly where needed
In my "build everything" tree, changing hw/hw.h triggers a recompile
of some 2600 out of 6600 objects (not counting tests and objects that
don't depend on qemu/osdep.h).

The previous commits have left only the declaration of hw_error() in
hw/hw.h.  This permits dropping most of its inclusions.  Touching it
now recompiles less than 200 objects.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-Id: <20190812052359.30071-19-armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
2019-08-16 13:31:52 +02:00
Markus Armbruster ca77ee28e0 Include migration/qemu-file-types.h a lot less
In my "build everything" tree, changing migration/qemu-file-types.h
triggers a recompile of some 2600 out of 6600 objects (not counting
tests and objects that don't depend on qemu/osdep.h).

The culprit is again hw/hw.h, which supposedly includes it for
convenience.

Include migration/qemu-file-types.h only where it's needed.  Touching
it now recompiles less than 200 objects.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20190812052359.30071-10-armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
2019-08-16 13:31:52 +02:00
Markus Armbruster 71e8a91585 Include sysemu/reset.h a lot less
In my "build everything" tree, changing sysemu/reset.h triggers a
recompile of some 2600 out of 6600 objects (not counting tests and
objects that don't depend on qemu/osdep.h).

The main culprit is hw/hw.h, which supposedly includes it for
convenience.

Include sysemu/reset.h only where it's needed.  Touching it now
recompiles less than 200 objects.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20190812052359.30071-9-armbru@redhat.com>
2019-08-16 13:31:52 +02:00
David Gibson 25c9780d38 spapr: Reset CAS & IRQ subsystem after devices
This fixes a nasty regression in qemu-4.1 for the 'pseries' machine,
caused by the new "dual" interrupt controller model.  Specifically,
qemu can crash when used with KVM if a 'system_reset' is requested
while there's active I/O in the guest.

The problem is that in spapr_machine_reset() we:

1. Reset the CAS vector state
	spapr_ovec_cleanup(spapr->ov5_cas);

2. Reset all devices
	qemu_devices_reset()

3. Reset the irq subsystem
	spapr_irq_reset();

However (1) implicitly changes the interrupt delivery mode, because
whether we're using XICS or XIVE depends on the CAS state.  We don't
properly initialize the new irq mode until (3) though - in particular
setting up the KVM devices.

During (2), we can temporarily drop the BQL allowing some irqs to be
delivered which will go to an irq system that's not properly set up.

Specifically, if the previous guest was in (KVM) XIVE mode, the CAS
reset will put us back in XICS mode.  kvm_kernel_irqchip() still
returns true, because XIVE was using KVM, however XICs doesn't have
its KVM components intialized and kernel_xics_fd == -1.  When the irq
is delivered it goes via ics_kvm_set_irq() which assert()s that
kernel_xics_fd != -1.

This change addresses the problem by delaying the CAS reset until
after the devices reset.  The device reset should quiesce all the
devices so we won't get irqs delivered while we mess around with the
IRQ.  The CAS reset and irq re-initialize should also now be under the
same BQL critical section so nothing else should be able to interrupt
it either.

We also move the spapr_irq_msi_reset() used in one of the legacy irq
modes, since it logically makes sense at the same point as the
spapr_irq_reset() (it's essentially an equivalent operation for older
machine types).  Since we don't need to switch between different
interrupt controllers for those old machine types it shouldn't
actually be broken in those cases though.

Cc: Cédric Le Goater <clg@kaod.org>

Fixes: b2e22477 "spapr: add a 'reset' method to the sPAPR IRQ backend"
Fixes: 13db0cd9 "spapr: introduce a new sPAPR IRQ backend supporting
                 XIVE and XICS"
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-08-13 15:59:21 +10:00
Igor Mammedov cd5ff8333a machine: show if CLI option '-numa node,mem' is supported in QAPI schema
Legacy '-numa node,mem' option has a number of issues and mgmt often
defaults to it. Unfortunately it's no possible to replace it with
an alternative '-numa memdev' without breaking migration compatibility.
What's possible though is to deprecate it, keeping option working with
old machine types only.

In order to help users to find out if being deprecated CLI option
'-numa node,mem' is still supported by particular machine type, add new
"numa-mem-supported" property to output of query-machines.

"numa-mem-supported" is set to 'true' for machines that currently support
NUMA, but it will be flipped to 'false' later on, once deprecation period
expires and kept 'true' only for old machine types that used to support
the legacy option so it won't break existing configuration that are using
it.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <1560172207-378962-1-git-send-email-imammedo@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2019-07-05 17:08:03 -03:00
Like Xu fe6b6346e9 hw/ppc: Replace global smp variables with machine smp properties
The global smp variables in ppc are replaced with smp machine properties.

A local variable of the same name would be introduced in the declaration
phase if it's used widely in the context OR replace it on the spot if it's
only used once. No semantic changes.

Signed-off-by: Like Xu <like.xu@linux.intel.com>
Message-Id: <20190518205428.90532-5-like.xu@linux.intel.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2019-07-05 17:07:36 -03:00
Like Xu a0628599fa machine: Refactor smp-related call chains to pass MachineState
To get rid of the global smp_* variables we're currently using, it's recommended
to pass MachineState in the list of incoming parameters for functions that use
global smp variables, thus some redundant parameters are dropped. It's applied
for legacy smbios_*(), *_machine_reset(), hot_add_cpu() and mips *_create_cpu().

Suggested-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Like Xu <like.xu@linux.intel.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-Id: <20190518205428.90532-3-like.xu@linux.intel.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2019-07-05 17:07:36 -03:00
Peter Maydell a050901d4b ppc patch queue 2019-06-12
Next pull request against qemu-4.1.  The big thing here is adding
 support for hot plug of P2P bridges, and PCI devices under P2P bridges
 on the "pseries" machine (which doesn't use SHPC).  Other than that
 there's just a handful of fixes and small enhancements.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEdfRlhq5hpmzETofcbDjKyiDZs5IFAl0AkgwACgkQbDjKyiDZ
 s5Jyug//cwxP+t1t2CNHtffKwiXFzuEKx9YSNE1V0wog6aB40EbPKU72FzCq6FfA
 lev+pZWV9AwVMzFYe4VM/7Lqh7WFMYDT3DOXaZwfANs4471vYtgvPi21L2TBj80d
 hMszlyLWMLY9ByOzCxIq3xnbivGpA94G2q9rKbwXdK4T/5i62Pe3SIfgG+gXiiwW
 +YlHWCPX0I1cJz2bBs9ElXdl7ONWnn+7uDf7gNfWkTKuiUq6Ps7mxzy3GhJ1T7nz
 OFKmQ5dKzLJsgOULSSun8kWpXBmnPffkM3+fCE07edrWZVor09fMCk4HvtfaRy2K
 FFa2Kvzn/V/70TL+44dsSX4QcwdcHQztiaMO7UGPq9CMswx5L7gsNmfX6zvK1Nrb
 1t7ORZKNJ72hMyvDPSMiGU2DpVjO3ZbBlSL4/xG8Qeal4An0kgkN5NcFlB/XEfnz
 dsKu9XzuGSeD1bWz1Mgcf1x7lPDBoHIKLcX6notZ8epP/otu4ywNFvAkPu4fk8s0
 4jQGajIT7328SmzpjXClsmiEskpKsEr7hQjPRhu0hFGrhVc+i9PjkmbDl0TYRAf6
 N6k6gJQAi+StJde2rcua1iS7Ra+Tka6QRKy+EctLqfqOKPb2VmkZ6fswQ3nfRRlT
 LgcTHt2iJcLeud2klVXs1e4pKXzXchkVyFL4ucvmyYG5VeimMzU=
 =ERgu
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-4.1-20190612' into staging

ppc patch queue 2019-06-12

Next pull request against qemu-4.1.  The big thing here is adding
support for hot plug of P2P bridges, and PCI devices under P2P bridges
on the "pseries" machine (which doesn't use SHPC).  Other than that
there's just a handful of fixes and small enhancements.

# gpg: Signature made Wed 12 Jun 2019 06:47:56 BST
# gpg:                using RSA key 75F46586AE61A66CC44E87DC6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>" [full]
# gpg:                 aka "David Gibson (Red Hat) <dgibson@redhat.com>" [full]
# gpg:                 aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>" [full]
# gpg:                 aka "David Gibson (kernel.org) <dwg@kernel.org>" [unknown]
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E  87DC 6C38 CACA 20D9 B392

* remotes/dgibson/tags/ppc-for-4.1-20190612:
  ppc/xive: Make XIVE generate the proper interrupt types
  ppc/pnv: activate the "dumpdtb" option on the powernv machine
  target/ppc: Use tcg_gen_gvec_bitsel
  spapr: Allow hot plug/unplug of PCI bridges and devices under PCI bridges
  spapr: Direct all PCI hotplug to host bridge, rather than P2P bridge
  spapr: Don't use bus number for building DRC ids
  spapr: Clean up DRC index construction
  spapr: Clean up spapr_drc_populate_dt()
  spapr: Clean up dt creation for PCI buses
  spapr: Clean up device tree construction for PCI devices
  spapr: Clean up device node name generation for PCI devices
  target/ppc: Fix lxvw4x, lxvh8x and lxvb16x
  spapr_pci: Improve error message

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-06-12 14:43:47 +01:00
Markus Armbruster a8d2532645 Include qemu-common.h exactly where needed
No header includes qemu-common.h after this commit, as prescribed by
qemu-common.h's file comment.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20190523143508.25387-5-armbru@redhat.com>
[Rebased with conflicts resolved automatically, except for
include/hw/arm/xlnx-zynqmp.h hw/arm/nrf51_soc.c hw/arm/msf2-soc.c
block/qcow2-refcount.c block/qcow2-cluster.c block/qcow2-cache.c
target/arm/cpu.h target/lm32/cpu.h target/m68k/cpu.h target/mips/cpu.h
target/moxie/cpu.h target/nios2/cpu.h target/openrisc/cpu.h
target/riscv/cpu.h target/tilegx/cpu.h target/tricore/cpu.h
target/unicore32/cpu.h target/xtensa/cpu.h; bsd-user/main.c and
net/tap-bsd.c fixed up]
2019-06-12 13:20:20 +02:00