Commit Graph

961 Commits

Author SHA1 Message Date
Richard Henderson 078ddbc936 hw/ppc: Constify VMState
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20231221031652.119827-48-richard.henderson@linaro.org>
2023-12-30 07:38:06 +11:00
Cornelia Huck 2b10a6760e hw: Add compat machines for 9.0
Add 9.0 machine types for arm/i440fx/m68k/q35/s390x/spapr.

Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Message-ID: <20231120094259.1191804-1-cohuck@redhat.com>
Acked-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Acked-by: Eric Farman <farman@linux.ibm.com>  # s390x
Signed-off-by: Thomas Huth <thuth@redhat.com>
2023-12-20 10:29:23 +01:00
Juan Quintela 485fb95546 migration: Hack to maintain backwards compatibility for ppc
Current code does:
- register pre_2_10_vmstate_dummy_icp with "icp/server" and instance
  dependinfg on cpu number
- for newer machines, it register vmstate_icp with "icp/server" name
  and instance 0
- now it unregisters "icp/server" for the 1st instance.

This is wrong at many levels:
- we shouldn't have two VMSTATEDescriptions with the same name
- In case this is the only solution that we can came with, it needs to
  be:
  * register pre_2_10_vmstate_dummy_icp
  * unregister pre_2_10_vmstate_dummy_icp
  * register real vmstate_icp

Created vmstate_replace_hack_for_ppc() with warnings left and right
that it is a hack.

CC: Cedric Le Goater <clg@kaod.org>
CC: Daniel Henrique Barboza <danielhb413@gmail.com>
CC: David Gibson <david@gibson.dropbear.id.au>
CC: Greg Kurz <groug@kaod.org>

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231020090731.28701-8-quintela@redhat.com>
2023-11-01 16:13:58 +01:00
Steve Sistare c8a7fc5179 migration: simplify blockers
Modify migrate_add_blocker and migrate_del_blocker to take an Error **
reason.  This allows migration to own the Error object, so that if
an error occurs in migrate_add_blocker, migration code can free the Error
and clear the client handle, simplifying client code.  It also simplifies
the migrate_del_blocker call site.

In addition, this is a pre-requisite for a proposed future patch that would
add a mode argument to migration requests to support live update, and
maintain a list of blockers for each mode.  A blocker may apply to a single
mode or to multiple modes, and passing Error** will allow one Error object
to be registered for multiple modes.

No functional change.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Tested-by: Michael Galaxy <mgalaxy@akamai.com>
Reviewed-by: Michael Galaxy <mgalaxy@akamai.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <1697634216-84215-1-git-send-email-steven.sistare@oracle.com>
2023-10-20 08:51:41 +02:00
Richard Henderson b77af26e97 accel/tcg: Replace CPUState.env_ptr with cpu_env()
Reviewed-by: Anton Johansson <anjo@rev.ng>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-10-04 11:03:54 -07:00
Cédric Le Goater 01a78f23cb spapr: Clean up local variable shadowing in spapr_get_fw_dev_path()
Rename PCIDevice variable to avoid this warning :

  ../hw/ppc/spapr.c: In function ‘spapr_get_fw_dev_path’:
  ../hw/ppc/spapr.c:3217:20: warning: declaration of ‘pcidev’ shadows a previous local [-Wshadow=compatible-local]
   3217 |         PCIDevice *pcidev = CAST(PCIDevice, dev, TYPE_PCI_DEVICE);
        |                    ^~~~~~
  ../hw/ppc/spapr.c:3147:16: note: shadowed declaration is here
   3147 |     PCIDevice *pcidev = CAST(PCIDevice, dev, TYPE_PCI_DEVICE);
        |                ^~~~~~

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-ID: <20230918145850.241074-6-clg@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2023-09-29 10:07:18 +02:00
Cédric Le Goater c0b648d9e9 spapr: Clean up local variable shadowing in spapr_init_cpus()
Remove extra 'i' variable to fix this warning :

  ../hw/ppc/spapr.c: In function ‘spapr_init_cpus’:
  ../hw/ppc/spapr.c:2668:13: warning: declaration of ‘i’ shadows a previous local [-Wshadow=compatible-local]
   2668 |         int i;
        |             ^
  ../hw/ppc/spapr.c:2645:9: note: shadowed declaration is here
   2645 |     int i;
        |         ^

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-ID: <20230918145850.241074-5-clg@kaod.org>
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2023-09-29 10:07:17 +02:00
Cédric Le Goater bd87a59f52 spapr: Clean up local variable shadowing in spapr_dt_cpus()
Introduce a helper routine defining one CPU device node to fix this
warning :

  ../hw/ppc/spapr.c: In function ‘spapr_dt_cpus’:
  ../hw/ppc/spapr.c:812:19: warning: declaration of ‘cs’ shadows a previous local [-Wshadow=compatible-local]
    812 |         CPUState *cs = rev[i];
        |                   ^~
  ../hw/ppc/spapr.c:786:15: note: shadowed declaration is here
    786 |     CPUState *cs;
        |               ^~

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-ID: <20230918145850.241074-4-clg@kaod.org>
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2023-09-29 10:07:17 +02:00
Michael Tokarev e6a19a6477 ppc: spelling fixes
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
2023-09-20 07:54:34 +03:00
Cédric Le Goater 44fa20c928 spapr: Remove support for NVIDIA V100 GPU with NVLink2
NVLink2 support was removed from the PPC PowerNV platform and VFIO in
Linux 5.13 with commits :

  562d1e207d32 ("powerpc/powernv: remove the nvlink support")
  b392a1989170 ("vfio/pci: remove vfio_pci_nvlink2")

This was 2.5 years ago. Do the same in QEMU with a revert of commit
ec132efaa8 ("spapr: Support NVIDIA V100 GPU with NVLink2"). Some
adjustements are required on the NUMA part.

Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Message-ID: <20230918091717.149950-1-clg@kaod.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2023-09-18 07:25:28 -03:00
Nicholas Piggin b27fcb288b spapr: Fix record-replay machine reset consuming too many events
spapr_machine_reset gets a random number to populate the device-tree
rng seed with. When loading a snapshot for record-replay, the machine
is reset again, and that tries to consume the random event record
again, crashing due to inconsistent record

Fix this by saving the seed to populate the device tree with, and
skipping the rng on snapshot load.

Acked-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
2023-09-06 11:19:33 +02:00
Nicholas Piggin 9c7b7f01f9 spapr: Fix machine reset deadlock from replay-record
When the machine is reset to load a new snapshot while being debugged
with replay-record, it is done from another thread, so the CPU does
not run the register setting operations. Set CPU registers directly in
machine reset.

Cc: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
2023-09-06 11:19:33 +02:00
Stefan Hajnoczi 50e7a40af3 target-arm queue:
* hw/gpio/nrf51: implement DETECT signal
  * accel/kvm: Specify default IPA size for arm64
  * ptw: refactor, fix some FEAT_RME bugs
  * target/arm: Adjust PAR_EL1.SH for Device and Normal-NC memory types
  * target/arm/helper: Implement CNTHCTL_EL2.CNT[VP]MASK
  * Fix SME ST1Q
  * Fix 64-bit SSRA
 -----BEGIN PGP SIGNATURE-----
 
 iQJNBAABCAA3FiEE4aXFk81BneKOgxXPPCUl7RQ2DN4FAmTnIoUZHHBldGVyLm1h
 eWRlbGxAbGluYXJvLm9yZwAKCRA8JSXtFDYM3vufEACPJcwyFvSBHDv4VQ6tbgOU
 zwjpUMv4RMKhCOjuxBlJ2DICwOcGNuKer0tc6wkH2T5Ebhoego1osYbRZZoawAJf
 ntg+Ndrx1QH9ORuGqYccLXtHnP741KiKggDHM05BJqB7rqtuH+N4fEn7Cdsw/DNg
 XuCYD5QrxMYvkSOD1l8W0aqp81ucYPgkFqLufypgxrXUiRZ1RBAmPF47BFFdnM8f
 NmrmT1LTF5jr70ySRB+ukK6BAGDc0CUfs6R6nYRwUjRPmSG2rrtUDGo+nOQGDqJo
 PHWmt7rdZQG2w7HVyE/yc3h/CQ3NciwWKbCkRlaoujxHx/B6DRynSeO3NXsP8ELu
 Gizoi3ltwHDQVIGQA19P5phZKHZf7x3MXmK4fDBGB9znvoSFTcjJqkdaN/ARXXO3
 e1vnK1MqnPI8Z1nGdeVIAUIrqhtLHnrrM7jf1tI/e4sjpl3prHq2PvQkakXu8clr
 H8bPZ9zZzyrrSbl4NhpaFTsUiYVxeLoJsNKAmG8dHb+9YsFGXTvEBhtR9eUxnbaV
 XyZ3jEdeW7/ngQ4C6XMD2ZDiKVdx2xJ2Pp5npvljldjmtGUvwQabKo+fPDt2fKjM
 BwjhHA50I633k4fYIwm8YOb70I4oxoL9Lr6PkKriWPMTI5r7+dtwgigREVwnCn+Y
 RsiByKMkDO2TcoQjvBZlCA==
 =3MJ8
 -----END PGP SIGNATURE-----

Merge tag 'pull-target-arm-20230824' of https://git.linaro.org/people/pmaydell/qemu-arm into staging

target-arm queue:
 * hw/gpio/nrf51: implement DETECT signal
 * accel/kvm: Specify default IPA size for arm64
 * ptw: refactor, fix some FEAT_RME bugs
 * target/arm: Adjust PAR_EL1.SH for Device and Normal-NC memory types
 * target/arm/helper: Implement CNTHCTL_EL2.CNT[VP]MASK
 * Fix SME ST1Q
 * Fix 64-bit SSRA

# -----BEGIN PGP SIGNATURE-----
#
# iQJNBAABCAA3FiEE4aXFk81BneKOgxXPPCUl7RQ2DN4FAmTnIoUZHHBldGVyLm1h
# eWRlbGxAbGluYXJvLm9yZwAKCRA8JSXtFDYM3vufEACPJcwyFvSBHDv4VQ6tbgOU
# zwjpUMv4RMKhCOjuxBlJ2DICwOcGNuKer0tc6wkH2T5Ebhoego1osYbRZZoawAJf
# ntg+Ndrx1QH9ORuGqYccLXtHnP741KiKggDHM05BJqB7rqtuH+N4fEn7Cdsw/DNg
# XuCYD5QrxMYvkSOD1l8W0aqp81ucYPgkFqLufypgxrXUiRZ1RBAmPF47BFFdnM8f
# NmrmT1LTF5jr70ySRB+ukK6BAGDc0CUfs6R6nYRwUjRPmSG2rrtUDGo+nOQGDqJo
# PHWmt7rdZQG2w7HVyE/yc3h/CQ3NciwWKbCkRlaoujxHx/B6DRynSeO3NXsP8ELu
# Gizoi3ltwHDQVIGQA19P5phZKHZf7x3MXmK4fDBGB9znvoSFTcjJqkdaN/ARXXO3
# e1vnK1MqnPI8Z1nGdeVIAUIrqhtLHnrrM7jf1tI/e4sjpl3prHq2PvQkakXu8clr
# H8bPZ9zZzyrrSbl4NhpaFTsUiYVxeLoJsNKAmG8dHb+9YsFGXTvEBhtR9eUxnbaV
# XyZ3jEdeW7/ngQ4C6XMD2ZDiKVdx2xJ2Pp5npvljldjmtGUvwQabKo+fPDt2fKjM
# BwjhHA50I633k4fYIwm8YOb70I4oxoL9Lr6PkKriWPMTI5r7+dtwgigREVwnCn+Y
# RsiByKMkDO2TcoQjvBZlCA==
# =3MJ8
# -----END PGP SIGNATURE-----
# gpg: Signature made Thu 24 Aug 2023 05:27:33 EDT
# gpg:                using RSA key E1A5C593CD419DE28E8315CF3C2525ED14360CDE
# gpg:                issuer "peter.maydell@linaro.org"
# gpg: Good signature from "Peter Maydell <peter.maydell@linaro.org>" [full]
# gpg:                 aka "Peter Maydell <pmaydell@gmail.com>" [full]
# gpg:                 aka "Peter Maydell <pmaydell@chiark.greenend.org.uk>" [full]
# gpg:                 aka "Peter Maydell <peter@archaic.org.uk>" [unknown]
# Primary key fingerprint: E1A5 C593 CD41 9DE2 8E83  15CF 3C25 25ED 1436 0CDE

* tag 'pull-target-arm-20230824' of https://git.linaro.org/people/pmaydell/qemu-arm: (35 commits)
  target/arm: Fix 64-bit SSRA
  target/arm: Fix SME ST1Q
  target/arm/helper: Implement CNTHCTL_EL2.CNT[VP]MASK
  target/arm/helper: Check SCR_EL3.{NSE, NS} encoding for AT instructions
  target/arm: Pass security space rather than flag for AT instructions
  target/arm: Skip granule protection checks for AT instructions
  target/arm/helper: Fix tlbmask and tlbbits for TLBI VAE2*
  target/arm/ptw: Load stage-2 tables from realm physical space
  target/arm: Adjust PAR_EL1.SH for Device and Normal-NC memory types
  target/arm/ptw: Report stage 2 fault level for stage 2 faults on stage 1 ptw
  target/arm/ptw: Check for block descriptors at invalid levels
  target/arm/ptw: Set attributes correctly for MMU disabled data accesses
  target/arm/ptw: Drop S1Translate::out_secure
  target/arm/ptw: Remove S1Translate::in_secure
  target/arm/ptw: Remove last uses of ptw->in_secure
  target/arm/ptw: Only fold in NSTable bit effects in Secure state
  target/arm: Pass an ARMSecuritySpace to arm_is_el2_enabled_secstate()
  target/arm/ptw: Pass an ARMSecuritySpace to arm_hcr_el2_eff_secstate()
  target/arm/ptw: Pass ARMSecurityState to regime_translation_disabled()
  target/arm/ptw: Pass ptw into get_phys_addr_pmsa*() and get_phys_addr_disabled()
  ...

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-08-24 10:08:33 -04:00
Cornelia Huck 95f5c89eca hw: Add compat machines for 8.2
Add 8.2 machine types for arm/i440fx/m68k/q35/s390x/spapr.

Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <20230718142235.135319-1-cohuck@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Acked-by: Laurent Vivier <laurent@vivier.eu>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
2023-08-23 12:06:39 +02:00
Akihiko Odaki bc3e41a0e8 accel/kvm: Use negative KVM type for error propagation
On MIPS, kvm_arch_get_default_type() returns a negative value when an
error occurred so handle the case. Also, let other machines return
negative values when errors occur and declare returning a negative
value as the correct way to propagate an error that happened when
determining KVM type.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-id: 20230727073134.134102-5-akihiko.odaki@daynix.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2023-08-22 17:31:03 +01:00
David Hildenbrand c0ce7b4acb hw/ppc/spapr: Use machine_memory_devices_init()
Let's use our new helper and stop always allocating ms->device_memory.
There is no difference in common memory-device code anymore between
ms->device_memory being NULL or the size being 0. So we only have to
teach spapr code that ms->device_memory isn't always around.

We can now modify two maxram_size checks to rely on ms->device_memory
for detecting whether we have memory devices.

Cc: Daniel Henrique Barboza <danielhb413@gmail.com>
Cc: "Cédric Le Goater" <clg@kaod.org>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Greg Kurz <groug@kaod.org>
Cc: Harsh Prateek Bora <harshpb@linux.ibm.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20230623124553.400585-5-david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
2023-07-12 09:25:37 +02:00
Nicholas Piggin dc5e072188 spapr: TCG allow up to 8-thread SMT on POWER8 and newer CPUs
PPC TCG supports SMT CPU configurations for non-hypervisor state, so
permit POWER8-10 pseries machines to enable SMT.

This requires PIR and TIR be set, because that's how sibling thread
matching is done by TCG.

spapr's nested-HV capability does not currently coexist with SMT, so
that combination is prohibited (interestingly somewhat analogous to
LPAR-per-core mode on real hardware which also does not support KVM).

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
[ clg: Also test smp_threads when checking for POWER8 CPU and above ]
Signed-off-by: Cédric Le Goater <clg@kaod.org>
2023-06-25 22:41:30 +02:00
Philippe Mathieu-Daudé 516cd73733 hw/ppc/spapr: Test whether TCG is enabled with tcg_enabled()
Although the PPC target only supports the TCG and KVM
accelerators, QEMU supports more. We can not assume that
'!kvm == tcg', so test for the correct accelerator. This
also eases code review, because here we don't care about
KVM, we really want to test for TCG.

Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
[np: Fix changelog typo noticed by Zoltan]
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
2023-06-25 22:41:30 +02:00
Nicholas Piggin 6b8a05373b ppc/spapr: Move spapr nested HV to a new file
Create spapr_nested.c for most of the nested HV implementation.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
2023-06-25 22:41:30 +02:00
Nicholas Piggin 277ee17212 target/ppc: Add POWER9 DD2.2 model
POWER9 DD2.1 and earlier had significant limitations when running KVM,
including lack of "mixed mode" MMU support (ability to run HPT and RPT
mode on threads of the same core), and a translation prefetch issue
which is worked around by disabling "AIL" mode for the guest.

These processors are not widely available, and it's difficult to deal
with all these quirks in qemu +/- KVM, so create a POWER9 DD2.2 CPU
and make it the default POWER9 CPU.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com>
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Message-Id: <20230515160201.394587-1-npiggin@gmail.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2023-05-28 13:25:11 -03:00
Nicholas Piggin ccc5a4c5e1 spapr: Add SPAPR_CAP_AIL_MODE_3 for AIL mode 3 support for H_SET_MODE hcall
The behaviour of the Address Translation Mode on Interrupt resource is
not consistently supported by all CPU versions or all KVM versions: KVM
HV does not support mode 2, and does not support mode 3 on POWER7 or
early POWER9 processesors. KVM PR only supports mode 0. TCG supports all
modes (0, 2, 3) on CPUs with support for the corresonding LPCR[AIL] mode.
This leads to inconsistencies in guest behaviour and could cause problems
migrating guests.

This was not noticable for Linux guests for a long time because the
kernel only uses modes 0 and 3, and it used to consider AIL-3 to be
advisory in that it would always keep the AIL-0 vectors around, so it
did not matter whether or not interrupts were delivered according to
the AIL mode. Recent Linux guests depend on AIL mode 3 working as
specified in order to support the SCV facility interrupt. If AIL-3 can
not be provided, then H_SET_MODE must return an error to Linux so it can
disable the SCV facility (failure to do so can lead to userspace being
able to crash the guest kernel).

Add the ail-mode-3 capability to specify that AIL-3 is supported. AIL-0
is implied as the baseline, and AIL-2 is no longer supported by spapr.
AIL-2 is not known to be used by any software, but support in TCG could
be restored with an ail-mode-2 capability quite easily if a regression
is reported.

Modify the H_SET_MODE Address Translation Mode on Interrupt resource
handler to check capabilities and correctly return error if not
supported.

KVM has a cap to advertise support for AIL-3.

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Message-Id: <20230515160216.394612-1-npiggin@gmail.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2023-05-28 07:13:54 -03:00
Juan Quintela e1fde0e038 migration: Move rate_limit_max and rate_limit_used to migration_stats
These way we can make them atomic and use this functions from any
place.  I also moved all functions that use rate_limit to
migration-stats.

Functions got renamed, they are not qemu_file anymore.

qemu_file_rate_limit -> migration_rate_exceeded
qemu_file_set_rate_limit -> migration_rate_set
qemu_file_get_rate_limit -> migration_rate_get
qemu_file_reset_rate_limit -> migration_rate_reset
qemu_file_acct_rate_limit -> migration_rate_account.

Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20230515195709.63843-6-quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-05-18 18:40:51 +02:00
Cornelia Huck f9be4771d3 hw: Add compat machines for 8.1
Add 8.1 machine types for arm/i440fx/m68k/q35/s390x/spapr.

Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <20230314173009.152667-1-cohuck@redhat.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Acked-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
2023-04-20 06:44:15 +02:00
Peter Maydell 222059a0fc ppc patch queue for 2022-12-21:
This queue contains a MAINTAINERS update, the implementation of the Freescale eSDHC,
 the introduction of the DEXCR/HDEXCR instructions and other assorted fixes (most of
 them for the e500 board).
 -----BEGIN PGP SIGNATURE-----
 
 iIwEABYKADQWIQQX6/+ZI9AYAK8oOBk82cqW3gMxZAUCY6M//RYcZGFuaWVsaGI0
 MTNAZ21haWwuY29tAAoJEDzZypbeAzFkaNABAKfQ/zpg2ugr/SmC7Ee9tnFNxDrq
 JsNw+roXpUZvnkUZAQCMRm4BxfaXhXikRaSL2ZfGRtybKXki5o3Ez+rLxISiAg==
 =gRo7
 -----END PGP SIGNATURE-----

Merge tag 'pull-ppc-20221221' of https://gitlab.com/danielhb/qemu into staging

ppc patch queue for 2022-12-21:

This queue contains a MAINTAINERS update, the implementation of the Freescale eSDHC,
the introduction of the DEXCR/HDEXCR instructions and other assorted fixes (most of
them for the e500 board).

# gpg: Signature made Wed 21 Dec 2022 17:18:53 GMT
# gpg:                using EDDSA key 17EBFF9923D01800AF2838193CD9CA96DE033164
# gpg:                issuer "danielhb413@gmail.com"
# gpg: Good signature from "Daniel Henrique Barboza <danielhb413@gmail.com>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 17EB FF99 23D0 1800 AF28  3819 3CD9 CA96 DE03 3164

* tag 'pull-ppc-20221221' of https://gitlab.com/danielhb/qemu:
  target/ppc: Check DEXCR on hash{st, chk} instructions
  target/ppc: Implement the DEXCR and HDEXCR
  hw/ppc/e500: Move comment to more appropriate place
  hw/ppc/e500: Resolve variable shadowing
  hw/ppc/e500: Prefer local variable over qdev_get_machine()
  hw/ppc/virtex_ml507: Prefer local over global variable
  target/ppc/mmu_common: Fix table layout of "info tlb" HMP command
  target/ppc/mmu_common: Log which effective address had no TLB entry found
  hw/ppc/spapr: Reduce "vof.h" inclusion
  hw/ppc/vof: Do not include the full "cpu.h"
  target/ppc/kvm: Add missing "cpu.h" and "exec/hwaddr.h"
  hw/ppc/e500: Add Freescale eSDHC to e500plat
  hw/sd/sdhci: Support big endian SD host controller interfaces
  MAINTAINERS: downgrade PPC KVM/TCG CPUs and pSeries to 'Odd Fixes'

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2022-12-21 18:08:09 +00:00
Philippe Mathieu-Daudé 46d80a56a1 hw/ppc/spapr: Reduce "vof.h" inclusion
Currently objects including "hw/ppc/spapr.h" are forced to be
target specific due to the inclusion of "vof.h" in "spapr.h".

"spapr.h" only uses a Vof pointer, so doesn't require the structure
declaration. The only place where Vof structure is accessed is in
spapr.c, so include "vof.h" there, and forward declare the structure
in "spapr.h".

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20221213123550.39302-4-philmd@linaro.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2022-12-21 14:17:55 -03:00
Cornelia Huck db723c80b1 hw: Add compat machines for 8.0
Add 8.0 machine types for arm/i440fx/m68k/q35/s390x/spapr.

Reviewed-by: Cédric Le Goater <clg@kaod.org> [ppc]
Reviewed-by: Thomas Huth <thuth@redhat.com> [s390x]
Reviewed-by: Greg Kurz <groug@kaod.org> [ppc]
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <20221212152145.124317-2-cohuck@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
2022-12-21 06:35:28 -05:00
Markus Armbruster 047f2ca1ce qapi qdev qom: Elide redundant has_FOO in generated C
The has_FOO for pointer-valued FOO are redundant, except for arrays.
They are also a nuisance to work with.  Recent commit "qapi: Start to
elide redundant has_FOO in generated C" provided the means to elide
them step by step.  This is the step for qapi/qdev.json and
qapi/qom.json.

Said commit explains the transformation in more detail.  The invariant
violations mentioned there do not occur here.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Daniel P. Berrangé <berrange@redhat.com>
Cc: Eduardo Habkost <eduardo@habkost.net>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20221104160712.3005652-21-armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2022-12-14 20:05:07 +01:00
Jason A. Donenfeld 7966d70f6f reset: allow registering handlers that aren't called by snapshot loading
Snapshot loading only expects to call deterministic handlers, not
non-deterministic ones. So introduce a way of registering handlers that
won't be called when reseting for snapshots.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Message-id: 20221025004327.568476-2-Jason@zx2c4.com
[PMM: updated json doc comment with Markus' text; fixed
 checkpatch style nit]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2022-10-27 11:34:31 +01:00
Daniel Henrique Barboza d890f2fa9f hw/ppc: set machine->fdt in spapr machine
The pSeries machine never bothered with the common machine->fdt
attribute. We do all the FDT related work using spapr->fdt_blob.

We're going to introduce a QMP/HMP command to dump the FDT, which will
rely on setting machine->fdt properly to work across all machine
archs/types.

Let's set machine->fdt in two places where we manipulate the FDT:
spapr_machine_reset() and CAS. There are other places where the FDT is
manipulated in the pSeries machines, most notably the hotplug/unplug
path. For now we'll acknowledge that we won't have the most accurate
representation of the FDT, depending on the current machine state, when
using this QMP/HMP fdt command. Making the internal FDT representation
always match the actual FDT representation that the guest is using is a
problem for another day.

spapr->fdt_blob is left untouched for now. To replace it with
machine->fdt, since we're migrating spapr->fdt_blob, we would need to
migrate machine->fdt as well. This is something that we would like to to
do keep our code simpler but it's also a work we'll leave for later.

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20220926173855.1159396-14-danielhb413@gmail.com>
2022-10-17 16:15:10 -03:00
Philippe Mathieu-Daudé a580fdcd60 hw/ppc/pnv: Avoid dynamic stack allocation
Use autofree heap allocation instead of variable-length
array on the stack.

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-id: 20220819153931.3147384-7-peter.maydell@linaro.org
2022-09-22 16:38:28 +01:00
Xuzhou Cheng cb5b5ab9a5 hw/ppc: spapr: Use qemu_vfree() to free spapr->htab
spapr->htab is allocated by qemu_memalign(), hence we should use
qemu_vfree() to free it.

Fixes: c5f54f3e31 ("pseries: Move hash page table allocation to reset time")
Fixes: b4db54132f ("target/ppc: Implement H_REGISTER_PROCESS_TABLE H_CALL"")
Signed-off-by: Xuzhou Cheng <xuzhou.cheng@windriver.com>
Signed-off-by: Bin Meng <bin.meng@windriver.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20220920103159.1865256-28-bmeng.cn@gmail.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2022-09-20 12:31:53 -03:00
Cornelia Huck f514e1477f hw: Add compat machines for 7.2
Add 7.2 machine types for arm/i440fx/m68k/q35/s390x/spapr.

Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20220727121755.395894-1-cohuck@redhat.com>
[thuth: fixed conflict with pcmc->legacy_no_rng_seed]
Signed-off-by: Thomas Huth <thuth@redhat.com>
2022-08-25 21:59:04 +02:00
Leandro Lupori 3c2e80ad2f ppc: Check partition and process table alignment
Check if partition and process tables are properly aligned, in
their size, according to PowerISA 3.1B, Book III 6.7.6 programming
note. Hardware and KVM also raise an exception in these cases.

Signed-off-by: Leandro Lupori <leandro.lupori@eldorado.org.br>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Message-Id: <20220628133959.15131-2-leandro.lupori@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2022-07-18 13:59:43 -03:00
Jason A. Donenfeld c4b075318e hw/ppc: pass random seed to fdt
If the FDT contains /chosen/rng-seed, then the Linux RNG will use it to
initialize early. Set this using the usual guest random number
generation function. This is confirmed to successfully initialize the
RNG on Linux 5.19-rc6. The rng-seed node is part of the DT spec. Set
this on the paravirt platforms, spapr and e500, just as is done on other
architectures with paravirt hardware.

Cc: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20220712135114.289855-1-Jason@zx2c4.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2022-07-18 13:59:43 -03:00
Alexey Kardashevskiy 81b205cecf ppc/spapr: Implement H_WATCHDOG
The new PAPR 2.12 defines a watchdog facility managed via the new
H_WATCHDOG hypercall.

This adds H_WATCHDOG support which a proposed driver for pseries uses:
https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=303120

This was tested by running QEMU with a debug kernel and command line:
-append \
 "pseries-wdt.timeout=60 pseries-wdt.nowayout=1 pseries-wdt.action=2"

and running "echo V > /dev/watchdog0" inside the VM.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20220622051008.1067464-1-aik@ozlabs.ru>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2022-07-06 10:22:38 -03:00
Alexey Kardashevskiy 5bb55f3e3b spapr: Use address from elf parser for kernel address
tl;dr: This allows Big Endian zImage booting via -kernel + x-vof=on.

QEMU loads the kernel at 0x400000 by default which works most of
the time as Linux kernels are relocatable, 64bit and compiled with "-pie"
(position independent code). This works for a little endian zImage too.

However a big endian zImage is compiled without -pie, is 32bit, linked to
0x4000000 so current QEMU ends up loading it at
0x4400000 but keeps spapr->kernel_addr unchanged so booting fails.

This uses the kernel address returned from load_elf().
If the default kernel_addr is used, there is no change in behavior (as
translate_kernel_address() takes care of this), which is:
LE/BE vmlinux and LE zImage boot, BE zImage does not.
If the VM created with "-machine kernel-addr=0,x-vof=on", then QEMU
prints a warning and BE zImage boots.

Note #1: SLOF (x-vof=off) still cannot boot a big endian zImage as
SLOF enables MSR_SF for everything loaded by QEMU and this leads to early
crash of 32bit zImage.

Note #2: BE/LE vmlinux images set MSR_SF in early boot so these just work;
a LE zImage restores MSR_SF after every CI call and we are lucky enough
not to crash before the first CI call.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Tested-by: Joel Stanley <joel@jms.id.au>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Message-Id: <20220504065536.3534488-1-aik@ozlabs.ru>
[danielhb: use PRIx64 instead of lx in warn_report]
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2022-05-26 17:11:32 -03:00
Paolo Bonzini f73eb9484b pseries: allow setting stdout-path even on machines with a VGA
-machine graphics=off is the usual way to tell the firmware or the OS that the
user wants a serial console.  The pseries machine however does not support
this, and never adds the stdout-path node to the device tree if a VGA device
is provided.  This is in addition to the other magic behavior of VGA devices,
which is to add a keyboard and mouse to the default USB bus.

Split spapr->has_graphics in two variables so that the two behaviors can be
separated: the USB devices remains the same, but the stdout-path is added
even with "-device VGA -machine graphics=off".

Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220507054826.124936-1-pbonzini@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2022-05-26 17:11:32 -03:00
Paolo Bonzini 97ec4d21e0 machine: use QAPI struct for boot configuration
As part of converting -boot to a property with a QAPI type, define
the struct and use it throughout QEMU to access boot configuration.
machine_boot_parse takes care of doing the QemuOpts->QAPI conversion by
hand, for now.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220414165300.555321-2-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-05-12 12:29:43 +02:00
Gautam Agrawal f9bcb2d684 Warn user if the vga flag is passed but no vga device is created
A global boolean variable "vga_interface_created"(declared in softmmu/globals.c)
has been used to track the creation of vga interface. If the vga flag is passed
in the command line "default_vga"(declared in softmmu/vl.c) variable is set to 0.
To warn user, the condition checks if vga_interface_created is false
and default_vga is equal to 0. If "-vga none" is passed, this patch will not warn the
user regarding the creation of VGA device.

The warning "A -vga option was passed but this
machine type does not use that option; no VGA device has been created"
is logged if vga flag is passed but no vga device is created.

This patch has been tested for x86_64, i386, sparc, sparc64 and arm boards.

Signed-off-by: Gautam Agrawal <gautamnagrawal@gmail.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/581
Message-Id: <20220501122505.29202-1-gautamnagrawal@gmail.com>
[thuth: Fix wrong warning with "-device" in some cases as reported by Paolo]
Signed-off-by: Thomas Huth <thuth@redhat.com>
2022-05-09 08:21:14 +02:00
Víctor Colombo d41ccf6eea target/ppc: Remove msr_pr macro
msr_pr macro hides the usage of env->msr, which is a bad behavior
Substitute it with FIELD_EX64 calls that explicitly use env->msr
as a parameter.

Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Víctor Colombo <victor.colombo@eldorado.org.br>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20220504210541.115256-4-victor.colombo@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2022-05-05 15:36:17 -03:00
Cornelia Huck 0ca703662e hw: Add compat machines for 7.1
Add 7.1 machine types for arm/i440fx/m68k/q35/s390x/spapr.

Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <20220316145521.1224083-1-cohuck@redhat.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Acked-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Yanan Wang <wangyanan55@huawei.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
2022-04-20 09:36:24 +02:00
Marc-André Lureau 0f9668e0c1 Remove qemu-common.h include from most units
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20220323155743.1585078-33-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-06 14:31:55 +02:00
Markus Armbruster b21e238037 Use g_new() & friends where that makes obvious sense
g_new(T, n) is neater than g_malloc(sizeof(T) * n).  It's also safer,
for two reasons.  One, it catches multiplication overflowing size_t.
Two, it returns T * rather than void *, which lets the compiler catch
more type errors.

This commit only touches allocations with size arguments of the form
sizeof(T).

Patch created mechanically with:

    $ spatch --in-place --sp-file scripts/coccinelle/use-g_new-etc.cocci \
	     --macro-file scripts/cocci-macro-file.h FILES...

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20220315144156.1595462-4-armbru@redhat.com>
Reviewed-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru>
2022-03-21 15:44:44 +01:00
Peter Maydell 5df022cf2e osdep: Move memalign-related functions to their own header
Move the various memalign-related functions out of osdep.h and into
their own header, which we include only where they are used.
While we're doing this, add some brief documentation comments.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20220226180723.1706285-10-peter.maydell@linaro.org
2022-03-07 13:16:49 +00:00
Daniel Henrique Barboza 5f2b96b38e hw/ppc/spapr.c: fail early if no firmware found in machine_init()
The firmware check consists on a file search (qemu_find_file) and load
it via load_imag_targphys(). This validation is not dependent on any
other machine state but it currently being done at the end of
spapr_machine_init(). This means that we can do a lot of stuff and end
up failing at the end for something that we can verify right out of the
gate.

Move this validation to the start of spapr_machine_init() to fail
earlier.  While we're at it, use g_autofree in the 'filename' pointer.

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20220228175004.8862-3-danielhb413@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
2022-03-02 06:51:39 +01:00
Daniel Henrique Barboza aebb9b9cb2 hw/ppc/spapr.c: use g_autofree in spapr_dt_chosen()
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20220228175004.8862-2-danielhb413@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
2022-03-02 06:51:39 +01:00
Nicholas Piggin 120f738a46 spapr: implement nested-hv capability for the virtual hypervisor
This implements the Nested KVM HV hcall API for spapr under TCG.

The L2 is switched in when the H_ENTER_NESTED hcall is made, and the
L1 is switched back in returned from the hcall when a HV exception
is sent to the vhyp. Register state is copied in and out according to
the nested KVM HV hcall API specification.

The hdecr timer is started when the L2 is switched in, and it provides
the HDEC / 0x980 return to L1.

The MMU re-uses the bare metal radix 2-level page table walker by
using the get_pate method to point the MMU to the nested partition
table entry. MMU faults due to partition scope errors raise HV
exceptions and accordingly are routed back to the L1.

The MMU does not tag translations for the L1 (direct) vs L2 (nested)
guests, so the TLB is flushed on any L1<->L2 transition (hcall entry
and exit).

Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
[ clg: checkpatch fixes ]
Message-Id: <20220216102545.1808018-10-npiggin@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
2022-02-18 08:34:14 +01:00
Nicholas Piggin 7cebc5db2e target/ppc: Introduce a vhyp framework for nested HV support
Introduce virtual hypervisor methods that can support a "Nested KVM HV"
implementation using the bare metal 2-level radix MMU, and using HV
exceptions to return from H_ENTER_NESTED (rather than cause interrupts).

HV exceptions can now be raised in the TCG spapr machine when running a
nested KVM HV guest. The main ones are the lev==1 syscall, the hdecr,
hdsi and hisi, hv fu, and hv emu, and h_virt external interrupts.

HV exceptions are intercepted in the exception handler code and instead
of causing interrupts in the guest and switching the machine to HV mode,
they go to the vhyp where it may exit the H_ENTER_NESTED hcall with the
interrupt vector numer as return value as required by the hcall API.

Address translation is provided by the 2-level page table walker that is
implemented for the bare metal radix MMU. The partition scope page table
is pointed to the L1's partition scope by the get_pate vhc method.

Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20220216102545.1808018-9-npiggin@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
2022-02-18 08:34:14 +01:00
Nicholas Piggin f32d4ab41c target/ppc: make vhyp get_pate method take lpid and return success
In prepartion for implementing a full partition table option for
vhyp, update the get_pate method to take an lpid and return a
success/fail indicator.

The spapr implementation currently just asserts lpid is always 0
and always return success.

Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[ clg: checkpatch fixes ]
Message-Id: <20220216102545.1808018-6-npiggin@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
2022-02-18 08:34:14 +01:00
Shivaprasad G Bhat b5513584a0 spapr: nvdimm: Implement H_SCM_FLUSH hcall
The patch adds support for the SCM flush hcall for the nvdimm devices.
To be available for exploitation by guest through the next patch. The
hcall is applicable only for new SPAPR specific device class which is
also introduced in this patch.

The hcall expects the semantics such that the flush to return with
H_LONG_BUSY_ORDER_10_MSEC when the operation is expected to take longer
time along with a continue_token. The hcall to be called again by providing
the continue_token to get the status. So, all fresh requests are put into
a 'pending' list and flush worker is submitted to the thread pool. The
thread pool completion callbacks move the requests to 'completed' list,
which are cleaned up after collecting the return status for the guest
in subsequent hcall from the guest.

The semantics makes it necessary to preserve the continue_tokens and
their return status across migrations. So, the completed flush states
are forwarded to the destination and the pending ones are restarted
at the destination in post_load. The necessary nvdimm flush specific
vmstate structures are also introduced in this patch which are to be
saved in the new SPAPR specific nvdimm device to be introduced in the
following patch.

Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <164396254862.109112.16675611182159105748.stgit@ltczzess4.aus.stglabs.ibm.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
2022-02-18 08:34:14 +01:00
Daniel Henrique Barboza 1977434bbf spapr.c: check bus != NULL in spapr_get_fw_dev_path()
spapr_get_fw_dev_path() is an impl of
FWPathProviderClass::get_dev_path(). This interface is used by
hw/core/qdev-fw.c via fw_path_provider_try_get_dev_path() in two
functions:

- static char *qdev_get_fw_dev_path_from_handler(), which is used only in
qdev_get_fw_dev_path_helper() and it's guarded by "if (dev &&
dev->parent_bus)";

- char *qdev_get_own_fw_dev_path_from_handler(), which is used in
softmmu/bootdevice.c in get_boot_device_path() like this:

    if (dev) {
        d = qdev_get_own_fw_dev_path_from_handler(dev->parent_bus, dev);

This means that, when called via softmmu/bootdevice.c, there's no check
of 'dev->parent_bus' being not NULL. The result is that the "BusState
*bus" arg of spapr_get_fw_dev_path() can potentially be NULL and if, at
the same time, "SCSIDevice *d" is not NULL, we'll hit this line:

    void *spapr = CAST(void, bus->parent, "spapr-vscsi");

And we'll SIGINT because 'bus' is NULL and we're accessing bus->parent.

Adding a simple 'bus != NULL' check to guard the instances where we
access 'bus->parent' can avoid this altogether.

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20220121213852.30243-1-danielhb413@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
2022-01-28 13:15:02 +01:00
Cédric Le Goater 2460e1d75b spapr: Fix support of POWER5+ processors
POWER5+ (ISA v2.03) processors are supported by the pseries machine
but they do not have Altivec instructions. Do not advertise support
for it in the DT.

To be noted that this test is in contradiction with the assert in
cap_vsx_apply().

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Tested-by: Fabiano Rosas <farosas@linux.ibm.com>
Message-Id: <20220105095142.3990430-3-clg@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
2022-01-12 11:28:26 +01:00
Cornelia Huck 01854af2cf hw: Add compat machines for 7.0
Add 7.0 machine types for arm/i440fx/q35/s390x/spapr.

Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20211217143948.289995-1-cohuck@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
2022-01-05 09:06:36 +01:00
Leandro Lupori 7bf00dfb51 target/ppc: fix Hash64 MMU update of PTE bit R
When updating the R bit of a PTE, the Hash64 MMU was using a wrong byte
offset, causing the first byte of the adjacent PTE to be corrupted.
This caused a panic when booting FreeBSD, using the Hash MMU.

Fixes: a2dd4e83e7 ("ppc/hash64: Rework R and C bit updates")
Signed-off-by: Leandro Lupori <leandro.lupori@eldorado.org.br>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
2021-11-29 21:00:08 +01:00
Yanan Wang 2b52619994 machine: Move smp_prefer_sockets to struct SMPCompatProps
Now we have a common structure SMPCompatProps used to store information
about SMP compatibility stuff, so we can also move smp_prefer_sockets
there for cleaner code.

No functional change intended.

Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20210929025816.21076-15-wangyanan55@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-01 15:29:15 +02:00
Yanan Wang 4a0af2930a machine: Prefer cores over sockets in smp parsing since 6.2
In the real SMP hardware topology world, it's much more likely that
we have high cores-per-socket counts and few sockets totally. While
the current preference of sockets over cores in smp parsing results
in a virtual cpu topology with low cores-per-sockets counts and a
large number of sockets, which is just contrary to the real world.

Given that it is better to make the virtual cpu topology be more
reflective of the real world and also for the sake of compatibility,
we start to prefer cores over sockets over threads in smp parsing
since machine type 6.2 for different arches.

In this patch, a boolean "smp_prefer_sockets" is added, and we only
enable the old preference on older machines and enable the new one
since type 6.2 for all arches by using the machine compat mechanism.

Suggested-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20210929025816.21076-10-wangyanan55@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-01 15:28:16 +02:00
Daniel Henrique Barboza e0eb84d4f5 spapr_numa.c: FORM2 NUMA affinity support
The main feature of FORM2 affinity support is the separation of NUMA
distances from ibm,associativity information. This allows for a more
flexible and straightforward NUMA distance assignment without relying on
complex associations between several levels of NUMA via
ibm,associativity matches. Another feature is its extensibility. This base
support contains the facilities for NUMA distance assignment, but in the
future more facilities will be added for latency, performance, bandwidth
and so on.

This patch implements the base FORM2 affinity support as follows:

- the use of FORM2 associativity is indicated by using bit 2 of byte 5
of ibm,architecture-vec-5. A FORM2 aware guest can choose to use FORM1
or FORM2 affinity. Setting both forms will default to FORM2. We're not
advertising FORM2 for pseries-6.1 and older machine versions to prevent
guest visible changes in those;

- ibm,associativity-reference-points has a new semantic. Instead of
being used to calculate distances via NUMA levels, it's now used to
indicate the primary domain index in the ibm,associativity domain of
each resource. In our case it's set to {0x4}, matching the position
where we already place logical_domain_id;

- two new RTAS DT artifacts are introduced: ibm,numa-lookup-index-table
and ibm,numa-distance-table. The index table is used to list all the
NUMA logical domains of the platform, in ascending order, and allows for
spartial NUMA configurations (although QEMU ATM doesn't support that).
ibm,numa-distance-table is an array that contains all the distances from
the first NUMA node to all other nodes, then the second NUMA node
distances to all other nodes and so on;

- get_max_dist_ref_points(), get_numa_assoc_size() and get_associativity()
now checks for OV5_FORM2_AFFINITY and returns FORM2 values if the guest
selected FORM2 affinity during CAS.

Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210920174947.556324-7-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-09-30 12:26:06 +10:00
Daniel Henrique Barboza 5dab5abe62 spapr: move FORM1 verifications to post CAS
FORM2 NUMA affinity is prepared to deal with empty (memory/cpu less)
NUMA nodes. This is used by the DAX KMEM driver to locate a PAPR SCM
device that has a different latency than the original NUMA node from the
regular memory. FORM2 is also able  to deal with asymmetric NUMA
distances gracefully, something that our FORM1 implementation doesn't
do.

Move these FORM1 verifications to a new function and wait until after
CAS, when we're sure that we're sticking with FORM1, to enforce them.

Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210920174947.556324-6-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-09-30 12:26:06 +10:00
Daniel Henrique Barboza 4b08cd567b spapr: use DEVICE_UNPLUG_GUEST_ERROR to report unplug errors
Linux Kernel 5.12 is now unisolating CPU DRCs in the device_removal
error path, signalling that the hotunplug process wasn't successful.
This allow us to send a DEVICE_UNPLUG_GUEST_ERROR in drc_unisolate_logical()
to signal this error to the management layer.

We also have another error path in spapr_memory_unplug_rollback() for
configured LMB DRCs. Kernels older than 5.13 will not unisolate the LMBs
in the hotunplug error path, but it will reconfigure them. Let's send
the DEVICE_UNPLUG_GUEST_ERROR event in that code path as well to cover the
case of older kernels.

Acked-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210907004755.424931-7-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-09-30 12:26:06 +10:00
Daniel Henrique Barboza 44d886abab spapr.c: handle dev->id in spapr_memory_unplug_rollback()
As done in hw/acpi/memory_hotplug.c, pass an empty string if dev->id
is NULL to qapi_event_send_mem_unplug_error() to avoid relying on
a behavior that can be changed in the future.

Suggested-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210907004755.424931-3-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-09-29 19:37:39 +10:00
Yanan Wang 52e64f5b1f hw: Add compat machines for 6.2
Add 6.2 machine types for arm/i440fx/q35/s390x/spapr.

Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-09-01 11:08:16 +01:00
Peter Maydell d1987c8114 * More SVM fixes (Lara)
* Module annotation database (Gerd)
 * Memory leak fixes (myself)
 * Build fixes (myself)
 * --with-devices-* support (Alex)
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmDoeBgUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroMtFAgAippmxRt3lt+tcdSrCOZlKmxW6veK
 nUidtzfH5uE8vQsh5Q98WCEq871C/C+St1gK+q2H/MLrJeAqZD39DV+SKTuZ6Tcp
 3jL0iYC+oO0OjkHppDQTUDweF9KrsAW1WEeNz2th1OUDSjBXuXbZ+N497taouX18
 p2UN0gKNsOO2/QFrKL5KO7vSC56eBGoZz6gKtw/7dDtJBtizf1xKBRHW43b+CnQJ
 mHLs7Tj6oMC+vnMHkUKLH/6za3WJF1XHs5fp2isRgqoOSP8m0r6CMg8JnFIvmQf/
 tbLospKSWqcgD5C5PlFm2wSOjdU7zuPKM7wchhKrrEIvdDPhXaKrlpwi5Q==
 =GFX1
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/bonzini-gitlab/tags/for-upstream' into staging

* More SVM fixes (Lara)
* Module annotation database (Gerd)
* Memory leak fixes (myself)
* Build fixes (myself)
* --with-devices-* support (Alex)

# gpg: Signature made Fri 09 Jul 2021 17:23:52 BST
# gpg:                using RSA key F13338574B662389866C7682BFFBD25F78C7AE83
# gpg:                issuer "pbonzini@redhat.com"
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full]
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>" [full]
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* remotes/bonzini-gitlab/tags/for-upstream: (48 commits)
  meson: Use input/output for entitlements target
  configure: allow the selection of alternate config in the build
  configs: rename default-configs to configs and reorganise
  hw/arm: move CONFIG_V7M out of default-devices
  hw/arm: add dependency on OR_IRQ for XLNX_VERSAL
  meson: Introduce target-specific Kconfig
  meson: switch function tests from compilation to linking
  vl: fix leak of qdict_crumple return value
  target/i386: fix exceptions for MOV to DR
  target/i386: Added DR6 and DR7 consistency checks
  target/i386: Added MSRPM and IOPM size check
  monitor/tcg: move tcg hmp commands to accel/tcg, register them dynamically
  usb: build usb-host as module
  monitor/usb: register 'info usbhost' dynamically
  usb: drop usb_host_dev_is_scsi_storage hook
  monitor: allow register hmp commands
  accel: build tcg modular
  accel: add tcg module annotations
  accel: build qtest modular
  accel: add qtest module annotations
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-07-11 22:20:51 +01:00
Gerd Hoffmann b7b2a60b01 usb: drop usb_host_dev_is_scsi_storage hook
Introduce an usb device flag instead, set it when usb-host looks at the
device descriptors anyway.  Also set it for emulated storage devices,
for consistency.  Add an inline helper function to check the flag.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Jose R. Ziviani <jziviani@suse.de>
Message-Id: <20210624103836.2382472-32-kraxel@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-07-09 18:21:33 +02:00
Bharata B Rao 82123b756a target/ppc: Support for H_RPT_INVALIDATE hcall
If KVM_CAP_RPT_INVALIDATE KVM capability is enabled, then

- indicate the availability of H_RPT_INVALIDATE hcall to the guest via
  ibm,hypertas-functions property.
- Enable the hcall

Both the above are done only if the new sPAPR machine capability
cap-rpt-invalidate is set.

Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
Message-Id: <20210706112440.1449562-3-bharata@linux.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-07-09 11:01:06 +10:00
Alexey Kardashevskiy 21bde1ecb6 spapr: Fix implementation of Open Firmware client interface
This addresses the comments from v22.

The functional changes are (the VOF ones need retesting with Pegasos2):

(VOF) setprop will start failing if the machine class callback
did not handle it;
(VOF) unit addresses are lowered in path_offset();
(SPAPR) /chosen/bootargs is initialized from kernel_cmdline if
the client did not change it.

Fixes: 5c991e5d4378 ("spapr: Implement Open Firmware client interface")
Cc: BALATON Zoltan <balaton@eik.bme.hu>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20210708065625.548396-1-aik@ozlabs.ru>
Tested-by: BALATON Zoltan <balaton@eik.bme.hu>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-07-09 10:55:11 +10:00
Alexey Kardashevskiy fc8c745d50 spapr: Implement Open Firmware client interface
The PAPR platform describes an OS environment that's presented by
a combination of a hypervisor and firmware. The features it specifies
require collaboration between the firmware and the hypervisor.

Since the beginning, the runtime component of the firmware (RTAS) has
been implemented as a 20 byte shim which simply forwards it to
a hypercall implemented in qemu. The boot time firmware component is
SLOF - but a build that's specific to qemu, and has always needed to be
updated in sync with it. Even though we've managed to limit the amount
of runtime communication we need between qemu and SLOF, there's some,
and it has become increasingly awkward to handle as we've implemented
new features.

This implements a boot time OF client interface (CI) which is
enabled by a new "x-vof" pseries machine option (stands for "Virtual Open
Firmware). When enabled, QEMU implements the custom H_OF_CLIENT hcall
which implements Open Firmware Client Interface (OF CI). This allows
using a smaller stateless firmware which does not have to manage
the device tree.

The new "vof.bin" firmware image is included with source code under
pc-bios/. It also includes RTAS blob.

This implements a handful of CI methods just to get -kernel/-initrd
working. In particular, this implements the device tree fetching and
simple memory allocator - "claim" (an OF CI memory allocator) and updates
"/memory@0/available" to report the client about available memory.

This implements changing some device tree properties which we know how
to deal with, the rest is ignored. To allow changes, this skips
fdt_pack() when x-vof=on as not packing the blob leaves some room for
appending.

In absence of SLOF, this assigns phandles to device tree nodes to make
device tree traversing work.

When x-vof=on, this adds "/chosen" every time QEMU (re)builds a tree.

This adds basic instances support which are managed by a hash map
ihandle -> [phandle].

Before the guest started, the used memory is:
0..e60 - the initial firmware
8000..10000 - stack
400000.. - kernel
3ea0000.. - initramdisk

This OF CI does not implement "interpret".

Unlike SLOF, this does not format uninitialized nvram. Instead, this
includes a disk image with pre-formatted nvram.

With this basic support, this can only boot into kernel directly.
However this is just enough for the petitboot kernel and initradmdisk to
boot from any possible source. Note this requires reasonably recent guest
kernel with:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=df5be5be8735

The immediate benefit is much faster booting time which especially
crucial with fully emulated early CPU bring up environments. Also this
may come handy when/if GRUB-in-the-userspace sees light of the day.

This separates VOF and sPAPR in a hope that VOF bits may be reused by
other POWERPC boards which do not support pSeries.

This assumes potential support for booting from QEMU backends
such as blockdev or netdev without devices/drivers used.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20210625055155.2252896-1-aik@ozlabs.ru>
Reviewed-by: BALATON Zoltan <balaton@eik.bme.hu>
[dwg: Adjusted some includes which broke compile in some more obscure
 compilation setups]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-07-09 10:38:19 +10:00
Alexey Kardashevskiy 7381c5d11f spapr: tune rtas-size
QEMU reserves space for RTAS via /rtas/rtas-size which tells the client
how much space the RTAS requires to work which includes the RTAS binary
blob implementing RTAS runtime. Because pseries supports FWNMI which
requires plenty of space, QEMU reserves more than 2KB which is
enough for the RTAS blob as it is just 20 bytes (under QEMU).

Since FWNMI reset delivery was added, RTAS_SIZE macro is not used anymore.
This replaces RTAS_SIZE with RTAS_MIN_SIZE and uses it in
the /rtas/rtas-size calculation to account for the RTAS blob.

Fixes: 0e236d3477 ("ppc/spapr: Implement FWNMI System Reset delivery")
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20210622070336.1463250-1-aik@ozlabs.ru>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-07-09 10:38:18 +10:00
Greg Kurz 3bf0844f3b spapr: Don't hijack current_machine->boot_order
QEMU 6.0 moved all the -boot variables to the machine. Especially, the
removal of the boot_order static changed the handling of '-boot once'
from:

    if (boot_once) {
        qemu_boot_set(boot_once, &error_fatal);
        qemu_register_reset(restore_boot_order, g_strdup(boot_order));
    }

to

    if (current_machine->boot_once) {
        qemu_boot_set(current_machine->boot_once, &error_fatal);
        qemu_register_reset(restore_boot_order,
                            g_strdup(current_machine->boot_order));
    }

This means that we now register as subsequent boot order a copy
of current_machine->boot_once that was just set with the previous
call to qemu_boot_set(), i.e. we never transition away from the
once boot order.

It is certainly fragile^Wwrong for the spapr code to hijack a
field of the base machine type object like that. The boot order
rework simply turned this software boundary violation into an
actual bug.

Have the spapr code to handle that with its own field in
SpaprMachineState. Also kfree() the initial boot device
string when "once" was used.

Fixes: 4b7acd2ac8 ("vl: clean up -boot variables")
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1960119
Cc: pbonzini@redhat.com
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20210521160735.1901914-1-groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-06-03 13:22:06 +10:00
Lucas Mateus Castro (alqotel) 03282a3ab8 hw/ppc: moved has_spr to cpu.h
Moved has_spr to cpu.h as ppc_has_spr and turned it into an inline function.
Change spr verification in pnv.c and spapr.c to a version that can
compile in a !TCG environment.

Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br>
Message-Id: <20210507164146.67086-1-lucas.araujo@eldorado.org.br>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-05-19 10:30:28 +10:00
Fabiano Rosas ab5add4c7b hw/ppc/spapr.c: Make sure the host supports the selected MMU mode
Starting with Linux kernel v5.12 we dropped support[1] in KVM for
hosts that can't have their threads running in different MMU modes
(POWER9 < DD2.2). In these hosts, KVM will no longer report the
KVM_CAP_PPC_MMU_HASH_V3 capability[2] when the host is running Radix.

For guests that support both MMU modes, the negotiation during CAS
will make sure it selects the correct one.

For guests that only support Hash, such as P8 compat mode guests, the
following error is currently thrown:

  $ ~/qemu-system-ppc64 -machine pseries,accel=kvm,max-cpu-compat=power8 ...
  error: kvm run failed Invalid argument
  NIP 0000000000000100   LR 0000000000000000 CTR 0000000000000000 XER 0000000000000000 CPU#0
  MSR 8000000000001000 HID0 0000000000000000  HF 8000000000000000 iidx 3 didx 3
  TB 00000000 00000000 DECR 0
  GPR00 0000000000000000 0000000000000000 0000000000000000 000000007ff00000
  GPR04 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR08 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR12 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR16 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR20 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR24 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR28 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  CR 00000000  [ -  -  -  -  -  -  -  -  ]             RES ffffffffffffffff
   SRR0 0000000000000000  SRR1 0000000000000000    PVR 00000000004e1201 VRSAVE 0000000000000000
  SPRG0 0000000000000000 SPRG1 0000000000000000  SPRG2 0000000000000000  SPRG3 0000000000000000
  SPRG4 0000000000000000 SPRG5 0000000000000000  SPRG6 0000000000000000  SPRG7 0000000000000000
  HSRR0 0000000000000000 HSRR1 0000000000000000
   CFAR 0000000000000000
   LPCR 000000000004f01f
   PTCR 0000000000000000   DAR 0000000000000000  DSISR 0000000000000000

This patch adds a verification during the writing of the platform
support vector so that we error out as soon as we determine this guest
only supports Hash and the host doesn't.

  ~/qemu-system-ppc64 -machine pseries,accel=kvm,max-cpu-compat=power8 ...
  qemu-system-ppc64: Guest requested unavailable MMU mode (hash).

1- https://git.kernel.org/torvalds/p/b1b1697ae0cc8
2- https://git.kernel.org/torvalds/p/a722076e94702

Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
Message-Id: <20210505001130.3999968-3-farosas@linux.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-05-19 10:30:28 +10:00
Fabiano Rosas 068479e1e1 hw/ppc/spapr.c: Extract MMU mode error reporting into a function
A following patch will make use of it.

Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
Message-Id: <20210505001130.3999968-2-farosas@linux.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-05-19 10:30:28 +10:00
Peter Maydell d90f154867 ppc patch queue 2021-05-04
Here's the first ppc pull request for qemu-6.1.  It has a wide variety
 of stuff accumulated during the 6.0 freeze.  Highlights are:
 
  * Multi-phase reset cleanups for PAPR
  * Preliminary cleanups towards allowing !CONFIG_TCG for the ppc target
  * Cleanup of AIL logic and extension to POWER10
  * Further improvements to handling of hot unplug failures on PAPR
  * Allow much larger numbers of CPU on pseries
  * Support for the H_SCM_HEALTH hypercall
  * Add support for the Pegasos II board
  * Substantial cleanup to hflag handling
  * Assorted minor fixes and cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEdfRlhq5hpmzETofcbDjKyiDZs5IFAmCQ4ScACgkQbDjKyiDZ
 s5KmNhAAsICdDqeu/jm1uhRCr0DDT/Wa6KE1xlglQ53ybWb5Hm2ae0Uwzti5ZWkt
 T9yryObX++wiugbU5Dlx9eXTiJIPgTbDoBV1wfOa3a1BAxSEES1t70jwuwAXXBpX
 mgU++SurQB70IB7vVvyXDi2Z592qGvMiKXqT0sdkfoexPHzAL0+KkQPyJZLeFchM
 Ap/zRHAodXf9SuWAl+LwLXeb350jivXYXBWNcFRrBbOGpbVT0AJMYrk/TEa2ZIpi
 SvbzAWuW+9mX0EOmk7JK5JfkT41cGNdcBcwd0bt4xyvUpmkXLaTMFDLVHj3HWSUn
 PFA4RB3uKXyTfISVtWdxJBbFOzMpchI6lEiRJHCS+KuY7UsACqV1T/y54ATOUauC
 ycLc9APgRaStdNPxfDl+xeFfoVb/f0mQsNwcmY1tv7z+3qE/trY9bMyrbgaebBFn
 /TAkmPvXfwtAREnx8xF/57poarWUkvupGTQkANNosdFokpExmrLj8T0sKv90hh5Y
 vkGf5zP4pYGN1Rs8qhOdHu+IjhVJvUl/L3LZYWcoMI6E61D8rGRc0Dkacx7gcja+
 sluFi5Yh2fQn55y6LTi3049cB1wMd6wly0214F11RKoBswguiGuaqJmL4sNDO/s4
 IcMCy5mg6C0jNZA5kHcdWmqsVzD2+XwP5J29n/LedlmgXoHYF+M=
 =N0qr
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/dg-gitlab/tags/ppc-for-6.1-20210504' into staging

ppc patch queue 2021-05-04

Here's the first ppc pull request for qemu-6.1.  It has a wide variety
of stuff accumulated during the 6.0 freeze.  Highlights are:

 * Multi-phase reset cleanups for PAPR
 * Preliminary cleanups towards allowing !CONFIG_TCG for the ppc target
 * Cleanup of AIL logic and extension to POWER10
 * Further improvements to handling of hot unplug failures on PAPR
 * Allow much larger numbers of CPU on pseries
 * Support for the H_SCM_HEALTH hypercall
 * Add support for the Pegasos II board
 * Substantial cleanup to hflag handling
 * Assorted minor fixes and cleanups

# gpg: Signature made Tue 04 May 2021 06:52:39 BST
# gpg:                using RSA key 75F46586AE61A66CC44E87DC6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>" [full]
# gpg:                 aka "David Gibson (Red Hat) <dgibson@redhat.com>" [full]
# gpg:                 aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>" [full]
# gpg:                 aka "David Gibson (kernel.org) <dwg@kernel.org>" [unknown]
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E  87DC 6C38 CACA 20D9 B392

* remotes/dg-gitlab/tags/ppc-for-6.1-20210504: (46 commits)
  hw/ppc/pnv_psi: Use device_cold_reset() instead of device_legacy_reset()
  hw/ppc/spapr_vio: Reset TCE table object with device_cold_reset()
  hw/intc/spapr_xive: Use device_cold_reset() instead of device_legacy_reset()
  target/ppc: removed VSCR from SPR registration
  target/ppc: Reduce the size of ppc_spr_t
  target/ppc: Clean up _spr_register et al
  target/ppc: Add POWER10 exception model
  target/ppc: rework AIL logic in interrupt delivery
  target/ppc: move opcode table logic to translate.c
  target/ppc: code motion from translate_init.c.inc to gdbstub.c
  spapr_drc.c: handle hotunplug errors in drc_unisolate_logical()
  spapr.h: increase FDT_MAX_SIZE
  spapr.c: do not use MachineClass::max_cpus to limit CPUs
  ppc: Rename current DAWR macros and variables
  target/ppc: POWER10 supports scv
  target/ppc: Fix POWER9 radix guest HV interrupt AIL behaviour
  docs/system: ppc: Add documentation for ppce500 machine
  roms/u-boot: Bump ppce500 u-boot to v2021.04 to fix broken pci support
  roms/Makefile: Update ppce500 u-boot build directory name
  ppc/spapr: Add support for implement support for H_SCM_HEALTH
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-05-05 20:29:14 +01:00
Daniel Henrique Barboza 5642e4513e spapr.c: do not use MachineClass::max_cpus to limit CPUs
Up to this patch, 'max_cpus' value is hardcoded to 1024 (commit
6244bb7e58). In theory this patch would simply bump it to 2048, since
it's the default NR_CPUS kernel setting for ppc64 servers nowadays, but
the whole mechanic of MachineClass:max_cpus is flawed for the pSeries
machine. The two supported accelerators, KVM and TCG, can live without
it.

TCG guests don't have a theoretical limit. The user must be free to
emulate as many CPUs as the hardware is capable of. And even if there
were a limit, max_cpus is not the proper way to report it since it's a
common value checked by SMP code in machine_smp_parse() for KVM as well.

For KVM guests, the proper way to limit KVM CPUs is by host
configuration via NR_CPUS, not a QEMU hardcoded value. There is no
technical reason for a pSeries QEMU guest to forcefully stay below
NR_CPUS.

This hardcoded value also disregard hosts that might have a lower
NR_CPUS limit, say 512. In this case, machine.c:machine_smp_parse() will
allow a 1024 value to pass, but then kvm_init() will complain about it
because it will exceed NR_CPUS:

Number of SMP cpus requested (1024) exceeds the maximum cpus supported
by KVM (512)

A better 'max_cpus' value would consider host settings, but
MachineClass::max_cpus is defined well before machine_init() and
kvm_init(). We can't check for KVM limits because it's too soon, so we
end up making a guess.

This patch makes MachineClass:max_cpus settings innocuous by setting it
to INT32_MAX. machine.c:machine_smp_parse() will not fail the
verification based on max_cpus, letting kvm_init() do the checking with
actual host settings. And TCG guests get to do whatever the hardware is
capable of emulating.

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210408204049.221802-2-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-05-04 11:41:25 +10:00
Alexey Kardashevskiy 4b98e72d97 spapr: Rename RTAS_MAX_ADDR to FDT_MAX_ADDR
SLOF instantiates RTAS since
744a928cce ("spapr: Stop providing RTAS blob")
so the max address applies to the FDT only.

This renames the macro and fixes up the comment.

This should not cause any behavioral change.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20210331025123.29310-1-aik@ozlabs.ru>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-05-04 11:41:25 +10:00
Thomas Huth ee86213aa3 Do not include exec/address-spaces.h if it's not really necessary
Stop including exec/address-spaces.h in files that don't need it.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20210416171314.2074665-5-thuth@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2021-05-02 17:24:51 +02:00
Thomas Huth ead62c75f6 Do not include hw/boards.h if it's not really necessary
Stop including hw/boards.h in files that don't need it.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20210416171314.2074665-3-thuth@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2021-05-02 17:24:51 +02:00
Cornelia Huck da7e13c00b hw: add compat machines for 6.1
Add 6.1 machine types for arm/i440fx/q35/s390x/spapr.

Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Acked-by: Greg Kurz <groug@kaod.org>
Message-id: 20210331111900.118274-1-cohuck@redhat.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-04-30 11:16:51 +01:00
Daniel Henrique Barboza 2b18fc794f spapr.c: always pulse guest IRQ in spapr_core_unplug_request()
Commit 47c8c915b1 fixed a problem where multiple spapr_drc_detach()
requests were breaking QEMU. The solution was to just spapr_drc_detach()
once, and use spapr_drc_unplug_requested() to filter whether we already
detached it or not. The commit also tied the hotplug request to the
guest in the same condition.

Turns out that there is a reliable way for a CPU hotunplug to fail. If a
guest with one CPU hotplugs a CPU1, then offline CPU0s via 'echo 0 >
/sys/devices/system/cpu/cpu0/online', then attempts to hotunplug CPU1,
the kernel will refuse it because it's the last online CPU of the
system. Given that we're pulsing the IRQ only in the first try, in a
failed attempt, all other CPU1 hotunplug attempts will fail, regardless
of the online state of CPU1 in the kernel, because we're simply not
letting the guest know that we want to hotunplug the device.

Let's move spapr_hotplug_req_remove_by_index() back out of the "if
(!spapr_drc_unplug_requested(drc))" conditional, allowing for multiple
'device_del' requests to the same CPU core to reach the guest, in case
the CPU core didn't fully hotunplugged previously.

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210401000437.131140-3-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-04-12 12:27:14 +10:00
Daniel Henrique Barboza d522cb52e6 spapr: rollback 'unplug timeout' for CPU hotunplugs
The pseries machines introduced the concept of 'unplug timeout' for CPU
hotunplugs. The idea was to circunvent a deficiency in the pSeries
specification (PAPR), that currently does not define a proper way for
the hotunplug to fail. If the guest refuses to release the CPU (see [1]
for an example) there is no way for QEMU to detect the failure.

Further discussions about how to send a QAPI event to inform about the
hotunplug timeout [2] exposed problems that weren't predicted back when
the idea was developed. Other QEMU machines don't have any type of
hotunplug timeout mechanism for any device, e.g. ACPI based machines
have a way to make hotunplug errors visible to the hypervisor. This
would make this timeout mechanism exclusive to pSeries, which is not
ideal.

The real problem is that a QAPI event that reports hotunplug timeouts
puts the management layer (namely Libvirt) in a weird spot. We're not
telling that the hotunplug failed, because we can't be 100% sure of
that, and yet we're resetting the unplug state back, preventing any
DEVICE_DEL events to reach out in case the guest decides to release the
device. Libvirt would need to inspect the guest itself to see if the
device was released or not, otherwise the internal domain states will be
inconsistent.  Moreover, Libvirt already has an 'unplug timeout'
concept, and a QEMU side timeout would need to be juggled together with
the existing Libvirt timeout.

All this considered, this solution ended up creating more trouble than
it solved. This patch reverts the 3 commits that introduced the timeout
mechanism for CPU hotplugs in pSeries machines.

This reverts commit 4515a5f786
"qemu_timer.c: add timer_deadline_ms() helper"

This reverts commit d1c2e3ce3d
"spapr_drc.c: add hotunplug timeout for CPUs"

This reverts commit 51254ffb32
"spapr_drc.c: introduce unplug_timeout_timer"

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1911414
[2] https://lists.gnu.org/archive/html/qemu-devel/2021-03/msg04682.html

CC: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210401000437.131140-2-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-04-12 12:27:14 +10:00
Greg Kurz df2d7ca774 spapr: Assert DIMM unplug state in spapr_memory_unplug()
spapr_memory_unplug() is the last step of the hot unplug sequence.
It is indirectly called by:

 spapr_lmb_release()
  hotplug_handler_unplug()

and spapr_lmb_release() already buys us that DIMM unplug state is
present : it gets restored with spapr_recover_pending_dimm_state()
if missing.

g_assert() that spapr_pending_dimm_unplugs_find() cannot return NULL
in spapr_memory_unplug() to make this clear and silence Coverity.

Fixes: Coverity CID 1450767
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <161562021166.948373.15092876234470478331.stgit@bahia.lan>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-03-31 11:10:50 +11:00
Daniel Henrique Barboza eb7f80fd26 spapr.c: send QAPI event when memory hotunplug fails
Recent changes allowed the pSeries machine to rollback the hotunplug
process for the DIMM when the guest kernel signals, via a
reconfiguration of the DR connector, that it's not going to release the
LMBs.

Let's also warn QAPI listerners about it. One place to do it would be
right after the unplug state is cleaned up,
spapr_clear_pending_dimm_unplug_state(). This would mean that the
function is now doing more than cleaning up the pending dimm state
though.

This patch does the following changes in spapr.c:

- send a QAPI event to inform that we experienced a failure in the
  hotunplug of the DIMM;

- rename spapr_clear_pending_dimm_unplug_state() to
  spapr_memory_unplug_rollback(). This is a better fit for what the
  function is now doing, and it makes callers care more about what the
  function goal is and less about spapr.c internals such as clearing
  the pending dimm unplug state.

Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210302141019.153729-3-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-03-10 09:07:09 +11:00
Daniel Henrique Barboza 41c8ad3d92 spapr.c: remove duplicated assert in spapr_memory_unplug_request()
We are asserting the existence of the first DRC LMB after sending unplug
requests to all LMBs of the DIMM, where every DRC is being asserted
inside the loop. This means that the first DRC is being asserted twice.

Remove the duplicated assert.

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210302141019.153729-2-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-03-10 09:07:09 +11:00
Daniel Henrique Barboza 7420033ec4 spapr.c: add 'unplug already in progress' message for PHB unplug
Both CPU hotunplug and PC_DIMM unplug reports an user warning,
mentioning that the hotunplug is in progress, if consecutive
'device_del' are issued in quick succession.

Do the same for PHBs in spapr_phb_unplug_request().

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210226163301.419727-4-danielhb413@gmail.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-03-10 09:07:09 +11:00
Daniel Henrique Barboza fe1831eff8 spapr_drc.c: use DRC reconfiguration to cleanup DIMM unplug state
Handling errors in memory hotunplug in the pSeries machine is more
complex than any other device type, because there are all the
complications that other devices has, and more.

For instance, determining a timeout for a DIMM hotunplug must consider
if it's a Hash-MMU or a Radix-MMU guest, because Hash guests takes
longer to hotunplug DIMMs. The size of the DIMM is also a factor, given
that longer DIMMs naturally takes longer to be hotunplugged from the
kernel. And there's also the guest memory usage to be considered: if
there's a process that is consuming memory that would be lost by the
DIMM unplug, the kernel will postpone the unplug process until the
process finishes, and then initiate the regular hotunplug process. The
first two considerations are manageable, but the last one is a deal
breaker.

There is no sane way for the pSeries machine to determine the memory
load in the guest when attempting a DIMM hotunplug - and even if there
was a way, the guest can start using all the RAM in the middle of the
unplug process and invalidate our previous assumptions - and in result
we can't even begin to calculate a timeout for the operation. This means
that we can't implement a viable timeout mechanism for memory unplug in
pSeries.

Going back to why we would consider an unplug timeout, the reason is
that we can't know if the kernel is giving up the unplug. Turns out
that, sometimes, we can. Consider a failed memory hotunplug attempt
where the kernel will error out with the following message:

'pseries-hotplug-mem: Memory indexed-count-remove failed, adding any
removed LMBs'

This happens when there is a LMB that the kernel gave up in removing,
and the LMBs previously marked for removal are now being added back.
This happens in the pseries kernel in [1], dlpar_memory_remove_by_ic()
into dlpar_add_lmb(), and after that update_lmb_associativity_index().
In this function, the kernel is configuring the LMB DRC connector again.
Note that this is a valid usage in LOPAR, as stated in section
"ibm,configure-connector RTAS Call":

'A subsequent sequence of calls to ibm,configure-connector with the same
entry from the “ibm,drc-indexes” or “ibm,drc-info” property will restart
the configuration of devices which were not completely configured.'

We can use this kernel behavior in our favor. If a DRC connector
reconfiguration for a LMB that we marked as unplug pending happens, this
indicates that the kernel changed its mind about the unplug and is
reasserting that it will keep using all the LMBs of the DIMM. In this
case, it's safe to assume that the whole DIMM device unplug was
cancelled.

This patch hops into rtas_ibm_configure_connector() and, in the scenario
described above, clear the unplug state for the DIMM device. This will
not solve all the problems we still have with memory unplug, but it will
cover this case where the kernel reconfigures LMBs after a failed
unplug. We are a bit more resilient, without using an unreliable
timeout, and we didn't make the remaining error cases any worse.

[1] arch/powerpc/platforms/pseries/hotplug-memory.c

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210222194531.62717-6-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-03-10 09:07:09 +11:00
Daniel Henrique Barboza d1c2e3ce3d spapr_drc.c: add hotunplug timeout for CPUs
There is a reliable way to make a CPU hotunplug fail in the pseries
machine. Hotplug a CPU A, then offline all other CPUs inside the guest
but A. When trying to hotunplug A the guest kernel will refuse to do it,
because A is now the last online CPU of the guest. PAPR has no 'error
callback' in this situation to report back to the platform, so the guest
kernel will deny the unplug in silent and QEMU will never know what
happened. The unplug pending state of A will remain until the guest is
shutdown or rebooted.

Previous attempts of fixing it (see [1] and [2]) were aimed at trying to
mitigate the effects of the problem. In [1] we were trying to guess
which guest CPUs were online to forbid hotunplug of the last online CPU
in the QEMU layer, avoiding the scenario described above because QEMU is
now failing in behalf of the guest. This is not robust because the last
online CPU of the guest can change while we're in the middle of the
unplug process, and our initial assumptions are now invalid. In [2] we
were accepting that our unplug process is uncertain and the user should
be allowed to spam the IRQ hotunplug queue of the guest in case the CPU
hotunplug fails.

This patch presents another alternative, using the timeout
infrastructure introduced in the previous patch. CPU hotunplugs in the
pSeries machine will now timeout after 15 seconds. This is a long time
for a single CPU unplug to occur, regardless of guest load - although
the user is *strongly* encouraged to *not* hotunplug devices from a
guest under high load - and we can be sure that something went wrong if
it takes longer than that for the guest to release the CPU (the same
can't be said about memory hotunplug - more on that in the next patch).

Timing out the unplug operation will reset the unplug state of the CPU
and allow the user to try it again, regardless of the error situation
that prevented the hotunplug to occur. Of all the not so pretty
fixes/mitigations for CPU hotunplug errors in pSeries, timing out the
operation is an admission that we have no control in the process, and
must assume the worst case if the operation doesn't succeed in a
sensible time frame.

[1] https://lists.gnu.org/archive/html/qemu-devel/2021-01/msg03353.html
[2] https://lists.gnu.org/archive/html/qemu-devel/2021-01/msg04400.html

Reported-by: Xujun Ma <xuma@redhat.com>
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1911414
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210222194531.62717-5-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-03-10 09:07:09 +11:00
Daniel Henrique Barboza a03509cd2b spapr: rename spapr_drc_detach() to spapr_drc_unplug_request()
spapr_drc_detach() is not the best name for what the function does. The
function does not detach the DRC, it makes an uncommited attempt to do
it.  It'll mark the DRC as pending unplug, via the 'unplug_request'
flag, and only if the DRC state is drck->empty_state it will detach the
DRC, via spapr_drc_release().

This is a contrast with its pair spapr_drc_attach(), where the function
is indeed creating the DRC QOM object. If you know what
spapr_drc_attach() does, you can be misled into thinking that
spapr_drc_detach() is removing the DRC from QEMU internal state, which
isn't true.

The current role of this function is better described as a request for
detach, since there's no guarantee that we're going to detach the DRC in
the end.  Rename the function to spapr_drc_unplug_request to reflect
what is is doing.

The initial idea was to change the name to spapr_drc_detach_request(),
and later on change the unplug_request flag to detach_request. However,
unplug_request is a migratable boolean for a long time now and renaming
it is not worth the trouble. spapr_drc_unplug_request() setting
drc->unplug_request is more natural than spapr_drc_detach_request
setting drc->unplug_request.

Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210222194531.62717-3-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-03-10 09:07:08 +11:00
Daniel Henrique Barboza 6640706972 spapr_numa.c: create spapr_numa_initial_nvgpu_numa_id() helper
We'll need to check the initial value given to spapr->gpu_numa_id when
building the rtas DT, so put it in a helper for easier access and to
avoid repetition.

Tested-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210128174213.1349181-3-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-02-10 10:43:50 +11:00
Daniel Henrique Barboza 3b880445e6 spapr: move spapr_machine_using_legacy_numa() to spapr_numa.c
This function is used only in spapr_numa.c.

Tested-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210128174213.1349181-2-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-02-10 10:43:50 +11:00
Greg Kurz 040bdafce1 spapr: Adjust firmware path of PCI devices
It is currently not possible to perform a strict boot from USB storage:

$ qemu-system-ppc64 -accel kvm -nodefaults -nographic -serial stdio \
	-boot strict=on \
	-device qemu-xhci \
	-device usb-storage,drive=disk,bootindex=0 \
	-blockdev driver=file,node-name=disk,filename=fedora-ppc64le.qcow2

SLOF **********************************************************************
QEMU Starting
 Build Date = Jul 17 2020 11:15:24
 FW Version = git-e18ddad8516ff2cf
 Press "s" to enter Open Firmware.

Populating /vdevice methods
Populating /vdevice/vty@71000000
Populating /vdevice/nvram@71000001
Populating /pci@800000020000000
                     00 0000 (D) : 1b36 000d    serial bus [ usb-xhci ]
No NVRAM common partition, re-initializing...
Scanning USB
  XHCI: Initializing
    USB Storage
       SCSI: Looking for devices
          101000000000000 DISK     : "QEMU     QEMU HARDDISK    2.5+"
Using default console: /vdevice/vty@71000000

  Welcome to Open Firmware

  Copyright (c) 2004, 2017 IBM Corporation All rights reserved.
  This program and the accompanying materials are made available
  under the terms of the BSD License available at
  http://www.opensource.org/licenses/bsd-license.php

Trying to load:  from: /pci@800000020000000/usb@0/storage@1/disk@101000000000000 ...
E3405: No such device

E3407: Load failed

  Type 'boot' and press return to continue booting the system.
  Type 'reset-all' and press return to reboot the system.

Ready!
0 >

The device tree handed over by QEMU to SLOF indeed contains:

qemu,boot-list =
	"/pci@800000020000000/usb@0/storage@1/disk@101000000000000 HALT";

but the device node is named usb-xhci@0, not usb@0.

This happens because the firmware names of PCI devices returned
by get_boot_devices_list() come from pcibus_get_fw_dev_path(),
while the sPAPR PHB code uses a different naming scheme for
device nodes. This inconsistency has always been there but it was
hidden for a long time because SLOF used to rename USB device
nodes, until this commit, merged in QEMU 4.2.0 :

commit 85164ad4ed
Author: Alexey Kardashevskiy <aik@ozlabs.ru>
Date:   Wed Sep 11 16:24:32 2019 +1000

    pseries: Update SLOF firmware image

    This fixes USB host bus adapter name in the device tree to match QEMU's
    one.

    Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
    Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

Fortunately, sPAPR implements the firmware path provider interface.
This provides a way to override the default firmware paths.

Just factor out the sPAPR PHB naming logic from spapr_dt_pci_device()
to a helper, and use it in the sPAPR firmware path provider hook.

Fixes: 85164ad4ed ("pseries: Update SLOF firmware image")
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20210122170157.246374-1-groug@kaod.org>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-02-10 10:43:50 +11:00
Daniel Henrique Barboza a85bb34e1c spapr.c: add 'name' property for hotplugged CPUs nodes
In the CPU hotunplug bug [1] the guest kernel throws a scary
message in dmesg:

pseries-hotplug-cpu: Failed to offline CPU <NULL>, rc: -16

The reason isn't related to the bug though. This happens because the
kernel file arch/powerpc/platform/pseries/hotplug-cpu.c, function
dlpar_cpu_remove(), is not finding the device_node.name of the offending
CPU.

We're not populating the 'name' property for hotplugged CPUs. Since the
kernel relies on device_node.name for identifying CPU nodes, and the
CPUs that are coldplugged has the 'name' property filled by SLOF, this
is creating an unneeded inconsistency between hotplug and coldplug CPUs
in the kernel.

Let's fill the 'name' property for hotplugged CPUs as well. This will
make the guest dmesg throws a less intimidating message when we try to
unplug the last online CPU:

pseries-hotplug-cpu: Failed to offline CPU PowerPC,POWER9@1, rc: -16

[1] https://bugzilla.redhat.com/1911414

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210120232305.241521-3-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-02-10 10:43:49 +11:00
Daniel Henrique Barboza 7265bc3e54 spapr.c: use g_auto* with 'nodename' in CPU DT functions
Next patch will use the 'nodename' string in spapr_core_dt_populate()
after the point it's being freed today.

Instead of moving 'g_free(nodename)' around, let's do a QoL change in
both CPU DT functions where 'nodename' is being freed, and use
g_autofree to avoid the 'g_free()' call altogether.

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210120232305.241521-2-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-02-10 10:43:49 +11:00
David Gibson 6c8ebe30ea spapr: Add PEF based confidential guest support
Some upcoming POWER machines have a system called PEF (Protected
Execution Facility) which uses a small ultravisor to allow guests to
run in a way that they can't be eavesdropped by the hypervisor.  The
effect is roughly similar to AMD SEV, although the mechanisms are
quite different.

Most of the work of this is done between the guest, KVM and the
ultravisor, with little need for involvement by qemu.  However qemu
does need to tell KVM to allow secure VMs.

Because the availability of secure mode is a guest visible difference
which depends on having the right hardware and firmware, we don't
enable this by default.  In order to run a secure guest you need to
create a "pef-guest" object and set the confidential-guest-support
property to point to it.

Note that this just *allows* secure guests, the architecture of PEF is
such that the guest still needs to talk to the ultravisor to enter
secure mode.  Qemu has no direct way of knowing if the guest is in
secure mode, and certainly can't know until well after machine
creation time.

To start a PEF-capable guest, use the command line options:
    -object pef-guest,id=pef0 -machine confidential-guest-support=pef0

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
2021-02-08 16:57:38 +11:00
Greg Kurz 73598c75df spapr: Improve handling of memory unplug with old guests
Since commit 1e8b5b1aa1 ("spapr: Allow memory unplug to always succeed")
trying to unplug memory from a guest that doesn't support it (eg. rhel6)
no longer generates an error like it used to. Instead, it leaves the
memory around : only a subsequent reboot or manual use of drmgr within
the guest can complete the hot-unplug sequence. A flag was added to
SpaprMachineClass so that this new behavior only applies to the default
machine type.

We can do better. CAS processes all pending hot-unplug requests. This
means that we don't really care about what the guest supports if
the hot-unplug request happens before CAS.

All guests that we care for, even old ones, set enough bits in OV5
that lead to a non-empty bitmap in spapr->ov5_cas. Use that as a
heuristic to decide if CAS has already occured or not.

Always accept unplug requests that happen before CAS since CAS will
process them. Restore the previous behavior of rejecting them after
CAS when we know that the guest doesn't support memory hot-unplug.

This behavior is suitable for all machine types : this allows to
drop the pre_6_0_memory_unplug flag.

Fixes: 1e8b5b1aa1 ("spapr: Allow memory unplug to always succeed")
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <161012708715.801107.11418801796987916516.stgit@bahia.lan>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-01-19 10:20:29 +11:00
Greg Kurz 1105504100 spapr: Use spapr_drc_reset_all() at machine reset
Documentation of object_child_foreach_recursive() clearly stipulates
that "it is forbidden to add or remove children from @obj from the @fn
callback". But this is exactly what we do during machine reset. The call
to spapr_drc_reset() can finalize the hot-unplug sequence of a PHB or a
PCI bridge, both of which will then in turn destroy their PCI DRCs. This
could potentially invalidate the iterator used by do_object_child_foreach().
It is pure luck that this haven't caused any issues so far.

Use spapr_drc_reset_all() since it can cope with DRC removal.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20201218103400.689660-5-groug@kaod.org>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Tested-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-01-06 11:09:59 +11:00
Greg Kurz 1e8b5b1aa1 spapr: Allow memory unplug to always succeed
It is currently impossible to hot-unplug a memory device between
machine reset and CAS.

(qemu) device_del dimm1
Error: Memory hot unplug not supported for this guest

This limitation was introduced in order to provide an explicit
error path for older guests that didn't support hot-plug event
sources (and thus memory hot-unplug).

The linux kernel has been supporting these since 4.11. All recent
enough guests are thus capable of handling the removal of a memory
device at all time, including during early boot.

Lift the limitation for the latest machine type. This means that
trying to unplug memory from a guest that doesn't support it will
likely just do nothing and the memory will only get removed at
next reboot. Such older guests can still get the existing behavior
by using an older machine type.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <160794035064.23292.17560963281911312439.stgit@bahia.lan>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-01-06 11:09:59 +11:00
Greg Kurz 776e887f08 spapr: Fix DR properties of the root node
Section 13.5.2 of LoPAPR mandates various DR related indentifiers
for all hot-pluggable entities to be exposed in the "ibm,drc-indexes",
"ibm,drc-power-domains", "ibm,drc-names" and "ibm,drc-types" properties
of their parent node. These properties are created with spapr_dt_drc().

PHBs and LMBs are both children of the machine. Their DR identifiers
are thus supposed to be exposed in the afore mentioned properties of
the root node.

When PHB hot-plug support was added, an extra call to spapr_dt_drc()
was introduced: this overwrites the existing properties, previously
populated with the LMB identifiers, and they end up containing only
PHB identifiers. This went unseen so far because linux doesn't care,
but this is still not conformant with LoPAPR.

Fortunately spapr_dt_drc() is able to handle multiple DR entity types
at the same time. Use that to handle DR indentifiers for PHBs and LMBs
with a single call to spapr_dt_drc(). While here also account for PMEM
DR identifiers, which were forgotten when NVDIMM hot-plug support was
added. Also add an assert to prevent further misuse of spapr_dt_drc().

With -m 1G,maxmem=2G,slots=8 passed on the QEMU command line we get:

Without this patch:

/proc/device-tree/ibm,drc-indexes
		 0000001f 20000001 20000002 20000003
		 20000000 20000005 20000006 20000007
		 20000004 20000009 20000008 20000010
		 20000011 20000012 20000013 20000014
		 20000015 20000016 20000017 20000018
		 20000019 2000000a 2000000b 2000000c
		 2000000d 2000000e 2000000f 2000001a
		 2000001b 2000001c 2000001d 2000001e

These are the DRC indexes for the 31 possible PHBs.

With this patch:

/proc/device-tree/ibm,drc-indexes
		 0000002b 90000000 90000001 90000002
		 90000003 90000004 90000005 90000006
		 90000007 20000001 20000002 20000003
		 20000000 20000005 20000006 20000007
		 20000004 20000009 20000008 20000010
		 20000011 20000012 20000013 20000014
		 20000015 20000016 20000017 20000018
		 20000019 2000000a 2000000b 2000000c
		 2000000d 2000000e 2000000f 2000001a
		 2000001b 2000001c 2000001d 2000001e
		 80000004 80000005 80000006 80000007

And now we also have the 4 ((2G - 1G) / 256M) LMBs and the
8 (slots) PMEMs.

Fixes: 3998ccd092 ("spapr: populate PHB DRC entries for root DT node")
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <160794479566.35245.17809158217760761558.stgit@bahia.lan>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-01-06 11:09:59 +11:00
Greg Kurz 73231f7c5f spapr: DRC lookup cannot fail
All memory DRC objects are created during machine init. It is thus safe
to assume spapr_drc_by_id() cannot return NULL when hot-plug/unplugging
memory.

Make this clear with an assertion, like the code already does a few lines
above when looping over memory DRCs. This fixes Coverity reports 1437757
and 1437758.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <160805381160.228955.5388294067094240175.stgit@bahia.lan>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-01-06 11:09:59 +11:00
Igor Mammedov 55810e90cc ppc/spapr: cleanup -machine pseries,nvdimm=X handling
Since NVDIMM support was introduced on pseries machine,
it ignored machine's nvdimm=on|off option and effectively
was always enabled on machines that support NVDIMM.
Later on commit
  (28f5a71621 ppc/spapr_nvdimm: do not enable support with 'nvdimm=off')
makes QEMU error out in case user explicitly set 'nvdimm=off'
on CLI by peeking at machine_opts.

However that's a workaround and leaves 'nvdimms_state->is_enabled'
in inconsistent state (false) when it should be set true
by default.

Instead of using on machine_opts, set default to true for pseries
machine in initfn time. If user sets manually 'nvdimm=off'
it will overwrite default value to false and QEMU will error
as expected without need to peek into machine_opts.

That way pseries will have, nvdimm enabled by default and
will honor user provided 'nvdimm=on|off'.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20201208164606.4109134-1-imammedo@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-15 12:51:53 -05:00
Daniel Henrique Barboza 07b10bc42c spapr.c: set a 'kvm-type' default value instead of relying on NULL
spapr_kvm_type() is considering 'vm_type=NULL' as a valid input, where
the function returns 0. This is relying on the current QEMU machine
options handling logic, where the absence of the 'kvm-type' option
will be reflected as 'vm_type=NULL' in this function.

This is not robust, and will break if QEMU options code decides to propagate
something else in the case mentioned above (e.g. an empty string instead
of NULL).

Let's avoid this entirely by setting a non-NULL default value in case of
no user input for 'kvm-type'. spapr_kvm_type() was changed to handle 3 fixed
values of kvm-type: "auto", "hv", and "pr", with "auto" being the default
if no kvm-type was set by the user. This allows us to always be predictable
regardless of any enhancements/changes made in QEMU options mechanics.

While we're at it, let's also document in 'kvm-type' description the
already existing default mode, now named 'auto'. The information provided
about it is based on how the pseries kernel handles the KVM_CREATE_VM
ioctl(), where the default value '0' makes the kernel choose an available
KVM module to use, giving precedence to kvm_hv. This logic is described in
the kernel source file arch/powerpc/kvm/powerpc.c, function kvm_arch_init_vm().

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20201210145517.1532269-2-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
2020-12-14 15:54:12 +11:00
Greg Kurz bc370a659a spapr: spapr_drc_attach() cannot fail
All users are passing &error_abort already. Document the fact
that spapr_drc_attach() should only be passed a free DRC, which
is supposedly the case if appropriate checking is done earlier.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20201201113728.885700-5-groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-12-14 15:54:12 +11:00