Include the interrupt presenter under the machine_data as we plan to
remove it from under PowerPCCPU
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
While looking at the s390x implementation, looks like spapr has a
similar BUG when building the topology.
The primary bus number corresponds always to the bus number of the
bus the bridge is attached to.
Right now, if we have two bridges attached to the same bus (e.g. root
bus) this is however not the case. The first bridge will have primary
bus 0, the second bridge primary bus 1, which is wrong. Fix the assignment.
While at it, drop setting the PCI_SUBORDINATE_BUS temporarily to 0xff.
Setting it temporarily to that value (as discussed e.g. in [1]), is
only relevant for a running system that probes the buses. The value is
effectively unused for us just doing a DFS.
[1] http://www.science.unitn.it/~fiorella/guidelinux/tlk/node76.html
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Machine types 3.0 and older only know about the legacy XICS backend.
Make it clear by erroring out if the user tries to set ic-mode on
such machines.
Signed-off-by: Greg Kurz <groug@kaod.org>
Tested-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
In hw/scsi/spapr_vio.c we declare that the controller supports multiple
buses by specifying "max_channel = 7" there. So in the code that fixes
up the device tree nodes, we must encode the channel number (a.k.a. bus
number in the "Logical unit addressing format" table of SAM5) into the
64-bit LUN, too.
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1663160
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
commit efe2add7cb ("spapr/vio: deprecate the "irq" property") was
merged in QEMU version 3.0. The "irq" property" can be removed for
QEMU version 4.0.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
When reading base register of RAM slot with no RAM we should not try
to calculate register value because that will result printing an error
due to invalid RAM size. Just return 0 without the error in this case.
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Fix the encoding of larger memory modules in the SoC registers which
allows specifying more than 1GB memory for sam460ex. Well, only 2GB
due to SoC and firmware restrictions which was the only missing value
compared to what the real hardware supports. The SoC should support up
to 4GB but when setting that the firmware hangs during memory test.
This may be an overflow bug in the firmware which I did not try to
debug but this may affect real hardware as well.
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The sdram_set_bcr() function in ppc440_uc.c takes a pointer into an
array then calculates its index from that. It's simpler and easier to
just pass the index which simplifies both the function and its callers.
Do similar cleanup in ppc4xx_devs.c to similar function.
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
There's already a struct with the same name in ppc4xx_devs.c. They are
not used outside their files so don't clash but they are also not
identical so rename the ppc440 specific one to distinguish them.
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
To avoid overflow if larger values are added later use ram_addr_t for
the sdram_bank_sizes parameter to match ram_size to which it is compared.
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Get rid of code from MIPS Malta board used to create SPD EEPROM data
(parts of which was not even needed for sam460ex) and use the generic
spd_data_generate() function to simplify this.
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
When compiling with Clang in -std=gnu99 mode, there is a warning/error:
CC ppc64-softmmu/hw/intc/xics_spapr.o
In file included from /home/thuth/devel/qemu/hw/intc/xics_spapr.c:34:
/home/thuth/devel/qemu/include/hw/ppc/xics.h:203:34: error: redefinition of typedef 'sPAPRMachineState' is a C11 feature
[-Werror,-Wtypedef-redefinition]
typedef struct sPAPRMachineState sPAPRMachineState;
^
/home/thuth/devel/qemu/include/hw/ppc/spapr_irq.h:25:34: note: previous definition is here
typedef struct sPAPRMachineState sPAPRMachineState;
^
We have to remove the duplicated typedef here and include "spapr.h" instead.
But "spapr.h" should not be included for the pnv machine files. So move
the spapr-related prototypes into a new file called "xics_spapr.h" instead.
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
* esp bugfixes (Guenter)
* Windows build cleanup (Marc-André)
* checkpatch logic improvements (Paolo)
* coalesced range bugfix (Paolo)
* switch testsuite to TAP (Paolo)
* QTAILQ rewrite (Paolo)
* block/iscsi.c cancellation fixes (Stefan)
* improve selection of the default accelerator (Thomas)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJcOKyMAAoJEL/70l94x66DxKEH/1ho2Xl8ezxCecA6q3HqTgMT
NJ/ntdqQwVwekKOWzsywnM3/LkEDLH55MxbTeQ8M/Vb1seS8eROz24/gPTzvFrfR
n/d11rDV1EJfWe0H7nGLLFiRv0MSjxLpG9c3dlOKWhwOYHm25tr48PsdfVFP9Slz
BK3rwrMeDgArfptHAIsAXt2h1S0EzrG9pMwGDpErCDzziXxBhUESE0Iqfw8LsH1K
VjMn6rn7Ts1XKlxxwsm+BzHlTJghbj3tWPIfk+6uK2isP4iM3gFCoav3SG9XVXof
V9+vFyMxdtZKT/0HvajhUS4/1S/uGBNNchZRnCxXlpbueWc5ROtvarhM6Hb0eck=
=i8E5
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging
* HAX support for Linux hosts (Alejandro)
* esp bugfixes (Guenter)
* Windows build cleanup (Marc-André)
* checkpatch logic improvements (Paolo)
* coalesced range bugfix (Paolo)
* switch testsuite to TAP (Paolo)
* QTAILQ rewrite (Paolo)
* block/iscsi.c cancellation fixes (Stefan)
* improve selection of the default accelerator (Thomas)
# gpg: Signature made Fri 11 Jan 2019 14:47:40 GMT
# gpg: using RSA key BFFBD25F78C7AE83
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>"
# gpg: aka "Paolo Bonzini <pbonzini@redhat.com>"
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1
# Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83
* remotes/bonzini/tags/for-upstream: (34 commits)
avoid TABs in files that only contain a few
remove space-tab sequences
scripts: add script to convert multiline comments into 4-line format
hw/watchdog/wdt_i6300esb: remove a unnecessary comment
checkpatch: warn about qemu/queue.h head structs that are not typedef-ed
qemu/queue.h: simplify reverse access to QTAILQ
qemu/queue.h: reimplement QTAILQ without pointer-to-pointers
qemu/queue.h: remove Q_TAILQ_{HEAD,ENTRY}
qemu/queue.h: typedef QTAILQ heads
qemu/queue.h: leave head structs anonymous unless necessary
vfio: make vfio_address_spaces static
qemu/queue.h: do not access tqe_prev directly
test: replace gtester with a TAP driver
test: execute g_test_run when tests are skipped
qga: drop < Vista compatibility
build-sys: build with Vista API by default
build-sys: move windows defines in osdep.h header
build-sys: don't include windows.h, osdep.h does it
scsi: esp: Defer command completion until previous interrupts have been handled
esp-pci: Fix status register write erase control
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Most files that have TABs only contain a handful of them. Change
them to spaces so that we don't confuse people.
disas, standard-headers, linux-headers and libdecnumber are imported
from other projects and probably should be exempted from the check.
Outside those, after this patch the following files still contain both
8-space and TAB sequences at the beginning of the line. Many of them
have a majority of TABs, or were initially committed with all tabs.
bsd-user/i386/target_syscall.h
bsd-user/x86_64/target_syscall.h
crypto/aes.c
hw/audio/fmopl.c
hw/audio/fmopl.h
hw/block/tc58128.c
hw/display/cirrus_vga.c
hw/display/xenfb.c
hw/dma/etraxfs_dma.c
hw/intc/sh_intc.c
hw/misc/mst_fpga.c
hw/net/pcnet.c
hw/sh4/sh7750.c
hw/timer/m48t59.c
hw/timer/sh_timer.c
include/crypto/aes.h
include/disas/bfd.h
include/hw/sh4/sh.h
libdecnumber/decNumber.c
linux-headers/asm-generic/unistd.h
linux-headers/linux/kvm.h
linux-user/alpha/target_syscall.h
linux-user/arm/nwfpe/double_cpdo.c
linux-user/arm/nwfpe/fpa11_cpdt.c
linux-user/arm/nwfpe/fpa11_cprt.c
linux-user/arm/nwfpe/fpa11.h
linux-user/flat.h
linux-user/flatload.c
linux-user/i386/target_syscall.h
linux-user/ppc/target_syscall.h
linux-user/sparc/target_syscall.h
linux-user/syscall.c
linux-user/syscall_defs.h
linux-user/x86_64/target_syscall.h
slirp/cksum.c
slirp/if.c
slirp/ip.h
slirp/ip_icmp.c
slirp/ip_icmp.h
slirp/ip_input.c
slirp/ip_output.c
slirp/mbuf.c
slirp/misc.c
slirp/sbuf.c
slirp/socket.c
slirp/socket.h
slirp/tcp_input.c
slirp/tcpip.h
slirp/tcp_output.c
slirp/tcp_subr.c
slirp/tcp_timer.c
slirp/tftp.c
slirp/udp.c
slirp/udp.h
target/cris/cpu.h
target/cris/mmu.c
target/cris/op_helper.c
target/sh4/helper.c
target/sh4/op_helper.c
target/sh4/translate.c
tcg/sparc/tcg-target.inc.c
tests/tcg/cris/check_addo.c
tests/tcg/cris/check_moveq.c
tests/tcg/cris/check_swap.c
tests/tcg/multiarch/test-mmap.c
ui/vnc-enc-hextile-template.h
ui/vnc-enc-zywrle.h
util/envlist.c
util/readline.c
The following have only TABs:
bsd-user/i386/target_signal.h
bsd-user/sparc64/target_signal.h
bsd-user/sparc64/target_syscall.h
bsd-user/sparc/target_signal.h
bsd-user/sparc/target_syscall.h
bsd-user/x86_64/target_signal.h
crypto/desrfb.c
hw/audio/intel-hda-defs.h
hw/core/uboot_image.h
hw/sh4/sh7750_regnames.c
hw/sh4/sh7750_regs.h
include/hw/cris/etraxfs_dma.h
linux-user/alpha/termbits.h
linux-user/arm/nwfpe/fpopcode.h
linux-user/arm/nwfpe/fpsr.h
linux-user/arm/syscall_nr.h
linux-user/arm/target_signal.h
linux-user/cris/target_signal.h
linux-user/i386/target_signal.h
linux-user/linux_loop.h
linux-user/m68k/target_signal.h
linux-user/microblaze/target_signal.h
linux-user/mips64/target_signal.h
linux-user/mips/target_signal.h
linux-user/mips/target_syscall.h
linux-user/mips/termbits.h
linux-user/ppc/target_signal.h
linux-user/sh4/target_signal.h
linux-user/sh4/termbits.h
linux-user/sparc64/target_syscall.h
linux-user/sparc/target_signal.h
linux-user/x86_64/target_signal.h
linux-user/x86_64/termbits.h
pc-bios/optionrom/optionrom.h
slirp/mbuf.h
slirp/misc.h
slirp/sbuf.h
slirp/tcp.h
slirp/tcp_timer.h
slirp/tcp_var.h
target/i386/svm.h
target/sparc/asi.h
target/xtensa/core-dc232b/xtensa-modules.inc.c
target/xtensa/core-dc233c/xtensa-modules.inc.c
target/xtensa/core-de212/core-isa.h
target/xtensa/core-de212/xtensa-modules.inc.c
target/xtensa/core-fsf/xtensa-modules.inc.c
target/xtensa/core-sample_controller/core-isa.h
target/xtensa/core-sample_controller/xtensa-modules.inc.c
target/xtensa/core-test_kc705_be/core-isa.h
target/xtensa/core-test_kc705_be/xtensa-modules.inc.c
tests/tcg/cris/check_abs.c
tests/tcg/cris/check_addc.c
tests/tcg/cris/check_addcm.c
tests/tcg/cris/check_addoq.c
tests/tcg/cris/check_bound.c
tests/tcg/cris/check_ftag.c
tests/tcg/cris/check_int64.c
tests/tcg/cris/check_lz.c
tests/tcg/cris/check_openpf5.c
tests/tcg/cris/check_sigalrm.c
tests/tcg/cris/crisutils.h
tests/tcg/cris/sys.c
tests/tcg/i386/test-i386-ssse3.c
ui/vgafont.h
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20181213223737.11793-3-pbonzini@redhat.com>
Reviewed-by: Aleksandar Markovic <amarkovic@wavecomp.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Acked-by: Richard Henderson <richard.henderson@linaro.org>
Acked-by: Eric Blake <eblake@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Stefan Markovic <smarkovic@wavecomp.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Most list head structs need not be given a name. In most cases the
name is given just in case one is going to use QTAILQ_LAST, QTAILQ_PREV
or reverse iteration, but this does not apply to lists of other kinds,
and even for QTAILQ in practice this is only rarely needed. In addition,
we will soon reimplement those macros completely so that they do not
need a name for the head struct. So clean up everything, not giving a
name except in the rare case where it is necessary.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Instead of verbose arrays with 4 lines for each entry, make each
entry take only one line. This makes long arrays that couldn't
fit in the screen become short and readable.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20190107193020.21744-4-ehabkost@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
stringify() is useful when we need to use macros in compat_props
(like when we set virtio-baloon-pci.class=PCI_CLASS_MEMORY_RAM at
pc_i440fx_1_0_machine_options()), but it is pointless when we are
already providing a number literal.
Replace stringify() with string literals when appropriate.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20190107193020.21744-3-ehabkost@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
The macro is only used in one place, where the purpose of the
value is obvious. Eliminate the macro so we don't need to rely
on stringify().
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20190107193020.21744-2-ehabkost@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Depending on the interrupt mode of the machine, enable or disable the
XIVE MMIOs.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The 'dual' sPAPR IRQ backend supports both interrupt mode, XIVE
exploitation mode and the legacy compatibility mode (XICS). both modes
are not supported at the same time.
The machine starts with the legacy mode and a new interrupt mode can
then be negotiated by the CAS process. In this case, the new mode is
activated after a reset to take into account the required changes in
the machine. These impact the device tree layout, the interrupt
presenter object and the exposed MMIO regions in the case of XIVE.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The qemu_irq array is now allocated at the machine level using a sPAPR
IRQ set_irq handler depending on the chosen interrupt mode. The use of
this handler is slightly inefficient today but it will become necessary
when the 'dual' interrupt mode is introduced.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Future changes of the ICSState object will remove the qemu_irq array
from under the interrupt controller model. Prepare ground for the PSI
interrupt sources and introduce a new one directly under the PSI
device model.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The error value can be squashed by the section handling radix migration.
Simply bail out if an error occurs when the RTC offset is imported.
This fixes the Coverity issue CID 1398591.
Fixes: d39c90f5f3 ("spapr: Fix migration of Radix guests")
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Now that the 'intc' pointer is only used by the XICS interrupt mode,
let's make things clear and use a XICS type and name.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
which will be used by the machine only when the XIVE interrupt mode is
in use.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Today, the interrupt presenter is linked to a CPU using the
cpu_intc_create() method of the sPAPR IRQ backend. The resulting
object is assigned to the PowerPCCPU 'intc' pointer whatever the
interrupt mode, XICS or XIVE.
To support the 'dual' interrupt mode, we will need to distinguish
between the two presenter objects and for that, we plan to introduce a
second interrupt presenter object pointer under the PowerPCCPU. The
modifications below move the assignment of the presenter object under
the cpu_intc_create() method to prepare ground for the future changes.
Both sPAPR and PowerNV machines are impacted.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The qirq routines of the XiveSource and the sPAPRXive model are only
used under the sPAPR IRQ backend. Simplify the overall call stack and
gather all the code under spapr_qirq_xive(). It will ease future
changes.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
PHB hotplug will bring more users for it. Let's define it along with
the PHB defines from which it is derived for simplicity.
While here fix a misleading comment about manual placement, which was
abandoned with 30b3bc5aa9.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This function is only used when creating the default PHB. Let's rename
it and move it to the core machine code for clarity.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Debug logs were left enabled in ppc4xx_devs.c whereas in other files
these are normally not enabled. Disable it here as well.
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
SLOF receives a device tree and updates it with various properties
before switching to the guest kernel and QEMU is not aware of any changes
made by SLOF. Since there is no real RTAS (QEMU implements it), it makes
sense to pass the SLOF final device tree to QEMU to let it implement
RTAS related tasks better, such as PCI host bus adapter hotplug.
Specifially, now QEMU can find out the actual XICS phandle (for PHB
hotplug) and the RTAS linux,rtas-entry/base properties (for firmware
assisted NMI - FWNMI).
This stores the initial DT blob in the sPAPR machine and replaces it
in the KVMPPC_H_UPDATE_DT (new private hypercall) handler.
This adds an @update_dt_enabled machine property to allow backward
migration.
SLOF already has a hypercall since
https://github.com/aik/SLOF/commit/e6fc84652c9c0073f9183
This makes use of the new fdt_check_full() helper. In order to allow
the configure script to pick the correct DTC version, this adjusts
the DTC presense test.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
H_HOME_NODE_ASSOCIATIVITY H-Call returns the associativity domain
designation associated with the identifier input parameter
This fixes a crash when we try to hotplug a CPU in memory-less and
CPU-less numa node. In this case, the kernel tries to online the
node, but without the information provided by this h-call, the node id,
it cannot and the CPU is started while the node is not onlined.
It also removes the warning message from the kernel:
VPHN is not supported. Disabling polling..
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* Support u-boot 'noload' images for Arm (as used by NetBSD/evbarm GENERIC kernel)
* hw/misc/tz-mpc: Fix value of BLK_MAX register
* target/arm: Emit barriers for A32/T32 load-acquire/store-release insns
* nRF51 SoC: add timer, GPIO, RNG peripherals
* hw/arm/allwinner-a10: Add the 'A' SRAM and the SRAM controller
* cpus.c: Fix race condition in cpu_stop_current()
* hw/arm: versal: Plug memory leaks
* Allow M profile boards to run even if -kernel not specified
* gdbstub: Add multiprocess extension support for use when the
board has multiple CPUs of different types (like the Xilinx Zynq boards)
* target/arm: Don't decode S bit in SVE brk[ab] merging insns
* target/arm: Convert ARM_TBFLAG_* to FIELDs
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABCAAGBQJcM36AAAoJEDwlJe0UNgzepuMP/A6umcXRrO+vOZgkW+cvJ8cD
JkDdb8H/u3S6zqNokABI3Ya/areX1P30sRV7e7mC5IsknVNZe0MqQX6TW5477HMP
Oz/m1AbyByWMLVILFiWfte5dtRRLfs3axzrmhu6HwJXe0NIUiYQofoJzCZEDMxDn
71cehgeNkUGA36HViPyqzHZYADFkCX3Tfmh1FEh2jD7taK9GNsff8p6cHTb05W7d
wWk68PS8VKTb5VrYH6SyiAHW8gBVrrUkYlkPKHzemK5fwlgDOSfxVLthf8mo08SH
QxEXI430tagdmrGNO/nKOTA2NQwMzvCk/OLf0Qwg9I9F9pYtiOJ7nXXbtqDC8eKy
DdHsL57W0F7sFkoVt+YNHSeylyLRluDh+D+Q7OHnlvwsEYmecqsWkW/A2CYC0uWs
8ajxPBNpGG1lIvo63YK5/4kOy0DE/6ISljYOSlYYg3iXeAZPkQZMTlUxoYmJQ+Zr
h1tLg1N9SuyQK5g5Uuluw2GwgzIv/Bt1LFo7pnvsA2X6PKiv6nno40T8q0Lw6ah4
lmAUWx0OUilTrvQwterHlr6hfWu2RLiRoxCg06a3C93YlRjsR3vZOBeQ5ByaE+ho
5ItKn58EerO+UaweVoc6MDhJFPC8b16Eee281BCec8Ks4GR1tIcpP/0z2lUwhBu6
hoPmkoPtFtu1dKBgF8Ma
=x1jv
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20190107' into staging
target-arm queue:
* Support u-boot 'noload' images for Arm (as used by NetBSD/evbarm GENERIC kernel)
* hw/misc/tz-mpc: Fix value of BLK_MAX register
* target/arm: Emit barriers for A32/T32 load-acquire/store-release insns
* nRF51 SoC: add timer, GPIO, RNG peripherals
* hw/arm/allwinner-a10: Add the 'A' SRAM and the SRAM controller
* cpus.c: Fix race condition in cpu_stop_current()
* hw/arm: versal: Plug memory leaks
* Allow M profile boards to run even if -kernel not specified
* gdbstub: Add multiprocess extension support for use when the
board has multiple CPUs of different types (like the Xilinx Zynq boards)
* target/arm: Don't decode S bit in SVE brk[ab] merging insns
* target/arm: Convert ARM_TBFLAG_* to FIELDs
# gpg: Signature made Mon 07 Jan 2019 16:29:52 GMT
# gpg: using RSA key 3C2525ED14360CDE
# gpg: Good signature from "Peter Maydell <peter.maydell@linaro.org>"
# gpg: aka "Peter Maydell <pmaydell@gmail.com>"
# gpg: aka "Peter Maydell <pmaydell@chiark.greenend.org.uk>"
# Primary key fingerprint: E1A5 C593 CD41 9DE2 8E83 15CF 3C25 25ED 1436 0CDE
* remotes/pmaydell/tags/pull-target-arm-20190107: (37 commits)
Support u-boot noload images for arm as used by, NetBSD/evbarm GENERIC kernel.
hw/misc/tz-mpc: Fix value of BLK_MAX register
target/arm: Emit barriers for A32/T32 load-acquire/store-release insns
arm: Add Clock peripheral stub to NRF51 SOC
tests/microbit-test: Add Tests for nRF51 Timer
arm: Instantiate NRF51 Timers
hw/timer/nrf51_timer: Add nRF51 Timer peripheral
tests/microbit-test: Add Tests for nRF51 GPIO
arm: Instantiate NRF51 general purpose I/O
hw/gpio/nrf51_gpio: Add nRF51 GPIO peripheral
arm: Instantiate NRF51 random number generator
hw/misc/nrf51_rng: Add NRF51 random number generator peripheral
arm: Add header to host common definition for nRF51 SOC peripherals
qtest: Add set_irq_in command to set IRQ/GPIO level
hw/arm/allwinner-a10: Add the 'A' SRAM and the SRAM controller
cpus.c: Fix race condition in cpu_stop_current()
MAINTAINERS: Add ARM-related files for hw/[misc|input|timer]/
hw/arm: versal: Plug memory leaks
Revert "armv7m: Guard against no -kernel argument"
arm/xlnx-zynqmp: put APUs and RPUs in separate CPU clusters
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
noload kernels are loaded with the u-boot image header and as a result
the header size needs adding to the entry point. Fake up a hdr so the
kernel image is loaded at the right address and the entry point is
adjusted appropriately.
The default location for the uboot file is 32MiB above bottom of DRAM.
This matches the recommendation in Documentation/arm/Booting.
Clarify the load_uimage API to state the passing of a load address when an
image doesn't specify one, or when loading a ramdisk is expected.
Adjust callers of load_uimage, etc.
Signed-off-by: Nick Hudson <skrll@netbsd.org>
Message-id: 11488a08-1fe0-a278-2210-deb64731107f@gmx.co.uk
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Use static arrays instead. I decided to rename the conflicting
pc_compat_2_1() function with pc_compat_2_1_fn().
Suggested-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Acked-by: Eduardo Habkost <ehabkost@redhat.com>
Use static arrays instead. I decided to rename the conflicting
pc_compat_2_2() function with pc_compat_2_2_fn().
Suggested-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Acked-by: Eduardo Habkost <ehabkost@redhat.com>
Use static arrays instead. I decided to rename the conflicting
pc_compat_2_3() function with pc_compat_2_3_fn().
Suggested-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Acked-by: Eduardo Habkost <ehabkost@redhat.com>
Similarly to accel properties, move compat properties out of globals
registration, and apply the machine compat properties during
device_post_init().
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Acked-by: Eduardo Habkost <ehabkost@redhat.com>
This pull request supersedes the one from 2018-12-13.
This is a revised first ppc pull request for qemu-4.0. Highlights
are:
* Most of the code for the POWER9 "XIVE" interrupt controller
(not complete yet, but we're getting there)
* A number of g_new vs. g_malloc cleanups
* Some IRQ wiring cleanups
* A fix for how we advertise NUMA nodes to the guest for pseries
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEdfRlhq5hpmzETofcbDjKyiDZs5IFAlwce1QACgkQbDjKyiDZ
s5JqlhAAhES3+UoNHa/eTf/o2OOgIgZ3A2FVP1kGQbb3Q415hxx1blpYkBDk6sUg
POEgwbj8QMvJ8npOICb2NnHNsAhKRfgGxx/lqVgxLPTDdwGBq7Jr1lfhyX4D99WD
C2oLtJWvQrA7yIsDzurMjJpFvw8SYSogppom4lqE5667pm6U0j7JggFJkwIo+VAj
jzl706vvB6/EL3PHZ8eCzsxT2oRpxxMStE3lJ1JPpKc60mFb5gkXMk3hura1L8Ez
t4NEN9I4ePivXlh6YYDp7Pv5l9JSzKV7Uu8xrYeMdz33e0jUWyoERrdhM51mI4s1
WGoQm6eL6p0jngAUPYtAdIGC6ZGaCMT5rkoDZ4K+us94kVvdqzWjQyRNp84GpQq0
Z/sxJaTSK2DZMnQL3LE19upk7XkB5uBgnjs5T5FcFia7bIDG3p8MY4VwIM4dRum9
WuirEUJRKg28eTTnuK9NQX2+MEnrRWc/FSNaBLjxrijD4C4jHogXTpssNprYnkV7
HgkQ2MaidcnNLftfOUeBr0aTx+rGqtUB56Xas1UK+WqykKVfRdZ6hnbRg30FJvQ/
X4SIc/QZcLcA78C/SvCuXa1uqWqlrZMhN2e5r+eiEXaFYyriWgMYe9w+mXKbrTsb
ZPkbax6xz1esqOZ15ytCTneQONyhMXy5iFDfb0khO4DO0uXq4dA=
=IN1s
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-4.0-20181221' into staging
ppc patch queue 2018-12-21
This pull request supersedes the one from 2018-12-13.
This is a revised first ppc pull request for qemu-4.0. Highlights
are:
* Most of the code for the POWER9 "XIVE" interrupt controller
(not complete yet, but we're getting there)
* A number of g_new vs. g_malloc cleanups
* Some IRQ wiring cleanups
* A fix for how we advertise NUMA nodes to the guest for pseries
# gpg: Signature made Fri 21 Dec 2018 05:34:12 GMT
# gpg: using RSA key 6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>"
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392
* remotes/dgibson/tags/ppc-for-4.0-20181221: (40 commits)
MAINTAINERS: PPC: add a XIVE section
spapr: change default CPU type to POWER9
spapr: introduce an 'ic-mode' machine option
spapr: add an extra OV5 field to the sPAPR IRQ backend
spapr: add a 'reset' method to the sPAPR IRQ backend
spapr: extend the sPAPR IRQ backend for XICS migration
spapr: allocate the interrupt thread context under the CPU core
spapr: add device tree support for the XIVE exploitation mode
spapr: add hcalls support for the XIVE exploitation interrupt mode
spapr: introduce a new machine IRQ backend for XIVE
spapr-iommu: Always advertise the maximum possible DMA window size
spapr/xive: use the VCPU id as a NVT identifier
spapr/xive: introduce a XIVE interrupt controller
ppc/xive: notify the CPU when the interrupt priority is more privileged
ppc/xive: introduce a simplified XIVE presenter
ppc/xive: introduce the XIVE interrupt thread context
ppc/xive: add support for the END Event State Buffers
Changes requirement for "vsubsbs" instruction
spapr: export and rename the xics_max_server_number() routine
spapr: introduce a spapr_irq_init() routine
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
This option is used to select the interrupt controller mode (XICS or
XIVE) with which the machine will operate. XICS being the default
mode for now.
When running a machine with the XIVE interrupt mode backend, the guest
OS is required to have support for the XIVE exploitation mode. In the
case of legacy OS, the mode selected by CAS should be XICS and the OS
should fail to boot. However, QEMU could possibly detect it, terminate
the boot process and reset to stop in the SLOF firmware. This is not
yet handled.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The interrupt modes supported by the hypervisor are advertised to the
guest with new bits definitions of the option vector 5 of property
"ibm,arch-vec-5-platform-support. The byte 23 bits 0-1 of the OV5 are
defined as follow :
0b00 PAPR 2.7 and earlier (Legacy systems)
0b01 XIVE Exploitation mode only
0b10 Either available
If the client/guest selects the XIVE interrupt mode, it informs the
hypervisor by returning the value 0b01 in byte 23 bits 0-1. A 0b00
value indicates the use of the XICS interrupt mode (Legacy systems).
The sPAPR IRQ backend is extended with these definitions and the
values are directly used to populate the "ibm,arch-vec-5-platform-support"
property. The interrupt mode is advertised under TCG and under KVM.
Although a KVM XIVE device is not yet available, the machine can still
operate with kernel_irqchip=off. However, we apply a restriction on
the CPU which is required to be a POWER9 when a XIVE interrupt
controller is in use.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
For the time being, the XIVE reset handler updates the OS CAM line of
the vCPU as it is done under a real hypervisor when a vCPU is
scheduled to run on a HW thread. This will let the XIVE presenter
engine find a match among the NVTs dispatched on the HW threads.
This handler will become even more useful when we introduce the
machine supporting both interrupt modes, XIVE and XICS. In this
machine, the interrupt mode is chosen by the CAS negotiation process
and activated after a reset.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
[dwg: Fix style nits]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Introduce a new sPAPR IRQ handler to handle resend after migration
when the machine is using a KVM XICS interrupt controller model.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Each interrupt mode has its own specific interrupt presenter object,
that we store under the CPU object, one for XICS and one for XIVE.
Extend the sPAPR IRQ backend with a new handler to support them both.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The XIVE interface for the guest is described in the device tree under
the "interrupt-controller" node. A couple of new properties are
specific to XIVE :
- "reg"
contains the base address and size of the thread interrupt
managnement areas (TIMA), for the User level and for the Guest OS
level. Only the Guest OS level is taken into account today.
- "ibm,xive-eq-sizes"
the size of the event queues. One cell per size supported, contains
log2 of size, in ascending order.
- "ibm,xive-lisn-ranges"
the IRQ interrupt number ranges assigned to the guest for the IPIs.
and also under the root node :
- "ibm,plat-res-int-priorities"
contains a list of priorities that the hypervisor has reserved for
its own use. OPAL uses the priority 7 queue to automatically
escalate interrupts for all other queues (DD2.X POWER9). So only
priorities [0..6] are allowed for the guest.
Extend the sPAPR IRQ backend with a new handler to populate the DT
with the appropriate "interrupt-controller" node.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
[dwg: Fix style nits]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The different XIVE virtualization structures (sources and event queues)
are configured with a set of Hypervisor calls :
- H_INT_GET_SOURCE_INFO
used to obtain the address of the MMIO page of the Event State
Buffer (ESB) entry associated with the source.
- H_INT_SET_SOURCE_CONFIG
assigns a source to a "target".
- H_INT_GET_SOURCE_CONFIG
determines which "target" and "priority" is assigned to a source
- H_INT_GET_QUEUE_INFO
returns the address of the notification management page associated
with the specified "target" and "priority".
- H_INT_SET_QUEUE_CONFIG
sets or resets the event queue for a given "target" and "priority".
It is also used to set the notification configuration associated
with the queue, only unconditional notification is supported for
the moment. Reset is performed with a queue size of 0 and queueing
is disabled in that case.
- H_INT_GET_QUEUE_CONFIG
returns the queue settings for a given "target" and "priority".
- H_INT_RESET
resets all of the guest's internal interrupt structures to their
initial state, losing all configuration set via the hcalls
H_INT_SET_SOURCE_CONFIG and H_INT_SET_QUEUE_CONFIG.
- H_INT_SYNC
issue a synchronisation on a source to make sure all notifications
have reached their queue.
Calls that still need to be addressed :
H_INT_SET_OS_REPORTING_LINE
H_INT_GET_OS_REPORTING_LINE
See the code for more documentation on each hcall.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
[dwg: Folded in fix for field accessors]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The XIVE IRQ backend uses the same layout as the new XICS backend but
covers the full range of the IRQ number space. The IRQ numbers for the
CPU IPIs are allocated at the bottom of this space, below 4K, to
preserve compatibility with XICS which does not use that range.
This should be enough given that the maximum number of CPUs is 1024
for the sPAPR machine under QEMU. For the record, the biggest POWER8
or POWER9 system has a maximum of 1536 HW threads (16 sockets, 192
cores, SMT8).
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
When deciding about the huge DMA window, the typical Linux pseries guest
uses the maximum allowed RAM size as the upper limit. We did the same
on QEMU side to match that logic. Now we are going to support a GPU RAM
pass through which is not available at the guest boot time as it requires
the guest driver interaction. As the result, the guest requests a smaller
window than it should. Therefore the guest needs to be patched to
understand this new memory and so does QEMU.
Instead of reimplementing here whatever solution we choose for the guest,
this advertises the biggest possible window size limited by 32 bit
(as defined by LoPAPR). Since the window size has to be power-of-two
(the create rtas call receives a window shift, not a size),
this uses 0x8000.0000 as the maximum number of TCEs possible (rather than
32bit maximum of 0xffff.ffff).
This is safe as:
1. The guest visible emulated table is allocated in KVM (actual pages
are allocated in page fault handler) and QEMU (actual pages are allocated
when updated);
2. The hardware table (and corresponding userspace address table)
supports sparse allocation and also checks for locked_vm limit so
it is unable to cause the host any damage.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The XIVE sPAPR IRQ backend will use it to define the number of ENDs of
the IC controller.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Initialize the MSI bitmap from it as this will be necessary for the
sPAPR IRQ backend for XIVE.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
We will need to use xics_max_server_number() to create the sPAPRXive
object modeling the interrupt controller of the machine which is
created before the CPUs.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
[dwg: Fix style nit]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The OpenPIC have 5 outputs per connected CPU. The machine init code hence
needs a bi-dimensional array (smp_cpu lines, 5 columns) to wire up the irqs
between the PIC and the CPUs.
The current code first allocates an array of smp_cpus pointers to qemu_irq
type, then it allocates another array of smp_cpus * 5 qemu_irq and fills the
first array with pointers to each line of the second array. This is rather
convoluted.
Simplify the logic by introducing a structured type that describes all the
OpenPIC outputs for a single CPU, ie, fixed size of 5 qemu_irq, and only
allocate a smp_cpu sized array of those.
This also allows to use g_new(T, n) instead of g_malloc(sizeof(T) * n)
as recommended in HACKING.
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The OpenPIC have 5 outputs per connected CPU. The machine init code hence
needs a bi-dimensional array (smp_cpu lines, 5 columns) to wire up the irqs
between the PIC and the CPUs.
The current code first allocates an array of smp_cpus pointers to qemu_irq
type, then it allocates another array of smp_cpus * 5 qemu_irq and fills the
first array with pointers to each line of the second array. This is rather
convoluted.
Simplify the logic by introducing a structured type that describes all the
OpenPIC outputs for a single CPU, ie, fixed size of 5 qemu_irq, and only
allocate a smp_cpu sized array of those.
This also allows to use g_new(T, n) instead of g_malloc(sizeof(T) * n)
as recommended in HACKING.
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Because it is a recommended coding practice (see HACKING).
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Because it is a recommended coding practice (see HACKING).
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Because it is a recommended coding practice (see HACKING).
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Because it is a recommended coding practice (see HACKING).
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Because it is a recommended coding practice (see HACKING).
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Because it is a recommended coding practice (see HACKING).
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Laurent Vivier reported off by one with maximum number of NUMA nodes
provided by qemu-kvm being less by one than required according to
description of "ibm,max-associativity-domains" property in LoPAPR.
It appears that I incorrectly treated LoPAPR description of this
property assuming it provides last valid domain (NUMA node here)
instead of maximum number of domains.
### Before hot-add
(qemu) info numa
3 nodes
node 0 cpus: 0
node 0 size: 0 MB
node 0 plugged: 0 MB
node 1 cpus:
node 1 size: 1024 MB
node 1 plugged: 0 MB
node 2 cpus:
node 2 size: 0 MB
node 2 plugged: 0 MB
$ numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0
node 0 size: 0 MB
node 0 free: 0 MB
node 1 cpus:
node 1 size: 999 MB
node 1 free: 658 MB
node distances:
node 0 1
0: 10 40
1: 40 10
### Hot-add
(qemu) object_add memory-backend-ram,id=mem0,size=1G
(qemu) device_add pc-dimm,id=dimm1,memdev=mem0,node=2
(qemu) [ 87.704898] pseries-hotplug-mem: Attempting to hot-add 4 ...
<there is no "Initmem setup node 2 [mem 0xHEX-0xHEX]">
[ 87.705128] lpar: Attempting to resize HPT to shift 21
... <HPT resize messages>
### After hot-add
(qemu) info numa
3 nodes
node 0 cpus: 0
node 0 size: 0 MB
node 0 plugged: 0 MB
node 1 cpus:
node 1 size: 1024 MB
node 1 plugged: 0 MB
node 2 cpus:
node 2 size: 1024 MB
node 2 plugged: 1024 MB
$ numactl -H
available: 2 nodes (0-1)
^^^^^^^^^^^^^^^^^^^^^^^^
Still only two nodes (and memory hot-added to node 0 below)
node 0 cpus: 0
node 0 size: 1024 MB
node 0 free: 1021 MB
node 1 cpus:
node 1 size: 999 MB
node 1 free: 658 MB
node distances:
node 0 1
0: 10 40
1: 40 10
After fix applied numactl(8) reports 3 nodes available and memory
plugged into node 2 as expected.
From David Gibson:
------------------
Qemu makes a distinction between "non NUMA" (nb_numa_nodes == 0) and
"NUMA with one node" (nb_numa_nodes == 1). But from a PAPR guests's
point of view these are equivalent. I don't want to present two
different cases to the guest when we don't need to, so even though the
guest can handle it, I'd prefer we put a '1' here for both the
nb_numa_nodes == 0 and nb_numa_nodes == 1 case.
This consolidates everything discussed previously on mailing list.
Fixes: da9f80fbad ("spapr: Add ibm,max-associativity-domains property")
Reported-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Serhii Popovych <spopovyc@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Introduce and use the "unplug" callback.
This is a preparation for multi-stage hotplug handlers, whereby the bus
hotplug handler is overwritten by the machine hotplug handler. This handler
will then pass control to the bus hotplug handler. So to get this running
cleanly, we also have to make sure to go via the hotplug handler chain when
actually unplugging a device after an unplug request. Lookup the hotplug
handler and call "unplug".
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The load_image() function is deprecated, as it does not let the
caller specify how large the buffer to read the file into is.
Instead use load_image_size().
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Message-id: 20181130151712.2312-3-peter.maydell@linaro.org
The load_image() function is deprecated, as it does not let the
caller specify how large the buffer to read the file into is.
Use the glib g_file_get_contents() function instead, which does
the whole "allocate memory for the file and read it in" operation.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Message-id: 20181130151712.2312-2-peter.maydell@linaro.org
Now that all instance_options functions for spapr are empty,
delete them.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20181205205827.19387-5-ehabkost@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Instead of setting suppress_vmdesc at instance_init time, set
default_machine_opts on spapr_machine_2_2_class_options() to
implement equivalent behavior.
This will let us eliminate the need for separate instance_init
functions for each spapr machine-type.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20181205205827.19387-4-ehabkost@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Instead of setting use_hotplug_event_source at instance_init
time, set default_machine_opts on spapr_machine_2_7_class_options()
to implement equivalent behavior.
This will let us eliminate the need for separate instance_init
functions for each spapr machine-type.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20181205205827.19387-3-ehabkost@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Including all machine types that might have a pcie-root-port.
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Message-Id: <154394083644.28192.8501647946108201466.stgit@gimli.home>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
[ehabkost: fixed accidental recursion at spapr_machine_3_1_class_options()]
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Add the spapr cap SPAPR_CAP_NESTED_KVM_HV to be used to control the
availability of nested kvm-hv to the level 1 (L1) guest.
Assuming a hypervisor with support enabled an L1 guest can be allowed to
use the kvm-hv module (and thus run it's own kvm-hv guests) by setting:
-machine pseries,cap-nested-hv=true
or disabled with:
-machine pseries,cap-nested-hv=false
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Coverity points out in CID 1390588 that the test for sh == 0
in sdram_size() can never fire, because we calculate sh with
sh = 1024 - ((bcr >> 6) & 0x3ff);
which must result in a value between 1 and 1024 inclusive.
Without the relevant manual for the SoC, we're not completely
sure of the correct behaviour here, but we can remove the
dead code without changing how QEMU currently behaves.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
In ppc_core99_init(), we allocate an openpic_irqs array, which
we then use to collect up the various qemu_irqs which we're
going to connect to the interrupt controller. Once we've
called sysbus_connect_irq() to connect them all up, the
array is no longer required, but we forgot to free it.
Since board init is only run once at startup, the memory
leak is not a significant one.
Spotted by Coverity: CID 1192916.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
When allocating an array, it is a recommended coding practice to call
g_new(FooType, n) instead of g_malloc(n * sizeof(FooType)) because
it takes care to avoid overflow when calculating the size of the
allocated block and it returns FooType *, which allows the compiler
to perform type checking.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The spapr-rng device is suboptimal when compared to virtio-rng, so
users might want to disable it in their builds. Thus let's introduce
a proper CONFIG switch to allow us to compile QEMU without this device.
The function spapr_rng_populate_dt is required for linking, so move it
to a different location.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
We will factor out get_memory_region() from pc-dimm to memory device code
soon. Once that is done, get_region_size() can be implemented
generically and essentially be replaced by
memory_device_get_region_size (and work only on get_memory_region()).
We have some users of get_memory_region() (spapr and pc-dimm code) that are
only interested in the size. So let's rework them to use
memory_device_get_region_size() first, then we can factor out
get_memory_region() and eventually remove get_region_size() without
touching the same code multiple times.
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20181005092024.14344-10-david@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
We're plugging/unplugging a PCDIMMDevice, so directly pass this type
instead of a more generic DeviceState.
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20181005092024.14344-5-david@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
From include/qapi/error.h:
* Pass an existing error to the caller with the message modified:
* error_propagate(errp, err);
* error_prepend(errp, "Could not frobnicate '%s': ", name);
Fei Li pointed out that doing error_propagate() first doesn't work
well when @errp is &error_fatal or &error_abort: the error_prepend()
is never reached.
Since I doubt fixing the documentation will stop people from getting
it wrong, introduce error_propagate_prepend(), in the hope that it
lures people away from using its constituents in the wrong order.
Update the instructions in error.h accordingly.
Convert existing error_prepend() next to error_propagate to
error_propagate_prepend(). If any of these get reached with
&error_fatal or &error_abort, the error messages improve. I didn't
check whether that's the case anywhere.
Cc: Fei Li <fli@suse.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20181017082702.5581-2-armbru@redhat.com>
Here are the accumulated ppc target patches for the last several
weeks. Highlights are:
* A number of 40p / PReP cleanups
* Preliminary irq rework on the pseries machine towards the new
XIVE interrupt controller
There are a few patches which make small changes to generic device and
arm code as prerequisites to the 40p interrupt routing cleanup. They
have acks from the relevant maintainers.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEdfRlhq5hpmzETofcbDjKyiDZs5IFAlup3PYACgkQbDjKyiDZ
s5IcYQ//fp79LhIXUKfJuGasVg1K8X795s3nD8vZ76z7FV2kNyHvOCcTsLn0Ccrp
WJLdXdZ0ErY87vJPfHckii9pXOX8J38nV5EFCElSLslx6gCndQZdQX2WY3luwIzq
afiKMERwTkCcqFXXPgweijhhuAU+roay8xdO/ZBO52ogzGaZalTFjG4l9a0DZMSm
ZceDrLrKw6GOaxntLptcn2+Ncuwpm0WSpLyL+bGNAzSAbqdn1dhHQ9UBrcSMteWj
df8J7CX63CFL2MwbQE3RyXeKaomdHabG+QgEVMlS4dpXVUx++ciMtrwZTX1mMDlI
DA9+5u6TcRMz34hN8lWk2O05scOVp8965BcfdeRBYAOTDS4ztiZJ9spKkIV0lHfe
rkgo7F1OsqoQhs9QrLYp0zZYn1OIhHWrbhk/DQptCJMRHk8mct4v2FcyGecU0e1Z
7SlJErxHXmar83PCCJXhtYHthDxN+dTHUW0bbrF4IjysfK+poX5hvvFEjyHGPIJL
duytwgEnnrBOFM7f7mdfH1LKeKzm1ji8nu7g2IsPAXC0xuFaq+d0fZWUWjymSPku
k5k5UUPs8KLtP9XY2qhO0vxBWl5d+CTam19FWVqHjRAp5WqjmoLxWnkofupcT0Yv
LcoHH2Ad9K8e0F4nA4UCYdJwfGH3qO+eBzmBR4+HZOuT1gVvRuw=
=A62f
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-3.1-20180925' into staging
ppc patch queue 2018-09-25
Here are the accumulated ppc target patches for the last several
weeks. Highlights are:
* A number of 40p / PReP cleanups
* Preliminary irq rework on the pseries machine towards the new
XIVE interrupt controller
There are a few patches which make small changes to generic device and
arm code as prerequisites to the 40p interrupt routing cleanup. They
have acks from the relevant maintainers.
# gpg: Signature made Tue 25 Sep 2018 08:00:06 BST
# gpg: using RSA key 6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>"
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392
* remotes/dgibson/tags/ppc-for-3.1-20180925:
40p: add fixed IRQ routing for LSI SCSI device
lsi53c895a: add optional external IRQ via qdev
scsi: remove unused lsi53c895a_create() and lsi53c810_create() functions
scsi: move lsi53c8xx_create() callers to lsi53c8xx_handle_legacy_cmdline()
scsi: add lsi53c8xx_handle_legacy_cmdline() function
sm501: Adjust endianness of pixel value in rectangle fill
spapr_pci: add an extra 'nr_msis' argument to spapr_populate_pci_dt
spapr: increase the size of the IRQ number space
spapr: introduce a spapr_irq class 'nr_msis' attribute
40p: use OR gate to wire up raven PCI interrupts
raven: some minor IRQ-related tidy-ups
hw/ppc: on 40p machine, change default firmware to OpenBIOS
target/ppc/cpu-models: Re-group the 970 CPUs together again
Record history of ppcemb target in common.json
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Whilst the PReP specification describes how all PCI IRQs are routed via IRQ
15 on the interrupt controller, the real 40p machine has a routing quirk in
that the LSI SCSI device is routed directly to IRQ 13.
Enable the external IRQ for the LSI SCSI device by wiring up the IRQ with
qdev to the relevant interrupt controller gpio.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Hervé Poussineau <hpoussin@reactos.org>
Tested-by: Hervé Poussineau <hpoussin@reactos.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
As part of commits a64aa5785d "hw: Deprecate -drive if=scsi with non-onboard
HBAs" and b891538e81 "hw/ppc/prep: Fix implicit creation of "-drive if=scsi"
devices" the lsi53c895a_create() and lsi53c810_create() functions were added
to wrap pci_create_simple() and scsi_bus_legacy_handle_cmdline().
Unfortunately this prevents us from changing qdev properties on the device
and/or changing the PCI configuration. By switching over to using the new
lsi53c8xx_handle_legacy_cmdline() function then the caller can now configure
and realize the LSI SCSI device exactly as required.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Acked-by: Peter Maydell <peter.maydell@linaro.org> [arm parts]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
So that we don't have to call qdev_get_machine() to get the machine
class and the sPAPRIrq backend holding the number of MSIs.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The new layout using static IRQ number does not leave much space to
the dynamic MSI range, only 0x100 IRQ numbers. Increase the total
number of IRQS for newer machines and introduce a legacy XICS backend
for pre-3.1 machines to maintain compatibility.
For the old backend, provide a 'nr_msis' value covering the full IRQ
number space as it does not use the bitmap allocator to allocate MSI
interrupt numbers.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The number of MSI interrupts a sPAPR machine can allocate is in direct
relation with the number of interrupts of the sPAPRIrq backend. Define
statically this value at the sPAPRIrq class level and use it for the
"ibm,pe-total-#msi" property of the sPAPR PHB.
According to the PAPR specs, "ibm,pe-total-#msi" defines the maximum
number of MSIs that are available to the PE. We choose to advertise
the maximum number of MSIs that are available to the machine for
simplicity of the model and to avoid segmenting the MSI interrupt pool
which can be easily shared. If the pool limit is reached, it can be
extended dynamically.
Finally, remove XICS_IRQS_SPAPR which is now unused.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
According to the PReP specification section 6.1.6 "System Interrupt
Assignments", all PCI interrupts are routed via IRQ 15.
Instead of mapping each PCI IRQ separately, we introduce an OR gate within the
raven PCI host bridge and then wire the single output of the OR gate to the
interrupt controller.
Note that whilst the (now deprecated) PReP machine still exists we still need
to preserve the old IRQ routing. This is done by adding a new "is-legacy-prep"
property to the raven PCI host bridge which is set to true for the PReP
machine.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Hervé Poussineau <hpoussin@reactos.org>
Tested-by: Hervé Poussineau <hpoussin@reactos.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
OpenBIOS gained 40p support in 5b20e4cace
Use it, instead of relying on an unmaintained and very limited firmware.
Signed-off-by: Hervé Poussineau <hpoussin@reactos.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Here's another pull request for qemu-3.1. No real theme here, just an
assortment of various fixes. Probably the most notable thing is the
removal of the ppcemb target which has been deprecated for some time
now.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEdfRlhq5hpmzETofcbDjKyiDZs5IFAluSKPoACgkQbDjKyiDZ
s5JSpRAAhWvxLM6OoTdhAaPKhlKrIzWexWNI8efJNWfXvHnbHBxs8tk+hnJOZVsU
m00hfFMKMA0/4JMURrbYsCiyaq+r+Ws8oEbLDVKQdng6LNeUrLq7uC0rv41bW3CC
1BTqTX16lvhPsg1Sz8mh6IGwCIgRiV8zgvQ4iCc3GCJidI2A+3uLvW5hAndvDdjb
3lq6drg23LXZ6z/ou7hPynKmV6tFTlxSnB957LCnPGFACZeJKbuoRHPP30IrWwY+
nOQ1GTvenouGvEKI5gsC13qFWYcoNPPfc7NZFtx1fvxiMpkOj7R5hg9oStT2Ya6u
MVRwcp/XA2MF+2NnJ8TZOkAV7+1JidhRirsKFjcn1JqftWSxJOKA0weWuNQgdQNY
lJzyZZejEJCHn0NgOq9ZRjOP4U6iIcSlTurfXoronhw1q7yEBkYkS+JpLToLLsid
9qwxlBAfUfQ8E1wR8RnM6ATygVp2Z2ToL+70Rc7xzq6/R8kYFSzuhyaI1GUUtPGW
ZPwp3GRYWJE/xOK3z1YAndrN8FlNxqz3Cov3vtH118aBatWAT+PRVlouOB1/aF3T
KfV8Kme5KQrMGuj/RDLGLOeQi0e8wqBtVIhsESpHdocC6uo28H5gNXxptyLJPA04
dJwWvaQf/J7eIuChhuFygiTzMnQyJA1f77jlExpKfxKKQwUpHf4=
=WnE4
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-3.1-20180907' into staging
ppc patch queue 2018-09-07
Here's another pull request for qemu-3.1. No real theme here, just an
assortment of various fixes. Probably the most notable thing is the
removal of the ppcemb target which has been deprecated for some time
now.
# gpg: Signature made Fri 07 Sep 2018 08:30:02 BST
# gpg: using RSA key 6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>"
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392
* remotes/dgibson/tags/ppc-for-3.1-20180907:
target-ppc: Extend HWCAP2 bits for ISA 3.0
target/ppc/kvm: set vcpu as online/offline
Fix a deadlock case in the CPU hotplug flow
spapr: Correct reference count on spapr-cpu-core
mac_newworld: implement custom FWPathProvider
uninorth: add ofw-addr property to allow correct fw path generation
mac_oldworld: implement custom FWPathProvider
grackle: set device fw_name and address for correct fw path generation
macio: add addr property to macio IDE object
macio: add macio bus to help with fw path generation
macio: move MACIOIDEState type declarations to macio.h
spapr_pci: fix potential NULL pointer dereference
spapr: fix leak of rev array
ppc: Remove deprecated ppcemb target
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
error_report and friends already add a "qemu-system-xxx" prefix
to the string, so a "qemu:" prefix is redundant in the string.
Just drop it.
Reported-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Mao Zhongyi <maozhongyi@cmss.chinamobile.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <1537495530-580-1-git-send-email-maozhongyi@cmss.chinamobile.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Set the newly added register(KVM_REG_PPC_ONLINE) to indicate if the vcpu is
online(1) or offline(0)
KVM will use this information to set the RWMR register, which controls the PURR
and SPURR accumulation.
CC: paulus@samba.org
Signed-off-by: Nikunj A Dadhania <nikunj@linux.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
We need to set cs->halted to 1 before calling ppc_set_compat. The reason
is that ppc_set_compat kicks up the new thread created to manage the
hotplugged KVM virtual CPU and the code drives directly to KVM_RUN
ioctl. When cs->halted is 1, the code:
int kvm_cpu_exec(CPUState *cpu)
...
if (kvm_arch_process_async_events(cpu)) {
atomic_set(&cpu->exit_request, 0);
return EXCP_HLT;
}
...
returns before it reaches KVM_RUN, giving time to the main thread to
finish its job. Otherwise we can fall in a deadlock because the KVM
thread will issue the KVM_RUN ioctl while the main thread is setting up
KVM registers. Depending on how these jobs are scheduled we'll end up
freezing QEMU.
The following output shows kvm_vcpu_ioctl sleeping because it cannot get
the mutex and never will.
PS: kvm_vcpu_ioctl was triggered kvm_set_one_reg - compat_pvr.
STATE: TASK_UNINTERRUPTIBLE|TASK_WAKEKILL
PID: 61564 TASK: c000003e981e0780 CPU: 48 COMMAND: "qemu-system-ppc"
#0 [c000003e982679a0] __schedule at c000000000b10a44
#1 [c000003e98267a60] schedule at c000000000b113a8
#2 [c000003e98267a90] schedule_preempt_disabled at c000000000b11910
#3 [c000003e98267ab0] __mutex_lock at c000000000b132ec
#4 [c000003e98267bc0] kvm_vcpu_ioctl at c00800000ea03140 [kvm]
#5 [c000003e98267d20] do_vfs_ioctl at c000000000407d30
#6 [c000003e98267dc0] ksys_ioctl at c000000000408674
#7 [c000003e98267e10] sys_ioctl at c0000000004086f8
#8 [c000003e98267e30] system_call at c00000000000b488
crash> struct -x kvm.vcpus 0xc000003da0000000
vcpus = {0xc000003db4880000, 0xc000003d52b80000, 0xc0000039e9c80000, 0xc000003d0e200000, 0xc000003d58280000, 0x0, 0x0, ...}
crash> struct -x kvm_vcpu.mutex.owner 0xc000003d58280000
mutex.owner = {
counter = 0xc000003a23a5c881 <- flag 1: waiters
},
crash> bt 0xc000003a23a5c880
PID: 61579 TASK: c000003a23a5c880 CPU: 9 COMMAND: "CPU 4/KVM"
(active)
crash> struct -x kvm_vcpu.mutex.wait_list 0xc000003d58280000
mutex.wait_list = {
next = 0xc000003e98267b10,
prev = 0xc000003e98267b10
},
crash> struct -x mutex_waiter.task 0xc000003e98267b10
task = 0xc000003e981e0780
The following command-line was used to reproduce the problem (note: gdb
and trace can change the results).
$ qemu-ppc/build/ppc64-softmmu/qemu-system-ppc64 -cpu host \
-enable-kvm -m 4096 \
-smp 4,maxcpus=8,sockets=1,cores=2,threads=4 \
-display none -nographic \
-drive file=disk1.qcow2,format=qcow2
...
(qemu) device_add host-spapr-cpu-core,core-id=4
[no interaction is possible after it, only SIGKILL to take the terminal
back]
Signed-off-by: Jose Ricardo Ziviani <joserz@linux.ibm.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
spapr_init_cpus() currently creates spapr-cpu-core objects via
object_new() and setting their realized property to true. This leaves
their reference count at two, because object_new() adds an initial
reference and the realization attaches them to a default parent object
which also increments the reference count.
This causes a problem if one of these cores is hot unplugged: no
delete event is generated for it because it's reference count doesn't
reach zero when it is detached from it's parent.
Correct this by adding a call to object_unref() in spapr_init_cpus().
Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This enables the correct generation of bootdevice fw paths for in-built IDE
and virtio-pci-blk devices suitable for OpenBIOS.
Note we also set the MachineClass ignore_boot_device_suffixes property to true
since an additional disk node should not be added except for virtio devices.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This enables the correct generation of bootdevice fw paths for in-built IDE
and virtio-pci-blk devices suitable for OpenBIOS.
Note we also set the MachineClass ignore_boot_device_suffixes property to true
since an additional disk node should not be added except for virtio devices.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The generated qapi_event_send_FOO() take an Error ** argument. They
can't actually fail, because all they do with the argument is passing it
to functions that can't fail: the QObject output visitor, and the
@qmp_emit callback, which is either monitor_qapi_event_queue() or
event_test_emit().
Drop the argument, and pass &error_abort to the QObject output visitor
and @qmp_emit instead.
Suggested-by: Eric Blake <eblake@redhat.com>
Suggested-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180815133747.25032-4-peterx@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
[Commit message rewritten, update to qapi-code-gen.txt corrected]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Commit 2c88b098e7 added a call to SPAPR_MACHINE_GET_CLASS(spapr) in
spapr_phb_realize() before we check spapr isn't NULL. This causes QEMU
to crash when starting a non-pseries machine with a sPAPR PHB.
This could be fixed by setting the smc variable after the null check,
but it seems more explicit to use a ternary operator to skip the call
to SPAPR_MACHINE_GET_CLASS() if spapr is NULL, since spapr_phb_realize()
will return immediately in this case.
This was reported by Coverity (CID 1395170 and 1395183).
Fixes: 2c88b098e7
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Introduced in 04d595b300 ("spapr: do not use CPU_FOREACH_REVERSE",
2018-08-23)
Fixes: CID1395181
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
There is no known available OS for ppc around anymore that uses page
sizes below 4k, so it does not make much sense that we keep wasting
our time on building and testing the ppcemb-softmmu target. It has
been deprecated since two releases, and nobody complained, so let's
remove this now.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* qumu-guest-agent freeze-hook tweak (Christian)
* pm_smbus improvements (Corey)
* Move validation to pre_plug for pc-dimm (David)
* Fix memory leaks (Eduardo, Marc-André)
* synchronization profiler (Emilio)
* Convert the CPU list to RCU (Emilio)
* LSI support for PPR Extended Message (George)
* vhost-scsi support for protection information (Greg)
* Mark mptsas as a storage device in the help (Guenter)
* checkpatch tweak cherry-picked from Linux (me)
* Typos, cleanups and dead-code removal (Julia, Marc-André)
* qemu-pr-helper support for old libmultipath (Murilo)
* Annotate fallthroughs (me)
* MemoryRegionOps cleanup (me, Peter)
* Make s390 qtests independent from libqos, which doesn't actually support it (me)
* Make cpu_get_ticks independent from BQL (me)
* Introspection fixes (Thomas)
* Support QEMU_MODULE_DIR environment variable (ryang)
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAlt+5OYUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroPtxwf8CQM/F+0L+EKeYfYcVgVZsDhhOkLj
Pm61q0bZsWKLby5jCqIDYw7Z/vodJnSS1DO0slIRoXxvQ9DwlkbBnBy/aG/E9U0q
WF1vbCezibDIt7sGcsu9F5zXU9eqe+E6dZfxFrv8FQSOFVxn34TfeJagWLCtzg0d
LnVTF/e4zJD8IQiM7w6lJQxua3fz13ssPEg2KnMkguDhACMwvZ/K/cA2AJkHRMhY
sroPMwLHlrF1NOoeCIrWxYUmSGCRCAy1DmiPGiiSs0yBq/dL0UkAa5Eu6HMQ7rgI
zUff3JDmzEjixUSIEbpVRN+yPCN0/ACSOpJUrKLDxXbc4nZ+PBQ04YpyPQ==
=UZiV
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging
* x86 TCG fixes for 64-bit call gates (Andrew)
* qumu-guest-agent freeze-hook tweak (Christian)
* pm_smbus improvements (Corey)
* Move validation to pre_plug for pc-dimm (David)
* Fix memory leaks (Eduardo, Marc-André)
* synchronization profiler (Emilio)
* Convert the CPU list to RCU (Emilio)
* LSI support for PPR Extended Message (George)
* vhost-scsi support for protection information (Greg)
* Mark mptsas as a storage device in the help (Guenter)
* checkpatch tweak cherry-picked from Linux (me)
* Typos, cleanups and dead-code removal (Julia, Marc-André)
* qemu-pr-helper support for old libmultipath (Murilo)
* Annotate fallthroughs (me)
* MemoryRegionOps cleanup (me, Peter)
* Make s390 qtests independent from libqos, which doesn't actually support it (me)
* Make cpu_get_ticks independent from BQL (me)
* Introspection fixes (Thomas)
* Support QEMU_MODULE_DIR environment variable (ryang)
# gpg: Signature made Thu 23 Aug 2018 17:46:30 BST
# gpg: using RSA key BFFBD25F78C7AE83
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>"
# gpg: aka "Paolo Bonzini <pbonzini@redhat.com>"
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1
# Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83
* remotes/bonzini/tags/for-upstream: (69 commits)
KVM: cleanup unnecessary #ifdef KVM_CAP_...
target/i386: update MPX flags when CPL changes
i2c: pm_smbus: Add the ability to force block transfer enable
i2c: pm_smbus: Don't delay host status register busy bit when interrupts are enabled
i2c: pm_smbus: Add interrupt handling
i2c: pm_smbus: Add block transfer capability
i2c: pm_smbus: Make the I2C block read command read-only
i2c: pm_smbus: Fix the semantics of block I2C transfers
i2c: pm_smbus: Clean up some style issues
pc-dimm: assign and verify the "addr" property during pre_plug
pc: drop memory region alignment check for 0
util/oslib-win32: indicate alignment for qemu_anon_ram_alloc()
pc-dimm: assign and verify the "slot" property during pre_plug
ipmi: Use proper struct reference for BT vmstate
vhost-scsi: expose 't10_pi' property for VIRTIO_SCSI_F_T10_PI
vhost-scsi: unify vhost-scsi get_features implementations
vhost-user-scsi: move host_features into VHostSCSICommon
cpus: allow cpu_get_ticks out of BQL
cpus: protect TimerState writes with a spinlock
seqlock: add QemuLockable support
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
We can assign and verify the address before realizing and trying to plug.
reading/writing the address property should never fail for DIMMs, so let's
reduce error handling a bit by using &error_abort. Getting access to the
memory region now might however fail. So forward errors from
get_memory_region() properly.
As all memory devices should use the alignment of the underlying memory
region for guest physical address asignment, do detection of the
alignment in pc_dimm_pre_plug(), but allow pc.c to overwrite the
alignment for compatibility handling.
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20180801133444.11269-5-david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
We can assign and verify the slot before realizing and trying to plug.
reading/writing the slot property should never fail, so let's reduce
error handling a bit by using &error_abort.
To do this during pre_plug, add and use (x86, ppc) pc_dimm_pre_plug().
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20180801133444.11269-2-david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This paves the way for implementing the CPU list with an RCU list,
which cannot be traversed in reverse order.
Note that this is the only caller of CPU_FOREACH_REVERSE.
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <20180819091335.22863-11-cota@braap.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
There is currently a funny problem with the "mc146818rtc" device:
1) Start QEMU like this:
qemu-system-ppc64 -M pseries -S
2) At the HMP monitor, enter "info qom-tree". Note that there is an
entry for "/rtc (spapr-rtc)".
3) Introspect the mc146818rtc device like this:
device_add mc146818rtc,help
4) Run "info qom-tree" again. The "/rtc" entry is gone now!
The rtc_finalize() function of the mc146818rtc device has two bugs: First,
it tries to remove a "rtc" property, while the rtc_realizefn() added a
"rtc-time" property instead. And second, it should have been done in an
unrealize function, not in a finalize function, to avoid that this causes
problems during introspection.
But since adding aliases to the global machine state should not be done
from a device's realize function anyway, let's rather fix this issue
by moving the creation of the alias to the code that creates the device
(and thus is run from the machine init functions instead), i.e. the
mc146818_rtc_init() function for most machines. The prep machines are
special, since the mc146818rtc device is created here in the realize
function of the i82378 device. Since we certainly don't want to add the
alias there, we add it to some code that is called from the ibm_40p_init()
machine init function instead.
Since the alias is now only created during the machine init, we can remove
the object_property_del() completely.
Fixes: 654a36d857
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <1534419358-10932-5-git-send-email-thuth@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
It should save us some CPU cycles as these routines perform a lot of
checks.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Instead initialise the device via qdev to allow us to set device properties
directly as required.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Instead initialise the device via qdev to allow us to set device properties
directly as required.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Instead initialise the device via qdev to allow us to set device properties
directly as required.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Acked-by: Hervé Poussineau <hpoussin@reactos.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
- prep machine is a fictional machine, so has no specifications. Which
devices can be changed/added/removed without impact? Are interrupts
correctly mapped?
- prep firmware (OHW) has support only for IDE drives (no SCSI).
Booting from IDE has been broken approximatively 3 years ago, and nobody complained.
- OHW is limited on IDE boot to a specific set of OS loaders.
These operating systems are of the 2004 time frame.
- OHW can use -kernel. Linux kernel freezes a long time after PS/2 mouse
detection, and then screen becomes garbage. This was already broken in
QEMU v2.7, 2 years ago, and nobody complained.
On the other side:
- 40p is a real machine, so emulation can be checked against
hardware specifications
- OpenBIOS has support for SCSI block devices, including 40p LSI adapter
- OpenBIOS can start mostly all Linux kernels (including recent ones)
and recent operating system (like NetBSD 7.1.2)
Signed-off-by: Hervé Poussineau <hpoussin@reactos.org>
[dwg: Drop prep from boot-serial test to avoid deprecation warnings]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This proposal moves all the related IRQ routines of the sPAPR machine
behind a sPAPR IRQ backend interface 'spapr_irq' to prepare for future
changes. First of which will be to increase the size of the IRQ number
space, then, will follow a new backend for the POWER9 XIVE IRQ controller.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Convert the devices in ppc405_uc away from using the old_mmio
MemoryRegion accessors:
* opba's 32-bit and 16-bit accessors were just calling the
8-bit accessors and assembling a big-endian order number,
which we can do by setting the .impl.max_access_size to 1
and the endianness to DEVICE_BIG_ENDIAN, and letting the
core memory code do the assembly
* ppc405_gpio's accessors were all just stubs
* ppc4xx_gpt's 8-bit and 16-bit accessors were treating the
access as invalid, which we can do by setting the
.valid.min_access_size and .valid.max_access_size fields
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Switch the ref405ep_fpga device away from using the old_mmio
MemoryRegion accessors.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The prep machine has some code which is stubs of accessors
for XCSR registers. This has been disabled via #if 0
since commit b6b8bd1819 in 2004, and doesn't have any
actual interesting content. It also uses the deprecated
old_mmio accessor functions. Remove it entirely.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Hervé Poussineau <hpoussin@reactos.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This proposal introduces a new IRQ number space layout using static
numbers for all devices, depending on a device index, and a bitmap
allocator for the MSI IRQ numbers which are negotiated by the guest at
runtime.
As the VIO device model does not have a device index but a "reg"
property, we introduce a formula to compute an IRQ number from a "reg"
value. It should minimize most of the collisions.
The previous layout is kept in pre-3.1 machines raising the
'legacy_irq_allocation' machine class flag.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
VMStateDescription vmstate_spapr_cpu_state was added by commit
b94020268e (spapr_cpu_core: migrate per-CPU data) to migrate per-CPU
data with the required vmstate registration and unregistration calls.
However the unregistration is being done only from vcpu creation error path
and not from CPU delete path.
This causes migration to fail with the following error if migration is
attempted after a CPU unplug like this:
Unknown savevm section or instance 'spapr_cpu' 16
Additionally this leaves the source VM unresponsive after migration failure.
Fix this by ensuring the vmstate_unregister happens during CPU removal.
Fixing this becomes easier when vmstate (un)registration calls are moved to
vcpu (un)realize functions which is what this patch does.
Fixes: https://bugs.launchpad.net/qemu/+bug/1785972
Reported-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
For the older machines (such as Mac and SPARC) the DT nodes representing
bootdevices for disk nodes are irregular for mainly historical reasons.
Since the majority of bootdevice nodes for these machines either do not have a
separate disk node or require different (custom) names then it is much easier
for processing to just disable all suffixes for a particular machine.
Introduce a new ignore_boot_device_suffixes MachineClass property to control
bootdevice suffix generation, defaulting to false in order to preserve
compatibility.
Suggested-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Message-Id: <20180810124027.10698-1-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
The four interrupts of the PCI bus are connected to the same UIC pin
on the real Sam460ex. Evidence for this can be found in the UBoot
source for the Sam460ex in the Sam460ex.c file where
PCI_INTERRUPT_LINE is written. Change the ppc440_pcix model to behave
more like this.
This fixes the problem that can be observed when adding further PCI
cards that got their interrupt rotated to other interrupts than PCI
INT A. In particular, the bug was observed with an additional OHCI PCI
card or an ES1370 sound device.
Signed-off-by: Sebastian Bauer <mail@sebastianbauer.info>
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Tested-by: Sebastian Bauer <mail@sebastianbauer.info>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Commit 51b0d834c changed error handling to report file name in error
message but forgot to move freeing it after usage. Noticed by Coverity.
Fixes: CID 1394217
Reported-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This function was introduced between v2.11 and v2.12 to replace obsolete
ways of specifying the NUMA nodes for DIMMs. It's used to find the correct
node for an LMB, by locating which DIMM object it lies within.
Unfortunately, one of the checks is inverted, so we check whether the
address is less than two different things, rather than actually checking
a range. This introduced a regression, meaning that after a reboot qemu
will advertise incorrect node information for memory to the guest.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
sam460ex_load_device_tree() handles nearly all possible errors by simply
exiting (within helper functions and macros). It handles two early error
cases by returning an error.
There's no particular point to this, so make it handle those directly as
well, removing the need for the caller to handle a failure. As a bonus it
gives us more specific error messages.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The qemu_fdt_*() helper functions already exit with a message instead of
returning errors, so we don't need to check for errors in the caller.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
In a couple of places sam460ex_load_device_tree() calls "raw" libfdt
functions which can fail, but doesn't check for error codes. At best,
if these fail the guest will be silently started in a non-standard state,
or it could fail entirely.
Fix this by using the _FDT() helper macro which aborts on a libfdt failure.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Commit 29f9cef "ppc: Include vga cirrus card into the compiling process"
changed the default display adapter for all PPC machines to cirrus. Unfortunately
it missed setting the default display type to stdvga for both PReP machines
causing the display to fail to initialise under OpenHackWare.
Update the MachineClass for both prep and 40p machines so that the default
std(vga) display adapter is the default if no options are specified
which fixes the display for the PReP machines.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
commit efe2add7cb ("spapr/vio: deprecate the "irq" property")
introduced get/set accessors for the "irq" property to warn of its
usage, but the warning in the get pollutes the monitor 'info qtree'.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Commit 29f9cef39e "ppc: Include vga cirrus card into the compiling process"
changed the default display adapter for all PPC machines to cirrus. Unfortunately
it missed setting the default display type to stdvga for both Mac machines
causing the display to fail to initialise under OpenBIOS.
Update the MachineClass for both Old World and New World Macs so that the
default std(vga) display adapter is the default if no options are specified
which fixes the display for the Mac machines.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Here's a last minue pull request before today's soft freeze. Ideally
I would have sent this earlier, but I was waiting for a couple of
extra fixes I knew were close. And the freeze crept up on me, like
always.
Most of the changes here are bugfixes in any case. There are some
cleanups as well, which have been in my staging tree for a little
while. There are a couple of truly new features (some extensions to
the sam460ex platform), but these are low risk, since they only affect
a new and not really stabilized machine type anyway.
Higlights are:
* Mac platform improvements from Mark Cave-Ayland
* Sam460ex improvements from BALATON Zoltan et al.
* XICS interrupt handler cleanups from Cédric Le Goater
* TCG improvements for atomic loads and stores from Richard
Henderson
* Assorted other bugfixes
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEdfRlhq5hpmzETofcbDjKyiDZs5IFAls7D8oACgkQbDjKyiDZ
s5Lxmg//YzPfC/nKqTTKkyJPzh/NnSC+kRTMAT3mbxdRIc7yfgMqJtWGGbS1iKgK
EeJ9hl5Qm0HfscfDuzf0xasU62ZEv3kNdLnWJEIgkqiXrxoO5KCnC0y4D8NN1W03
mvINNCa8+QDg2OsirGmNUTkriiG3wLIrHTpLZ4+JuC2Bd9H3nTHZgJ0MXON/1VWY
oRgr6kMZ5+IAzPhvYLFR6l3nPI883fgJOFyRo7YqYrkVBKFrFkfK0Xjw6vpsNxcx
2dE/YCHhNIriLuBG5noewL7GuqZRtLnl6rjjee5VAKIe1EmFeR+jsXwNjzGOVOJg
dhjOtsJsQQ3WdEw5uImJzE64kV228WCgmkeXzZd1010JBLr7sUkrd2EuoZ23vvat
uvZAHVSBrJg5WvzMo1VMEoPU3VeeZQ5HL+MI80iKiU6oUgRK11gVJcebtA0sEKt+
zhJC4JiUlHtZLTGIpMBmU8DJZ3Tyk1cBEm+Ky+SaPE+dsz16UHI0fazFQXJnXphE
MLHEGAyQgzWYp7kIcAjUFev0Geq/Uovy4JKIGI6ISop1wRPEQDxkthfkfRyQxQkE
zuse4EBcEH/Undw9KrmEQa0hCe+8BRkxklVbPesFPPdqH3PKNxtHYuWpSShQF0PW
XMjw43O2Rbsl8kBUHCpy4pYSugD1hpfgaw/mVUOU1u/M1O6toTw=
=AHrx
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-3.0-20180703' into staging
ppc patch queue 2018-07-03
Here's a last minue pull request before today's soft freeze. Ideally
I would have sent this earlier, but I was waiting for a couple of
extra fixes I knew were close. And the freeze crept up on me, like
always.
Most of the changes here are bugfixes in any case. There are some
cleanups as well, which have been in my staging tree for a little
while. There are a couple of truly new features (some extensions to
the sam460ex platform), but these are low risk, since they only affect
a new and not really stabilized machine type anyway.
Higlights are:
* Mac platform improvements from Mark Cave-Ayland
* Sam460ex improvements from BALATON Zoltan et al.
* XICS interrupt handler cleanups from Cédric Le Goater
* TCG improvements for atomic loads and stores from Richard
Henderson
* Assorted other bugfixes
# gpg: Signature made Tue 03 Jul 2018 06:55:22 BST
# gpg: using RSA key 6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>"
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392
* remotes/dgibson/tags/ppc-for-3.0-20180703: (35 commits)
ppc: Include vga cirrus card into the compiling process
target/ppc: Relax reserved bitmask of indexed store instructions
target/ppc: set is_jmp on ppc_tr_breakpoint_check
spapr: compute default value of "hpt-max-page-size" later
target/ppc/kvm: don't pass cpu to kvm_get_smmu_info()
target/ppc/kvm: get rid of kvm_get_fallback_smmu_info()
ppc440_uc: Basic emulation of PPC440 DMA controller
sam460ex: Add RTC device
hw/timer: Add basic M41T80 emulation
ppc4xx_i2c: Rewrite to model hardware more closely
hw/ppc: Give sam46ex its own config option
fpu_helper.c: fix setting FPSCR[FI] bit
target/ppc: Implement the rest of gen_st_atomic
target/ppc: Implement the rest of gen_ld_atomic
target/ppc: Use atomic min/max helpers
target/ppc: Use MO_ALIGN for EXIWX and ECOWX
target/ppc: Split out gen_st_atomic
target/ppc: Split out gen_ld_atomic
target/ppc: Split out gen_load_locked
target/ppc: Tidy gen_conditional_store
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
# Conflicts:
# hw/ppc/spapr.c
Drivers for this card exists on PPC-based AmigaOS guests so it is useful to
allow users to emulate the graphics card for PPC machines.
As cirrus vga is currently preferred over std(vga) in absence of any user
choice, this change also sets the default display of spapr machines to
std as otherwise qemu refuses to start these machines. Not specifying an
explicit graphics mode is for instance done by 'make check'.
Signed-off-by: Sebastian Bauer <mail@sebastianbauer.info>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
It is currently not possible to run a pseries-2.12 or older machine
with HV KVM. QEMU prints the following and exits right away.
qemu-system-ppc64: KVM doesn't support for base page shift 34
The "hpt-max-page-size" capability was recently added to spapr to hide
host configuration details from HPT mode guests. Its default value for
newer machine types is 64k.
For backwards compatibility, pseries-2.12 and older machine types need
a different value. This is handled as usual in a class init function.
The default value is 16G, ie, all page sizes supported by POWER7 and
newer CPUs, but HV KVM requires guest pages to be hpa contiguous as
well as gpa contiguous. The default value is the page size used to
back the guest RAM in this case.
Unfortunately kvmppc_hpt_needs_host_contiguous_pages()->kvm_enabled() is
called way before KVM init and returns false, even if the user requested
KVM. We thus end up selecting 16G, which isn't supported by HV KVM. The
default value must be set during machine init, because we can safely
assume that KVM is initialized at this point.
We fix this by moving the logic to default_caps_with_cpu(). Since the
user cannot pass cap-hpt-max-page-size=0, we set the default to 0 in
the pseries-2.12 class init function and use that as a flag to do the
real work.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
PPC440 SoCs such as the AMCC 460EX have a DMA controller which is used
by AmigaOS on the sam460ex. Implement the parts used by AmigaOS so it
can get further booting on the sam460ex machine.
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The Sam460ex has an M41T80 serial RTC chip on I2C bus 0 at address 0x68.
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
At present the Sam460ex board is activated by the general CONFIG_PPC4XX
option. However that includes the board for both ppc-softmmu and
(deprecated) ppcemb-softmmu builds. As Sam460ex is developed, that would
require adding more things into ppcemb-softmmu, which we don't want to do.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
commit d35aefa9ae ("ppc/pnv: introduce a new intc_create() operation
to the chip model") changed the object link in the pnv_core_realize()
routine but a return was forgotten in case of error, which can lead to
more problems afterwards (segv)
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
With the previous changes, we can now let the ICS_KVM class inherit
directly from ICS_BASE class and not from the intermediate ICS_SIMPLE.
It makes the class hierarchy much cleaner.
What is left in the top classes is the low level interface to access
the KVM XICS device in ICS_KVM and the XICS emulating handlers in
ICS_SIMPLE.
This should not break migration compatibility.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
sam460ex (or at least this emulation) does not support the "ibm,cpm" power
management. As a result, Linux crashes when trying to access it. Remove
its device tree node. Also, if/when we boot the Linux kernel directly,
serial port clock frequencies in the device tree file will be unset, and
serial port initialization will fail. Add valid frequency values to
the serial ports to be able to use it. Also set valid values for the other
clock nodes otherwise set by u-boot.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: BALATON Zoltan <balaton@eik.bme.hu>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Commit 84051eb400 "adb: add property to disable direct reg 3 writes" added a
workaround for MacOS 9 incorrectly setting the mouse address during boot of
PMU machines.
Further testing has shown that since fb6649f172 "adb: fix read reg 3 byte
ordering" this can still sometimes happen with the CUDA mac99 machine,
so let's enable this workaround for all New World machines using ADB for now.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
It eases code review, unit is explicit.
Patch generated using:
$ git grep -E '(1024|2048|4096|8192|(<<|>>).?(10|20|30))' hw/ include/hw/
and modified manually.
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20180625124238.25339-33-f4bug@amsat.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
These files don't use anything exposed by "qemu/cutils.h",
simplify preprocessing including directly "qemu/units.h".
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au> (ppc parts)
Message-Id: <20180625124238.25339-7-f4bug@amsat.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Let's try to reduce error handling a bit. In the plug/unplug case, the
device was realized and therefore we can assume that getting access to
the memory region will not fail.
For get_vmstate_memory_region() this is already handled that way.
Document both cases.
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20180619134141.29478-13-david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Let's rename it to make it look more consistent.
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20180619134141.29478-4-david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Currently during KVM initialization on POWER, kvm_fixup_page_sizes()
rewrites a bunch of information in the cpu state to reflect the
capabilities of the host MMU and KVM. This overwrites the information
that's already there reflecting how the TCG implementation of the MMU will
operate.
This means that we can get guest-visibly different behaviour between KVM
and TCG (and between different KVM implementations). That's bad. It also
prevents migration between KVM and TCG.
The pseries machine type now has filtering of the pagesizes it allows the
guest to use which means it can present a consistent model of the MMU
across all accelerators.
So, we can now replace kvm_fixup_page_sizes() with kvm_check_mmu() which
merely verifies that the expected cpu model can be faithfully handled by
KVM, rather than updating the cpu model to match KVM.
We call kvm_check_mmu() from the spapr cpu reset code. This is a hack:
conceptually it makes more sense where fixup_page_sizes() was - in the KVM
cpu init path. However, doing that would require moving the platform's
pagesize filtering much earlier, which would require a lot of work making
further adjustments. There wouldn't be a lot of concrete point to doing
that, since the only KVM implementation which has the awkward MMU
restrictions is KVM HV, which can only work with an spapr guest anyway.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
KVM HV has some limitations (deriving from the hardware) that mean not all
host-cpu supported pagesizes may be usable in the guest. At present this
means that KVM guests and TCG guests may see different available page sizes
even if they notionally have the same vcpu model. This is confusing and
also prevents migration between TCG and KVM.
This patch makes the environment consistent by always allowing the same set
of pagesizes. Since we can't remove the KVM limitations, we do this by
always applying the same limitations it has, even to TCG guests.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
The way we used to handle KVM allowable guest pagesizes for PAPR guests
required some convoluted checking of memory attached to the guest.
The allowable pagesizes advertised to the guest cpus depended on the memory
which was attached at boot, but then we needed to ensure that any memory
later hotplugged didn't change which pagesizes were allowed.
Now that we have an explicit machine option to control the allowable
maximum pagesize we can simplify this. We just check all memory backends
against that declared pagesize. We check base and cold-plugged memory at
reset time, and hotplugged memory at pre_plug() time.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
The way the POWER Hash Page Table (HPT) MMU is virtualized by KVM HV means
that every page that the guest puts in the pagetables must be truly
physically contiguous, not just GPA-contiguous. In effect this means that
an HPT guest can't use any pagesizes greater than the host page size used
to back its memory.
At present we handle this by changing what we advertise to the guest based
on the backing pagesizes. This is pretty bad, because it means the guest
sees a different environment depending on what should be host configuration
details.
As a start on fixing this, we add a new capability parameter to the
pseries machine type which gives the maximum allowed pagesizes for an
HPT guest. For now we just create and validate the parameter without
making it do anything.
For backwards compatibility, on older machine types we set it to the max
available page size for the host. For the 3.0 machine type, we fix it to
16, the intention being to only allow HPT pagesizes up to 64kiB by default
in future.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
spapr_irq_alloc_block and spapr_irq_alloc() are now deprecated.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Today, when a device requests for IRQ number in a sPAPR machine, the
spapr_irq_alloc() routine first scans the ICSState status array to
find an empty slot and then performs the assignement of the selected
numbers. Split this sequence in two distinct routines : spapr_irq_find()
for lookups and spapr_irq_claim() for claiming the IRQ numbers.
This will ease the introduction of a static layout of IRQ numbers.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
spapr capabilities have an apply hook to actually activate (or deactivate)
the feature in the system at reset time. However, a number of capabilities
affect the setup of cpus, and need to be applied to each of them -
including hotplugged cpus for extra complication. To make this simpler,
add an optional cpu_apply hook that is called from spapr_cpu_reset().
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Previously, the effective values of the various spapr capability flags
were only determined at machine reset time. That was a lazy way of making
sure it was after cpu initialization so it could use the cpu object to
inform the defaults.
But we've now improved the compat checking code so that we don't need to
instantiate the cpus to use it. That lets us move the resolution of the
capability defaults much earlier.
This is going to be necessary for some future capabilities.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
ppc_check_compat() is used in a number of places to check if a cpu object
supports a certain compatiblity mode, subject to various constraints.
It takes a PowerPCCPU *, however it really only depends on the cpu's class.
We have upcoming cases where it would be useful to make compatibility
checks before we fully instantiate the cpu objects.
ppc_type_check_compat() will now make an equivalent check, but based on a
CPU's QOM typename instead of an instantiated CPU object.
We make use of the new interface in several places in spapr, where we're
essentially making a global check, rather than one specific to a particular
cpu. This avoids some ugly uses of first_cpu to grab a "representative"
instance.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
The device tree node of the ISA bus was being partially done in
different places. Move all the nodes creation under the same routine.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
It introduces a base PnvChip class from which the specific processor
chip classes, Pnv8Chip and Pnv9Chip, inherit. Each of them needs to
define an init and a realize routine which will create the controllers
of the target processor. For the moment, the base PnvChip class
handles the XSCOM bus and the cores.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
QEMU implements the "Shared Processor LPAR" (SPLPAR) option, which allows
the hypervisor to time-slice a physical processor into multiple virtual
processor. The intent is to allow more guests to run, and to optimize
processor utilization.
The guest OS can cede idle VCPUs, so that their processing capacity may
be used by other VCPUs, with the H_CEDE hcall. The guest OS can also
optimize spinlocks, by confering the time-slice of a spinning VCPU to the
spinlock holder if it's currently notrunning, with the H_CONFER hcall.
Both hcalls depend on a "Virtual Processor Area" (VPA) to be registered
by the guest OS, generally during early boot. Other per-VCPU areas can
be registered: the "SLB Shadow Buffer" which allows a more efficient
dispatching of VCPUs, and the "Dispatch Trace Log Buffer" (DTL) which
is used to compute time stolen by the hypervisor. Both DTL and SLB Shadow
areas depend on the VPA to be registered.
The VPA/SLB Shadow/DTL are state that QEMU should migrate, but this doesn't
happen, for no apparent reason other than it was just never coded. This
causes the features listed above to stop working after migration, and it
breaks the logic of the H_REGISTER_VPA hcall in the destination.
The VPA is set at the guest request, ie, we don't have to migrate
it before the guest has actually set it. This patch hence adds an
"spapr_cpu/vpa" subsection to the recently introduced per-CPU machine
data migration stream.
Since DTL and SLB Shadow are optional and both depend on VPA, they get
their own subsections "spapr_cpu/vpa/slb_shadow" and "spapr_cpu/vpa/dtl"
hanging from the "spapr_cpu/vpa" subsection.
Note that this won't break migration to older QEMUs. Is is already handled
by only registering the vmstate handler for per-CPU data with newer machine
types.
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
A per-CPU machine data pointer was recently added to PowerPCCPU. The
motivation is to to hide platform specific details from the core CPU
code. This per-CPU data can hold state which is relevant to the guest
though, eg, Virtual Processor Areas, and we should migrate this state.
This patch adds the plumbing so that we can migrate the per-CPU data
for PAPR guests. We only do this for newer machine types for the sake
of backward compatibility. No state is migrated for the moment: the
vmstate_spapr_cpu_state structure will be populated by subsequent
patches.
Signed-off-by: Greg Kurz <groug@kaod.org>
[dwg: Fix some trivial spelling and spacing errors]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This moves the details of the ISA bus creation under the LPC model but
more important, the new PnvChip operation will let us choose the chip
class to use when we introduce the different chip classes for Power9
and Power8. It hides away the processor chip controllers from the
machine.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
On Power9, the thread interrupt presenter has a different type and is
linked to the chip owning the cores.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Commit 3d85885a1b tried to fix error handling, but it actually
went into the wrong direction by dropping the local Error *.
In the default KVM case, the rationale is to try the in-kernel XICS first,
and if not possible, to fallback to userland XICS. Passing errp everywhere
makes this fallback impossible if errp is &error_fatal (which happens to
be the case). And anyway, if the caller would pass a regular &local_err,
things would be worse: we could possibly pass an already set *errp to
error_setg() and crash, or return an error even in case of success.
So we definitely need a local Error * and only propagate it when we're
done with the fallback logic. This is what this patch does.
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
CPUPPCState currently contains a number of fields containing the state of
the VPA. The VPA is a PAPR specific concept covering several guest/host
shared memory areas used to communicate some information with the
hypervisor.
As a PAPR concept this is really machine specific information, although it
is per-cpu, so it doesn't really belong in the core CPU state structure.
There's also other information that's per-cpu, but platform/machine
specific. So create a (void *)machine_data in PowerPCCPU which can be
used by the machine to locate per-cpu data. Intialization, lifetime and
cleanup of machine_data is entirely up to the machine type.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Tested-by: Greg Kurz <groug@kaod.org>
This extracts from the PvChip realize routine the part creating the
cores. On Power9, we will need to create the cores after the Xive
interrupt controller is created.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This moves some code out from spapr_cpu_core_realize() for clarity. No
functional change.
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The spapr_realize_vcpu() function doesn't rollback in case of error.
This isn't a problem with coldplugged CPUs because the machine won't
start and QEMU will exit. Hotplug is a different story though: the
CPU thread is started under object_property_set_bool() and it assumes
it can access the CPU object.
If icp_create() fails, we return an error without unregistering the
reset handler for this CPU, and we let the underlying QEMU thread for
this CPU alive. Since spapr_cpu_core_realize() doesn't care to unrealize
already realized CPUs either, but happily frees all of them anyway, the
CPU thread crashes instantly:
(qemu) device_add host-spapr-cpu-core,core-id=1,id=gku
GKU: failing icp_create (cpu 0x11497fd0)
^^^^^^^^^^
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffee3feaa0 (LWP 24725)]
0x00000000104c8374 in object_dynamic_cast_assert (obj=0x11497fd0,
^^^^^^^^^^^^^^
pointer to the CPU object
623 trace_object_dynamic_cast_assert(obj ? obj->class->type->name
(gdb) p obj->class->type
$1 = (Type) 0x0
(gdb) p * obj
$2 = {class = 0x10ea9c10, free = 0x11244620,
^^^^^^^^^^
should be g_free
(gdb) p g_free
$3 = {<text variable, no debug info>} 0x7ffff282bef0 <g_free>
obj is a dangling pointer to the CPU that was just destroyed in
spapr_cpu_core_realize().
This patch adds proper rollback to both spapr_realize_vcpu() and
spapr_cpu_core_realize().
Signed-off-by: Greg Kurz <groug@kaod.org>
[dwg: Fixed a conflict due to a change in my tree]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Commit 94ad93bd97 (QEMU 2.12) switched to instantiate CPUs separately
but it missed to adapt the error path accordingly. If something fails in
the CPU creation loop, then the CPU object that was just created is leaked.
The error paths in this function are a bit obfuscated, and adding
yet another label to free this CPU object makes it worse. We should
move the block of the loop to a separate function, with a proper
rollback path, but this is a bigger cleanup.
For now, let's just fix the bug by adding the missing calls to
object_unref(). This will allow easier backport to older QEMU
versions.
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Currently we don't have any unrealize path for pnv cpu cores. We get away
with this because we don't yet support cpu hotplug for pnv.
However, we're going to want it eventually, and in the meantime, it makes
it non-obvious why there are a bunch of allocations on the realize() path
that don't have matching frees.
So, implement the missing unrealize path.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
pnv_cpu_init() is only called from the the pnv cpu core realize path, and
really only can be called from there. So fold it into its caller, which
we also rename for brevity.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Currently, we allocate space for all the cpu objects within a single core
in one big block. This was copied from an older version of the spapr code
and requires some ugly pointer manipulation to extract the individual
objects.
This design was due to a misunderstanding of qemu lifetime conventions and
has already been changed in spapr (in 94ad93bd "spapr_cpu_core: instantiate
CPUs separately".
Make an equivalent change in pnv_core to get rid of the nasty pointer
arithmetic.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
In pnv_core_realize() we call two functions with an Error * parameter in
succession, which will go badly if they both cause errors. In fact, a
failure in either of them indicates a qemu internal error, so we can just
use &error_abort in both cases.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
spapr_cpu_init() and spapr_cpu_destroy() are only called from the spapr
cpu core realize/unrealize paths, and really can only be called from there.
Those are all short functions, so fold the pairs together for simplicity.
While we're there rename some functions and change some parameter types
for brevity and clarity.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
The PMU device supercedes the CUDA device found on older New World Macs and
is supported by a larger number of guest OSs from OS 9 to OS X 10.5.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
PMU-enabled New World Macs expose their GPIOs via a separate memory region
within the macio device.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This option allows the VIA configuration to be controlled between 3
different possible setups: cuda, pmu-adb and pmu with USB rather than ADB
keyboard/mouse.
For the moment we don't do anything with the configuration except to pass
it to the macio device (the via-cuda parent) and also to the firmware via
the fw_cfg interface so that it can present the correct device tree.
The default is cuda which is the current default and so will have no
change in behaviour.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This is in preparation for adding configuration controlled via machine
options.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
If the negotiated compat mode can't be set, but raw mode is supported,
we decide to ignore the error. An so, we should free it to prevent a
memory leak.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
In default_caps_with_cpu() we set spapr_cap_cfpc to broken for POWER8
processors and before.
Since we no longer require private l1d cache on POWER8 for this cap to
be set to workaround change this to default to broken for POWER7
processors and before.
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Add an IOMMU index argument to the translate method of
IOMMUs. Since all of our current IOMMU implementations
support only a single IOMMU index, this has no effect
on the behaviour.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20180604152941.20374-4-peter.maydell@linaro.org
Add support for multiple IOMMU indexes to the IOMMU notifier APIs.
When initializing a notifier with iommu_notifier_init(), the caller
must pass the IOMMU index that it is interested in. When a change
happens, the IOMMU implementation must pass
memory_region_notify_iommu() the IOMMU index that has changed and
that notifiers must be called for.
IOMMUs which support only a single index don't need to change.
Callers which only really support working with IOMMUs with a single
index can use the result of passing MEMTXATTRS_UNSPECIFIED to
memory_region_iommu_attrs_to_index().
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20180604152941.20374-3-peter.maydell@linaro.org
We banned use of certain g_assert_FOO() functions outside tests, and
made checkpatch.pl flag them (commit 6e9389563e). We neglected to
purge existing uses. Do that now.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20180608170231.27912-1-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: John Snow <jsnow@redhat.com>
By default, the IOMMU model built into the spapr virtual PCI host bridge
supports 4kiB and 64kiB IOMMU page sizes. However this can be overridden
which may be desirable to allow larger IOMMU page sizes when running a
guest with hugepage backing and passthrough devices. For that reason a
warning was printed when the device wasn't configured to allow the pagesize
with which guest RAM is backed.
Experience has proven, however, that this message is more confusing than
useful. Worse it sometimes makes little sense when the host-available page
sizes don't match those available on the guest, which can happen with
a POWER8 guest running on a POWER9 KVM host.
Long term we do want better handling to allow large IOMMU page sizes to be
used, but for now this parameter and warning don't really accomplish it.
So, remove the message, pending a better solution.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
A specific MemoryRegion is required for the LPC HC Firmware address
space.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Factor out cpu core unplug into separate function from
spapr_core_release(). Then use generic hotplug_handler_unplug() to trigger
cpu core unplug, which would call spapr_machine_device_unplug() ->
spapr_core_unplug() in the end.
This way unplug operation is not buried in spapr internals and located
in the same place like in other targets, following similar
logic/call chain across targets.
Acked-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Factor out memory unplug into separate function from spapr_lmb_release().
Then use generic hotplug_handler_unplug() to trigger memory unplug,
which will call spapr_machine_device_unplug() -> spapr_memory_unplug()
in the end.
This way unplug operation is not buried in lmb internals and located in
the same place like in other targets, following similar logic/call chain
across targets.
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
We'll be handling unplug of e.g. CPUs and PCDIMMs via the general
hotplug handler soon, so let's add that handler function.
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Let's finish cleaning up the hotplug handler. This check can be
performed in the pre_plug code as the very first thing.
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Let's clean the hotplug handler up by moving lookup of the node into
the function where it is actually being used.
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The node property can always be queried and the value has already been
verified in pc_dimm_realize().
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Commits b6712ea391 removed the macio_init() function but missed the header
prototype in mac.h. Remove it since it is no longer needed.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Commits 7b19318bee and 8ce3f743c7 removed the pci_pmac_init() and
pci_pmac_u3_init() functions but missed the header prototypes in mac.h. Remove
them since they are no longer needed.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
VIO devices have an "irq" property that can be used by the sPAPR IRQ
allocator as an IRQ number hint. But it is not set in QEMU nor in
libvirt. It brings unnecessary complexity to the underlying layers
managing the IRQ number space and it is in full opposition with the
new static IRQ allocator we want to introduce in sPAPR.
Let's deprecate it to simplify the spapr_irq_alloc routine in the
future.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
[dwg: Check qtest_enabled() to suppress bogus warnings from make check]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Commit 72d3d8f052 "hw/isa/superio: Add a keyboard/mouse controller (8042)"
added an 8042 keyboard device to the PC87312 superio device to replace that
being used by the prep machine.
Unfortunately this commit didn't do the same for the 40p machine which broke
the keyboard by registering two 8042 keyboard devices at the same address.
Resolve this by similarly removing the 8042 keyboard from the 40p machine as
done for the prep machine in commit 72d3d8f052.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Hervé Poussineau <hpoussin@reactos.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The Linux sandalfoot zImage has an initialisation process which resets the
VGA controller by setting all the BAR addresses to zero to access the VGA
ioports at their legacy addresses.
Unfortunately setting the framebuffer BAR to address 0 makes the framebuffer
memory overlap the internal VGA memory causing accesses to fail, and so
prevents the kernel from switching successfully to text mode.
Since OpenHackWare configures the framebuffer BAR address outside of the legacy
VGA internal memory space, remove pci_allow_0_address from the 40p machine class
which causes the BAR reprogramming to zero to fail and so the VGA internal
memory can be accessed correctly again.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Use error_report() + abort() instead of error_setg(&error_abort),
as suggested by the "qapi/error.h" documentation:
Please don't error_setg(&error_fatal, ...), use error_report() and
exit(), because that's more obvious.
Likewise, don't error_setg(&error_abort, ...), use assert().
Use abort() instead of the suggested assert() because the error message
already got displayed.
Suggested-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Message-id: 20180606152128.449-5-f4bug@amsat.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
vDPA support, fix to vhost blk RO bit handling, some include path
cleanups, NFIT ACPI table.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJbEXNvAAoJECgfDbjSjVRpc8gH/R8xrcFrV+k9wwbgYcOcGb6Y
LWjseE31pqJcxRV80vLOdzYEuLStZQKQQY7xBDMlA5vdyvZxIA6FLO2IsiJSbFAk
EK8pclwhpwQAahr8BfzenabohBv2UO7zu5+dqSvuJCiMWF3jGtPAIMxInfjXaOZY
odc1zY2D2EgsC7wZZ1hfraRbISBOiRaez9BoGDKPOyBY9G1ASEgxJgleFgoBLfsK
a1XU+fDM6hAVdxftfkTm0nibyf7PWPDyzqghLqjR9WXLvZP3Cqud4p8N29mY51pR
KSTjA4FYk6Z9EVMltyBHfdJs6RQzglKjxcNGdlrvacDfyFi79fGdiosVllrjfJM=
=3+V0
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging
acpi, vhost, misc: fixes, features
vDPA support, fix to vhost blk RO bit handling, some include path
cleanups, NFIT ACPI table.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
# gpg: Signature made Fri 01 Jun 2018 17:25:19 BST
# gpg: using RSA key 281F0DB8D28D5469
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>"
# gpg: aka "Michael S. Tsirkin <mst@redhat.com>"
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67
# Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469
* remotes/mst/tags/for_upstream: (31 commits)
vhost-blk: turn on pre-defined RO feature bit
ACPI testing: test NFIT platform capabilities
nvdimm, acpi: support NFIT platform capabilities
tests/.gitignore: add entry for generated file
arch_init: sort architectures
ui: use local path for local headers
qga: use local path for local headers
colo: use local path for local headers
migration: use local path for local headers
usb: use local path for local headers
sd: fix up include
vhost-scsi: drop an unused include
ppc: use local path for local headers
rocker: drop an unused include
e1000e: use local path for local headers
ioapic: fix up includes
ide: use local path for local headers
display: use local path for local headers
trace: use local path for local headers
migration: drop an unused include
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
When pulling in headers that are in the same directory as the C file (as
opposed to one in include/), we should use its relative path, without a
directory.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Remove those unneeded includes to speed up the compilation
process a little bit. (Continue 7eceff5b5a cleanup)
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20180528232719.4721-13-f4bug@amsat.org>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Rename the 2.13 machines to match the number we're going to
use for the next release.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Message-id: 20180522104000.9044-5-peter.maydell@linaro.org
platform-bus were using machine_done notifier to get and map
(assign irq/mmio resources) dynamically added sysbus devices
after all '-device' options had been processed.
That however creates non obvious dependencies on ordering of
machine_done notifiers and requires carefull line juggling
to keep it working. For example see comment above
create_platform_bus() and 'straitforward' arm_load_kernel()
had to converted to machine_done notifier and that lead to
yet another machine_done notifier to keep it working
arm_register_platform_bus_fdt_creator().
Instead of hiding resource assignment in platform-bus-device
to magically initialize sysbus devices, use device plug
callback and assign resources explicitly at board level
at the moment each -device option is being processed.
That adds a bunch of machine declaration boiler plate to
e500plat board, similar to ARM/x86 but gets rid of hidden
machine_done notifier and would allow to remove the dependent
notifiers in ARM code simplifying it and making code flow
easier to follow.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Message-id: 1525691524-32265-3-git-send-email-imammedo@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
* pc-dimm: factor out MemoryDevice
(virtio-pmem and virtio-mem will make use of the new abstraction later)
* scripts/device-crash-test: Removed fixed CAN entries
-----BEGIN PGP SIGNATURE-----
iQIcBAABCAAGBQJa8IZ2AAoJECgHk2+YTcWmmD0P/2Lddw+ilGhGS/CWarq4uLSF
ILtEMwNgbJeJAEza6IQx/IIuUER3H5UcxgZhO49nELpurobhl5yW9JKP1qjH9z9i
7hVPORGioiyGkjgjbm8jWtljePAloTIwEiIcrqYkVHpWDCUJaZ7SES2VQL7ltY/W
AU3uSFQQMDfVqr/MXDxZq084wFK3Jm2aIE+p8a0MF7B+29RSHdFU9iKysCC1Wu/1
AllXCkQ4yWHCGoSRBfzFz9EWBb4VlzM+VNj9nhHu75zdF3hm7J05yIiGuZLiOjmB
MDOkvKhSeXNj+21mXVLmSxkfI65z6jrq3aI7iTp4+orrd2SCXoHsOZoj4Q2cRSnw
kJlY62+p85H9NYIKTgMCM/oURpL2ZnqPKmCto1NRFywSBGLXll2weyKpX9ByvXe2
gL8hqra/K8eUPW4zSsPYbbN1b16EnK4MY2nkYvG0Y/aAXGZF6V9zQwKNT4/F5GyY
SRMC4c2OtQOgZNDSuPdgZ5Lu5PXfetvvcqWCj0tXNdaScOp6Omsc/i/YCUtu6r/3
IbBIclJ+K5aD+U4QP4DKZ+DJbEkIGMU4pSHgR2i8bK7MmoJpJcAIB1mL5nA/TknP
/RVgtnP7gVbfGIVVwjUw9bMurvOti4PBp0/DxC/VqUqGs9e8avE1yb9grVJdj/jA
oEGJ6EIsmO1URbk1+f93
=Hhge
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/ehabkost/tags/machine-next-pull-request' into staging
Machine queue, 2018-05-07
* pc-dimm: factor out MemoryDevice
(virtio-pmem and virtio-mem will make use of the new abstraction later)
* scripts/device-crash-test: Removed fixed CAN entries
# gpg: Signature made Mon 07 May 2018 18:01:42 BST
# gpg: using RSA key 2807936F984DC5A6
# gpg: Good signature from "Eduardo Habkost <ehabkost@redhat.com>"
# Primary key fingerprint: 5A32 2FD5 ABC4 D3DB ACCF D1AA 2807 936F 984D C5A6
* remotes/ehabkost/tags/machine-next-pull-request:
scripts/device-crash-test: Removed fixed CAN entries
vl: allow 'maxmem' without 'slot'
spapr: rename "hotplug memory" terminology to "device memory"
pc: rename "hotplug memory" terminology to "device memory"
machine: rename MemoryHotplugState to DeviceMemoryState
pc-dimm: move actual plug/unplug of a memory region to MemoryDevice
pc-dimm: factor out capacity and slot checks into MemoryDevice
pc-dimm: factor out address search into MemoryDevice code
pc-dimm: pass in the machine and to the MemoryHotplugState
pc-dimm: no need to pass the memory region
machine: make MemoryHotplugState accessible via the machine
pc-dimm: factor out MemoryDevice interface
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
qemu-system-ppc fails to build with GCC 8.0.1:
/home/hsp/src/qemu-master/hw/ppc/e500.c: In function ‘ppce500_load_device_tree’:
/home/hsp/src/qemu-master/hw/ppc/e500.c:442:37: error: ‘/pic@’
directive output may be truncated writing 5 bytes into a region of
size between 1 and 128 [-Werror=format-truncation=]
snprintf(mpic, sizeof(mpic), "%s/pic@%llx", soc, MPC8544_MPIC_REGS_OFFSET);
^~~~~
In file included from /usr/include/stdio.h:862,
from /home/hsp/src/qemu-master/include/qemu/osdep.h:68,
from /home/hsp/src/qemu-master/hw/ppc/e500.c:17:
/usr/include/bits/stdio2.h:64:10: note: ‘__builtin___snprintf_chk’
output between 11 and 138 bytes into a destination of size 128
return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
__bos (__s), __fmt, __va_arg_pack ());
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/hsp/src/qemu-master/hw/ppc/e500.c:470:39: error:
‘/global-utilities@’ directive output may be truncated writing 18
bytes into a region of size between 1 and 128
[-Werror=format-truncation=]
snprintf(gutil, sizeof(gutil), "%s/global-utilities@%llx", soc,
^~~~~~~~~~~~~~~~~~
In file included from /usr/include/stdio.h:862,
from /home/hsp/src/qemu-master/include/qemu/osdep.h:68,
from /home/hsp/src/qemu-master/hw/ppc/e500.c:17:
/usr/include/bits/stdio2.h:64:10: note: ‘__builtin___snprintf_chk’
output between 24 and 151 bytes into a destination of size 128
return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
__bos (__s), __fmt, __va_arg_pack ());
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/hsp/src/qemu-master/hw/ppc/e500.c:477:36: error: ‘/msi@’
directive output may be truncated writing 5 bytes into a region of
size between 0 and 127 [-Werror=format-truncation=]
snprintf(msi, sizeof(msi), "/%s/msi@%llx", soc, MPC8544_MSI_REGS_OFFSET);
^~~~~
In file included from /usr/include/stdio.h:862,
from /home/hsp/src/qemu-master/include/qemu/osdep.h:68,
from /home/hsp/src/qemu-master/hw/ppc/e500.c:17:
/usr/include/bits/stdio2.h:64:10: note: ‘__builtin___snprintf_chk’
output between 12 and 139 bytes into a destination of size 128
return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
__bos (__s), __fmt, __va_arg_pack ());
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Fix this by converting e500 to use g_strdup_printf()+g_free() instead
of snprintf(). This is done globally, even for call sites that don't
break build, since this is the preferred practice in QEMU.
Reported-by: Howard Spoelstra <hsp.cat7@gmail.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 152568372989.443627.900708381919207053.stgit@bahia.lan
Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Let's make it clear at relevant places that we are dealing with device
memory. That it can be used for memory hotplug is just a special case.
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20180423165126.15441-11-david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
[ehabkost: rebased series, solved conflicts at spapr.c]
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Rename it to better match the new terminology.
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20180423165126.15441-9-david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
We use the machine internally either way, so let's just pass it in then.
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20180423165126.15441-5-david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
We can just query it ourselves. When unplugging, we should always be
able to the region (as it was previously plugged). E.g. PPC already
assumed that and used &error_abort.
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20180423165126.15441-4-david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Let's allow to query the MemoryHotplugState directly from the machine.
If the pointer is NULL, the machine does not support memory devices. If
the pointer is !NULL, the machine supports memory devices and the
data structure contains information about the applicable physical
guest address space region.
This allows us to generically detect if a certain machine has support
for memory devices, and to generically manage it (find free address
range, plug/unplug a memory region).
We will rename "MemoryHotplugState" to something more meaningful
("DeviceMemory") after we completed factoring out the pc-dimm code into
MemoryDevice code.
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20180423165126.15441-3-david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
[ehabkost: rebased series, solved conflicts at spapr.c]
[ehabkost: squashed fix to use g_malloc0()]
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
On the qmp level, we already have the concept of memory devices:
"query-memory-devices"
Right now, we only support NVDIMM and PCDIMM.
We want to map other devices later into the address space of the guest.
Such device could e.g. be virtio devices. These devices will have a
guest memory range assigned but won't be exposed via e.g. ACPI. We want
to make them look like memory device, but not glued to pc-dimm.
Especially, it will not always be possible to have TYPE_PC_DIMM as a parent
class (e.g. virtio devices). Let's use an interface instead. As a first
part, convert handling of
- qmp_pc_dimm_device_list
- get_plugged_memory_size
to our new model. plug/unplug stuff etc. will follow later.
A memory device will have to provide the following functions:
- get_addr(): Necessary, as the property "addr" can e.g. not be used for
virtio devices (already defined).
- get_plugged_size(): The amount this device offers to the guest as of
now.
- get_region_size(): Because this can later on be bigger than the
plugged size.
- fill_device_info(): Fill MemoryDeviceInfo, e.g. for qmp.
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20180423165126.15441-2-david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJa7BLUAAoJEDhwtADrkYZTumIQAJC6wXmN+wBYc2MoR2Y8SQgY
+gTM9J6R6H50ijb7RkkERLTgys7IxCDD/jy2p0yX/Re3ReXbYwzYQXmSFpF1KWGe
SXB84uDtwSILbvR5iS0TBdQSyO+u5DRboukuLfTEZHjYQUP+guT1we3YwqWGzIKp
o5kV/7Nq0vPWO5Sbs4FWB0t9hWzWV3Kef9b4gRPn05sWPaq2/sU6A3xai+ty6qS7
PCm7VwT4z5SACdR4LRiL45h3HdThgr/alJJ6lUr2kaNCBiDBvM4h6d7W+lI/Vi3Y
rG+wqyPQFyWLXf0uuI3AmSScVUzfYv9C4TcBTJkFnebrFcybPsGwEJLGtaIgFnBU
1Mcz/TCl1bB4fDvhwV2qexxlXryOWXKn+ygdu9sBSY/QSA+NEqbJQo6cCDqMQ9Qy
6zqrGxUrM/peVLvhfle4cIbyPslGRGn2s95oQzCJi8TlZxBj8lgW1x1kr7OhSlf4
rNteSYAHDNSiNVL1PcW3vOS7ndTA6O0vHAtGa+0vbQzAf+RUfFG0sfggG6350O8e
97Hp4LKT3VpGEuwyQEw6wk3zODNfAgtkkwjQHTnQYHriKB/fcVfY3g7gpYp4zMLF
GJ3h5KZj71JNoFoxVJniAgkWY8+IP11ggXMyYWSMxMZ3M81EqQ/rbvOvGxn1wjd8
kHbpUEMmGBHF1VmKs7e1
=Kukn
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/armbru/tags/pull-qapi-2018-05-04' into staging
QAPI patches for 2018-05-04
# gpg: Signature made Fri 04 May 2018 08:59:16 BST
# gpg: using RSA key 3870B400EB918653
# gpg: Good signature from "Markus Armbruster <armbru@redhat.com>"
# gpg: aka "Markus Armbruster <armbru@pond.sub.org>"
# Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867 4E5F 3870 B400 EB91 8653
* remotes/armbru/tags/pull-qapi-2018-05-04:
qapi: deprecate CpuInfoFast.arch
qapi: discriminate CpuInfoFast on SysEmuTarget, not CpuInfoArch
qapi: change the type of TargetInfo.arch from string to enum SysEmuTarget
qapi: add SysEmuTarget to "common.json"
qapi: fill in CpuInfoFast.arch in query-cpus-fast
qobject: Modify qobject_ref() to return obj
qobject: Replace qobject_incref/QINCREF qobject_decref/QDECREF
qobject: use a QObjectBase_ struct
qobject: Ensure base is at offset 0
qobject: Use qobject_to() instead of type cast
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Now that we can safely call QOBJECT() on QObject * as well as its
subtypes, we can have macros qobject_ref() / qobject_unref() that work
everywhere instead of having to use QINCREF() / QDECREF() for QObject
and qobject_incref() / qobject_decref() for its subtypes.
The replacement is mechanical, except I broke a long line, and added a
cast in monitor_qmp_cleanup_req_queue_locked(). Unlike
qobject_decref(), qobject_unref() doesn't accept void *.
Note that the new macros evaluate their argument exactly once, thus no
need to shout them.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180419150145.24795-4-marcandre.lureau@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
[Rebased, semantic conflict resolved, commit message improved]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
On a POWER9 host, if a guest runs in pre POWER9 compat mode, it necessarily
uses the hash MMU mode. In this case, we shouldn't advertise radix GTSE in
the ibm,arch-vec-5-platform-support DT property as the current code does.
The first reason is that it doesn't make sense, and the second one is that
causes the CAS-negotiated options subsection to be migrated. This breaks
backward migration to QEMU 2.7 and older versions on POWER8 hosts:
qemu-system-ppc64: error while loading state for instance 0x0 of device
'spapr'
qemu-system-ppc64: load of migration failed: No such file or directory
This patch hence initialize CPUs a bit earlier so that we can check the
requested compat mode, and don't set OV5_MMU_RADIX_GTSE for power8 and
older.
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
a324d6f166 "spapr: Support ibm,dynamic-memory-v2 property" added
a new feature in the set of CAS-negotiatable options. This causes
the CAS-negotiated options subsection to be migrated, even for old
machine types that don't know about it, and breaks backward migration
to QEMU 2.7 and older versions:
qemu-system-ppc64: error while loading state for instance 0x0 of device
'spapr'
qemu-system-ppc64: load of migration failed: No such file or directory
Since this feature only affects boot time behaviour, it should be
filtered out when we decide to migrate CAS-negotiated options, like
we already do with OV5_FORM1_AFFINITY and OV5_DRCONF_MEMORY.
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Since the macio device has a link to the PIC device, we can now wire up the
IRQs directly via qdev GPIOs rather than having to use an intermediate array.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Introduce constants for the pre-defined New World IRQs to help keep things
readable.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Commit 4e46dcdbd3 "PPC: Newworld: Add uninorth token register" added a TODO
which was to convert the uninorth registers hack to a proper device. Move
these registers to a new uninorth device, removing the old hacks from
mac_newworld.c.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
To prevent spurious wakeups on cpus that are supposed to be disabled, we
need to clear the LPCR bits which control certain wakeup events.
spapr_cpu_reset() has separate cases here for boot and non-boot (initially
inactive) cpus. rtas_start_cpu() then turns the LPCR bits on when the
non-boot cpus are activated.
But explicit checks against first_cpu are not how we usually do things:
instead spapr_cpu_reset() generally sets things up for non-boot (inactive)
cpus, then spapr_machine_reset() and/or rtas_start_cpu() override as
necessary.
So, do that instead. Because the LPCR activation is identical for boot
cpus and non-boot cpus just activated with rtas_start_cpu() we can put the
code common in spapr_cpu_set_entry_state().
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Tested-by: Cédric Le Goater <clg@kaod.org>
cpu_ppc_set_papr() does several things:
1) it sets up the virtual hypervisor interface
2) it prevents the cpu from ever entering hypervisor mode
3) it tells KVM that we're emulating a cpu in PAPR mode
and 4) it configures the LPCR and AMOR (hypervisor privileged registers)
so that TCG will behave correctly for PAPR guests, without
attempting to emulate the cpu in hypervisor mode
(1) & (2) make sense for any virtual hypervisor (if another one ever
exists).
(3) belongs more properly in the machine type specific to a PAPR guest, so
move it to spapr_cpu_init(). While we're at it, remove an ugly test on
kvm_enabled() by making kvmppc_set_papr() a safe no-op on non-KVM.
(4) also belongs more properly in the machine type specific code. (4) is
done by mangling the default values of the SPRs, so that they will be set
correctly at reset time. Manipulating usually-static parameters of the cpu
model like this is kind of ugly, especially since the values used really
have more to do with the platform than the cpu.
The spapr code already has places for PAPR specific initializations of
register state in spapr_cpu_reset(), so move this handling there.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Tested-by: Cédric Le Goater <clg@kaod.org>
In cpu_ppc_set_papr() the UPRT and GTSE bits of the LPCR default value are
initialized based on on ppc64_radix_guest(). Which seems reasonable,
except that ppc64_radix_guest() is based on spapr->patb_entry which is
only set up in spapr_machine_reset, called _after_ cpu_ppc_set_papr() for
boot cpus. Well, and the fact that modifying the SPR default value for an
instance rather than a class is kind of yucky.
The initialization here is really only necessary or valid for
hotplugged cpus; the base cpu initialization already sets a value
that's good enough for the boot cpus until the guest uses an hcall to
configure it's preferred MMU mode.
So, move this initialization to the rtas_start_cpu() path, at which point
ppc64_radix_guest() will have a sensible value, to make sure secondary cpus
come up in an MMU mode matching the existing cpus.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Tested-by: Cédric Le Goater <clg@kaod.org>
There are several places in spapr_hcall.c where we need to update the LPCR
value on all CPUs. We do this with the set_spr() helper. That's not
really correct because this directly sets the SPR value, without going
through the ppc_store_lpcr() helper which may need to update state based
on the LPCR change.
In fact, set_spr() is only ever used for the LPCR, so replace it with an
explicit LPCR updated which uses the right low-level helper. While we're
there, move the CPU_FOREACH() which was in every one of the callers into
the new helper: set_all_lpcrs().
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Tested-by: Cédric Le Goater <clg@kaod.org>
Under PAPR, only the boot CPU is active when the system starts. Other cpus
must be explicitly activated using an RTAS call. The entry state for the
boot and secondary cpus isn't identical, but it has some things in common.
We're going to add a bit more common setup later, too, so to simplify
make a helper which sets up the common entry state for both boot and
secondary cpu threads.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Tested-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
rtas_start_cpu() calls spapr_cpu_update_tb_offset() and
spapr_cpu_set_endianness() to initialize certain things in the new cpu's
state. This is the only caller of those helpers, and they're each only
a few lines long, so we might as well just fold them into the caller.
In addition, those helpers initialize state on the new cpu to match that of
the first cpu. That will generally work, but might be at least logically
incorrect if the first cpu has been set offline by the guest. So, instead
base the state on that of the cpu invoking the RTAS call, which is
obviously active already.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Tested-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
This makes several minor cleanups to these functions:
* Follow usual convention of an early exit on error, rather than having
most of the body in an if
* Clearer naming of cpu and cpu_. Now callcpu is the cpu from which the
RTAS call is invoked, newcpu is the cpu which we're starting
* Use cpu_synchronize_state() instead of kvm_cpu_synchronize_state()
directly
* Remove pointless comment describing what cpu_synchronize_state() does
* Use ppc_store_lpcr() instead of directly writing the register field
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Tested-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Current POWER cpus allow for a VRMA, a special mapping which describes a
guest's view of memory when in real mode (MMU off, from the guest's point
of view). Older cpus didn't have that which meant that to support a guest
a special host-contiguous region of memory was needed to give the guest its
Real Mode Area (RMA).
KVM used to provide special calls to allocate a contiguous RMA for those
cases. This was useful in the early days of KVM on Power to allow it to be
tested on PowerPC 970 chips as used in Macintosh G5 machines. Now, those
machines are so old as to be almost irrelevant.
The normal qemu deprecation process would require this to be marked
deprecated then removed in 2 releases. However, this can only be used
with corresponding support in the host kernel - which was dropped
years ago (in c17b98cf "KVM: PPC: Book3S HV: Remove code for PPC970
processors" of 2014-12-03 to be precise). Therefore it should be ok
to drop this immediately.
Just to be clear this only affects *KVM HV* guests with PowerPC 970,
and those already require an ancient host kernel. TCG and KVM PR
guests with PowerPC 970 should still work.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: Thomas Huth <thuth@redhat.com>
Although the order doesn't really matter at the moment, it's possible
other initializastions could depend on the compatiblity mode, so make sure
we set it first in spapr_cpu_reset().
While we're at it drop the test against first_cpu. Setting the compat mode
to the value it already has is redundant, but harmless, so we might as well
make a small simplification to the code.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
The new property ibm,dynamic-memory-v2 allows memory to be represented
in a more compact manner in device tree.
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Convert PPCE500Params to PCCE500MachineClass which it essentially is,
and introduce PCCE500MachineState to keep track of E500 specific
state instead of adding global variables or extra parameters to
functions when we need to keep data beyond machine init
(i.e. make it look like typical fully defined machine).
It's pretty shallow conversion instead of currently used trivial
DEFINE_MACHINE() macro. It adds extra 60LOC of boilerplate code
of full machine definition.
The patch on top[1] will use PCCE500MachineState to keep track of
platform_bus device and add E500Plate specific machine class
to use HOTPLUG_HANDLER for explicitly initializing dynamic
sysbus devices at the time they are added instead of delaying
it to machine done time by platform_bus_init_notify() which is
being removed.
1) <1523551221-11612-3-git-send-email-imammedo@redhat.com>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Suggested-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Now recent kernels (i.e. since linux-stable commit a346137e9142
("powerpc/numa: Use ibm,max-associativity-domains to discover possible nodes")
support this property to mark initially memory-less NUMA nodes as "possible"
to allow further memory hot-add to them.
Advertise this property for pSeries machines to let guest kernels detect
maximum supported node configuration and benefit from kernel side change
when hot-add memory to specific, possibly empty before, NUMA node.
Signed-off-by: Serhii Popovych <spopovyc@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The env->slb_nr field gives the size of the SLB (Segment Lookaside Buffer).
This is another static-after-initialization parameter of the specific
version of the 64-bit hash MMU in the CPU. So, this patch folds the field
into PPCHash64Options with the other hash MMU options.
This is a bit more complicated that the things previously put in there,
because slb_nr was foolishly included in the migration stream. So we need
some of the usual dance to handle backwards compatible migration.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
The ci_large_pages boolean in CPUPPCState is only relevant to 64-bit hash
MMU machines, indicating whether it's possible to map large (> 4kiB) pages
as cache-inhibitied (i.e. for IO, rather than memory). Fold it as another
flag into the PPCHash64Options structure.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Currently env->mmu_model is a bit of an unholy mess of an enum of distinct
MMU types, with various flag bits as well. This makes which bits of the
field should be compared pretty confusing.
Make a start on cleaning that up by moving two of the flags bits -
POWERPC_MMU_1TSEG and POWERPC_MMU_AMR - which are specific to the 64-bit
hash MMU into a new flags field in PPCHash64Options structure.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
env->sps contains page size encoding information as an embedded structure.
Since this information is specific to 64-bit hash MMUs, split it out into
a separately allocated structure, to reduce the basic env size for other
cpus. Along the way we make a few other cleanups:
* Rename to PPCHash64Options which is more in line with qemu name
conventions, and reflects that we're going to merge some more hash64
mmu specific details in there in future. Also rename its
substructures to match qemu conventions.
* Move structure definitions to the mmu-hash64.[ch] files.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
As a rule we prefer to pass PowerPCCPU instead of CPUPPCState, and this
change will make some things simpler later on.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Since commit 7da79a167a, the machine class init function registers
dynamic sysbus device types it supports. Passing an unsupported device
type on the command line causes QEMU to exit with an error message
just after machine init.
It is hence not needed to do the same sanity check at machine reset.
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This reverts commit b556854bd8.
Leave change @node type from uint32_t to to int from reverted commit
because node < 0 is always false.
Note that implementing capability or some trick to detect if guest
kernel does not support hot-add to memory: this returns previous
behavour where memory added to first non-empty node.
Signed-off-by: Serhii Popovych <spopovyc@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Both spapr_irq_alloc() and spapr_irq_alloc_block() have an errp
parameter, but they don't use it if XICS hasn't been initialized
yet.
This is doubly wrong:
- all callers do pass a non-null Error **, ie, they expect an error
to be propagated in case of failure
- XICS obviously needs to be initialized before anything starts allocating
IRQs
So this patch turns the check into an assert.
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The existing UNINState actually represents the PCI/AGP host bridge stage so
rename it accordingly.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Do this for both the uninorth main and uninorth u3 AGP buses, using the main
PCI bus for each machine (this ensures the IO addresses still match those
used by OpenBIOS).
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Now that the OpenPIC is wired up via the board, we can now remove our temporary
PIC qdev pointer property and replace it with an object link instead.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Instead wire up the PCI/AGP host bridges in mac_newworld.c. Now this is complete
it is possible to move the initialisation of the PCI hole alias into
pci_u3_agp_init().
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Instead wire up the PCI/AGP host bridges in mac_newworld.c. Now this is complete
it is possible to move the initialisation of the PCI hole alias into
pci_unin_main_init().
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Since the IO address space is fixed to use the standard system IO address
space then we can also use the opportunity to remove the address_space_io
parameter from pci_pmac_init() and pci_pmac_u3_init().
Note we also move the default mac99 PCI bus to the end of the initialisation
list so that it becomes the default destination for any devices specified
via -device without an explicit PCI bus provided.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Since the macio device has a link to the PIC device, we can now wire up the
IRQs directly via qdev GPIOs rather than having to use an intermediate array.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Introduce constants for the pre-defined Old World IRQs to help keep things
readable.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This simplifies the Old World machine to simply mapping the ISA memory region
into the main address space.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Instead wire up the grackle device inside the Mac Old World machine.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This is the first step towards removing the old-style pci_grackle_init()
function. Following on from the previous commit we can now pass the heathrow
device as an object link and wire up the heathrow IRQs via qdev GPIOs.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Instead wire up heathrow to the CPU and grackle PCI host using qdev GPIOs.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This is in preparation for moving the device wiring into the New World machine.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
After QOMification this is clearly no longer needed (and possibly hasn't been
for some time).
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Commit 593c181160: "PPC: Newworld: Add second uninorth control register set"
added a second set of uninorth registers at 0xf3000000.
Testing MacOS 9.2 to MacOS X 10.4 reveals no accesses to this address and I
can't find any reference to it in Apple's Core99.cpp source so I'm assuming
that this was the result of another bug that has now been fixed.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Create a new function serial_max_hds() which returns the number of
serial ports defined by the user. This is needed only by spapr.
This allows us to remove the MAX_SERIAL_PORTS define.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20180420145249.32435-14-peter.maydell@linaro.org
The ISA serial port handling in serial-isa.c imposes a limit
of 4 serial ports. This is because we only know of 4 IO port
and IRQ settings for them, and is unrelated to the generic
MAX_SERIAL_PORTS limit, though they happen to both be set at
4 currently.
Use a new MAX_ISA_SERIAL_PORTS wherever that is the correct
limit to be checking against.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20180420145249.32435-11-peter.maydell@linaro.org
Change all the uses of serial_hds[] to go via the new
serial_hd() function. Code change produced with:
find hw -name '*.[ch]' | xargs sed -i -e 's/serial_hds\[\([^]]*\)\]/serial_hd(\1)/g'
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-id: 20180420145249.32435-8-peter.maydell@linaro.org
We only emulate timer running at CPU frequency which is what most
guests expect so set the frequency to match real hardware. This also
allows setting clock multipliers which caused slowdown previously due
to wrong timer frequency.
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
At the moment the device tree produced by the H_CAS handler has no
reserved map initialized at all which is not correct as at least one
empty record is required to be present as a marker of the end.
This does not cause problems now as the only consumer is SLOF which
does not look at the reserved map area.
However when DTC's "Improve libfdt's memory safety" changeset hits
the QEMU upstream, there will be errors reported and crashes observed.
This fixes the problem by adding an empty entry to the reserved map,
just like create_device_tree() does already.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Make qmp_pc_dimm_device_list() return sorted by start address
list of devices so that it could be reused in places that
would need sorted list*. Reuse existing pc_dimm_built_list()
to get sorted list.
While at it hide recursive callbacks from callers, so that:
qmp_pc_dimm_device_list(qdev_get_machine(), &list);
could be replaced with simpler:
list = qmp_pc_dimm_device_list();
* follow up patch will use it in build_srat()
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au> for ppc part
Reviewed-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Using log unimp is more appropriate for these messages and this also
silences them by default so they won't clobber make check output when
tests are added for this board.
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
With the new "--nic" command line parameter option, the "old" way of
specifying a NIC model via the nd_table[] is becoming more prominent
again. But for the pseries "spapr-vlan" device, there is a confusing
discrepancy between the model name that is used for "--device" (i.e.
"spapr-vlan") and the model name that has to be used for "--net nic"
or the new "--nic" parameter (i.e. "ibmveth"). Since "spapr-vlan" is
the "real" name of the device, let's allow "spapr-vlan" to be used
as model name for the nd_table[] entries, too.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This patch moves the gap between u-boot and kernel at the correct location.
Signed-off-by: David Engraf <david.engraf@sysgo.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The global hack for creating SCSI devices has recently been removed,
but this apparently broke SCSI devices on some boards that were not
ready for this change yet. For the 40p machine you now get:
$ ppc64-softmmu/qemu-system-ppc64 -M 40p -cdrom x.iso
qemu-system-ppc64: -cdrom x.iso: machine type does not support if=scsi,bus=0,unit=2
Fix it by providing a lsi53c810_create() function that takes care
of calling scsi_bus_legacy_handle_cmdline() after creating the
corresponding SCSI controller.
Fixes: 1454509726
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* SCSI fix to pass maximum transfer size (Daniel Barboza)
* chardev fixes and improved iothread support (Daniel Berrangé, Peter)
* checkpatch tweak (Eric)
* make help tweak (Marc-André)
* make more PCI NICs available with -net or -nic (myself)
* change default q35 NIC to e1000e (myself)
* SCSI support for NDOB bit (myself)
* membarrier system call support (myself)
* SuperIO refactoring (Philippe)
* miscellaneous cleanups and fixes (Thomas)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJapqaMAAoJEL/70l94x66DQoUH/Rvg+a8giz/SrEA4P8D3Cb2z
4GNbNUUoy4oU0ltD5IAMskMwpOsvl1batE0D+pKIlfO9NV4+Cj2kpgo0p9TxoYqM
VCby3wRtx27zb5nVytC6M++iIKXmeEMqXmFw61I6umddNPSl4IR3hiHEE0DM+7dV
UPIOvJeEiazyQaw3Iw+ZctNn8dDBKc/+6oxP9xRcYTaZ6hB4G9RZkqGNNSLcJkk7
R0UotdjzIZhyWMOkjIwlpTF4sWv8gsYUV4bPYKMYho5B0Obda2dBM3I1kpA8yDa/
xZ5lheOaAVBZvM5aMIcaQPa65MO9hLyXFmhMOgyfpJhLBBz6Qpa4OLLI6DeTN+0=
=UAgA
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging
* Record-replay lockstep execution, log dumper and fixes (Alex, Pavel)
* SCSI fix to pass maximum transfer size (Daniel Barboza)
* chardev fixes and improved iothread support (Daniel Berrangé, Peter)
* checkpatch tweak (Eric)
* make help tweak (Marc-André)
* make more PCI NICs available with -net or -nic (myself)
* change default q35 NIC to e1000e (myself)
* SCSI support for NDOB bit (myself)
* membarrier system call support (myself)
* SuperIO refactoring (Philippe)
* miscellaneous cleanups and fixes (Thomas)
# gpg: Signature made Mon 12 Mar 2018 16:10:52 GMT
# gpg: using RSA key BFFBD25F78C7AE83
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>"
# gpg: aka "Paolo Bonzini <pbonzini@redhat.com>"
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1
# Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83
* remotes/bonzini/tags/for-upstream: (69 commits)
tcg: fix cpu_io_recompile
replay: update documentation
replay: save vmstate of the asynchronous events
replay: don't process async events when warping the clock
scripts/replay-dump.py: replay log dumper
replay: avoid recursive call of checkpoints
replay: check return values of fwrite
replay: push replay_mutex_lock up the call tree
replay: don't destroy mutex at exit
replay: make locking visible outside replay code
replay/replay-internal.c: track holding of replay_lock
replay/replay.c: bump REPLAY_VERSION again
replay: save prior value of the host clock
replay: added replay log format description
replay: fix save/load vm for non-empty queue
replay: fixed replay_enable_events
replay: fix processing async events
cpu-exec: fix exception_index handling
hw/i386/pc: Factor out the superio code
hw/alpha/dp264: Use the TYPE_SMC37C669_SUPERIO
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
# Conflicts:
# default-configs/i386-softmmu.mak
# default-configs/x86_64-softmmu.mak
This adds a possibility for the platform to tell VFIO not to emulate MSIX
so MMIO memory regions do not get split into chunks in flatview and
the entire page can be registered as a KVM memory slot and make direct
MMIO access possible for the guest.
This enables the entire MSIX BAR mapping to the guest for the pseries
platform in order to achieve the maximum MMIO preformance for certain
devices.
Tested on:
LSI Logic / Symbios Logic SAS3008 PCI-Express Fusion-MPT SAS-3 (rev 02)
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Since the PC87312 inherits this abstract model, we remove the I8042
instance in the PREP machine.
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20180308223946.26784-14-f4bug@amsat.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au> (hw/ppc)
Message-Id: <20180308223946.26784-6-f4bug@amsat.org>
Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au> (hw/ppc)
Message-Id: <20180308223946.26784-4-f4bug@amsat.org>
Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
After reviewing a patch from Philippe that removes block-backend.h
from hw/lm32/milkymist.c, I noticed that this header is included
unnecessarily in a lot of other files, too. Remove those unneeded
includes to speed up the compilation process a little bit.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <1518684912-31637-1-git-send-email-thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Remove the hard-coded list of PCI NIC names; instead, fill an array
using all PCI devices listed under DEVICE_CATEGORY_NETWORK. Keep
the old shortcut "virtio" for virtio-net-pci.
Suggested-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This patch fixes an incorrect behavior when the -kernel argument has been
specified without -bios. In this case the kernel was loaded twice. At address
32M as a raw image and afterwards by load_elf/load_uimage at the
corresponding load address. In this case the region for the device tree and
the raw kernel image may overlap.
The patch fixes the behavior by loading the kernel image once with
load_elf/load_uimage and skips loading the raw image.
When here do not use bios_name/size for the kernel and use a more generic
name called payload_name/size.
New in v3: dtb must be stored between kernel and initrd because Linux can
handle the dtb only within the first 64MB. Add a comment to
clarify the behavior.
Signed-off-by: David Engraf <david.engraf@sysgo.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Linux kernel commit 2a9d832cc9aae21ea827520fef635b6c49a06c6d
(of: Add bindings for chosen node, stdout-path) deprecated chosen property
"linux,stdout-path" and "stdout".
Introduce the new property "stdout-path" and continue supporting the older
property to remain compatible with existing/older firmware. This older property
can be deprecated after 5 years.
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The sxxm (speculative execution exploit mitigation) machine type is a
variant of the 2.12 machine type with workarounds for speculative
execution vulnerabilities enabled by default.
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Convert cap-ibs (indirect branch speculation) to a custom spapr-cap
type.
All tristate caps have now been converted to custom spapr-caps, so
remove the remaining support for them.
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
[dwg: Don't explicitly list "?"/help option, trust convention]
[dwg: Fold tristate removal into here, to not break bisect]
[dwg: Fix minor style problems]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
There are currently 2 implemented types of spapr-caps, boolean and
tristate. However there may be a need for caps which don't fit either of
these options. Add a custom capability type for which a list of custom
valid strings can be specified and implement the get/set functions for
these. Also add a field for help text to describe the available options.
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
[dwg: Change "help" option to "?" matching qemu conventions]
[dwg: Add ATTRIBUTE_UNUSED to avoid breaking bisect]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Move the remaining comment into macio.c for reference, then remove the
macio_init() function and instantiate the macio devices for both Old World
and New World machines via qdev_init_nofail() directly.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Also switch macio_newworld_realize() over to use it rather than using the pic_mem
memory region directly.
Now that both Old World and New World macio devices no longer make use of the
pic_mem memory region directly, we can remove it.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This is needed before the next patch because the target-dependent kvm stub
uses the existing kvm_openpic_connect_vcpu() declaration, making it impossible
to move the device-specific declarations into the same file without breaking
ppc-linux-user compilation.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Also switch macio_oldworld_realize() over to use it rather than using the pic_mem
memory region directly.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This enables the device to be made available during the setup of the Old World
machine. In order to pass back the previous set of IRQs we temporarily introduce
a new pic_irqs parameter until it can be removed.
An additional benefit of this change is that it is also possible to remove the
pic_mem pointer used for macio by accessing the memory region via sysbus.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Now that the ESCC device is instantiated directly via qdev, move it to within
the macio device and wire up the IRQs and memory regions using the sysbus API.
This enables to remove the now-obsolete escc_mem parameter to the macio_init()
function.
(Note this patch also contains small touch-ups to the formatting in
macio_escc_legacy_setup() and ppc_heathrow_init() in order to keep checkpatch
happy)
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
VSMT must be set in order to compute VCPU ids. This means that the
following functions must not be called before spapr_set_vsmt_mode()
was called:
- spapr_vcpu_id()
- spapr_is_thread0_in_vcore()
- xics_max_server_number()
We had a recent regression where the latter would be called before VSMT
was set, and broke migration of some old machine types. This patch
adds assert() in the above functions to avoid problems in the future.
Also, since VSMT is really a CPU related thing, spapr_set_vsmt_mode() is
now called from spapr_init_cpus(), just before the first VSMT user.
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Some older machine types create more ICPs than needed. We hence
need to register up to xics_max_server_number() dummy ICPs to
accomodate the migration of these machine types.
Recent VSMT rework changed xics_max_server_number() to return
DIV_ROUND_UP(max_cpus * spapr->vsmt, smp_threads)
instead of
DIV_ROUND_UP(max_cpus * kvmppc_smt_threads(), smp_threads);
The change is okay but it requires spapr->vsmt to be set, which
isn't the case with the current code. This causes the formula to
return zero and we don't create dummy ICPs. This breaks migration
of older guests as reported here:
https://bugzilla.redhat.com/show_bug.cgi?id=1549087
The dummy ICP workaround doesn't really have a dependency on XICS
itself. But it does depend on proper VCPU id numbering and it must
be applied before creating vCPUs (ie, creating real ICPs). So this
patch moves the workaround to spapr_init_cpus(), which already
assumes VSMT to be set.
Fixes: 72194664c8 ("spapr: use spapr->vsmt to compute VCPU ids")
Reported-by: Lukas Doktor <ldoktor@redhat.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Add emulation of aCube Sam460ex board based on AMCC 460EX embedded SoC.
This is not a complete implementation yet with a lot of components
still missing but enough for the U-Boot firmware to start and to boot
a Linux kernel or AROS.
Signed-off-by: François Revol <revol@free.fr>
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This is the PCIX controller found in newer 440 core SoCs e.g. the
AMMC 460EX. The device tree refers to this as plb-pcix compared to
the plb-pci controller in older 440 SoCs.
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
[dwg: Remove hwaddr from trace-events, that doesn't work with some
trace backends]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Commit 5d0fb1508e "spapr: consolidate the VCPU id numbering logic
in a single place" introduced a helper to detect thread0 of a virtual
core based on its VCPU id. This is used to create CPU core nodes in
the DT, but it is broken in TCG.
$ qemu-system-ppc64 -nographic -accel tcg -machine dumpdtb=dtb.bin \
-smp cores=16,maxcpus=16,threads=1
$ dtc -f -O dts dtb.bin | grep POWER8
PowerPC,POWER8@0 {
PowerPC,POWER8@8 {
instead of the expected 16 cores that we get with KVM:
$ dtc -f -O dts dtb.bin | grep POWER8
PowerPC,POWER8@0 {
PowerPC,POWER8@8 {
PowerPC,POWER8@10 {
PowerPC,POWER8@18 {
PowerPC,POWER8@20 {
PowerPC,POWER8@28 {
PowerPC,POWER8@30 {
PowerPC,POWER8@38 {
PowerPC,POWER8@40 {
PowerPC,POWER8@48 {
PowerPC,POWER8@50 {
PowerPC,POWER8@58 {
PowerPC,POWER8@60 {
PowerPC,POWER8@68 {
PowerPC,POWER8@70 {
PowerPC,POWER8@78 {
This happens because spapr_get_vcpu_id() maps VCPU ids to
cs->cpu_index in TCG mode. This confuses the code in
spapr_is_thread0_in_vcore(), since it assumes thread0 VCPU
ids to have a spapr->vsmt spacing.
spapr_get_vcpu_id(cpu) % spapr->vsmt == 0
Actually, there's no real reason to expose cs->cpu_index instead
of the VCPU id, since we also generate it with TCG. Also we already
set it explicitly in spapr_set_vcpu_id(), so there's no real reason
either to call kvm_arch_vcpu_id() with KVM.
This patch unifies spapr_get_vcpu_id() to always return the computed
VCPU id both in TCG and KVM. This is one step forward towards KVM<->TCG
migration.
Fixes: 5d0fb1508e
Reported-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The previous commit improved compile time by including less of the
generated QAPI headers. This is impossible for stuff defined directly
in qapi-schema.json, because that ends up in headers that that pull in
everything.
Move everything but include directives from qapi-schema.json to new
sub-module qapi/misc.json, then include just the "misc" shard where
possible.
It's possible everywhere, except:
* monitor.c needs qmp-command.h to get qmp_init_marshal()
* monitor.c, ui/vnc.c and the generated qapi-event-FOO.c need
qapi-event.h to get enum QAPIEvent
Perhaps we'll get rid of those some other day.
Adding a type to qapi/migration.json now recompiles some 120 instead
of 2300 out of 5100 objects.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20180211093607.27351-25-armbru@redhat.com>
[eblake: rebase to master]
Signed-off-by: Eric Blake <eblake@redhat.com>
In my "build everything" tree, a change to the types in
qapi-schema.json triggers a recompile of about 4800 out of 5100
objects.
The previous commit split up qmp-commands.h, qmp-event.h, qmp-visit.h,
qapi-types.h. Each of these headers still includes all its shards.
Reduce compile time by including just the shards we actually need.
To illustrate the benefits: adding a type to qapi/migration.json now
recompiles some 2300 instead of 4800 objects. The next commit will
improve it further.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20180211093607.27351-24-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
[eblake: rebase to master]
Signed-off-by: Eric Blake <eblake@redhat.com>
These devices are found in newer SoCs based on 440 core e.g. the 460EX
(http://www.embeddeddeveloper.com/assets/processors/amcc/datasheets/
PP460EX_DS2063.pdf)
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The spapr-cap cap-ibs can only have values broken or fixed as there is
no explicit workaround required. Currently setting the value workaround
for this cap will hit an assert if the guest makes the hcall
h_get_cpu_characteristics.
Report an error when attempting to apply the setting with a more helpful
error message.
Reported-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Several places in the code need to calculate a VCPU id:
(cpu_index / smp_threads) * spapr->vsmt + cpu_index % smp_threads
(core_id / smp_threads) * spapr->vsmt (1 user)
index * spapr->vsmt (2 users)
or guess that the VCPU id of a given VCPU is the first thread of a virtual
core:
index % spapr->vsmt != 0
Even if the numbering logic isn't that complex, it is rather fragile to
have these assumptions open-coded in several places. FWIW this was
proved with recent issues related to VSMT.
This patch moves the VCPU id formula to a single function to be called
everywhere the code needs to compute one. It also adds an helper to
guess if a VCPU is the first thread of a VCORE.
Signed-off-by: Greg Kurz <groug@kaod.org>
[dwg: Rename spapr_is_vcore() to spapr_is_thread0_in_vcore() for clarity]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The spapr_vcpu_id() function is an accessor actually. Let's rename it
for symmetry with the recently added spapr_set_vcpu_id() helper.
The motivation behind this is that a later patch will consolidate
the VCPU id formula in a function and spapr_vcpu_id looks like an
appropriate name.
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The VCPU ids are currently computed and assigned to each individual
CPU threads in spapr_cpu_core_realize(). But the numbering logic
of VCPU ids is actually a machine-level concept, and many places
in hw/ppc/spapr.c also have to compute VCPU ids out of CPU indexes.
The current formula used in spapr_cpu_core_realize() is:
vcpu_id = (cc->core_id * spapr->vsmt / smp_threads) + i
where:
cc->core_id is a multiple of smp_threads
cpu_index = cc->core_id + i
0 <= i < smp_threads
So we have:
cpu_index % smp_threads == i
cc->core_id / smp_threads == cpu_index / smp_threads
hence:
vcpu_id =
(cpu_index / smp_threads) * spapr->vsmt + cpu_index % smp_threads;
This formula was used before VSMT at the time VCPU ids where computed
at the target emulation level. It has the advantage of being useable
to derive a VPCU id out of a CPU index only. It is fitted for all the
places where the machine code has to compute a VCPU id.
This patch introduces an accessor to set the VCPU id in a PowerPCCPU object
using the above formula. It is a first step to consolidate all the VCPU id
logic in a single place.
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Since the introduction of VSMT in 2.11, the spacing of VCPU ids
between cores is controllable through a machine property instead
of being only dictated by the SMT mode of the host:
cpu->vcpu_id = (cc->core_id * spapr->vsmt / smp_threads) + i
Until recently, the machine code would try to change the SMT mode
of the host to be equal to VSMT or exit. This allowed the rest of
the code to assume that kvmppc_smt_threads() == spapr->vsmt is
always true.
Recent commit "8904e5a75005 spapr: Adjust default VSMT value for
better migration compatibility" relaxed the rule. If the VSMT
mode cannot be set in KVM for some reasons, but the requested
CPU topology is compatible with the current SMT mode, then we
let the guest run with kvmppc_smt_threads() != spapr->vsmt.
This breaks quite a few places in the code, in particular when
calculating DRC indexes.
This is what happens on a POWER host with subcores-per-core=2 (ie,
supports up to SMT4) when passing the following topology:
-smp threads=4,maxcpus=16 \
-device host-spapr-cpu-core,core-id=4,id=core1 \
-device host-spapr-cpu-core,core-id=8,id=core2
qemu-system-ppc64: warning: Failed to set KVM's VSMT mode to 8 (errno -22)
This is expected since KVM is limited to SMT4, but the guest is started
anyway because this topology can run on SMT4 even with a VSMT8 spacing.
But when we look at the DT, things get nastier:
cpus {
...
ibm,drc-indexes = <0x4 0x10000000 0x10000004 0x10000008 0x1000000c>;
This means that we have the following association:
CPU core device | DRC | VCPU id
-----------------+------------+---------
boot core | 0x10000000 | 0
core1 | 0x10000004 | 4
core2 | 0x10000008 | 8
core3 | 0x1000000c | 12
But since the spacing of VCPU ids is 8, the DRC for core1 points to a
VCPU that doesn't exist, the DRC for core2 points to the first VCPU of
core1 and and so on...
...
PowerPC,POWER8@0 {
...
ibm,my-drc-index = <0x10000000>;
...
};
PowerPC,POWER8@8 {
...
ibm,my-drc-index = <0x10000008>;
...
};
PowerPC,POWER8@10 {
...
No ibm,my-drc-index property for this core since 0x10000010 doesn't
exist in ibm,drc-indexes above.
...
};
};
...
interrupt-controller {
...
ibm,interrupt-server-ranges = <0x0 0x10>;
With a spacing of 8, the highest VCPU id for the given topology should be:
16 * 8 / 4 = 32 and not 16
...
linux,phandle = <0x7e7323b8>;
interrupt-controller;
};
And CPU hot-plug/unplug is broken:
(qemu) device_del core1
pseries-hotplug-cpu: Cannot find CPU (drc index 10000004) to remove
(qemu) device_del core2
cpu 4 (hwid 8) Ready to die...
cpu 5 (hwid 9) Ready to die...
cpu 6 (hwid 10) Ready to die...
cpu 7 (hwid 11) Ready to die...
These are the VCPU ids of core1 actually
(qemu) device_add host-spapr-cpu-core,core-id=12,id=core3
(qemu) device_del core3
pseries-hotplug-cpu: Cannot find CPU (drc index 1000000c) to remove
This patches all the code in hw/ppc/spapr.c to assume the VSMT
spacing when manipulating VCPU ids.
Fixes: 8904e5a750
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Change the macro that generates the vmstate migration field and the needed
function for the spapr-caps to take the full spapr-cap name. This has
the benefit of meaning this instance will be picked up when greping
for the spapr-caps and making it more obvious what this macro is doing.
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Move necessary stuff in escc.h and update type names.
Remove slavio_serial_ms_kbd_init().
Fix code style problems reported by checkpatch.pl
Update mac_newworld, mac_oldworld and sun4m to use directly the
QDEV interface.
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Newer kernels have a htab resize capability when adding or remove
memory. At these situations, the guest kernel might reallocate its
htab to a more suitable size based on the resulting memory.
However, we're not setting the new value back into the machine state
when a KVM guest resizes its htab. At first this doesn't seem harmful,
but when migrating or saving the guest state (via virsh managedsave,
for instance) this mismatch between the htab size of QEMU and the
kernel makes the guest hangs when trying to load its state.
Inside h_resize_hpt_commit, the hypercall that commits the hash page
resize changes, let's set spapr->htab_shift to the new value if we're
sure that kvmppc_resize_hpt_commit were successful.
While we're here, add a "not RADIX" sanity check as it is already done
in the related hypercall h_resize_hpt_prepare.
Fixes: https://github.com/open-power-host-os/qemu/issues/28
Reported-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Add the relevant hooks as required for the MacOS timer calibration and delayed
SR interrupt.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This allows us to more easily differentiate between the timebase frequency used
to calibrate the MacOS timers and the actual frequency of the hardware clock as
indicated by CUDA_TIMER_FREQ.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
[dwg: Revert some extraneous changes which break compile]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
We ignore silently the value of smp_threads when we set
the default VSMT value, and if smp_threads is greater than VSMT
kernel is going into trouble later.
Fixes: 8904e5a750
("spapr: Adjust default VSMT value for better migration compatibility")
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Commit bcb5ce08cf ("spapr: Rename machine init functions for clarity")
renamed ppc_spapr_reset to spapr_machine_reset and ppc_spapr_init
to spapr_machine_init. Let's also rename the references in
comments.
Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Detected by Coverity (CID 1385702). This fixes the recently added hypercall
to let guests properly apply Spectre and Meltdown workarounds.
Fixes: c59704b254 "target/ppc/spapr: Add H-Call H_GET_CPU_CHARACTERISTICS"
Reported-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
qemu-common.h includes qemu/option.h, but most places that include the
former don't actually need the latter. Drop the include, and add it
to the places that actually need it.
While there, drop superfluous includes of both headers, and
separate #include from file comment with a blank line.
This cleanup makes the number of objects depending on qemu/option.h
drop from 4545 (out of 4743) to 284 in my "build everything" tree.
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20180201111846.21846-20-armbru@redhat.com>
[Semantic conflict with commit bdd6a90a9e in block/nvme.c resolved]
The macro expansions of qdict_put_TYPE() and qlist_append_TYPE() need
qbool.h, qnull.h, qnum.h and qstring.h to compile. We include qnull.h
and qnum.h in the headers, but not qbool.h and qstring.h. Works,
because we include those wherever the macros get used.
Open-coding these helpers is of dubious value. Turn them into
functions and drop the includes from the headers.
This cleanup makes the number of objects depending on qapi/qmp/qnum.h
from 4551 (out of 4743) to 46 in my "build everything" tree. For
qapi/qmp/qnull.h, the number drops from 4552 to 21.
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20180201111846.21846-10-armbru@redhat.com>
This cleanup makes the number of objects depending on qapi/error.h
drop from 1910 (out of 4743) to 1612 in my "build everything" tree.
While there, separate #include from file comment with a blank line,
and drop a useless comment on why qemu/osdep.h is included first.
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20180201111846.21846-5-armbru@redhat.com>
[Semantic conflict with commit 34e304e975 resolved, OSX breakage fixed]
In order to enable TCE operations support in KVM, we have to inform
the KVM about VFIO groups being attached to specific LIOBNs;
the necessary bits are implemented already by IOMMU MR and VFIO.
This defines get_attr() for the SPAPR TCE IOMMU MR which makes VFIO
call the KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE ioctl and establish
LIOBN-to-IOMMU link.
This changes spapr_tce_set_need_vfio() to avoid TCE table reallocation
if the kernel supports the TCE acceleration.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
[aw - remove unnecessary sys/ioctl.h include]
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Replace a large number of the fprintf(stderr, "*\n" calls with
error_report(). The functions were renamed with these commands and then
compiler issues where manually fixed.
find ./* -type f -exec sed -i \
'N;N;N;N;N;N;N;N;N;N;N;N; {s|fprintf(stderr, "\(.*\)\\n"\(.*\));|error_report("\1"\2);|Ig}' \
{} +
find ./* -type f -exec sed -i \
'N;N;N;N;N;N;N;N;N;N;N; {s|fprintf(stderr, "\(.*\)\\n"\(.*\));|error_report("\1"\2);|Ig}' \
{} +
find ./* -type f -exec sed -i \
'N;N;N;N;N;N;N;N;N; {s|fprintf(stderr, "\(.*\)\\n"\(.*\));|error_report("\1"\2);|Ig}' \
{} +
find ./* -type f -exec sed -i \
'N;N;N;N;N;N;N;N; {s|fprintf(stderr, "\(.*\)\\n"\(.*\));|error_report("\1"\2);|Ig}' \
{} +
find ./* -type f -exec sed -i \
'N;N;N;N;N;N;N; {s|fprintf(stderr, "\(.*\)\\n"\(.*\));|error_report("\1"\2);|Ig}' \
{} +
find ./* -type f -exec sed -i \
'N;N;N;N;N;N; {s|fprintf(stderr, "\(.*\)\\n"\(.*\));|error_report("\1"\2);|Ig}' \
{} +
find ./* -type f -exec sed -i \
'N;N;N;N;N; {s|fprintf(stderr, "\(.*\)\\n"\(.*\));|error_report("\1"\2);|Ig}' \
{} +
find ./* -type f -exec sed -i \
'N;N;N;N; {s|fprintf(stderr, "\(.*\)\\n"\(.*\));|error_report("\1"\2);|Ig}' \
{} +
find ./* -type f -exec sed -i \
'N;N;N; {s|fprintf(stderr, "\(.*\)\\n"\(.*\));|error_report("\1"\2);|Ig}' \
{} +
find ./* -type f -exec sed -i \
'N;N; {s|fprintf(stderr, "\(.*\)\\n"\(.*\));|error_report("\1"\2);|Ig}' \
{} +
find ./* -type f -exec sed -i \
'N; {s|fprintf(stderr, "\(.*\)\\n"\(.*\));|error_report("\1"\2);|Ig}' \
{} +
Some lines were then manually tweaked to pass checkpatch and some curly
braces were added to match QEMU style.
Signed-off-by: Alistair Francis <alistair.francis@xilinx.com>
Cc: qemu-ppc@nongnu.org
Conversions that aren't followed by exit() dropped, because they might
be inappropriate.
Also trim trailing punctuation from error messages.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20180203084315.20497-10-armbru@redhat.com>
The new H-Call H_GET_CPU_CHARACTERISTICS is used by the guest to query
behaviours and available characteristics of the cpu.
Implement the handler for this new H-Call which formulates its response
based on the setting of the spapr_caps cap-cfpc, cap-sbbc and cap-ibs.
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Add new tristate cap cap-ibs to represent the indirect branch
serialisation capability.
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Add new tristate cap cap-sbbc to represent the speculation barrier
bounds checking capability.
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Add new tristate cap cap-cfpc to represent the cache flush on privilege
change capability.
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
spapr_caps are used to represent the level of support for various
capabilities related to the spapr machine type. Currently there is
only support for boolean capabilities.
Add support for tristate capabilities by implementing their get/set
functions. These capabilities can have the values 0, 1 or 2
corresponding to broken, workaround and fixed.
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
In various place we don't correctly check if the device supports MSI or
MSI-X. This can cause devices to be advertised with MSI support, even
if they only support MSI-X (like virtio-pci-* devices for example):
ethernet@0 {
ibm,req#msi = <0x1>; <--- wrong!
.
ibm,loc-code = "qemu_virtio-net-pci:0000:00:00.0";
.
ibm,req#msi-x = <0x3>;
};
Worse, this can also cause the "ibm,change-msi" RTAS call to corrupt the
PCI status and cause migration to fail:
qemu-system-ppc64: get_pci_config_device: Bad config data: i=0x6
read: 0 device: 10 cmask: 10 wmask: 0 w1cmask:0
^^
PCI_STATUS_CAP_LIST bit which is assumed to be constant
This patch changes spapr_populate_pci_child_dt() to properly check for
MSI support using msi_present(): this ensures that PCIDevice::msi_cap
was set by msi_init() and that msi_nr_vectors_allocated() will look at
the right place in the config space.
Checking PCIDevice::msix_entries_nr is enough for MSI-X but let's add
a call to msix_present() there as well for consistency.
It also changes rtas_ibm_change_msi() to select the appropriate MSI
type in Function 1 instead of always selecting plain MSI. This new
behaviour is compliant with LoPAPR 1.1, as described in "Table 71.
ibm,change-msi Argument Call Buffer":
Function 1: If Number Outputs is equal to 3, request to set to a new
number of MSIs (including set to 0).
If the “ibm,change-msix-capable” property exists and Number
Outputs is equal to 4, request is to set to a new number of
MSI or MSI-X (platform choice) interrupts (including set to
0).
Since MSI is the the platform default (LoPAPR 6.2.3 MSI Option), let's
check for MSI support first.
And finally, it checks the input parameters are valid, as described in
LoPAPR 1.1 "R1–7.3.10.5.1–3":
For the MSI option: The platform must return a Status of -3 (Parameter
error) from ibm,change-msi, with no change in interrupt assignments if
the PCI configuration address does not support MSI and Function 3 was
requested (that is, the “ibm,req#msi” property must exist for the PCI
configuration address in order to use Function 3), or does not support
MSI-X and Function 4 is requested (that is, the “ibm,req#msi-x” property
must exist for the PCI configuration address in order to use Function 4),
or if neither MSIs nor MSI-Xs are supported and Function 1 is requested.
This ensures that the ret_intr_type variable contains a valid MSI type
for this device, and that spapr_msi_setmsg() won't corrupt the PCI status.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
qemu-system-ppcemb has been once split of qemu-system-ppc to support
CPU page sizes < 4096 for some of the embedded 4xx PowerPC CPUs.
However, there was hardly any OS available in the wild that really
used such small page sizes (Linux uses 4096 on PPC), so there is
no known recent use case for this separate build anymore. It's
rather cumbersome to maintain a separate set of config switches for
this, and it's wasting compile and test time of all the developers
who have to build all QEMU targets to verify that their changes did
not break anything.
Except for the small CPU page sizes, qemu-system-ppc can be used as
a full replacement for qemu-system-ppcemb since it contains all the
embedded 4xx PPC boards and CPUs, too. Thus let's start the deprecation
process for qemu-system-ppcemb to see whether somebody still needs
the small page sizes or whether we could finally remove this unloved
separate build.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The vmstate description and the contained needed function for migration
of spapr_caps is the same for each cap, with the name of the cap
substituted. As such introduce a macro to allow for easier generation of
these.
Convert the three existing spapr_caps (htm, vsx, and dfp) to use this
macro.
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Commit 51f84465dd changed the compatility mode setting logic:
- machine reset only sets compatibility mode for the boot CPU
- compatibility mode is set for other CPUs when they are put online
by the guest with the "start-cpu" RTAS call
This causes a regression for machines started with max-compat-cpu:
the device tree nodes related to secondary CPU cores contain wrong
"cpu-version" and "ibm,pa-features" values, as shown below.
Guest started on a POWER8 host with:
-smp cores=2 -machine pseries,max-cpu-compat=compat7
ibm,pa-features = [18 00 f6 3f c7 c0 80 f0 80 00
00 00 00 00 00 00 00 00 80 00 80 00 80 00 00 00];
cpu-version = <0x4d0200>;
^^^
second CPU core
ibm,pa-features = <0x600f63f 0xc70080c0>;
cpu-version = <0xf000003>;
^^^
boot CPU core
The second core is advertised in raw POWER8 mode. This happens because
CAS assumes all CPUs to have the same compatibility mode. Since the
boot CPU already has the requested compatibility mode, the CAS code
does not set it for the secondary one, and exposes the bogus device
tree properties in in the CAS response to the guest.
A similar situation is observed when hot-plugging a CPU core. The
related device tree properties are generated and exposed to guest
with the "ibm,configure-connector" RTAS before "start-cpu" is called.
The CPU core is advertised to the guest in raw mode as well.
It both cases, it boils down to the fact that "start-cpu" happens too
late. This can be fixed globally by propagating the compatibility mode
of the boot CPU to the other CPUs during reset. For this to work, the
compatibility mode of the boot CPU must be set before the machine code
actually resets all CPUs.
It is not needed to set the compatibility mode in "start-cpu" anymore,
so the code is dropped.
Fixes: 51f84465dd
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
A variable is already defined at the begining of the function to
hold a pointer to the CPU core object:
sPAPRCPUCore *core = SPAPR_CPU_CORE(OBJECT(dev));
No need to define it again in the pre-2.10 compatibility code snipplet.
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
We've got the config switch CONFIG_PPC4XX, so we should use it
in the Makefile accordingly and only include the PPC4xx boards
if this switch has been enabled. (Note: Unfortunately, the files
ppc4xx_devs.c and ppc405_uc.c still have to be included in the
build anyway to fulfil some complicated linker dependencies ...
so these are subject to a more thourough clean-up later)
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Remove dependency of possible_cpus on 1st CPU instance,
which decouples configuration data from CPU instances that
are created using that data.
Also later it would be used for enabling early cpu to numa node
configuration at runtime qmp_query_hotpluggable_cpus() should
provide a list of available cpu slots at early stage,
before machine_init() is called and the 1st cpu is created,
so that mgmt might be able to call it and use output to set
numa mapping.
Use MachineClass::possible_cpu_arch_ids() callback to set
cpu type info, along with the rest of possible cpu properties,
to let machine define which cpu type* will be used.
* for SPAPR it will be a spapr core type and for ARM/s390x/x86
a respective descendant of CPUClass.
Move parse_numa_opts() in vl.c after cpu_model is parsed into
cpu_type so that possible_cpu_arch_ids() would know which
cpu_type to use during layout initialization.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <1515597770-268979-1-git-send-email-imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
TYPE_SPAPR_PCI_HOST_BRIDGE is the only dynamic sysbus device not
rejected by ppc_spapr_reset(), so it can be the only entry on the
allowed list.
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Alexander Graf <agraf@suse.de>
Cc: qemu-ppc@nongnu.org
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20171125151610.20547-5-ehabkost@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
platform_bus_create_devtree() already rejects all dynamic sysbus
devices except TYPE_ETSEC_COMMON, so register it as the only
allowed dynamic sysbus device for the ppce500 machine-type.
Cc: Alexander Graf <agraf@suse.de>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: qemu-ppc@nongnu.org
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20171125151610.20547-4-ehabkost@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
The existing has_dynamic_sysbus flag makes the machine accept
every user-creatable sysbus device type on the command-line.
Replace it with a list of allowed device types, so machines can
easily accept some sysbus devices while rejecting others.
To keep exactly the same behavior as before, the existing
has_dynamic_sysbus=true assignments are replaced with a
TYPE_SYS_BUS_DEVICE entry on the allowed list. Other patches
will replace the TYPE_SYS_BUS_DEVICE entries with more specific
lists of devices.
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Marcel Apfelbaum <marcel@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Alexander Graf <agraf@suse.de>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Anthony Perard <anthony.perard@citrix.com>
Cc: qemu-arm@nongnu.org
Cc: qemu-ppc@nongnu.org
Cc: xen-devel@lists.xenproject.org
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20171125151610.20547-2-ehabkost@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
When skiboot starts, it first clears the CPU structs for all possible
CPUs on a system :
for (i = 0; i <= cpu_max_pir; i++)
memset(&cpu_stacks[i].cpu, 0, sizeof(struct cpu_thread));
On POWER9, cpu_max_pir is quite big, 0x7fff, and the skiboot cpu_stacks
array overlaps with the memory region in which QEMU maps the initramfs
file. Move it upwards in memory to keep it safe.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The XSCOM base address of the core chiplet was wrongly calculated. Use
the OPAL macros to fix that and do a couple of renames.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
These are useful when instantiating device models which are shared
between the POWER8 and the POWER9 processor families.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
When addressed by XSCOM, the first core has the 0x20 chiplet ID but
the CPU PIR can start at 0x0.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
commit 1ed9c8af50 ("target/ppc: Add POWER9 DD2.0 model information")
deprecated the POWER9 model v1.0.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
fa98fbfc "PC: KVM: Support machine option to set VSMT mode" introduced the
"vsmt" parameter for the pseries machine type, which controls the spacing
of the vcpu ids of thread 0 for each virtual core. This was done to bring
some consistency and stability to how that was done, while still allowing
backwards compatibility for migration and otherwise.
The default value we used for vsmt was set to the max of the host's
advertised default number of threads and the number of vthreads per vcore
in the guest. This was done to continue running without extra parameters
on older KVM versions which don't allow the VSMT value to be changed.
Unfortunately, even that smaller than before leakage of host configuration
into guest visible configuration still breaks things. Specifically a guest
with 4 (or less) vthread/vcore will get a different vsmt value when
running on a POWER8 (vsmt==8) and POWER9 (vsmt==4) host. That means the
vcpu ids don't line up so you can't migrate between them, though you should
be able to.
Long term we really want to make vsmt == smp_threads for sufficiently
new machine types. However, that means that qemu will then require a
sufficiently recent KVM (one which supports changing VSMT) - that's still
not widely enough deployed to be really comfortable to do.
In the meantime we need some default that will work as often as
possible. This patch changes that default to 8 in all circumstances.
This does change guest visible behaviour (including for existing
machine versions) for many cases - just not the most common/important
case.
Following is case by case justification for why this is still the least
worst option. Note that any of the old behaviours can still be duplicated
after this patch, it's just that it requires manual intervention by
setting the vsmt property on the command line.
KVM HV on POWER8 host:
This is the overwhelmingly common case in production setups, and is
unchanged by design. POWER8 hosts will advertise a default VSMT mode
of 8, and > 8 vthreads/vcore isn't permitted
KVM HV on POWER7 host:
Will break, but POWER7s allowing KVM were never released to the public.
KVM HV on POWER9 host:
Not yet released to the public, breaking this now will reduce other
breakage later.
KVM HV on PowerPC 970:
Will theoretically break it, but it was barely supported to begin with
and already required various user visible hacks to work. Also so old
that I just don't care.
TCG:
This is the nastiest one; it means migration of TCG guests (without
manual vsmt setting) will break. Since TCG is rarely used in production
I think this is worth it for the other benefits. It does also remove
one more barrier to TCG<->KVM migration which could be interesting for
debugging applications.
KVM PR:
As with TCG, this will break migration of existing configurations,
without adding extra manual vsmt options. As with TCG, it is rare in
production so I think the benefits outweigh breakages.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Jose Ricardo Ziviani <joserz@linux.vnet.ibm.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
At present if we require a vsmt mode that's not equal to the kernel's
default, and the kernel doesn't let us change it (e.g. because it's an old
kernel without support) then we always fail.
But in fact we can cope with the kernel having a different vsmt as long as
a) it's >= the actual number of vthreads/vcore (so that guest threads
that are supposed to be on the same core act like it)
b) it's a submultiple of the requested vsmt mode (so that guest threads
spaced by the vsmt value will act like they're on different cores)
Allowing this case gives us a bit more freedom to adjust the vsmt behaviour
without breaking existing cases.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Tested-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
We recently had some discussions that were sidetracked for a while, because
nearly everyone misapprehended the purpose of the 'max_threads' field in
the compatiblity modes table. It's all about guest expectations, not host
expectations or support (that's handled elsewhere).
In an attempt to avoid a repeat of that confusion, rename the field to
'max_vthreads' and add an explanatory comment.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Jose Ricardo Ziviani <joserz@linux.vnet.ibm.com>
The options field here is intended to list the available values for the
capability. It's not used yet, because the existing capabilities are
boolean.
We're going to add capabilities that aren't, but in that case the info on
the possible values can be folded into the .description field.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Currently spapr_caps are tied to boolean values (on or off). This patch
reworks the caps so that they can have any uint8 value. This allows more
capabilities with various values to be represented in the same way
internally. Capabilities are numbered in ascending order. The internal
representation of capability values is an array of uint8s in the
sPAPRMachineState, indexed by capability number.
Capabilities can have their own name, description, options, getter and
setter functions, type and allow functions. They also each have their own
section in the migration stream. Capabilities are only migrated if they
were explictly set on the command line, with the assumption that
otherwise the default will match.
On migration we ensure that the capability value on the destination
is greater than or equal to the capability value from the source. So
long at this remains the case then the migration is considered
compatible and allowed to continue.
This patch implements generic getter and setter functions for boolean
capabilities. It also converts the existings cap-htm, cap-vsx and
cap-dfp capabilities to this new format.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Decimal Floating Point has been available on POWER7 and later (server)
cpus. However, it can be disabled on the hypervisor, meaning that it's
not available to guests.
We currently handle this by conditionally advertising DFP support in the
device tree depending on whether the guest CPU model supports it - which
can also depend on what's allowed in the host for -cpu host. That can lead
to confusion on migration, since host properties are silently affecting
guest visible properties.
This patch handles it by treating it as an optional capability for the
pseries machine type.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
We currently have some conditionals in the spapr device tree code to decide
whether or not to advertise the availability of the VMX (aka Altivec) and
VSX vector extensions to the guest, based on whether the guest cpu has
those features.
This can lead to confusion and subtle failures on migration, since it makes
a guest visible change based only on host capabilities. We now have a
better mechanism for this, in spapr capabilities flags, which explicitly
depend on user options rather than host capabilities.
Rework the advertisement of VSX and VMX based on a new VSX capability. We
no longer bother with a conditional for VMX support, because every CPU
that's ever been supported by the pseries machine type supports VMX.
NOTE: Some userspace distributions (e.g. RHEL7.4) already rely on
availability of VSX in libc, so using cap-vsx=off may lead to a fatal
SIGILL in init.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Now that the "pseries" machine type implements optional capabilities (well,
one so far) there's the possibility of having different capabilities
available at either end of a migration. Although arguably a user error,
it would be nice to catch this situation and fail as gracefully as we can.
This adds code to migrate the capabilities flags. These aren't pulled
directly into the destination's configuration since what the user has
specified on the destination command line should take precedence. However,
they are checked against the destination capabilities.
If the source was using a capability which is absent on the destination,
we fail the migration, since that could easily cause a guest crash or other
bad behaviour. If the source lacked a capability which is present on the
destination we warn, but allow the migration to proceed.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
This adds an spapr capability bit for Hardware Transactional Memory. It is
enabled by default for pseries-2.11 and earlier machine types. with POWER8
or later CPUs (as it must be, since earlier qemu versions would implicitly
allow it). However it is disabled by default for the latest pseries-2.12
machine type.
This means that with the latest machine type, HTM will not be available,
regardless of CPU, unless it is explicitly enabled on the command line.
That change is made on the basis that:
* This way running with -M pseries,accel=tcg will start with whatever cpu
and will provide the same guest visible model as with accel=kvm.
- More specifically, this means existing make check tests don't have
to be modified to use cap-htm=off in order to run with TCG
* We hope to add a new "HTM without suspend" feature in the not too
distant future which could work on both POWER8 and POWER9 cpus, and
could be enabled by default.
* Best guesses suggest that future POWER cpus may well only support the
HTM-without-suspend model, not the (frankly, horribly overcomplicated)
POWER8 style HTM with suspend.
* Anecdotal evidence suggests problems with HTM being enabled when it
wasn't wanted are more common than being missing when it was.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Because PAPR is a paravirtual environment access to certain CPU (or other)
facilities can be blocked by the hypervisor. PAPR provides ways to
advertise in the device tree whether or not those features are available to
the guest.
In some places we automatically determine whether to make a feature
available based on whether our host can support it, in most cases this is
based on limitations in the available KVM implementation.
Although we correctly advertise this to the guest, it means that host
factors might make changes to the guest visible environment which is bad:
as well as generaly reducing reproducibility, it means that a migration
between different host environments can easily go bad.
We've mostly gotten away with it because the environments considered mature
enough to be well supported (basically, KVM on POWER8) have had consistent
feature availability. But, it's still not right and some limitations on
POWER9 is going to make it more of an issue in future.
This introduces an infrastructure for defining "sPAPR capabilities". These
are set by default based on the machine version, masked by the capabilities
of the chosen cpu, but can be overriden with machine properties.
The intention is at reset time we verify that the requested capabilities
can be supported on the host (considering TCG, KVM and/or host cpu
limitations). If not we simply fail, rather than silently modifying the
advertised featureset to the guest.
This does mean that certain configurations that "worked" may now fail, but
such configurations were already more subtly broken.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Currently the pseries machine sets the compatibility mode for the
guest's cpus in two places: 1) at machine reset and 2) after CAS
negotiation.
This means that if we set or negotiate a compatiblity mode, then
hotplug a cpu, the hotplugged cpu doesn't get the right mode set and
will incorrectly have the full native features.
To correct this, we set the compatibility mode on a cpu when it is
brought online with the 'start-cpu' RTAS call. Given that we no
longer need to set the compatibility mode on all CPUs at machine
reset, so we change that to only set the mode for the boot cpu.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reported-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Tested-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
It's a deprecated dummy device since QEMU v2.6.0. That should have
been enough time to allow the users to update their scripts in case
they still use it, so let's remove this legacy code now.
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Also introduce utilities to manipulate bitmasks (originaly from OPAL)
which be will be used in the model of the XIVE interrupt controller.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The 'pnv' prefix is now used for all and the routines populating the
device tree start with 'pnv_dt'. The handler of the PnvXScomInterface
is also renamed to 'dt_xscom' which should reflect that it is
populating the device tree under the 'xscom@' node of the chip.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
These two are definitely warnings. Let's use the appropriate API.
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
applied using ./scripts/clean-includes
not needed since 7ebaf79556
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
applied using ./scripts/clean-includes
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
if KVM is enabled and KVM capabilities MMU radix is available,
the partition table entry (patb_entry) for the radix mode is
initialized by default in ppc_spapr_reset().
It's a problem if we want to migrate the guest to a POWER8 host
while the kernel is not started to set the value to the one
expected for a POWER8 CPU.
The "-machine max-cpu-compat=power8" should allow to migrate
a POWER9 KVM host to a POWER8 KVM host, but because patb_entry
is set, the destination QEMU tries to enable radix mode on the
POWER8 host. This fails and cancels the migration:
Process table config unsupported by the host
error while loading state for instance 0x0 of device 'spapr'
load of migration failed: Invalid argument
This patch doesn't set the PATB entry if the user provides
a CPU compatibility mode that doesn't support radix mode.
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
We conditionally adjust part of the guest device tree based on the
global msi_nonbroken flag. However, the main machine type code
initializes msi_nonbroken to true and there's nothing that would set
it to false again.
So replace the test with an assert().
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Machine objects have two init functions - the generic QOM level
instance_init which should only do static object initialization, and
the Machine specific MachineClass::init which does the actual
construction of the machine.
In spapr the functions implementing these two have names -
ppc_machine_initfn() and ppc_spapr_init() - which don't correspond closely
to either of those. To prevent people (read, me) from confusing which is
which, rename them spapr_instance_init() and spapr_machine_init() to
make it clearer which is which.
While we're there rename ppc_spapr_reset() to spapr_machine_reset() to
match.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
According to LoPAPR 1.1 B.6.12, the "/event-sources" node has an "interrupt-
ranges" property, the format of which is described in B.6.9.1.2 as follows:
“interrupt-ranges”
Standard property name that defines the interrupt number(s) and range(s)
handled by this unit.
prop-encoded-array: List of (int-number, range) specifications.
Int-number is encoded as with encode-int.
Range is encoded as with encode-int.
The first entry in this list shall contain the int-number associated with
the first “reg” property entry. The int-num-ber is the value representing
the interrupt source as would appear in the PowerPC External Interrupt
Architecture XISR. The range shall be the number of sequential interrupt
numbers which this unit can generate.
There's no such thing as a cell count at the end of the array, like the
one introduced by commit ffbb1705a3 in QEMU 2.8. It doesn't seem it had
any impact on existing guests and I couldn't find any related workaround
in linux. So, let's just drop the bogus lines.
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
LoPAPR 1.1 B.6.9.1.2 describes the "#interrupt-cells" property of the
PowerPC External Interrupt Source Controller node as follows:
“#interrupt-cells”
Standard property name to define the number of cells in an interrupt-
specifier within an interrupt domain.
prop-encoded-array: An integer, encoded as with encode-int, that denotes
the number of cells required to represent an interrupt specifier in its
child nodes.
The value of this property for the PowerPC External Interrupt option shall
be 2. Thus all interrupt specifiers (as used in the standard “interrupts”
property) shall consist of two cells, each containing an integer encoded
as with encode-int. The first integer represents the interrupt number the
second integer is the trigger code: 0 for edge triggered, 1 for level
triggered.
This patch fixes the interrupt specifiers in the "interrupt-map" property
of the PHB node, that were setting the second cell to 8 (confusion with
IRQ_TYPE_LEVEL_LOW ?) instead of 1.
VIO devices and RTAS event sources use the same format for interrupt
specifiers: while here, we introduce a common helper to handle the
encoding details.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Tested-by: Cédric Le Goater <clg@kaod.org>
--
v3: - reference public LoPAPR instead of internal PAPR+ in changelog
- change helper name to spapr_dt_xics_irq()
v2: - drop the erroneous changes to the "interrupts" prop in PCI device nodes
- introduce a common helper to encode interrupt specifiers
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
SPAPR is the last user of numa_get_node() and a bunch of
supporting code to maintain numa_info[x].addr list.
Get LMB node id from pc-dimm list, which allows to
remove ~80LOC maintaining dynamic address range
lookup list.
It also removes pc-dimm dependency on numa_[un]set_mem_node_id()
and makes pc-dimms a sole source of information about which
node it belongs to and removes duplicate data from global
numa_info.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
xics_get_qirq() is only used by the sPAPR machine. Let's move it there
and change its name to reflect its scope. It will be useful for XIVE
support which will use its own set of qirqs.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
It will make synchronisation easier with the XIVE interrupt mode when
available. The 'irq' parameter refers to the global IRQ number space.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Also change the prototype to use a sPAPRMachineState and prefix them
with spapr_irq_. It will let us synchronise the IRQ allocation with
the XIVE interrupt mode when available.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The 'intc' pointer of the CPU references the interrupt presenter in
the XICS interrupt mode. When the XIVE interrupt mode is available and
activated, the machine will need to reassign this pointer to reflect
the change.
Moving this assignment under the realize routine of the CPU will ease
the process when the interrupt mode is toggled.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The sPAPR and the PowerNV core objects create the interrupt presenter
object of the CPUs in a very similar way. Let's provide a common
routine in which we use the presenter 'type' as a child identifier.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
When a CPU is stopped with the 'stop-self' RTAS call, its state
'halted' is switched to 1 and, in this case, the MSR is not taken into
account anymore in the cpu_has_work() routine. Only the pending
hardware interrupts are checked with their LPCR:PECE* enablement bit.
The CPU is now also protected from the decrementer interrupt by the
LPCR:PECE* bits which are disabled in the 'stop-self' RTAS
call. Reseting the MSR is pointless.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Just like for hot unplug CPUs, when a guest is rebooted, the secondary
CPUs can be awaken by the decrementer and start entering SLOF at the
same time the boot CPU is.
To be safe, let's disable on the secondaries all the exceptions which
can cause an exit while the CPU is in power-saving mode.
Based on previous work from Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
When a CPU is stopped with the 'stop-self' RTAS call, its state
'halted' is switched to 1 and, in this case, the MSR is not taken into
account anymore in the cpu_has_work() routine. Only the pending
hardware interrupts are checked with their LPCR:PECE* enablement bit.
If the DECR timer fires after 'stop-self' is called and before the CPU
'stop' state is reached, the nearly-dead CPU will have some work to do
and the guest will crash. This case happens very frequently with the
not yet upstream P9 XIVE exploitation mode. In XICS mode, the DECR is
occasionally fired but after 'stop' state, so no work is to be done
and the guest survives.
I suspect there is a race between the QEMU mainloop triggering the
timers and the TCG CPU thread but I could not quite identify the root
cause. To be safe, let's disable in the LPCR all the exceptions which
can cause an exit while the CPU is in power-saving mode and reenable
them when the CPU is started.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The current code assumes that only the CPU core object holds a
reference on each individual CPU object, and happily frees their
allocated memory when the core is unrealized. This is dangerous
as some other code can legitimely keep a pointer to a CPU if it
calls object_ref(), but it would end up with a dangling pointer.
Let's allocate all CPUs with object_new() and let QOM free them
when their reference count reaches zero. This greatly simplify the
code as we don't have to fiddle with the instance size anymore.
Signed-off-by: Greg Kurz <groug@kaod.org>
Acked-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
While we're at it fix a couple of small errors in the 2.11 and 2.10 models
(they didn't have any real effect, but don't quite match the template).
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The bus pointer in PCIDevice is basically redundant with QOM information.
It's always initialized to the qdev_get_parent_bus(), the only difference
is the type.
Therefore this patch eliminates the field, instead creating a pci_get_bus()
helper to do the type mangling to derive it conveniently from the QOM
Device object underneath.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
pci_bus_init(), pci_bus_new_inplace(), pci_bus_new() and pci_register_bus()
are misleadingly named. They're not used for initializing *any* PCI bus,
but only for a root PCI bus.
Non-root buses - i.e. ones under a logical PCI to PCI bridge - are instead
created with a direct qbus_create_inplace() (see pci_bridge_initfn()).
This patch renames the functions to make it clear they're only used for
a root bus.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
At guest reset time, we allocate a hash page table (HPT) for the guest
based on the guest's RAM size. If dynamic HPT resizing is not available we
use the maximum RAM size, if it is we use the current RAM size.
But the "current RAM size" calculation is incorrect - we just use the
"base" ram_size from the machine structure. This doesn't include any
pluggable DIMMs that are already plugged at reset time.
This means that if you try to start a 'pseries' machine with a DIMM
specified on the command line that's much larger than the "base" RAM size,
then the guest will get a woefully inadequate HPT. This can lead to a
guest freeze during boot as it runs out of HPT space during initial MMU
setup.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Tested-by: Greg Kurz <groug@kaod.org>
Migration of pseries is broken with TCG because
QEMU tries to restore KVM MMU state unconditionally.
The result is a SIGSEGV in kvm_vm_ioctl():
#0 kvm_vm_ioctl (s=0x0, type=-2146390353)
at qemu/accel/kvm/kvm-all.c:2032
#1 0x00000001003e3e2c in kvmppc_configure_v3_mmu (cpu=<optimized out>,
radix=<optimized out>, gtse=<optimized out>, proc_tbl=<optimized out>)
at qemu/target/ppc/kvm.c:396
#2 0x00000001002f8b88 in spapr_post_load (opaque=0x1019103c0,
version_id=<optimized out>) at qemu/hw/ppc/spapr.c:1578
#3 0x000000010059e4cc in vmstate_load_state (f=0x106230000,
vmsd=0x1009479e0 <vmstate_spapr>, opaque=0x1019103c0,
version_id=<optimized out>) at qemu/migration/vmstate.c:165
#4 0x00000001005987e0 in vmstate_load (f=<optimized out>, se=<optimized out>)
at qemu/migration/savevm.c:748
This patch fixes the problem by not calling the KVM function with the
TCG mode.
Fixes: d39c90f5f3 ("spapr: Fix migration of Radix guests")
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The patb_entry is used to store the location of the process table in
guest memory. The msb is also used to indicate the mmu mode of the
guest, that is patb_entry & 1 << 63 ? radix_mode : hash_mode.
Currently we set this to zero in spapr_setup_hpt_and_vrma() since if
this function gets called then we know we're hash. However some code
paths, such as setting up the hpt on incoming migration of a hash guest,
call spapr_reallocate_hpt() directly bypassing this higher level
function. Since we assume radix if the host is capable this results in
the msb in patb_entry being left set so in spapr_post_load() we call
kvmppc_configure_v3_mmu() and tell the host we're radix which as
expected means addresses cannot be translated once we actually run the cpu.
To fix this move the zeroing of patb_entry into spapr_reallocate_hpt().
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
LUNs >= 256 have to be encoded with the so-called "flat space
addressing method" for virtio-scsi, where an additional bit has to
be set. SLOF already took care of this with the following commit:
https://git.qemu.org/?p=SLOF.git;a=commitdiff;h=f72a37713fea47da
(see https://bugzilla.redhat.com/show_bug.cgi?id=1431584 for details)
But QEMU does not use this encoding yet for device tree paths
that have to be handed over to SLOF to deal with the "bootindex"
property, so SLOF currently fails to boot from virtio-scsi devices
with LUNs >= 256 in the right boot order. Fix it by using the bit
to indicate the "flat space addressing method" for LUNs >= 256.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
A DRC with a pending unplug request releases its associated device at
machine reset time.
In the case of LMB, when all DRCs for a DIMM device have been reset,
the DIMM gets unplugged, causing guest memory to disappear. This may
be very confusing for anything still using this memory.
This is exactly what happens with vhost backends, and QEMU aborts
with:
qemu-system-ppc64: used ring relocated for ring 2
qemu-system-ppc64: qemu/hw/virtio/vhost.c:649: vhost_commit: Assertion
`r >= 0' failed.
The issue is that each DRC registers a QEMU reset handler, and we
don't control the order in which these handlers are called (ie,
a LMB DRC will unplug a DIMM before the virtio device using the
memory on this DIMM could stop its vhost backend).
To avoid such situations, let's reset DRCs after all devices
have been reset.
Reported-by: Mallesh N. Koti <mallesh@linux.vnet.ibm.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The device tree nodes ibm,arch-vec-5-platform-support and ibm,pa-features
are used to communicate features of the cpu to the guest operating
system. The properties of each of these are determined based on the
selected cpu model and the availability of hypervisor features.
Currently the compatibility mode of the cpu is not taken into account.
The ibm,arch-vec-5-platform-support node is used to communicate the
level of support for various ISAv3 processor features to the guest
before CAS to inform the guests' request. The available mmu mode should
only be hash unless the cpu is a POWER9 which is not in a prePOWER9
compat mode, in which case the available modes depend on the
accelerator and the hypervisor capabilities.
The ibm,pa-featues node is used to communicate the level of cpu support
for various features to the guest os. This should only contain features
relevant to the operating mode of the processor, that is the selected
cpu model taking into account any compat mode. This means that the
compat mode should be taken into account when choosing the properties of
ibm,pa-features and they should match the compat mode selected, or the
cpu model selected if no compat mode.
Update the setting of these cpu features in the device tree as described
above to properly take into account any compat mode. We use the
ppc_check_compat function which takes into account the current processor
model and the cpu compat mode.
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
KVM HV will soon support running a guest in hash mode on a POWER9 host
running in radix mode (see [1]), however the guest currently fails to
boot.
This is because the "htab_shift" value (the size of the MMU's hash
table) is added to the device tree before KVM has had a chance to
change it. If the host is in hash mode, KVM does not need to change it
and so the problem is not seen, but when the host is in radix mode a
change is required and we see a problem.
To fix this, move the call spapr_setup_hpt_and_vrma() (where
htab_shift could be changed) up a little so that it's called before
spapr_h_cas_compose_response() (where htab_shift is added to the
device tree).
Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
[1] See http://www.spinics.net/lists/kvm-ppc/msg13057.html
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Actual number of interrupt pins isn't known
in ppce500_init_mpic() so a hardcoded number
was used, which causes a crash with older openpic.
Instead, return the DeviceState* and change ppce500_init()
to call qdev_get_gpio_in() to get only the irq pins
which are needed.
Signed-off-by: Michael Davidsaver <mdavidsaver@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This makes the code easier to understand and it is consistent with what
we already do for PHBs.
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
QEMU currently crashes when the user tries to add an spapr-pci-host-bridge
on a non-pseries machine:
$ qemu-system-ppc64 -M ppce500 -device spapr-pci-host-bridge,index=1
hw/ppc/spapr_pci.c:1535:spapr_phb_realize:
Object 0x1003dacae60 is not an instance of type spapr-machine
Aborted (core dumped)
The same thing happens with the deprecated but still available child type
spapr-pci-vfio-host-bridge.
Fix both by checking the machine type with object_dynamic_cast().
Reviewed-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
In order to prevent the guest from forcing the allocation of large amounts
of qemu memory (or host kernel memory, in the case of KVM HV), we limit
the size of Hashed Page Table (HPT) it is allowed to allocated, based on
its RAM size.
However, the current calculation is not correct: it only adds up the size
of plugged memory, ignoring the base memory size. This patch corrects it.
While we're there, use get_plugged_memory_size() instead of directly
calling pc_existing_dimms_capacity(). The only difference is that it
will abort on failure, which is right: a failure here indicates something
wrong within qemu.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Use a new DEFINE_TYPES() helper to simplify type registration
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
deduce core type directly from chip type instead of
maintaining type mapping in PnvChipClass::cpu_model.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
pnv core type definition doesn't have any fields that
require it to be defined at runtime. So replace code
that fills in TypeInfo at runtime with static TypeInfo
array that does the same at complie time.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
deduce cpu type directly from core type instead of
maintaining type mapping in PnvCoreClass::cpu_oc and doing
extra cpu_model parsing in pnv_core_class_init()
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
typically for cpus/core type names following convention is used
new_type_prefix-superclass_typename
make PNV core/chip to follow common convention.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
use common cpu_model prasing in vl.c and set default cpu_model
using generic MachineClass::default_cpu_type.
Beside of switching to generic infrastructure it solves several
issues.
* ppc_cpu_class_by_name() is used to deal with lower/upper case
and alias translations into actual cpu type, which fixes
'-M powernv -cpu power8' and '-M powernv -cpu power9_v1.0'
usecases which error out with:
'invalid CPU model 'FOO' for powernv machine'
* allows to switch to lower-case typenames in pnv chip/core name
(by convention typnames should be lower-case)
* replace aliased names /power8, power9, .../ with exact cpu model
names (i.e. typenames should be stable but aliases might decide to
point to other cpu model withi family or changed by kvm). It will
also help to simplify pnv_chip/core code and get rid of dependency
on cpu_model parsing.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
[dwg: Updated to make DD2.0 as default POWER9 chip]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
use generic cpu_model parsing introduced by
(6063d4c0f vl.c: convert cpu_model to cpu type and set of global properties before machine_init())
it allows to:
* replace sPAPRMachineClass::tcg_default_cpu with
MachineClass::default_cpu_type
* drop cpu_parse_cpu_model() from hw/ppc/spapr.c and reuse
one in vl.c
* simplify spapr_get_cpu_core_type() by removing
not needed anymore recurrsion since alias look up
happens earlier at vl.c and spapr_get_cpu_core_type()
works only with resulted from that cpu type.
* spapr no more needs to parse/depend on being phased out
MachineState::cpu_model, all tha parsing done by generic
code and target specific callback.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
[dwg: Correct minor compile error]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
consolidate 'host' core type registration by moving it from
KVM specific code into spapr_cpu_core.c, similar like it's
done in x86 target.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
replace sPAPRCPUCoreClass::cpu_class with cpu type name
since it were needed just to get that at points it were
accessed.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
spapr core type definition doesn't have any fields that
require it to be defined at runtime. So replace code
that fills in TypeInfo at runtime with static TypeInfo
array that does the same at complie time.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
there is a dedicated callback CPUClass::parse_features
which purpose is to convert -cpu features into a set of
global properties AND deal with compat/legacy features
that couldn't be directly translated into CPU's properties.
Create ppc variant of it (ppc_cpu_parse_featurestr) and
move 'compat=val' handling from spapr_cpu_core.c into it.
That removes a dependency of board/core code on cpu_model
parsing and would let to reuse common -cpu parsing
introduced by 6063d4c0
Set "max-cpu-compat" property only if it exists, in practice
it should limit 'compat' hack to spapr machine and allow
to avoid including machine/spapr headers in target/ppc/cpu.c
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
ppc_cpu_parse_features() is doing practically the same thing as
generic cpu_parse_cpu_model(). So remove duplicated impl. and
reuse generic one.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
LMB removal is completed only when the spapr_lmb_release callback
is called after all DRCs of the dimm are detached. During this
time, it is possible that a unplug request for the same dimm
arrives, trying to detach DRCs that were detached by the guest
in the first unplug_request.
BQL doesn't help in this case - the lock will prevent any concurrent
removal from happening until the end of spapr_memory_unplug_request
only. What happens is that the second unplug_request ends up calling
spapr_drc_detach in a DRC that were detached already, causing an
assert error in spapr_drc_detach (e.g
https://bugs.launchpad.net/qemu/+bug/1718118).
spapr_lmb_release uses a structure called sPAPRDIMMState, stored in the
spapr->pending_dimm_unplugs QTAIL, to track how many LMB DRCs are left
to be detached by the guest. When there are no more DRCs left, this
structure is deleted and the pc-dimm unplug handler is called to
finish the process.
This patch reuses the sPAPRDIMMState to allow unplug_request to know
if there is an ongoing unplug process for a given dimm, aborting the
unplug request in this case, by doing the following changes:
- in spapr_lmb_release callback, move the dimm state removal to the
end, after pc-dimm unplug handler. With this change we can check for
the existence of the dimm state to see if the unplug process is
done.
- use spapr_pending_dimm_unplugs_find in spapr_memory_unplug_request
to check if the dimm state exists. If positive, there is an unplug
operation already in progress for this dimm, meaning that we should
abort it and warn the user about it.
Fixes: https://bugs.launchpad.net/qemu/+bug/1718118
Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
At the moment the only POWER9 model which is listed in qemu is v1.0 (aka
"DD1"). This is a very early (read, buggy) version which will never be
released to the public - it was included in qemu only for the convenience
of those doing bringup on the early silicon. For bonus points, we actually
had its PVR incorrect in the table (0x004e0000 instead of 0x004e0100). We
also never actually implemented the differences in behaviour (read, bugs)
that marked DD1 in qemu.
Now that we know the PVR for the substantially better v2.0 (DD2) chip,
include it and make it the default POWER9 in qemu. For the time being we
leave the DD1 definition in place for the poor souls (read, me) who still
need to work with DD1 hardware.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The CAS buffer is provided by SLOF. A broken SLOF could pass a silly
size: either smaller than the diff header, in which case the current
code will try to allocate 16 Exabytes of memory and g_malloc0() will
abort, or bigger than the maximum memory provisioned for SLOF (ie,
40 Megabytes), which doesn't make sense. Both cases indicate that
SLOF has a bug.
Let's print out an explicit error message and exit since rebooting as
we do with other errors would only result in a reset loop.
Signed-off-by: Greg Kurz <groug@kaod.org>
[dwg: Fix format specifier that broke 32-bit builds]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The offset of the root node is guaranteed to be 0.
This doesn't fix anything, it's just trivial cleanup of the two
remaining places where this was done under hw/ppc.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Add INTERFACE_CONVENTIONAL_PCI_DEVICE to all direct subtypes of
TYPE_PCI_DEVICE, except:
1) The ones that already have INTERFACE_PCIE_DEVICE set:
* base-xhci
* e1000e
* nvme
* pvscsi
* vfio-pci
* virtio-pci
* vmxnet3
2) base-pci-bridge
Not all PCI bridges are Conventional PCI devices, so
INTERFACE_CONVENTIONAL_PCI_DEVICE is added only to the subtypes
that are actually Conventional PCI:
* dec-21154-p2p-bridge
* i82801b11-bridge
* pbm-bridge
* pci-bridge
The direct subtypes of base-pci-bridge not touched by this patch
are:
* xilinx-pcie-root: Already marked as PCIe-only.
* pcie-pci-bridge: Already marked as PCIe-only.
* pcie-port: all non-abstract subtypes of pcie-port are already
marked as PCIe-only devices.
3) megasas-base
Not all megasas devices are Conventional PCI devices, so the
interface names are added to the subclasses registered by
megasas_register_types(), according to information in the
megasas_devices[] array.
"megasas-gen2" already implements INTERFACE_PCIE_DEVICE, so add
INTERFACE_CONVENTIONAL_PCI_DEVICE only to "megasas".
Acked-by: Alberto Garcia <berto@igalia.com>
Acked-by: John Snow <jsnow@redhat.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJZy64HAAoJEAUWMx68W/3nTqwP/A5Gx4Qwkv5KKdpM0YLq//d+
OODmzl7Ni3a5Up1ETqGdLb84estrgY+5DISp73Rkt4a5tbT7+XKrhb4qD+93NnTe
zynY9in4C1jGxYm7YzeOhwSeIiuLZMTCLQlGdYw7/nunIFwkItUEvAFx3AG1WCJe
2Mk0lvmg4LikruDDMdzqZaJu7h5RU5sQjA7SsyrTBdsN7tNWl3rKLYGXwgzv0uz5
n2xkUgzvvnj1Bk/Adojkn05yxA86xKD/4rhFED9fjNVSjAGHMrHIWOJ70V26Cg5w
3gJ+5mesWsH+erf0JFYv0S38SyFbmIOE39Nn13D/d0o1x89P8B8cgqbi3ADTKM77
875wuIVnZzi2vIwVdxXQ9GHQ79cpXwr2fOfQ2rjT6Ll95K+u/MQG86fQiO0eJW+0
KwQVCwwh+HmCUcCogMuxAc9+F8C8qolwCi/9QXwS2yLBElHKaWDIMyTce36cW9d7
cZaKIOeSJUGNFoaWZnXN88MRuOYbdywTl+GddVAW3+VJCTYV2oi0o5fsTfxXy5AV
y7uYo/pcSj2gSZJ5GairMlB6p5iXnE8yusi1e4ZKA1x1TaSHSb6zR59lRUFr+j/L
JhUCfA85v5/elGqgkYp6UhSzFDJ2ID2oSEMQTIzfVrinOXtnf2KEh33YMbUH5qyo
yHVEu12uPe9rE6A0vWlu
=/+LV
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/dgilbert/tags/pull-migration-20170927a' into staging
Migration pull 2017-09-27
# gpg: Signature made Wed 27 Sep 2017 14:56:23 BST
# gpg: using RSA key 0x0516331EBC5BFDE7
# gpg: Good signature from "Dr. David Alan Gilbert (RH2) <dgilbert@redhat.com>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 45F5 C71B 4A0C B7FB 977A 9FA9 0516 331E BC5B FDE7
* remotes/dgilbert/tags/pull-migration-20170927a:
migration: Route more error paths
migration: Route errors up through vmstate_save
migration: wire vmstate_save_state errors up to vmstate_subsection_save
migration: Check field save returns
migration: check pre_save return in vmstate_save_state
migration: pre_save return int
migration: disable auto-converge during bulk block migration
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Modify the pre_save method on VMStateDescription to return an int
rather than void so that it potentially can fail.
Changed zillions of devices to make them return 0; the only
case I've made it return non-0 is hw/intc/s390_flic_kvm.c that already
had an error_report/return case.
Note: If you add an error exit in your pre_save you must emit
an error_report to say why.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20170925112917.21340-2-dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Using a standard QOM object link we can pass a reference to the MAC_DBDMA
controller to the MACIO_IDE object which removes the last external parameter
to macio_ide_register_dma().
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
One of the reasons macio_ide_register_dma() needs to exist is because the
channel id isn't passed into the MACIO_IDE object. Pass in the channel id
using a qdev property to remove this requirement.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
When running with KVM PR, if a new HPT is allocated we need to inform
KVM about the HPT address and size. This is currently done by hacking
the value of SDR1 and pushing it to KVM in several places.
Also, migration breaks the guest since it is very unlikely the HPT has
the same address in source and destination, but we push the incoming
value of SDR1 to KVM anyway.
This patch introduces a new virtual hypervisor hook so that the spapr
code can provide the correct value of SDR1 to be pushed to KVM each
time kvmppc_put_books_sregs() is called.
It allows to get rid of all the hacking in the spapr/kvmppc code and
it fixes migration of nested KVM PR.
Suggested-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
and exit before uselessly trying to load it if the file does not
exists.
Issue discovered by Coverity Scan.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
PHBs can be created with an index property, in which case the machine
code automatically sets all the MMIO windows at addresses derived from
the index. Alternatively, they can be manually created without index,
but the user has to provide addresses for all MMIO windows.
The non-index way happens to be more trouble than it's worth: it's
difficult to use, keeps requiring (potentially incompatible) changes
when some new parameter needs adding, and is awkward to check for
collisions. It currently even has a bug that prevents to use two
non-index PHBs because their child DRCs are all derived from the
same index == -1 value, and, thus, collide.
This patch hence makes the index property mandatory. As a consequence,
the PHB's memory regions and BUID are now always configured according
to the index, and it is no longer possible to set them from the command
line.
This DOES BREAK backwards compat, but we don't think the non-index
PHB feature was used in practice (at least libvirt doesn't) and the
simplification is worth it.
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This consolidates some duplicated code in a dedicated helpers.
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The use of KVM_PPC_GET_HTAB_FD is open-coded in kvmppc_read_hptes()
and kvmppc_write_hpte().
This patch modifies kvmppc_get_htab_fd() so that it can be used
everywhere we need to access the in-kernel htab:
- add an index argument
=> only kvmppc_read_hptes() passes an actual index, all other users
pass 0
- add an errp argument to propagate error messages to the caller.
=> spapr migration code prints the error
=> hpte helpers pass &error_abort to keep the current behavior
of hw_error()
While here, this also fixes a bug in kvmppc_write_hpte() so that it
opens the htab fd for writing instead of reading as it currently does.
This never broke anything because we currently never call this code,
as explained in the changelog of commit c1385933804bb:
"This support updating htab managed by the hypervisor. Currently
we don't have any user for this feature. This actually bring the
store_hpte interface in-line with the load_hpte one. We may want
to use this when we want to emulate henter hcall in qemu for HV
kvm."
The above is still true today.
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
When kvmppc_get_htab_fd() fails, its return value is propagated up to
qemu_savevm_state_iterate() or to qemu_savevm_state_complete_precopy().
All savevm handlers expect to receive a negative errno on error.
Let's patch kvmppc_get_htab_fd() accordingly.
While here, let's change htab_load() in the spapr code to also
propagate the error, since it doesn't make sense to abort() if
we couldn't get the htab fd from KVM.
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Apple uses an IBM MPIC2A without timers, it has 64 sources.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The timing register exists on all variants of MacIO IDE, we just
store and return its value.
The interrupts register only exists on KeyLargo but it doesn't
hurt to have it. The lack of this register causes MacOS X to
hangs under some circumstances.
Both are 32-bit only. The HW might support smaller access sizes
but no known OS uses them.
Because the core IDE subsystem doesn't provide us with a way
to query the main (level) interrupt state, nor do we have a way
to know that DBDMA issued a (edge) interrupt, we reflect both
through a private pair of qirq's in order to maintain the
register state.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
We use 900Mhz, otherwise MacOS X 10.5 refuses to install.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
These registers are present in 440 SoCs (and maybe in others too) and
U-Boot accesses them when printing register info. We don't emulate
these but add them to avoid crashing when they are read or written.
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This change fixes conflict with the DragonFly BSD headers.
Signed-off-by: Kamil Rytarowski <n54@gmx.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Calculating default node-ids for CPUs in possible_cpu_arch_ids()
is rather fragile since defaults calculation uses nb_numa_nodes but
callback might be potentially called early before all -numa CLI
options are parsed, which would lead to cpus assigned only upto
nb_numa_nodes at the time possible_cpu_arch_ids() is called.
Issue was introduced by
(7c88e65 numa: mirror cpu to node mapping in MachineState::possible_cpus)
and for example CLI:
-smp 4 -numa node,cpus=0 -numa node
would set props.node-id in possible_cpus array for every non
explicitly mapped CPU to the first node.
Issue is not visible to guest nor to mgmt interface due to
1) implictly mapped cpus are forced to the first node in
case of partial mapping
2) in case of default mapping possible_cpu_arch_ids() is
called after all -numa options are parsed (resulting
in correct mapping).
However it's fragile to rely on late execution of
possible_cpu_arch_ids(), therefore add machine specific
callback that returns node-id for CPU and use it to calculate/
set defaults at machine_numa_finish_init() time when all -numa
options are parsed.
Reported-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <1496314408-163972-1-git-send-email-imammedo@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Almost every user of cpu_generic_init() checks for
returned NULL and then reports failure in a custom way
and aborts process.
Some users assume that call can't fail and don't check
for failure, though they should have checked for it.
In either cases cpu_generic_init() failure is fatal,
so instead of checking for failure and reporting
it various ways, make cpu_generic_init() report
errors in consistent way and terminate QEMU on failure.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <1505318697-77161-3-git-send-email-imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
QTAILQ_FOREACH_SAFE() must be used when removing the current element
inside the loop block.
This fixes a user-after-free error introduced by commit 5625817423
and reported by Coverity (CID 1381017).
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This patch removes the qdev_get_machine() calls that are made
in spapr_cpu_core.c in situations where we can get an existing
pointer for the MachineState by either passing it as an argument
to the function or by using other already available pointers.
Credits to Daniel Henrique Barboza for the idea and the changelog
text.
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
When running a pseries-2.2 or older machine type, we get the following
lines in info mtree:
address-space: memory
...
ffffffffffffffff-ffffffffffffffff (prio 0, i/o): alias
pci@800000020000000.mmio64-alias @pci@800000020000000.mmio
ffffffffffffffff-ffffffffffffffff
address-space: cpu-memory
...
ffffffffffffffff-ffffffffffffffff (prio 0, i/o): alias
pci@800000020000000.mmio64-alias @pci@800000020000000.mmio
ffffffffffffffff-ffffffffffffffff
The same thing occurs when running a pseries-2.7 with
-global spapr-pci-host-bridge.mem_win_size=2147483648
This happens because we always create a 64-bit MMIO window, even if
we didn't explicitely requested it (ie, mem64_win_size == 0) and the
32-bit window is below 2GiB. It doesn't seem to have an impact on the
guest though because spapr_populate_pci_dt() doesn't advertise the
bogus windows when mem64_win_size == 0.
Since these memory regions don't induce any state, we can safely
choose to not create them when their address is equal to -1,
without breaking migration from existing setups.
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Since commit 7cca3e466e ("ppc: spapr: Move VCPU ID calculation into
sPAPR"), QEMU aborts when started with a *-spapr-cpu-core device and
a non-pseries machine.
Let's rely on the already existing call to object_dynamic_cast() instead
of using the SPAPR_MACHINE() macro.
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
libfdt failures when creating the FDT should cause QEMU to terminate.
Let's use the _FDT() macro which does just that instead of propagating
the error to the caller. spapr_populate_pci_child_dt() no longer needs
to return a value in this case.
Note that, on the way, this get rids of the following nonsensical lines:
g_assert(!ret);
if (ret) {
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
All other users in hw/ppc already consider an error when building
the FDT to be fatal, even on hotplug paths. There's no valid reason
for spapr_pci to behave differently. So let's used the common _FDT()
helper which terminates QEMU when libfdt fails.
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The OV5_MMU_RADIX_300 requires special handling in the CAS negotiation
process. It is cleared from the option vector of the guest before
evaluating the changes and re-added later. But, when testing for a
possible CAS reset :
spapr->cas_reboot = spapr_ovec_diff(ov5_updates,
ov5_cas_old, spapr->ov5_cas);
the bit OV5_MMU_RADIX_300 will each time be seen as removed from the
previous OV5 set, hence generating a reset loop.
Fix this problem by also clearing the same bit in the ov5_cas_old set.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
On POWER9, the Client Architecture Support (CAS) negotiation process
determines whether the guest operates in XIVE Legacy compatibility or
in XIVE exploitation mode. Now that we have initial guest support for
the XIVE interrupt controller, let's fix the bits definition which have
evolved in the latest specs.
The platform advertises the XIVE Exploitation Mode support using the
property "ibm,arch-vec-5-platform-support-vec-5", byte 23 bits 0-1 :
- 0b00 XIVE legacy mode Only
- 0b01 XIVE exploitation mode Only
- 0b10 XIVE legacy or exploitation mode
The OS asks for XIVE Exploitation Mode support using the property
"ibm,architecture-vec-5", byte 23 bits 0-1:
- 0b00 XIVE legacy mode Only
- 0b01 XIVE exploitation mode Only
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Commit b55d295e3e added the possibility to support HPT resizing with KVM.
In the case of PR, we need to pass the userspace address of the HPT to KVM
using the SDR1 slot.
This is handled by kvmppc_update_sdr1() which uses CPU_FOREACH() to update
all CPUs. It is hence not needed to call kvmppc_update_sdr1() for each CPU.
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Building strings with g_strdup_printf() instead of snprintf() is
a QEMU common practice.
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
spapr_phb_get_loc_code() either returns a non-null pointer, or aborts
if g_strdup_printf() failed to allocate memory.
Signed-off-by: Greg Kurz <groug@kaod.org>
[dwg: Grammatical fix to commit message]
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
g_strdup_printf() either returns a non-null pointer, or aborts if it
failed to allocate memory.
Signed-off-by: Greg Kurz <groug@kaod.org>
[dwg: Grammatical fix to commit message]
Acked-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This patch removes the qdev_get_machine() calls that are made in
spapr.c in situations where we can get an existing pointer for
the MachineState by either passing it as an argument to the function
or by using other already available pointers.
The following changes were made:
- spapr_node0_size: static function that is called two times:
at spapr_setup_hpt_and_vrma and ppc_spapr_init. In both cases we can
pass an existing MachineState pointer to it.
- spapr_build_fdt: MachineState pointer can be retrieved from
the existing sPAPRMachineState pointer.
- spapr_boot_set: the opaque in the first arg is a sPAPRMachineState
pointer as we can see inside ppc_spapr_init:
qemu_register_boot_set(spapr_boot_set, spapr);
We can get a MachineState pointer from it.
- spapr_machine_device_plug and spapr_machine_device_unplug_request: the
MachineState, sPAPRMachineState, MachineClass and sPAPRMachineClass pointers
can all be retrieved from the HotplugHandler pointer.
Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Move the calculation of a CPU's VCPU ID out of the generic PPC code
(ppc_cpu_realizefn()) and into sPAPR specific code
(spapr_cpu_core_realize()) where it belongs.
Unfortunately, due to the way things are ordered, we still need to
default the VCPU ID in ppc_cpu_realizfn() but at least doing that
doesn't require any interaction with sPAPR.
Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
PPC handles -cpu FOO rather incosistently,
i.e. it does case-insensitive matching of FOO to
a CPU type (see: ppc_cpu_compare_class_name) but
handles alias names as case-sensitive, as result:
# qemu-system-ppc64 -M mac99 -cpu g3
qemu-system-ppc64: unable to find CPU model ' kN�U'
# qemu-system-ppc64 -cpu 970MP_V1.1
qemu-system-ppc64: Unable to find sPAPR CPU Core definition
while
# qemu-system-ppc64 -M mac99 -cpu G3
# qemu-system-ppc64 -cpu 970MP_v1.1
start up just fine.
Considering we can't take case-insensitive matching away,
make it case-insensitive for all alias/type/core_type
lookups.
As side effect it allows to remove duplicate core types
which are the same except of using different cased letters in name.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
KVM now allows writing to KVM_CAP_PPC_SMT which has previously been
read only. Doing so causes KVM to act, for that VM, as if the host's
SMT mode was the given value. This is particularly important on Power
9 systems because their default value is 1, but they are able to
support values up to 8.
This patch introduces a way to control this capability via a new
machine property called VSMT ("Virtual SMT"). If the value is not set
on the command line a default is chosen that is, when possible,
compatible with legacy systems.
Note that the intialization of KVM_CAP_PPC_SMT has changed slightly
because it has changed (in KVM) from a global capability to a
VM-specific one. This won't cause a problem on older KVMs because VM
capabilities fall back to global ones.
Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
KVM PR doesn't allow to set a compat mode. This causes ppc_set_compat_all()
to fail and we return H_HARDWARE to the guest right away.
This is excessive: even if we favor compat mode since commit 152ef803ce,
we should at least fallback to raw mode if the guest supports it.
This patch modifies cas_check_pvr() so that it also reports that the real
PVR was found in the table supplied by the guest. Note that this is only
makes sense if raw mode isn't explicitely disabled (ie, the user didn't
set the machine "max-cpu-compat" property). If this is the case, we can
simply ignore ppc_set_compat_all() failures, and let the guest run in raw
mode.
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
QEMU currently crashes when the user tries to add a spapr-cpu-core
on a non-pseries machine:
$ qemu-system-ppc64 -S -machine ppce500,accel=tcg \
-device POWER5+_v2.1-spapr-cpu-core
hw/ppc/spapr_cpu_core.c:178:spapr_cpu_core_realize_child:
Object 0x55cee1f55160 is not an instance of type spapr-machine
Aborted (core dumped)
So let's add a proper check for the correct machine time with
a more friendly error message here.
Reported-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Make these device models available outside ppc405_uc.c for reuse in
460EX emulation. They are left in their current place for now because
they are used mostly unchanged and I'm not sure these correctly model
the components in 440 SoCs (but they seem to be good enough). These
functions could be moved in a subsequent clean up series when this is
confirmed.
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This device appears in other SoCs as well not just in 405 ones and
subsequent patches will modify it, so move it out of ppc405_uc.c in
preparation
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Allow MAL with more RX and TX channels as found in newer versions.
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This device appears in other SoCs as well not just in 405 ones
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This replaces g_malloc() with spapr_tce_alloc_table() as this is
the standard way of allocating tables and this allows moving the table
back to KVM when unplugging a VFIO PCI device and VFIO TCE acceleration
support is not present in the KVM.
Although spapr_tce_alloc_table() is expected to fail with EBUSY
if called when previous fd is not closed yet, in practice we will not
see it because cap_spapr_vfio is false at the moment.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The concept of a VCPU ID that differs from the CPU's index
(cpu->cpu_index) exists only within SPAPR machines so, move the
functions ppc_get_vcpu_id() and ppc_get_cpu_by_vcpu_id() into spapr.c
and rename them appropriately.
Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This field actually records the VCPU ID used by KVM and, although the
value is also used in the device tree it is primarily the VCPU ID so
rename it as such.
Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
[dwg: Updated comment missed in cpu.h]
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The e500 platform code uses the function ppc_get_vcpu_dt_id() to get
an id to put in its device tree. Which seems like it makes sense, but
ppc_get_vcpu_dt_id() is actually badly named - it only differs from
cpu_index in cases where you're running on KVM HV and the host's
number of threads differs from the guests. Since KVM HV only supports
PAPR, not e500, it doesn't make sense to use it here.
Simply use the cpu_index instead (which is 'i' in this context
because qemu_get_cpu(i) returns the cpu with cpu_index == i).
Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
[dwg: Rewrote commit message]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
TCE table objects attach themselves to an owner as a child
property. unref afterward to allow them to be finalized
when their owner is finalized.
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
DRC objects attach themselves to an owner as a child
property. unref afterward to allow them to be finalized
when their owner is finalized.
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
When hot-unplugging a PHB, all its PCI DRC connectors get unrealized. This
patch adds an unrealize method to the physical DRC class, in order to undo
registrations performed in realize_physical().
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This memory region should be owned by the PHB. This ensures the PHB
cannot be finalized as long as the the region is guest visible, or
used by a CPU or a device.
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Passing a stack allocated buffer of arbitrary length to snprintf()
without checking the return value can cause the resultant strings
to be silently truncated.
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Passing a stack allocated buffer of arbitrary length to snprintf()
without checking the return value can cause the resultant strings
to be silently truncated.
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Passing a null priority to memory_region_add_subregion_overlap() is
strictly equivalent to calling memory_region_add_subregion().
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This patch is a follow up on the discussions made in patch
"hw/ppc: disable hotplug before CAS is completed" that can be
found at [1].
At this moment, we do not support CPU/memory hotplug in early
boot stages, before CAS. When a hotplug occurs, the event is logged
in an internal RTAS event log queue and an IRQ pulse is fired. In
regular conditions, the guest handles the interrupt by executing
check_exception, fetching the generated hotplug event and enabling
the device for use.
In early boot, this IRQ isn't caught (SLOF does not handle hotplug
events), leaving the event in the rtas event log queue. If the guest
executes check_exception due to another hotplug event, the re-assertion
of the IRQ ends up de-queuing the first hotplug event as well. In short,
a device hotplugged before CAS is considered coldplugged by SLOF.
This leads to device misbehavior and, in some cases, guest kernel
Ooops when trying to unplug the device.
A proper fix would be to turn every device hotplugged before CAS
as a colplugged device. This is not trivial to do with the current
code base though - the FDT is written in the guest memory at
ppc_spapr_reset and can't be retrieved without adding extra state
(fdt_size for example) that will need to managed and migrated. Adding
the hotplugged DT in the middle of CAS negotiation via the updated DT
tree works with CPU devs, but panics the guest kernel at boot. Additional
analysis would be necessary for LMBs and PCI devices. There are
questions to be made in QEMU/SLOF/kernel level about how we can make
this change in a sustainable way.
With Linux guests, a fix would be the kernel executing check_exception
at boot time, de-queueing the events that happened in early boot and
processing them. However, even if/when the newer kernels start
fetching these events at boot time, we need to take care of older
kernels that won't be doing that.
This patch works around the situation by issuing a CAS reset if a hotplugged
device is detected during CAS:
- the DRC conditions that warrant a CAS reset is the same as those that
triggers a DRC migration - the DRC must have a device attached and
the DRC state is not equal to its ready_state. With that in mind, this
patch makes use of 'spapr_drc_needed' to determine if a CAS reset
is needed.
- In the middle of CAS negotiations, the function
'spapr_hotplugged_dev_before_cas' goes through all the DRCs to see
if there are any DRC that requires a reset, using spapr_drc_needed. If
that happens, returns '1' in 'spapr_h_cas_compose_response' which will set
spapr->cas_reboot to true, causing the machine to reboot.
No changes are made for coldplug devices.
[1] http://lists.nongnu.org/archive/html/qemu-devel/2017-08/msg02855.html
Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The sPAPR machine isn't clearing up the pending events QTAILQ on
machine reboot. This allows for unprocessed hotplug/epow events
to persist in the queue after reset and, when reasserting the IRQs in
check_exception later on, these will be being processed by the OS.
This patch implements a new function called 'spapr_clear_pending_events'
that clears up the pending_events QTAILQ. This helper is then called
inside ppc_spapr_reset to clear up the events queue, preventing
old/deprecated events from persisting after a reset.
Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This patch makes a small fix in 'spapr_drc_needed' to change how we detect
if a DRC has a device attached. Previously it used dr_entity_sense for this,
which works for physical DRCs.
However, for logical DRCs, it didn't cover the case where a logical DRC has
a drc->dev but the state is LOGICAL_UNUSABLE (e.g. a hotplugged CPU before
CAS). In this case, the dr_entity_sense of this DRC returns UNUSABLE and the
code was considering that there were no dev attached, making spapr_drc_needed
return 'false' when in fact we would like to migrate the DRC.
Changing it to check for drc->dev instead works for all DRC types.
Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
it's just a wrapper, drop it and use cpu_generic_init() directly
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Hervé Poussineau <hpoussin@reactos.org>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <1503592308-93913-26-git-send-email-imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
QEMU currently aborts unexpectedly when the user tries to add and
remove a "spapr-tce-table" device:
$ qemu-system-ppc64 -nographic -S -nodefaults -monitor stdio
QEMU 2.9.92 monitor - type 'help' for more information
(qemu) device_add spapr-tce-table,id=x
(qemu) device_del x
**
ERROR:qemu/qdev-monitor.c:872:qdev_unplug: assertion failed: (hotplug_ctrl)
Aborted (core dumped)
The device should not be accessable for the users at all, it's just
used internally, so mark it with user_creatable = false.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
QEMU currently aborts unexpectedly when a user tries to do something
like this:
$ qemu-system-ppc64 -nographic -S -nodefaults -monitor stdio
QEMU 2.9.92 monitor - type 'help' for more information
(qemu) device_add spapr-rtc,id=spapr-rtc
(qemu) device_del spapr-rtc
**
ERROR:qemu/qdev-monitor.c:872:qdev_unplug: assertion failed: (hotplug_ctrl)
Aborted (core dumped)
The RTC device is not meant to be hot-pluggable - it's an internal
device only and it even should not be possible to create it a
second time with the "-device" parameter, so let's mark this
with "user_creatable = false".
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
QEMU currently crashes when trying to use a 'pc-dimm' on the pseries
machine without specifying its 'memdev' property. This happens because
pc_dimm_get_memory_region() does not check whether the 'memdev' property
has properly been set by the user. Looking closer at this function, it's
also obvious that it is using &error_abort to call another function - and
this is bad in a function that is used in the hot-plugging calling chain
since this can also cause QEMU to exit unexpectedly.
So let's fix these issues in a proper way now: Add a "Error **errp"
parameter to pc_dimm_get_memory_region() which we use in case the 'memdev'
property has not been set by the user, and which we can use instead of
the &error_abort, and change the callers of get_memory_region() to make
use of this "errp" parameter for proper error checking.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
In case of in-kernel memory hot unplug, when the guest is not able
to remove all the LMBs that are requested for removal, it will add back
any LMBs that have been successfully removed. The DR Connectors of
these LMBs wouldn't have been unconfigured and hence the addition of
these LMBs will result in configure-connector call being issued on
LMB DR connectors that are already in configured state. Such
configure-connector calls will fail resulting in a DIMM which is
partially unplugged.
This however worked till recently before we overhauled the DRC
implementation in QEMU. Commit 9d4c0f4f0a71e: "spapr: Consolidate
DRC state variables" is the first commit where this problem shows up
as per git bisect.
Ideally guest shouldn't be issuing configure-connector call on an
already configured DR connector. However for now, work around this in
QEMU by allowing configure-connector to be called multiple times for
all types of DR connectors.
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
[dwg: Corrected buglet that would have initialized fdt pointers ready
for reading on a device not present at reset]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The unicast case in h_signal_sys_reset() seems to be broken:
rather than selecting the target CPU, it looks like it will pick
either the first CPU or fail to find one at all.
Fix it by using the search function rather than open coding the
search.
This was found by inspection; the code appears to be unused because
the Linux kernel only uses the broadcast target.
Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
object_property_add_child() can only fail in two cases:
- the child already has a parent, which shouldn't happen since the DRC was
allocated a few lines above
- the parent already has a child with the same name, which would mean the
caller tries to create a DRC that already exists
In both case, this is a QEMU bug and we should abort.
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The only exception are groups of numers separated by symbols
'.', ' ', ':', '/', like 'ab.09.7d'.
This patch is made by the following:
> find . -name trace-events | xargs python script.py
where script.py is the following python script:
=========================
#!/usr/bin/env python
import sys
import re
import fileinput
rhex = '%[-+ *.0-9]*(?:[hljztL]|ll|hh)?(?:x|X|"\s*PRI[xX][^"]*"?)'
rgroup = re.compile('((?:' + rhex + '[.:/ ])+' + rhex + ')')
rbad = re.compile('(?<!0x)' + rhex)
files = sys.argv[1:]
for fname in files:
for line in fileinput.input(fname, inplace=True):
arr = re.split(rgroup, line)
for i in range(0, len(arr), 2):
arr[i] = re.sub(rbad, '0x\g<0>', arr[i])
sys.stdout.write(''.join(arr))
=========================
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Message-id: 20170731160135.12101-5-vsementsov@virtuozzo.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
With the move of some docs/ to docs/devel/ on ac06724a71,
no references were updated.
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
This reverts commit b87680427e.
I thought this was a harmless preliminary for XIVE enablement patches
we expect later on. However, due to some subtle interactions between
qemu and SLOF (guest firmware) this breaks some things. Revert it for
now, we'll work out how to fix it when the rest of the XIVE patches
are ready.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
If object_property_add_alias() returns an error in realize(), we should
propagate it to the caller and certainly not unref the DRC.
Same thing goes for unrealize(). Since object_property_del() is the last
call, we can even get rid of the intermediate Error *.
And finally, unrealize() should undo all registrations performed by
realize().
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Commit 0cffce56 (hw/ppc/spapr.c: adding pending_dimm_unplugs to
sPAPRMachineState) introduced a new way to track pending LMBs of DIMM
device that is marked for removal. Since this commit we can hit the
assert in spapr_pending_dimm_unplugs_add() in the following situation:
- DIMM device removal fails as the guest doesn't allow the removal.
- Subsequent attempt to remove the same DIMM would hit the assert
as the corresponding sPAPRDIMMState is still part of the
pending_dimm_unplugs list.
Fix this by removing the assert and conditionally adding the
sPAPRDIMMState to pending_dimm_unplugs list only when it is not
already present.
Fixes: 0cffce56ae
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
[dwg: Tweaked to avoid returning NULL when spapr_pending_dimm_unplugs_add()
does find an existing entry]
Reviewed-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Commit 3a38429 ("spapr: Add a "no HPT" encoding to HTAB migration stream")
allows to migrate an empty HPT, but doesn't mark correctly the
end of the migration stream.
The end condition (value returned by htab_save_iterate())
should be 1, whereas in 3a38429 it returns 0.
The problem can be reproduced with QEMU monitor command "savevm":
the command never stops and the disk image grows without limit.
Fixes: 3a38429748
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
f1c2dc7c86 "spapr-pci: rework MSI/MSIX" (07/2013) changed MSIX encoding
but forgot to change the comment so this changes it.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Make visit_type_null() take an @obj argument like its buddies. This
helps keep the next commit simple.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
This pull requests supersedes the one from 2017-07-14. That one had a
couple of subtle regressions: there was a build error for mingw32, and
an instance_size which was theoretically wrong everywhere, but only
actually bit on the Travis OSX build.
There are two major batches in this set, rather than the usual
collection of assorted fixes.
* More DRC cleanup. This gets the state management into a state
which should fix many of the hotplug+migration problems we've
had. Plus it gets the migration stream format into something
well defined and pretty minimal which we can reasonably support
into the future.
* Hashed Page Table resizing. It's been a while since this was
posted, but it's been through several previous rounds of review.
The kernel parts (both guest and host) are merged in 4.11, so
this is the only remaining piece left to allow resizing of the
HPT in a running guest.
There are also a handful of unrelated fixes.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEdfRlhq5hpmzETofcbDjKyiDZs5IFAllsWwQACgkQbDjKyiDZ
s5LMnA//dpoqWrTPiEmx2DsXMkjLefn/2Yl1dkQDzhyb7v+tNGFYmxpbb7nPRfJE
tfvcKu1Tz23NPOp6+1VC9eTyTO1YOXTgvQrNSbF1MmIg4PGN6s2DHrLviAqCS15M
29x6+RdRaeLUSCsk8elsViiWb8h7cISDuN0SMA0WWjWP3bO/drz5nq5z5dRgdVFe
Z5O0qwDNoN0NypJ68Cld+riP1uDAYMONPxA0QOWCLx8qowoJ3hYMuyNnqBQU5OJn
PpAA3EfdxkN6rtaBjDt7xHkJfm9Xkm9SsT8qTcj/R2JjkENef8EbzrdjFE+pSVz0
7c9C4evgYgmhUCUFvnZfgN+VBL1lS/p5UGnFPyNQ7KbSXDE71OAgWH/f/7kzsJPy
MxbJWM6eUN9Ny0APxM8olLV1FM4GzEoCSLfDVhStrdJ6P5wBmjLSugqSOLB8aMtd
8NwBY06nTpmo9xXGz9enLUWlpSeoReKU3TxvQvY+JcOWWpasDZOO4zD8B3bdLbA/
I8jdkH5Vs0pyPLaWD+1FxlQvlF45CuwpwoiAz00V2XkkMu8jKCGsQ0iuqXorSqvs
/7tQ1pHlUybAX+5W9raaJmphgc4gk33P3PlQCjhgYzxRu4yzRsEzS9hahoO/TAmq
Y70CooZaaeGNOBEDcKLZEzJdBr52cqW4MM8t1xHWTg3VCHJGeYI=
=O6NQ
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.10-20170717' into staging
ppc patch queue 2017-07-17
This pull requests supersedes the one from 2017-07-14. That one had a
couple of subtle regressions: there was a build error for mingw32, and
an instance_size which was theoretically wrong everywhere, but only
actually bit on the Travis OSX build.
There are two major batches in this set, rather than the usual
collection of assorted fixes.
* More DRC cleanup. This gets the state management into a state
which should fix many of the hotplug+migration problems we've
had. Plus it gets the migration stream format into something
well defined and pretty minimal which we can reasonably support
into the future.
* Hashed Page Table resizing. It's been a while since this was
posted, but it's been through several previous rounds of review.
The kernel parts (both guest and host) are merged in 4.11, so
this is the only remaining piece left to allow resizing of the
HPT in a running guest.
There are also a handful of unrelated fixes.
# gpg: Signature made Mon 17 Jul 2017 07:36:52 BST
# gpg: using RSA key 0x6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>"
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392
* remotes/dgibson/tags/ppc-for-2.10-20170717: (21 commits)
target/ppc: fix CPU hotplug when radix is enabled (TCG)
spapr: fix memory leak in spapr_core_pre_plug()
pseries: Allow HPT resizing with KVM
pseries: Use smaller default hash page tables when guest can resize
pseries: Enable HPT resizing for 2.10
pseries: Implement HPT resizing
pseries: Stubs for HPT resizing
ppc/pnv: Remove unused XICSState reference
spapr: fix potential memory leak in spapr_core_plug()
spapr: Implement DR-indicator for physical DRCs only
spapr: Remove sPAPRConfigureConnectorState sub-structure
spapr: Consolidate DRC state variables
spapr: Cleanups relating to DRC awaiting_release field
spapr: Refactor spapr_drc_detach()
spapr: Abort on delete failure in spapr_drc_release()
spapr: Simplify unplug path
spapr: Remove 'awaiting_allocation' DRC flag
spapr: Treat devices added before inbound migration as coldplugged
spapr: Minor cleanups to events handling
spapr: migrate pending_events of spapr state
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
In case of error, we must ensure the dynamically allocated base_core_type
is freed, like it is done everywhere else in this function.
This is a regression introduced in QEMU 2.9 by commit 8149e2992f.
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
So far, qemu implements the PAPR Hash Page Table (HPT) resizing extension
with TCG. The same implementation will work with KVM PR, but we don't
currently allow that. For KVM HV we can only implement resizing with the
assistance of the host kernel, which needs a new capability and ioctl()s.
This patch adds support for testing the new KVM capability and implementing
the resize in terms of KVM facilities when necessary. If we're running on
a kernel which doesn't have the new capability flag at all, we fall back to
testing for PR vs. HV KVM using the same hack that we already use in a
number of places for older kernels.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
We've now implemented a PAPR extension allowing PAPR guest to resize
their hash page table (HPT) during runtime.
This patch makes use of that facility to allocate smaller HPTs by default.
Specifically when a guest is aware of the HPT resize facility, qemu sizes
the HPT to the initial memory size, rather than the maximum memory size on
the assumption that the guest will resize its HPT if necessary for hot
plugged memory.
When the initial memory size is much smaller than the maximum memory size
(a common configuration with e.g. oVirt / RHEV) then this can save
significant memory on the HPT.
If the guest does *not* advertise HPT resize awareness when it makes the
ibm,client-architecture-support call, qemu resizes the HPT for maxmimum
memory size (unless it's been configured not to allow such guests at all).
For now we make that reallocation assuming the guest has not yet used the
HPT at all. That's true in practice, but not, strictly, an architectural
or PAPR requirement. If we need to in future we can fix this by having
the client-architecture-support call reboot the guest with the revised
HPT size (the client-architecture-support call is explicitly permitted to
trigger a reboot in this way).
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
We've now implemented a PAPR extensions which allows PAPR guests (i.e.
"pseries" machine type) to resize their hash page table during runtime.
However, that extension is only enabled if explicitly chosen on the
command line. This patch enables it by default for spapr-2.10, but leaves
it disabled (by default) for older machine types.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
This patch implements hypercalls allowing a PAPR guest to resize its own
hash page table. This will eventually allow for more flexible memory
hotplug.
The implementation is partially asynchronous, handled in a special thread
running the hpt_prepare_thread() function. The state of a pending resize
is stored in SPAPR_MACHINE->pending_hpt.
The H_RESIZE_HPT_PREPARE hypercall will kick off creation of a new HPT, or,
if one is already in progress, monitor it for completion. If there is an
existing HPT resize in progress that doesn't match the size specified in
the call, it will cancel it, replacing it with a new one matching the
given size.
The H_RESIZE_HPT_COMMIT completes transition to a resized HPT, and can only
be called successfully once H_RESIZE_HPT_PREPARE has successfully
completed initialization of a new HPT. The guest must ensure that there
are no concurrent accesses to the existing HPT while this is called (this
effectively means stop_machine() for Linux guests).
For now H_RESIZE_HPT_COMMIT goes through the whole old HPT, rehashing each
HPTE into the new HPT. This can have quite high latency, but it seems to
be of the order of typical migration downtime latencies for HPTs of size
up to ~2GiB (which would be used in a 256GiB guest).
In future we probably want to move more of the rehashing to the "prepare"
phase, by having H_ENTER and other hcalls update both current and
pending HPTs. That's a project for another day, but should be possible
without any changes to the guest interface.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This introduces stub implementations of the H_RESIZE_HPT_PREPARE and
H_RESIZE_HPT_COMMIT hypercalls which we hope to add in a PAPR
extension to allow run time resizing of a guest's hash page table. It
also adds a new machine property for controlling whether this new
facility is available.
For now we only allow resizing with TCG, allowing it with KVM will require
kernel changes as well.
Finally, it adds a new string to the hypertas property in the device
tree, advertising to the guest the availability of the HPT resizing
hypercalls. This is a tentative suggested value, and would need to be
standardized by PAPR before being merged.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Since commit 5c1da81215 ("spapr: Remove unnecessary differences between
hotplug and coldplug paths"), the CPU DT for the DRC is always allocated.
This causes a memory leak for pseries-2.6 and older machine types, that
don't support CPU hotplug and don't allocate DRCs for CPUs.
Reported-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
According to PAPR, the DR-indicator should only be valid for physical DRCs,
not logical DRCs. At the moment we implement it for all DRCs, so restrict
it to physical ones only.
We move the state to the physical DRC subclass, which means adding some
QOM boilerplate to handle the newly distinct type.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
Most of the time, the state of a DRC object is contained in the single
'state' variable. However, during the transition from UNISOLATE to
CONFIGURED state requires multiple calls to the ibm,configure-connector
RTAS call to retrieve the device tree for the attached device. We need
some extra state to keep track of where we're up to in delivering the
device tree information to the guest.
Currently that extra state is in a sPAPRConfigureConnectorState
substructure which is only allocated when we're in the middle of the
configure connector process. That sounds like a good idea, but the extra
state is only two integers - on many platforms that will take up the same
room as the (maybe NULL) ccs pointer even before malloc() overhead. Plus
it's another object whose lifetime we need to manage. In short, it's not
worth it.
So, fold the sPAPRConfigureConnectorState substructure directly into the
DRC object.
Previously the structure was allocated lazily when the configure-connector
call discovers it's not there. Now, we need to initialize the subfields
pre-emptively, as soon as we enter UNISOLATE state.
Although it's not strictly necessary (the field values should only ever
be consulted when in UNISOLATE state), we try to keep them at -1 when in
other states, as a debugging aid.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
Each DRC has three fields describing its state: isolation_state,
allocation_state and configured. At first this seems like a reasonable
representation, since its based directly on the PAPR defined
isolation-state and allocation-state indicators. However:
* Only a few combinations of the two fields' values are permitted
* allocation_state isn't used at all for physical DRCs
* The indicators are write only so they don't really have a well
defined current value independent of each other
This replaces these variables with a single state variable, whose names
and numbers are based on the diagram in LoPAPR section 13.4. Along with
this we add code to check the current state on various operations and make
sure the requested transition is permitted.
Strictly speaking, this makes guest visible changes to behaviour (since we
probably allowed some transitions we shouldn't have before). However, a
hypothetical guest broken by that wasn't PAPR compliant, and probably
wouldn't have worked under PowerVM.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
'awaiting_release' indicates that the host has requested an unplug of the
device attached to the DRC, but the guest has not (yet) put the device
into a state where it is safe to complete removal.
1. Rename it to 'unplug_requested' which to me at least is clearer
2. Remove the ->release_pending() method used to check this from outside
spapr_drc.c. The method only plausibly has one implementation, so use
a plain function (spapr_drc_unplug_requested()) instead.
3. Remove it from the migration stream. Attempting to migrate mid-unplug
is broken not just for spapr - in general management has no good way to
determine if the device should be present on the destination or not. So,
until that's fixed, there's no point adding extra things to the stream.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
This function has two unused parameters - remove them.
It also sets awaiting_release on all paths, except one. On that path
setting it is harmless, since it will be immediately cleared by
spapr_drc_release(). So factor it out of the if statements.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
We currently ignore errors from the object_property_del() in
spapr_drc_release(). But the only way that could fail is if the property
doesn't exist, in which case it's a bug that we're in spapr_drc_release()
at all. So change from ignoring to abort()ing on errors.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
spapr_lmb_release() and spapr_core_release() call hotplug_handler_unplug()
which after a bunch of indirection calls spapr_memory_unplug() or
spapr_core_unplug(). But we already know which is the appropriate thing
to call here, so we can just fold it directly into the release function.
Once that's done, there's no need for an hc->unplug method in the spapr
machine at all: since we also have an hc->unplug_request method, the
hotplug core will never use ->unplug.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
The awaiting_allocation flag in the DRC was introduced by aab9913
"spapr_drc: Prevent detach racing against attach for CPU DR", allegedly to
prevent a guest crash on racing attach and detach. Except.. information
from the BZ actually suggests a qemu crash, not a guest crash. And there
shouldn't be a problem here anyway: if the guest has already moved the DRC
away from UNUSABLE state, the detach would already be deferred, and if it
hadn't it should be safe to detach it (the guest should fail gracefully
when it attempts to change the allocation state).
I think this was probably just a bandaid for some other problem in the
state management. So, remove awaiting_allocation and associated code.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Tested-by: Greg Kurz <groug@kaod.org>
Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
When migrating a guest which has already had devices hotplugged,
libvirt typically starts the destination qemu with -incoming defer,
adds those hotplugged devices with qmp, then initiates the incoming
migration.
This causes problems for the management of spapr DRC state. Because
the device is treated as hotplugged, it goes into a DRC state for a
device immediately after it's plugged, but before the guest has
acknowledged its presence. However, chances are the guest on the
source machine *has* acknowledged the device's presence and configured
it.
If the source has fully configured the device, then DRC state won't be
sent in the migration stream: for maximum migration compatibility with
earlier versions we don't migrate DRCs in coldplug-equivalent state.
That means that the DRC effectively changes state over the migrate,
causing problems later on.
In addition, logging hotplug events for these devices isn't what we
want because a) those events should already have been issued on the
source host and b) the event queue should get wiped out by the
incoming state anyway.
In short, what we really want is to treat devices added before an
incoming migration as if they were coldplugged.
To do this, we first add a spapr_drc_hotplugged() helper which
determines if the device is hotplugged in the sense relevant for DRC
state management. We only send hotplug events when this is true.
Second, when we add a device which isn't hotplugged in this sense, we
force a reset of the DRC state - this ensures the DRC is in a
coldplug-equivalent state (there isn't usually a system reset between
these device adds and the incoming migration).
This is based on an earlier patch by Laurent Vivier, cleaned up and
extended.
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
The rtas_error_log structure is marked packed, which strongly suggests its
precise layout is important to match an external interface. Along with
that one could expect it to have a fixed endianness to match the same
interface. That used to be the case - matching the layout of PAPR RTAS
event format and requiring BE fields.
Now, however, it's only used embedded within sPAPREventLogEntry with the
fields in native order, since they're processed internally.
Clear that up by removing the nested structure in sPAPREventLogEntry.
struct rtas_error_log is moved back to spapr_events.c where it is used as
a temporary to help convert the fields in sPAPREventLogEntry to the correct
in memory format when delivering an event to the guest.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
In racing situations between hotplug events and migration operation,
a rtas hotplug event could have not yet be delivered to the source
guest when migration is started. In this case the pending_events of
spapr state need be transmitted to the target so that the hotplug
event can be finished on the target.
To achieve the minimal VMSD possible to migrate the pending_events list,
this patch makes the changes in spapr_events.c:
- 'log_type' of sPAPREventLogEntry struct deleted. This information can be
derived by inspecting the rtas_error_log summary field. A new function
called 'spapr_event_log_entry_type' was added to retrieve the type of
a given sPAPREventLogEntry.
- sPAPREventLogEntry, epow_log_full and hp_log_full were redesigned. The
only data we're going to migrate in the VMSD is the event log data itself,
which can be divided in two parts: a rtas_error_log header and an extended
event log field. The rtas_error_log header contains information about the
size of the extended log field, which can be used inside VMSD as the size
parameter of the VBUFFER_ALOC field that will store it. To allow this use,
the header.extended_length field must be exposed inline to the VMSD instead
of embedded into a 'data' field that holds everything. With this in mind,
the following changes were done:
* a new 'header' field was added to sPAPREventLogEntry. This field holds a
a struct rtas_error_log inline.
* the declaration of the 'rtas_error_log' struct was moved to spapr.h
to be visible to the VMSD macros.
* 'data' field of sPAPREventLogEntry was renamed to 'extended_log' and
now holds only the contents of the extended event log.
* 'struct rtas_error_log hdr' were taken away from both epow_log_full
and hp_log_full. This information is now available at the header field of
sPAPREventLogEntry.
* epow_log_full and hp_log_full were renamed to epow_extended_log and
hp_extended_log respectively. This rename makes it clearer to understand
the new purpose of both structures: hold the information of an extended
event log field.
* spapr_powerdown_req and spapr_hotplug_req_event now creates a
sPAPREventLogEntry structure that contains the full rtas log entry.
* rtas_event_log_queue and rtas_event_log_dequeue now receives a
sPAPREventLogEntry pointer as a parameter instead of a void pointer.
- the endianess of the sPAPREventLogEntry header is now native instead
of be32. We can use the fields in native endianess internally and write
them in be32 in the guest physical memory inside 'check_exception'. This
allows the VMSD inside spapr.c to read the correct size of the
entended_log field.
- inside spapr.c, pending_events is put in a subsection in the spapr state
VMSD to make sure migration across different versions is not broken.
A small change in rtas_event_log_queue and rtas_event_log_dequeue were also
made: instead of calling qdev_get_machine(), both functions now receive
a pointer to the sPAPRMachineState. This pointer is already available in
the callers of these functions and we don't need to waste resources
calling qdev() again.
Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
All the DRC subtypes explicitly list instance_size in TypeInfo (all as
sizeof(sPAPRDRConnector). This isn't necessary, since if it's not listed
it will be derived from the parent type.
Worse, this is dangerous, because if a subtype is changed in future to
have a larger structure, then subtypes of that subtype also need to have
instance_size changed, or it will lead to hard to track memory corruption
bugs.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Use the new functions memory_region_init_{ram,rom,rom_device}()
instead of manually calling the _nomigrate() version and then
vmstate_register_ram_global().
Patch automatically created using coccinelle script:
spatch --in-place -sp_file scripts/coccinelle/memory-region-init-ram.cocci -dir hw
(As it turns out, there are no instances of the rom and
rom_device functions that are caught by this script.)
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1499438577-7674-8-git-send-email-peter.maydell@linaro.org
Rename memory_region_init_ram() to memory_region_init_ram_nomigrate().
This leaves the way clear for us to provide a memory_region_init_ram()
which does handle migration.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1499438577-7674-4-git-send-email-peter.maydell@linaro.org
This finishes QOM'fication of IOMMUMemoryRegion by introducing
a IOMMUMemoryRegionClass. This also provides a fastpath analog for
IOMMU_MEMORY_REGION_GET_CLASS().
This makes IOMMUMemoryRegion an abstract class.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20170711035620.4232-3-aik@ozlabs.ru>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This defines new QOM object - IOMMUMemoryRegion - with MemoryRegion
as a parent.
This moves IOMMU-related fields from MR to IOMMU MR. However to avoid
dymanic QOM casting in fast path (address_space_translate, etc),
this adds an @is_iommu boolean flag to MR and provides new helper to
do simple cast to IOMMU MR - memory_region_get_iommu. The flag
is set in the instance init callback. This defines
memory_region_is_iommu as memory_region_get_iommu()!=NULL.
This switches MemoryRegion to IOMMUMemoryRegion in most places except
the ones where MemoryRegion may be an alias.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20170711035620.4232-2-aik@ozlabs.ru>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Convert all uses of error_report("warning:"... to use warn_report()
instead. This helps standardise on a single method of printing warnings
to the user.
All of the warnings were changed using these two commands:
find ./* -type f -exec sed -i \
's|error_report(".*warning[,:] |warn_report("|Ig' {} +
Indentation fixed up manually afterwards.
The test-qdev-global-props test case was manually updated to ensure that
this patch passes make check (as the test cases are case sensitive).
Signed-off-by: Alistair Francis <alistair.francis@xilinx.com>
Suggested-by: Thomas Huth <thuth@redhat.com>
Cc: Jeff Cody <jcody@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Max Reitz <mreitz@redhat.com>
Cc: Ronnie Sahlberg <ronniesahlberg@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Lieven <pl@kamp.de>
Cc: Josh Durgin <jdurgin@redhat.com>
Cc: "Richard W.M. Jones" <rjones@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Cc: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Greg Kurz <groug@kaod.org>
Cc: Rob Herring <robh@kernel.org>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Peter Chubb <peter.chubb@nicta.com.au>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Marcel Apfelbaum <marcel@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Alexander Graf <agraf@suse.de>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Cornelia Huck <cohuck@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: Greg Kurz <groug@kaod.org>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed by: Peter Chubb <peter.chubb@data61.csiro.au>
Acked-by: Max Reitz <mreitz@redhat.com>
Acked-by: Marcel Apfelbaum <marcel@redhat.com>
Message-Id: <e1cfa2cd47087c248dd24caca9c33d9af0c499b0.1499866456.git.alistair.francis@xilinx.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
* Several minor cleanups from Greg Kurz
* Fix for migration of pseries-2.7 and earlier machine types
* More reworking of the DRC hotplug code, fixing several problems
though there are still more to go
* Fixes for CPU family / alias handling on POWER9
* Preliminary patches for POWER9 XIVE (new interrupt controller)
support
* Assorted other fixes
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJZZFWEAAoJEGw4ysog2bOSxgAQAI85Vv8RuK1mgN0w0aIguP09
JIM+iZ3zJwSFM3A/D8CnWxMGEQkjkVfKWT8cB97v5vPGTu21WD2hdQ26ZrcjC8Do
Y5sPuCGRRSZvz+tnz17HU2aZMQwteNNgdes9MGr61kdVUk+1uvcyqTdhqxka5rF7
SYcIEf95+Fcu00+bhwGaGg0ZXHer4rSTjDXbT3CcxT64sgQW8X36SceFBkFH0P40
tX1bn9gdQgBNOT11O0MNeq6ewxHhSSusTwyYXpHTvK6p0EXPqfm+vM9dQSmXeKsk
T7/yDmKplutVnWlfbxrdG+wp+ObE1h7KljGdWLx4jIX58dHVvjDJ+kZ+OJbcb6Xj
oEV947tYkZaDC7q7TkwXjYltbq+A6HFFKEwxJ59L4zYgVYVkTUMRJ3Apl66sq5a1
SHEBXAA5SDq8jxdKKqvwzh4ZtkkxIelOO8lTVjOAg8ffcNfEwbJOuom2h0kgzOgz
Sn2PxC/jwk2RZZ4T+qe1KNpVbV3RYpGanMXYDMFUnTRw2RAU2io0R2bBwOlm/0I7
ZUrjD2xCFrMPuthxr5/5/w0P1StALVN50S5YqWvDuQYIbMYhSjSh3tDgAHVrqL4W
Yc1Zr5X9X91qgUjAkejBuirvWLvgofiw8jlqAZ6K2zTUcvtn0KdQGe7eiK+wostA
PhLW9tYrkpt/BmzEMi1X
=8Wy2
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.10-20170711' into staging
ppc patch queue 2017-07-11
* Several minor cleanups from Greg Kurz
* Fix for migration of pseries-2.7 and earlier machine types
* More reworking of the DRC hotplug code, fixing several problems
though there are still more to go
* Fixes for CPU family / alias handling on POWER9
* Preliminary patches for POWER9 XIVE (new interrupt controller)
support
* Assorted other fixes
# gpg: Signature made Tue 11 Jul 2017 05:35:16 BST
# gpg: using RSA key 0x6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>"
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392
* remotes/dgibson/tags/ppc-for-2.10-20170711:
spapr: populate device tree depending on XIVE_EXPLOIT option
spapr: introduce the XIVE_EXPLOIT option in CAS
ppc/kvm: have the "family" CPU alias to point to TYPE_HOST_POWERPC_CPU
spapr: Only report host/guest IOMMU page size mismatches on KVM
spapr: fix memory hotplug error path
target/ppc: Add debug function for radix mmu translation
target/ppc: Refactor tcg radix mmu code
spapr: Use unplug_request for PCI hot unplug
spapr: Remove unnecessary differences between hotplug and coldplug paths
spapr: Add DRC release method
spapr: Uniform DRC reset paths
spapr: Leave DR-indicator management to the guest
target-ppc: SPR_BOOKE_ESR not set on FP exceptions
spapr: fix migration to pseries machine < 2.8
spapr: fix bogus function name in comment
spapr: refresh "platform-specific" hcalls comment
spapr: make spapr_populate_hotplug_cpu_dt() static
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
When XIVE is supported, the device tree should be populated
accordingly and the XIVE memory regions mapped to activate MMIOs.
Depending on the design we choose, we could also allocate different
ICS and ICP objects, or switch between objects. This needs to be
discussed.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
On POWER9, the Client Architecture Support (CAS) negotiation process
determines whether the guest operates in XIVE Legacy compatibility
(the former POWER8 interrupt model) or in XIVE exploitation mode (the
newer POWER9 interrupt model).
Bit 7 of Byte 23 of vector 5 is used for this purpose.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
We print a warning if the spapr IOMMU isn't configured to support a page
size matching the host page size backing RAM. When that's the case we need
more complex logic to translate VFIO mappings, which is slower.
But, it's not so slow that it would be at all noticeable against the
general slowness of TCG. So, only warn when using KVM. This removes some
noisy and unhelpful warnings from make check on hosts with page sizes
which typically differ from those on POWER (e.g. Sparc).
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
QEMU shouldn't abort if spapr_add_lmbs()->spapr_drc_attach() fails.
Let's propagate the error instead, like it is done everywhere else
where spapr_drc_attach() is called.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
AIUI, ->unplug_request in the HotplugHandler is used for "soft"
unplug, where acknowledgement from the guest is required before
completing the unplug, whereas ->unplug is used for "hard" unplug
where qemu unilaterally removes the device, and the guest just has to
cope with its sudden absence. For spapr we (correctly) use
->unplug_request for CPU and memory hot unplug but we use ->unplug for
PCI.
While I think it might be possible to support "hard" PCI unplug within
the PAPR model, that's not how it actually works now. Although it's
called from ->unplug, the PCI unplug path will usually just mark the
device for removal, with completion of the unplug delayed until
userspace responds to the unplug notification. If the guest doesn't
respond as expected, that could delay the unplug completion arbitrarily
long.
To reflect that, change the PCI unplug path to be called from
->unplug_request. We also rename spapr_phb_hot_plug_child() and
spapr_phb_hot_unplug_child() to spapr_pci_plug() and
spapr_pci_unplug_request() to more obviously reflect the callbacks they're
implementing.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
spapr_drc_attach() has a 'coldplug' parameter which sets the DRC into
configured state initially, instead of the usual ISOLATED/UNUSABLE state.
It turns out this is unnecessary: although coldplugged devices do need to
be in CONFIGURED state once the guest starts, that will already be
accomplished by the reset code which will move DRCs for already plugged
devices into a coldplug equivalent state.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
At the moment, spapr_drc_release() has an ugly switch on the DRC type to
call the right, device-specific release function. This cleans it up by
doing that via a proper QOM method.
It's still arguably an abstraction violation for the DRC code to call into
the specific device code, but one mess at a time.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
DRC objects have a regular device reset method. However, it only gets
called in the usual way for PCI DRCs. Because of where CPU and LMB DRCs
are in the QOM tree, their device reset method isn't automatically called.
So, the machine manually registers reset handlers to call device_reset().
This patch removes the device reset method, and instead always explicitly
registers the reset handler from realize(). This means the callers don't
have to worry about the two cases, and we always get proper resets.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
The DR-indicator is essentially a "virtual LED" attached to a hotpluggable
device, which the guest can set to various states for the attention of
the operator or management layers.
It's mostly guest managed, except that we once-off set it to
ACTIVE/INACTIVE in the attach/detach path. While that makes certain sense,
there's no indication in PAPR that the hypervisor should do this, and the
drmgr code on the guest side doesn't appear to need it (it will already set
the indicator to ACTIVE on hotplug, and INACTIVE on remove).
So, leave the DR-indicator entirely to the guest; the only thing we need
to do is ensure it's in a sane state on reset.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
since commit 5c4537bd ("spapr: Fix 2.7<->2.8 migration of PCI host bridge"),
some migration fields are forged from the new ones in spapr_pci_pre_save().
It works well, except when the number of MSI devices is 0,
because in this case the function exits immediately.
This fix moves the migration code before the exit code.
The problem can be reproduced with these commands:
source qemu-2.9:
qemu-system-ppc64 -monitor stdio -M pseries-2.6 -nodefaults -S
destination qemu-2.6:
qemu-system-ppc64 -monitor stdio -M pseries-2.6 -nodefaults \
-incoming tcp:0:4444
on the source:
migrate tcp:localhost:4444
Destination fails with the following error:
qemu-system-ppc64: error while loading state for
instance 0x0 of device 'spapr_pci'
qemu-system-ppc64: load of migration failed: Invalid argument
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
$ git grep spapr_ppc_reset
hw/ppc/spapr.c: * as part of spapr_ppc_reset().
$ git grep ppc_spapr_reset
hw/ppc/spapr.c:static void ppc_spapr_reset(void)
hw/ppc/spapr.c: mc->reset = ppc_spapr_reset;
hw/ppc/spapr_hcall.c: /* If ppc_spapr_reset() did not set up a HPT
but one is necessary
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Since commit ff9006ddbf ("spapr: move spapr_core_[foo]plug() callbacks
close to machine code in spapr.c"), this function doesn't need to be extern
anymore.
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
We need a cleanup for loads, so we rename here to be consistent.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
--
Rename htab_cleanup to htap_save_cleanup as dave suggestion
Message-Id: <20170628095228.4661-3-quintela@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
We are going to use it now for more than save live regions.
Once there rename qemu_savevm_state_begin() to qemu_savevm_state_setup().
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20170628095228.4661-2-quintela@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
There are substantial differences in the various paths through
set_isolation_state(), both for setting to ISOLATED versus UNISOLATED
state and for logical versus physical DRCs.
So, split the set_isolation_state() method into isolate() and unisolate()
methods, and give it different implementations for the two DRC types.
Factor some minimal common checks, including for valid indicator values
(which we weren't previously checking) into rtas_set_isolation_state().
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
The allocation-state indicator should only actually be implemented for
"logical" DRCs, not physical ones. Factor a check for this, and also for
valid indicator state values into rtas_set_allocation_state(). Because
they don't exist for physical DRCs, there's no reason that we'd ever want
more than one method implementation, so it can just be a plain function.
In addition, the setting to USABLE and setting to UNUSABLE paths in
set_allocation_state() don't actually have much in common. So, split the
method separate functions for each parameter value (drc_set_usable()
and drc_set_unusable()).
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
The reset handler for DRCs attempts several state transitions which are
subject to various checks and restrictions. But at reset time we know
there is no guest, so we can ignore most of the usual sequencing rules and
just set the DRC back to a known state. In fact, it's safer to do so.
The existing code also has several redundant checks for
drc->awaiting_release inside a block which has already tested that. This
patch removes those and sets the DRC to a fixed initial state based only
on whether a device is currently plugged or not.
With DRCs correctly reset to a state based on device presence, we don't
need to force state transitions as cold plugged devices are processed.
This allows us to remove all the callers of the set_*_state() methods from
outside spapr_drc.c.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
spapr_drc_detach() is called when qemu generic code requests a device be
unplugged. It makes a number of tests, which could well delay further
action until later, before actually detach the device from the DRC.
This splits out the part which actually removes the device from the DRC
into spapr_drc_release(). This will be useful for further cleanups.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
The 'signalled' field in the DRC appears to be entirely a torturous
workaround for the fact that PCI devices were started in UNISOLATED state
for unclear reasons.
1) 'signalled' is already meaningless for logical (so far, all non PCI)
DRCs. It's always set to true (at least at any point it might be tested),
and can't be assigned any real meaning due to the way signalling works for
logical DRCs.
2) For PCI DRCs, the only time signalled would be false is when non-zero
functions of a multifunction device are hotplugged, followed by function
zero (the other way around is explicitly not permitted). In that case the
secondary function DRCs are attached, but the notification isn't sent to
the guest until function 0 is plugged.
3) signalled being false is used to allow a DRC detach to switch mode
back to ISOLATED state, which allows a secondary function to be hotplugged
then unplugged with function 0 never inserted. Without this a secondary
function starting in UNISOLATED state couldn't be detached again without
function 0 being inserted, all the functions configured by the guest, then
sent back to ISOLATED state.
4) But now that PCI DRCs start in ISOLATED state, there's nothing to be
done. If the guest doesn't get the notification, it won't switch the
device to UNISOLATED state, so nothing prevents it from being unplugged.
If the guest does move it to UNISOLATED state without the signal (due to
a manual drmgr call, for instance) then it really isn't safe to unplug it.
So, this patch removes the signalled variable and all code related to it.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
PCI DRCs, and only PCI DRCs, are immediately moved to UNISOLATED isolation
state once the device is attached. This has been there from the initial
implementation, and it's not clear why.
The state diagram in PAPR 13.4 suggests PCI devices should start in
ISOLATED state until the guest moves them into UNISOLATED, and the code in
the guest-side drmgr tool seems to work that way too.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
In ppc_spapr_reset(), if the guest is using HPT, the code was executing:
} else {
spapr->patb_entry = 0;
spapr_setup_hpt_and_vrma(spapr);
}
And, at the end of spapr_setup_hpt_and_vrma:
/* We're setting up a hash table, so that means we're not radix */
spapr->patb_entry = 0;
Resulting in spapr->patb_entry being assigned to 0 twice in a row.
Given that 'spapr_setup_hpt_and_vrma' is also called inside
'spapr_check_setup_free_hpt' of spapr_hcall.c, this trivial patch removes
the 'patb_entry = 0' assignment from the 'else' clause inside ppc_spapr_reset
to avoid this behavior.
Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
ICPState objects were being allocated before CPU thread realization.
However commit 9ed656631d (xics: setup cpu at realize time) reversed it
by allocating ICPState objects after CPU thread is realized. But it
didn't take care to fix the error path because of which we observe
a SIGSEGV when CPU thread realization fails during cold/hotplug.
Fix this by ensuring that we do object_unparent() of ICPState object
only in case when is was created earlier.
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Commit 5bc8d26de2 ("spapr: allocate the ICPState object from under
sPAPRCPUCore") moved ICPState objects from the machine to CPU cores.
This is an improvement since we no longer allocate ICPState objects
that will never be used. But it has the side-effect of breaking
migration of older machine types from older QEMU versions.
This patch allows spapr to register dummy "icp/server" entries to vmstate.
These entries use a dedicated VMStateDescription that can swallow and
discard state of an incoming migration stream, and that don't send anything
on outgoing migration.
As for real ICPState objects, the instance_id is the cpu_index of the
corresponding vCPU, which happens to be equal to the generated instance_id
of older machine types.
The machine can unregister/register these entries when CPUs are dynamically
plugged/unplugged.
This is only available for pseries-2.9 and older machines, thanks to a
compat property.
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Fix migration of radix guests by ensuring that we issue
KVM_PPC_CONFIGURE_V3_MMU for radix case post migration.
Reported-by: Nageswara R Sastry <rnsastry@linux.vnet.ibm.com>
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Add a "no HPT" encoding (using value -1) to the HTAB migration
stream (in the place of HPT size) when the guest doesn't allocate HPT.
This will help the target side to match target HPT with the source HPT
and thus enable successful migration.
Suggested-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Migrating between different CPU versions is a bit complicated for ppc.
A long time ago, we ensured identical CPU versions at either end by
checking the PVR had the same value. However, this breaks under KVM
HV, because we always have to use the host's PVR - it's not
virtualized. That would mean we couldn't migrate between hosts with
different PVRs, even if the CPUs are close enough to compatible in
practice (sometimes identical cores with different surrounding logic
have different PVRs, so this happens in practice quite often).
So, we removed the PVR check, but instead checked that several flags
indicating supported instructions matched. This turns out to be a bad
idea, because those instruction masks are not architected information, but
essentially a TCG implementation detail. So changes to qemu internal CPU
modelling can break migration - this happened between qemu-2.6 and
qemu-2.7. That was addressed by 146c11f1 "target-ppc: Allow eventual
removal of old migration mistakes".
Now, verification of CPU compatibility across a migration basically doesn't
happen. We simply ignore the PVR of the incoming migration, and hope the
cpu on the destination is close enough to work.
Now that we've cleaned up handling of processor compatibility modes
for pseries machine type, we can do better. For new machine types
(pseries-2.10+) We allow migration if:
* The source and destination PVRs are for the same type of CPU, as
determined by CPU class's pvr_match function
OR * When the source was in a compatibility mode, and the destination CPU
supports the same compatibility mode
For older machine types we retain the existing behaviour - current CAS
code will usually set a compat mode which would break backwards
migration if we made them use the new behaviour. [Fixed from an
earlier version by Greg Kurz].
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Tested-by: Andrea Bolognani <abologna@redhat.com>
Currently, the CPU compatibility mode is set when the cpu is initialized,
then again when the guest negotiates features. This means if a guest
negotiates a compatibility mode, then reboots, that compatibility mode
will be retained across the reset.
Usually that will get overridden when features are negotiated on the next
boot, but it's still not really correct. This patch moves the initial set
up of the compatibility mode from cpu init to reset time. The mode *is*
retained if the reboot was caused by the feature negotiation (it might
be important in that case, though it's unlikely).
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Tested-by: Andrea Bolognani <abologna@redhat.com>
Server class POWER CPUs have a "compat" property, which is used to set the
backwards compatibility mode for the processor. However, this only makes
sense for machine types which don't give the guest access to hypervisor
privilege - otherwise the compatibility level is under the guest's control.
To reflect this, this removes the CPU 'compat' property and instead
creates a 'max-cpu-compat' property on the pseries machine. Strictly
speaking this breaks compatibility, but AFAIK the 'compat' option was
never (directly) used with -device or device_add.
The option was used with -cpu. So, to maintain compatibility, this
patch adds a hack to the cpu option parsing to strip out any compat
options supplied with -cpu and set them on the machine property
instead of the now deprecated cpu property.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Tested-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Tested-by: Greg Kurz <groug@kaod.org>
Tested-by: Andrea Bolognani <abologna@redhat.com>
When using the 40p machine, soundhw_init() is currently called twice,
one time from vl.c and one time from ibm_40p_init(). The call in
ibm_40p_init() was likely just a copy-and-paste from a old version
of the prep machine - but there the call to audio_init() (which was
the previous name of this function) has been removed many years ago
already, with commit b3e6d591b0
("audio: enable PCI audio cards for all PCI-enabled targets"), so
we certainly also do not need the soundhw_init() in the 40p function
anymore nowadays.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Sahid Ferdjaoui <sferdjao@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Hervé Poussineau <hpoussin@reactos.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
In some cases a failing VMSTATE_*_EQUAL does not mean we detected a bug,
but it's actually the best we can do. Especially in these cases a verbose
error message is required.
Let's introduce infrastructure for specifying a error hint to be used if
equal check fails. Let's do this by adding a parameter to the _EQUAL
macros called _err_hint. Also change all current users to pass NULL as
last parameter so nothing changes for them.
Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Message-Id: <20170623144823.42936-1-pasic@linux.vnet.ibm.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Move it into MigrationState, revert its meaning and renaming it to
send_section_footer, with a property bound to it. Same trick is played
like previous patches.
Removing savevm_skip_section_footers().
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <1498536619-14548-9-git-send-email-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
It was in SaveState but now moved to MigrationState altogether, reverted
its meaning, then renamed to "send_configuration". Again, using
HW_COMPAT_2_3 for old PC/SPAPR machines, and accel_register_prop() for
xen_init().
Removing savevm_skip_configuration().
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <1498536619-14548-8-git-send-email-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Put it into MigrationState then we can use the properties to specify
whether to enable storing global state.
Removing global_state_set_optional() since now we can use HW_COMPAT_2_3
for x86/power, and AccelClass.global_props for Xen.
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <1498536619-14548-6-git-send-email-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
This is an alias of TYPE_PNV_CORE's property "pir", which is defined
with DEFINE_PROP_UINT32()
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20170607163635.17635-38-marcandre.lureau@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
TYPE_PC_DIMM's property PC_DIMM_ADDR_PROP is defined with
DEFINE_PROP_UINT64().
TYPE_PC_DIMM's property PC_DIMM_NODE_PROP is defined with
DEFINE_PROP_UINT32().
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20170607163635.17635-22-marcandre.lureau@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Use the actual unsigned integer type name.
The type name change impacts the following externally visible area:
* vl.c's machine_help_func() puts it in help for -machine NAME,help.
* QMP command qom-list exposes it in ObjectPropertyInfo member @type.
* QMP command device-list-properties exposes it in DevicePropertyInfo
member @type.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20170607163635.17635-15-marcandre.lureau@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
It don't belong anywhere else, just the global state where everybody
can stick other things.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
They are indpendent, and nowadays almost every device register things
with qdev->vmsd.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Peter Xu <peterx@redhat.com>
This reverts commit fe6824d126.
Conflicts hw/ppc/spapr_drc.c, because get_index() has been renamed
spapr_get_index().
This didn't fix the problem. Once the hotplug has been started
some memory is allocated and some structures are allocated.
We don't free it when we ignore the unplug, and we can't because
they can be in use by the kernel.
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Until recently, spapr used to allocate ICPState objects for the lifetime
of the machine. They would only be associated to vCPUs in xics_cpu_setup()
when plugging a CPU core.
Now that ICPState objects have the same lifecycle as vCPUs, it is
possible to associate them during realization.
This patch hence open-codes xics_cpu_setup() in icp_realize(). The vCPU
is passed as a property. Note that vCPU now needs to be realized first
for the IRQs to be allocated. It also needs to resetted before ICPState
realization in order to synchronize with KVM.
Since ICPState objects are freed when unrealized, xics_cpu_destroy() isn't
needed anymore and can be safely dropped.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
These properties are part of the XICS API. They deserve to appear
explicitely in the XICS header file.
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
SLOF uses "pci" as name for PCI bridges nodes in the device tree instead
of "pci-bridges", so booting via bootindex from a device behind a PCI
bridge currently does not work since QEMU passes the wrong name in the
"qemu,boot-list" property. Fix it by changing the name of the PCI bridge
nodes to "pci" instead.
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1459170
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Similarly to what was done to spapr with commit 249127d0df, this patch
ensures that we don't keep an extra reference on the ICPState object. Also
since the object was just created and not reparented yet, the call to
object_property_add_child() should never fail: let's pass &error_abort to
make this clear.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
DRC objects have a get_name method which returns the DRC name generated
when the DRC is created. Replace that with a fixed spapr_drc_name()
function which generates the name on the fly from other information. This
means:
* We get rid of a method with only one implementation, and only local
callers
* We don't have to carry the name string around for the lifetime of the
DRC
* We use information added to the class structure to generate the name
in standard format, so we don't need an explicit switch on drc type
any more
We also eliminate the 'name' property; it's basically useless since the
only information in it can easily be deduced from other things.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Acked-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Both functions are fairly short, and so are their callers. There's no
particular logical distinction between them, so fold them together.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Acked-by: Michael Roth <mdroth@linux.vnet.ibm.com>
DRC objects have attach & detach methods, but there's only one
implementation. Although there are some differences in its behaviour for
different DRC types, the overall structure is the same, so while we might
want different method implementations for some parts, we're unlikely to
want them for the top-level functions.
So, replace them with direct function calls.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Acked-by: Michael Roth <mdroth@linux.vnet.ibm.com>
There are 3 types of "indicator" associated with hotplug in the PAPR spec
the "allocation state", "isolation state" and "DR-indicator". The first
two are intimately tied to the various state transitions associated with
hotplug. The DR-indicator, however, is different and simpler.
It's basically just a guest controlled variable which can be used by the
guest to flag state or problems associated with a device. The idea is that
the hypervisor can use it to present information back on management
consoles (on some machines with PowerVM it may even control physical LEDs
on the machine case associated with the relevant device).
For that reason, there's only ever likely to be a single update
implementation so the set_indicator_state method isn't useful. Replace it
with a direct function call.
While we're there, make some small associated cleanups:
* PAPR doesn't use the term "indicator state", just "DR-indicator" and
the allocation state and isolation state are also considered "indicators".
Rename things to be less confusing
* Fold set_indicator_state() and rtas_set_indicator_state() into a single
rtas_set_dr_indicator() function.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Acked-by: Michael Roth <mdroth@linux.vnet.ibm.com>
In theory the RTAS set-indicator call can be used for a number of
"indicators" defined by PAPR. In practice the only ones we're ever likely
to implement are those used for Dynamic Reconfiguration (i.e. hotplug).
Because of this, the current implementation determines the associated DRC
object, before dispatching based on the type of indicator.
However, this means we also need a check that we're dealing with a DR
related indicator at all, which duplicates some of the logic from the
switch further down.
Even though it means a bit of code duplication, things work out cleaner if
we delegate the DRC lookup to the individual indicator type functions -
and it also allows some further cleanups.
While we're there, remove references to "sensor", a copy/paste artefact
from the related, but distinct "get-sensor" call.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Acked-by: Michael Roth <mdroth@linux.vnet.ibm.com>
With some combinations of migration and hotplug we can lost temporary state
indicating how many DRCs (guest side hotplug handles) are still connected
to a DIMM object in the process of removal. When we hit that situation
spapr_recover_pending_dimm_state() is used to scan more extensively and
work out the right number.
It does this using drc->indicator state to determine what state of
disconnection the DRC is in. However, this is not safe, because the
indicator state is guest settable - in fact it's more-or-less a purely
guest->host notification mechanism which should have no bearing on the
internals of hotplug state management.
So, replace the test for this with a test on drc->dev, which is a purely
qemu side managed variable, and updated the same BQL critical section as
the indicator state.
This does introduce an off-by-one change, because the indicator state was
updated before the call to spapr_lmb_release() on the current DRC, whereas
drc->dev is updated afterwards. That's corrected by always decrementing
the nr_lmbs value instead of only doing so in the case where we didn't
have to recover information.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Acked-by: Michael Roth <mdroth@linux.vnet.ibm.com>
DRC classes have an entity_sense method to determine (in a specific PAPR
sense) the presence or absence of a device plugged into a DRC. However,
we only have one implementation of the method, which explicitly tests for
different DRC types. This changes it to instead have different method
implementations for the two cases: "logical" and "physical" DRCs.
While we're at it, the entity sense method always returns RTAS_OUT_SUCCESS,
and the interesting value is returned via pass-by-reference. Simplify this
to directly return the value we care about
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Acked-by: Michael Roth <mdroth@linux.vnet.ibm.com>
The pseries machine type doesn't usually use the 'pvpanic' device as such,
because it has a firmware/hypervisor facility with roughly the same
purpose. The 'ibm,os-term' RTAS call notifies the hypervisor that the
guest has crashed.
Our implementation of this call was sending a GUEST_PANICKED qmp event;
however, it was not doing the other usual panic actions, making its
behaviour different from pvpanic for no good reason.
To correct this, we should call qemu_system_guest_panicked() rather than
directly sending the panic event.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
The string returned by object_property_get_str() is dynamically allocated.
(Spotted by Coverity, CID 1375942)
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Accumulated patches for ppc targets and the pseries machine type.
The big thing in this batch is a start on a substantial cleanup of the
pseries hotplug mechanisms, which were pretty confusing. For now
these shouldn't cause substantial behavioural changes, but I am hoping
these lead to clearer code and eventually to fixes for the bugs we
have in hotplug handling, particularly when hotplug and migration are
combined.
The remaining patches are mostly bugfixes.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJZNhgSAAoJEGw4ysog2bOSERwP/A7T7UJ8XXWit9QXGCi+G83w
+RUuHxjA9qFqrg1zYqFyLg3ctGl93Sxu7mzI5MOIKwAVXlTsE6+84TH7zBc18DPB
fekPWmzJ6jfiVO+1Zg1JPWorMfIHDDc2v6Q6qPfD8KWbt02yPfrXbKlivQB4hVZ4
Qb4VJdjZgBDcVy79xhcW5k6v8dVw8PdSyDmkQrBhccI0noLerhI41Mgt7QQaWQRH
Le3ziexUpWelVCRQB0FqE/PIWo2+NY/e0pumX7Aqtjs/G35KjOXy0ja3yKLjfeUW
Z4NugIO2I2hncERa68YFar/BqG26DX8KCErNMDkn7LyZcoDAQWhcDH+65G1BNuf2
jW+KApMNm+N1vXabbz8P9BbLjuZpRQQhyPOxB3I8UGaTYGtCPe/lUCe2/V8EbKNa
VFavc1UuLftOZuJj/rYGJeU/4JBU6srbAKCO3VVK4Tnd8DyiT3QCpUWEkjv+J6jo
co35oYBavLfQPMr+rsX15lgbmZwg7iBV+dgKLa2+cwmKXzCf7aYe38aJy7nRBmhb
ivhH3bKtdysy0qq4UYaCgW06qQcVF0QMJaxFQ0X7I+GBNwHA7wdZD/i6IMcO6Z7H
7gQdavBTdukgKb2+pVjR58H13ieHXuBxktonhOz70rvEDVa4xx8pxhnZlpSiH2ha
RzpkhanrwEeECG6Lke/3
=QDWB
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.10-20170606' into staging
ppc patch queue 2017-06-06
Accumulated patches for ppc targets and the pseries machine type.
The big thing in this batch is a start on a substantial cleanup of the
pseries hotplug mechanisms, which were pretty confusing. For now
these shouldn't cause substantial behavioural changes, but I am hoping
these lead to clearer code and eventually to fixes for the bugs we
have in hotplug handling, particularly when hotplug and migration are
combined.
The remaining patches are mostly bugfixes.
# gpg: Signature made Tue 06 Jun 2017 03:48:50 BST
# gpg: using RSA key 0x6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>"
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392
* remotes/dgibson/tags/ppc-for-2.10-20170606:
spapr: Remove some non-useful properties on DRC objects
spapr: Eliminate spapr_drc_get_type_str()
spapr: Move configure-connector state into DRC
spapr: Clean up spapr_dr_connector_by_*()
spapr: Introduce DRC subclasses
spapr/drc: don't migrate DRC of cold-plugged CPUs and LMBs
spapr: Allow boot from vhost-*-scsi backends
ppc/pnv: check the return value of fdt_setprop()
spapr_nvram: Check return value from blk_getlength()
target/ppc: Fixup set_spr error in h_register_process_table
target-ppc: Fix openpic timer read register offset
spapr: Make DRC get_index and get_type methods into plain functions
spapr: Abolish DRC set_configured method
spapr: Abolish DRC get_fdt method
spapr: Move DRC RTAS calls into spapr_drc.c
migration: Mark CPU states dirty before incoming migration/loadvm
migration: remove register_savevm()
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
* 'connector_type' is easily derived from the 'index' property, so there's
no point to it (it's also implicit in the QOM type of the DRC)
* 'isolation-state', 'indicator-state' and 'allocation-state' are
part of the transaction between qemu and guest during PAPR hotplug
operations, and outside tools really have no business looking at it
(especially not changing, and these were RW properties)
* 'entity-sense' is basically just a weird PAPR encoding of whether there
is a device connected to this DRC
Strictly speaking removing these properties is breaking the qemu interface.
However, I'm pretty sure no management tools have ever used these. For
debugging there are better alternatives. Therefore, I think removing these
broken interfaces is the better option.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Acked-by: Michael Roth <mdroth@linux.vnet.ibm.com>
This function was used in generating the device tree. However, now that
we have different QOM types for different DRC types we can easily store
the information we need in the class structure and avoid this specialized
lookup function.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Acked-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Currently the sPAPRMachineState contains a list of sPAPRConfigureConnector
structures which store intermediate state for the ibm,configure-connector
RTAS call.
This was an attempt to separate this state from the core of the DRC state.
However the configure connector process is intimately tied to the DRC
model, so there's really no point trying to have two levels of interface
here.
Moving the configure-connector state into its corresponding DRC allows
removal of a number of helpers for maintaining the anciliary list.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Acked-by: Michael Roth <mdroth@linux.vnet.ibm.com>
* Change names to something less ludicrously verbose
* Now that we have QOM subclasses for the different DRC types, use a QOM
typename instead of a PAPR type value parameter
The latter allows removal of the get_type_shift() helper.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Acked-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Currently we only have a single QOM type for all DRCs, but lots of
places where we switch behaviour based on the DRC's PAPR defined type.
This is a poor use of our existing type system.
So, instead create QOM subclasses for each PAPR defined DRC type. We
also introduce intermediate subclasses for physical and logical DRCs,
a division which will be useful later on.
Instead of being stored in the DRC object itself, the PAPR type is now
stored in the class structure. There are still many places where we
switch directly on the PAPR type value, but this at least provides the
basis to start to remove those.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Acked-by: Michael Roth <mdroth@linux.vnet.ibm.com>
As explained in commit 5c0139a8c2 ("spapr: fix default DRC state for
coldplugged LMBs"), guests expect cold-plugged LMBs to be pre-allocated
and unisolated. The same goes for cold-plugged CPUs.
While here, let's convert g_assert(false) to the better self documenting
g_assert_not_reached().
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The current implementation of spapr_get_fw_dev_path() doesn't take into
consideration vhost-*-scsi devices. This makes said devices unbootable
on PPC as SLOF is unable to work out the path to scan boot disks.
This makes VMs bootable on spapr when using vhost-*-scsi by implementing
a disk path for VHostSCSICommon (which currently includes both
vhost-user-scsi and vhost-scsi).
Signed-off-by: Felipe Franciosi <felipe@nutanix.com>
Signed-off-by: Mike Cui <cui@nutanix.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
set_spr is used in the function h_register_process_table() to update the
LPCR_GTSE and LPCR_UPRT values based on the flags passed by the guest.
The set_spr function takes the last two arguments mask and value used to
mask and set the value of the spr respectively.
The current call site passes these arguments in the wrong order and thus
bot GTSE and UPRT will be set irrespective, which is obviously
incorrect.
Rearrange the function call so that these arguments are passed in the
correct order and the correct behaviour is exhibited.
It is worth noting that this wasn't detected earlier since these were
always both set in all cases where this H_CALL was made.
Fixes: 6de833070c ("target/ppc: Set UPRT and GTSE on all cpus in H_REGISTER_PROCESS_TABLE")
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
These two methods only have one implementation, and the spec they're
implementing means any other implementation is unlikely, verging on
impossible.
So replace them with simple functions.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
DRConnectorClass has a set_configured method, however:
* There is only one implementation, and only ever likely to be one
* There's exactly one caller, and that's (now) local
* The implementation is very straightforward
So abolish the method entirely, and just open-code what we need.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
The DRConnectorClass includes a get_fdt method. However
* There's only one implementation, and there's only likely to ever be one
* Both callers are local to spapr_drc
* Each caller only uses one half of the actual implementation
So abolish get_fdt() entirely, and just open-code what we need.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
Currently implementations of the RTAS calls related to DRCs are in
spapr_rtas.c. They belong better in spapr_drc.c - that way they're closer
to related code, and we'll be able to make some more things local.
spapr_rtas.c was intended to contain the RTAS infrastructure and core calls
that don't belong anywhere else, not every RTAS implementation.
Code motion only.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
even though spapr_fixup_cpu_numa_dt() has no effect on FDT
if numa is disabled, don't call it uselessly. It makes it
obvious at call sites that function is needed only when numa
is enabled.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <1496161442-96665-7-git-send-email-imammedo@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Move vcpu's associated numa_node field out of generic CPUState
into inherited classes that actually care about cpu<->numa mapping,
i.e: ARMCPU, PowerPCCPU, X86CPU.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <1496161442-96665-6-git-send-email-imammedo@redhat.com>
[ehabkost: s/CPU is belonging to/CPU belongs to/ on comments]
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Those are apparently unnecessary includes.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
A bunch of fixes all over the place. Most notably this fixes
the new MTU feature when using vhost.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJZK2bwAAoJECgfDbjSjVRpNBgIALmNG7VaixhNUlnfX1n1JBnh
+HBP2zNfvi0q5roBuPFmlziKa3IBHb2Fcte4nb6QxmPg+uoaj39AOzfrrvz210kR
h2j5Qk2bCdMeWBpxI+xDDScwi/Im23Y6KN1eZyMekFr2CaSGiqOHZPPdbsyEcHPB
VylM0uHqSTZL5JAAzEuYlH+LLfPu91HoxMsIAdNuQX+qKyM2DZ4eICBQ0zA73USt
OduZltcRMk7UpvQMqY+2iaEXapXQQEUGrP2Mo8ZyqeIl2ItC33GspqBQIKjuZdrr
tpr/T1VWsLdZnURZXyELrFqrErDXvKaP9HROwvyLyYPXZF+pJ3LA7TopS5UmfNQ=
=Z4xG
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'mst/tags/for_upstream' into staging
pci, virtio, vhost: fixes
A bunch of fixes all over the place. Most notably this fixes
the new MTU feature when using vhost.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
# gpg: Signature made Mon 29 May 2017 01:10:24 AM BST
# gpg: using RSA key 0x281F0DB8D28D5469
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>"
# gpg: aka "Michael S. Tsirkin <mst@redhat.com>"
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67
# Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469
* mst/tags/for_upstream:
acpi-test: update expected files
pc: ACPI BIOS: use highest NUMA node for hotplug mem hole SRAT entry
vhost-user: pass message as a pointer to process_message_reply()
virtio_net: Bypass backends for MTU feature negotiation
intel_iommu: turn off pt before 2.9
intel_iommu: support passthrough (PT)
intel_iommu: allow dev-iotlb context entry conditionally
intel_iommu: use IOMMU_ACCESS_FLAG()
intel_iommu: provide vtd_ce_get_type()
intel_iommu: renaming context entry helpers
x86-iommu: use DeviceClass properties
memory: remove the last param in memory_region_iommu_replay()
memory: tune last param of iommu_ops.translate()
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Assorted accumulated patches. These are nearly all bugfixes at one
level or another - some for longstanding problems, others for some
regressions caused by more recent cleanups.
This includes preliminary patches towards fixing migration for Radix
Page Table guests under POWER9 and also fixing some migration
regressions due to the re-organization of the interrupt controller
code. Not all the pieces are there yet, so those still won't quite
work, but the preliminary changes make sense on their own.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJZJlRoAAoJEGw4ysog2bOS4m0P/0fm0k9znGQ8jpbGDJ18PF4g
Z7rhEcz5Ab1f5xn+ujYSc23ViJ0wgonhQB0F2d02O50Br0Gu2zN1XMrstysUEN/6
qg7nngsDqe+mGFMXASNb+YIzK4mYZQXmW8qscVm6fdaGXq/tZ13zMRPoRHdJQpsg
uN/uDWvQqwZO4RizKFbXlosoeNS1Q4c+Bm5MszV+B6TfVvgNd81Od7rjY/ucj4tr
9e8oG3lx1YpRjg6XN3uT/AEtPxgUe6hAS5RlsAWk/B0FBUK6JvRSaDAS8ojg8UIg
8cPWix5OrHQSpjcTsNW3X2FRb31O8YvExPYFHrVZeVhaB5HzVLPXEudeSIMiuqjn
CfZxRz6+IToWUJWFn30NozfJUwgQlJ2sf92CHcmMKHu2Zd/hUWdApIukmEFY43Y5
jyhDkubrRtSsCcR6wd4mGeAg2iQWubSOPFdM/TAGzlbGWoT4qXBK1Ol03DaiF971
fkxWaHrmgiKhe8G1sUIZXfDDxpTIvFv1bcmGOnhGmsELFh65bMXVLmwjNvVK9fdE
hTuWibRPPE3btyI4eOMbtVdooliCfp+0XvraACnuOXQlgD1bqCPSrnsS2HLPiDS+
npRKlHGlf4cYSVCeTCjmsAVIqzsDfyvpd67qP3xPsaX/pxI/i+I2H9usZWWJBXMp
I5M78EL5NCkMnZgYIFad
=nlnV
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'dgibson/tags/ppc-for-2.10-20170525' into staging
ppc patch queue 2017-05-25
Assorted accumulated patches. These are nearly all bugfixes at one
level or another - some for longstanding problems, others for some
regressions caused by more recent cleanups.
This includes preliminary patches towards fixing migration for Radix
Page Table guests under POWER9 and also fixing some migration
regressions due to the re-organization of the interrupt controller
code. Not all the pieces are there yet, so those still won't quite
work, but the preliminary changes make sense on their own.
# gpg: Signature made Thu 25 May 2017 04:50:00 AM BST
# gpg: using RSA key 0x6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>"
# gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392
* dgibson/tags/ppc-for-2.10-20170525:
xics: add unrealize handler
hw/ppc/spapr.c: recover pending LMB unplug info in spapr_lmb_release
hw/ppc: migrating the DRC state of hotplugged devices
hw/ppc: removing drc->detach_cb and drc->detach_cb_opaque
hw/ppc/spapr.c: adding pending_dimm_unplugs to sPAPRMachineState
spapr: add pre_plug function for memory
pseries: Restore support for total vcpus not a multiple of threads-per-core for old machine types
pseries: Split CAS PVR negotiation out into a separate function
spapr: fix error reporting in xics_system_init()
spapr_cpu_core: drop reference on ICP object during CPU realization
hw/ppc/spapr_events.c: removing 'exception' from sPAPREventLogEntry
spapr: ensure core_slot isn't NULL in spapr_core_unplug()
xics_kvm: cache already enabled vCPU ids
spapr: Consolidate HPT freeing code into a routine
spapr-cpu-core: release ICP object when realization fails
spapr: sanitize error handling in spapr_ics_create()
ppc/xics: simplify prototype of xics_spapr_init()
target/ppc: reset reservation in do_rfi()
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This patch converts the old "is_write" bool into IOMMUAccessFlags. The
difference is that "is_write" can only express either read/write, but
sometimes what we really want is "none" here (neither read nor write).
Replay is an good example - during replay, we should not check any RW
permission bits since thats not an actual IO at all.
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Jason Wang <jasowang@redhat.com>
When a LMB hot unplug starts, the current DRC LMB status is stored at
spapr->pending_dimm_unplugs QTAILQ. This queue isn't migrated, thus
if a migration occurs in the middle of a LMB unplug the
spapr_lmb_release callback will lost track of the LMB unplug progress.
This patch implements a new recover function spapr_recover_pending_dimm_state
that is used inside spapr_lmb_release to recover this DRC LMB release
status that is lost during the migration.
Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
[dwg: Minor stylistic changes, simplify error handling]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
In pseries, a firmware abstraction called Dynamic Reconfiguration
Connector (DRC) is used to assign a particular dynamic resource
to the guest and provide an interface to manage configuration/removal
of the resource associated with it. In other words, DRC is the
'plugged state' of a device.
Before this patch, DRC wasn't being migrated. This causes
post-migration problems due to DRC state mismatch between source and
target. The DRC state of a device X in the source might
change, while in the target the DRC state of X is still fresh. When
migrating the guest, X will not have the same hotplugged state as it
did in the source. This means that we can't hot unplug X in the
target after migration is completed because its DRC state is not consistent.
https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1677552 is one
bug that is caused by this DRC state mismatch between source and
target.
To migrate the DRC state, we defined the VMStateDescription struct for
spapr_drc to enable the transmission of spapr_drc state in migration.
Not all the elements in the DRC state are migrated - only those
that can be modified by guest actions or device add/remove
operations:
- 'isolation_state', 'allocation_state' and 'indicator_state'
are involved in the DR state transition diagram from
PAPR+ 2.7, 13.4;
- 'configured', 'signalled', 'awaiting_release' and 'awaiting_allocation'
are needed in attaching and detaching devices;
- 'indicator_state' provides users with hardware state information.
These are the DRC elements that are migrated.
In this patch the DRC state is migrated for PCI, LMB and CPU
connector types. At this moment there is no support to migrate
DRC for the PHB (PCI Host Bridge) type.
In the 'realize' function the DRC is registered using vmstate_register,
similar to what hw/ppc/spapr_iommu.c does in 'spapr_tce_table_realize'.
This approach works because DRCs are bus-less and do not sit
on a BusClass that implements bc->get_dev_path, so as a fallback the
VMSD gets identified via "spapr_drc"/get_index(drc).
Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The pointer drc->detach_cb is being used as a way of informing
the detach() function inside spapr_drc.c which cb to execute. This
information can also be retrieved simply by checking drc->type and
choosing the right callback based on it. In this context, detach_cb
is redundant information that must be managed.
After the previous spapr_lmb_release change, no detach_cb_opaques
are being used by any of the three callbacks functions. This is
yet another information that is now unused and, on top of that, can't
be migrated either.
This patch makes the following changes:
- removal of detach_cb_opaque. the 'opaque' argument was removed from
the callbacks and from the detach() function of sPAPRConnectorClass. The
attribute detach_cb_opaque of sPAPRConnector was removed.
- removal of detach_cb from the detach() call. The function pointer
detach_cb of sPAPRConnector was removed. detach() now uses a
switch(drc->type) to execute the apropriate callback. To achieve this,
spapr_core_release, spapr_lmb_release and spapr_phb_remove_pci_device_cb
callbacks were made public to be visible inside detach().
Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The LMB DRC release callback, spapr_lmb_release(), uses an opaque
parameter, a sPAPRDIMMState struct that stores the current LMBs that
are allocated to a DIMM (nr_lmbs). After each call to this callback,
the nr_lmbs is decremented by one and, when it reaches zero, the callback
proceeds with the qdev calls to hot unplug the LMB.
Using drc->detach_cb_opaque is problematic because it can't be migrated in
the future DRC migration work. This patch makes the following changes to
eliminate the usage of this opaque callback inside spapr_lmb_release:
- sPAPRDIMMState was moved from spapr.c and added to spapr.h. A new
attribute called 'addr' was added to it. This is used as an unique
identifier to associate a sPAPRDIMMState to a PCDIMM element.
- sPAPRMachineState now hosts a new QTAILQ called 'pending_dimm_unplugs'.
This queue of sPAPRDIMMState elements will store the DIMM state of DIMMs
that are currently going under an unplug process.
- spapr_lmb_release() will now retrieve the nr_lmbs value by getting the
correspondent sPAPRDIMMState. A helper function called spapr_dimm_get_address
was created to fetch the address of a PCDIMM device inside spapr_lmb_release.
When nr_lmbs reaches zero and the callback proceeds with the qdev hot unplug
calls, the sPAPRDIMMState struct is removed from spapr->pending_dimm_unplugs.
After these changes, the opaque argument for spapr_lmb_release is now
unused and is passed as NULL inside spapr_del_lmbs. This and the other
opaque arguments can now be safely removed from the code.
As an additional cleanup made by this patch, the spapr_del_lmbs function
was merged with spapr_memory_unplug_request. The former was being called
only by the latter and both were small enough to fit one single function.
Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
[dwg: Minor stylistic cleanups]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This allows to manage errors before the memory
has started to be hotplugged. We already have
the function for the CPU cores.
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
[dwg: Fixed a couple of style nits]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
As of pseries-2.7 and later, we require the total number of guest vcpus to
be a multiple of the threads-per-core. pseries-2.6 and earlier machine
types, however, are supposed to allow this for the sake of migration from
old qemu versions which allowed this.
Unfortunately, 8149e29 "pseries: Enforce homogeneous threads-per-core"
broke this by not considering the old machine type case. This fixes it by
only applying the check when the machine type supports hotpluggable cpus.
By not-entirely-coincidence, that corresponds to the same time when we
started enforcing total threads being a multiple of threads-per-core.
Fixes: 8149e2992f
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Tested-by: Greg Kurz <groug@kaod.org>
Guests of the qemu machine type go through a feature negotiation process
known as "client architecture support" (CAS) during early boot. This does
a number of things, one of which is finding a CPU compatibility mode which
can be supported by both guest and host.
In fact the CPU negotiation is probably the single most complex part of the
CAS process, so this splits it out into a helper function. We've recently
made some mistakes in maintaining backward compatibility for old machine
types here. Splitting this out will also make it easier to fix this.
This also adds a possibly useful error message if the negotiation fails
(i.e. if there isn't a CPU mode that's suitable for both guest and host).
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
If the user explicitely asked for kernel-irqchip support and "xics-kvm"
initialization fails, we shouldn't fallback to emulated "xics" as we
do now. It is also awkward to print an error message when we have an
errp pointer argument.
Let's use the errp argument to report the error and let the caller decide.
This simplifies the code as we don't need a local Error * here.
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
When a piece of code allocates an object, it implicitely gets a reference
on it. If it then makes that object a child property of another object, it
should drop its own reference at some point otherwise the child object can
never be finalized. The current code hence leaks one ICP object per CPU
when hot-removing a core.
Failing to add a newly allocated ICP object to the CPU is a bug. While here,
let's ensure QEMU aborts if this ever happens.
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Currenty we do not have any RTAS event that is reported by the
event-scan interface. The existing events, RTAS_LOG_TYPE_EPOW and
RTAS_LOG_TYPE_HOTPLUG, are being reported by the check-exception
interface and, as such, marked as 'exception=true'.
Commit 79853e18d9, 'spapr_events: event-scan RTAS interface', added
the event_scan interface because the guest kernel requires it to
initialize other required interfaces. It is acting since then as
a stub because no events that would be reported by it were added
since then. However, the existence of the 'exception' boolean adds
an unnecessary load in the future migration of the pending_events,
sPAPREventLogEntry QTAILQ that hosts the pending RTAS events.
To make the code cleaner and ease the future migration changes, this
patch makes the following changes:
- remove the 'exception' boolean that filter these events. There is
nothing to filter since all events are reported by check-exception;
- functions rtas_event_log_queue, rtas_event_log_dequeue and
rtas_event_log_contains don't receive the 'exception' boolean
as parameter;
- event_scan function was simplified. It was calling
'rtas_event_log_dequeue(mask, false)' that was always returning
'NULL' because we have no events that are created with
exception=false, thus in the end it would execute a jump to
'out_no_events' all the time. The function now assumes that
this will always be the case and all the remaining logic were
deleted.
In the future, when or if we add new RTAS events that should
be reported with the event_scan interface, we can refer to
the changes made in this patch to add the event_scan logic
back.
Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
If we go that far on the path of hot-removing a core and we find out that
the core-id is invalid, then we have a serious bug.
Let's make it explicit with an assert() instead of dereferencing a NULL
pointer.
This fixes Coverity issue CID 1375404.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Consolidate the code that frees HPT into a separate routine
spapr_free_hpt() as the same chunk of code is called from two places.
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
While here we introduce a single error path to avoid code duplication.
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The spapr_ics_create() function handles errors in a rather convoluted
way, with two local Error * variables. Moreover, failing to parent the
ICS object to the machine should be considered as a bug but it is
currently ignored.
This patch addresses both issues.
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This function only does hypercall and RTAS-call registration, and thus
never returns an error. This patch adapt the prototype to reflect that.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Time to wire up all the call sites that request a shutdown or
reset to use the enum added in the previous patch.
It would have been less churn to keep the common case with no
arguments as meaning guest-triggered, and only modified the
host-triggered code paths, via a wrapper function, but then we'd
still have to audit that I didn't miss any host-triggered spots;
changing the signature forces us to double-check that I correctly
categorized all callers.
Since command line options can change whether a guest reset request
causes an actual reset vs. a shutdown, it's easy to also add the
information to reset requests.
Signed-off-by: Eric Blake <eblake@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au> [ppc parts]
Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> [SPARC part]
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> [s390x parts]
Message-Id: <20170515214114.15442-5-eblake@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
All the functions in hw/audio/audio.h are called "soundhw_*()"
and live in hw/audio/audiohw.c. Rename the header file for
consistency.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Hervé Poussineau <hpoussin@reactos.org>
Message-id: 20170508205735.23444-4-ehabkost@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
To make it consistent with the remaining soundhw.c functions and
avoid confusion with the audio_init() function in audio/audio.c,
rename audio_init() to soundhw_init().
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-id: 20170508205735.23444-3-ehabkost@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
There's no reason to keep the soundhw table in arch_init.c. Move
that code to a new hw/audio/soundhw.c file.
While moving the code, trivial coding style issues were fixed.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20170508205735.23444-2-ehabkost@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
commit 33cd52b5d7 unset
cannot_instantiate_with_device_add_yet in TYPE_SYSBUS, making all
sysbus devices appear on "-device help" and lack the "no-user"
flag in "info qdm".
To fix this, we can set user_creatable=false by default on
TYPE_SYS_BUS_DEVICE, but this requires setting
user_creatable=true explicitly on the sysbus devices that
actually work with -device.
Fortunately today we have just a few has_dynamic_sysbus=1
machines: virt, pc-q35-*, ppce500, and spapr.
virt, ppce500, and spapr have extra checks to ensure just a few
device types can be instantiated:
* virt supports only TYPE_VFIO_CALXEDA_XGMAC, TYPE_VFIO_AMD_XGBE.
* ppce500 supports only TYPE_ETSEC_COMMON.
* spapr supports only TYPE_SPAPR_PCI_HOST_BRIDGE.
This patch sets user_creatable=true explicitly on those 4 device
classes.
Now, the more complex cases:
pc-q35-*: q35 has no sysbus device whitelist yet (which is a
separate bug). We are in the process of fixing it and building a
sysbus whitelist on q35, but in the meantime we can fix the
"-device help" and "info qdm" bugs mentioned above. Also, despite
not being strictly necessary for fixing the q35 bug, reducing the
list of user_creatable=true devices will help us be more
confident when building the q35 whitelist.
xen: We also have a hack at xen_set_dynamic_sysbus(), that sets
has_dynamic_sysbus=true at runtime when using the Xen
accelerator. This hack is only used to allow xen-backend devices
to be dynamically plugged/unplugged.
This means today we can use -device with the following 22 device
types, that are the ones compiled into the qemu-system-x86_64 and
qemu-system-i386 binaries:
* allwinner-ahci
* amd-iommu
* cfi.pflash01
* esp
* fw_cfg_io
* fw_cfg_mem
* generic-sdhci
* hpet
* intel-iommu
* ioapic
* isabus-bridge
* kvmclock
* kvm-ioapic
* kvmvapic
* SUNW,fdtwo
* sysbus-ahci
* sysbus-fdc
* sysbus-ohci
* unimplemented-device
* virtio-mmio
* xen-backend
* xen-sysdev
This patch adds user_creatable=true explicitly to those devices,
temporarily, just to keep 100% compatibility with existing
behavior of q35. Subsequent patches will remove
user_creatable=true from the devices that are really not meant to
user-creatable on any machine, and remove the FIXME comment from
the ones that are really supposed to be user-creatable. This is
being done in separate patches because we still don't have an
obvious list of devices that will be whitelisted by q35, and I
would like to get each device reviewed individually.
Cc: Alexander Graf <agraf@suse.de>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Alistair Francis <alistair.francis@xilinx.com>
Cc: Beniamino Galvani <b.galvani@gmail.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: "Edgar E. Iglesias" <edgar.iglesias@gmail.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Frank Blaschka <frank.blaschka@de.ibm.com>
Cc: Gabriel L. Somlo <somlo@cmu.edu>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: John Snow <jsnow@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Laszlo Ersek <lersek@redhat.com>
Cc: Marcel Apfelbaum <marcel@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Cc: Max Reitz <mreitz@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Pierre Morel <pmorel@linux.vnet.ibm.com>
Cc: Prasad J Pandit <pjp@fedoraproject.org>
Cc: qemu-arm@nongnu.org
Cc: qemu-block@nongnu.org
Cc: qemu-ppc@nongnu.org
Cc: Richard Henderson <rth@twiddle.net>
Cc: Rob Herring <robh@kernel.org>
Cc: Shannon Zhao <zhaoshenglong@huawei.com>
Cc: sstabellini@kernel.org
Cc: Thomas Huth <thuth@redhat.com>
Cc: Yi Min Zhao <zyimin@linux.vnet.ibm.com>
Acked-by: John Snow <jsnow@redhat.com>
Acked-by: Juergen Gross <jgross@suse.com>
Acked-by: Marcel Apfelbaum <marcel@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20170503203604.31462-3-ehabkost@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
[ehabkost: Small changes at sysbus_device_class_init() comments]
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
cannot_instantiate_with_device_add_yet was introduced by commit
efec3dd631 to replace no_user. It was
supposed to be a temporary measure.
When it was introduced, we had 54
cannot_instantiate_with_device_add_yet=true lines in the code.
Today (3 years later) this number has not shrunk: we now have
57 cannot_instantiate_with_device_add_yet=true lines. I think it
is safe to say it is not a temporary measure, and we won't see
the flag go away soon.
Instead of a long field name that misleads people to believe it
is temporary, replace it a shorter and less misleading field:
user_creatable.
Except for code comments, changes were generated using the
following Coccinelle patch:
@@
expression DC;
@@
(
-DC->cannot_instantiate_with_device_add_yet = false;
+DC->user_creatable = true;
|
-DC->cannot_instantiate_with_device_add_yet = true;
+DC->user_creatable = false;
)
@@
typedef ObjectClass;
expression dc;
identifier class, data;
@@
static void device_class_init(ObjectClass *class, void *data)
{
...
dc->hotpluggable = true;
+dc->user_creatable = true;
...
}
@@
@@
struct DeviceClass {
...
-bool cannot_instantiate_with_device_add_yet;
+bool user_creatable;
...
}
@@
expression DC;
@@
(
-!DC->cannot_instantiate_with_device_add_yet
+DC->user_creatable
|
-DC->cannot_instantiate_with_device_add_yet
+!DC->user_creatable
)
Cc: Alistair Francis <alistair.francis@xilinx.com>
Cc: Laszlo Ersek <lersek@redhat.com>
Cc: Marcel Apfelbaum <marcel@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Thomas Huth <thuth@redhat.com>
Acked-by: Alistair Francis <alistair.francis@xilinx.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Acked-by: Marcel Apfelbaum <marcel@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20170503203604.31462-2-ehabkost@redhat.com>
[ehabkost: kept "TODO remove once we're there" comment]
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Highlights:
* New "-numa cpu" option
* NUMA distance configuration
* migration/i386 vmstatification
-----BEGIN PGP SIGNATURE-----
iQIcBAABCAAGBQJZFLh3AAoJECgHk2+YTcWmppQQAJe9Y5a3VwHXqvHbwBHX2ysn
RZDAUPd9DWpbM+UUydyKOVIZ7u5RXbbVq4E0NeCD8VYYd+grZB5Wo1cAzy3b4U2j
2s+MDqaPMtZtGoqxTsyQOVoVxazT5Kf1zglK+iUEzik44J7LGdro+ty2Z7Ut2c11
q9rE/GNS78czBm7c4lxgkxXW4N95K/tEGlLtDQ7uct//3U/ZimF+mO6GcbVFlOWT
4iEbOz2sqvBVv22nLJRufiPgFNIW4hizAz5KBWxwGFCCKvT3N6yYNKKjzEpCw+jE
lpjIRODU02yIZZZY841fLRtyrk7p4zORS8jRaHTdEJgb5bGc/YazxxVL8nzRQT1W
VxFwAMd+UNrDkV24hpN++Ln2O+b3kwcGZ7uA/qu9d5WvSYUKXlHqcMJ35q6zuhAI
/ecfYO7EZfVP86VjIt5IH04iV8RChA9Q6de+kQEFa6wHUxufeCOwCFqukGo8zj07
plX8NcjnzYmSXKnYjHOHao4rKT+DiJhRB60rFiMeKP/qvKbZPjtgsIeonhHm53qZ
/QwkhowahHKkpAnetIl0QHm8KS4YudAofMi/Fl+he4gRkEbSQVAo6iQb2L4cjcLC
LNSDDsIVWGem4gCR+vcsFqB3lggRDfltHXm15JKh92UMpOr6RI6s8pD55T7EdnPC
CfdxWB5kYM6/lLbOHj94
=48wH
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'ehabkost/tags/x86-and-machine-pull-request' into staging
x86 and machine queue, 2017-05-11
Highlights:
* New "-numa cpu" option
* NUMA distance configuration
* migration/i386 vmstatification
# gpg: Signature made Thu 11 May 2017 08:16:07 PM BST
# gpg: using RSA key 0x2807936F984DC5A6
# gpg: Good signature from "Eduardo Habkost <ehabkost@redhat.com>"
# gpg: Note: This key has expired!
# Primary key fingerprint: 5A32 2FD5 ABC4 D3DB ACCF D1AA 2807 936F 984D C5A6
* ehabkost/tags/x86-and-machine-pull-request: (29 commits)
migration/i386: Remove support for pre-0.12 formats
vmstatification: i386 FPReg
migration/i386: Remove old non-softfloat 64bit FP support
tests: check -numa node,cpu=props_list usecase
numa: add '-numa cpu,...' option for property based node mapping
numa: remove node_cpu bitmaps as they are no longer used
numa: use possible_cpus for not mapped CPUs check
machine: call machine init from wrapper
numa: remove no longer need numa_post_machine_init()
tests: numa: add case for QMP command query-cpus
QMP: include CpuInstanceProperties into query_cpus output output
virt-arm: get numa node mapping from possible_cpus instead of numa_get_node_for_cpu()
spapr: get numa node mapping from possible_cpus instead of numa_get_node_for_cpu()
pc: get numa node mapping from possible_cpus instead of numa_get_node_for_cpu()
numa: do default mapping based on possible_cpus instead of node_cpu bitmaps
numa: mirror cpu to node mapping in MachineState::possible_cpus
numa: add check that board supports cpu_index to node mapping
virt-arm: add node-id property to CPU
pc: add node-id property to CPU
spapr: add node-id property to sPAPR core
...
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This pull request supersedes the one from yesterday (20170510), fixing
an important style bug in one patch, and adding an extra couple of
simple patches.
Highlights of this set:
* Some fixes for POWER9
* TCG support for POWER9 radix MMU
* VGA rom for Mac machine types
* Fixes for the XICS interrupt controller
* MTTCG support for ppc targets
As suggested by Paolo, I've tried to add the Docker tests to my
standard pre-pull-request tests. I haven't wholly suceeded; this has
been tested with some of the Docker images, but others I haven't
managed due to problems that as best I can tell are not due to
problems in this patch series. I'll continue working on this for
future pull requests. Specifically, 'travis', 'fedora', and 'centos6'
seem to work. 'min-glib' jammed while gtesting moxie, which seems
very unlikely to be caused by this series. 'ubuntu', 'debian' and
'debian-bootstrap' hit build errors almost immediately that look like
problems with the container configuration, and 'debian-*-cross' hit
build errors later on which also look like missing dependencies from
the container.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJZE+T6AAoJEGw4ysog2bOSgSYQAMszDZ+HCYlp6iVlJqDoy55S
u8krYwkS9MnzrmbMjPVzGiFmH6IEuOd3zAx0aM1wZtcXjprUKsr6jHZOEtk7Frv5
vzIwzcP85vkVegPX+fNUAo+1+T3OHix9RAI3BF5rHdyCC2OmJriPyvyOQ76uVORJ
9ouSAeG/dCyjkVRYAlTQPidGqc/OQUaMFwZdLvhTJHeDqcdlqziCzP1YnDjN78UQ
BRpsYOYFnGSzaqjNj16edF/yM4NiW/4tLd700mvGkvPUHrFEiyQur0Lm0bc1iZs5
JZwcgAxhivI6CiWt57y/OpC6pWsasVhlBY00aWBcEExAh6j+Kp20g0C6MYB4JdwX
jVJUOzWGWuMFkS65S/nHmdngUWvrSpn1xzPr0KQihLRFpoYK3btaS2TcekQocnZc
mF3NFvKXeS/F6ZYLDWkLF/9VVEjz2mJNRvimhMWljuFyLmxlQxQSvzNXZ7Lt/maj
D4nFaOWf5eG0O0Em54hLizM6r6vnt2qkLVSjPmOFO2gQvDsu10G/5ociqkYNRYvz
srJUfo2xMjzaM0lvJTJT3VOWfbX1Wq4A8zyjLuoi1xpqI1Yb95zEycFvc0Eszzh1
OByIuHLNynu+W2w08dAA8vU1tlh+Yf0yld2LOgY3Cn2gjgHQRFtmyrIsPgYsptC1
63Y0PbTnGdbaEdlAJu8I
=D0Hk
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'dgibson/tags/ppc-for-2.10-20170511' into staging
ppc patch queue for 2017-05-11
This pull request supersedes the one from yesterday (20170510), fixing
an important style bug in one patch, and adding an extra couple of
simple patches.
Highlights of this set:
* Some fixes for POWER9
* TCG support for POWER9 radix MMU
* VGA rom for Mac machine types
* Fixes for the XICS interrupt controller
* MTTCG support for ppc targets
As suggested by Paolo, I've tried to add the Docker tests to my
standard pre-pull-request tests. I haven't wholly suceeded; this has
been tested with some of the Docker images, but others I haven't
managed due to problems that as best I can tell are not due to
problems in this patch series. I'll continue working on this for
future pull requests. Specifically, 'travis', 'fedora', and 'centos6'
seem to work. 'min-glib' jammed while gtesting moxie, which seems
very unlikely to be caused by this series. 'ubuntu', 'debian' and
'debian-bootstrap' hit build errors almost immediately that look like
problems with the container configuration, and 'debian-*-cross' hit
build errors later on which also look like missing dependencies from
the container.
# gpg: Signature made Thu 11 May 2017 05:13:46 AM BST
# gpg: using RSA key 0x6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>"
# gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392
* dgibson/tags/ppc-for-2.10-20170511: (23 commits)
target/ppc: Avoid printing wrong aliases in CPU help text
pnv: Fix build failures on some host platforms
target/ppc: Allow workarounds for POWER9 DD1
spapr: Don't accidentally advertise HTM support on POWER9
ppc: xics: fix compilation with CentOS 6
target/ppc: Enable RADIX mmu mode for pseries TCG guest
target/ppc: Implement ISA V3.00 radix page fault handler
target/ppc: Change tlbie invalid fields for POWER9 support
target/ppc: Update tlbie to check privilege level based on GTSE
target/ppc: Set UPRT and GTSE on all cpus in H_REGISTER_PROCESS_TABLE
ppc: add qemu_vga.ndrv ROM to fw_cfg interface for NewWorld Macs
ppc: add qemu_vga.ndrv ROM to fw_cfg interface for OldWorld Macs
Add QemuMacDrivers qemu_vga.ndrv revision d4e7d7a built as submodule
Add QemuMacDrivers as submodule
ppc/xics: preserve P and Q bits for KVM IRQs
ppc/xics: Fix stale irq->status bits after get
target/ppc: do not reset reserve_addr in exec_enter
tcg: enable MTTCG by default for PPC64 on x86
cpus: Fix CPU unplug for MTTCG
target/ppc: Generate fence operations
...
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
it's safe to remove thread node_id != core node_id error
branch as machine_set_cpu_numa_node() also does mismatch
check and is called even before any CPU is created.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <1494415802-227633-10-git-send-email-imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
it will allow switching from cpu_index to core based numa
mapping in follow up patches.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <1494415802-227633-3-git-send-email-imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Originally CPU threads were by default assigned in
round-robin fashion. However it was causing issues in
guest since CPU threads from the same socket/core could
be placed on different NUMA nodes.
Commit fb43b73b (pc: fix default VCPU to NUMA node mapping)
fixed it by grouping threads within a socket on the same node
introducing cpu_index_to_socket_id() callback and commit
20bb648d (spapr: Fix default NUMA node allocation for threads)
reused callback to fix similar issues for SPAPR machine
even though socket doesn't make much sense there.
As result QEMU ended up having 3 default distribution rules
used by 3 targets /virt-arm, spapr, pc/.
In effort of moving NUMA mapping for CPUs into possible_cpus,
generalize default mapping in numa.c by making boards decide
on default mapping and let them explicitly tell generic
numa code to which node a CPU thread belongs to by replacing
cpu_index_to_socket_id() with @cpu_index_to_instance_props()
which provides default node_id assigned by board to specified
cpu_index.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <1494415802-227633-2-git-send-email-imammedo@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
When there are more nodes than available memory to put the minimum
allowed memory by node, all the memory is put on the last node.
This is because we put (ram_size / nb_numa_nodes) &
~((1 << mc->numa_mem_align_shift) - 1); on each node, and in this
case the value is 0. This is particularly true with pseries,
as the memory must be aligned to 256MB.
To avoid this problem, this patch uses an error diffusion algorithm [1]
to distribute equally the memory on nodes.
We introduce numa_auto_assign_ram() function in MachineClass
to keep compatibility between machine type versions.
The legacy function is used with pseries-2.9, pc-q35-2.9 and
pc-i440fx-2.9 (and previous), the new one with all others.
Example:
qemu-system-ppc64 -S -nographic -nodefaults -monitor stdio -m 1G -smp 8 \
-numa node -numa node -numa node \
-numa node -numa node -numa node
Before:
(qemu) info numa
6 nodes
node 0 cpus: 0 6
node 0 size: 0 MB
node 1 cpus: 1 7
node 1 size: 0 MB
node 2 cpus: 2
node 2 size: 0 MB
node 3 cpus: 3
node 3 size: 0 MB
node 4 cpus: 4
node 4 size: 0 MB
node 5 cpus: 5
node 5 size: 1024 MB
After:
(qemu) info numa
6 nodes
node 0 cpus: 0 6
node 0 size: 0 MB
node 1 cpus: 1 7
node 1 size: 256 MB
node 2 cpus: 2
node 2 size: 0 MB
node 3 cpus: 3
node 3 size: 256 MB
node 4 cpus: 4
node 4 size: 256 MB
node 5 cpus: 5
node 5 size: 256 MB
[1] https://en.wikipedia.org/wiki/Error_diffusion
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Message-Id: <20170502162955.1610-2-lvivier@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
[ehabkost: s/ram_size/size/ at numa_default_auto_assign_ram()]
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Logic in spapr_populate_pa_features() enables the bit advertising
Hardware Transactional Memory (HTM) in the guest's device tree only when
KVM advertises its availability with the KVM_CAP_PPC_HTM feature.
However, this assumes that the HTM bit is off in the base template used for
the device tree value. That is true for POWER8, but not for POWER9.
It looks like that was accidentally changed in 9fb4541 "spapr: Enable ISA
3.0 MMU mode selection via CAS".
Fixes: 9fb4541f58
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Now that we have added all the infrastructure we can enable a pseries TCG
guest to use radix.
In order to do this we have to add the appropriate bits to the
ibm,arch-vec-5-platform-support vector to represent that we support both
hash and radix mmu models.
A radix guest can now be booted in pseries tcg mode by specifying:
-cpu POWER9
Note that we assume hash, that is we allocate a hpt, until a guest tells
us otherwise via a H_REGISTER_PROCESS_TABLE call with radix specified - in
which case we free the hpt. If we were right and the guest is hash then
there's nothing for us to do.
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The UPRT and GTSE bits are set when a guest calls H_REGISTER_PROCESS_TABLE
to choose determine how address translation is performed. Currently these
bits in the LPCR are only set for the cpu which handles the H_CALL, however
they need to be set for all cpus for that guest as address translation
cannot be performed differently on a per cpu basis.
Update the H_CALL handler to set these bits in the LPCR correctly for all
cpus of the guest.
Note it is the reponsibility of the guest to ensure that any secondary cpus
are suspended when the H_CALL is made and thus we can safely update these
values here.
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Today, when a PowerNV guest runs, it uses the sensor definitions of
the BMC simulator to populate the device tree. But an external IPMI
BMC could also be used and, in that case, it is not (yet) possible to
retrieve the sensor list. Generating the OEM SEL event for shutdown or
reboot also does not make sense as it should be generated on the BMC
side.
This change allows a guest to use an 'ipmi-bmc-extern' backend to the
'isa-ipmi-bt' device and a 'chardev' for transport such as :
-chardev socket,id=ipmi0,host=localhost,port=9002,reconnect=10 \
-device ipmi-bmc-extern,id=bmc0,chardev=ipmi0 \
-device isa-ipmi-bt,bmc=bmc0,irq=10
and connect to a BMC simulator, the OpenIPMI ipmi_sim simulator for
instance.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The tb_env variable is set two lines above. So just drop the double assignment.
Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
This patch removes redundant "qemu:" from error functions. The link to the bitesized task is:
http://wiki.qemu-project.org/Contribute/BiteSizedTasks#Error_checking
Signed-off-by: Ishani Chugh <chugh.ishani@research.iiit.ac.in>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Recent commits that re-organized ICPState object missed to destroy
the object when CPU is unrealized. Fix this so that CPU unplug
doesn't abort QEMU.
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
OpenPOWER systems expect to be notified with such an event before a
shutdown or a reboot. An OEM SEL message is sent with specific
identifiers and a user data containing the request : OFF or REBOOT.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Skiboot, the firmware for the PowerNV platform, expects the BMC to
provide some specific IPMI sensors. These sensors are exposed in the
device tree and their values are updated by the firmware at boot time.
Sensors of interest are :
"FW Boot Progress"
"Boot Count"
As such a device is defined on the command line, we can only detect
its presence at reset time.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
When an ipmi-bt device [1] is defined on the ISA bus, we need to
populate the device tree with the object properties. Such devices are
created with the command line options :
-device ipmi-bmc-sim,id=bmc0 -device isa-ipmi-bt,bmc=bmc0,irq=10
[1] https://lists.gnu.org/archive/html/qemu-devel/2015-11/msg03168.html
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The code could be common to any ISA device but we are missing the IO
length.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This is an empty shell that we will use to include nodes in the device
tree for ISA devices. We expect RTC, UART and IPMI BT devices.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The default LPC bus of a multichip system is on chip 0. It's
recognized by the firmware (skiboot) using a "primary" property in the
device tree.
We introduce a pnv_chip_lpc_offset() routine to locate the LPC node of
a chip and set the property directly from the machine level.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
It adds the Naples chip which supports proper LPC interrupts via the
LPC controller rather than via an external CPLD.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[clg: - updated for qemu-2.9
- ported on latest PowerNV patchset
- moved the IRQ handler in pnv_lpc.c
- introduced pnv_lpc_isa_irq_create() to create the ISA IRQs ]
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
xics_system_init() does not need 'nr_servers' anymore as it is only
used to define the 'interrupt-controller' node in the device tree. So
let's just compute the value when calling spapr_dt_xics().
This also gives us an opportunity to simplify the xics_system_init()
routine and introduce a specific spapr_ics_create() helper to create
the sPAPR ICS object.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The OCC is an on-chip microcontroller based on a ppc405 core used
for various power management tasks. It comes with a pile of additional
hardware sitting on the PIB (aka XSCOM bus). At this point we don't
emulate it (nor plan to do so). However there is one facility which
is provided by the surrounding hardware that we do need, which is the
interrupt generation facility. OPAL uses it to send itself interrupts
under some circumstances and there are other uses around the corner.
So this implement just enough to support this.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[clg: - updated for qemu-2.9
- changed the XSCOM interface to fit new model
- QOMified the model ]
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The Processor Service Interface (PSI) Controller is one of the engines
of the "Bridge" unit which connects the different interfaces to the
Power Processor.
This adds just enough of the PSI bridge to handle various on-chip and
the one external interrupt. The rest of PSI has to do with the link to
the IBM FSP service processor which we don't plan to emulate (not used
on OpenPower machines).
The ics_get() and ics_resend() handlers of the XICSFabric interface of
the PowerNV machine are now defined to handle the Interrupt Control
Source of PSI. The InterruptStatsProvider interface is also modified
to dump the new ICS.
Originally from Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This provides to a PowerNV chip (POWER8) access to the Interrupt
Management area, which contains the registers of the Interrupt Control
Presenters of each thread. These are used to accept, return, forward
interrupts in the system.
This area is modeled with a per-chip container memory region holding
all the ICP registers. Each thread of a chip is then associated with
its ICP registers using a memory subregion indexed by its PIR number
in the overall region.
The device tree is populated accordingly.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Each thread of a core is linked to an ICP. This allocates a PnvICPState
object before the PowerPCCPU object is realized and lets the XICSFabric
do the store under the 'intc' backlink when xics_cpu_setup() is
called.
This modeling removes the need of maintaining an array of ICP objects
under the PowerNV machine and also simplifies the XICSFabric icp_get()
handler.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
A XICSFabric QOM interface is used by the XICS layer to manipulate the
ICP and ICS objects. Let's define the associated handlers for the
PowerNV machine. All handlers should be defined even if there is no
ICS under the PowerNV machine yet.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Today, all the ICPs are created before the CPUs, stored in an array
under the sPAPR machine and linked to the CPU when the core threads
are realized. This modeling brings some complexity when a lookup in
the array is required and it can be simplified by allocating the ICPs
when the CPUs are.
This is the purpose of this proposal which introduces a new 'icp_type'
field under the machine and creates the ICP objects of the right type
(KVM or not) before the PowerPCCPU object are.
This change allows more cleanups : the removal of the icps array under
the sPAPR machine and the removal of the xics_get_cpu_index_by_dt_id()
helper.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This is the second step to abstract the IRQ 'server' number of the
XICS layer. Now that the prereq cleanups have been done in the
previous patch, we can move down the 'cpu_dt_id' to 'cpu_index'
mapping in the sPAPR machine handler.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Today, the ICPState array of the sPAPR machine is indexed with
'cpu_index' of the CPUState. This numbering of CPUs is internal to
QEMU and the guest only knows about what is exposed in the device
tree, that is the 'cpu_dt_id'. This is why sPAPR uses the helper
xics_get_cpu_index_by_dt_id() to do the mapping in a couple of places.
To provide a more generic XICS layer, we need to abstract the IRQ
'server' number and remove any assumption made on its nature. It
should not be used as a 'cpu_index' for lookups like xics_cpu_setup()
and xics_cpu_destroy() do.
To reach that goal, we choose to introduce a generic 'intc' backlink
under PowerPCCPU, and let the machine core init routine do the
ICPState lookup. The resulting object is passed on to xics_cpu_setup()
which does the store under PowerPCCPU. The IRQ 'server' number in XICS
is now generic. sPAPR uses 'cpu_dt_id' and PowerNV will use 'PIR'
number.
This also has the benefit of simplifying the sPAPR hcall routines
which do not need to do any ICPState lookups anymore.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
If a page size used by QEMU is not enabled in the PHB IOMMU page mask,
in-kernel acceleration of TCE handling won't be enabled and performance
might be slower than expected.
This prints a warning if system page size is not enabled. This should
print a warning if huge pages are enabled but sphb.pgsz still uses
the default value of 4K|64K.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This enables in-kernel handling of H_PUT_TCE_INDIRECT and
H_STUFF_TCE hypercalls. The host kernel support is there since v4.6,
in particular d3695aa4f452
("KVM: PPC: Add support for multiple-TCE hcalls").
H_PUT_TCE is already accelerated and does not need any special enablement.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
For a little while around 4.9, Linux kernels that saw the radix bit in
ibm,pa-features would attempt to set up the MMU as if they were a
hypervisor, even if they were a guest, which would cause them to
crash.
Work around this by detecting pre-ISA 3.0 guests by their lack of that
bit in option vector 1, and then removing the radix bit from
ibm,pa-features. Note: This now requires regeneration of that node
after CAS negotiation.
Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
[dwg: Fix style nits]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Add the new node, /chosen/ibm,arch-vec-5-platform-support to the
device tree. This allows the guest to determine which modes are
supported by the hypervisor.
Update the option vector processing in h_client_architecture_support()
to handle the new MMU bits. This allows guests to request hash or
radix mode and QEMU to create the guest's HPT at this time if it is
necessary but hasn't yet been done. QEMU will terminate the guest if
it requests an unavailable mode, as required by the architecture.
Extend the ibm,pa-features node with the new ISA 3.0 values
and set the radix bit if KVM supports radix mode. This probably won't
be used directly by guests to determine the availability of radix mode
(that is indicated by the new node added above) but the architecture
requires that it be set when the hardware supports it.
If QEMU is using KVM, and KVM is capable of running in radix mode,
guests can be run in real-mode without allocating a HPT (because KVM
will use a minimal RPT). So in this case, we avoid creating the HPT
at reset time and later (during CAS) create it if it is necessary.
ISA 3.0 guests will now begin to call h_register_process_table(),
which has been added previously.
Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
[dwg: Strip some unneeded prefix from error messages]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
In the next patch, spapr_fixup_cpu_dt() will need to call
spapr_populate_pa_features() so move it's definition up without making
any other changes.
Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The H_REGISTER_PROCESS_TABLE H_CALL is used by a guest to indicate to the
hypervisor where in memory its process table is and how translation should
be performed using this process table.
Provide the implementation of this H_CALL for a guest.
We first check for invalid flags, then parse the flags to determine the
operation, and then check the other parameters for valid values based on
the operation (register new table/deregister table/maintain registration).
The process table is then stored in the appropriate location and registered
with the hypervisor (if running under KVM), and the LPCR_[UPRT/GTSE] bits
are updated as required.
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
[dwg: Correct missing prototype and uninitialized variable]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The use of the new in memory tables introduced in ISAv3.00 for translation,
also referred to as process tables, requires the introduction of 3 new
H-CALLs; H_REGISTER_PROCESS_TABLE, H_CLEAN_SLB, and H_INVALIDATE_PID.
Add shells for each of these and register them as the hypercall handlers.
Currently they all log an unimplemented hypercall and return H_FUNCTION.
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
[dwg: Fix style nits]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Use the new ioctl, KVM_PPC_GET_RMMU_INFO, to fetch radix MMU
information from KVM and present the page encodings in the device tree
under ibm,processor-radix-AP-encodings. This provides page size
information to the guest which is necessary for it to use radix mode.
Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
[dwg: Compile fix for 32-bit targets, style nit fix]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
KVM_CAP_SPAPR_TCE capability allows creating TCE tables in KVM which
allows having in-kernel acceleration for H_PUT_TCE_xxx hypercalls.
However it only supports 32bit DMA windows at zero bus offset.
There is a new KVM_CAP_SPAPR_TCE_64 capability which supports 64bit
window size, variable page size and bus offset.
This makes use of the new capability. The kernel headers are already
updated as the kernel support went in to v4.6.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The devices that are derived from TYPE_PNV_CHIP currently show up
as "uncategorized" devices in the help text of "-device ?". Since
they obviously are related to the CPU, let's put them into the
CPU category instead.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Also use an 'sPAPRRTCState' attribute under the sPAPR machine to hold
the RTC object. Overall, these changes remove an unnecessary and
implicit dependency on SysBus.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
For reasons that may be useful in future, CPU core objects, as used on the
pseries machine type have their own nr-threads property, potentially
allowing cores with different numbers of threads in the same system.
If the user/management uses the values specified in query-hotpluggable-cpus
as they're expected to do, this will never matter in pratice. But that's
not actually enforced - it's possible to manually specify a core with
a different number of threads from that in -smp. That will confuse the
platform - most immediately, this can be used to create a CPU thread with
index above max_cpus which leads to an assertion failure in
spapr_cpu_core_realize().
For now, enforce that all cores must have the same, standard, number of
threads.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
If, once the kernel has booted, we try to remove a memory
hotplugged while the kernel was not started, QEMU crashes on
an assert:
qemu-system-ppc64: hw/virtio/vhost.c:651:
vhost_commit: Assertion `r >= 0' failed.
...
#4 in vhost_commit
#5 in memory_region_transaction_commit
#6 in pc_dimm_memory_unplug
#7 in spapr_memory_unplug
#8 spapr_machine_device_unplug
#9 in hotplug_handler_unplug
#10 in spapr_lmb_release
#11 in detach
#12 in set_allocation_state
#13 in rtas_set_indicator
...
If we take a closer look to the guest kernel log, we can see when
we try to unplug the memory:
pseries-hotplug-mem: Attempting to hot-add 4 LMB(s)
What happens:
1- The kernel has ignored the memory hotplug event because
it was not started when it was generated.
2- When we hot-unplug the memory,
QEMU starts to remove the memory,
generates an hot-unplug event,
and signals the kernel of the incoming new event
3- as the kernel is started, on the QEMU signal, it reads
the event list, decodes the hotplug event and tries to
finish the hotplugging.
4- QEMU receive the the hotplug notification while it
is trying to hot-unplug the memory. This moves the memory
DRC to an invalid state
This patch prevents this by not allowing to set the allocation
state to USABLE while the DRC is awaiting release.
RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1432382
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Running postcopy-test with ASAN produces the following error:
QTEST_QEMU_BINARY=ppc64-softmmu/qemu-system-ppc64 tests/postcopy-test
...
=================================================================
==23641==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7f1556600000 at pc 0x55b8e9d28208 bp 0x7f1555f4d3c0 sp 0x7f1555f4d3b0
READ of size 8 at 0x7f1556600000 thread T6
#0 0x55b8e9d28207 in htab_save_first_pass /home/elmarco/src/qq/hw/ppc/spapr.c:1528
#1 0x55b8e9d2939c in htab_save_iterate /home/elmarco/src/qq/hw/ppc/spapr.c:1665
#2 0x55b8e9beae3a in qemu_savevm_state_iterate /home/elmarco/src/qq/migration/savevm.c:1044
#3 0x55b8ea677733 in migration_thread /home/elmarco/src/qq/migration/migration.c:1976
#4 0x7f15845f46c9 in start_thread (/lib64/libpthread.so.0+0x76c9)
#5 0x7f157d9d0f7e in clone (/lib64/libc.so.6+0x107f7e)
0x7f1556600000 is located 0 bytes to the right of 2097152-byte region [0x7f1556400000,0x7f1556600000)
allocated by thread T0 here:
#0 0x7f159bb76980 in posix_memalign (/lib64/libasan.so.3+0xc7980)
#1 0x55b8eab185b2 in qemu_try_memalign /home/elmarco/src/qq/util/oslib-posix.c:106
#2 0x55b8eab186c8 in qemu_memalign /home/elmarco/src/qq/util/oslib-posix.c:122
#3 0x55b8e9d268a8 in spapr_reallocate_hpt /home/elmarco/src/qq/hw/ppc/spapr.c:1214
#4 0x55b8e9d26e04 in ppc_spapr_reset /home/elmarco/src/qq/hw/ppc/spapr.c:1261
#5 0x55b8ea12e913 in qemu_system_reset /home/elmarco/src/qq/vl.c:1697
#6 0x55b8ea13fa40 in main /home/elmarco/src/qq/vl.c:4679
#7 0x7f157d8e9400 in __libc_start_main (/lib64/libc.so.6+0x20400)
Thread T6 created by T0 here:
#0 0x7f159bae0488 in __interceptor_pthread_create (/lib64/libasan.so.3+0x31488)
#1 0x55b8eab1d9cb in qemu_thread_create /home/elmarco/src/qq/util/qemu-thread-posix.c:465
#2 0x55b8ea67874c in migrate_fd_connect /home/elmarco/src/qq/migration/migration.c:2096
#3 0x55b8ea66cbb0 in migration_channel_connect /home/elmarco/src/qq/migration/migration.c:500
#4 0x55b8ea678f38 in socket_outgoing_migration /home/elmarco/src/qq/migration/socket.c:87
#5 0x55b8eaa5a03a in qio_task_complete /home/elmarco/src/qq/io/task.c:142
#6 0x55b8eaa599cc in gio_task_thread_result /home/elmarco/src/qq/io/task.c:88
#7 0x7f15823e38e6 (/lib64/libglib-2.0.so.0+0x468e6)
SUMMARY: AddressSanitizer: heap-buffer-overflow /home/elmarco/src/qq/hw/ppc/spapr.c:1528 in htab_save_first_pass
index seems to be wrongly incremented, unless I miss something that
would be worth a comment.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Since commit 224245b ("spapr: Add LMB DR connectors"), NUMA node
memory size must be aligned to 256MB (SPAPR_MEMORY_BLOCK_SIZE).
But when "-numa" option is provided without "mem" parameter,
the memory is equally divided between nodes, but 8MB aligned.
This can be not valid for pseries.
In that case we can have:
$ ./ppc64-softmmu/qemu-system-ppc64 -m 4G -numa node -numa node -numa node
qemu-system-ppc64: Node 0 memory size 0x55000000 is not aligned to 256 MiB
With this patch, we have:
(qemu) info numa
3 nodes
node 0 cpus: 0
node 0 size: 1280 MB
node 1 cpus:
node 1 size: 1280 MB
node 2 cpus:
node 2 size: 1536 MB
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This dependency is the wrong way, and we will need util/qemu-timer.h from
sysemu/cpus.h in the next patch.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
bb9986452 "spapr_pci: Advertise access to PCIe extended config space"
allowed guests to access the extended config space of PCI Express devices
via the PAPR interfaces, even though the paravirtualized bus mostly acts
like plain PCI.
However, that patch enabled access unconditionally, including for existing
machine types, which is an unwise change in behaviour. This patch limits
the change to pseries-2.9 (and later) machine types.
Suggested-by: Andrea Bolognani <abologna@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Looks like my previous batch wasn't quite the last before hard freeze.
This has a handful of bugfixes to go in. They're all genuine
bugfixes, though not regressions in some cases.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJYvOCUAAoJEGw4ysog2bOSQWgQAKzPeIqz8I/1eXL+zmZCUaiU
J2gyjzfaKkQ/AVGPtT45ZjJsihxSFbZT6koxXtEaxwq5DD87yXQOqA/d+BH7jr5d
75FGjVzKOA0IKQymySztwoC2j/ftWmmSx0N6YUmL0QcXCISS1YHRvdQkdXf6j4I/
XtK1FA34wmCsTK1AgZ9WDxjABdkHP+7FDRBpVmr01Nv1TeK2Xms2MqJ5Wku/lOX/
6bg1KbC8pVHy5YZhIpRFzgGxaMr2UcJ0Q3YR9fD/4UW/k518sJk+i2xlagVsFxyG
gqfPolv0wjwuGpYt42UyFG4IouCbKN+MChU5MBIaqU10VouOw+0/W+p+1ZOHgdB8
GoaBGyfuJ6/i4EQL0/+FL4hPOI5vHLliWxPfMJxDL5ujP0cFaPm2XbK5Yqxksu3m
uYp3yYIbiSaF8QUxbBjAAoKPdVpP5dsgHjAlxecwCUGlIo0Ur3uphnU5lPoNlvS4
5ZcDDlMGjPb0oIHfdPt2ai8g+32uAsD7X7pi+qI0x+srSnjisRpOT2wKv0otMbGx
U4j01/Na2DjFjhGW+vNm9UYsE/QgKr6pU9z3jUXOIplX1HBXirtfv5C/OypCN7Zj
LgqsmiMWMJFjSLk8N8cxeM1w839B3wEM+2+46su7/qpW9sd0jKvHk0cJDyZPzn29
zQ52CbQQiewXM8y+mffe
=/RZL
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.9-20170306' into staging
ppc patch queue for 2017-03-06
Looks like my previous batch wasn't quite the last before hard freeze.
This has a handful of bugfixes to go in. They're all genuine
bugfixes, though not regressions in some cases.
# gpg: Signature made Mon 06 Mar 2017 04:07:48 GMT
# gpg: using RSA key 0x6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>"
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392
* remotes/dgibson/tags/ppc-for-2.9-20170306:
target/ppc: use helper for excp handling
target/ppc: fmadd: add macro for updating flags
target/ppc: fmadd check for excp independently
spapr: ensure that all threads within core are on the same NUMA node
ppc/xics: register reset handlers for the ICP and ICS objects
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Threads within a core shouldn't be on different
NUMA nodes, so if user has misconfgured command
line, fail QEMU at start up to force user fix it.
For now use the first thread on the core as source
of core's node-id. Later when cpu-numa refactoring
lands it will be switched to core's node-id from
possible_cpus[].
This prevents the same problems as commit 20bb648d
"spapr: Fix default NUMA node allocation for threads",
but for the case of manually configured NUMA node
mappings, instead of just the default case.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The recent changes on the XICS layer removed the XICSState object to
let the sPAPR machine handle the ICP and ICS directly. The reset of
these objects was previously handled by XICSState, which was a SysBus
device, and to keep the same behavior, the ICP and ICS were assigned
to SysbBus.
But that broke the 'info qtree' command in the monitor. 'qtree'
performs a loop on the children of a bus to print their properties and
SysBus devices are expected to be found under SysBus, which is not the
case anymore.
The fix for this problem is to register reset handlers for the ICP and
ICS objects and stop using SysBus for such devices.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Tested-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Fix the design flaw demonstrated in the previous commit: new method
check_list() lets input visitors report that unvisited input remains
for a list, exactly like check_struct() lets them report that
unvisited input remains for a struct or union.
Implement the method for the qobject input visitor (straightforward),
and the string input visitor (less so, due to the magic list syntax
there). The opts visitor's list magic is even more impenetrable, and
all I can do there today is a stub with a FIXME comment. No worse
than before.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1488544368-30622-26-git-send-email-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
The PPC MMU types are sometimes treated as if they were a bit field
and sometime as if they were an enum which causes maintenance
problems: flipping bits in the MMU type (which is done on both the 1TB
segment and 64K segment bits) currently produces new MMU type
values that are not handled in every "switch" on it, sometimes causing
an abort().
This patch provides some macros that can be used to filter out the
"bit field-like" bits so that the remainder of the value can be
switched on, like an enum. This allows removal of all of the
"degraded" types from the list and should ease maintenance.
Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The (paravirtual) PCI host bridge on the 'pseries' machine in most
regards acts like a regular PCI bus, rather than a PCIe bus. Despite
this, though, it does allow access to the PCIe extended config space.
We already implemented the RTAS methods to allow this access.. but
forgot to put the markers into the device tree so that guest's know it
is there. This adds them in.
With this, a pseries guest is able to view extended config space on
(for example an e1000e device. This should be enough to allow guests
to use at least some PCIe devices.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Add POWER9 cpu to list of spapr core models which allows it to be specified
as the cpu model for a pseries guest (e.g. -machine pseries -cpu POWER9).
This now allows a POWER9 cpu to boot to userspace in tcg emulation for a
pseries machine with a legacy kernel.
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Add a pa-features definition which includes all of the new fields which
have been added, note we don't claim support for any of these new features
at this stage.
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
ISA v3.00 adds the idea of a partition table which is used to store the
address translation details for all partitions on the system. The partition
table consists of double word entries indexed by partition id where the second
double word contains the location of the process table in guest memory. The
process table is registered by the guest via a h-call.
We need somewhere to store the address of the process table so we add an entry
to the sPAPRMachineState struct called patb_entry to represent the second
doubleword of a single partition table entry corresponding to the current
guest. We need to store this value so we know if the guest is using radix or
hash translation and the location of the corresponding process table in guest
memory. Since we only have a single guest per qemu instance, we only need one
entry.
Since the partition table is technically a hypervisor resource we require that
access to it is abstracted by the virtual hypervisor through the get_patbe()
call. Currently the value of the entry is never set (and thus
defaults to 0 indicating hash), but it will be required to both implement
POWER9 kvm support and tcg radix support.
We also add this field to be migrated as part of the sPAPRMachineState as we
will need it on the receiving side as the guest will never tell us this
information again and we need it to perform translation.
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
It provides a better monitor output of the ICP and ICS objects, else
the objects are printed out of order.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The ICS object uses a post_load() handler which is implicitly relying
on the fact that the internal state of the ICS and ICP objects has
been restored but this is not guaranteed. So, let's move the code
under the post_load() handler of the machine where we know the objects
have been fully restored.
The icp_resend() handler of the XICSFabric QOM interface is also
removed as it is now obsolete.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The XICSState classes are not used anymore. They have now been fully
deprecated by the XICSFabric QOM interface. Do the cleanups.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
There is nothing left related to the XICS object in the realize
functions of the KVMXICSState and XICSState class. So adapt the
interfaces to call these routines directly from the sPAPR machine init
sequence.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This is the last step to remove the XICSState abstraction and have the
machine hold all the objects related to interrupts : ICSs and ICPs.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The reset of the ICP objects is currently handled by XICS but this can
be done for each individual ICP.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
spapr_dt_xics() only needs the number of servers to build the device
tree nodes. Let's change the routine interface to reflect that.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Also introduce a xics_icp_get() helper to simplify the changes.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Let's add two new handlers for ICPs. One is to get an ICP object from
a server number and a second is to resend the irqs when needed.
The icp_resend() handler is a temporary workaround needed by the
ics-simple post_load() handler. It will be removed when the post_load
portion can be done at the machine level.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This is not used anymore.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The reset of the ICS objects is currently handled by XICS but this can
be done for each individual ICS. This also reduces the use of the XICS
list of ICS.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Also change the ICPState 'xics' backlink to be a XICSFabric, this
removes the need of using qdev_get_machine() to get the QOM interface
in some of the routines.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Add 'ics_get' and 'ics_resend' handlers to the sPAPR machine. These
are relatively simple for a single ICS.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
A list of ICS objects was introduced under the XICS object for the
PowerNV machine but, for the sPAPR machine, it brings extra complexity
as there is only a single ICS. To simplify the code, let's add the ICS
pointer under the sPAPR machine and try to reduce the use of this list
where possible.
Also, change the xics_spapr_*() routines to use an ICS object instead
of an XICSState and change their name to reflect that these are
specific to the sPAPR ICS object.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Today, the ICP (Interrupt Controller Presenter) objects are created by
the 'nr_servers' property handler of the XICS object and a class
handler. They are realized in the XICS object realize routine.
Let's simplify the process by creating the ICP objects along with the
XICS object at the machine level.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Today, the ICS (Interrupt Controller Source) object is created and
realized by the init and realize routines of the XICS object, but some
of the parameters are only known at the machine level.
These parameters are passed from the sPAPR machine to the ICS object
in a rather convoluted way using property handlers and a class handler
of the XICS object. The number of irqs required to allocate the IRQ
state objects in the ICS realize routine is one of them.
Let's simplify the process by creating the ICS object along with the
XICS object at the machine level and link the ICS into the XICS list
of ICSs at this level also. In the sPAPR machine, there is only a
single ICS but that will change with the PowerNV machine.
Also, QOMify the creation of the objects and get rid of the
superfluous code.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Currently xics - the component of the IBM POWER interrupt controller
representing the overall interrupt fabric / architecture is
represented as a descendent of SysBusDevice. However, this is not
really correct - the xics presents nothing in MMIO space so it should
be an "unattached" device in the current QOM model.
Since this device will always be created by the machine type, not created
specifically from the command line, and because it has no migrated state
it should be safe to move it around the device composition tree.
Therefore this patch changes it to a descendent of TYPE_DEVICE, and
makes it an unattached device. So that its reset handler still gets
called correctly, we add a qdev_set_parent_bus() to attach it to
sysbus. It's not really clear that's correct (instead of using
register_reset()) but it appears to a common technique.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
[clg corrected problems with reset]
Signed-off-by: Cédric Le Goater <clg@kaod.org>
[dwg folded together and updated commit message]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Since commit 1d2d974244 "spapr_pci: enumerate and add PCI device tree", QEMU
populates the PCI device tree in the opposite order compared to SLOF.
Before 1d2d974244c6:
Populating /pci@800000020000000
00 0000 (D) : 1af4 1000 virtio [ net ]
00 0800 (D) : 1af4 1001 virtio [ block ]
00 1000 (D) : 1af4 1009 virtio [ network ]
Populating /pci@800000020000000/unknown-legacy-device@2
7e5294b8 : /pci@800000020000000
7e52b998 : |-- ethernet@0
7e52c0c8 : |-- scsi@1
7e52c7e8 : +-- unknown-legacy-device@2 ok
Since 1d2d974244c6:
Populating /pci@800000020000000
00 1000 (D) : 1af4 1009 virtio [ network ]
Populating /pci@800000020000000/unknown-legacy-device@2
00 0800 (D) : 1af4 1001 virtio [ block ]
00 0000 (D) : 1af4 1000 virtio [ net ]
7e5e8118 : /pci@800000020000000
7e5ea6a0 : |-- unknown-legacy-device@2
7e5eadb8 : |-- scsi@1
7e5eb4d8 : +-- ethernet@0 ok
This behaviour change is not actually a bug since no assumptions should be
made on DT ordering. But it has no real justification either, other than
being the consequence of the way fdt_add_subnode() inserts new elements
to the front of the FDT rather than adding them to the tail.
This patch reverts to the historical SLOF ordering by walking PCI devices
in reverse order. This reconciles pseries with x86 machine types behavior.
It is expected to make things easier when porting existing applications to
power.
Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Tested-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
(slight update to the changelog)
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The pseries machine type implements the behaviour of a PAPR compliant
hypervisor, without actually executing such a hypervisor on the virtual
CPU. To do this we need some hooks in the CPU code to make hypervisor
facilities get redirected to the machine instead of emulated internally.
For hypercalls this is managed through the cpu->vhyp field, which points
to a QOM interface with a method implementing the hypercall.
For the hashed page table (HPT) - also a hypervisor resource - we use an
older hack. CPUPPCState has an 'external_htab' field which when non-NULL
indicates that the HPT is stored in qemu memory, rather than within the
guest's address space.
For consistency - and to make some future extensions easier - this merges
the external HPT mechanism into the vhyp mechanism. Methods are added
to vhyp for the basic operations the core hash MMU code needs: map_hptes()
and unmap_hptes() for reading the HPT, store_hpte() for updating it and
hpt_mask() to retrieve its size.
To match this, the pseries machine now sets these vhyp fields in its
existing vhyp class, rather than reaching into the cpu object to set the
external_htab field.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
CPUPPCState includes fields htab_base and htab_mask which store the base
address (GPA) and size (as a mask) of the guest's hashed page table (HPT).
These are set when the SDR1 register is updated.
Keeping these in sync with the SDR1 is actually a little bit fiddly, and
probably not useful for performance, since keeping them expands the size of
CPUPPCState. It also makes some upcoming changes harder to implement.
This patch removes these fields, in favour of calculating them directly
from the SDR1 contents when necessary.
This does make a change to the behaviour of attempting to write a bad value
(invalid HPT size) to the SDR1 with an mtspr instruction. Previously, the
bad value would be stored in SDR1 and could be retrieved with a later
mfspr, but the HPT size as used by the softmmu would be, clamped to the
allowed values. Now, writing a bad value is treated as a no-op. An error
message is printed in both new and old versions.
I'm not sure which behaviour, if either, matches real hardware. I don't
think it matters that much, since it's pretty clear that if an OS writes
a bad value to SDR1, it's not going to boot.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Accesses to the hashed page table (HPT) are complicated by the fact that
the HPT could be in one of three places:
1) Within guest memory - when we're emulating a full guest CPU at the
hardware level (e.g. powernv, mac99, g3beige)
2) Within qemu, but outside guest memory - when we're emulating user and
supervisor instructions within TCG, but instead of emulating
the CPU's hypervisor mode, we just emulate a hypervisor's behaviour
(pseries in TCG or KVM-PR)
3) Within the host kernel - a pseries machine using KVM-HV
acceleration. Mostly accesses to the HPT are handled by KVM,
but there are a few cases where qemu needs to access it via a
special fd for the purpose.
In order to batch accesses to the fd in case (3), we use a somewhat awkward
ppc_hash64_start_access() / ppc_hash64_stop_access() pair, which for case
(3) reads / releases several HPTEs from the kernel as a batch (usually a
whole PTEG). For cases (1) & (2) it just returns an address value. The
actual HPTE load helpers then need to interpret the returned token
differently in the 3 cases.
This patch keeps the same basic structure, but simplfiies the details.
First start_access() / stop_access() are renamed to map_hptes() and
unmap_hptes() to make their operation more obvious. Second, map_hptes()
now always returns a qemu pointer, which can always be used in the same way
by the load_hpte() helpers. In case (1) it comes from address_space_map()
in case (2) directly from qemu's HPT buffer and in case (3) from a
temporary buffer read from the KVM fd.
While we're at it, make things a bit more consistent in terms of types and
variable names: avoid variables named 'index' (it shadows index(3) which
can lead to confusing results), use 'hwaddr ptex' for HPTE indices and
uint64_t for each of the HPTE words, use ptex throughout the call stack
instead of pte_offset in some places (we still need that at the bottom
layer, but nowhere else).
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
cpu_ppc_set_papr() sets up various aspects of CPU state for use with PAPR
paravirtualized guests. However, it doesn't set the virtual hypervisor,
so callers must also call cpu_ppc_set_vhyp() so that PAPR hypercalls are
handled properly. This is a bit silly, so fold setting the virtual
hypervisor into cpu_ppc_set_papr().
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
* Standardize on 'ptex' instead of 'pte_index' for HPTE index variables
for consistency and brevity
* Avoid variables named 'index'; shadowing index(3) from libc can lead to
surprising bugs if the variable is removed, because compiler errors
might not appear for remaining references
* Clarify index calculations in h_enter() - we have two cases, H_EXACT
where the exact HPTE slot is given, and !H_EXACT where we search for
an empty slot within the hash bucket. Make the calculation more
consistent between the cases.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Some systems can already provide more than 255 hardware threads.
Bumping the QEMU limit to 1024 seems reasonable:
- it has no visible overhead in top;
- the limit itself has no effect on hot paths.
Cc: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
When DT node names for PCI devices are generated by SLOF,
they are generated according to the type of the device
(for instance, ethernet for virtio-net-pci device).
Node name for hotplugged devices is generated by QEMU.
This patch adds the mechanic to QEMU to create the node
name according to the device type too.
The data structure has been roughly copied from OpenBIOS/OpenHackware,
node names from SLOF.
Example:
Hotplugging some PCI cards with QEMU monitor:
device_add virtio-tablet-pci
device_add virtio-serial-pci
device_add virtio-mouse-pci
device_add virtio-scsi-pci
device_add virtio-gpu-pci
device_add ne2k_pci
device_add nec-usb-xhci
device_add intel-hda
What we can see in linux device tree:
for dir in /proc/device-tree/pci@800000020000000/*@*/; do
echo $dir
cat $dir/name
echo
done
WITHOUT this patch:
/proc/device-tree/pci@800000020000000/pci@0/
pci
/proc/device-tree/pci@800000020000000/pci@1/
pci
/proc/device-tree/pci@800000020000000/pci@2/
pci
/proc/device-tree/pci@800000020000000/pci@3/
pci
/proc/device-tree/pci@800000020000000/pci@4/
pci
/proc/device-tree/pci@800000020000000/pci@5/
pci
/proc/device-tree/pci@800000020000000/pci@6/
pci
/proc/device-tree/pci@800000020000000/pci@7/
pci
WITH this patch:
/proc/device-tree/pci@800000020000000/communication-controller@1/
communication-controller
/proc/device-tree/pci@800000020000000/display@4/
display
/proc/device-tree/pci@800000020000000/ethernet@5/
ethernet
/proc/device-tree/pci@800000020000000/input-controller@0/
input-controller
/proc/device-tree/pci@800000020000000/mouse@2/
mouse
/proc/device-tree/pci@800000020000000/multimedia-device@7/
multimedia-device
/proc/device-tree/pci@800000020000000/scsi@3/
scsi
/proc/device-tree/pci@800000020000000/usb-xhci@6/
usb-xhci
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This finally allows TCG to benefit from the iothread introduction: Drop
the global mutex while running pure TCG CPU code. Reacquire the lock
when entering MMIO or PIO emulation, or when leaving the TCG loop.
We have to revert a few optimization for the current TCG threading
model, namely kicking the TCG thread in qemu_mutex_lock_iothread and not
kicking it in qemu_cpu_kick. We also need to disable RAM block
reordering until we have a more efficient locking mechanism at hand.
Still, a Linux x86 UP guest and my Musicpal ARM model boot fine here.
These numbers demonstrate where we gain something:
20338 jan 20 0 331m 75m 6904 R 99 0.9 0:50.95 qemu-system-arm
20337 jan 20 0 331m 75m 6904 S 20 0.9 0:26.50 qemu-system-arm
The guest CPU was fully loaded, but the iothread could still run mostly
independent on a second core. Without the patch we don't get beyond
32206 jan 20 0 330m 73m 7036 R 82 0.9 1:06.00 qemu-system-arm
32204 jan 20 0 330m 73m 7036 S 21 0.9 0:17.03 qemu-system-arm
We don't benefit significantly, though, when the guest is not fully
loading a host CPU.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Message-Id: <1439220437-23957-10-git-send-email-fred.konrad@greensocs.com>
[FK: Rebase, fix qemu_devices_reset deadlock, rm address_space_* mutex]
Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
[EGC: fixed iothread lock for cpu-exec IRQ handling]
Signed-off-by: Emilio G. Cota <cota@braap.org>
[AJB: -smp single-threaded fix, clean commit msg, BQL fixes]
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>
[PM: target-arm changes]
Acked-by: Peter Maydell <peter.maydell@linaro.org>
When performing clock calculations, the ppc405_uc code
has several places where it multiplies together two
32-bit variables and assigns the result to a 64-bit
variable. This doesn't quite do what is intended because
C will compute a 32-bit multiply result. Add casts to
ensure we don't truncate the result.
(Spotted by Coverity, CID 1005504, 1005505.)
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
On POWER, the valid page sizes that the guest can use are bound
to the CPU and not to the memory region. QEMU already has some
fancy logic to find out the right maximum memory size to tell
it to the guest during boot (see getrampagesize() in the file
target/ppc/kvm.c for more information).
However, once we're booted and the guest is using huge pages
already, it is currently still possible to hot-plug memory regions
that does not support huge pages - which of course does not work
on POWER, since the guest thinks that it is possible to use huge
pages everywhere. The KVM_RUN ioctl will then abort with -EFAULT,
QEMU spills out a not very helpful error message together with
a register dump and the user is annoyed that the VM unexpectedly
died.
To avoid this situation, we should check the page size of hot-plugged
DIMMs to see whether it is possible to use it in the current VM.
If it does not fit, we can print out a better error message and
refuse to add it, so that the VM does not die unexpectely and the
user has a second chance to plug a DIMM with a matching memory
backend instead.
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1419466
Signed-off-by: Thomas Huth <thuth@redhat.com>
[dwg: Fix a build error on 32-bit builds with KVM]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Generic helper machine_query_hotpluggable_cpus() replaced
target specific query_hotpluggable_cpus() callbacks so
there is no need in it anymore. However inon NULL callback
value is used to detect/report hotpluggable cpus support,
therefore it can be removed completely.
Replace it with MachineClass.has_hotpluggable_cpus boolean
which is sufficient for the task.
Suggested-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
All callbacks FOO_query_hotpluggable_cpus() are practically
the same except of setting vcpus_count to different values.
Convert them to a generic machine_query_hotpluggable_cpus()
callback by moving vcpus_count initialization to per machine
specific callback possible_cpu_arch_ids().
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Replace SPAPR specific cores[] array with generic
machine->possible_cpus and store core objects there.
It makes cores bookkeeping similar to x86 cpus and
will allow to unify similar code.
It would allow to replace cpu_index based NUMA node
mapping with iproperty based one (for -device created
cores) since possible_cpus carries board defined
topology/layout.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The last byte of the option vector was missing due to an off-by-one
error. Without this fix, client architecture support negotiation will
fail because the last byte of option vector 5, which contains the MMU
support, will be missed.
Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
error_report() already puts a prefix with the program name in front
of the error strings, so the "qemu:" prefix is not necessary here
anymore.
Reported-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
spapr_core_unplug() were essentially spapr_core_unplug_request()
handler that requested CPU removal and registered callback
which did actual cpu core removali but it was called from
spapr_machine_device_unplug() which is intended for actual object
removal. Commit (cf632463 spapr: Memory hot-unplug support)
sort of fixed it introducing spapr_machine_device_unplug_request()
and calling spapr_core_unplug() but it hasn't renamed callback and
by mistake calls it from spapr_machine_device_unplug().
However spapr_machine_device_unplug() isn't ever called for
cpu core since spapr_core_release() doesn't follow expected
hotunplug call flow which is:
1: device_del() ->
hotplug_handler_unplug_request() ->
set destroy_cb()
2: destroy_cb() ->
hotplug_handler_unplug() ->
object_unparent // actual device removal
Fix it by renaming spapr_core_unplug() to spapr_core_unplug_request()
which is called from spapr_machine_device_unplug_request() and
making spapr_core_release() call hotplug_handler_unplug() which
will call spapr_machine_device_unplug() -> spapr_core_unplug()
to remove cpu core.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reveiwed-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
spapr_core_pre_plug/spapr_core_plug/spapr_core_unplug() are managing
wiring CPU core into spapr machine state and not internal CPU core state.
So move them from spapr_cpu_core.c to spapr.c where other similar
(spapr_memory_[foo]plug()) callbacks are located, which also matches
x86 target practice.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Split off destroying VCPU threads from drc callback
spapr_core_release() into new spapr_cpu_core_unrealizefn()
which takes care of internal cpu core state cleanup (i.e.
VCPU threads) and is called when object_unparent(core)
is called.
That leaves spapr_core_release() only with board mgmt
code, which will be moved to board related file in
follow up patch along with the rest on hotplug callbacks.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Block backends defined with -drive if=ide are meant to be picked up by
machine initialization code: a suitable frontend gets created and
wired up automatically.
if=ide drives not picked up that way can still be used with -device as
if they had if=none, but that's unclean and best avoided. Unused ones
produce an "Orphaned drive without device" warning.
-drive parameter "if" is optional, and the default depends on the
machine type. If a machine type doesn't specify a default, the
default is "ide".
Many machine types default to if=ide, even though they don't actually
have an IDE controller. A future patch will change these defaults to
something more sensible. To prepare for it, this patch makes default
"ide" explicit for the machines that actually pick up if=ide drives:
* alpha: clipper
* arm/aarch64: spitz borzoi terrier tosa
* i386/x86_64: generic-pc-machine (with concrete subtypes pc-q35-*
pc-i440fx-* pc-* isapc xenfv)
* mips64el: fulong2e
* mips/mipsel/mips64el: malta mips
* ppc/ppc64: mac99 g3beige prep
* sh4/sh4eb: r2d
* sparc64: sun4u sun4v
Note that ppc64 machine powernv already sets an "ide" default
explicitly. Its IDE controller isn't implemented, yet.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-Id: <1487153147-11530-2-git-send-email-armbru@redhat.com>
it's not very convenient to use the crash-information property interface,
so provide a CPU class callback to get the guest crash information, and pass
that information in the event
Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Message-Id: <1487053524-18674-3-git-send-email-den@openvz.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
hw_error() is for CPU related errors only (it dumps the CPU registers
and calls abort()!), so using error_report() is the better choice
of reporting an error in case we simply did not find a file.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Machines bamboo, e500 and virtex-ml507 assume a certain MMU model,
otherwise resulting in unpredictable behavior. Add apropriate checks
into *_init functions.
Signed-off-by: Valentin Plotkin <caliborn@sdf.org>
[regarding virtex parts]
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Tested-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
We are switching BUILD_BUG_ON to verify that it's parameter is a
compile-time constant, and it turns out that some gcc versions
(specifically gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609) are
not smart enough to figure it out for expressions involving local
variables. This is harmless but means that the check is ineffective for
these platforms. To fix, replace the variable with macros.
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
[dwg: Correct a printf format warning]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This is a port to ppc of the i386 commit:
00f4d64 kvmclock: clock should count only if vm is running
We remove timebase_post_load function, and use the VM state
change handler to save and restore the guest_timebase (on stop
and continue).
We keep timebase_pre_save to reduce the clock difference on
migration like in:
6053a86 kvmclock: reduce kvmclock difference on migration
Time base offset has originally been introduced by commit
98a8b52 spapr: Add support for time base offset migration
So while VM is paused, the time is stopped. This allows to have
the same result with date (based on Time Base Register) and
hwclock (based on "get-time-of-day" RTAS call).
Moreover in TCG mode, the Time Base is always paused, so this
patch also adjust the behavior between TCG and KVM.
VM state field "time_of_the_day_ns" is now useless but we keep
it to be able to migrate to older version of the machine.
As vmstate_ppc_timebase structure (with timebase_pre_save() and
timebase_post_load() functions) was only used by vmstate_spapr,
we register the VM state change handler only in ppc_spapr_init().
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
It is completely unused, thus it can be removed without problems.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
If the DECAR register is set to 0, QEMU tries to reload the decrementer with
zero in an inifinite loop. According to PPC documentation, the decrementer is
triggered on 1->0 transition, so avoid reloading the decrementer if if is
already zero.
The problem does not manifest under Linux, but it is valid to set DECAR to zero
(and may make sense as part of decrementer initialization when interrupts are
disabled).
Signed-off-by: Roman Kapl <rka@sysgo.com>
[dwg: Fixed style nit]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Once a compatiblity mode is negotiated with the guest,
h_client_architecture_support() uses run_on_cpu() to update each CPU to
the new mode. We're going to want this logic somewhere else shortly,
so make a helper function to do this global update.
We put it in target-ppc/compat.c - it makes as much sense at the CPU level
as it does at the machine level. We also move the cpu_synchronize_state()
into ppc_set_compat(), since it doesn't really make any sense to call that
without synchronizing state.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
During boot, PAPR guests negotiate CPU model support with the
ibm,client-architecture-support mechanism. The logic to implement this in
qemu is very convoluted. This cleans it up to be cleaner, using the new
ppc_check_compat() call.
The new logic for choosing a compatibility mode is:
1. Usually, use the most recent compatibility mode that is
a) supported by the guest
b) supported by the CPU
and c) no later than the maximum allowed (if specified)
2. If no suitable compatibility mode was found, the guest *does*
support this CPU explicitly, and no maximum compatibility mode is
specified, then use "raw" mode for the current CPU
3. Otherwise, fail the boot.
This differs from the results of the old code: the old code preferred using
"raw" mode to a compatibility mode, whereas the new code prefers a
compatibility mode if available. Using compatibility mode preferentially
means that we're more likely to be able to migrate the guest to a similar
but not identical host.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Machine supports both Open Hack'Ware and OpenBIOS.
Open Hack'Ware is the default because OpenBIOS is currently unable to boot
PReP boot partitions or PReP kernels.
Signed-off-by: Hervé Poussineau <hpoussin@reactos.org>
[dwg: Correct compile failure with KVM located by Thomas Huth]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Hervé Poussineau <hpoussin@reactos.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
[dwg: Added CONFIG_RS6000_MC to ppc64 or it breaks testcases]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This device is a partial duplicate of System I/O device available in hw/ppc/prep.c
This new one doesn't have all the Motorola-specific registers.
The old one should be deprecated and removed with the 'prep' machine.
Partial documentation available at
ftp://ftp.software.ibm.com/rs6000/technology/spec/srp1_1.exe
section 6.1.5 (I/O Device Mapping)
Signed-off-by: Hervé Poussineau <hpoussin@reactos.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Drop the old SysBus init function and use instance_init
Signed-off-by: xiaoqiang zhao <zxq_yx_007@163.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Drop the old SysBus init function and use instance_init
Signed-off-by: xiaoqiang zhao <zxq_yx_007@163.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
To continue consolidation of compatibility mode information, this rewrites
the ppc_get_compat_smt_threads() function using the table of compatiblity
modes in target-ppc/compat.c.
It's not a direct replacement, the new ppc_compat_max_threads() function
has simpler semantics - it just returns the number of threads the cpu
model has, taking into account any compatiblity mode it is in.
This no longer takes into account kvmppc_smt_threads() as the previous
version did. That check wasn't useful because we check in
ppc_cpu_realizefn() that CPUs aren't instantiated with more threads
than kvm allows (or if we didn't things will already be broken and
this won't make it any worse).
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
When passing through an USB storage device to a pseries guest, it
is currently not possible to automatically boot from the device
if the "bootindex" property has been specified, too (e.g. when using
"-device nec-usb-xhci -device usb-host,hostbus=1,hostaddr=2,bootindex=0"
at the command line). The problem is that QEMU builds a device tree path
like "/pci@800000020000000/usb@0/usb-host@1" and passes it to SLOF
in the /chosen/qemu,boot-list property. SLOF, however, probes the
USB device, recognizes that it is a storage device and thus changes
its name to "storage", and additionally adds a child node for the
SCSI LUN, so the correct boot path in SLOF is something like
"/pci@800000020000000/usb@0/storage@1/disk@101000000000000" instead.
So when we detect an USB mass storage device with SCSI interface,
we've got to adjust the firmware boot-device path properly that
SLOF can automatically boot from the device.
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1354177
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The H_SIGNAL_SYS_RESET hcall allows a guest CPU to raise a system reset
exception on CPUs within the same guest -- all CPUs, all-but-self, or a
specific CPU (including self).
This has not made its way to a PAPR release yet, but we have an hcall
number assigned.
H_SIGNAL_SYS_RESET = 0x380
Syntax:
hcall(uint64 H_SIGNAL_SYS_RESET, int64 target);
Generate a system reset NMI on the threads indicated by target.
Values for target:
-1 = target all online threads including the caller
-2 = target all online threads except for the caller
All other negative values: reserved
Positive values: The thread to be targeted, obtained from the value
of the "ibm,ppc-interrupt-server#s" property of the CPU in the OF
device tree.
Semantics:
- Invalid target: return H_Parameter.
- Otherwise: Generate a system reset NMI on target thread(s),
return H_Success.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The 'cpu_version' field in PowerPCCPU is badly named. It's named after the
'cpu-version' device tree property where it is advertised, but that meaning
may not be obvious in most places it appears.
Worse, it doesn't even really correspond to that device tree property. The
property contains either the processor's PVR, or, if the CPU is running in
a compatibility mode, a special "logical PVR" representing which mode.
Rename the cpu_version field, and a number of related variables to
compat_pvr to make this clearer.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Thomas Huth <thuth@redhat.com>
The pseries machine type is a bit unusual in that it runs a paravirtualized
guest. The guest expects to interact with a hypervisor, and qemu
emulates the functions of that hypervisor directly, rather than executing
hypervisor code within the emulated system.
To implement this in TCG, we need to intercept hypercall instructions and
direct them to the machine's hypercall handlers, rather than attempting to
perform a privilege change within TCG. This is controlled by a global
hook - cpu_ppc_hypercall.
This cleanup makes the handling a little cleaner and more extensible than
a single global variable. Instead, each CPU to have hypercalls intercepted
has a pointer set to a QOM object implementing a new virtual hypervisor
interface. A method in that interface is called by TCG when it sees a
hypercall instruction. It's possible we may want to add other methods in
future.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
spapr_h_cas_compose_response() includes a cpu_update parameter which
controls whether it includes updated information on the CPUs in the device
tree fragment returned from the ibm,client-architecture-support (CAS) call.
Providing the updated information is essential when CAS has negotiated
compatibility options which require different cpu information to be
presented to the guest. However, it should be safe to provide in other
cases (it will just override the existing data in the device tree with
identical data). This simplifies the code by removing the parameter and
always providing the cpu update information.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Currently the pseries machine has two paths for constructing CPUs. On
newer machine type versions, which support cpu hotplug, it constructs
cpu core objects, which in turn construct CPU threads. For older machine
versions it individually constructs the CPU threads.
This division is going to make some future changes to the cpu construction
harder, so this patch unifies them. Now cpu core objects are always
created. This requires some updates to allow core objects to be created
without a full complement of threads (since older versions allowed a
number of cpus not a multiple of the threads-per-core). Likewise it needs
some changes to the cpu core hot/cold plug path so as not to choke on the
old machine types without hotplug support.
For good measure, we move the cpu construction to its own subfunction,
spapr_init_cpus().
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Move the generic cpu_synchronize_ functions to the common hw_accel.h header,
in order to prepare for the addition of a second hardware accelerator.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Vincent Palatin <vpalatin@chromium.org>
Message-Id: <f5c3cffe8d520011df1c2e5437bb814989b48332.1484045952.git.vpalatin@chromium.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
We've currently got 18 architectures in QEMU, and thus 18 target-xxx
folders in the root folder of the QEMU source tree. More architectures
(e.g. RISC-V, AVR) are likely to be included soon, too, so the main
folder of the QEMU sources slowly gets quite overcrowded with the
target-xxx folders.
To disburden the main folder a little bit, let's move the target-xxx
folders into a dedicated target/ folder, so that target-xxx/ simply
becomes target/xxx/ instead.
Acked-by: Laurent Vivier <laurent@vivier.eu> [m68k part]
Acked-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de> [tricore part]
Acked-by: Michael Walle <michael@walle.cc> [lm32 part]
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> [s390x part]
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> [s390x part]
Acked-by: Eduardo Habkost <ehabkost@redhat.com> [i386 part]
Acked-by: Artyom Tarasenko <atar4qemu@gmail.com> [sparc part]
Acked-by: Richard Henderson <rth@twiddle.net> [alpha part]
Acked-by: Max Filippov <jcmvbkbc@gmail.com> [xtensa part]
Reviewed-by: David Gibson <david@gibson.dropbear.id.au> [ppc part]
Acked-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com> [crisµblaze part]
Acked-by: Guan Xuetao <gxt@mprc.pku.edu.cn> [unicore32 part]
Signed-off-by: Thomas Huth <thuth@redhat.com>
Currently we set the initial isolation/allocation state for DRCs
associated with coldplugged LMBs to ISOLATED/UNUSABLE,
respectively, under the assumption that the guest will move this
state to UNISOLATED/USABLE.
In fact, this is only the case for LMBs added via hotplug. For
coldplugged LMBs, the guest actually assumes the initial state to
be UNISOLATED/USABLE.
In practice, this only becomes an issue when we attempt to unplug
one of these LMBs, where the guest kernel will issue an
rtas-get-sensor-state call to check that the corresponding DRC is
in an USABLE state before it will release the LMB back to
QEMU. If the returned state is otherwise, the guest will assume no
further action is needed, which bypasses the QEMU-side cleanup that
occurs during the USABLE->UNUSABLE transition. This results in
LMBs and their corresponding pc-dimm devices to stick around
indefinitely.
This patch fixes the issue by manually setting DRCs associated with
cold-plugged LMBs to UNISOLATED/ALLOCATED, but leaving the hotplug
state untouched. As it turns out, this is analogous to the handling
for cold-plugged CPUs in spapr_core_plug().
Cc: qemu-ppc@nongnu.org
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
Cc: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
daa2369 "spapr_pci: Add a 64-bit MMIO window" subtly broke migration
from qemu-2.7 to the current version. It split the device's MMIO
window into two pieces for 32-bit and 64-bit MMIO.
The patch included backwards compatibility code to convert the old
property into the new format. However, the property value was also
transferred in the migration stream and compared with a (probably
unwise) VMSTATE_EQUAL. So, the "raw" value from 2.7 is compared to
the new style converted value from (pre-)2.8 giving a mismatch and
migration failure.
Along with the actual field that caused the breakage, there are
several other ill-advised VMSTATE_EQUAL()s. To fix forwards
migration, we read the values in the stream into scratch variables and
ignore them, instead of comparing for equality. To fix backwards
migration, we populate those scratch variables in pre_save() with
adjusted values to match the old behaviour.
To permit the eventual possibility of removing this cruft from the
stream, we only include these compatibility fields if a new
'pre-2.8-migration' property is set. We clear it on the pseries-2.8
machine type, which obviously can't be migrated backwards, but set it
on earlier machine type versions.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
This reverts commit 9b54ca0ba7.
The commit above corrected a migration breakage between qemu-2.7 and
qemu-2.8. However it did so by advancing the migration version for
the PCI host bridge, which obviously breaks migration backwards to
earlier qemu versions.
Although it's not totally essential, we'd like to maintain the
possibility for backwards migration, so revert the change in
preparation for a better fix.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Until very recently, the vmstate for ppc cpus included some poorly
thought out VMSTATE_EQUAL() components, that can easily break
migration compatibility, and did so between qemu-2.6 and later
versions. A hack was recently added which fixes this migration
breakage, but it leaves the unhelpful cruft of these fields in the
migration stream.
This patch adds a new cpu property allowing these fields to be removed
from the stream entirely. For the pseries-2.8 machine type - which
comes after the fix - and for all non-pseries machine types - which
aren't mature enough to care about cross-version migration - we remove
the fields from the stream.
For pseries-2.7 and earlier, The migration hack remains in place,
allowing backwards and forwards migration with the older machine
types.
This restricts the migration compatibility cruft to older machine
types, and at least opens the possibility of eventually deprecating
and removing it entirely.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
With the additional of the OV5_HP_EVT option vector, we now have
certain functionality (namely, memory unplug) that checks at run-time
for whether or not the guest negotiated the option via CAS. Because
we don't currently migrate these negotiated values, we are unable
to unplug memory from a guest after it's been migrated until after
the guest is rebooted and CAS-negotiation is repeated.
This patch fixes this by adding CAS-negotiated options to the
migration stream. We do this using a subsection, since the
negotiated value of OV5_HP_EVT is the only option currently needed
to maintain proper functionality for a running guest.
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
PC will use this field in other way, so move it outside the common
code so PC could set a different value, i.e. all CPUs
regardless of where they are coming from (-smp X | -device cpu...).
It's quick and dirty hack as it could be implemented in more generic
way in MashineClass. But do it in simple way since only PC is affected
so far.
Later we can generalize it when another affected target gets support
for -device cpu.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <1479212236-183810-3-git-send-email-imammedo@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
If the pnv machine type is compiled on a 32-bit host, the unsigned long
(host) type is 32-bit. This means that the hweight_long() used to
calculate the number of allowed cores only considers the low 32 bits of
the cores_mask variable, and can thus return 0 in some circumstances.
This corrects the bug.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Suggested-by: Richard Henderson <rth@twiddle.net>
[clg: replaced hweight_long() by ctpop64() ]
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
High addresses can overflow the uint32_t pcba variable after the 8byte
shift.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The XSCOM addresses for the core registers are encoded in a slightly
different way on POWER8 and POWER9.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
daa2369 "spapr_pci: Add a 64-bit MMIO window" subtly broke migration from
qemu-2.7 to the current version. It split the device's MMIO window into
two pieces for 32-bit and 64-bit MMIO.
The patch included backwards compatibility code to convert the old property
into the new format. However, the property value was also transferred in
the migration stream and compared with a (probably unwise) VMSTATE_EQUAL.
So, the "raw" value from 2.7 is compared to the new style converted value
from (pre-)2.8 giving a mismatch and migration failure.
Although it would be technically possible to fix this in a way allowing
backwards migration, that would leave an ugly legacy around indefinitely.
This patch takes the simpler approach of bumping the migration version,
dropping the unwise VMSTATE_EQUAL (and some equally unwise ones around it)
and ignoring them on an incoming migration.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
PnvChip is defined twice and this can confuse old compilers :
CC ppc64-softmmu/hw/ppc/pnv_xscom.o
In file included from qemu.git/hw/ppc/pnv.c:29:
qemu.git/include/hw/ppc/pnv.h:60: error: redefinition of typedef ‘PnvChip’
qemu.git/include/hw/ppc/pnv_xscom.h:24: note: previous declaration of ‘PnvChip’ was here
make[1]: *** [hw/ppc/pnv.o] Error 1
make[1]: *** Waiting for unfinished jobs....
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
powernv has some code (derived from the spapr equivalent) used in device
tree generation which depends on the CPU's compatibility mode / logical
PVR. However, compatibility modes don't make sense on powernv - at least
not as a property controlled by the host - because the guest in powernv
has full hypervisor level access to the virtual system, and so owns the
PCR (Processor Compatibility Register) which implements compatiblity modes.
Note: the new logic doesn't take into account kvmppc_smt_threads() like the
old version did. However, if core->nr_threads exceeds kvmppc_smt_threads()
then things will already be broken and clamping the value in the device
tree isn't going to save us.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
This changes the *_run_on_cpu APIs (and helpers) to pass data in a
run_on_cpu_data type instead of a plain void *. This is because we
sometimes want to pass a target address (target_ulong) and this fails on
32 bit hosts emulating 64 bit guests.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20161027151030.20863-24-alex.bennee@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Some files contain multiple #includes of the same header file.
Removed most of those unnecessary duplicate entries using
scripts/clean-includes.
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Anand J <anand.indukala@gmail.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Add support to hot remove pc-dimm memory devices.
Since we're introducing a machine-level unplug_request hook, we also
had handling for CPU unplug there as well to ensure CPU unplug
continues to work as it did before.
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
* add hooks to CAS/cmdline enablement of hotplug ACR support
* add hook for CPU unplug
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Commit 0a417869:
spapr: Move memory hotplug to RTAS_LOG_V6_HP_ID_DRC_COUNT type
dropped per-DRC/per-LMB hotplugs event in favor of a bulk add via a
single LMB count value. This was to avoid overrunning the guest EPOW
event queue with hotplug events. This works fine, but relies on the
guest exhaustively scanning for pluggable LMBs to satisfy the
requested count by issuing rtas-get-sensor(DR_ENTITY_SENSE, ...) calls
until all the LMBs associated with the DIMM are identified.
With newer support for dedicated hotplug event source, this queue
exhaustion is no longer as much of an issue due to implementation
details on the guest side, but we still try to avoid excessive hotplug
events by now supporting both a count and a starting index to avoid
unecessary work. This patch makes use of that approach when the
capability is available.
Cc: bharata@linux.vnet.ibm.com
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Add support for DRC count indexed hotplug ID type which is primarily
needed for memory hot unplug. This type allows for specifying the
number of DRs that should be plugged/unplugged starting from a given
DRC index.
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
* updated rtas_event_log_v6_hp to reflect count/index field ordering
used in PAPR hotplug ACR
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This adds machine options of the form:
-machine pseries,modern-hotplug-events=true
-machine pseries,modern-hotplug-events=false
If false, QEMU will force the use of "legacy" style hotplug events,
which are surfaced through EPOW events instead of a dedicated
hot plug event source, and lack certain features necessary, mainly,
for memory unplug support.
If true, QEMU will enable support for "modern" dedicated hot plug
event source. Note that we will still default to "legacy" style unless
the guest advertises support for the "modern" hotplug events via
ibm,client-architecture-support hcall during early boot.
For pseries-2.7 and earlier we default to false, for newer machine
types we default to true.
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Hotplug events were previously delivered using an EPOW interrupt
and were queued by linux guests into a circular buffer. For traditional
EPOW events like shutdown/resets, this isn't an issue, but for hotplug
events there are cases where this buffer can be exhausted, resulting
in the loss of hotplug events, resets, etc.
Newer-style hotplug event are delivered using a dedicated event source.
We enable this in supported guests by adding standard an additional
event source in the guest device-tree via /event-sources, and, if
the guest advertises support for the newer-style hotplug events,
using the corresponding interrupt to signal the available of
hotplug/unplug events.
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
ibm,architecture-vec-5 is supposed to encode all option vector 5 bits
negotiated between platform/guest. Currently we hardcode this property
in the boot-time device tree to advertise a single negotiated
capability, "Form 1" NUMA Affinity, regardless of whether or not CAS
has been invoked or that capability has actually been negotiated.
Improve this by generating ibm,architecture-vec-5 based on the full
set of option vector 5 capabilities negotiated via CAS.
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
In some cases, ibm,client-architecture-support calls can fail. This
could happen in the current code for situations where the modified
device tree segment exceeds the buffer size provided by the guest
via the call parameters. In these cases, QEMU will reset, allowing
an opportunity to regenerate the device tree from scratch via
boot-time handling. There are potentially other scenarios as well,
not currently reachable in the current code, but possible in theory,
such as cases where device-tree properties or nodes need to be removed.
We currently don't handle either of these properly for option vector
capabilities however. Instead of carrying the negotiated capability
beyond the reset and creating the boot-time device tree accordingly,
we start from scratch, generating the same boot-time device tree as we
did prior to the CAS-generated and the same device tree updates as we
did before. This could (in theory) cause us to get stuck in a reset
loop. This hasn't been observed, but depending on the extensiveness
of CAS-induced device tree updates in the future, could eventually
become an issue.
Address this by pulling capability-related device tree
updates resulting from CAS calls into a common routine,
spapr_dt_cas_updates(), and adding an sPAPROptionVector*
parameter that allows us to test for newly-negotiated capabilities.
We invoke it as follows:
1) When ibm,client-architecture-support gets called, we
call spapr_dt_cas_updates() with the set of capabilities
added since the previous call to ibm,client-architecture-support.
For the initial boot, or a system reset generated by something
other than the CAS call itself, this set will consist of *all*
options supported both the platform and the guest. For calls
to ibm,client-architecture-support immediately after a CAS-induced
reset, we call spapr_dt_cas_updates() with only the set
of capabilities added since the previous call, since the other
capabilities will have already been addressed by the boot-time
device-tree this time around. In the unlikely event that
capabilities are *removed* since the previous CAS, we will
generate a CAS-induced reset. In the unlikely event that we
cannot fit the device-tree updates into the buffer provided
by the guest, well generate a CAS-induced reset.
2) When a CAS update results in the need to reset the machine and
include the updates in the boot-time device tree, we call the
spapr_dt_cas_updates() using the full set of negotiated
capabilities as part of the reset path. At initial boot, or after
a reset generated by something other than the CAS call itself,
this set will be empty, resulting in what should be the same
boot-time device-tree as we generated prior to this patch. For
CAS-induced reset, this routine will be called with the full set of
capabilities negotiated by the platform/guest in the previous
CAS call, which should result in CAS updates from previous call
being accounted for in the initial boot-time device tree.
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
[dwg: Changed an int -> bool conversion to be more explicit]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Currently we access individual bytes of an option vector via
ldub_phys() to test for the presence of a particular capability
within that byte. Currently this is only done for the "dynamic
reconfiguration memory" capability bit. If that bit is present,
we pass a boolean value to spapr_h_cas_compose_response()
to generate a modified device tree segment with the additional
properties required to enable this functionality.
As more capability bits are added, will would need to modify the
code to add additional option vector accesses and extend the
param list for spapr_h_cas_compose_response() to include similar
boolean values for these parameters.
Avoid this by switching to spapr_ovec_* helpers so we can do all
the parsing in one shot and then test for these additional bits
within spapr_h_cas_compose_response() directly.
Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
PAPR guests advertise their capabilities to the platform by passing
an ibm,architecture-vec structure via an
ibm,client-architecture-support hcall as described by LoPAPR v11,
B.6.2.3. during early boot.
Using this information, the platform enables the capabilities it
supports, then encodes a subset of those enabled capabilities (the
5th option vector of the ibm,architecture-vec structure passed to
ibm,client-architecture-support) into the guest device tree via
"/chosen/ibm,architecture-vec-5".
The logical format of these these option vectors is a bit-vector,
where individual bits are addressed/documented based on the byte-wise
offset from the beginning of the bit-vector, followed by the bit-wise
index starting from the byte-wise offset. Thus the bits of each of
these bytes are stored in reverse order. Additionally, the first
byte of each option vector is encodes the length of the option vector,
so byte offsets begin at 1, and bit offset at 0.
This is not very intuitive for the purposes of mapping these bits to
a particular documented capability, so this patch introduces a set
of abstractions that encapsulate the work of parsing/encoding these
options vectors and testing for individual capabilities.
Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
[dwg: Tweaked double-include protection to not trigger a checkpatch
false positive]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
For historical reasons construction of the guest device tree in spapr is
divided between spapr_create_fdt_skel() which is called at init time, and
spapr_build_fdt() which runs at reset time. Over time, more and more
things have needed to be moved to reset time.
Previous cleanups mean the only things left in spapr_create_fdt_skel() are
the properties of the root node itself. Finish consolidating these two
parts of device tree construction, by moving this to the start of
spapr_build_fdt(), and removing spapr_create_fdt_skel() entirely.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Construction of the /vdevice node (and its children) is divided between
spapr_create_fdt_skel() (at init time), which creates the base node, and
spapr_populate_vdevice() (at reset time) which creates the nodes for each
individual virtual device.
This consolidates both into a single function called from
spapr_build_fdt().
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Currently the /hypervisor device tree node is constructed in
spapr_create_fdt_skel(). As part of consolidating device tree construction
to reset time, move it to a function called from spapr_build_fdt().
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
The /event-sources device tree node is built from spapr_create_fdt_skel().
As part of consolidating device tree construction to reset time, this moves
it to spapr_build_fdt().
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
For historical reasons construction of the /rtas node in the device
tree (amongst others) is split into several places. In particular
it's split between spapr_create_fdt_skel(), spapr_build_fdt() and
spapr_rtas_device_tree_setup().
In fact, as well as adding the actual RTAS tokens to the device tree,
spapr_rtas_device_tree_setup() just adds the ibm,lrdr-capacity
property, which despite going in the /rtas node, doesn't have a lot to
do with RTAS.
This patch consolidates the code constructing /rtas together into a new
spapr_dt_rtas() function. spapr_rtas_device_tree_setup() is renamed to
spapr_dt_rtas_tokens() and now only adds the token properties.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
For historical reasons, building the /chosen node in the guest device tree
is split across several places and includes both parts which write the DT
sequentially and others which use random access functions.
This patch consolidates construction of the node into one place, using
random access functions throughout.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Currently the device tree node for the XICS interrupt controller is in
spapr_create_fdt_skel(). As part of consolidating device tree construction
to reset time, this moves it to a function called from spapr_build_fdt().
In addition we move the actual code into hw/intc/xics_spapr.c with the
rest of the PAPR specific interrupt controller code.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
At each system reset, the pseries machine needs to load RTAS (the runtime
portion of the guest firmware) into the VM. This means copying
the actual RTAS code into guest memory, and also updating the device
tree so that the guest OS and boot firmware can locate it.
For historical reasons the copy and update to the device tree were in
different parts of the code. This cleanup brings them both together in
an spapr_load_rtas() function.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
The flattened device tree passed to pseries guests contains a list of
reserved memory areas. Currently we construct this list early in
spapr_create_fdt_skel() as we sequentially write the fdt.
This will be inconvenient for upcoming cleanups, so this patch moves
the reserve map changes to the end of fdt construction. This changes
fdt_add_reservemap_entry() calls - which work when writing the fdt
sequentially to fdt_add_mem_rsv() calls used when altering the fdt in
random access mode.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Currently spapr_create_fdt_skel() takes a bunch of individual parameters
for various things it will put in the device tree. Some of these can
already be taken directly from sPAPRMachineState. This patch alters it so
that all of them can be taken from there, which will allow this code to
be moved away from its current caller in future.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
These values are used only within ppc_spapr_reset(), so just change them
to local variables.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
spapr_finalize_fdt() both finishes building the device tree for the guest
and loads it into guest memory. For future cleanups, it's going to be
more convenient to do these two things separately. The loading portion is
pretty trivial, so we move it inline into the caller, ppc_spapr_reset().
We also rename spapr_finalize_fdt(), because the current name is going to
become inaccurate.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
As Qemu only supports a single instance of the ISA bus, we use the LPC
controller of chip 0 to create one and plug in a couple of useful
devices, like an UART and RTC. An IPMI BT device, which is also an ISA
device, can be defined on the command line to connect an external BMC.
That is for later.
The PowerNV machine now has a console. Skiboot should load a kernel
and jump into it but execution will stop quite early because we lack a
model for the native XICS controller for the moment :
[ 0.000000] NR_IRQS:512 nr_irqs:512 16
[ 0.000000] XICS: Cannot find a Presentation Controller !
[ 0.000000] ------------[ cut here ]------------
[ 0.000000] WARNING: at arch/powerpc/platforms/powernv/setup.c:81
...
[ 0.000000] NIP [c00000000079d65c] pnv_init_IRQ+0x30/0x44
You can still do a few things under xmon.
Based on previous work from :
Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
[dwg: Trivial fix for a change in the serial_hds_isa_init() interface]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The LPC (Low Pin Count) interface on a POWER8 is made accessible to
the system through the ADU (XSCOM interface). This interface is part
of set of units connected together via a local OPB (On-Chip Peripheral
Bus) which act as a bridge between the ADU and the off chip LPC
endpoints, like external flash modules.
The most important units of this OPB are :
- OPB Master: contains the ADU slave logic, a set of internal
registers and the logic to control the OPB.
- LPCHC (LPC HOST Controller): which implements a OPB Slave, a set of
internal registers and the LPC HOST Controller to control the LPC
interface.
Four address spaces are provided to the ADU :
- LPC Bus Firmware Memory
- LPC Bus Memory
- LPC Bus I/O (ISA bus)
- and the registers for the OPB Master and the LPC Host Controller
On POWER8, an intermediate hop is necessary to reach the OPB, through
a unit called the ECCB. OPB commands are simply mangled in ECCB write
commands.
On POWER9, the OPB master address space can be accessed via MMIO. The
logic is same but the code will be simpler as the XSCOM and ECCB hops
are not necessary anymore.
This version of the LPC controller model doesn't yet implement support
for the SerIRQ deserializer present in the Naples version of the chip
though some preliminary work is there.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[clg: - updated for qemu-2.7
- ported on latest PowerNV patchset
- changed the XSCOM interface to fit new model
- QOMified the model
- moved the ISA hunks in another patch
- removed printf logging
- added a couple of UNIMP logging
- rewrote commit log ]
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Now that we are using real HW ids for the cores in PowerNV chips, we
can route the XSCOM accesses to them. We just need to attach a
specific XSCOM memory region to each core in the appropriate window
for the core number.
To start with, let's install the DTS (Digital Thermal Sensor) handlers
which should return 38°C for each core.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
On a real POWER8 system, the Pervasive Interconnect Bus (PIB) serves
as a backbone to connect different units of the system. The host
firmware connects to the PIB through a bridge unit, the
Alter-Display-Unit (ADU), which gives him access to all the chiplets
on the PCB network (Pervasive Connect Bus), the PIB acting as the root
of this network.
XSCOM (serial communication) is the interface to the sideband bus
provided by the POWER8 pervasive unit to read and write to chiplets
resources. This is needed by the host firmware, OPAL and to a lesser
extent, Linux. This is among others how the PCI Host bridges get
configured at boot or how the LPC bus is accessed.
To represent the ADU of a real system, we introduce a specific
AddressSpace to dispatch XSCOM accesses to the targeted chiplets. The
translation of an XSCOM address into a PCB register address is
slightly different between the P9 and the P8. This is handled before
the dispatch using a 8byte alignment for all.
To customize the device tree, a QOM InterfaceClass, PnvXScomInterface,
is provided with a populate() handler. The chip populates the device
tree by simply looping on its children. Therefore, each model needing
custom nodes should not forget to declare itself as a child at
instantiation time.
Based on previous work done by :
Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
[dwg: Added cpu parameter to xscom_complete()]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This is largy inspired by sPAPRCPUCore with some simplification, no
hotplug for instance. A set of PnvCore objects is added to the PnvChip
and the device tree is populated looping on these cores.
Real HW cpu ids are now generated depending on the chip cpu model, the
chip id and a core mask. The id is propagated to the CPU object, using
properties, to set the SPR_PIR (Processor Identification Register)
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The Processor Identification Register (PIR) is a register that holds a
processor identifier which is used for bus transactions (XSCOM) and
for processor differentiation in multiprocessor systems. It also used
in the interrupt vector entries (IVE) to identify the thread serving
the interrupts.
P9 and P8 have some differences in the CPU PIR encoding.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This will be used to build real HW ids for the cores and enforce some
limits on the available cores per chip.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This is is an abstraction of a POWER8 chip which is a set of cores
plus other 'units', like the pervasive unit, the interrupt controller,
the memory controller, the on-chip microcontroller, etc. The whole can
be seen as a socket. It depends on a cpu model and its characteristics:
max cores and specific inits are defined in a PnvChipClass.
We start with an near empty PnvChip with only a few cpu constants
which we will grow in the subsequent patches with the controllers
required to run the system.
The Chip CFAM (Common FRU Access Module) ID gives the model of the
chip and its version number. It is generally the first thing firmwares
fetch, available at XSCOM PCB address 0xf000f, to start initialization.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The goal is to emulate a PowerNV system at the level of the skiboot
firmware, which loads the OS and provides some runtime services. Power
Systems have a lower firmware (HostBoot) that does low level system
initialization, like DRAM training. This is beyond the scope of what
qemu will address in a PowerNV guest.
No devices yet, not even an interrupt controller. Just to get started,
some RAM to load the skiboot firmware, the kernel and initrd. The
device tree is fully created in the machine reset op.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[clg: - updated for qemu-2.7
- replaced fprintf by error_report
- used a common definition of _FDT macro
- removed VMStateDescription as migration is not yet supported
- added IBM Copyright statements
- reworked kernel_filename handling
- merged PnvSystem and sPowerNVMachineState
- removed PHANDLE_XICP
- added ppc_create_page_sizes_prop helper
- removed nmi support
- removed kvm support
- updated powernv machine to version 2.8
- removed chips and cpus, They will be provided in another patches
- added a machine reset routine to initialize the device tree (also)
- french has a squelette and english a skeleton.
- improved commit log.
- reworked prototypes parameters
- added a check on the ram size (thanks to Michael Ellerman)
- fixed chip-id cell
- changed MAX_CPUS to 2048
- simplified memory node creation to one node only
- removed machine version
- rewrote the device tree creation with the fdt "rw" routines
- s/sPowerNVMachineState/PnvMachineState/
- etc.]
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
With the addition of "numa_node" properties for PHBs we began
advertising NUMA affinity in cases where nb_numa_nodes > 1.
Since the default on the guest side is to make no assumptions about
PHB NUMA affinity (defaulting to -1), there is still a valid use-case
for explicitly defining a PHB's NUMA affinity even when there's just
one node. In particular, some workloads make faulty assumptions about
/sys/bus/pci/<devid>/numa_node being >= 0, warranting the use of
this property as a workaround even if there's just 1 PHB or NUMA
node.
Enable this use-case by always advertising the PHB's NUMA affinity
if "numa_node" has been explicitly set.
We could achieve this by relaxing the check to simply be
nb_numa_nodes > 0, but even safer would be to check
numa_info[nodeid].present explicitly, and to fail at start time
for cases where it does not exist.
This has an additional affect of no longer advertising PHB NUMA
affinity unconditionally if nb_numa_nodes > 1 and "numa_node"
property is unset/-1, but since the default value on the guest
side for each PHB is also -1, the behavior should be the same for
that situation. We could still retain the old behavior if desired,
but the decision seems arbitrary, so we take the simpler route.
Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: Shivaprasad G. Bhat <shivapbh@in.ibm.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
so that it would be possible to increase maxcpus limit
for x86 target. Keep spapr/virt_arm at limit they used
to have 255.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Currently, the MMIO space for accessing PCI on pseries guests begins at
1 TiB in guest address space. Each PCI host bridge (PHB) has a 64 GiB
chunk of address space in which it places its outbound PIO and 32-bit and
64-bit MMIO windows.
This scheme as several problems:
- It limits guest RAM to 1 TiB (though we have a limited fix for this
now)
- It limits the total MMIO window to 64 GiB. This is not always enough
for some of the large nVidia GPGPU cards
- Putting all the windows into a single 64 GiB area means that naturally
aligning things within there will waste more address space.
In addition there was a miscalculation in some of the defaults, which meant
that the MMIO windows for each PHB actually slightly overran the 64 GiB
region for that PHB. We got away without nasty consequences because
the overrun fit within an unused area at the beginning of the next PHB's
region, but it's not pretty.
This patch implements a new scheme which addresses those problems, and is
also closer to what bare metal hardware and pHyp guests generally use.
Because some guest versions (including most current distro kernels) can't
access PCI MMIO above 64 TiB, we put all the PCI windows between 32 TiB and
64 TiB. This is broken into 1 TiB chunks. The first 1 TiB contains the
PIO (64 kiB) and 32-bit MMIO (2 GiB) windows for all of the PHBs. Each
subsequent TiB chunk contains a naturally aligned 64-bit MMIO window for
one PHB each.
This reduces the number of allowed PHBs (without full manual configuration
of all the windows) from 256 to 31, but this should still be plenty in
practice.
We also change some of the default window sizes for manually configured
PHBs to saner values.
Finally we adjust some tests and libqos so that it correctly uses the new
default locations. Ideally it would parse the device tree given to the
guest, but that's a more complex problem for another time.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
On real hardware, and under pHyp, the PCI host bridges on Power machines
typically advertise two outbound MMIO windows from the guest's physical
memory space to PCI memory space:
- A 32-bit window which maps onto 2GiB..4GiB in the PCI address space
- A 64-bit window which maps onto a large region somewhere high in PCI
address space (traditionally this used an identity mapping from guest
physical address to PCI address, but that's not always the case)
The qemu implementation in spapr-pci-host-bridge, however, only supports a
single outbound MMIO window, however. At least some Linux versions expect
the two windows however, so we arranged this window to map onto the PCI
memory space from 2 GiB..~64 GiB, then advertised it as two contiguous
windows, the "32-bit" window from 2G..4G and the "64-bit" window from
4G..~64G.
This approach means, however, that the 64G window is not naturally aligned.
In turn this limits the size of the largest BAR we can map (which does have
to be naturally aligned) to roughly half of the total window. With some
large nVidia GPGPU cards which have huge memory BARs, this is starting to
be a problem.
This patch adds true support for separate 32-bit and 64-bit outbound MMIO
windows to the spapr-pci-host-bridge implementation, each of which can
be independently configured. The 32-bit window always maps to 2G.. in PCI
space, but the PCI address of the 64-bit window can be configured (it
defaults to the same as the guest physical address).
So as not to break possible existing configurations, as long as a 64-bit
window is not specified, a large single window can be specified. This
will appear the same way to the guest as the old approach, although it's
now implemented by two contiguous memory regions rather than a single one.
For now, this only adds the possibility of 64-bit windows. The default
configuration still uses the legacy mode.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Currently the default PCI host bridge for the 'pseries' machine type is
constructed with its IO windows in the 1TiB..(1TiB + 64GiB) range in
guest memory space. This means that if > 1TiB of guest RAM is specified,
the RAM will collide with the PCI IO windows, causing serious problems.
Problems won't be obvious until guest RAM goes a bit beyond 1TiB, because
there's a little unused space at the bottom of the area reserved for PCI,
but essentially this means that > 1TiB of RAM has never worked with the
pseries machine type.
This patch fixes this by altering the placement of PHBs on large-RAM VMs.
Instead of always placing the first PHB at 1TiB, it is placed at the next
1 TiB boundary after the maximum RAM address.
Technically, this changes behaviour in a migration-breaking way for
existing machines with > 1TiB maximum memory, but since having > 1 TiB
memory was broken anyway, this seems like a reasonable trade-off.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
The 'spapr-pci-host-bridge' represents the virtual PCI host bridge (PHB)
for a PAPR guest. Unlike on x86, it's routine on Power (both bare metal
and PAPR guests) to have numerous independent PHBs, each controlling a
separate PCI domain.
There are two ways of configuring the spapr-pci-host-bridge device: first
it can be done fully manually, specifying the locations and sizes of all
the IO windows. This gives the most control, but is very awkward with 6
mandatory parameters. Alternatively just an "index" can be specified
which essentially selects from an array of predefined PHB locations.
The PHB at index 0 is automatically created as the default PHB.
The current set of default locations causes some problems for guests with
large RAM (> 1 TiB) or PCI devices with very large BARs (e.g. big nVidia
GPGPU cards via VFIO). Obviously, for migration we can only change the
locations on a new machine type, however.
This is awkward, because the placement is currently decided within the
spapr-pci-host-bridge code, so it breaks abstraction to look inside the
machine type version.
So, this patch delegates the "default mode" PHB placement from the
spapr-pci-host-bridge device back to the machine type via a public method
in sPAPRMachineClass. It's still a bit ugly, but it's about the best we
can do.
For now, this just changes where the calculation is done. It doesn't
change the actual location of the host bridges, or any other behaviour.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Instead of an array of fixed sized blocks, use a list, as we will need
to have sources with variable number of interrupts. SPAPR only uses
a single entry. Native will create more. If performance becomes an
issue we can add some hashed lookup but for now this will do fine.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[ move the initialization of list to xics_common_initfn,
restore xirr_owner after migration and move restoring to
icp_post_load]
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
[ clg: removed the icp_post_load() changes from nikunj patchset v3:
http://patchwork.ozlabs.org/patch/646008/ ]
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Rather than machine instances having backward-compatible option
defaults that need to be repeatedly re-enabled for every new machine
type we introduce, we set the defaults appropriate for newer machine
types, then add code to explicitly disable instance options as needed
to maintain compatibility with older machine types.
Currently pseries-2.5 does not inherit from pseries-2.6 in this
fashion, which is okay at the moment since we do not have any
instance compatibility options for pseries-2.6+ currently.
We will make use of this in future patches though, so fix it here.
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
[dwg: Extended to make 2.7 inherit from 2.8 as well]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Replace repeated pattern
for (i = 0; i < nb_numa_nodes; i++) {
if (test_bit(idx, numa_info[i].node_cpu)) {
...
break;
with a helper function to lookup numa node index for cpu.
Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Shannon Zhao <shannon.zhao@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
A couple of distributors are compiling their distributions
with "-mcpu=power8" for ppc64le these days, so the user sooner
or later runs into a crash there when not explicitely specifying
the "-cpu POWER8" option to QEMU (which is currently using POWER7
for the "pseries" machine by default). Due to this reason, the
linux-user target already switched to POWER8 a while ago (see commit
de3f1b9841). Since the softmmu target
of course has the same problem, we should switch there to POWER8 for
the newer machine types, too.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
If the user passes an alias name and a property to -cpu, QEMU fails to
find the CPU definition and exits.
$ qemu-system-ppc64 -cpu POWER8E,compat=power7
qemu-system-ppc64: Unable to find sPAPR CPU Core definition
This happens because spapr_get_cpu_core_type() passes the full string from
the command line (i.e. "POWER8E,compat=power7") to ppc_cpu_lookup_alias(),
instead of the alias name piece only (i.e. "POWER8E").
The fix is to pass model_pieces[0] to ppc_cpu_lookup_alias().
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
KVM-PR currently does not support transactional memory, and the
implementation in TCG is just a fake. We should not announce TM
support in the ibm,pa-features property when running on such a
system, so disable it by default and only enable it if the KVM
implementation supports it (i.e. recent versions of KVM-HV).
These changes are based on some earlier work from Anton Blanchard
(thanks!).
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The current code uses pa_features_206 for POWERPC_MMU_2_06, and
for everything else, it uses pa_features_207. This is bad in some
cases because there is also a "degraded" MMU version of ISA 2.06,
called POWERPC_MMU_2_06a, which should of course use the flags for
2.06 instead. And there is also the possibility that the user runs
the pseries machine with a POWER5+ or even 970 processor. In that
case we certainly do not want to set the flags for 2.07, and rather
simply skip the setting of the pa-features property instead.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The function spapr_populate_cpu_dt() has become quite big
already, and since we likely have to extend the pa-features
property for every new processor generation, it is nicer
if we put the related code into a separate function.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Now that 2.7 is released, create the pseries-2.8 machine type and add the
boilerplate compatiblity macro stuff. There's nothing new to put into the
2.7 compatiliby properties yet, but we'll need something eventually, so
we might as well get it ready now.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Functions of type FindSysbusDeviceFunc currently return an integer.
However, this return value is always ignored by the caller in
find_sysbus_device().
This changes the function type to return void, to avoid confusion over
the function semantics.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
CPUState is a fairly common pointer to pass to these helpers. This means
if you need other arguments for the async_run_on_cpu case you end up
having to do a g_malloc to stuff additional data into the routine. For
the current users this isn't a massive deal but for MTTCG this gets
cumbersome when the only other parameter is often an address.
This adds the typedef run_on_cpu_func for helper functions which has an
explicit CPUState * passed as the first parameter. All the users of
run_on_cpu and async_run_on_cpu have had their helpers updated to use
CPUState where available.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
[Sergey Fedorov:
- eliminate more CPUState in user data;
- remove unnecessary user data passing;
- fix target-s390x/kvm.c and target-s390x/misc_helper.c]
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au> (ppc parts)
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> (s390 parts)
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <1470158864-17651-3-git-send-email-alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The new interface can be used to replace the old notify_started() and
notify_stopped(). Meanwhile it provides explicit flags so that IOMMUs
can know what kind of notifications it is requested for.
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <1474606948-14391-3-git-send-email-peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This pull request supersedes ppc-for-2.8-20160922. There was a clang
build error in that, and I've also added one extra patch in the new pull.
Included in this set of ppc and spapr patches are:
* TCG implementations for more POWER9 instructions
* Some preliminary XICS fixes in preparataion for the pnv machine type
* A significant ADB (Macintosh kbd/mouse) cleanup
* Some conversions to use trace instead of debug macros
* Fixes to correctly handle global TLB flush synchronization in
TCG. This is already a bug, but it will have much more impact
when we get MTTCG
* Add more qtest testcases for Power
* Some MAINTAINERS updates
* Assorted bugfixes
* Add the basics of NUMA associativity to the spapr PCI host bridge
This touches some test files and monitor.c which are technically
outside the ppc code, but coming through this tree because the changes
are primarily of interest to ppc.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJX5NZnAAoJEGw4ysog2bOSoLEP/1YpRFG/6gmiT+T+Btz1QYcd
eqrJkV63/rY/lvgZOvUBdqA/YKaBSWDOEByNFRZ+Grqz9h5zKrRcmM7IWdRWg+vG
gyrZUm1pscFG20iGNcenxB8mD0VMk7C77gnUlv12bo+mK+1D1i8eUfKLFqxb0kOx
JGIRQNG5orF5vZxsyjRPVpvMS9gNG90vrPIypux4ryozCVMWbrjXRZNsPQKz8wb9
UGcJIFB6R6JVbmBGchi434PEJkcdZzP/a0HvVSO51oGsFBnwYwQ7XVc3PyA4KCD7
tTbm6T2Rpdak3Pcd/nuzoXCMBCkh48XGKxZ+yPuLXGG5ZGIZ6rzlHPqBsEqqiLz5
DLzbsxKyLHX2Af87js4J9OXkoNQI4rVGurvNbkQ7IMQ2/Xt97kgUEgr3W0Vj+r82
bqIqWm4OdJ9cDzTGVlQ7l2vLv6RMe7DrkeWRNEKZZgfir7Hgj1gr79BOe96ETKBd
7r/1z0fBkZoWSq2OdjX8RouXMwd1Nq3FnqYv2BQ99rvM/AqpkY0HYsPIfUilHq6T
ZXhvm/4LIEev0F/GiJvV5jHHg637QS4QqdyglF8ODC8vSMvOThhL9Gj7EMgJs7hj
Ywt1B5y88//Zq4+IGVda98J5ynOZO1CArvzoYR5UMnWiq2K0Lxpq7wemE/finyIK
0jWLqlmCmYRzsS+oQEg/
=et1C
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.8-20160923' into staging
ppc patch queue 2016-09-23
This pull request supersedes ppc-for-2.8-20160922. There was a clang
build error in that, and I've also added one extra patch in the new pull.
Included in this set of ppc and spapr patches are:
* TCG implementations for more POWER9 instructions
* Some preliminary XICS fixes in preparataion for the pnv machine type
* A significant ADB (Macintosh kbd/mouse) cleanup
* Some conversions to use trace instead of debug macros
* Fixes to correctly handle global TLB flush synchronization in
TCG. This is already a bug, but it will have much more impact
when we get MTTCG
* Add more qtest testcases for Power
* Some MAINTAINERS updates
* Assorted bugfixes
* Add the basics of NUMA associativity to the spapr PCI host bridge
This touches some test files and monitor.c which are technically
outside the ppc code, but coming through this tree because the changes
are primarily of interest to ppc.
# gpg: Signature made Fri 23 Sep 2016 08:14:47 BST
# gpg: using RSA key 0x6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>"
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392
* remotes/dgibson/tags/ppc-for-2.8-20160923: (45 commits)
spapr_pci: Add numa node id
monitor: fix crash for platforms without a CPU 0
linux-user: ppc64: fix ARCH_206 bit in AT_HWCAP
ppc/kvm: Mark 64kB page size support as disabled if not available
ppc/xics: An ICS with offset 0 is assumed to be uninitialized
ppc/xics: account correct irq status
Enable H_CLEAR_MOD and H_CLEAR_REF hypercalls on KVM/PPC64.
target-ppc: tlbie/tlbivax should have global effect
target-ppc: add flag in check_tlb_flush()
target-ppc: add TLB_NEED_LOCAL_FLUSH flag
spapr: Introduce sPAPRCPUCoreClass
target-ppc: implement darn instruction
target-ppc: add stxsi[bh]x instruction
target-ppc: add lxsi[bw]zx instruction
target-ppc: add xxspltib instruction
target-ppc: consolidate store conditional
target-ppc: move out stqcx impementation
target-ppc: consolidate load with reservation
target-ppc: convert st[16,32,64]r to use new macro
target-ppc: convert st64 to use new macro
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Update all qemu_uuid users as well, especially get rid of the duplicated
low level g_strdup_printf, sscanf and snprintf calls with QEMU UUID API.
Since qemu_uuid_parse is quite tangled with qemu_uuid, its switching to
QemuUUID is done here too to keep everything in sync and avoid code
churn.
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-Id: <1474432046-325-10-git-send-email-famz@redhat.com>
This adds a numa id property to a PHB to allow linking passed PCI device
to CPU/memory. It is up to the management stack to do CPU/memory pinning
to the node with the actual PCI device.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
[dwg: Renamed property from "node" to "numa_node" to match the similar
one in the pxb device]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
These are mandatory per PAPR and available on Linux 4.3 and newer kernels. The calls in question are required to run FreeBSD guests with reasonable performance, so enable them if possible.
Signed-off-by: Nathan Whitehorn <nwhitehorn@freebsd.org>
[dwg: Added a stub to fix compile without KVM (e.g. on x86 host)]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
tlbie (BookS) and tlbivax (BookE) plus the H_CALLs(pseries) should have
a global effect.
Introduces TLB_NEED_GLOBAL_FLUSH flag. During lazy tlb flush, after
taking care of pending local flushes, check broadcast flush(at context
synchronizing event ptesync/tlbsync, etc) is needed. Depending on the
bitmask state of the tlb_need_flush, tlb is flushed from other cpus if
needed and the flags are cleared.
Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
[dwg: Use 'true' instead of '1' for call to check_tlb_flush()]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
We flush the qemu TLB lazily. check_tlb_flush is called whenever we hit
a context synchronizing event or instruction that requires a pending
flush to be performed.
However, we fail to handle broadcast TLB flush operations. In order to
fix that efficiently, we want to differentiate whether check_tlb_flush()
needs to only apply pending local flushes (isync instructions,
interrupts, ...) or also global pending flush operations. The latter is
only needed when executing instructions that are defined architecturally
as synchronizing global TLB flush operations. This in our case is
ptesync on BookS and tlbsync on BookE along with the paravirtualized
hypervisor calls.
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
[dwg: Changed gen_check_tlb_flush() to also take a bool, and fixed
some spelling errors in commit message]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Each spapr cpu core type defines an instance_init routine which just
populates the CPU class name. This can be done in the class_init
commonly for all core types which simplifies the registration.
This is inspired by how PowerNV core types are registered.
Certain types of spapr cpu cores ('host' and generic type based on host
CPU) are initialized in target-ppc/kvm.c. To convert these type
registrations to use class_init, we need to expose
spapr_cpu_core_class_init() outside of spapr_cpu_core.c.
Commit d11b268e17 added a generic sPAPR CPU core family
type to support cases like POWER8 CPU type on POWER8E host CPU.
Switching to class_init would fix such scenarios to use the right
CPU thread type instead of defaulting to host-powerpc64-cpu.
In an unrelated cleanup, fix a typo in .get_hotplug_handler routine.
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Add a first test to validate the protocol:
- rtas/get-time-of-day compares the time
from the guest with the time from the host.
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Unused function declarations were found using a simple gcc plugin and
manually verified by grepping the sources.
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
The exact same routine will be used in PowerNV.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
spapr_pci would also be a good candidate but the macro _FDT is
slightly different. It returns and does not exit.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Just a single patch here, I hope this is the last ppc / spapr fix to
squeeze into qemu-2.7.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJXsWVMAAoJEGw4ysog2bOSxpkQAKCybBBMbQ6viEeqZBNtrleC
whKm6WhN5AZvxb1W/NzacrpwXPHCM8C9+jZRIpea3ucHn5ijyRPCE73gBZLcyV6h
CRFisJQ2NT9gq4iCw0Iw1TwxL+tt6xw2dPr3+mKQpJuUHbcKK8hO5EhZLe/dr+u7
54j2l+EgqhokTjLJuD7GEa/qca1qSsae/Q0HvIThcA4h4jX5RtpMHNSpbh6PJ8fI
dxlcHnjtfei75ptMMqrP+YZ+HPEuiqOqLSVKmcEsjJblKABk7SW7RjbW4Jk8dKYo
Z8VA+MOP+eLrbjYOPJHROHK80Ik6hg3NH/4/tduZM0hsOeFV2i9AyMR1n/Qhkpyu
xEi8Ld+wcVun8NFWV2dj/m/RAE/BgZ1non3wddxVIog8W2R/+PMIfMdVOWt3pRMj
KS/1kkCzKYHWFO18FTpxGfFLsdiNo1szjtJydjfAGd5RvectDm6bBguz0ZwgDPSo
338I7uIFB7h4L/DwMFcPSYTRTSyrvE5MsxcwpQoS4OB5ZKrKGLrqLG9cy0XvO9sO
ImHRMT/YMnD9qiXXnuzmHCg8XgRPyfbxdml6EkxcIDJn9wsINDRdvN9GZ33vDUgT
CBy7xqxRlYJ+MXFJP5S6dyzM6mqtwy8MFDqlcDvIzNDl5GEAyVJHjQdtUu/t3cRx
OzQ0bArG7WeIK2norvwL
=Jm4E
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.7-20160815' into staging
ppc patch queue for 2016-08-15
Just a single patch here, I hope this is the last ppc / spapr fix to
squeeze into qemu-2.7.
# gpg: Signature made Mon 15 Aug 2016 07:46:36 BST
# gpg: using RSA key 0x6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392
* remotes/dgibson/tags/ppc-for-2.7-20160815:
ppc: parse cpu features once
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Considering that features are converted to global properties and
global properties are automatically applied to every new instance
of created CPU (at object_new() time), there is no point in
parsing cpu_model string every time a CPU created. So move
parsing outside CPU creation loop and do it only once.
Parsing also should be done before any CPU is created so that
features would affect the first CPU a well.
This patch does that for all PowerPC machine types.
It is based on previous work from Bharata:
https://lists.nongnu.org/archive/html/qemu-devel/2016-06/msg07564.html
Signed-off-by: Greg Kurz <groug@kaod.org>
[clg: only kept the fix for the spapr platform. support for other
platform will be added in 2.8 ]
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Tested-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Hard-coding the CPU alias names in the spapr_cores[] array has
two big disadvantages:
1) We register a real type with the CPU alias name in
spapr_cpu_core_register_types() - this prevents us from registering
a CPU family name in kvm_ppc_register_host_cpu_type() with the same
name (as we do it for the non-hotpluggable CPU types).
2) It's quite cumbersome to maintain the aliases here in sync with the
ppc_cpu_aliases list from target-ppc/cpu-models.c.
So let's simply add proper alias lookup to the spapr cpu core code,
too (by checking whether the given model can be used directly, and
if not by trying to look up the given model as an alias name instead).
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The sPAPR CPU core typename is already available in the upper
block. Let's use it and move the check upward also.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Commit 9af9e0f, 6daf194d, be62a2eb and 312fd5f got rid of a bunch, but
they keep coming back. checkpatch.pl tries to flag them since commit
5d596c2, but it's not very good at it. Offenders tracked down with
Coccinelle script scripts/coccinelle/err-bad-newline.cocci, an updated
version of the script from commit 312fd5f.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1470224274-31522-2-git-send-email-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
When a TCE table (sPAPR IOMMU context) is in disabled state (which is true
by default for the 64-bit window), it has tcet->nb_table == 0 and
tcet->table == NULL. However, on system reset, spapr_tce_reset() executes,
which unconditionally calls
memset(tcet->table, 0, table_size);
We get away with this in practice, because it's a zero length memset(),
but memset() on a NULL pointer is undefined behaviour, so we should not
call it in this case.
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Prior to c8721d3 "spapr: Error out when CPU hotplug is attempted on older
pseries machines", attempting to use query-hotpluggable-cpus on pseries-2.6
and earlier machine types would SEGV.
That change fixed that, but due to some unexpected interactions in init
order and a brown-paper-bag worthy failure to test, it accidentally
disabled query-hotpluggable-cpus for all pseries machine types, including
the current one which should allow it.
In fact, query_hotpluggable_cpus needs to be non-NULL when and only when
the dr_cpu_enabled flag in sPAPRMachineClass is set, which makes
dr_cpu_enabled itself redundant.
This patch removes dr_cpu_enabled, instead directly setting
query_hotpluggable_cpus from the machine class_init functions, and using
that to determine the availability of CPU hotplug when necessary.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
CPU hotplug and coldplug aren't supported prior to pseries-2.7. Further,
earlier machine types don't use CPU core objects at all. These mean that
query-hotpluggable-cpus and coldplug on older pseries machines will crash
QEMU. It also means that hotpluggable_cpus flag in query-machines will
be incorrectly set to true for pseries < 2.7, since it is based on the
presence of the query_hotpluggable_cpus hook.
- Don't assign the query_hotpluggable_cpus hook for pseries < 2.7
- query_hotpluggable_cpus should therefore never be called on pseries <
2.7, so add an assert
- spapr_core_pre_plug() should fail hot/cold plug attempts for pseries <
2.7, since core objects are never used there
- spapr_core_plug() should therefore never be called for pseries < 2.7, so
add an assert.
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
[dwg: Change from query_hotpluggable_cpus returning NULL for pseries < 2.7
to not being called at all, reword commit message for accuracy]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>