The ICS object uses a post_load() handler which is implicitly relying
on the fact that the internal state of the ICS and ICP objects has
been restored but this is not guaranteed. So, let's move the code
under the post_load() handler of the machine where we know the objects
have been fully restored.
The icp_resend() handler of the XICSFabric QOM interface is also
removed as it is now obsolete.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The XICSState classes are not used anymore. They have now been fully
deprecated by the XICSFabric QOM interface. Do the cleanups.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
There is nothing left related to the XICS object in the realize
functions of the KVMXICSState and XICSState class. So adapt the
interfaces to call these routines directly from the sPAPR machine init
sequence.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This is the last step to remove the XICSState abstraction and have the
machine hold all the objects related to interrupts : ICSs and ICPs.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The reset of the ICP objects is currently handled by XICS but this can
be done for each individual ICP.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
spapr_dt_xics() only needs the number of servers to build the device
tree nodes. Let's change the routine interface to reflect that.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Also introduce a xics_icp_get() helper to simplify the changes.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The cpu_setup() handler is currently under the XICSState class but it
really belongs under ICPState as it is setting up an individual vCPU.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The cpu_setup() handler currently takes a 'XICSState *' argument to
grab the kernel ICP file descriptor. This interface can be simplified
by using the 'xics' backlink of the ICP object.
This change is also required by subsequent patches which makes use of
the QOM interface for XICS.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The kernel ICP file descriptor is the only reason behind the
KVMXICSState class and it's in the way of more cleanups. Let's make it
a static for the moment and move forward.
If this is problem, we could use an attribute under the sPAPR machine
later on.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Let's add two new handlers for ICPs. One is to get an ICP object from
a server number and a second is to resend the irqs when needed.
The icp_resend() handler is a temporary workaround needed by the
ics-simple post_load() handler. It will be removed when the post_load
portion can be done at the machine level.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This is not used anymore.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The reset of the ICS objects is currently handled by XICS but this can
be done for each individual ICS. This also reduces the use of the XICS
list of ICS.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
It is not used anymore now that we have the QOM interface for XICS.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Also change the ICPState 'xics' backlink to be a XICSFabric, this
removes the need of using qdev_get_machine() to get the QOM interface
in some of the routines.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Add 'ics_get' and 'ics_resend' handlers to the sPAPR machine. These
are relatively simple for a single ICS.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This interface provides two simple handlers. One is to get an ICS
(Interrupt Source Controller) object from an irq number and a second
to resend the irqs when needed.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This is, again, to reduce the use of the list of ICS objects. Let's
make each individual ICS and ICP object an InterruptStatsProvider and
remove this same interface from XICSState.
The InterruptStatsProvider will be moved at the machine level after
the XICS cleanups are completed.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
A list of ICS objects was introduced under the XICS object for the
PowerNV machine but, for the sPAPR machine, it brings extra complexity
as there is only a single ICS. To simplify the code, let's add the ICS
pointer under the sPAPR machine and try to reduce the use of this list
where possible.
Also, change the xics_spapr_*() routines to use an ICS object instead
of an XICSState and change their name to reflect that these are
specific to the sPAPR ICS object.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Today, the ICP (Interrupt Controller Presenter) objects are created by
the 'nr_servers' property handler of the XICS object and a class
handler. They are realized in the XICS object realize routine.
Let's simplify the process by creating the ICP objects along with the
XICS object at the machine level.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Today, the ICS (Interrupt Controller Source) object is created and
realized by the init and realize routines of the XICS object, but some
of the parameters are only known at the machine level.
These parameters are passed from the sPAPR machine to the ICS object
in a rather convoluted way using property handlers and a class handler
of the XICS object. The number of irqs required to allocate the IRQ
state objects in the ICS realize routine is one of them.
Let's simplify the process by creating the ICS object along with the
XICS object at the machine level and link the ICS into the XICS list
of ICSs at this level also. In the sPAPR machine, there is only a
single ICS but that will change with the PowerNV machine.
Also, QOMify the creation of the objects and get rid of the
superfluous code.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Currently xics - the component of the IBM POWER interrupt controller
representing the overall interrupt fabric / architecture is
represented as a descendent of SysBusDevice. However, this is not
really correct - the xics presents nothing in MMIO space so it should
be an "unattached" device in the current QOM model.
Since this device will always be created by the machine type, not created
specifically from the command line, and because it has no migrated state
it should be safe to move it around the device composition tree.
Therefore this patch changes it to a descendent of TYPE_DEVICE, and
makes it an unattached device. So that its reset handler still gets
called correctly, we add a qdev_set_parent_bus() to attach it to
sysbus. It's not really clear that's correct (instead of using
register_reset()) but it appears to a common technique.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
[clg corrected problems with reset]
Signed-off-by: Cédric Le Goater <clg@kaod.org>
[dwg folded together and updated commit message]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Since commit 1d2d974244 "spapr_pci: enumerate and add PCI device tree", QEMU
populates the PCI device tree in the opposite order compared to SLOF.
Before 1d2d974244c6:
Populating /pci@800000020000000
00 0000 (D) : 1af4 1000 virtio [ net ]
00 0800 (D) : 1af4 1001 virtio [ block ]
00 1000 (D) : 1af4 1009 virtio [ network ]
Populating /pci@800000020000000/unknown-legacy-device@2
7e5294b8 : /pci@800000020000000
7e52b998 : |-- ethernet@0
7e52c0c8 : |-- scsi@1
7e52c7e8 : +-- unknown-legacy-device@2 ok
Since 1d2d974244c6:
Populating /pci@800000020000000
00 1000 (D) : 1af4 1009 virtio [ network ]
Populating /pci@800000020000000/unknown-legacy-device@2
00 0800 (D) : 1af4 1001 virtio [ block ]
00 0000 (D) : 1af4 1000 virtio [ net ]
7e5e8118 : /pci@800000020000000
7e5ea6a0 : |-- unknown-legacy-device@2
7e5eadb8 : |-- scsi@1
7e5eb4d8 : +-- ethernet@0 ok
This behaviour change is not actually a bug since no assumptions should be
made on DT ordering. But it has no real justification either, other than
being the consequence of the way fdt_add_subnode() inserts new elements
to the front of the FDT rather than adding them to the tail.
This patch reverts to the historical SLOF ordering by walking PCI devices
in reverse order. This reconciles pseries with x86 machine types behavior.
It is expected to make things easier when porting existing applications to
power.
Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Tested-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
(slight update to the changelog)
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The pseries machine type implements the behaviour of a PAPR compliant
hypervisor, without actually executing such a hypervisor on the virtual
CPU. To do this we need some hooks in the CPU code to make hypervisor
facilities get redirected to the machine instead of emulated internally.
For hypercalls this is managed through the cpu->vhyp field, which points
to a QOM interface with a method implementing the hypercall.
For the hashed page table (HPT) - also a hypervisor resource - we use an
older hack. CPUPPCState has an 'external_htab' field which when non-NULL
indicates that the HPT is stored in qemu memory, rather than within the
guest's address space.
For consistency - and to make some future extensions easier - this merges
the external HPT mechanism into the vhyp mechanism. Methods are added
to vhyp for the basic operations the core hash MMU code needs: map_hptes()
and unmap_hptes() for reading the HPT, store_hpte() for updating it and
hpt_mask() to retrieve its size.
To match this, the pseries machine now sets these vhyp fields in its
existing vhyp class, rather than reaching into the cpu object to set the
external_htab field.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
CPUPPCState includes fields htab_base and htab_mask which store the base
address (GPA) and size (as a mask) of the guest's hashed page table (HPT).
These are set when the SDR1 register is updated.
Keeping these in sync with the SDR1 is actually a little bit fiddly, and
probably not useful for performance, since keeping them expands the size of
CPUPPCState. It also makes some upcoming changes harder to implement.
This patch removes these fields, in favour of calculating them directly
from the SDR1 contents when necessary.
This does make a change to the behaviour of attempting to write a bad value
(invalid HPT size) to the SDR1 with an mtspr instruction. Previously, the
bad value would be stored in SDR1 and could be retrieved with a later
mfspr, but the HPT size as used by the softmmu would be, clamped to the
allowed values. Now, writing a bad value is treated as a no-op. An error
message is printed in both new and old versions.
I'm not sure which behaviour, if either, matches real hardware. I don't
think it matters that much, since it's pretty clear that if an OS writes
a bad value to SDR1, it's not going to boot.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Accesses to the hashed page table (HPT) are complicated by the fact that
the HPT could be in one of three places:
1) Within guest memory - when we're emulating a full guest CPU at the
hardware level (e.g. powernv, mac99, g3beige)
2) Within qemu, but outside guest memory - when we're emulating user and
supervisor instructions within TCG, but instead of emulating
the CPU's hypervisor mode, we just emulate a hypervisor's behaviour
(pseries in TCG or KVM-PR)
3) Within the host kernel - a pseries machine using KVM-HV
acceleration. Mostly accesses to the HPT are handled by KVM,
but there are a few cases where qemu needs to access it via a
special fd for the purpose.
In order to batch accesses to the fd in case (3), we use a somewhat awkward
ppc_hash64_start_access() / ppc_hash64_stop_access() pair, which for case
(3) reads / releases several HPTEs from the kernel as a batch (usually a
whole PTEG). For cases (1) & (2) it just returns an address value. The
actual HPTE load helpers then need to interpret the returned token
differently in the 3 cases.
This patch keeps the same basic structure, but simplfiies the details.
First start_access() / stop_access() are renamed to map_hptes() and
unmap_hptes() to make their operation more obvious. Second, map_hptes()
now always returns a qemu pointer, which can always be used in the same way
by the load_hpte() helpers. In case (1) it comes from address_space_map()
in case (2) directly from qemu's HPT buffer and in case (3) from a
temporary buffer read from the KVM fd.
While we're at it, make things a bit more consistent in terms of types and
variable names: avoid variables named 'index' (it shadows index(3) which
can lead to confusing results), use 'hwaddr ptex' for HPTE indices and
uint64_t for each of the HPTE words, use ptex throughout the call stack
instead of pte_offset in some places (we still need that at the bottom
layer, but nowhere else).
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
cpu_ppc_set_papr() sets up various aspects of CPU state for use with PAPR
paravirtualized guests. However, it doesn't set the virtual hypervisor,
so callers must also call cpu_ppc_set_vhyp() so that PAPR hypercalls are
handled properly. This is a bit silly, so fold setting the virtual
hypervisor into cpu_ppc_set_papr().
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
* Standardize on 'ptex' instead of 'pte_index' for HPTE index variables
for consistency and brevity
* Avoid variables named 'index'; shadowing index(3) from libc can lead to
surprising bugs if the variable is removed, because compiler errors
might not appear for remaining references
* Clarify index calculations in h_enter() - we have two cases, H_EXACT
where the exact HPTE slot is given, and !H_EXACT where we search for
an empty slot within the hash bucket. Make the calculation more
consistent between the cases.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Some systems can already provide more than 255 hardware threads.
Bumping the QEMU limit to 1024 seems reasonable:
- it has no visible overhead in top;
- the limit itself has no effect on hot paths.
Cc: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
When DT node names for PCI devices are generated by SLOF,
they are generated according to the type of the device
(for instance, ethernet for virtio-net-pci device).
Node name for hotplugged devices is generated by QEMU.
This patch adds the mechanic to QEMU to create the node
name according to the device type too.
The data structure has been roughly copied from OpenBIOS/OpenHackware,
node names from SLOF.
Example:
Hotplugging some PCI cards with QEMU monitor:
device_add virtio-tablet-pci
device_add virtio-serial-pci
device_add virtio-mouse-pci
device_add virtio-scsi-pci
device_add virtio-gpu-pci
device_add ne2k_pci
device_add nec-usb-xhci
device_add intel-hda
What we can see in linux device tree:
for dir in /proc/device-tree/pci@800000020000000/*@*/; do
echo $dir
cat $dir/name
echo
done
WITHOUT this patch:
/proc/device-tree/pci@800000020000000/pci@0/
pci
/proc/device-tree/pci@800000020000000/pci@1/
pci
/proc/device-tree/pci@800000020000000/pci@2/
pci
/proc/device-tree/pci@800000020000000/pci@3/
pci
/proc/device-tree/pci@800000020000000/pci@4/
pci
/proc/device-tree/pci@800000020000000/pci@5/
pci
/proc/device-tree/pci@800000020000000/pci@6/
pci
/proc/device-tree/pci@800000020000000/pci@7/
pci
WITH this patch:
/proc/device-tree/pci@800000020000000/communication-controller@1/
communication-controller
/proc/device-tree/pci@800000020000000/display@4/
display
/proc/device-tree/pci@800000020000000/ethernet@5/
ethernet
/proc/device-tree/pci@800000020000000/input-controller@0/
input-controller
/proc/device-tree/pci@800000020000000/mouse@2/
mouse
/proc/device-tree/pci@800000020000000/multimedia-device@7/
multimedia-device
/proc/device-tree/pci@800000020000000/scsi@3/
scsi
/proc/device-tree/pci@800000020000000/usb-xhci@6/
usb-xhci
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
for building a s390-netboot.img) can be found at
http://wiki.qemu-project.org/Features/S390xNetworkBoot
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJYtV6WAAoJEN7Pa5PG8C+vip4P/0qdPwFJ7BJlbWH9os58btwH
3fkIaGvbVbHVPyi5E9XyyiYRHfxUMdDJkQxzf2kD7HR8Sqx5Pyy1p/Fdz/XPCw6R
vWcLAn8FkcI9pPdjI7dY7MxM6BqeeZAUwL9HmTLU1L80Rf1xxHxjbzE7hy3YeEGg
bFPx4AEpa/MVyOV4sSpfUjsYVQiB9dPtgjjDnxVSGcU3oE2lyceQXj4UxcagxfbE
gQis2hPqQC13pSC/jB6KPI8IW1+Y88Azf493fiGEj2MCm+Mge/9ksK+TtYuWrbV8
e5DY73vNymJd4WXEFp1f2+2T14zpObpGOQv6O0yJMNj/UOJY5KON8qIEIe1XP7lY
D22MiV1q+CoDnRQeN4BoNRPwGDoRPCh31BdzMOZXc6c/kyTcW5lxOBWl6SKaxlDM
K3MoboiH9qF23VX3i2C6id9EiebDMubYmO3d8WlIHB+7digEco0VjC+vEZa18AW2
3/fMke8mWvymeHKIbmZwm2X15QQg0s7VMjoU+iZ4l3vqjBb21BkYv/ZofqfcNe2q
y9xp84m4YuFu8rRjPoupgexuY7N5JaxX1opExFLNyR6RUWGRWpBkaQzBbvUYmVF0
9Ml3lXXPpo5AfXBs256KXrAZTG3oZ+eUeIWYTgWPY8ibqUCHHHMa23ZnB8UUe/2a
rxx9PEZVcG4i73cOTclH
=nH9Y
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/cohuck/tags/s390x-20170228' into staging
Network boot for s390x. More information (and instructions
for building a s390-netboot.img) can be found at
http://wiki.qemu-project.org/Features/S390xNetworkBoot
# gpg: Signature made Tue 28 Feb 2017 11:27:18 GMT
# gpg: using RSA key 0xDECF6B93C6F02FAF
# gpg: Good signature from "Cornelia Huck <huckc@linux.vnet.ibm.com>"
# gpg: aka "Cornelia Huck <cornelia.huck@de.ibm.com>"
# Primary key fingerprint: C3D0 D66D C362 4FF6 A8C0 18CE DECF 6B93 C6F0 2FAF
* remotes/cohuck/tags/s390x-20170228:
pc-bios/s390-ccw.img: rebuild image
pc-bios/s390-ccw: Use the ccw bios to start the network boot
s390x/ipl: Load network boot image
s390x/ipl: Extend S390IPLState to support network boot
elf-loader: Allow late loading of elf
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
By default, don't allow another writer for block devices that are
attached to a guest device. For the cases where this setup is intended
(e.g. using a cluster filesystem on the disk), the new option can be
used to allow it.
This change affects only devices using DEFINE_BLOCK_PROPERTIES().
Devices directly using DEFINE_PROP_DRIVE() still accept writers
unconditionally.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Acked-by: Fam Zheng <famz@redhat.com>
This makes all device emulations with a qdev drive property request
permissions on their BlockBackend. The only thing we block at this point
is resizing images for some devices that can't support it.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Some devices allow a media change between read-only and read-write
media. They need to adapt the permissions in their .change_media_cb()
implementation, which can fail. So add an Error parameter to the
function.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Acked-by: Fam Zheng <famz@redhat.com>
Now that blk_insert_bs() requests the BlockBackend permissions for the
node it attaches to, it can fail. Instead of aborting, pass the errors
to the callers.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Acked-by: Fam Zheng <famz@redhat.com>
We want every user to be specific about the permissions it needs, so
we'll pass the initial permissions as parameters to blk_new(). A user
only needs to call blk_set_perm() if it wants to change the permissions
after the fact.
The permissions are stored in the BlockBackend and applied whenever a
BlockDriverState should be attached in blk_insert_bs().
This does not include actually choosing the right set of permissions
everywhere yet. Instead, the usual FIXME comment is added to each place
and will be addressed in individual patches.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Acked-by: Fam Zheng <famz@redhat.com>
- a fix to a minor bug reported by Coverity
- throttling support in the local backend (command line only)
-----BEGIN PGP SIGNATURE-----
iEYEABECAAYFAli1Q64ACgkQAvw66wEB28I5yQCePbLPSOtHO4LJGc2E973L7vH2
hQIAnReLFevyNN6BpivucP2/0YmAIKSi
=uTYd
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/gkurz/tags/for-upstream' into staging
This pull request brings:
- a fix to a minor bug reported by Coverity
- throttling support in the local backend (command line only)
# gpg: Signature made Tue 28 Feb 2017 09:32:30 GMT
# gpg: using DSA key 0x02FC3AEB0101DBC2
# gpg: Good signature from "Greg Kurz <groug@kaod.org>"
# gpg: aka "Greg Kurz <groug@free.fr>"
# gpg: aka "Greg Kurz <gkurz@linux.vnet.ibm.com>"
# gpg: aka "Gregory Kurz (Groug) <groug@free.fr>"
# gpg: aka "[jpeg image of size 3330]"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 2BD4 3B44 535E C0A7 9894 DBA2 02FC 3AEB 0101 DBC2
* remotes/gkurz/tags/for-upstream:
throttle: factor out duplicate code
fsdev: add IO throttle support to fsdev devices
9pfs: fix v9fs_lock error case
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
This adds the bcm2835_sdhost and bcm2835_gpio to the BCM2835 platform.
For supporting the SD controller selection (alternate function of GPIOs
48-53), the bcm2835_gpio now exposes an sdbus.
It also has a link to both the sdbus of sdhci and sdhost controllers,
and the card is reparented from one bus to another when the alternate
function of GPIOs 48-53 is modified.
Signed-off-by: Clement Deschamps <clement.deschamps@antfield.fr>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1488293711-14195-5-git-send-email-peter.maydell@linaro.org
Message-id: 20170224164021.9066-5-clement.deschamps@antfield.fr
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
This adds the BCM2835 GPIO controller.
It currently implements:
- The 54 GPIOs as outputs (qemu_irq)
- The SD controller selection via alternate function of GPIOs 48-53
Signed-off-by: Clement Deschamps <clement.deschamps@antfield.fr>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1488293711-14195-4-git-send-email-peter.maydell@linaro.org
Message-id: 20170224164021.9066-4-clement.deschamps@antfield.fr
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Provide a new function sdbus_reparent_card() in sd core for reparenting
a card from a SDBus to another one.
This function is required by the raspi platform, where the two SD
controllers can be dynamically switched.
Signed-off-by: Clement Deschamps <clement.deschamps@antfield.fr>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1488293711-14195-3-git-send-email-peter.maydell@linaro.org
Message-id: 20170224164021.9066-3-clement.deschamps@antfield.fr
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
[PMM: added a doc comment to the header file; changed to
use new behaviour of qdev_set_parent_bus()]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Instead of qdev_set_parent_bus() silently doing the wrong
thing if it's handed a device that's already on a bus,
have it remove the device from the old bus and add it to
the new one. This is useful for the raspi2 sdcard.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Message-id: 1488293711-14195-2-git-send-email-peter.maydell@linaro.org
Reset CPU interface registers of GICv3 when CPU is reset.
For this, ARMCPRegInfo struct is registered with one ICC
register whose resetfn is called when cpu is reset.
All the ICC registers are reset under one single register
reset function instead of calling resetfn for each ICC
register.
Signed-off-by: Vijaya Kumar K <Vijaya.Kumar@cavium.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-id: 1487850673-26455-6-git-send-email-vijay.kilari@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Add gicv3state void pointer to CPUARMState struct
to store GICv3CPUState.
In case of usecase like CPU reset, we need to reset
GICv3CPUState of the CPU. In such scenario, this pointer
becomes handy.
Signed-off-by: Vijaya Kumar K <Vijaya.Kumar@cavium.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-id: 1487850673-26455-5-git-send-email-vijay.kilari@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
This actually implements pre_save and post_load methods for in-kernel
vGICv3.
Signed-off-by: Pavel Fedin <p.fedin@samsung.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Vijaya Kumar K <Vijaya.Kumar@cavium.com>
Message-id: 1487850673-26455-4-git-send-email-vijay.kilari@gmail.com
[PMM:
* use decimal, not 0bnnn
* fixed typo in names of ICC_APR0R_EL1 and ICC_AP1R_EL1
* completely rearranged the get and put functions to read and write
the state in a natural order, rather than mixing distributor and
redistributor state together]
Signed-off-by: Vijaya Kumar K <Vijaya.Kumar@cavium.com>
[Vijay:
* Update macro KVM_VGIC_ATTR
* Use 32 bit access for gicd and gicr
* GICD_IROUTER, GICD_TYPER, GICR_PROPBASER and GICR_PENDBASER reg
access are changed from 64-bit to 32-bit access
* Add ICC_SRE_EL1 save and restore
* Dropped translate_fn mechanism and coded functions to handle
save and restore of edge_trigger and priority
* Number of APnR register saved/restored based on number of
priority bits supported]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
To Save and Restore ICC_SRE_EL1 register introduce vmstate
subsection and load only if non-zero.
Also initialize icc_sre_el1 with to 0x7 in pre_load
function.
Signed-off-by: Vijaya Kumar K <Vijaya.Kumar@cavium.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-id: 1487850673-26455-3-git-send-email-vijay.kilari@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The local variable 'nvic' in stm32f205_soc_realize() no longer
holds a direct pointer to the NVIC device; it is a pointer to
the ARMv7M container object. Rename it 'armv7m' accordingly.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Alistair Francis <alistair.francis@xilinx.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 1487604965-23220-12-git-send-email-peter.maydell@linaro.org
Switch the stm32f205 SoC to create the armv7m object directly
rather than via the armv7m_init() wrapper. This fits better
with the SoC model's very QOMified design.
In particular this means we can push loading the guest image
out to the top level board code where it belongs, rather
than the SoC object having a QOM property for the filename
to load.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Alistair Francis <alistair.francis@xilinx.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 1487604965-23220-11-git-send-email-peter.maydell@linaro.org
The SysTick timer isn't really part of the NVIC proper;
we just modelled it that way back when we couldn't
easily have devices that only occupied a small chunk
of a memory region. Split it out into its own device.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1487604965-23220-10-git-send-email-peter.maydell@linaro.org
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
The NVIC is a core v7M device that exists for all v7M CPUs;
put it under a CONFIG_ARM_V7M rather than hiding it under
CONFIG_STELLARIS.
(We'll use CONFIG_ARM_V7M for the SysTick device too
when we split it out of the NVIC.)
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 1487604965-23220-9-git-send-email-peter.maydell@linaro.org
Instead of the bitband device doing a cpu_physical_memory_read/write,
make it take a MemoryRegion which specifies where it should be
accessing, and use address_space_read/write to access the
corresponding AddressSpace.
Since this entails pretty much a rewrite, convert away from
old_mmio in the process.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 1487604965-23220-8-git-send-email-peter.maydell@linaro.org
Make the NVIC device expose a memory region for its users
to map, rather than mapping itself into the system memory
space on realize, and get the one user (the ARMv7M object)
to do this.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 1487604965-23220-7-git-send-email-peter.maydell@linaro.org
Make the ARMv7M object take a memory region link which it uses
to wire up the bitband rather than having them always put
themselves in the system address space.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 1487604965-23220-6-git-send-email-peter.maydell@linaro.org
Make the legacy armv7m_init() function use the newly QOMified
armv7m object rather than doing everything by hand.
We can return the armv7m object rather than the NVIC from
armv7m_init() because its interface to the rest of the
board (GPIOs, etc) is identical.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Alistair Francis <alistair.francis@xilinx.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 1487604965-23220-5-git-send-email-peter.maydell@linaro.org
Create a proper QOM object for the armv7m container, which
holds the CPU, the NVIC and the bitband regions.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 1487604965-23220-4-git-send-email-peter.maydell@linaro.org
Move the NVICState struct definition into a header, so we can
embed it into other QOM objects like SoCs.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 1487604965-23220-3-git-send-email-peter.maydell@linaro.org
Abstract the "load kernel" code out of armv7m_init() into its own
function. This includes the registration of the CPU reset function,
to parallel how we handle this for A profile cores.
We make the function public so that boards which choose to
directly instantiate an ARMv7M device object can call it.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Alistair Francis <alistair.francis@xilinx.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 1487604965-23220-2-git-send-email-peter.maydell@linaro.org
The Exynos4210 has cluster ID 0x9 in its MPIDR register (raw value
0x8000090x). If this cluster ID is not provided, then Linux kernel
cannot map DeviceTree nodes to MPIDR values resulting in kernel
warning and lack of any secondary CPUs:
DT missing boot CPU MPIDR[23:0], fall back to default cpu_logical_map
...
smp: Bringing up secondary CPUs ...
smp: Brought up 1 node, 1 CPU
SMP: Total of 1 processors activated (24.00 BogoMIPS).
Provide a cluster ID so Linux will see proper MPIDR and will try to
bring the secondary CPU online.
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Message-id: 20170226200142.31169-2-krzk@kernel.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Without any clock controller, the Linux kernel was hitting division by
zero during boot or with clk_summary:
[ 0.000000] [<c031054c>] (unwind_backtrace) from [<c030ba6c>] (show_stack+0x10/0x14)
[ 0.000000] [<c030ba6c>] (show_stack) from [<c05b2660>] (dump_stack+0x88/0x9c)
[ 0.000000] [<c05b2660>] (dump_stack) from [<c05b11a4>] (Ldiv0+0x8/0x10)
[ 0.000000] [<c05b11a4>] (Ldiv0) from [<c06ad1e0>] (samsung_pll45xx_recalc_rate+0x58/0x74)
[ 0.000000] [<c06ad1e0>] (samsung_pll45xx_recalc_rate) from [<c0692ec0>] (clk_register+0x39c/0x63c)
[ 0.000000] [<c0692ec0>] (clk_register) from [<c125d360>] (samsung_clk_register_pll+0x2e0/0x3d4)
[ 0.000000] [<c125d360>] (samsung_clk_register_pll) from [<c125d7e8>] (exynos4_clk_init+0x1b0/0x5e4)
[ 0.000000] [<c125d7e8>] (exynos4_clk_init) from [<c12335f4>] (of_clk_init+0x17c/0x210)
[ 0.000000] [<c12335f4>] (of_clk_init) from [<c1204700>] (time_init+0x24/0x2c)
[ 0.000000] [<c1204700>] (time_init) from [<c1200b2c>] (start_kernel+0x24c/0x38c)
[ 0.000000] [<c1200b2c>] (start_kernel) from [<4020807c>] (0x4020807c)
Provide stub for clock controller returning reset values for PLLs.
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Message-id: 20170226200142.31169-1-krzk@kernel.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
This adds the BCM2835 SDHost controller from Arasan.
Signed-off-by: Clement Deschamps <clement.deschamps@antfield.fr>
Message-id: 20170224164021.9066-2-clement.deschamps@antfield.fr
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Implement the NVIC SHCSR write behaviour which allows pending and
active status of some exceptions to be changed.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Implement the exception return consistency checks
described in the v7M pseudocode ExceptionReturn().
Inspired by a patch from Michael Davidsaver's series, but
this is a reimplementation from scratch based on the
ARM ARM pseudocode.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
The VECTCLRACTIVE and VECTRESET bits in the AIRCR are both
documented as UNPREDICTABLE if you write a 1 to them when
the processor is not halted in Debug state (ie stopped
and under the control of an external JTAG debugger).
Since we don't implement Debug state or emulated JTAG
these bits are always UNPREDICTABLE for us. Instead of
logging them as unimplemented we can simply log writes
as guest errors and ignore them.
Signed-off-by: Michael Davidsaver <mdavidsaver@gmail.com>
[PMM: change extracted from another patch; commit message
constructed from scratch]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Having armv7m_nvic_acknowledge_irq() return the new value of
env->v7m.exception and its one caller assign the return value
back to env->v7m.exception is pointless. Just make the return
type void instead.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
The v7M exception architecture requires that if a synchronous
exception cannot be taken immediately (because it is disabled
or at too low a priority) then it should be escalated to
HardFault (and the HardFault exception is then taken).
Implement this escalation logic.
Signed-off-by: Michael Davidsaver <mdavidsaver@gmail.com>
[PMM: extracted from another patch]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Now that the NVIC is its own separate implementation, we can
clean up the GIC code by removing REV_NVIC and conditionals
which use it.
Signed-off-by: Michael Davidsaver <mdavidsaver@gmail.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
The M profile condition for when we can take a pending exception or
interrupt is not the same as that for A/R profile. The code
originally copied from the A/R profile version of the
cpu_exec_interrupt function only worked by chance for the
very simple case of exceptions being masked by PRIMASK.
Replace it with a call to a function in the NVIC code that
correctly compares the priority of the pending exception
against the current execution priority of the CPU.
[Michael Davidsaver's patchset had a patch to do something
similar but the implementation ended up being a rewrite.]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Despite some superficial similarities of register layout, the
M-profile NVIC is really very different from the A-profile GIC.
Our current attempt to reuse the GIC code means that we have
significant bugs in our NVIC.
Implement the NVIC as an entirely separate device, to give
us somewhere we can get the behaviour correct.
This initial commit does not attempt to implement exception
priority escalation, since the GIC-based code didn't either.
It does fix a few bugs in passing:
* ICSR.RETTOBASE polarity was wrong and didn't account for
internal exceptions
* ICSR.VECTPENDING was 16 too high if the pending exception
was for an external interrupt
* UsageFault, BusFault and MemFault were not disabled on reset
as they are supposed to be
Signed-off-by: Michael Davidsaver <mdavidsaver@gmail.com>
[PMM: reworked, various bugs and stylistic cleanups]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Add a state field for the v7M PRIGROUP register and implent
reading and writing it. The current NVIC doesn't honour
the values written, but the new version will.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Rename the nvic_state struct to NVICState, to match
our naming conventions.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
The i.MX timer device can be reset by writing to the SWR bit
of the CR register. This has to behave differently from hard
(power-on) reset because it does not reset all of the bits
in the CR register.
We were incorrectly implementing soft reset and hard reset
the same way, and in addition had a logic error which meant
that we were clearing the bits that soft-reset is supposed
to preserve and not touching the bits that soft-reset clears.
This was not correct behaviour for either kind of reset.
Separate out the soft reset and hard reset code paths, and
correct the handling of reset of the CR register so that it
is correct in both cases.
Signed-off-by: Kurban Mallachiev <mallachiev@ispras.ru>
[PMM: rephrased commit message, spacing on operators;
use bool rather than int for is_soft_reset]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
In 2.9 ITS will block save/restore and migration use cases. As such,
let's introduce a user option that allows to turn its instantiation
off, along with GICv3. With the "its" option turned false, migration
will be possible, obviously at the expense of MSI support (with GICv3).
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Message-id: 1487681108-14452-1-git-send-email-eric.auger@redhat.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
object_new(FOO) returns an object with ref_cnt == 1
and following
object_property_set_bool(cpuobj, true, "realized", NULL)
set parent of cpuobj to '/machine/unattached' which makes
ref_cnt == 2.
Since machvirt_init() doesn't take ownership of cpuobj
returned by object_new() it should explicitly drop
reference to cpuobj when dangling pointer is about to
go out of scope like it's done pc_new_cpu() to avoid
object leak.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-id: 1487253461-269218-1-git-send-email-imammedo@redhat.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
In SDHCI protocol, the 'Block count enable' bit of the Transfer
Mode register is relevant only in multi block transfers. We need
not check it in single block transfers.
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Message-id: 20170214185225.7994-5-ppandit@redhat.com
Reviewed-by: Alistair Francis <alistair.francis@xilinx.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
In sdhci_write invoke multi block transfer if it is enabled
in the transfer mode register 's->trnmod'.
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Message-id: 20170214185225.7994-4-ppandit@redhat.com
Reviewed-by: Alistair Francis <alistair.francis@xilinx.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
In the SDHCI protocol, the transfer mode register value
is used during multi block transfer to check if block count
register is enabled and should be updated. Transfer mode
register could be set such that, block count register would
not be updated, thus leading to an infinite loop. Add check
to avoid it.
Reported-by: Wjjzhang <wjjzhang@tencent.com>
Reported-by: Jiang Xin <jiangxin1@huawei.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Message-id: 20170214185225.7994-3-ppandit@redhat.com
Reviewed-by: Alistair Francis <alistair.francis@xilinx.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
In SDHCI protocol, the transfer mode register is defined
to be of 6 bits. Mask its value with '0x0037' so that an
invalid value could not be assigned.
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Reviewed-by: Alistair Francis <alistair.francis@xilinx.com>
Message-id: 20170214185225.7994-2-ppandit@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Switch to using qcrypto_random_bytes() rather than rand() as
our source of randomness for the BCM2835 RNG.
If qcrypto_random_bytes() fails, we don't want to return the guest a
non-random value in case they're really using it for cryptographic
purposes, so the best we can do is a fatal error. This shouldn't
happen unless something's broken, though.
In theory we could implement this device's full FIFO and interrupt
semantics and then just stop filling the FIFO. That's a lot of work,
though, and doesn't really give a very nice diagnostic to the user
since the guest will just seem to hang.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Recent vanilla Raspberry Pi kernels started to make use of
the hardware random number generator in BCM2835 SoC. As a
result, those kernels wouldn't work anymore under QEMU
but rather just freeze during the boot process.
This patch implements a trivial BCM2835 compatible RNG,
and adds it as a peripheral to BCM2835 platform, which
allows to boot a vanilla Raspberry Pi kernel under Qemu.
Changes since v1:
* Prevented guest from writing [31..20] bits in rng_status
* Removed redundant minimum_version_id_old
* Added field entries for the state
* Changed realize function to reset
Signed-off-by: Marcin Chojnacki <marcinch7@gmail.com>
Message-id: 20170210210857.47893-1-marcinch7@gmail.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Commit a3a3d8c7 introduced a segfault bug while checking for
'dc->vmsd->unmigratable' which caused QEMU to crash when trying to add
devices which do no set their 'dc->vmsd' yet while initialization.
Place a 'dc->vmsd' check prior to it so that we do not segfault for
such devices.
NOTE: This doesn't compromise the functioning of --only-migratable
option as all the unmigratable devices do set their 'dc->vmsd'.
Introduce a new function check_migratable() and move the
only_migratable check inside it, also use stubs to avoid user-mode qemu
build failures.
Signed-off-by: Ashijeet Acharya <ashijeetacharya@gmail.com>
Message-Id: <1487009088-23891-1-git-send-email-ashijeetacharya@gmail.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Load the network boot image into guest RAM when the boot
device selected is a network device. Use some of the reserved
space in IplBlockCcw to store the start address of the netboot
image.
A user could also use 'chreipl'(diag 308/5) to change the boot device.
So every time we update the IPLB, we need to verify if the selected
boot device is a network device so we can appropriately load the
network boot image.
Signed-off-by: Farhan Ali <alifm@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Add new field to S390IPLState to store the name of the network boot
loader.
Signed-off-by: Farhan Ali <alifm@linux.vnet.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
The current QEMU ROM infrastructure rejects late loading of ROMs.
And ELFs are currently loaded as ROM, this prevents delayed loading
of ELFs. So when loading ELF, allow the user to specify if ELF should
be loaded as ROM or not.
If an ELF is not loaded as ROM, then they are not restored on a
guest reboot/reset and so its upto the user to handle the reloading.
Signed-off-by: Farhan Ali <alifm@linux.vnet.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Now that the all callbacks have been converted to use "at" syscalls, we
can drop this code.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
The local_open2() callback is vulnerable to symlink attacks because it
calls:
(1) open() which follows symbolic links for all path elements but the
rightmost one
(2) local_set_xattr()->setxattr() which follows symbolic links for all
path elements
(3) local_set_mapped_file_attr() which calls in turn local_fopen() and
mkdir(), both functions following symbolic links for all path
elements but the rightmost one
(4) local_post_create_passthrough() which calls in turn lchown() and
chmod(), both functions also following symbolic links
This patch converts local_open2() to rely on opendir_nofollow() and
mkdirat() to fix (1), as well as local_set_xattrat(),
local_set_mapped_file_attrat() and local_set_cred_passthrough() to
fix (2), (3) and (4) respectively. Since local_open2() already opens
a descriptor to the target file, local_set_cred_passthrough() is
modified to reuse it instead of opening a new one.
The mapped and mapped-file security modes are supposed to be identical,
except for the place where credentials and file modes are stored. While
here, we also make that explicit by sharing the call to openat().
This partly fixes CVE-2016-9602.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
The local_mkdir() callback is vulnerable to symlink attacks because it
calls:
(1) mkdir() which follows symbolic links for all path elements but the
rightmost one
(2) local_set_xattr()->setxattr() which follows symbolic links for all
path elements
(3) local_set_mapped_file_attr() which calls in turn local_fopen() and
mkdir(), both functions following symbolic links for all path
elements but the rightmost one
(4) local_post_create_passthrough() which calls in turn lchown() and
chmod(), both functions also following symbolic links
This patch converts local_mkdir() to rely on opendir_nofollow() and
mkdirat() to fix (1), as well as local_set_xattrat(),
local_set_mapped_file_attrat() and local_set_cred_passthrough() to
fix (2), (3) and (4) respectively.
The mapped and mapped-file security modes are supposed to be identical,
except for the place where credentials and file modes are stored. While
here, we also make that explicit by sharing the call to mkdirat().
This partly fixes CVE-2016-9602.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
The local_mknod() callback is vulnerable to symlink attacks because it
calls:
(1) mknod() which follows symbolic links for all path elements but the
rightmost one
(2) local_set_xattr()->setxattr() which follows symbolic links for all
path elements
(3) local_set_mapped_file_attr() which calls in turn local_fopen() and
mkdir(), both functions following symbolic links for all path
elements but the rightmost one
(4) local_post_create_passthrough() which calls in turn lchown() and
chmod(), both functions also following symbolic links
This patch converts local_mknod() to rely on opendir_nofollow() and
mknodat() to fix (1), as well as local_set_xattrat() and
local_set_mapped_file_attrat() to fix (2) and (3) respectively.
A new local_set_cred_passthrough() helper based on fchownat() and
fchmodat_nofollow() is introduced as a replacement to
local_post_create_passthrough() to fix (4).
The mapped and mapped-file security modes are supposed to be identical,
except for the place where credentials and file modes are stored. While
here, we also make that explicit by sharing the call to mknodat().
This partly fixes CVE-2016-9602.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
The local_symlink() callback is vulnerable to symlink attacks because it
calls:
(1) symlink() which follows symbolic links for all path elements but the
rightmost one
(2) open(O_NOFOLLOW) which follows symbolic links for all path elements but
the rightmost one
(3) local_set_xattr()->setxattr() which follows symbolic links for all
path elements
(4) local_set_mapped_file_attr() which calls in turn local_fopen() and
mkdir(), both functions following symbolic links for all path
elements but the rightmost one
This patch converts local_symlink() to rely on opendir_nofollow() and
symlinkat() to fix (1), openat(O_NOFOLLOW) to fix (2), as well as
local_set_xattrat() and local_set_mapped_file_attrat() to fix (3) and
(4) respectively.
This partly fixes CVE-2016-9602.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
The local_chown() callback is vulnerable to symlink attacks because it
calls:
(1) lchown() which follows symbolic links for all path elements but the
rightmost one
(2) local_set_xattr()->setxattr() which follows symbolic links for all
path elements
(3) local_set_mapped_file_attr() which calls in turn local_fopen() and
mkdir(), both functions following symbolic links for all path
elements but the rightmost one
This patch converts local_chown() to rely on open_nofollow() and
fchownat() to fix (1), as well as local_set_xattrat() and
local_set_mapped_file_attrat() to fix (2) and (3) respectively.
This partly fixes CVE-2016-9602.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
The local_chmod() callback is vulnerable to symlink attacks because it
calls:
(1) chmod() which follows symbolic links for all path elements
(2) local_set_xattr()->setxattr() which follows symbolic links for all
path elements
(3) local_set_mapped_file_attr() which calls in turn local_fopen() and
mkdir(), both functions following symbolic links for all path
elements but the rightmost one
We would need fchmodat() to implement AT_SYMLINK_NOFOLLOW to fix (1). This
isn't the case on linux unfortunately: the kernel doesn't even have a flags
argument to the syscall :-\ It is impossible to fix it in userspace in
a race-free manner. This patch hence converts local_chmod() to rely on
open_nofollow() and fchmod(). This fixes the vulnerability but introduces
a limitation: the target file must readable and/or writable for the call
to openat() to succeed.
It introduces a local_set_xattrat() replacement to local_set_xattr()
based on fsetxattrat() to fix (2), and a local_set_mapped_file_attrat()
replacement to local_set_mapped_file_attr() based on local_fopenat()
and mkdirat() to fix (3). No effort is made to factor out code because
both local_set_xattr() and local_set_mapped_file_attr() will be dropped
when all users have been converted to use the "at" versions.
This partly fixes CVE-2016-9602.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
The local_link() callback is vulnerable to symlink attacks because it calls:
(1) link() which follows symbolic links for all path elements but the
rightmost one
(2) local_create_mapped_attr_dir()->mkdir() which follows symbolic links
for all path elements but the rightmost one
This patch converts local_link() to rely on opendir_nofollow() and linkat()
to fix (1), mkdirat() to fix (2).
This partly fixes CVE-2016-9602.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
When using the mapped-file security model, we also have to create a link
for the metadata file if it exists. In case of failure, we should rollback.
That's what this patch does.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
The local_rename() callback is vulnerable to symlink attacks because it
uses rename() which follows symbolic links in all path elements but the
rightmost one.
This patch simply transforms local_rename() into a wrapper around
local_renameat() which is symlink-attack safe.
This partly fixes CVE-2016-9602.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
The local_renameat() callback is currently a wrapper around local_rename()
which is vulnerable to symlink attacks.
This patch rewrites local_renameat() to have its own implementation, based
on local_opendir_nofollow() and renameat().
This partly fixes CVE-2016-9602.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
The local_lstat() callback is vulnerable to symlink attacks because it
calls:
(1) lstat() which follows symbolic links in all path elements but the
rightmost one
(2) getxattr() which follows symbolic links in all path elements
(3) local_mapped_file_attr()->local_fopen()->openat(O_NOFOLLOW) which
follows symbolic links in all path elements but the rightmost
one
This patch converts local_lstat() to rely on opendir_nofollow() and
fstatat(AT_SYMLINK_NOFOLLOW) to fix (1), fgetxattrat_nofollow() to
fix (2).
A new local_fopenat() helper is introduced as a replacement to
local_fopen() to fix (3). No effort is made to factor out code
because local_fopen() will be dropped when all users have been
converted to call local_fopenat().
This partly fixes CVE-2016-9602.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
The local_readlink() callback is vulnerable to symlink attacks because it
calls:
(1) open(O_NOFOLLOW) which follows symbolic links for all path elements but
the rightmost one
(2) readlink() which follows symbolic links for all path elements but the
rightmost one
This patch converts local_readlink() to rely on open_nofollow() to fix (1)
and opendir_nofollow(), readlinkat() to fix (2).
This partly fixes CVE-2016-9602.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
The local_truncate() callback is vulnerable to symlink attacks because
it calls truncate() which follows symbolic links in all path elements.
This patch converts local_truncate() to rely on open_nofollow() and
ftruncate() instead.
This partly fixes CVE-2016-9602.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
The local_statfs() callback is vulnerable to symlink attacks because it
calls statfs() which follows symbolic links in all path elements.
This patch converts local_statfs() to rely on open_nofollow() and fstatfs()
instead.
This partly fixes CVE-2016-9602.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>