Commit Graph

15738 Commits

Author SHA1 Message Date
Thomas Huth 831e882253 hw/net/spapr_llan: Fix receive buffer handling for better performance
tl;dr:
This patch introduces an alternate way of handling the receive
buffers of the spapr-vlan device, resulting in much better
receive performance for the guest.

Full story:
One of our testers recently discovered that the performance of the
spapr-vlan device is very poor compared to other NICs, and that
a simple "ping -i 0.2 -s 65507 someip" in the guest can result
in more than 50% lost ping packets (especially with older guest
kernels < 3.17).

After doing some analysis, it was clear that there is a problem
with the way we handle the receive buffers in spapr_llan.c: The
ibmveth driver of the guest Linux kernel tries to add a lot of
buffers into several buffer pools (with 512, 2048 and 65536 byte
sizes by default, but it can be changed via the entries in the
/sys/devices/vio/1000/pool* directories of the guest). However,
the spapr-vlan device of QEMU only tries to squeeze all receive
buffer descriptors into one single page which has been supplied
by the guest during the H_REGISTER_LOGICAL_LAN call, without
taking care of different buffer sizes. This has two bad effects:
First, only a very limited number of buffer descriptors is accepted
at all. Second, we also hand 64k buffers to the guest even if
the 2k buffers would fit better - and this results in dropped packets
in the IP layer of the guest since too much skbuf memory is used.

Though it seems at a first glance like PAPR says that we should store
the receive buffer descriptors in the page that is supplied during
the H_REGISTER_LOGICAL_LAN call, chapter 16.4.1.2 in the LoPAPR spec
declares that "the contents of these descriptors are architecturally
opaque, none of these descriptors are manipulated by code above
the architected interfaces". That means we don't have to store
the RX buffer descriptors in this page, but can also manage the
receive buffers at the hypervisor level only. This is now what we
are doing here: Introducing proper RX buffer pools which are also
sorted by size of the buffers, so we can hand out a buffer with
the best fitting size when a packet has been received.

To avoid problems with migration from/to older version of QEMU,
the old behavior is also retained and enabled by default. The new
buffer management has to be enabled via a new "use-rx-buffer-pools"
property.

Now with the new buffer pool management enabled, the problem with
"ping -s 65507" is fixed for me, and the throughput of a simple
test with wget increases from creeping 3MB/s up to 20MB/s!

Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-03-24 11:17:34 +11:00
Thomas Huth d6f39fdfcd hw/net/spapr_llan: Extract rx buffer code into separate functions
Refactor the code a little bit by extracting the code that reads
and writes the receive buffer list page into separate functions.
There should be no functional change in this patch, this is just
a preparation for the upcoming extensions that introduce receive
buffer pools.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-03-24 11:17:34 +11:00
Benjamin Herrenschmidt 26a7f1291b ppc: Create cpu_ppc_set_papr() helper
And move the code adjusting the MSR mask and calling kvmppc_set_papr()
to it. This allows us to add a few more things such as disabling setting
of MSR:HV and appropriate LPCR bits which will be used when fixing
the exception model.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
[clg: removed LPCR setting ]
Signed-off-by: Cédric Le Goater <clg@fr.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-03-24 11:17:34 +11:00
Alexey Kardashevskiy 0ddbd05362 spapr/target-ppc/kvm: Only add hcall-instructions if KVM supports it
ePAPR defines "hcall-instructions" device-tree property which contains
code to call hypercalls in ePAPR paravirtualized guests.  In general
pseries guests won't use this property, instead using the PAPR defined
hypercall interface.

However, this property has been re-used to implement a hack to allow
PR KVM to run (slightly modified) guests in some situations where it
otherwise wouldn't be able to (because the system's L0 hypervisor
doesn't forward the PAPR hypercalls to the PR KVM kernel).

Hence, this property is always present in the device tree for pseries
guests. All KVM guests use it at least to read features via the
KVM_HC_FEATURES hypercall.

The property is populated by the code returned from the KVM's
KVM_PPC_GET_PVINFO ioctl; if not implemented in the KVM, QEMU supplies
code which will fail all hypercall attempts. If QEMU does not create
the property, and the guest kernel is compiled with
CONFIG_EPAPR_PARAVIRT (which is normally the case), there is exactly
the same stub at @epapr_hypercall_start already.

Rather than maintaining this fairly useless stub implementation, it
makes more sense not to create the property in the device tree in the
first place if the host kernel does not implement it.

This changes kvmppc_get_hypercall() to return 1 if the host kernel
does not implement KVM_CAP_PPC_GET_PVINFO. The caller can use it to decide
on whether to create the property or not.

This changes the pseries machine to not create the property if KVM does
not implement KVM_PPC_GET_PVINFO. In practice this means that from now
on the property will not be created if either HV KVM or TCG is used.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
[reworded commit message for clarity --dwg]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-03-24 11:17:33 +11:00
Peter Maydell 2538039f2c ivshmem: Fixes, cleanups, device model split
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJW8FqyAAoJEDhwtADrkYZTjYcP/R1m2LcFnLTxzDjSK38nxWcw
 5t/Do7nBNgXL2ZdRHfJsy7bx/9RR55k16rvzkFgW8LpUa5Ro64onRh2PfMz2p0e8
 QvZRBhXTh5/y4TD61y5Y8d9xawA6Hr1oEUtwsfovI9EiXzVaLl3sLI/nleed68Rk
 eAD2h8+ZcBeJ+lRK3UHEzAvqh0u+IScRMJifCxHyJuoZiylHIHVVq7x40ywg0Ejq
 8wHEj/nDJZHUxbuH4sm215Lv4dK6CmIP8UzuhfY6MxAS6Jo7Zdk1zv2SjJO2DzwT
 rWU4hD0+khwTz3hBR341oWxb84C5MujPwkeP7mibR46HLHCn5imQMz0W+6tj7umb
 dxnwPpXzON00+56B7e4i21aXTO0IaY3AcL9QuETSAaoy3SD5BdDkt3R9XWM+jqqZ
 armE5nNAv8WEN8qUYL/YpBxFDYSZ3CFgNv1enoP2pSp4DqeF/H3aP4RWu+dYqLDm
 MyVhcXUkjHfTCY6NVPPBkNwSvz2vq4ft/b6t7tLN+0ZmIRsEegKxxRrI2vB6O8Ga
 Gh2iKcJfMp90jwwvywfGO+DNQ8npHvhxMkioyzMHflo0QyS2ZDhlf4ubp7cXlYZ6
 tj7iGXJKJQpQyJWA58k8EXR9wc2W+fgRYD/H61QTTyTUgxEo6w10KjBDTsbFwvIY
 R0poHCfRR0DQ7y3GerZO
 =XEMm
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/armbru/tags/pull-ivshmem-2016-03-18' into staging

ivshmem: Fixes, cleanups, device model split

# gpg: Signature made Mon 21 Mar 2016 20:33:54 GMT using RSA key ID EB918653
# gpg: Good signature from "Markus Armbruster <armbru@redhat.com>"
# gpg:                 aka "Markus Armbruster <armbru@pond.sub.org>"

* remotes/armbru/tags/pull-ivshmem-2016-03-18: (40 commits)
  contrib/ivshmem-server: Print "not for production" warning
  ivshmem: Require master to have ID zero
  ivshmem: Drop ivshmem property x-memdev
  ivshmem: Clean up after the previous commit
  ivshmem: Split ivshmem-plain, ivshmem-doorbell off ivshmem
  ivshmem: Replace int role_val by OnOffAuto master
  qdev: New DEFINE_PROP_ON_OFF_AUTO
  ivshmem: Inline check_shm_size() into its only caller
  ivshmem: Simplify memory regions for BAR 2 (shared memory)
  ivshmem: Implement shm=... with a memory backend
  ivshmem: Tighten check of property "size"
  ivshmem: Simplify how we cope with short reads from server
  ivshmem: Drop the hackish test for UNIX domain chardev
  ivshmem: Rely on server sending the ID right after the version
  ivshmem: Propagate errors through ivshmem_recv_setup()
  ivshmem: Receive shared memory synchronously in realize()
  ivshmem: Plug leaks on unplug, fix peer disconnect
  ivshmem: Disentangle ivshmem_read()
  ivshmem: Simplify rejection of invalid peer ID from server
  ivshmem: Assert interrupts are set up once
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-23 12:57:44 +00:00
Peter Maydell ac0d25e843 usb: bugfix collection.
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJW79X7AAoJEEy22O7T6HE4iwMP/iR0VZNyiyFLBFXxOztIJzPC
 d2PeZdx6QTXSrLQ6IDgXbWUAiAgR2QivqIH9DD8novQiTZOBHXVvGz2hu3/HRTVA
 tVvNP9W3+Ia9x3ERqA07loS+dPqsfXdwXomLpF524SFMTJRXqHCKQbBT0r8wIXK/
 FyqK/DoNom8MLfmGaLe3Vu/jvLfCo/jFoojOD39GXn8xLZ24EpGa+hOuDYGB/JfN
 rs7TGjHNpVzzqto8cuTT++r6JOEyRL/wwBpQ2gpiV+J/a6Os80shQN+0aVeszzZE
 MH9XXtb4q+f3PxH5CDdzIixOBvRvdKJXxj5xwgHWPzFObyIXzFx9ijywrgvVujCG
 c5Ql3EBYiHfpxis0g5nifs7xi06PbzcEyLjSKjeY36hZ7VSlzOQm2ZI4zdALM2nv
 A8iy12zYBaNX42IXBbpBkclgJuXrprZURfsFSbj5232rQ6N8HUA2FVRLWuppKbZ0
 LBOog6qaA8LlOR3Csb94PtYFL8p3N6mqiZ3dibsW9cLf0cObi0MOaRPd7LaXYnGG
 bbeOJGcCWDwd57QjGIFi4KZTnBjJIWoknfRgSuxBCJGyDmSHcQZv/3oUCYetw5Di
 Mr7XttUIb63btV9EAWiP1V7ljLUJSqq3VhX1JbP6oUDb12f7yBL7RxBxzyFRVLLj
 W41W9Ei+xES4BUaNXpNM
 =kl34
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/kraxel/tags/pull-usb-20160321-1' into staging

usb: bugfix collection.

# gpg: Signature made Mon 21 Mar 2016 11:07:39 GMT using RSA key ID D3E87138
# gpg: Good signature from "Gerd Hoffmann (work) <kraxel@redhat.com>"
# gpg:                 aka "Gerd Hoffmann <gerd@kraxel.org>"
# gpg:                 aka "Gerd Hoffmann (private) <kraxel@gmail.com>"

* remotes/kraxel/tags/pull-usb-20160321-1:
  usb: ehci: add capability mmio write function
  hw/usb/dev-mtp: Guard inotify usage with CONFIG_INOTIFY1
  usb: fix unbound stack warning for inotify_watchfn
  usb: fix unbound stack usage for usb_mtp_add_str
  usb: fix unbounded stack warning for xhci_dma_write_u32s
  usb: Fix compilation for Windows

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-22 16:42:06 +00:00
Markus Armbruster 62a830b688 ivshmem: Require master to have ID zero
Migration with ivshmem needs to be carefully orchestrated to work.
Exactly one peer (the "master") migrates to the destination, all other
peers need to unplug (and disconnect), migrate, plug back (and
reconnect).  This is sort of documented in qemu-doc.

If peers connect on the destination before migration completes, the
shared memory can get messed up.  This isn't documented anywhere.  Fix
that in qemu-doc.

To avoid messing up register IVPosition on migration, the server must
assign the same ID on source and destination.  ivshmem-spec.txt leaves
ID assignment unspecified, however.

Amend ivshmem-spec.txt to require the first client to receive ID zero.
The example ivshmem-server complies: it always assigns the first
unused ID.

For a bit of additional safety, enforce ID zero for the master.  This
does nothing when we're not using a server, because the ID is zero for
all peers then.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1458066895-20632-40-git-send-email-armbru@redhat.com>
2016-03-21 21:29:03 +01:00
Markus Armbruster 13fd2cb689 ivshmem: Drop ivshmem property x-memdev
Use ivshmem-plain instead.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1458066895-20632-39-git-send-email-armbru@redhat.com>
2016-03-21 21:29:03 +01:00
Markus Armbruster ddc8528443 ivshmem: Clean up after the previous commit
Move code to more sensible places.  Use the opportunity to reorder and
document IVShmemState members.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1458066895-20632-38-git-send-email-armbru@redhat.com>
2016-03-21 21:29:03 +01:00
Markus Armbruster 5400c02b90 ivshmem: Split ivshmem-plain, ivshmem-doorbell off ivshmem
ivshmem can be configured with and without interrupt capability
(a.k.a. "doorbell").  The two configurations have largely disjoint
options, which makes for a confusing (and badly checked) user
interface.  Moreover, the device can't tell the guest whether its
doorbell is enabled.

Create two new device models ivshmem-plain and ivshmem-doorbell, and
deprecate the old one.

Changes from ivshmem:

* PCI revision is 1 instead of 0.  The new revision is fully backwards
  compatible for guests.  Guests may elect to require at least
  revision 1 to make sure they're not exposed to the funny "no shared
  memory, yet" state.

* Property "role" replaced by "master".  role=master becomes
  master=on, role=peer becomes master=off.  Default is off instead of
  auto.

* Property "use64" is gone.  The new devices always have 64 bit BARs.

Changes from ivshmem to ivshmem-plain:

* The Interrupt Pin register in PCI config space is zero (does not use
  an interrupt pin) instead of one (uses INTA).

* Property "x-memdev" is renamed to "memdev".

* Properties "shm" and "size" are gone.  Use property "memdev"
  instead.

* Property "msi" is gone.  The new device can't have MSI-X capability.
  It can't interrupt anyway.

* Properties "ioeventfd" and "vectors" are gone.  They're meaningless
  without interrupts anyway.

Changes from ivshmem to ivshmem-doorbell:

* Property "msi" is gone.  The new device always has MSI-X capability.

* Property "ioeventfd" defaults to on instead of off.

* Property "size" is gone.  The new device can only map all the shared
  memory received from the server.

Guests can easily find out whether the device is configured for
interrupts by checking for MSI-X capability.

Note: some code added in sub-optimal places to make the diff easier to
review.  The next commit will move it to more sensible places.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1458066895-20632-37-git-send-email-armbru@redhat.com>
2016-03-21 21:29:03 +01:00
Markus Armbruster 2a845da736 ivshmem: Replace int role_val by OnOffAuto master
In preparation of making it a qdev property.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1458066895-20632-36-git-send-email-armbru@redhat.com>
2016-03-21 21:29:02 +01:00
Markus Armbruster 55e8a15435 qdev: New DEFINE_PROP_ON_OFF_AUTO
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1458066895-20632-35-git-send-email-armbru@redhat.com>
2016-03-21 21:29:02 +01:00
Markus Armbruster 8baeb22bfc ivshmem: Inline check_shm_size() into its only caller
Improve the error messages while there.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1458066895-20632-34-git-send-email-armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
2016-03-21 21:29:02 +01:00
Markus Armbruster c2d8019cd7 ivshmem: Simplify memory regions for BAR 2 (shared memory)
ivshmem_realize() puts the shared memory region in a container region.
Used to be necessary to permit delayed mapping of the shared memory.
However, we recently moved to synchronous mapping, in "ivshmem:
Receive shared memory synchronously in realize()" and the commit
following it.  The container is redundant since then.  Drop it.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <1458066895-20632-33-git-send-email-armbru@redhat.com>
2016-03-21 21:29:02 +01:00
Markus Armbruster 5503e28504 ivshmem: Implement shm=... with a memory backend
ivshmem has its very own code to create and map shared memory.
Replace that with an implicitly created memory backend.  Reduces the
number of ways we create BAR 2 from three to two.

The memory-backend-file is currently available only with CONFIG_LINUX,
so this adds a second Linuxism to ivshmem (the other one is eventfd).
Should we ever need to make it portable to systems where
memory-backend-file can't be made to serve, we could create a
memory-backend-shmem that allocates memory with shm_open().

Bonus fix: shared memory files are now created with permissions 0655
instead of 0777.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <1458066895-20632-32-git-send-email-armbru@redhat.com>
2016-03-21 21:29:02 +01:00
Markus Armbruster 08183c20b8 ivshmem: Tighten check of property "size"
If size_t is narrower than 64 bits, passing uint64_t ivshmem_size to
mmap() truncates.  Reject such sizes.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1458066895-20632-31-git-send-email-armbru@redhat.com>
2016-03-21 21:29:02 +01:00
Markus Armbruster ee276391a3 ivshmem: Simplify how we cope with short reads from server
Short reads from a UNIX domain sockets are exceedingly unlikely when
the other side always sends eight bytes and we always read eight
bytes.  We cope with them anyway.  However, the code doing that is
rather convoluted.  Dumb it down radically.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1458066895-20632-30-git-send-email-armbru@redhat.com>
2016-03-21 21:29:01 +01:00
Markus Armbruster ba5970a178 ivshmem: Drop the hackish test for UNIX domain chardev
The chardev must be capable of transmitting SCM_RIGHTS ancillary
messages.  We check it by comparing CharDriverState member filename to
"unix:".  That's almost as brittle as it is disgusting.

When the actual transmission all happened asynchronously, this check
was all we could do in realize(), and thus better than nothing.  But
now we receive at least one SCM_RIGHTS synchronously in realize(),
it's not worth its keep anymore.  Drop it.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1458066895-20632-29-git-send-email-armbru@redhat.com>
2016-03-21 21:29:01 +01:00
Markus Armbruster a3feb08639 ivshmem: Rely on server sending the ID right after the version
The protocol specification (ivshmem-spec.txt, formerly
ivshmem_device_spec.txt) has always required the ID message to be sent
right at the beginning, and ivshmem-server has always complied.  The
device, however, accepts it out of order.  If an interrupt setup
arrived before it, though, it would be misinterpreted as connect
notification.  Fix the latent bug by relying on the spec and
ivshmem-server's actual behavior.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1458066895-20632-28-git-send-email-armbru@redhat.com>
2016-03-21 21:29:01 +01:00
Markus Armbruster 1309cf448a ivshmem: Propagate errors through ivshmem_recv_setup()
This kills off the funny state described in the previous commit.

Simplify ivshmem_io_read() accordingly, and update documentation.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1458066895-20632-27-git-send-email-armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
2016-03-21 21:29:01 +01:00
Markus Armbruster 3a55fc0f24 ivshmem: Receive shared memory synchronously in realize()
When configured for interrupts (property "chardev" given), we receive
the shared memory from an ivshmem server.  We do so asynchronously
after realize() completes, by setting up callbacks with
qemu_chr_add_handlers().

Keeping server I/O out of realize() that way avoids delays due to a
slow server.  This is probably relevant only for hot plug.

However, this funny "no shared memory, yet" state of the device also
causes a raft of issues that are hard or impossible to work around:

* The guest is exposed to this state: when we enter and leave it its
  shared memory contents is apruptly replaced, and device register
  IVPosition changes.

  This is a known issue.  We document that guests should not access
  the shared memory after device initialization until the IVPosition
  register becomes non-negative.

  For cold plug, the funny state is unlikely to be visible in
  practice, because we normally receive the shared memory long before
  the guest gets around to mess with the device.

  For hot plug, the timing is tighter, but the relative slowness of
  PCI device configuration has a good chance to hide the funny state.

  In either case, guests complying with the documented procedure are
  safe.

* Migration becomes racy.

  If migration completes before the shared memory setup completes on
  the source, shared memory contents is silently lost.  Fortunately,
  migration is rather unlikely to win this race.

  If the shared memory's ramblock arrives at the destination before
  shared memory setup completes, migration fails.

  There is no known way for a management application to wait for
  shared memory setup to complete.

  All you can do is retry failed migration.  You can improve your
  chances by leaving more time between running the destination QEMU
  and the migrate command.

  To mitigate silent memory loss, you need to ensure the server
  initializes shared memory exactly the same on source and
  destination.

  These issues are entirely undocumented so far.

I'd expect the server to be almost always fast enough to hide these
issues.  But then rare catastrophic races are in a way the worst kind.

This is way more trouble than I'm willing to take from any device.
Kill the funny state by receiving shared memory synchronously in
realize().  If your hot plug hangs, go kill your ivshmem server.

For easier review, this commit only makes the receive synchronous, it
doesn't add the necessary error propagation.  Without that, the funny
state persists.  The next commit will do that, and kill it off for
real.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1458066895-20632-26-git-send-email-armbru@redhat.com>
2016-03-21 21:29:01 +01:00
Markus Armbruster 9db51b4d64 ivshmem: Plug leaks on unplug, fix peer disconnect
close_peer_eventfds() cleans up three things: ioeventfd triggers if
they exist, eventfds, and the array to store them.

Commit 98609cd (v1.2.0) fixed it not to clean up ioeventfd triggers
when they don't exist (property ioeventfd=off, which is the default).
Unfortunately, the fix also made it skip cleanup of the eventfds and
the array then.  This is a memory and file descriptor leak on unplug.

Additionally, the reset of nb_eventfds is skipped.  Doesn't matter on
unplug.  On peer disconnect, however, this permanently wedges the
interrupt vectors used for that peer's ID.  The eventfds stay behind,
but aren't connected to a peer anymore.  When the ID gets recycled for
a new peer, the new peer's eventfds get assigned to vectors after the
old ones.  Commonly, the device's number of vectors matches the
server's, so the new ones get dropped with a "Too many eventfd
received" message.  Interrupts either don't work (common case) or go
to the wrong vector.

Fix by narrowing the conditional to just the ioeventfd trigger
cleanup.

While there, move the "invalid" peer check to the only caller where it
can actually happen, and tighten it to reject own ID.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1458066895-20632-25-git-send-email-armbru@redhat.com>
2016-03-21 21:29:01 +01:00
Markus Armbruster ca0b7566cc ivshmem: Disentangle ivshmem_read()
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1458066895-20632-24-git-send-email-armbru@redhat.com>
2016-03-21 21:29:01 +01:00
Markus Armbruster cd9953f720 ivshmem: Simplify rejection of invalid peer ID from server
ivshmem_read() processes server messages.  These are 64 bit signed
integers.  -1 is shared memory setup, 16 bit unsigned is a peer ID,
anything else is invalid.

ivshmem_read() rejects invalid negative messages right away, silently.

Invalid positive messages get rejected only in resize_peers(), and
ivshmem_read() then prints the rather cryptic message "failed to
resize peers array".

Extend the first check to cover all invalid messages, make it report
"server sent invalid message", and drop the second check.

Now resize_peers() can't fail anymore; simplify.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1458066895-20632-23-git-send-email-armbru@redhat.com>
2016-03-21 21:29:01 +01:00
Markus Armbruster 3c27969b3e ivshmem: Assert interrupts are set up once
An interrupt is set up when the interrupt's file descriptor is
received.  Each message applies to the next interrupt vector.
Therefore, each vector cannot be set up more than once.

ivshmem_add_kvm_msi_virq() half-heartedly tries not to rely on this by
doing nothing then, but that's not going to recover from this error
should it become possible in the future.  watch_vector_notifier()
doesn't even try.

Simply assert what is the case, so we get alerted if we ever screw it
up.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1458066895-20632-22-git-send-email-armbru@redhat.com>
2016-03-21 21:29:01 +01:00
Markus Armbruster 2d1d422d11 ivshmem: Leave INTx alone when using MSI-X
The ivshmem device can either use MSI-X or legacy INTx for interrupts.

With MSI-X enabled, peer interrupt events trigger an MSI as they
should.  But software can still raise INTx via interrupt status and
mask register in BAR 0.  This is explicitly prohibited by PCI Local
Bus Specification Revision 3.0, section 6.8.3.3:

    While enabled for MSI or MSI-X operation, a function is prohibited
    from using its INTx# pin (if implemented) to request service (MSI,
    MSI-X, and INTx# are mutually exclusive).

Fix the device model to leave INTx alone when using MSI-X.

Document that we claim to use INTx in config space even when we don't.
Unlike other devices, ivshmem does *not* use INTx when configured for
MSI-X and MSI-X isn't enabled by software.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <1458066895-20632-21-git-send-email-armbru@redhat.com>
2016-03-21 21:29:01 +01:00
Markus Armbruster 082751e82b ivshmem: Clean up MSI-X conditions
There are three predicates related to MSI-X:

* ivshmem_has_feature(s, IVSHMEM_MSI) is true unless the non-MSI-X
  variant of the device is selected with msi=off.

* msix_present() is true when the device has the PCI capability MSI-X.
  It's initially false, and becomes true during successful realize of
  the MSI-X variant of the device.  Thus, it's the same as
  ivshmem_has_feature(s, IVSHMEM_MSI) for realized devices.

* msix_enabled() is true when msix_present() is true and guest software
  has enabled MSI-X.

Code that differs between the non-MSI-X and the MSI-X variant of the
device needs to be guarded by ivshmem_has_feature(s, IVSHMEM_MSI) or
by msix_present(), except the latter works only for realized devices.

Code that depends on whether MSI-X is in use needs to be guarded with
msix_enabled().

Code review led me to two minor messes:

* ivshmem_vector_notify() calls msix_notify() even when
  !msix_enabled(), unlike most other MSI-X-capable devices.  As far as
  I can tell, msix_notify() does nothing when !msix_enabled().  Add
  the guard anyway.

* Most callers of ivshmem_use_msix() guard it with
  ivshmem_has_feature(s, IVSHMEM_MSI).  Not necessary, because
  ivshmem_use_msix() does nothing when !msix_present().  That's
  ivshmem's only use of msix_present(), though.  Guard it
  consistently, and drop the now redundant msix_present() check.
  While there, rename ivshmem_use_msix() to ivshmem_msix_vector_use().

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1458066895-20632-20-git-send-email-armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
2016-03-21 21:29:00 +01:00
Markus Armbruster 434ad76db5 ivshmem: Clean up register callbacks
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1458066895-20632-19-git-send-email-armbru@redhat.com>
2016-03-21 21:29:00 +01:00
Markus Armbruster d855e27565 ivshmem: Failed realize() can leave migration blocker behind
If pci_ivshmem_realize() fails after it created its migration blocker,
the blocker is left in place.  Fix that by creating it last.

Likewise, if it fails after it called fifo8_create(), it leaks fifo
memory.  Fix that the same way.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1458066895-20632-18-git-send-email-armbru@redhat.com>
2016-03-21 21:29:00 +01:00
Markus Armbruster 9cf70c5225 ivshmem: Fix harmless misuse of Error
We reuse errp after passing it host_memory_backend_get_memory().  If
both host_memory_backend_get_memory() and the reuse set an error, the
reuse will fail the assertion in error_setv().  Fortunately,
host_memory_backend_get_memory() can't fail.

Pass it &error_abort to make our assumption explicit, and to get the
assertion failure in the right place should it become invalid.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1458066895-20632-17-git-send-email-armbru@redhat.com>
2016-03-21 21:29:00 +01:00
Markus Armbruster 71c265816d ivshmem: Don't destroy the chardev on version mismatch
Yes, the chardev is commonly useless after we read a bad version from
it, but destroying it is inappropriate anyway: the user created it, so
the user should be able to hold on to it as long as he likes.  We
don't destroy it on other errors.  Screwed up in commit 5105b1d.

Stop reading instead.

Also note QEMU's behavior in ivshmem-spec.txt.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1458066895-20632-16-git-send-email-armbru@redhat.com>
2016-03-21 21:29:00 +01:00
Markus Armbruster c20fc0c3ee ivshmem: Drop ivshmem_event() stub
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1458066895-20632-15-git-send-email-armbru@redhat.com>
2016-03-21 21:29:00 +01:00
Markus Armbruster e64befe929 ivshmem: Clean up after commit 9940c32
IVShmemState member eventfd_chr is useless since commit 9940c32.  Drop
it.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1458066895-20632-14-git-send-email-armbru@redhat.com>
2016-03-21 21:29:00 +01:00
Markus Armbruster a4fa93bf20 ivshmem: Compile debug prints unconditionally to prevent bit-rot
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1458066895-20632-13-git-send-email-armbru@redhat.com>
2016-03-21 21:29:00 +01:00
Markus Armbruster 97553976dd ivshmem: Add missing newlines to debug printfs
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1458066895-20632-12-git-send-email-armbru@redhat.com>
2016-03-21 21:29:00 +01:00
Prasad J Pandit dff0367cf6 usb: ehci: add capability mmio write function
USB Ehci emulation supports host controller capability registers.
But its mmio '.write' function was missing, which lead to a null
pointer dereference issue. Add a do nothing 'ehci_caps_write'
definition to avoid it; Do nothing because capability registers
are Read Only(RO).

Reported-by: Zuozhi Fzz <zuozhi.fzz@alibaba-inc.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Message-id: 1454072434-16045-1-git-send-email-ppandit@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2016-03-18 14:20:39 +01:00
Matthew Fortune 983bff3530 hw/usb/dev-mtp: Guard inotify usage with CONFIG_INOTIFY1
inotify_init1 usage was guarded by a check for linux but does not
exist on older distributions like CentOS 5 resulting in build
failures.

Signed-off-by: Matthew Fortune <matthew.fortune@imgtec.com>
Message-id: 6D39441BF12EF246A7ABCE6654B023536BB85D4A@hhmail02.hh.imgtec.org
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2016-03-18 13:58:15 +01:00
Peter Xu f34d57d359 usb: fix unbound stack warning for inotify_watchfn
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1457503640-31473-1-git-send-email-peterx@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2016-03-18 13:56:24 +01:00
Peter Xu e3d60bc7c6 usb: fix unbound stack usage for usb_mtp_add_str
Use heap instead of stack.

Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2016-03-18 13:55:16 +01:00
Peter Xu 182b391e79 usb: fix unbounded stack warning for xhci_dma_write_u32s
All the callers for xhci_dma_write_u32s() are using mostly 5 * uint32_t
in len. To avoid unbound stack warning for the function, make it
statically allocated, and assert when it's not big enough in the
future.

Signed-off-by: Peter Xu <peterx@redhat.com>
Message-id: 1457661106-9569-1-git-send-email-peterx@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2016-03-18 13:42:14 +01:00
Stefan Weil 0ab6d12ffd usb: Fix compilation for Windows
Mingw-w64 does not provide sys/ioctl.h and Linux builds don't need it,
so remove that include statement.

ERROR is defined by wingdi.h (included via windows.h). Undefine it before
it is redefined to avoid a compiler warning / error.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
Message-id: 1458159439-32322-1-git-send-email-sw@weilnetz.de
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2016-03-18 13:13:30 +01:00
Eric Blake 32bafa8fdd qapi: Don't special-case simple union wrappers
Simple unions were carrying a special case that hid their 'data'
QMP member from the resulting C struct, via the hack method
QAPISchemaObjectTypeVariant.simple_union_type().  But by using
the work we started by unboxing flat union and alternate
branches, coupled with the ability to visit the members of an
implicit type, we can now expose the simple union's implicit
type in qapi-types.h:

| struct q_obj_ImageInfoSpecificQCow2_wrapper {
|     ImageInfoSpecificQCow2 *data;
| };
|
| struct q_obj_ImageInfoSpecificVmdk_wrapper {
|     ImageInfoSpecificVmdk *data;
| };
...
| struct ImageInfoSpecific {
|     ImageInfoSpecificKind type;
|     union { /* union tag is @type */
|         void *data;
|-        ImageInfoSpecificQCow2 *qcow2;
|-        ImageInfoSpecificVmdk *vmdk;
|+        q_obj_ImageInfoSpecificQCow2_wrapper qcow2;
|+        q_obj_ImageInfoSpecificVmdk_wrapper vmdk;
|     } u;
| };

Doing this removes asymmetry between QAPI's QMP side and its
C side (both sides now expose 'data'), and means that the
treatment of a simple union as sugar for a flat union is now
equivalent in both languages (previously the two approaches used
a different layer of dereferencing, where the simple union could
be converted to a flat union with equivalent C layout but
different {} on the wire, or to an equivalent QMP wire form
but with different C representation).  Using the implicit type
also lets us get rid of the simple_union_type() hack.

Of course, now all clients of simple unions have to adjust from
using su->u.member to using su->u.member.data; while this touches
a number of files in the tree, some earlier cleanup patches
helped minimize the change to the initialization of a temporary
variable rather than every single member access.  The generated
qapi-visit.c code is also affected by the layout change:

|@@ -7393,10 +7393,10 @@ void visit_type_ImageInfoSpecific_member
|     }
|     switch (obj->type) {
|     case IMAGE_INFO_SPECIFIC_KIND_QCOW2:
|-        visit_type_ImageInfoSpecificQCow2(v, "data", &obj->u.qcow2, &err);
|+        visit_type_q_obj_ImageInfoSpecificQCow2_wrapper_members(v, &obj->u.qcow2, &err);
|         break;
|     case IMAGE_INFO_SPECIFIC_KIND_VMDK:
|-        visit_type_ImageInfoSpecificVmdk(v, "data", &obj->u.vmdk, &err);
|+        visit_type_q_obj_ImageInfoSpecificVmdk_wrapper_members(v, &obj->u.vmdk, &err);
|         break;
|     default:
|         abort();

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1458254921-17042-13-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-03-18 10:29:26 +01:00
Max Reitz efaa7c4eeb blockdev: Split monitor reference from BB creation
Before this patch, blk_new() automatically assigned a name to the new
BlockBackend and considered it referenced by the monitor. This patch
removes the implicit monitor_add_blk() call from blk_new() (and
consequently the monitor_remove_blk() call from blk_delete(), too) and
thus blk_new() (and related functions) no longer take a BB name
argument.

In fact, there is only a single point where blk_new()/blk_new_open() is
called and the new BB is monitor-owned, and that is in blockdev_init().
Besides thus relieving us from having to invent names for all of the BBs
we use in qemu-img, this fixes a bug where qemu cannot create a new
image if there already is a monitor-owned BB named "image".

If a BB and its BDS tree are created in a single operation, as of this
patch the BDS tree will be created before the BB is given a name
(whereas it was the other way around before). This results in minor
change to the output of iotest 087, whose reference output is amended
accordingly.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 15:47:56 +01:00
Eduardo Habkost 34294e2f54 module: Rename machine_init() to opts_init()
The only remaining users of machine_init() only call
qemu_add_opts(). Rename machine_init() to opts_init() and move it
closer to the qemu_add_opts() calls on vl.c.

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-03-16 15:54:23 -03:00
Eduardo Habkost 0e6aac87fd machine: Use type_init() to register machine classes
Change all machine_init() users that simply call type_register*()
to use type_init().

Cc: Evgeny Voevodin <e.voevodin@samsung.com>
Cc: Maksim Kozlov <m.kozlov@samsung.com>
Cc: Igor Mitsyanko <i.mitsyanko@gmail.com>
Cc: Dmitry Solodkiy <d.solodkiy@samsung.com>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Rob Herring <robh@kernel.org>
Cc: Andrzej Zaborowski <balrogg@gmail.com>
Cc: Michael Walle <michael@walle.cc>
Cc: "Hervé Poussineau" <hpoussin@reactos.org>
Cc: Aurelien Jarno <aurelien@aurel32.net>
Cc: Leon Alrae <leon.alrae@imgtec.com>
Cc: Alexander Graf <agraf@suse.de>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Blue Swirl <blauwirbel@gmail.com>
Cc: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Acked-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-03-16 15:34:05 -03:00
Peter Maydell fec44a8c70 sd: Fix "info qtree" on boards with SD cards
The SD card object is not a SysBusDevice, so don't create it with
qdev_create() if we're not assigning it to a specific bus; use
object_new() instead.

This was causing 'info qtree' to segfault on boards with SD cards,
because qdev_create(NULL, TYPE_FOO) puts the created object on the
system bus, and then we may try to run functions like sysbus_dev_print()
on it, which fail when casting the object to SysBusDevice.

(This is the same mistake that we made with the NAND device
and fixed in commit 6749695eaaf346c1.)

Reported-by: xiaoqiang.zhao <zxq_yx_007@163.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: xiaoqiang.zhao <zxq_yx_007@163.com>
Message-id: 1458061009-7733-1-git-send-email-peter.maydell@linaro.org
2016-03-16 17:42:19 +00:00
Grégory ESTRADE 6717f587a4 bcm2835_dma: add emulation of Raspberry Pi DMA controller
At present, all DMA transfers complete inline (so a looping descriptor
queue will lock up the device). We also do not model pause/abort,
arbitrarion/priority, or debug features.

Signed-off-by: Grégory ESTRADE <gregory.estrade@gmail.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Andrew Baumann <Andrew.Baumann@microsoft.com>
Message-id: 1457467526-8840-6-git-send-email-Andrew.Baumann@microsoft.com
[AB: implement 2D mode, cleanup/refactoring for upstream submission]
Signed-off-by: Andrew Baumann <Andrew.Baumann@microsoft.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-16 17:42:18 +00:00
Grégory ESTRADE 355a8ccc5c bcm2835_property: implement framebuffer control/configuration properties
The property channel driver now interfaces with the framebuffer device
to query and set framebuffer parameters. As a result of this, the "get
ARM RAM size" query now correctly returns the video RAM base address
(not total RAM size), and the ram-size property is no longer relevant
here.

Signed-off-by: Grégory ESTRADE <gregory.estrade@gmail.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Andrew Baumann <Andrew.Baumann@microsoft.com>
Message-id: 1457467526-8840-5-git-send-email-Andrew.Baumann@microsoft.com
[AB: cleanup/refactoring for upstream submission]
Signed-off-by: Andrew Baumann <Andrew.Baumann@microsoft.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-16 17:42:18 +00:00
Grégory ESTRADE 5e9c2a8dac bcm2835_fb: add framebuffer device for Raspberry Pi
The framebuffer occupies the upper portion of memory (64MiB by
default), but it can only be controlled/configured via a system
mailbox or property channel (to be added by a subsequent patch).

Signed-off-by: Grégory ESTRADE <gregory.estrade@gmail.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Andrew Baumann <Andrew.Baumann@microsoft.com>
Message-id: 1457467526-8840-4-git-send-email-Andrew.Baumann@microsoft.com
[AB: added Windows (BGR) support and cleanup/refactoring for upstream submission]
Signed-off-by: Andrew Baumann <Andrew.Baumann@microsoft.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-16 17:42:18 +00:00
Andrew Baumann 97398d900c bcm2835_aux: add emulation of BCM2835 AUX (aka UART1) block
At present only the core UART functions (data path for tx/rx) are
implemented, which is enough for UEFI to boot. The following
features/registers are unimplemented:
  * Line/modem control
  * Scratch register
  * Extra control
  * Baudrate
  * SPI interfaces

Signed-off-by: Andrew Baumann <Andrew.Baumann@microsoft.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1457467526-8840-3-git-send-email-Andrew.Baumann@microsoft.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-16 17:42:18 +00:00