xqemu/hw
Thomas Huth 831e882253 hw/net/spapr_llan: Fix receive buffer handling for better performance
tl;dr:
This patch introduces an alternate way of handling the receive
buffers of the spapr-vlan device, resulting in much better
receive performance for the guest.

Full story:
One of our testers recently discovered that the performance of the
spapr-vlan device is very poor compared to other NICs, and that
a simple "ping -i 0.2 -s 65507 someip" in the guest can result
in more than 50% lost ping packets (especially with older guest
kernels < 3.17).

After doing some analysis, it was clear that there is a problem
with the way we handle the receive buffers in spapr_llan.c: The
ibmveth driver of the guest Linux kernel tries to add a lot of
buffers into several buffer pools (with 512, 2048 and 65536 byte
sizes by default, but it can be changed via the entries in the
/sys/devices/vio/1000/pool* directories of the guest). However,
the spapr-vlan device of QEMU only tries to squeeze all receive
buffer descriptors into one single page which has been supplied
by the guest during the H_REGISTER_LOGICAL_LAN call, without
taking the different buffer sizes into account. This has two bad effects:
First, only a very limited number of buffer descriptors is accepted
at all. Second, we also hand 64k buffers to the guest even if
the 2k buffers would fit better - and this results in dropped packets
in the IP layer of the guest since too much skbuf memory is used.

Though at first glance PAPR seems to say that we should store
the receive buffer descriptors in the page that is supplied during
the H_REGISTER_LOGICAL_LAN call, chapter 16.4.1.2 in the LoPAPR spec
declares that "the contents of these descriptors are architecturally
opaque, none of these descriptors are manipulated by code above
the architected interfaces". That means we don't have to store
the RX buffer descriptors in this page, but can also manage the
receive buffers at the hypervisor level only. This is now what we
are doing here: Introducing proper RX buffer pools which are also
sorted by size of the buffers, so we can hand out a buffer with
the best fitting size when a packet has been received.
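
The best-fit idea described above can be sketched roughly as follows.
This is only an illustrative, self-contained model and not the code in
spapr_llan.c: the names RxPool, RxPools, rx_add_buffer and rx_get_buffer,
as well as the MAX_POOLS and POOL_CAPACITY limits, are assumptions made
up for this example. Pools are kept sorted by buffer size, so the first
non-empty pool that is large enough is also the best fit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_POOLS     5   /* hypothetical limit on distinct buffer sizes */
#define POOL_CAPACITY 8   /* hypothetical limit on buffers per pool */

typedef struct {
    size_t   bufsize;               /* size of every buffer in this pool */
    int      count;                 /* number of buffers currently queued */
    uint64_t bufs[POOL_CAPACITY];   /* opaque buffer descriptors */
} RxPool;

typedef struct {
    RxPool pools[MAX_POOLS];        /* sorted ascending by bufsize */
    int    npools;
} RxPools;

/* Queue a guest-supplied buffer, creating a pool for its size if needed. */
static int rx_add_buffer(RxPools *rp, size_t size, uint64_t desc)
{
    int i, j;

    assert(rp != NULL);
    /* Find the pool with this size, or the position where it belongs. */
    for (i = 0; i < rp->npools && rp->pools[i].bufsize < size; i++) {
    }
    if (i == rp->npools || rp->pools[i].bufsize != size) {
        if (rp->npools == MAX_POOLS) {
            return -1;              /* no room for another buffer size */
        }
        /* Shift larger pools up by one to keep the array sorted. */
        for (j = rp->npools; j > i; j--) {
            rp->pools[j] = rp->pools[j - 1];
        }
        rp->pools[i] = (RxPool){ .bufsize = size };
        rp->npools++;
    }
    if (rp->pools[i].count == POOL_CAPACITY) {
        return -1;                  /* pool is full */
    }
    rp->pools[i].bufs[rp->pools[i].count++] = desc;
    return 0;
}

/* Take the smallest available buffer that can hold pktlen bytes. */
static int rx_get_buffer(RxPools *rp, size_t pktlen,
                         uint64_t *desc, size_t *bufsize)
{
    int i;

    assert(rp != NULL);
    for (i = 0; i < rp->npools; i++) {
        RxPool *p = &rp->pools[i];
        if (p->bufsize >= pktlen && p->count > 0) {
            *desc = p->bufs[--p->count];
            *bufsize = p->bufsize;
            return 0;
        }
    }
    return -1;                      /* no suitable buffer: drop the packet */
}
```

With pools of 512, 2048 and 65536 bytes filled, a 1500 byte packet is
placed in a 2048 byte buffer in this sketch, and a 65536 byte buffer is
only used once the 2048 byte pool has run empty.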

To avoid problems with migration from/to older versions of QEMU,
the old behavior is also retained and enabled by default. The new
buffer management has to be enabled via a new "use-rx-buffer-pools"
property.

Now with the new buffer pool management enabled, the problem with
"ping -s 65507" is fixed for me, and the throughput of a simple
test with wget increases from a creeping 3 MB/s up to 20 MB/s!

Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-03-24 11:17:34 +11:00
..
9pfs all: Clean up includes 2016-02-23 12:43:05 +00:00
acpi module: Rename machine_init() to opts_init() 2016-03-16 15:54:23 -03:00
alpha loader: Add data swap option to load-elf 2016-03-04 11:30:21 +00:00
arm machine: Use type_init() to register machine classes 2016-03-16 15:34:05 -03:00
audio all: Clean up includes 2016-02-23 12:43:05 +00:00
block blockdev: Split monitor reference from BB creation 2016-03-17 15:47:56 +01:00
bt hw: Clean up includes 2016-01-29 15:07:25 +00:00
char qapi: Don't special-case simple union wrappers 2016-03-18 10:29:26 +01:00
core qdev: New DEFINE_PROP_ON_OFF_AUTO 2016-03-21 21:29:02 +01:00
cpu hw/intc/arm_gic.c: Implement GICv2 GICC_DIR 2016-03-04 11:30:22 +00:00
cris loader: Add data swap option to load-elf 2016-03-04 11:30:21 +00:00
display bcm2835_fb: add framebuffer device for Raspberry Pi 2016-03-16 17:42:18 +00:00
dma bcm2835_dma: add emulation of Raspberry Pi DMA controller 2016-03-16 17:42:18 +00:00
gpio ARM: PL061: Checking register r/w accesses to reserved area 2016-02-26 15:09:42 +00:00
i2c i.MX: Add missing descriptions in devices. 2016-03-16 17:42:18 +00:00
i386 kvm: x86: q35: Add support for -machine kernel_irqchip=split for q35 2016-03-15 18:23:33 +01:00
ide ahci: prohibit "restarting" the FIS or CLB engines 2016-02-10 13:29:40 -05:00
input qapi: Don't special-case simple union wrappers 2016-03-18 10:29:26 +01:00
intc hw/intc: Add (new) ASPEED VIC device model 2016-03-16 17:42:18 +00:00
ipack hw: Clean up includes 2016-01-29 15:07:25 +00:00
ipmi ipmi: add some local variables in ipmi_sdr_init 2016-03-11 16:59:13 +02:00
isa ich9lpc: fix typo 2016-03-11 16:45:21 +02:00
lm32 machine: Use type_init() to register machine classes 2016-03-16 15:34:05 -03:00
m68k loader: Add data swap option to load-elf 2016-03-04 11:30:21 +00:00
mem qapi: Don't special-case simple union wrappers 2016-03-18 10:29:26 +01:00
microblaze loader: Add data swap option to load-elf 2016-03-04 11:30:21 +00:00
mips machine: Use type_init() to register machine classes 2016-03-16 15:34:05 -03:00
misc ivshmem: Require master to have ID zero 2016-03-21 21:29:03 +01:00
moxie loader: Add data swap option to load-elf 2016-03-04 11:30:21 +00:00
net hw/net/spapr_llan: Fix receive buffer handling for better performance 2016-03-24 11:17:34 +11:00
nvram fw_cfg: expose control register size in fw_cfg.h 2016-03-08 10:46:30 +01:00
openrisc loader: Add data swap option to load-elf 2016-03-04 11:30:21 +00:00
pci msi_supported -> msi_nonbroken 2016-03-11 16:45:21 +02:00
pci-bridge pxb: cleanup 2016-03-11 16:59:12 +02:00
pci-host loader: Add data swap option to load-elf 2016-03-04 11:30:21 +00:00
pcmcia hw: Clean up includes 2016-01-29 15:07:25 +00:00
ppc ppc: Create cpu_ppc_set_papr() helper 2016-03-24 11:17:34 +11:00
s390x machine: Use type_init() to register machine classes 2016-03-16 15:34:05 -03:00
scsi scsi-bus: Remove tape command from scsi_req_xfer 2016-03-07 17:56:23 +01:00
sd sd: Fix "info qtree" on boards with SD cards 2016-03-16 17:42:19 +00:00
sh4 sh4: Clean up includes 2016-01-29 15:07:24 +00:00
smbios module: Rename machine_init() to opts_init() 2016-03-16 15:54:23 -03:00
sparc machine: Use type_init() to register machine classes 2016-03-16 15:34:05 -03:00
sparc64 machine: Use type_init() to register machine classes 2016-03-16 15:34:05 -03:00
ssi hw: Clean up includes 2016-01-29 15:07:25 +00:00
timer hw/timer: Add ASPEED timer device model 2016-03-16 17:42:18 +00:00
tpm hw: Clean up includes 2016-01-29 15:07:25 +00:00
tricore loader: Add data swap option to load-elf 2016-03-04 11:30:21 +00:00
unicore32 unicore: Clean up includes 2016-01-29 15:07:22 +00:00
usb usb: ehci: add capability mmio write function 2016-03-18 14:20:39 +01:00
vfio vfio: Eliminate vfio_container_ioctl() 2016-03-16 09:55:11 +11:00
virtio virtio-pci: call pci reset variant when guest requests reset. 2016-03-11 16:45:21 +02:00
watchdog watchdog/diag288: avoid race condition on expired watchdog 2016-03-01 12:15:28 +01:00
xen xen: drop XenXC and associated interface wrappers 2016-02-10 12:01:24 +00:00
xenpv xen: Clean up includes 2016-01-29 15:07:23 +00:00
xtensa machine: Use type_init() to register machine classes 2016-03-16 15:34:05 -03:00
Makefile.objs Add a base IPMI interface 2015-12-22 18:39:19 +02:00