AF_XDP is a network socket family that allows communication directly
with the network device driver in the kernel, bypassing most or all
of the kernel networking stack. In the essence, the technology is
pretty similar to netmap. But, unlike netmap, AF_XDP is Linux-native
and works with any network interfaces without driver modifications.
Unlike vhost-based backends (kernel, user, vdpa), AF_XDP doesn't
require access to character devices or unix sockets. Only access to
the network interface itself is necessary.
This patch implements a network backend that communicates with the
kernel by creating an AF_XDP socket. A chunk of userspace memory
is shared between QEMU and the host kernel. 4 ring buffers (Tx, Rx,
Fill and Completion) are placed in that memory along with a pool of
memory buffers for the packet data. Data transmission is done by
allocating one of the buffers, copying packet data into it and
placing the pointer into Tx ring. After transmission, device will
return the buffer via Completion ring. On Rx, device will take
a buffer form a pre-populated Fill ring, write the packet data into
it and place the buffer into Rx ring.
AF_XDP network backend takes on the communication with the host
kernel and the network interface and forwards packets to/from the
peer device in QEMU.
Usage example:
-device virtio-net-pci,netdev=guest1,mac=00:16:35:AF:AA:5C
-netdev af-xdp,ifname=ens6f1np1,id=guest1,mode=native,queues=1
XDP program bridges the socket with a network interface. It can be
attached to the interface in 2 different modes:
1. skb - this mode should work for any interface and doesn't require
driver support. With a caveat of lower performance.
2. native - this does require support from the driver and allows to
bypass skb allocation in the kernel and potentially use
zero-copy while getting packets in/out userspace.
By default, QEMU will try to use native mode and fall back to skb.
Mode can be forced via 'mode' option. To force 'copy' even in native
mode, use 'force-copy=on' option. This might be useful if there is
some issue with the driver.
Option 'queues=N' allows to specify how many device queues should
be open. Note that all the queues that are not open are still
functional and can receive traffic, but it will not be delivered to
QEMU. So, the number of device queues should generally match the
QEMU configuration, unless the device is shared with something
else and the traffic re-direction to appropriate queues is correctly
configured on a device level (e.g. with ethtool -N).
'start-queue=M' option can be used to specify from which queue id
QEMU should start configuring 'N' queues. It might also be necessary
to use this option with certain NICs, e.g. MLX5 NICs. See the docs
for examples.
In a general case QEMU will need CAP_NET_ADMIN and CAP_SYS_ADMIN
or CAP_BPF capabilities in order to load default XSK/XDP programs to
the network interface and configure BPF maps. It is possible, however,
to run with no capabilities. For that to work, an external process
with enough capabilities will need to pre-load default XSK program,
create AF_XDP sockets and pass their file descriptors to QEMU process
on startup via 'sock-fds' option. Network backend will need to be
configured with 'inhibit=on' to avoid loading of the program.
QEMU will need 32 MB of locked memory (RLIMIT_MEMLOCK) per queue
or CAP_IPC_LOCK.
There are few performance challenges with the current network backends.
First is that they do not support IO threads. This means that data
path is handled by the main thread in QEMU and may slow down other
work or may be slowed down by some other work. This also means that
taking advantage of multi-queue is generally not possible today.
Another thing is that data path is going through the device emulation
code, which is not really optimized for performance. The fastest
"frontend" device is virtio-net. But it's not optimized for heavy
traffic either, because it expects such use-cases to be handled via
some implementation of vhost (user, kernel, vdpa). In practice, we
have virtio notifications and rcu lock/unlock on a per-packet basis
and not very efficient accesses to the guest memory. Communication
channels between backend and frontend devices do not allow passing
more than one packet at a time as well.
Some of these challenges can be avoided in the future by adding better
batching into device emulation or by implementing vhost-af-xdp variant.
There are also a few kernel limitations. AF_XDP sockets do not
support any kinds of checksum or segmentation offloading. Buffers
are limited to a page size (4K), i.e. MTU is limited. Multi-buffer
support implementation for AF_XDP is in progress, but not ready yet.
Also, transmission in all non-zero-copy modes is synchronous, i.e.
done in a syscall. That doesn't allow high packet rates on virtual
interfaces.
However, keeping in mind all of these challenges, current implementation
of the AF_XDP backend shows a decent performance while running on top
of a physical NIC with zero-copy support.
Test setup:
2 VMs running on 2 physical hosts connected via ConnectX6-Dx card.
Network backend is configured to open the NIC directly in native mode.
The driver supports zero-copy. NIC is configured to use 1 queue.
Inside a VM - iperf3 for basic TCP performance testing and dpdk-testpmd
for PPS testing.
iperf3 result:
TCP stream : 19.1 Gbps
dpdk-testpmd (single queue, single CPU core, 64 B packets) results:
Tx only : 3.4 Mpps
Rx only : 2.0 Mpps
L2 FWD Loopback : 1.5 Mpps
In skb mode the same setup shows much lower performance, similar to
the setup where pair of physical NICs is replaced with veth pair:
iperf3 result:
TCP stream : 9 Gbps
dpdk-testpmd (single queue, single CPU core, 64 B packets) results:
Tx only : 1.2 Mpps
Rx only : 1.0 Mpps
L2 FWD Loopback : 0.7 Mpps
Results in skb mode or over the veth are close to results of a tap
backend with vhost=on and disabled segmentation offloading bridged
with a NIC.
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> (docker/lcitool)
Signed-off-by: Jason Wang <jasowang@redhat.com>
Instead of having CI pick tomli from the vendored wheel at configure
time, place it in the containers.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Bios bits avocado tests need mformat (provided by the mtools package) and
xorriso tools in order to run within gitlab CI containers. Add those
dependencies within the Dockerfiles so that containers can be built with
those tools present and bios bits avocado tests can be run there.
xorriso package conflicts with genisoimage package on some distributions.
Therefore, it is not possible to have both the packages at the same time
in the container image uniformly for all distribution flavors. Further,
on some distributions like RHEL, both xorriso and genisoimage
packages provide /usr/bin/genisoimage and on some other distributions like
Fedora, only genisoimage package provides the same utility.
Therefore, this change removes the dependency on geninsoimage for building
container images altogether keeping only xorriso package. At the same time,
cdrom-test.c is updated to use and check for existence of only xorrisofs.
Signed-off-by: Ani Sinha <anisinha@redhat.com>
Message-Id: <20230504154611.85854-3-anisinha@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Update to commit which has fixes needed for OpenSUSE 15.4 and
re-generate output files.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Message-Id: <bd11b5954d3dd1e989699370af2b9e2e0c77194a.1681735482.git.pkrempa@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
We need this to be able to run the tuxrun_baseline tests in CI which
in turn helps us reduce overhead running other tests. We need to
update libvirt-ci and refresh the generated files by running 'make
lcitool-refresh' to get the new mapping.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20230228190653.1602033-24-alex.bennee@linaro.org>
For the cross-compilation use-case it is important to add the host
user to the dockerfile so we can map them to the docker environment
when cross-building files.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20230228190653.1602033-19-alex.bennee@linaro.org>
We only use it for test-io-channel-command at the moment.
Unfortunately bringing socat into CI exposed an existing bug in the
test-io-channel-command unit test so we disabled it for MacOS in the
previous patch.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Tested-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Cc: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20230228190653.1602033-3-alex.bennee@linaro.org>
Python 3.6 is at end-of-life. Update the libvirt-ci module to a
version that supports overrides for targets and package mappings;
this way, QEMU can use the newer versions provided by CentOS 8 (Python
3.8) and OpenSUSE 15.3 (Python 3.9).
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add sndio to the FreeBSD CI containers / VM
Signed-off-by: Brad Smith <brad@comstyle.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Warner Losh <imp@bsdimp.com>
Message-Id: <Y1f6dxjvD01DtXyG@humpty.home.comstyle.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
This patch updates the docker and cirrus files with the new packages by
running tests/lcitool/refresh
Signed-off-by: Anton Johansson <anjo@rev.ng>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20220804115548.13024-10-anjo@rev.ng>
Message-Id: <20220929114231.583801-8-alex.bennee@linaro.org>
Notable changes:
- libvirt-ci source tree was re-arranged, so the script we
run now lives in a bin/ sub-dir
- opensuse 15.2 is replaced by opensuse 15.3
- libslirp is temporarily dropped on opensuse as the
libslirp-version.h is broken
https://bugzilla.opensuse.org/show_bug.cgi?id=1201551
- The incorrectly named python3-virtualenv module was
changed to python3-venv, but most distros don't need
any package as 'venv' is a standard part of python
- glibc-static was renamed to libc-static, to reflect
fact that it isn't going to be glibc on all distros
- The cmocka/json-c deps that were manually added to
the centos dockerfile and are now consistently added
to all targets
Acked-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20220722130431.2319019-2-berrange@redhat.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Acked-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20220725140520.515340-2-alex.bennee@linaro.org>
add the libvfio-user library as a submodule. build it as a meson
subproject.
libvfio-user is distributed with BSD 3-Clause license and
json-c with MIT (Expat) license
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: c2adec87958b081d1dc8775d4aa05c897912f025.1655151679.git.jag.raman@oracle.com
[Changed submodule URL to QEMU's libvfio-user mirror on GitLab. The QEMU
project mirrors its dependencies so that it can provide full source code
even in the event that its dependencies become unavailable. Note that
the mirror repo is manually updated, so please contact me to make newer
libvfio-user commits available. If I become a bottleneck we can set up a
cronjob.
Updated scripts/meson-buildoptions.sh to match the meson_options.txt
change. Failure to do so can result in scripts/meson-buildoptions.sh
being modified by the build system later on and you end up with a dirty
working tree.
--Stefan]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The perl test harness is not necessary anymore since commit 3d2f73ef75
("build: use "meson test" as the test harness"). Thus remove it from
tests/lcitool/projects/qemu.yml, run "make lcitool-refresh" and manually
clean the remaining docker / vm files that are not managed by lcitool yet.
Message-Id: <20220329102808.423681-1-thuth@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Support for CentOS 8 has stopped at the end of 2021, so let's
switch to the Stream variant instead.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20220201101911.97900-1-thuth@redhat.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20220204204335.1689602-16-alex.bennee@linaro.org>
The previous commit removed all uses of libxml2.
Refresh lcitool submodule, update qemu.yml and refresh the generated
files by running:
$ make lcitool-refresh
Note: This refreshment also removes libudev dependency on Fedora
and CentOS due to libvirt-ci commit 18bfaee ("mappings: Improve
mapping for libudev"), since "The udev project has been absorbed
by the systemd project", and lttng-ust on FreeBSD runners due to
libvirt-ci commit 6dd9b6f ("guests: drop lttng-ust from FreeBSD
platform").
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20220121154134.315047-6-f4bug@amsat.org>
Message-Id: <20220204204335.1689602-10-alex.bennee@linaro.org>
The FUSE exports feature is not built because most container images do
not have libfuse3 development headers installed. Add the necessary
packages to the Dockerfiles.
Cc: Hanna Reitz <hreitz@redhat.com>
Cc: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Richard W.M. Jones <rjones@redhat.com>
Reviewed-by: Beraldo Leal <bleal@redhat.com>
Tested-by: Beraldo Leal <bleal@redhat.com>
Message-Id: <20211207160025.52466-1-stefanha@redhat.com>
[AJB: migrate to lcitool qemu.yml and regenerate]
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard W.M. Jones <rjones@redhat.com>
Message-Id: <20220105135009.1584676-21-alex.bennee@linaro.org>
This commit is best examined using the "-b" option to diff.
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20211215141949.3512719-9-berrange@redhat.com>
Message-Id: <20220105135009.1584676-9-alex.bennee@linaro.org>
Under SELinux, Unix domain sockets have two labels. One is on the
disk and can be set with commands such as chcon(1). There is a
different label stored in memory (called the process label). This can
only be set by the process creating the socket. When using SELinux +
SVirt and wanting qemu to be able to connect to a qemu-nbd instance,
you must set both labels correctly first.
For qemu-nbd the options to set the second label are awkward. You can
create the socket in a wrapper program and then exec into qemu-nbd.
Or you could try something with LD_PRELOAD.
This commit adds the ability to set the label straightforwardly on the
command line, via the new --selinux-label flag. (The name of the flag
is the same as the equivalent nbdkit option.)
A worked example showing how to use the new option can be found in
this bug: https://bugzilla.redhat.com/show_bug.cgi?id=1984938
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1984938
Signed-off-by: Richard W.M. Jones <rjones@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
[eblake: rebase to configure changes, reject --selinux-label if it is
not compiled in or not used on a Unix socket]
Note that we may relax some of these restrictions at a later date,
such as making it possible to label a TCP socket, although it may be
smarter to do so as a generic QMP action rather than more one-off
command lines in qemu-nbd.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20211115202944.615966-1-eblake@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
[eblake: adjust meson output as suggested by thuth]
Signed-off-by: Eric Blake <eblake@redhat.com>
This is the fully expanded list of build pre-requisites QEMU can
conceivably use in any scenario.
[AJB: added centos-release-advanced-virtualization/epel-release]
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20210623142245.307776-12-berrange@redhat.com>
Message-Id: <20210709143005.1554-20-alex.bennee@linaro.org>
mesa-libEGL-devel is not used in QEMU at all, but mesa-libgbm-devel is.
spice-glib-devel is not use in QEMU at all, but spice-protocol is.
We also need the -devel package for spice-server, not the runtime.
There is no need to specifically refer to python36, we can just
use python3 as in other distros.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20210623142245.307776-8-berrange@redhat.com>
Message-Id: <20210709143005.1554-16-alex.bennee@linaro.org>
This will make diffs in later patches clearer.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20210623142245.307776-7-berrange@redhat.com>
Message-Id: <20210709143005.1554-15-alex.bennee@linaro.org>
It is good practice to use an explicit registry for referencing the base
image. This is because some distros will inject their own registries
into the search path. For example registry.fedoraproject.org comes ahead
of docker.io. Using an explicit registry avoids wasting time querying
multiple registries for images that they won't have.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20210623142245.307776-5-berrange@redhat.com>
Message-Id: <20210709143005.1554-13-alex.bennee@linaro.org>
Add libffi as a build requirement for TCI.
Add libffi to the dockerfiles to satisfy that requirement.
Construct an ffi_cif structure for each unique typemask.
Record the result in a separate hash table for later lookup;
this allows helper_table to stay const.
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
We are using the dtrace backend in downstream RHEL, so testing this
in the CentOS 8 task seems to be a good fit.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Willian Rampazzo <willianr@redhat.com>
Message-Id: <20210331160351.3071279-1-thuth@redhat.com>
Message-Id: <20210401102530.12030-12-alex.bennee@linaro.org>
Since the meson build system rework, the configure script prefers the
git submodules over the system libraries. So we are testing compilation
with capstone, fdt and libslirp as a submodule all over the place,
burning CPU cycles by recompiling these third party modules and wasting
some network bandwidth in the CI by cloning the submodules each time.
Let's stop doing that in at least a couple of jobs and use the system
libraries instead.
While we're at it, also install meson in the Fedora container, since
it is new enough already, so we do not need to check out the meson
submodule here.
Message-Id: <20210121174451.658924-1-thuth@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
We do not support SDL1 in QEMU anymore. Use SDL2 instead.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20201021072308.9224-1-thuth@redhat.com>
Message-Id: <20201021163136.27324-4-alex.bennee@linaro.org>
Some of the "check-acceptance" tests are still skipped in the CI
since the docker images do not provide the necessary packages, e.g.
the netcat binary. Add them to get more test coverage.
Message-Id: <20201023073351.251332-5-thuth@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Acked-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
genisoimage is needed for running the tests/qtest/cdrom-test qtest.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20201006174347.152040-1-thuth@redhat.com>
Message-Id: <20201007160038.26953-7-alex.bennee@linaro.org>
Most jobs test the latest nettle library. This adds explicit coverage
for latest gcrypt using Fedora, and old gcrypt and nettle using
CentOS-7. The latter does a minimal tools-only build, as we only need to
validate that the crypto code builds and unit tests pass. Finally a job
disabling both nettle and gcrypt is provided to validate that gnutls
still works.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20200901133050.381844-3-berrange@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
QEMU does not use flex/bison packages.
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Claudio Fontana <cfontana@suse.de>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20200515163029.12917-2-philmd@redhat.com>
Which is currenly missing, and will be referenced later in the
contributed CI playbooks.
Signed-off-by: Cleber Rosa <crosa@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20200312193616.438922-2-crosa@redhat.com>
Signed-off-by: Cleber Rosa <crosa@redhat.com>