Some small bug fixes, notably a fix for a regression
in cpu hotplug after migration. I also included a
new test, just to help make sure we don't regress cxl.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmdHJRIPHG1zdEByZWRo
YXQuY29tAAoJECgfDbjSjVRpTkQIAJYFpFTPRnO8mA6gurfheB7Jt0ywAMrjKWfg
uEkfZXXSQeCS8NBNPoZt7S8AE6xHE2a4b5lNWiS4a4coFmgTjtKPM8YsU01riyRk
EasRxynGua2XGUWGK93r9L27v9zGz/vRC0Lujmw3VAUKGeL7a17KzmxwXLXe+DzS
PgcI/H5hqoCDalT8aF6wOEDaWIHeo4dauDubYavW/+yaPtUvmy9MBkXbIV4iYqT5
V6geeYIKW/yE/GHxxXOw/RE1FgpiZhebtQP26jPTSk0z/JaV5S0DNYs07joXmbaU
fW5LSLgH3+oDI/GIhvsZ6hP87rVXBdaAogeJqT8SsuChBR55TpY=
=B7KB
-----END PGP SIGNATURE-----
Merge tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu into staging
virtio,pc,pci: bug fixes, new test
Some small bug fixes, notably a fix for a regression
in cpu hotplug after migration. I also included a
new test, just to help make sure we don't regress cxl.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
# -----BEGIN PGP SIGNATURE-----
#
# iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmdHJRIPHG1zdEByZWRo
# YXQuY29tAAoJECgfDbjSjVRpTkQIAJYFpFTPRnO8mA6gurfheB7Jt0ywAMrjKWfg
# uEkfZXXSQeCS8NBNPoZt7S8AE6xHE2a4b5lNWiS4a4coFmgTjtKPM8YsU01riyRk
# EasRxynGua2XGUWGK93r9L27v9zGz/vRC0Lujmw3VAUKGeL7a17KzmxwXLXe+DzS
# PgcI/H5hqoCDalT8aF6wOEDaWIHeo4dauDubYavW/+yaPtUvmy9MBkXbIV4iYqT5
# V6geeYIKW/yE/GHxxXOw/RE1FgpiZhebtQP26jPTSk0z/JaV5S0DNYs07joXmbaU
# fW5LSLgH3+oDI/GIhvsZ6hP87rVXBdaAogeJqT8SsuChBR55TpY=
# =B7KB
# -----END PGP SIGNATURE-----
# gpg: Signature made Wed 27 Nov 2024 13:56:34 GMT
# gpg: using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469
# gpg: issuer "mst@redhat.com"
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full]
# gpg: aka "Michael S. Tsirkin <mst@redhat.com>" [full]
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67
# Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469
* tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu:
vhost: fail device start if iotlb update fails
bios-tables-test: Add data for complex numa test (GI, GP etc)
bios-tables-test: Add complex SRAT / HMAT test for GI GP
bios-tables-test: Allow for new acpihmat-generic-x test data.
qapi/qom: Change Since entry for AcpiGenericPortProperties to 9.2
hw/acpi: Fix size of HID in build_append_srat_acpi_device_handle()
qapi: fix device-sync-config since-version
hw/cxl: Check for zero length features in cmd_features_set_feature()
tests/acpi: update expected blobs
Revert "hw/acpi: Make CPUs ACPI `presence` conditional during vCPU hot-unplug"
Revert "hw/acpi: Update ACPI `_STA` method with QOM vCPU ACPI Hotplug states"
qtest: allow ACPI DSDT Table changes
vhost_net: fix assertion triggered by batch of host notifiers processing
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
When the backend of vhost_net restarts during the vm is running, vhost_net
is stopped and started. The virtio_device_grab_ioeventfd() fucntion in
vhost_net_enable_notifiers() will result in a call to
virtio_bus_set_host_notifier()(assign=false).
And now virtio_device_grab_ioeventfd() is batched in a single transaction
with virtio_bus_set_host_notifier()(assign=true).
This triggers the following assertion:
kvm_mem_ioeventfd_del: error deleting ioeventfd: Bad file descriptor
This patch moves virtio_device_grab_ioeventfd() out of the batch to fix
this problem.
To be noted that the for loop to release ioeventfd should start from i+1,
not i, because the i-th ioeventfd has already been released in
vhost_dev_disable_notifiers_nvqs().
Fixes: 6166799f6 ("vhost_net: configure all host notifiers in a single MR transaction")
Signed-off-by: Zuo Boqun <zuoboqun@baidu.com>
Reported-by: Gao Shiyuan <gaoshiyuan@baidu.com>
Message-Id: <20241115080312.3184-1-zuoboqun@baidu.com>
Acked-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Call virtio_net_set_multiqueue() to add queues before loading their
states. Otherwise the loaded queues will not have handlers and elements
in them will not be processed.
Cc: qemu-stable@nongnu.org
Fixes: 8c49756825 ("virtio-net: Add only one queue pair when realizing")
Reported-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
receive_header() used to cast the const qualifier of the pointer to the
received packet away to modify the header. Avoid this by copying the
received header to buffer.
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
The specification says hash_report should be set to
VIRTIO_NET_HASH_REPORT_NONE if VIRTIO_NET_F_HASH_REPORT is negotiated
but not configured with VIRTIO_NET_CTRL_MQ_RSS_CONFIG. However,
virtio_net_receive_rcu() instead wrote out the content of the extra_hdr
variable, which is not uninitialized in such a case.
Fix this by zeroing the extra_hdr.
Fixes: e22f0603fb ("virtio-net: reference implementation of hash report")
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
virtio_net_process_rss() fills the values used for hash reporting, but
the values used to be thrown away with a recursive function call if
the queue changes after RSS. Avoid the function call to keep the values.
Fixes: a4c960eedc ("virtio-net: Do not write hashes to peer buffer")
Buglink: https://issues.redhat.com/browse/RHEL-59572
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
virtio_net_can_receive() checks if the queue is ready, but RSS will
change the queue to use so, strictly speaking, we may still be able to
receive the packet even if the queue initially provided is not ready.
Perform RSS before virtio_net_can_receive() to cover such a case.
Fixes: 4474e37a5b ("virtio-net: implement RX RSS processing")
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
work_around_broken_dhclient() accesses IP and UDP headers to detect
relevant packets and to calculate checksums, but it didn't check if
the packet has size sufficient to accommodate them, causing out-of-bound
access hazards. Fix this by correcting the size requirement.
Fixes: 1d41b0c1ec ("Work around dhclient brokenness")
Cc: qemu-stable@nongnu.org
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
of_dpa_cmd_add_acl_ip() is called from a single place, and despite the
fact that it always returns ROCKER_OK, its return value is still checked
by the caller.
Change of_dpa_cmd_add_acl_ip() to return void and remove the superfluous
check from of_dpa_cmd_add_acl().
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2471
Signed-off-by: Rodrigo Dias Correa <r@drigo.nl>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Message-id: 20241114075051.404284-1-r@drigo.nl
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
In virtio-net.c we assume that the IP length field in the packet is
aligned, and we copy its address into a uint16_t* in the
VirtioNetRscUnit struct which we then dereference later. This isn't
a safe assumption; it will also result in compilation failures if we
mark the ip_header struct as QEMU_PACKED because the compiler will
not let you take the address of an unaligned struct field.
Make the ip_plen field in VirtioNetRscUnit a void*, and make all the
places where we read or write through that pointer instead use some
new accessor functions read_unit_ip_len() and write_unit_ip_len()
which account for the pointer being potentially unaligned and also do
the network-byte-order conversion we were previously using htons() to
perform.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20241114141619.806652-2-peter.maydell@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Message-ID: <20241103133412.73536-17-shentey@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Instead of defining redundant constants and using magic numbers reuse the
existing MII constants.
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
cc: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-ID: <20241103133412.73536-16-shentey@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Convert the LOG_GUEST_ERROR for the "tx descriptor is owned
by software" to a trace message. This condition is normal
when there is there is nothing to transmit, and we would
otherwise spam the logs with it in that situation.
Signed-off-by: Nabih Estefan <nabihestefan@google.com>
Signed-off-by: Roque Arcudia Hernandez <roqueh@google.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20241014184847.1594056-1-roqueh@google.com
[PMM: tweaked commit message]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
We computes indirections_len by adding 1 to indirection_table_mask, but
it may overflow indirection_table_mask is UINT16_MAX. Check if
indirection_table_mask is small enough before adding 1.
Fixes: 590790297c ("virtio-net: implement RSS configuration command")
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
This adds more trace events to key eBPF RSS setup operations, and
also distinguishes events from multiple NIC instances.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
If the user/mgmt app passed in a set of pre-opened FDs for eBPF RSS,
then it is expecting QEMU to use them. Any failure to do so must be
considered a fatal error and propagated back up the stack, otherwise
deployment mistakes will not be detectable in a prompt manner. When
not using pre-opened FDs, then eBPF RSS is tried on a "best effort"
basis only and thus fallback to software RSS is valid.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
The eBPF code is currently reporting error messages through trace
events. Trace events are fine for debugging, but they are not to be
considered the primary error reporting mechanism, as their output
is inaccessible to callers.
This adds an "Error **errp" parameter to all methods which have
important error scenarios to report to the caller.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
rocker_fp_ports hasn't been used since it was added back in 2015.
Remove it.
Signed-off-by: Dr. David Alan Gilbert <dave@treblig.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
net_rx_pkt_get_l3_hdr_offset and net_rx_pkt_get_iovec_len haven't
been used since they were added.
Remove them.
Signed-off-by: Dr. David Alan Gilbert <dave@treblig.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
(Mjt: also removed net_rx_pkt_get_l3_hdr_offset prototype from hw/net/net_rx_pkt.h as suggested by Akihiko Odaki)
This patch is part of a series that moves towards a consistent use of
g_assert_not_reached() rather than an ad hoc mix of different
assertion mechanisms.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Message-ID: <20240919044641.386068-28-pierrick.bouvier@linaro.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>
This patch is part of a series that moves towards a consistent use of
g_assert_not_reached() rather than an ad hoc mix of different
assertion mechanisms.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Message-ID: <20240919044641.386068-19-pierrick.bouvier@linaro.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>
This patch is part of a series that moves towards a consistent use of
g_assert_not_reached() rather than an ad hoc mix of different
assertion mechanisms.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Message-ID: <20240919044641.386068-10-pierrick.bouvier@linaro.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>
This patch is part of a series that moves towards a consistent use of
g_assert_not_reached() rather than an ad hoc mix of different
assertion mechanisms.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Message-ID: <20240919044641.386068-4-pierrick.bouvier@linaro.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>
The 'GPL-2.0' license identifier has been deprecated since license
list version 3.0 [1] and replaced by the 'GPL-2.0-only' tag [2].
[1] https://spdx.org/licenses/GPL-2.0.html
[2] https://spdx.org/licenses/GPL-2.0-only.html
Mechanical patch running:
$ sed -i -e s/GPL-2.0/GPL-2.0-only/ \
$(git grep -l 'SPDX-License-Identifier: GPL-2.0[ $]' \
| egrep -v '^linux-headers|^include/standard-headers')
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Since the "2 | 3+" expression can be simplified as "2+",
it is pointless to mention the GPLv3 license.
Add the corresponding SPDX identifier to remove all doubt.
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
in many cases, <zlib.h> is only included for crc32 function,
and in some of them, there's a comment saying that, but in
a different way. In one place (hw/net/rtl8139.c), there was
another #include added between the comment and <zlib.h> include.
Make all such comments to be on the same line as #include, make
it consistent, and also add a few missing comments, including
hw/nvram/mac_nvram.c which uses adler32 instead.
There's no code changes.
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
We just removed the single machine using it (axis-dev88).
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Acked-by: Edgar E. Iglesias <edgar.iglesias@amd.com>
Message-ID: <20240904143603.52934-10-philmd@linaro.org>
The read index should not be changed when storing a new message into the
RX or TX FIFO. Changing it at this point will cause the reader to get
out of sync. The wrapping of the read index is already handled by the
pre-write functions for the FIFO status registers anyway.
Additionally, the calculation for wrapping the store index was off by
one, which caused new messages to be written to the wrong location in
the FIFO. This caused incorrect messages to be delivered.
Signed-off-by: Doug Brown <doug@schmorgal.com>
Reviewed-by: Francisco Iglesias <francisco.iglesias@amd.com>
Message-id: 20240827034927.66659-8-doug@schmorgal.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Use QEMU's helper functions can_dlc2len() and can_len2dlc() for
translating between the raw DLC value and the SocketCAN length value.
This also has the side effect of correctly handling received CAN FD
frames with a DLC of 0-8, which was broken previously.
Signed-off-by: Doug Brown <doug@schmorgal.com>
Reviewed-by: Pavel Pisa <pisa@cmp.felk.cvut.cz>
Reviewed-by: Francisco Iglesias <francisco.iglesias@amd.com>
Message-id: 20240827034927.66659-7-doug@schmorgal.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The endianness of the CAN data was backwards in each group of 4 bytes.
For example, the following data:
00 11 22 33 44 55 66 77
was showing up like this:
33 22 11 00 77 66 55 44
Fix both the TX and RX code to put the data in the correct order.
Signed-off-by: Doug Brown <doug@schmorgal.com>
Reviewed-by: Francisco Iglesias <francisco.iglesias@amd.com>
Acked-by: Pavel Pisa <pisa@cmp.felk.cvut.cz>
Message-id: 20240827034927.66659-6-doug@schmorgal.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Add support for QEMU_CAN_FRMF_ESI and QEMU_CAN_FRMF_BRS flags, and
ensure frame->flags is always initialized to 0.
Note that the Xilinx IP core doesn't allow manually setting the ESI bit
during transmits, so it's only implemented for the receive case.
Signed-off-by: Doug Brown <doug@schmorgal.com>
Reviewed-by: Pavel Pisa <pisa@cmp.felk.cvut.cz>
Message-id: 20240827034927.66659-5-doug@schmorgal.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Previously the emulated CAN ID register was being set to the exact same
value stored in qemu_can_frame.can_id. This doesn't work correctly
because the Xilinx IP core uses a different bit arrangement than
qemu_can_frame for all of its ID registers. Correct this problem for
both RX and TX, including RX filtering.
Signed-off-by: Doug Brown <doug@schmorgal.com>
Reviewed-by: Francisco Iglesias <francisco.iglesias@amd.com>
Acked-by: Pavel Pisa <pisa@cmp.felk.cvut.cz>
Message-id: 20240827034927.66659-4-doug@schmorgal.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
When checking the QEMU_CAN_FRMF_TYPE_FD flag, we need to ignore other
potentially set flags. Before this change, received CAN FD frames from
SocketCAN weren't being recognized as CAN FD.
Signed-off-by: Doug Brown <doug@schmorgal.com>
Reviewed-by: Pavel Pisa <pisa@cmp.felk.cvut.cz>
Reviewed-by: Francisco Iglesias <francisco.iglesias@amd.com>
Message-id: 20240827034927.66659-3-doug@schmorgal.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The interrupt level should be 0 or 1. The existing code was using the
interrupt flags to determine the level. In the only machine currently
supported (xlnx-versal-virt), the GICv3 was masking off all bits except
bit 0 when applying it, resulting in the IRQ never being delivered.
Signed-off-by: Doug Brown <doug@schmorgal.com>
Reviewed-by: Francisco Iglesias <francisco.iglesias@amd.com>
Reviewed-by: Pavel Pisa <pisa@cmp.felk.cvut.cz>
Message-id: 20240827034927.66659-2-doug@schmorgal.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Use device_class_set_legacy_reset() instead of opencoding an
assignment to DeviceClass::reset. This change was produced
with:
spatch --macro-file scripts/cocci-macro-file.h \
--sp-file scripts/coccinelle/device-reset.cocci \
--keep-comments --smpl-spacing --in-place --dir hw
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20240830145812.1967042-8-peter.maydell@linaro.org
This allows the vhost_net device which has multiple virtqueues to batch
the setup of all its host notifiers. This significantly reduces the
vhost_net device starting and stoping time, e.g. the time spend
on enabling notifiers reduce from 630ms to 75ms and the time spend on
disabling notifiers reduce from 441ms to 45ms for a VM with 192 vCPUs
and 15 vhost-user-net devices (64vq per device) in our case.
Signed-off-by: zuoboqun <zuoboqun@baidu.com>
Message-Id: <20240816070835.8309-1-zuoboqun@baidu.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Historically, .get_vhost() was probably only called when
vdev->vhost_started is true. However, we now decidedly want to call it
also when vhost_started is false, specifically so we can issue a reset
to the vhost back-end while device operation is stopped.
Some .get_vhost() implementations dereference some pointers (or return
offsets from them) that are probably guaranteed to be non-NULL when
vhost_started is true, but not necessarily otherwise. This patch makes
all such implementations check all such pointers, returning NULL if any
is NULL.
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
Message-Id: <20240723163941.48775-2-hreitz@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
VIRTIO_NET_F_RSC_EXT is implemented in the rx data path, which vhost
implements, so vhost needs to support the feature if it is ever to be
enabled with vhost. The feature must be disabled otherwise.
Fixes: 2974e916df ("virtio-net: support RSC v4/v6 tcp traffic for Windows HCK")
Reported-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-Id: <20240802-rsc-v1-1-2b607bd2f555@daynix.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Receive coalescing is visible to the target machine, so its timers
should use virtual time like other timers in virtio-net, to be
compatible with record-replay.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Message-Id: <20240813050638.446172-10-npiggin@gmail.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20240813202329.1237572-18-alex.bennee@linaro.org>
The regular qemu_bh_schedule() calls result in non-deterministic
execution of the bh in record-replay mode, which causes replay failure.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Message-Id: <20240813050638.446172-9-npiggin@gmail.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20240813202329.1237572-17-alex.bennee@linaro.org>
Patch 06b1297017 ("virtio-net: fix network stall under load")
added double-check to test whether the available buffer size
can satisfy the request or not, in case the guest has added
some buffers to the avail ring simultaneously after the first
check. It will be lucky if the available buffer size becomes
okay after the double-check, then the host can send the packet
to the guest. If the buffer size still can't satisfy the request,
even if the guest has added some buffers, viritio-net would
stall at the host side forever.
The patch enables notification and checks whether the guest has
added some buffers since last check of available buffers when
the available buffers are insufficient. If no buffer is added,
return false, else recheck the available buffers in the loop.
If the available buffers are sufficient, disable notification
and return true.
Changes:
1. Change the return type of virtqueue_get_avail_bytes() from void
to int, it returns an opaque that represents the shadow_avail_idx
of the virtqueue on success, else -1 on error.
2. Add a new API: virtio_queue_enable_notification_and_check(),
it takes an opaque as input arg which is returned from
virtqueue_get_avail_bytes(). It enables notification firstly,
then checks whether the guest has added some buffers since
last check of available buffers or not by virtio_queue_poll(),
return ture if yes.
The patch also reverts patch "06b12970174".
The case below can reproduce the stall.
Guest 0
+--------+
| iperf |
---------------> | server |
Host | +--------+
+--------+ | ...
| iperf |----
| client |---- Guest n
+--------+ | +--------+
| | iperf |
---------------> | server |
+--------+
Boot many guests from qemu with virtio network:
qemu ... -netdev tap,id=net_x \
-device virtio-net-pci-non-transitional,\
iommu_platform=on,mac=xx:xx:xx:xx:xx:xx,netdev=net_x
Each guest acts as iperf server with commands below:
iperf3 -s -D -i 10 -p 8001
iperf3 -s -D -i 10 -p 8002
The host as iperf client:
iperf3 -c guest_IP -p 8001 -i 30 -w 256k -P 20 -t 40000
iperf3 -c guest_IP -p 8002 -i 30 -w 256k -P 20 -t 40000
After some time, the host loses connection to the guest,
the guest can send packet to the host, but can't receive
packet from the host.
It's more likely to happen if SWIOTLB is enabled in the guest,
allocating and freeing bounce buffer takes some CPU ticks,
copying from/to bounce buffer takes more CPU ticks, compared
with that there is no bounce buffer in the guest.
Once the rate of producing packets from the host approximates
the rate of receiveing packets in the guest, the guest would
loop in NAPI.
receive packets ---
| |
v |
free buf virtnet_poll
| |
v |
add buf to avail ring ---
|
| need kick the host?
| NAPI continues
v
receive packets ---
| |
v |
free buf virtnet_poll
| |
v |
add buf to avail ring ---
|
v
... ...
On the other hand, the host fetches free buf from avail
ring, if the buf in the avail ring is not enough, the
host notifies the guest the event by writing the avail
idx read from avail ring to the event idx of used ring,
then the host goes to sleep, waiting for the kick signal
from the guest.
Once the guest finds the host is waiting for kick singal
(in virtqueue_kick_prepare_split()), it kicks the host.
The host may stall forever at the sequences below:
Host Guest
------------ -----------
fetch buf, send packet receive packet ---
... ... |
fetch buf, send packet add buf |
... add buf virtnet_poll
buf not enough avail idx-> add buf |
read avail idx add buf |
add buf ---
receive packet ---
write event idx ... |
wait for kick add buf virtnet_poll
... |
---
no more packet, exit NAPI
In the first loop of NAPI above, indicated in the range of
virtnet_poll above, the host is sending packets while the
guest is receiving packets and adding buffers.
step 1: The buf is not enough, for example, a big packet
needs 5 buf, but the available buf count is 3.
The host read current avail idx.
step 2: The guest adds some buf, then checks whether the
host is waiting for kick signal, not at this time.
The used ring is not empty, the guest continues
the second loop of NAPI.
step 3: The host writes the avail idx read from avail
ring to used ring as event idx via
virtio_queue_set_notification(q->rx_vq, 1).
step 4: At the end of the second loop of NAPI, recheck
whether kick is needed, as the event idx in the
used ring written by the host is beyound the
range of kick condition, the guest will not
send kick signal to the host.
Fixes: 06b1297017 ("virtio-net: fix network stall under load")
Cc: qemu-stable@nongnu.org
Signed-off-by: Wencheng Yang <east.moutain.yang@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Ensure the queue index points to a valid queue when software RSS
enabled. The new calculation matches with the behavior of Linux's TAP
device with the RSS eBPF program.
Fixes: 4474e37a5b ("virtio-net: implement RX RSS processing")
Reported-by: Zhibin Hu <huzhibin5@huawei.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Old linux kernel rtl8139 drivers (ex. debian 2.1) uses outb to set the rx
mode for RxConfig. Unfortunatelly qemu does not support outb for RxConfig.
Signed-off-by: Hans <sungdgdhtryrt@gmail.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
- Restrict probe_access*() functions to TCG (Phil)
- Extract do_invalidate_device_tlb from vtd_process_device_iotlb_desc (Clément)
- Fixes in Loongson IPI model (Bibo & Phil)
- Make docs/interop/firmware.json compatible with qapi-gen.py script (Thomas)
- Correct MPC I2C MMIO region size (Zoltan)
- Remove useless cast in Loongson3 Virt machine (Yao)
- Various uses of range overlap API (Yao)
- Use ERRP_GUARD macro in nubus_virtio_mmio_realize (Zhao)
- Use DMA memory API in Goldfish UART model (Phil)
- Expose fifo8_pop_buf and introduce fifo8_drop (Phil)
- MAINTAINERS updates (Zhao, Phil)
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE+qvnXhKRciHc/Wuy4+MsLN6twN4FAmagFF8ACgkQ4+MsLN6t
wN5bKg//f5TwUhsy2ff0FJpHheDOj/9Gc2nZ1U/Fp0E5N3sz3A7MGp91wye6Xwi3
XG34YN9LK1AVzuCdrEEs5Uaxs1ZS1R2mV+fZaGHwYYxPDdnXxGyp/2Q0eyRxzbcN
zxE2hWscYSZbPVEru4HvZJKfp4XnE1cqA78fJKMAdtq0IPq38tmQNRlJ+gWD9dC6
ZUHXPFf3DnucvVuwqb0JYO/E+uJpcTtgR6pc09Xtv/HFgMiS0vKZ1I/6LChqAUw9
eLMpD/5V2naemVadJe98/dL7gIUnhB8GTjsb4ioblG59AO/uojutwjBSQvFxBUUw
U5lX9OSn20ouwcGiqimsz+5ziwhCG0R6r1zeQJFqUxrpZSscq7NQp9ygbvirm+wS
edLc8yTPf4MtYOihzPP9jLPcXPZjEV64gSnJISDDFYWANCrysX3suaFEOuVYPl+s
ZgQYRVSSYOYHgNqBSRkPKKVUxskSQiqLY3SfGJG4EA9Ktt5lD1cLCXQxhdsqphFm
Ws3zkrVVL0EKl4v/4MtCgITIIctN1ZJE9u3oPJjASqSvK6EebFqAJkc2SidzKHz0
F3iYX2AheWNHCQ3HFu023EvFryjlxYk95fs2f6Uj2a9yVbi813qsvd3gcZ8t0kTT
+dmQwpu1MxjzZnA6838R6OCMnC+UpMPqQh3dPkU/5AF2fc3NnN8=
=J/I2
-----END PGP SIGNATURE-----
Merge tag 'hw-misc-20240723' of https://github.com/philmd/qemu into staging
Misc HW patch queue
- Restrict probe_access*() functions to TCG (Phil)
- Extract do_invalidate_device_tlb from vtd_process_device_iotlb_desc (Clément)
- Fixes in Loongson IPI model (Bibo & Phil)
- Make docs/interop/firmware.json compatible with qapi-gen.py script (Thomas)
- Correct MPC I2C MMIO region size (Zoltan)
- Remove useless cast in Loongson3 Virt machine (Yao)
- Various uses of range overlap API (Yao)
- Use ERRP_GUARD macro in nubus_virtio_mmio_realize (Zhao)
- Use DMA memory API in Goldfish UART model (Phil)
- Expose fifo8_pop_buf and introduce fifo8_drop (Phil)
- MAINTAINERS updates (Zhao, Phil)
# -----BEGIN PGP SIGNATURE-----
#
# iQIzBAABCAAdFiEE+qvnXhKRciHc/Wuy4+MsLN6twN4FAmagFF8ACgkQ4+MsLN6t
# wN5bKg//f5TwUhsy2ff0FJpHheDOj/9Gc2nZ1U/Fp0E5N3sz3A7MGp91wye6Xwi3
# XG34YN9LK1AVzuCdrEEs5Uaxs1ZS1R2mV+fZaGHwYYxPDdnXxGyp/2Q0eyRxzbcN
# zxE2hWscYSZbPVEru4HvZJKfp4XnE1cqA78fJKMAdtq0IPq38tmQNRlJ+gWD9dC6
# ZUHXPFf3DnucvVuwqb0JYO/E+uJpcTtgR6pc09Xtv/HFgMiS0vKZ1I/6LChqAUw9
# eLMpD/5V2naemVadJe98/dL7gIUnhB8GTjsb4ioblG59AO/uojutwjBSQvFxBUUw
# U5lX9OSn20ouwcGiqimsz+5ziwhCG0R6r1zeQJFqUxrpZSscq7NQp9ygbvirm+wS
# edLc8yTPf4MtYOihzPP9jLPcXPZjEV64gSnJISDDFYWANCrysX3suaFEOuVYPl+s
# ZgQYRVSSYOYHgNqBSRkPKKVUxskSQiqLY3SfGJG4EA9Ktt5lD1cLCXQxhdsqphFm
# Ws3zkrVVL0EKl4v/4MtCgITIIctN1ZJE9u3oPJjASqSvK6EebFqAJkc2SidzKHz0
# F3iYX2AheWNHCQ3HFu023EvFryjlxYk95fs2f6Uj2a9yVbi813qsvd3gcZ8t0kTT
# +dmQwpu1MxjzZnA6838R6OCMnC+UpMPqQh3dPkU/5AF2fc3NnN8=
# =J/I2
# -----END PGP SIGNATURE-----
# gpg: Signature made Wed 24 Jul 2024 06:36:47 AM AEST
# gpg: using RSA key FAABE75E12917221DCFD6BB2E3E32C2CDEADC0DE
# gpg: Good signature from "Philippe Mathieu-Daudé (F4BUG) <f4bug@amsat.org>" [full]
* tag 'hw-misc-20240723' of https://github.com/philmd/qemu: (28 commits)
MAINTAINERS: Add myself as a reviewer of machine core
MAINTAINERS: Cover guest-agent in QAPI schema
util/fifo8: Introduce fifo8_drop()
util/fifo8: Expose fifo8_pop_buf()
util/fifo8: Rename fifo8_pop_buf() -> fifo8_pop_bufptr()
util/fifo8: Rename fifo8_peek_buf() -> fifo8_peek_bufptr()
util/fifo8: Use fifo8_reset() in fifo8_create()
util/fifo8: Fix style
chardev/char-fe: Document returned value on error
hw/char/goldfish: Use DMA memory API
hw/nubus/virtio-mmio: Fix missing ERRP_GUARD() in realize handler
dump: make range overlap check more readable
crypto/block-luks: make range overlap check more readable
system/memory_mapping: make range overlap check more readable
sparc/ldst_helper: make range overlap check more readable
cxl/mailbox: make range overlap check more readable
util/range: Make ranges_overlap() return bool
hw/mips/loongson3_virt: remove useless type cast
hw/i2c/mpc_i2c: Fix mmio region size
docs/interop/firmware.json: convert "Example" section
...
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Since fifo8_pop_buf() return a const buffer (which points
directly into the FIFO backing store). Rename it using the
'bufptr' suffix to better reflect that it is a pointer to
the internal buffer that is being returned. This will help
differentiate with methods *copying* the FIFO data.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Message-Id: <20240722160745.67904-6-philmd@linaro.org>
Add support for the VIRTIO_F_IN_ORDER feature across a variety of vhost
devices.
The inclusion of VIRTIO_F_IN_ORDER in the feature bits arrays for these
devices ensures that the backend is capable of offering and providing
support for this feature, and that it can be disabled if the backend
does not support it.
Acked-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
Message-Id: <20240710125522.4168043-6-jonah.palmer@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>