Next patches will register the vhost_vdpa memory listener while the VM
is migrating at the destination, so we can map the memory to the device
before stopping the VM at the source. The main goal is to reduce the
downtime.
However, the destination QEMU is unaware of which vhost_vdpa device will
register its memory_listener. If the source guest has CVQ enabled, it
will be the CVQ device. Otherwise, it will be the first one.
Move the memory listener to a common place rather than always in the
first / last vhost_vdpa.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20231221174322.3130442-14-eperezma@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The memory listener functions can call these too. Make vdpa_iommu work
with VhostVDPAShared.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20231221174322.3130442-13-eperezma@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The callers only have the shared information by the end of this series.
Start converting this functions.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20231221174322.3130442-12-eperezma@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Next patches will register the vhost_vdpa memory listener while the VM
is migrating at the destination, so we can map the memory to the device
before stopping the VM at the source. The main goal is to reduce the
downtime.
However, the destination QEMU is unaware of which vhost_vdpa device will
register its memory_listener. If the source guest has CVQ enabled, it
will be the CVQ device. Otherwise, it will be the first one.
Move the iommu_list member to VhostVDPAShared so all vhost_vdpa can use
it, rather than always in the first / last vhost_vdpa.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20231221174322.3130442-11-eperezma@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
It is always VHOST_IOTLB_MSG_V2. We can always make it back per
vhost_dev if needed.
This change makes easier for vhost_vdpa_map and unmap not to depend on
vhost_vdpa but only in VhostVDPAShared.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20231221174322.3130442-10-eperezma@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Next patches will register the vhost_vdpa memory listener while the VM
is migrating at the destination, so we can map the memory to the device
before stopping the VM at the source. The main goal is to reduce the
downtime.
However, the destination QEMU is unaware of which vhost_vdpa device will
register its memory_listener. If the source guest has CVQ enabled, it
will be the CVQ device. Otherwise, it will be the first one.
Move the backend_cap member to VhostVDPAShared so all vhost_vdpa can use
it, rather than always in the first / last vhost_vdpa.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20231221174322.3130442-9-eperezma@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Next patches will register the vhost_vdpa memory listener while the VM
is migrating at the destination, so we can map the memory to the device
before stopping the VM at the source. The main goal is to reduce the
downtime.
However, the destination QEMU is unaware of which vhost_vdpa device will
register its memory_listener. If the source guest has CVQ enabled, it
will be the CVQ device. Otherwise, it will be the first one.
Move the iotlb_batch_begin_sent member to VhostVDPAShared so all
vhost_vdpa can use it, rather than always in the first / last
vhost_vdpa.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20231221174322.3130442-8-eperezma@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Next patches will register the vhost_vdpa memory listener while the VM
is migrating at the destination, so we can map the memory to the device
before stopping the VM at the source. The main goal is to reduce the
downtime.
However, the destination QEMU is unaware of which vhost_vdpa device will
register its memory_listener. If the source guest has CVQ enabled, it
will be the CVQ device. Otherwise, it will be the first one.
Move the file descriptor to VhostVDPAShared so all vhost_vdpa can use
it, rather than always in the first / last vhost_vdpa.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20231221174322.3130442-7-eperezma@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Next patches will register the vhost_vdpa memory listener while the VM
is migrating at the destination, so we can map the memory to the device
before stopping the VM at the source. The main goal is to reduce the
downtime.
However, the destination QEMU is unaware of which vhost_vdpa device will
register its memory_listener. If the source guest has CVQ enabled, it
will be the CVQ device. Otherwise, it will be the first one.
Move the shadow_data member to VhostVDPAShared so all vhost_vdpa can use
it, rather than always in the first or last vhost_vdpa.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20231221174322.3130442-5-eperezma@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Next patches will register the vhost_vdpa memory listener while the VM
is migrating at the destination, so we can map the memory to the device
before stopping the VM at the source. The main goal is to reduce the
downtime.
However, the destination QEMU is unaware of which vhost_vdpa device will
register its memory_listener. If the source guest has CVQ enabled, it
will be the CVQ device. Otherwise, it will be the first one.
Move the iova range to VhostVDPAShared so all vhost_vdpa can use it,
rather than always in the first or last vhost_vdpa.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20231221174322.3130442-4-eperezma@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Next patches will register the vhost_vdpa memory listener while the VM
is migrating at the destination, so we can map the memory to the device
before stopping the VM at the source. The main goal is to reduce the
downtime.
However, the destination QEMU is unaware of which vhost_vdpa device will
register its memory_listener. If the source guest has CVQ enabled, it
will be the CVQ device. Otherwise, it will be the first one.
Move the iova tree to VhostVDPAShared so all vhost_vdpa can use it,
rather than always in the first or last vhost_vdpa.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20231221174322.3130442-3-eperezma@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
It will hold properties shared among all vhost_vdpa instances associated
with of the same device. For example, we just need one iova_tree or one
memory listener for the entire device.
Next patches will register the vhost_vdpa memory listener at the
beginning of the VM migration at the destination. This enables QEMU to
map the memory to the device before stopping the VM at the source,
instead of doing while both source and destination are stopped, thus
minimizing the downtime.
However, the destination QEMU is unaware of which vhost_vdpa struct will
register its memory_listener. If the source guest has CVQ enabled, it
will be the one associated with the CVQ. Otherwise, it will be the
first one.
Save the memory operations related members in a common place rather than
always in the first / last vhost_vdpa.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20231221174322.3130442-2-eperezma@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This adds support for vhost-scsi to be able to create a worker thread
per virtqueue. Right now for vhost-net we get a worker thread per
tx/rx virtqueue pair which scales nicely as we add more virtqueues and
CPUs, but for scsi we get the single worker thread that's shared by all
virtqueues. When trying to send IO to more than 2 virtqueues the single
thread becomes a bottlneck.
This patch adds a new setting, worker_per_virtqueue, which can be set
to:
false: Existing behavior where we get the single worker thread.
true: Create a worker per IO virtqueue.
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20231204231618.21962-3-michael.christie@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
This adds the vhost backend callouts for the worker ioctls added in the
6.4 linux kernel commit:
c1ecd8e95007 ("vhost: allow userspace to create workers")
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20231204231618.21962-2-michael.christie@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Add the iothread-vq-mapping parameter to assign virtqueues to IOThreads.
Store the vq:AioContext mapping in the new struct
VirtIOBlockDataPlane->vq_aio_context[] field and refactor the code to
use the per-vq AioContext instead of the BlockDriverState's AioContext.
Reimplement --device virtio-blk-pci,iothread= and non-IOThread mode by
assigning all virtqueues to the IOThread and main loop's AioContext in
vq_aio_context[], respectively.
The comment in struct VirtIOBlockDataPlane about EventNotifiers is
stale. Remove it.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20231220134755.814917-5-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The AioContext lock no longer has any effect. Remove it.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-ID: <20231205182011.1976568-9-stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Protect the Task Management Function BH state with a lock. The TMF BH
runs in the main loop thread. An IOThread might process a TMF at the
same time as the TMF BH is running. Therefore tmf_bh_list and tmf_bh
must be protected by a lock.
Run TMF request completion in the IOThread using aio_wait_bh_oneshot().
This avoids more locking to protect the virtqueue and SCSI layer state.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20231205182011.1976568-2-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
s->rq is accessed from IO_CODE and GLOBAL_STATE_CODE. Introduce a lock
to protect s->rq and eliminate reliance on the AioContext lock.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230914140101.1065008-3-stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The VirtioPCIDeviceTypeInfo structure, added in commit a4ee4c8baa
("virtio: Helper for registering virtio device types") got extended
in commit 8ea90ee690 ("virtio: add class_size") with the @class_size
field. Do similarly with the @instance_finalize field.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20231121174051.63038-2-philmd@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
virtio sound card support
vhost-user: back-end state migration
cxl:
line length reduction
enabling fabric management
vhost-vdpa:
shadow virtqueue hash calculation Support
shadow virtqueue RSS Support
tests:
CPU topology related smbios test cases
Fixes, cleanups all over the place
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmVKDDoPHG1zdEByZWRo
YXQuY29tAAoJECgfDbjSjVRpF08H/0Zts8uvkHbgiOEJw4JMHU6/VaCipfIYsp01
GSfwYOyEsXJ7GIxKWaCiMnWXEm7tebNCPKf3DoUtcAojQj3vuF9XbWBKw/bfRn83
nGO/iiwbYViSKxkwqUI+Up5YiN9o0M8gBFrY0kScPezbnYmo5u2bcADdEEq6gH68
D0Ea8i+WmszL891ypvgCDBL2ObDk3qX3vA5Q6J2I+HKX2ofJM59BwaKwS5ghw+IG
BmbKXUZJNjUQfN9dQ7vJuiuqdknJ2xUzwW2Vn612ffarbOZB1DZ6ruWlrHty5TjX
0w4IXEJPBgZYbX9oc6zvTQnbLDBJbDU89mnme0TcmNMKWmQKTtc=
=vEv+
-----END PGP SIGNATURE-----
Merge tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu into staging
virtio,pc,pci: features, fixes
virtio sound card support
vhost-user: back-end state migration
cxl:
line length reduction
enabling fabric management
vhost-vdpa:
shadow virtqueue hash calculation Support
shadow virtqueue RSS Support
tests:
CPU topology related smbios test cases
Fixes, cleanups all over the place
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
# -----BEGIN PGP SIGNATURE-----
#
# iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmVKDDoPHG1zdEByZWRo
# YXQuY29tAAoJECgfDbjSjVRpF08H/0Zts8uvkHbgiOEJw4JMHU6/VaCipfIYsp01
# GSfwYOyEsXJ7GIxKWaCiMnWXEm7tebNCPKf3DoUtcAojQj3vuF9XbWBKw/bfRn83
# nGO/iiwbYViSKxkwqUI+Up5YiN9o0M8gBFrY0kScPezbnYmo5u2bcADdEEq6gH68
# D0Ea8i+WmszL891ypvgCDBL2ObDk3qX3vA5Q6J2I+HKX2ofJM59BwaKwS5ghw+IG
# BmbKXUZJNjUQfN9dQ7vJuiuqdknJ2xUzwW2Vn612ffarbOZB1DZ6ruWlrHty5TjX
# 0w4IXEJPBgZYbX9oc6zvTQnbLDBJbDU89mnme0TcmNMKWmQKTtc=
# =vEv+
# -----END PGP SIGNATURE-----
# gpg: Signature made Tue 07 Nov 2023 18:06:50 HKT
# gpg: using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469
# gpg: issuer "mst@redhat.com"
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full]
# gpg: aka "Michael S. Tsirkin <mst@redhat.com>" [full]
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67
# Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469
* tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu: (63 commits)
acpi/tests/avocado/bits: enable console logging from bits VM
acpi/tests/avocado/bits: enforce 32-bit SMBIOS entry point
hw/cxl: Add tunneled command support to mailbox for switch cci.
hw/cxl: Add dummy security state get
hw/cxl/type3: Cleanup multiple CXL_TYPE3() calls in read/write functions
hw/cxl/mbox: Add Get Background Operation Status Command
hw/cxl: Add support for device sanitation
hw/cxl/mbox: Wire up interrupts for background completion
hw/cxl/mbox: Add support for background operations
hw/cxl: Implement Physical Ports status retrieval
hw/pci-bridge/cxl_downstream: Set default link width and link speed
hw/cxl/mbox: Add Physical Switch Identify command.
hw/cxl/mbox: Add Information and Status / Identify command
hw/cxl: Add a switch mailbox CCI function
hw/pci-bridge/cxl_upstream: Move defintion of device to header.
hw/cxl/mbox: Generalize the CCI command processing
hw/cxl/mbox: Pull the CCI definition out of the CXLDeviceState
hw/cxl/mbox: Split mailbox command payload into separate input and output
hw/cxl/mbox: Pull the payload out of struct cxl_cmd and make instances constant
hw/cxl: Fix a QEMU_BUILD_BUG_ON() in switch statement scope issue.
...
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
vhost_save_backend_state() and vhost_load_backend_state() can be used by
vhost front-ends to easily save and load the back-end's state to/from
the migration stream.
Because we do not know the full state size ahead of time,
vhost_save_backend_state() simply reads the data in 1 MB chunks, and
writes each chunk consecutively into the migration stream, prefixed by
its length. EOF is indicated by a 0-length chunk.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
Message-Id: <20231016134243.68248-7-hreitz@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Add the interface for transferring the back-end's state during migration
as defined previously in vhost-user.rst.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
Message-Id: <20231016134243.68248-6-hreitz@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The implementation populates the array of per IOMMUDevice
host reserved ranges.
It is forbidden to have conflicting sets of host IOVA ranges
to be applied onto the same IOMMU MR (implied by different
host devices).
In case the callback is called after the probe request has
been issues by the driver, a warning is issued.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: "Michael S. Tsirkin" <mst@redhat.com>
Tested-by: Yanghang Liu <yanghliu@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Add an IOMMUDevice 'probe_done' flag to record that the driver
already issued a probe request on that device.
This will be useful to double check host reserved regions aren't
notified after the probe and hence are not taken into account
by the driver.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Suggested-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: "Michael S. Tsirkin" <mst@redhat.com>
Tested-by: Yanghang Liu <yanghliu@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
For the time being the per device reserved regions are
just a duplicate of IOMMU wide reserved regions. Subsequent
patches will combine those with host reserved regions, if any.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Yanghang Liu <yanghliu@redhat.com>
Reviewed-by: "Michael S. Tsirkin" <mst@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Rename VirtIOIOMMU (nb_)reserved_regions fields with the "prop_" prefix
to highlight those fields are set through a property, at machine level.
They are IOMMU wide.
A subsequent patch will introduce per IOMMUDevice reserved regions
that will include both those IOMMU wide property reserved
regions plus, sometimes, host reserved regions, if the device is
backed by a host device protected by a physical IOMMU. Also change
nb_ prefix by nr_.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: "Michael S. Tsirkin" <mst@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
infrastructure for vhost-vdpa shadow work
piix south bridge rework
reconnect for vhost-user-scsi
dummy ACPI QTG DSM for cxl
tests, cleanups, fixes all over the place
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmU06PMPHG1zdEByZWRo
YXQuY29tAAoJECgfDbjSjVRpNIsH/0DlKti86VZLJ6PbNqsnKxoK2gg05TbEhPZU
pQ+RPDaCHpFBsLC5qsoMJwvaEQFe0e49ZFemw7bXRzBxgmbbNnZ9ArCIPqT+rvQd
7UBmyC+kacVyybZatq69aK2BHKFtiIRlT78d9Izgtjmp8V7oyKoz14Esh8wkE+FT
ypHUa70Addi6alNm6BVkm7bxZxi0Wrmf3THqF8ViYvufzHKl7JR5e17fKWEG0BqV
9W7AeHMnzJ7jkTvBGUw7g5EbzFn7hPLTbO4G/VW97k0puS4WRX5aIMkVhUazsRIa
zDOuXCCskUWuRapiCwY0E4g7cCaT8/JR6JjjBaTgkjJgvo5Y8Eg=
=ILek
-----END PGP SIGNATURE-----
Merge tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu into staging
virtio,pc,pci: features, cleanups
infrastructure for vhost-vdpa shadow work
piix south bridge rework
reconnect for vhost-user-scsi
dummy ACPI QTG DSM for cxl
tests, cleanups, fixes all over the place
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
# -----BEGIN PGP SIGNATURE-----
#
# iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmU06PMPHG1zdEByZWRo
# YXQuY29tAAoJECgfDbjSjVRpNIsH/0DlKti86VZLJ6PbNqsnKxoK2gg05TbEhPZU
# pQ+RPDaCHpFBsLC5qsoMJwvaEQFe0e49ZFemw7bXRzBxgmbbNnZ9ArCIPqT+rvQd
# 7UBmyC+kacVyybZatq69aK2BHKFtiIRlT78d9Izgtjmp8V7oyKoz14Esh8wkE+FT
# ypHUa70Addi6alNm6BVkm7bxZxi0Wrmf3THqF8ViYvufzHKl7JR5e17fKWEG0BqV
# 9W7AeHMnzJ7jkTvBGUw7g5EbzFn7hPLTbO4G/VW97k0puS4WRX5aIMkVhUazsRIa
# zDOuXCCskUWuRapiCwY0E4g7cCaT8/JR6JjjBaTgkjJgvo5Y8Eg=
# =ILek
# -----END PGP SIGNATURE-----
# gpg: Signature made Sun 22 Oct 2023 02:18:43 PDT
# gpg: using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469
# gpg: issuer "mst@redhat.com"
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full]
# gpg: aka "Michael S. Tsirkin <mst@redhat.com>" [full]
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67
# Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469
* tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu: (62 commits)
intel-iommu: Report interrupt remapping faults, fix return value
MAINTAINERS: Add include/hw/intc/i8259.h to the PC chip section
vhost-user: Fix protocol feature bit conflict
tests/acpi: Update DSDT.cxl with QTG DSM
hw/cxl: Add QTG _DSM support for ACPI0017 device
tests/acpi: Allow update of DSDT.cxl
hw/i386/cxl: ensure maxram is greater than ram size for calculating cxl range
vhost-user: fix lost reconnect
vhost-user-scsi: start vhost when guest kicks
vhost-user-scsi: support reconnect to backend
vhost: move and rename the conn retry times
vhost-user-common: send get_inflight_fd once
hw/i386/pc_piix: Make PIIX4 south bridge usable in PC machine
hw/isa/piix: Implement multi-process QEMU support also for PIIX4
hw/isa/piix: Resolve duplicate code regarding PCI interrupt wiring
hw/isa/piix: Reuse PIIX3's PCI interrupt triggering in PIIX4
hw/isa/piix: Rename functions to be shared for PCI interrupt triggering
hw/isa/piix: Reuse PIIX3 base class' realize method in PIIX4
hw/isa/piix: Share PIIX3's base class with PIIX4
hw/isa/piix: Harmonize names of reset control memory regions
...
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The VHOST_USER_PROTOCOL_F_XEN_MMAP feature bit was defined in
f21e95ee97, which has been part of qemu's 8.1.0 release. However, it
seems it was never added to qemu's code, but it is well possible that it
is already used by different front-ends outside of qemu (i.e., Xen).
VHOST_USER_PROTOCOL_F_SHARED_OBJECT in contrast was added to qemu's code
in 1609476662, but never defined in the vhost-user specification. As a
consequence, both bits were defined to be 17, which cannot work.
Regardless of whether actual code or the specification should take
precedence, F_XEN_MMAP is already part of a qemu release, while
F_SHARED_OBJECT is not. Therefore, bump the latter to take number 18
instead of 17, and add this to the specification.
Take the opportunity to add at least a little note on the
VhostUserShared structure to the specification. This structure is
referenced by the new commands introduced in 1609476662, but was not
defined.
Fixes: 1609476662
("vhost-user: add shared_object msg")
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
Message-Id: <20231016083201.23736-1-hreitz@redhat.com>
Reviewed-by: Emmanouil Pitsidianakis <manos.pitsidianakis@linaro.org>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
When the vhost-user is reconnecting to the backend, and if the vhost-user fails
at the get_features in vhost_dev_init(), then the reconnect will fail
and it will not be retriggered forever.
The reason is:
When the vhost-user fails at get_features, the vhost_dev_cleanup will be called
immediately.
vhost_dev_cleanup calls 'memset(hdev, 0, sizeof(struct vhost_dev))'.
The reconnect path is:
vhost_user_blk_event
vhost_user_async_close(.. vhost_user_blk_disconnect ..)
qemu_chr_fe_set_handlers <----- clear the notifier callback
schedule vhost_user_async_close_bh
The vhost->vdev is null, so the vhost_user_blk_disconnect will not be
called, then the event fd callback will not be reinstalled.
All vhost-user devices have this issue, including vhost-user-blk/scsi.
With this patch, if the vdev->vdev is null, the fd callback will still
be reinstalled.
Fixes: 71e076a07d ("hw/virtio: generalise CHR_EVENT_CLOSED handling")
Signed-off-by: Li Feng <fengli@smartx.com>
Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Message-Id: <20231009044735.941655-6-fengli@smartx.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
If the backend crashes and restarts, the device is broken.
This patch adds reconnect for vhost-user-scsi.
This patch also improves the error messages, and reports some silent errors.
Tested with spdk backend.
Signed-off-by: Li Feng <fengli@smartx.com>
Message-Id: <20231009044735.941655-4-fengli@smartx.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
Multiple devices need this macro, move it to a common header.
Signed-off-by: Li Feng <fengli@smartx.com>
Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Message-Id: <20231009044735.941655-3-fengli@smartx.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
vhost-user-scsi has a VirtioDeviceClass->reset() function that calls
->vhost_reset_device(). The other vhost devices don't notify the vhost
device upon reset.
Stateful vhost devices may need to handle device reset in order to free
resources or prevent stale device state from interfering after reset.
Call ->vhost_device_reset() from virtio_reset() so that that vhost
devices are notified of device reset.
This patch affects behavior as follows:
- vhost-kernel: No change in behavior since ->vhost_reset_device() is
not implemented.
- vhost-user: back-ends that negotiate
VHOST_USER_PROTOCOL_F_RESET_DEVICE now receive a
VHOST_USER_DEVICE_RESET message upon device reset. Otherwise there is
no change in behavior. DPDK, SPDK, libvhost-user, and the
vhost-user-backend crate do not negotiate
VHOST_USER_PROTOCOL_F_RESET_DEVICE automatically.
- vhost-vdpa: an extra SET_STATUS 0 call is made during device reset.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20231004014532.1228637-4-stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Access to QemuInputHandlerState::handler are read-only.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Message-Id: <20231017131251.43708-1-philmd@linaro.org>
This modifies the common virtio-gpu.h file have the fields and
defintions needed by gfxstream/rutabaga, by VirtioGpuRutabaga.
Signed-off-by: Gurchetan Singh <gurchetansingh@chromium.org>
Tested-by: Alyssa Ross <hi@alyssa.is>
Tested-by: Emmanouil Pitsidianakis <manos.pitsidianakis@linaro.org>
Tested-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Emmanouil Pitsidianakis <manos.pitsidianakis@linaro.org>
Reviewed-by: Antonio Caggiano <quic_acaggian@quicinc.com>
Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Use VIRTIO_GPU_SHM_ID_HOST_VISIBLE as id for virtio-gpu.
Signed-off-by: Antonio Caggiano <antonio.caggiano@collabora.com>
Tested-by: Alyssa Ross <hi@alyssa.is>
Tested-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Tested-by: Huang Rui <ray.huang@amd.com>
Acked-by: Huang Rui <ray.huang@amd.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Define a new capability type 'VIRTIO_PCI_CAP_SHARED_MEMORY_CFG' to allow
defining shared memory regions with sizes and offsets of 2^32 and more.
Multiple instances of the capability are allowed and distinguished
by a device-specific 'id'.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Antonio Caggiano <antonio.caggiano@collabora.com>
Signed-off-by: Gurchetan Singh <gurchetansingh@chromium.org>
Tested-by: Alyssa Ross <hi@alyssa.is>
Tested-by: Huang Rui <ray.huang@amd.com>
Tested-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Acked-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Having large virtio-mem devices that only expose little memory to a VM
is currently a problem: we map the whole sparse memory region into the
guest using a single memslot, resulting in one gigantic memslot in KVM.
KVM allocates metadata for the whole memslot, which can result in quite
some memory waste.
Assuming we have a 1 TiB virtio-mem device and only expose little (e.g.,
1 GiB) memory, we would create a single 1 TiB memslot and KVM has to
allocate metadata for that 1 TiB memslot: on x86, this implies allocating
a significant amount of memory for metadata:
(1) RMAP: 8 bytes per 4 KiB, 8 bytes per 2 MiB, 8 bytes per 1 GiB
-> For 1 TiB: 2147483648 + 4194304 + 8192 = ~ 2 GiB (0.2 %)
With the TDP MMU (cat /sys/module/kvm/parameters/tdp_mmu) this gets
allocated lazily when required for nested VMs
(2) gfn_track: 2 bytes per 4 KiB
-> For 1 TiB: 536870912 = ~512 MiB (0.05 %)
(3) lpage_info: 4 bytes per 2 MiB, 4 bytes per 1 GiB
-> For 1 TiB: 2097152 + 4096 = ~2 MiB (0.0002 %)
(4) 2x dirty bitmaps for tracking: 2x 1 bit per 4 KiB page
-> For 1 TiB: 536870912 = 64 MiB (0.006 %)
So we primarily care about (1) and (2). The bad thing is, that the
memory consumption *doubles* once SMM is enabled, because we create the
memslot once for !SMM and once for SMM.
Having a 1 TiB memslot without the TDP MMU consumes around:
* With SMM: 5 GiB
* Without SMM: 2.5 GiB
Having a 1 TiB memslot with the TDP MMU consumes around:
* With SMM: 1 GiB
* Without SMM: 512 MiB
... and that's really something we want to optimize, to be able to just
start a VM with small boot memory (e.g., 4 GiB) and a virtio-mem device
that can grow very large (e.g., 1 TiB).
Consequently, using multiple memslots and only mapping the memslots we
really need can significantly reduce memory waste and speed up
memslot-related operations. Let's expose the sparse RAM memory region using
multiple memslots, mapping only the memslots we currently need into our
device memory region container.
The feature can be enabled using "dynamic-memslots=on" and requires
"unplugged-inaccessible=on", which is nowadays the default.
Once enabled, we'll auto-detect the number of memslots to use based on the
memslot limit provided by the core. We'll use at most 1 memslot per
gigabyte. Note that our global limit of memslots accross all memory devices
is currently set to 256: even with multiple large virtio-mem devices,
we'd still have a sane limit on the number of memslots used.
The default is to not dynamically map memslot for now
("dynamic-memslots=off"). The optimization must be enabled manually,
because some vhost setups (e.g., hotplug of vhost-user devices) might be
problematic until we support more memslots especially in vhost-user backends.
Note that "dynamic-memslots=on" is just a hint that multiple memslots
*may* be used for internal optimizations, not that multiple memslots
*must* be used. The actual number of memslots that are used is an
internal detail: for example, once memslot metadata is no longer an
issue, we could simply stop optimizing for that. Migration source and
destination can differ on the setting of "dynamic-memslots".
Message-ID: <20230926185738.277351-17-david@redhat.com>
Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Let's add vhost_get_max_memslots().
Message-ID: <20230926185738.277351-12-david@redhat.com>
Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Let's return the number of free slots instead of only checking if there
is a free slot. Required to support memory devices that consume multiple
memslots.
This is a preparation for memory devices that consume multiple memslots.
Message-ID: <20230926185738.277351-6-david@redhat.com>
Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Checking whether the memory regions are equal is sufficient: if they are
equal, then most certainly the contained fd is equal.
The whole vhost-user memslot handling is suboptimal and overly
complicated. We shouldn't have to lookup a RAM memory regions we got
notified about in vhost_user_get_mr_data() using a host pointer. But that
requires a bigger rework -- especially an alternative vhost_set_mem_table()
backend call that simply consumes MemoryRegionSections.
For now, let's just drop vhost_backend_can_merge().
Message-ID: <20230926185738.277351-3-david@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Having multiple vhost devices, some filtering out fd-less memslots and
some not, can mess up the "used_memslot" accounting. Consequently our
"free memslot" checks become unreliable and we might run out of free
memslots at runtime later.
An example sequence which can trigger a potential issue that involves
different vhost backends (vhost-kernel and vhost-user) and hotplugged
memory devices can be found at [1].
Let's make the filtering mechanism less generic and distinguish between
backends that support private memslots (without a fd) and ones that only
support shared memslots (with a fd). Track the used_memslots for both
cases separately and use the corresponding value when required.
Note: Most probably we should filter out MAP_PRIVATE fd-based RAM regions
(for example, via memory-backend-memfd,...,shared=off or as default with
memory-backend-file) as well. When not using MAP_SHARED, it might not work
as expected. Add a TODO for now.
[1] https://lkml.kernel.org/r/fad9136f-08d3-3fd9-71a1-502069c000cf@redhat.com
Message-ID: <20230926185738.277351-2-david@redhat.com>
Fixes: 988a27754b ("vhost: allow backends to filter memory sections")
Cc: Tiwei Bie <tiwei.bie@intel.com>
Acked-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Avoid using trivial variable names in macros, otherwise we get
the following compiler warning when compiling with -Wshadow=local:
In file included from ../../qemu/hw/display/virtio-gpu-virgl.c:19:
../../home/thuth/devel/qemu/hw/display/virtio-gpu-virgl.c:
In function ‘virgl_cmd_submit_3d’:
../../qemu/include/hw/virtio/virtio-gpu.h:228:16: error: declaration of ‘s’
shadows a previous local [-Werror=shadow=compatible-local]
228 | size_t s;
| ^
../../qemu/hw/display/virtio-gpu-virgl.c:215:5: note: in expansion of macro
‘VIRTIO_GPU_FILL_CMD’
215 | VIRTIO_GPU_FILL_CMD(cs);
| ^~~~~~~~~~~~~~~~~~~
../../qemu/hw/display/virtio-gpu-virgl.c:213:12: note: shadowed declaration
is here
213 | size_t s;
| ^
cc1: all warnings being treated as errors
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-ID: <20231009084559.41427-1-thuth@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Add three new vhost-user protocol
`VHOST_USER_BACKEND_SHARED_OBJECT_* messages`.
These new messages are sent from vhost-user
back-ends to interact with the virtio-dmabuf
table in order to add or remove themselves as
virtio exporters, or lookup for virtio dma-buf
shared objects.
The action taken in the front-end depends
on the type stored in the virtio shared
object hash table.
When the table holds a pointer to a vhost
backend for a given UUID, the front-end sends
a VHOST_USER_GET_SHARED_OBJECT to the
backend holding the shared object.
The messages can only be sent after successfully
negotiating a new VHOST_USER_PROTOCOL_F_SHARED_OBJECT
vhost-user protocol feature bit.
Finally, refactor code to send response message so
that all common parts both for the common REPLY_ACK
case, and other data responses, can call it and
avoid code repetition.
Signed-off-by: Albert Esteve <aesteve@redhat.com>
Message-Id: <20231002065706.94707-4-aesteve@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This API manages objects (in this iteration,
dmabuf fds) that can be shared along different
virtio devices, associated to a UUID.
The API allows the different devices to add,
remove and/or retrieve the objects by simply
invoking the public functions that reside in the
virtio-dmabuf file.
For vhost backends, the API stores the pointer
to the backend holding the object.
Suggested-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Albert Esteve <aesteve@redhat.com>
Message-Id: <20231002065706.94707-3-aesteve@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Move the definition of VhostUserProtocolFeature to
include/hw/virtio/vhost-user.h.
Remove previous definitions in hw/scsi/vhost-user-scsi.c,
hw/virtio/vhost-user.c, and hw/virtio/virtio-qmp.c.
Previously there were 3 separate definitions of this over 3 different
files. Now only 1 definition of this will be present for these 3 files.
Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
Reviewed-by: Emmanouil Pitsidianakis <manos.pitsidianakis@linaro.org>
Message-Id: <20230926224107.2951144-4-jonah.palmer@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>