Commit Graph

Daniel P. Berrangé c645bac4e0 iotests: fix leak of tmpdir in dry-run mode
Creating an instance of the 'TestEnv' class will create a temporary
directory. This dir is only deleted, however, in the __exit__ handler
invoked by a context manager.

In dry-run mode, we don't use the TestEnv via a context manager, so
we were leaking the temporary directory. Since meson invokes 'check'
5 times on each configure run, developers' /tmp was filling up with
empty temporary directories.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-ID: <20240205154019.1841037-1-berrange@redhat.com>
Reviewed-by: Michael Tokarev <mjt@tls.msk.ru>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2024-02-07 14:58:53 +01:00
Hanna Czenczek 1604c04931 scsi: Await request purging
scsi_device_for_each_req_async() currently does not provide any way to
be awaited.  One of its callers is scsi_device_purge_requests(), which
therefore currently does not guarantee that all requests are fully
settled when it returns.

We want all requests to be settled, because scsi_device_purge_requests()
is called through the unrealize path, including the one invoked by
virtio_scsi_hotunplug() through qdev_simple_device_unplug_cb(), which
most likely assumes that all SCSI requests are done then.

In fact, scsi_device_purge_requests() already contains a blk_drain(),
but this will not fully await scsi_device_for_each_req_async(), only the
I/O requests it potentially cancels (not the non-I/O requests).
However, we can have scsi_device_for_each_req_async() increment the BB
in-flight counter, and have scsi_device_for_each_req_async_bh()
decrement it when it is done.  This way, the blk_drain() will fully
await all SCSI requests to be purged.

This also removes the need for scsi_device_for_each_req_async_bh() to
double-check the current context and potentially re-schedule itself,
should it now differ from the BB's context: changing the AioContext of a
BB that has a root node is done through bdrv_try_change_aio_context(), which
creates a drained section.  With this patch, we keep the BB in-flight
counter elevated throughout, so we know the BB's context cannot change.
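
A minimal sketch of that pairing (using the real BlockBackend in-flight
API; the surrounding code and struct layout are abridged guesses):

    static void scsi_device_for_each_req_async_bh(void *opaque)
    {
        SCSIDeviceForEachReqAsyncData *data = opaque;

        /* ... walk the device's request list, calling data->fn ... */

        /* Balance the increment done at schedule time, so a pending
         * blk_drain() can now make progress */
        blk_dec_in_flight(data->s->conf.blk);
        g_free(data);
    }

    void scsi_device_for_each_req_async(SCSIDevice *s,
                                        void (*fn)(SCSIRequest *, void *),
                                        void *fn_opaque)
    {
        SCSIDeviceForEachReqAsyncData *data =
            g_new(SCSIDeviceForEachReqAsyncData, 1);

        data->s = s;
        data->fn = fn;
        data->fn_opaque = fn_opaque;

        /* Keep blk_drain() waiting until the BH has fully run */
        blk_inc_in_flight(s->conf.blk);
        aio_bh_schedule_oneshot(blk_get_aio_context(s->conf.blk),
                                scsi_device_for_each_req_async_bh, data);
    }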

Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
Message-ID: <20240202144755.671354-3-hreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2024-02-07 14:55:32 +01:00
Hanna Czenczek ad89367202 block-backend: Allow concurrent context changes
Since AioContext locks have been removed, a BlockBackend's AioContext
may really change at any time (the only exception is that changes are
often confined to a drained section, as noted in this patch).  Therefore,
blk_get_aio_context() cannot rely on its root node's context always
matching that of the BlockBackend.

In practice, whether they match does not matter anymore anyway: Requests
can be sent to BDSs from any context, so anyone who requests the BB's
context should have no reason to require the root node to have the same
context.  Therefore, we can and should remove the assertion to that
effect.

In addition, because the context can be set and queried from different
threads concurrently, it has to be accessed with atomic operations.
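
A sketch of the atomic accesses, assuming the field is BlockBackend::ctx:

    AioContext *blk_get_aio_context(BlockBackend *blk)
    {
        /* The context may be changed concurrently by another thread */
        return qatomic_read(&blk->ctx);
    }

    /* ...and on the update side, e.g. within blk_set_aio_context(): */
    qatomic_set(&blk->ctx, new_context);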

Buglink: https://issues.redhat.com/browse/RHEL-19381
Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
Message-ID: <20240202144755.671354-2-hreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2024-02-07 14:55:25 +01:00
Stefan Hajnoczi 1f25c172f8 monitor: use aio_co_reschedule_self()
The aio_co_reschedule_self() API is designed to avoid the race
condition between scheduling the coroutine in another AioContext and
yielding.

The QMP dispatch code uses an open-coded version that at first glance
appears susceptible to the race condition:

  aio_co_schedule(qemu_get_aio_context(), qemu_coroutine_self());
  qemu_coroutine_yield();

The code is actually safe because the iohandler and qemu_aio_context
AioContexts run under the Big QEMU Lock. Nevertheless, set a good example
and use aio_co_reschedule_self() so it's obvious that there is no race.
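
With the helper, the two-step sequence above presumably collapses into a
single call:

    /* schedule in the main context and yield, without the race window */
    aio_co_reschedule_self(qemu_get_aio_context());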

Suggested-by: Hanna Reitz <hreitz@redhat.com>
Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20240206190610.107963-6-stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2024-02-07 14:44:21 +01:00
Stefan Hajnoczi 8802f4f4e3 virtio-blk: declare VirtIOBlock::rq with a type
The VirtIOBlock::rq field has had the type void * since its introduction
in commit 869a5c6df1 ("Stop VM on error in virtio-blk. (Gleb
Natapov)").

Perhaps this was done to avoid the forward declaration of
VirtIOBlockReq.

Hanna Czenczek <hreitz@redhat.com> pointed out the missing type. Specify
the actual type because there is no need to use void * here.
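
The change is presumably no more than a forward declaration plus the field
type; a sketch:

    typedef struct VirtIOBlockReq VirtIOBlockReq;   /* forward declaration */

    typedef struct VirtIOBlock {
        VirtIODevice parent_obj;
        /* ... */
        VirtIOBlockReq *rq;   /* was: void *rq */
        /* ... */
    } VirtIOBlock;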

Suggested-by: Hanna Czenczek <hreitz@redhat.com>
Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20240206190610.107963-5-stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2024-02-07 14:44:19 +01:00
Stefan Hajnoczi f2eea93c6b virtio-blk: add vq_rq[] bounds check in virtio_blk_dma_restart_cb()
Hanna Czenczek <hreitz@redhat.com> noted that the array index in
virtio_blk_dma_restart_cb() is not bounds-checked:

  g_autofree VirtIOBlockReq **vq_rq = g_new0(VirtIOBlockReq *, num_queues);
  ...
  while (rq) {
      VirtIOBlockReq *next = rq->next;
      uint16_t idx = virtio_get_queue_index(rq->vq);

      rq->next = vq_rq[idx];
                 ^^^^^^^^^^

The code is correct because both rq->vq and vq_rq[] depend on
num_queues, but this is indirect and not 100% obvious. Add an assertion.
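
The assertion presumably sits right before the access; a sketch:

    while (rq) {
        VirtIOBlockReq *next = rq->next;
        uint16_t idx = virtio_get_queue_index(rq->vq);

        /* rq->vq belongs to this device, so idx is bounded by
         * num_queues; make that invariant explicit */
        assert(idx < num_queues);
        rq->next = vq_rq[idx];
        /* ... */
        rq = next;
    }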

Suggested-by: Hanna Czenczek <hreitz@redhat.com>
Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20240206190610.107963-4-stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2024-02-07 14:44:13 +01:00
Stefan Hajnoczi 5fbcbd50fc virtio-blk: clarify that there is at least 1 virtqueue
It is not possible to instantiate a virtio-blk device with 0 virtqueues.
The following check is located in ->realize():

  if (!conf->num_queues) {
      error_setg(errp, "num-queues property must be larger than 0");
      return;
  }

Later on we access s->vq_aio_context[0] under the assumption that there
is at least one virtqueue. Hanna Czenczek <hreitz@redhat.com> noted that
it would help to show that the array index is already valid.

Add an assertion to document that s->vq_aio_context[0] is always
safe...and catch future code changes that break this assumption.
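
A sketch of what the assertion documents (variable names assumed):

    /* ->realize() rejects num-queues == 0, so index 0 always exists */
    assert(s->conf.num_queues > 0);
    ctx = s->vq_aio_context[0];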

Suggested-by: Hanna Czenczek <hreitz@redhat.com>
Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20240206190610.107963-3-stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2024-02-07 14:44:09 +01:00
Stefan Hajnoczi 1f995a4782 virtio-blk: enforce iothread-vq-mapping validation
Hanna Czenczek <hreitz@redhat.com> noticed that the safety of
`vq_aio_context[vq->value] = ctx;` with user-defined vq->value inputs is
not obvious.

The code is structured in validate() + apply() steps so input validation
is there, but it happens way earlier and there is nothing that
guarantees apply() can only be called with validated inputs.

This patch moves the validate() call inside the apply() function so
validation is guaranteed. I also added the bounds-checking assertion
that Hanna suggested.
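
A sketch of the resulting shape (helper and variable names hypothetical):

    static bool apply_iothread_vq_mapping(IOThreadVirtQueueMappingList *list,
                                          AioContext **vq_aio_context,
                                          uint16_t num_queues, Error **errp)
    {
        /* apply() can no longer run on unvalidated input */
        if (!validate_iothread_vq_mapping_list(list, num_queues, errp)) {
            return false;
        }

        /* ... for each mapping entry, with vq a user-defined index and
         * ctx the IOThread's AioContext ... */
        assert(vq->value < num_queues);   /* the suggested bounds check */
        vq_aio_context[vq->value] = ctx;

        return true;
    }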

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Message-ID: <20240206190610.107963-2-stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2024-02-07 14:44:05 +01:00
Peter Xu 940bf8ff1c ci: Update comment for migration-compat-aarch64
It turns out that we may not be able to enable this test even for the
upcoming v9.0.  Document what we're still missing.

Reviewed-by: "Daniel P. Berrangé" <berrange@redhat.com>
Link: https://lore.kernel.org/r/20240207005403.242235-4-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-07 10:51:27 +08:00
Peter Xu 7070477902 ci: Remove tag dependency for build-previous-qemu
The new build-previous-qemu job relies on the QEMU release tags being
present, which may not always be true for personal git repositories since
tags are not pushed by default.  The job can therefore fail on those CI
runs, as reported by Peter Maydell.

Fix it by fetching the tags remotely from the official repository, as
suggested by Dan.

[1] https://lore.kernel.org/r/ZcC9ScKJ7VvqektA@redhat.com

Reported-by: Peter Maydell <peter.maydell@linaro.org>
Suggested-by: "Daniel P. Berrangé" <berrange@redhat.com>
Reviewed-by: "Daniel P. Berrangé" <berrange@redhat.com>
Link: https://lore.kernel.org/r/20240207005403.242235-3-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-07 10:51:27 +08:00
Peter Xu bdb0ade663 tests/migration-test: Stick with gicv3 in aarch64 test
Recently we introduced a cross-binary migration test.  In this case we
always want migration-test to use a stable guest ABI for both QEMU
binaries, so that the two binaries are compatible on the migration stream
with the specified command line.

Switch to the static gic version "3" rather than using version "max", so
that the GIC stays stable across any future QEMU binaries for
migration-test.

Here the version can actually be anything as long as the ABI is stable.  We
choose "3" because it's the majority of what we already use in QEMU while
still being new enough: "git grep gic-version=3" shows 6 hits, while version
4 has no direct user yet besides "max".

Note that even with this change, aarch64 won't be able to work yet with
migration cross binary test, but then the only missing piece will be the
stable CPU model.

Reviewed-by: "Daniel P. Berrangé" <berrange@redhat.com>
Link: https://lore.kernel.org/r/20240207005403.242235-2-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-07 10:51:27 +08:00
Fabiano Rosas 93fa9dc2e0 migration/multifd: Add a synchronization point for channel creation
It is possible that one of the multifd channels fails to be created at
multifd_new_send_channel_async() while the rest of the channel
creation tasks are still in flight.

This could lead to multifd_save_cleanup() executing the
qemu_thread_join() loop too early and not waiting for the threads
which haven't been created yet, leading to the freeing of resources
that the newly created threads will try to access and crash.

Add a synchronization point after which there will be no attempts at
thread creation and therefore calling multifd_save_cleanup() past that
point will ensure it properly waits for the threads.

A note about performance: Prior to this patch, if a channel took too
long to be established, other channels could finish connecting first
and already start taking load. Now we're bounded by the
slowest-connecting channel.
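
A sketch of the synchronization point described above (field names
hypothetical):

    static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque)
    {
        /* ... try to create the channel and its sender thread ... */

        /* Post exactly once per attempt, whether it succeeded or not */
        qemu_sem_post(&multifd_send_state->channels_created);
    }

    void multifd_save_cleanup(void)
    {
        int i;

        /* After this loop no creation attempt can still be in flight,
         * so joining the created threads is safe */
        for (i = 0; i < migrate_multifd_channels(); i++) {
            qemu_sem_wait(&multifd_send_state->channels_created);
        }

        /* ... join threads and free resources ... */
    }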

Reported-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240206215118.6171-7-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-07 09:53:18 +08:00
Fabiano Rosas 2576ae488e migration/multifd: Unify multifd and TLS connection paths
During multifd channel creation (multifd_send_new_channel_async) when
TLS is enabled, the multifd_channel_connect function is called twice,
once to create the TLS handshake thread and another time after the
asynchronous TLS handshake has finished.

This creates a slightly confusing call stack where
multifd_channel_connect() is called more times than the number of
channels. It also splits error handling between the two callers of
multifd_channel_connect() causing some code duplication. Lastly, it
gets in the way of having a single point to determine whether all
channel creation tasks have been initiated.

Refactor the code to move the reentrancy one level up, to the
multifd_new_send_channel_async() level, de-duplicating the error
handling and allowing for the next patch to introduce a
synchronization point common to all the multifd channel creation,
regardless of TLS.

Note that the previous code would never fail once p->c had been set.
This patch changes this assumption, which affects refcounting, so add
comments around object_unref to explain the situation.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240206215118.6171-6-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-07 09:53:18 +08:00
Fabiano Rosas dd904bc13f migration/multifd: Move multifd_send_setup into migration thread
We currently have an unfavorable situation around multifd channel
creation and the migration thread execution.

We create the multifd channels with qio_channel_socket_connect_async
-> qio_task_run_in_thread, but only connect them at the
multifd_new_send_channel_async callback, called from
qio_task_complete, which is registered as a glib event.

So at multifd_send_setup() we create the channels, but they will only
be actually usable after the whole multifd_send_setup() call stack
returns to the main loop.  This means that the migration thread is
already up and running without any possibility for the multifd
channels to be ready on time.

We currently rely on the channels-ready semaphore blocking
multifd_send_sync_main() until channels start to come up and release
it. However, bugs have recently been found where a channel's creation
fails and multifd_send_cleanup() is allowed to run while other
channels are still being created.

Let's start to organize this situation by moving the
multifd_send_setup() call into the migration thread. That way we
unblock the main-loop to dispatch the completion callbacks and
actually have a chance of getting the multifd channels ready for when
the migration thread needs them.

The next patches will deal with the synchronization aspects.

Note that this takes multifd_send_setup() out of the BQL.
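
A sketch of where the call ends up (abridged; assuming multifd_send_setup()
reports failure via its return value):

    static void *migration_thread(void *opaque)
    {
        /* ... */

        /* Runs in the migration thread now, outside the BQL; the main
         * loop is free to dispatch the completion callbacks meanwhile */
        if (!multifd_send_setup()) {
            goto out;
        }

        /* ... */
    }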

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240206215118.6171-5-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-07 09:53:18 +08:00
Fabiano Rosas bd8b0a8f82 migration/multifd: Move multifd_send_setup error handling in to the function
Hide the error handling inside multifd_send_setup to make it cleaner
for the next patch to move the function around.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240206215118.6171-4-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-07 09:53:18 +08:00
Fabiano Rosas a2a63c4abd migration/multifd: Remove p->running
We currently only need p->running to avoid calling qemu_thread_join()
on a non-existent thread if the thread has never been created.

However, there are at least two bugs in this logic:

1) On the sending side, p->running is set too early and
qemu_thread_create() can be skipped due to an error during TLS
handshake, leaving the flag set and leading to a crash when
multifd_send_cleanup() calls qemu_thread_join().

2) During exit, the multifd thread clears the flag while holding the
channel lock. The counterpart at multifd_send_cleanup() reads the flag
outside of the lock and might free the mutex while the multifd thread
still has it locked.

Fix the first issue by setting the flag right before creating the
thread. Rename it from p->running to p->thread_created to clarify its
usage.

Fix the second issue by not clearing the flag at the multifd thread
exit. We don't have any use for that.
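
A sketch of the two sides after the fix:

    /* creation side: set the flag only when the thread really exists */
    p->thread_created = true;
    qemu_thread_create(&p->thread, p->name, multifd_send_thread, p,
                       QEMU_THREAD_JOINABLE);

    /* cleanup side: the thread itself never clears the flag, so this
     * read cannot race with a clear done under the channel lock */
    if (p->thread_created) {
        qemu_thread_join(&p->thread);
    }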

Note that these bugs are straightforward logic issues and not race
conditions. There is still a gap for races to affect this code due to
multifd_send_cleanup() being allowed to run concurrently with the
thread creation loop. This issue is solved in the next patches.

Cc: qemu-stable <qemu-stable@nongnu.org>
Fixes: 2964714015 ("migration/tls: add support for multifd tls-handshake")
Reported-by: Avihai Horon <avihaih@nvidia.com>
Reported-by: chenyuhui5@huawei.com
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240206215118.6171-3-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-07 09:53:18 +08:00
Fabiano Rosas e1921f10d9 migration/multifd: Join the TLS thread
We're currently leaking the resources of the TLS thread by not joining
it and by overwriting the p->thread pointer altogether.

Fixes: a1af605bd5 ("migration/multifd: fix hangup with TLS-Multifd due to blocking handshake")
Cc: qemu-stable <qemu-stable@nongnu.org>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240206215118.6171-2-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-07 09:53:18 +08:00
Avihai Horon 3205bebd4f migration: Fix logic of channels and transport compatibility check
The commit in the Fixes line mistakenly modified the channels and
transport compatibility check logic, so it now checks multi-channel
support only for the socket transport type.

Thus, running multifd migration using a transport other than socket that
is incompatible with multi-channels (such as "exec") would lead to a
segmentation fault instead of an error message.
For example:
  (qemu) migrate_set_capability multifd on
  (qemu) migrate -d "exec:cat > /tmp/vm_state"
  Segmentation fault (core dumped)

Fix it by checking multi-channel compatibility for all transport types.
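
A sketch of the intended check (helper names follow migration.c, but this
is not the exact diff):

    if (migration_needs_multiple_sockets() &&
        !transport_supports_multi_channels(addr)) {
        error_setg(errp, "Migration requires multi-channel URIs (e.g. tcp)");
        return false;
    }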

Cc: qemu-stable <qemu-stable@nongnu.org>
Fixes: d95533e1cd ("migration: modify migration_channels_and_uri_compatible() for new QAPI syntax")
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/20240125162528.7552-2-avihaih@nvidia.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-07 09:53:00 +08:00
Ilya Leoshkevich a1a9800e97 meson: Link with libinotify on FreeBSD
make vm-build-freebsd fails with:

    ld: error: undefined symbol: inotify_init1
    >>> referenced by filemonitor-inotify.c:183 (../src/util/filemonitor-inotify.c:183)
    >>>               util_filemonitor-inotify.c.o:(qemu_file_monitor_new) in archive libqemuutil.a

On FreeBSD the inotify functions are defined in libinotify.so. Add it
to the dependencies.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-ID: <20240206002344.12372-5-iii@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
2024-02-06 10:27:50 +01:00
Ilya Leoshkevich 3faebbcd64 test-util-filemonitor: Adapt to the FreeBSD inotify rename semantics
Unlike on Linux, on FreeBSD renaming a file when the destination
already exists results in an IN_DELETE event for that existing file:

    $ FILEMONITOR_DEBUG=1 build/tests/unit/test-util-filemonitor
    Rename /tmp/test-util-filemonitor-K13LI2/fish/one.txt -> /tmp/test-util-filemonitor-K13LI2/two.txt
    Event id=200000000 event=2 file=one.txt
    Queue event id 200000000 event 2 file one.txt
    Queue event id 100000000 event 2 file two.txt
    Queue event id 100000002 event 2 file two.txt
    Queue event id 100000000 event 0 file two.txt
    Queue event id 100000002 event 0 file two.txt
    Event id=100000000 event=0 file=two.txt
    Expected event 0 but got 2

This difference in behavior is not expected to break real users, so
teach the test to accept it.

Suggested-by: "Daniel P. Berrange" <berrange@redhat.com>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Message-ID: <20240206002344.12372-4-iii@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
2024-02-06 10:27:50 +01:00
Ilya Leoshkevich fceffd6b3f tests/vm/freebsd: Reload the sshd configuration
After console_sshd_config(), the SSH server needs to be nudged to pick
up the new configs. The scripts for the other BSD flavors already do
this with a reboot, but a simple reload is sufficient.

Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Message-ID: <20240206002344.12372-3-iii@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
2024-02-06 10:27:50 +01:00
Ilya Leoshkevich 7f0bf5ea41 tests/vm: Set UseDNS=no in the sshd configuration
make vm-build-freebsd sometimes fails with "Connection timed out during
banner exchange". The client strace shows:

    13:59:30 write(3, "SSH-2.0-OpenSSH_9.3\r\n", 21) = 21
    13:59:30 getpid()                       = 252655
    13:59:30 poll([{fd=3, events=POLLIN}], 1, 5000) = 1 ([{fd=3, revents=POLLIN}])
    13:59:32 read(3, "S", 1)                = 1
    13:59:32 poll([{fd=3, events=POLLIN}], 1, 3625) = 1 ([{fd=3, revents=POLLIN}])
    13:59:32 read(3, "S", 1)                = 1
    13:59:32 poll([{fd=3, events=POLLIN}], 1, 3625) = 1 ([{fd=3, revents=POLLIN}])
    13:59:32 read(3, "H", 1)                = 1

There is a 2s delay during connection, and ConnectTimeout is set to 1.
Raising it makes the issue go away, but we can do better. The server
truss shows:

    888: 27.811414714 socket(PF_INET,SOCK_DGRAM|SOCK_CLOEXEC,0) = 5 (0x5)
    888: 27.811765030 connect(5,{ AF_INET 10.0.2.3:53 },16) = 0 (0x0)
    888: 27.812166941 sendto(5,"\^Z/\^A\0\0\^A\0\0\0\0\0\0\^A2"...,39,0,NULL,0) = 39 (0x27)
    888: 29.363970743 poll({ 5/POLLRDNORM },1,5000) = 1 (0x1)

So the delay is due to a DNS query. Disable DNS queries in the server
config.

Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Message-ID: <20240206002344.12372-2-iii@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
2024-02-06 10:27:50 +01:00
Philippe Mathieu-Daudé d0143fa9ee target/s390x: Prefer fast cpu_env() over slower CPU QOM cast macro
Mechanical patch produced by running the command documented in the
scripts/coccinelle/cpu_env.cocci_template header.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-ID: <20240129164514.73104-25-philmd@linaro.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>
2024-02-06 10:27:50 +01:00
Ilya Leoshkevich eb14b021f8 tests/tcg/s390x: Test CONVERT TO BINARY
Check the corner cases of CVB, CVBY, and CVBG.

Co-developed-by: Pavel Zbitskiy <pavel.zbitskiy@gmail.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Tested-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Message-ID: <20240205205830.6425-5-iii@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
2024-02-06 09:51:37 +01:00
Ilya Leoshkevich 5b003b59ac tests/tcg/s390x: Test CONVERT TO DECIMAL
Check the corner cases of CVD, CVDY, and CVDG.

Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Message-ID: <20240205205830.6425-4-iii@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
2024-02-06 09:51:37 +01:00
Ilya Leoshkevich b4b8d58e56 target/s390x: Emulate CVB, CVBY and CVBG
Convert to Binary - counterparts of the already implemented Convert
to Decimal (CVD*) instructions.
Example from the Principles of Operation: 25594C becomes 63FA.

Co-developed-by: Pavel Zbitskiy <pavel.zbitskiy@gmail.com>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-ID: <20240205205830.6425-3-iii@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
2024-02-06 09:51:37 +01:00
Ilya Leoshkevich a6e55a82e9 target/s390x: Emulate CVDG
CVDG is the same as CVD, except that it converts 64 bits into 128,
rather than 32 into 64. Create a new helper, which uses Int128
wrappers.
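
For illustration, a conversion via the Int128 wrappers might look like this
(a sketch, not the actual helper):

    #include "qemu/int128.h"

    /* Convert a 64-bit binary value into 128-bit packed decimal (BCD):
     * one decimal digit per nibble, sign code in the lowest nibble. */
    static Int128 convert_to_decimal128(int64_t v)
    {
        Int128 res = int128_make64(v < 0 ? 0xd : 0xc);  /* sign nibble */
        uint64_t mag = v < 0 ? -(uint64_t)v : (uint64_t)v;
        int shift;

        for (shift = 4; mag != 0; shift += 4) {
            res = int128_or(res, int128_lshift(int128_make64(mag % 10),
                                               shift));
            mag /= 10;
        }
        return res;
    }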

Reported-by: Ido Plat <Ido.Plat@ibm.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Message-ID: <20240205205830.6425-2-iii@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
2024-02-06 09:51:37 +01:00
Mark Kanda 04accf43df oslib-posix: initialize backend memory objects in parallel
QEMU initializes preallocated backend memory as the objects are parsed from
the command line. This is not optimal in some cases (e.g. memory spanning
multiple NUMA nodes) because the memory objects are initialized in series.

Allow the initialization to occur in parallel (asynchronously). In order to
ensure optimal thread placement, asynchronous initialization requires prealloc
context threads to be in use.

Signed-off-by: Mark Kanda <mark.kanda@oracle.com>
Message-ID: <20240131165327.3154970-2-mark.kanda@oracle.com>
Tested-by: Mario Casquero <mcasquer@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
2024-02-06 08:15:22 +01:00
David Hildenbrand 540a1abbf0 memory-device: reintroduce memory region size check
We used to check that the memory region size is a multiple of the overall
requested address alignment for the device memory address.

We removed that check because there are cases (i.e., hv-balloon) where
devices unconditionally request a very large address alignment
(i.e., 32 GiB), but the actual memory device size might not be a multiple
of that alignment.

However, this change:

(a) allows for some practically impossible DIMM sizes, like "1GB+1 byte".
(b) allows for DIMMs that partially cover hugetlb pages, previously
    reported in [1].

Neither scenario makes any sense: we might even waste memory.

So let's reintroduce that check, but only check that the
memory region size is a multiple of the memory region alignment (i.e.,
page size, huge page size), and not of any additional memory device
requirements communicated using md->get_min_alignment().

The following examples now fail again as expected:

(a) 1M with 2M THP
 qemu-system-x86_64 -m 4g,maxmem=16g,slots=1 -S -nodefaults -nographic \
                     -object memory-backend-ram,id=mem1,size=1M \
                     -device pc-dimm,id=dimm1,memdev=mem1
 -> backend memory size must be multiple of 0x200000

(b) 1G+1byte

 qemu-system-x86_64 -m 4g,maxmem=16g,slots=1 -S -nodefaults -nographic \
                   -object memory-backend-ram,id=mem1,size=1073741825B \
                   -device pc-dimm,id=dimm1,memdev=mem1
 -> backend memory size must be multiple of 0x200000

(c) Unaligned hugetlb size (2M)

 qemu-system-x86_64 -m 4g,maxmem=16g,slots=1 -S -nodefaults -nographic \
                   -object memory-backend-file,id=mem1,mem-path=/dev/hugepages/tmp,size=511M \
                   -device pc-dimm,id=dimm1,memdev=mem1
 -> backend memory size must be multiple of 0x200000

(d) Unaligned hugetlb size (1G)

 qemu-system-x86_64 -m 4g,maxmem=16g,slots=1 -S -nodefaults -nographic \
                    -object memory-backend-file,id=mem1,mem-path=/dev/hugepages1G/tmp,size=2047M \
                    -device pc-dimm,id=dimm1,memdev=mem1
 -> backend memory size must be multiple of 0x40000000

Note that this fix depends on a hv-balloon change to communicate its
additional alignment requirements using get_min_alignment() instead of
through the memory region.

[1] https://lkml.kernel.org/r/f77d641d500324525ac036fe1827b3070de75fc1.1701088320.git.mprivozn@redhat.com

Message-ID: <20240117135554.787344-3-david@redhat.com>
Reported-by: Zhenyu Zhang <zhenyzha@redhat.com>
Reported-by: Michal Privoznik <mprivozn@redhat.com>
Fixes: eb1b7c4bd4 ("memory-device: Drop size alignment check")
Tested-by: Zhenyu Zhang <zhenyzha@redhat.com>
Tested-by: Mario Casquero <mcasquer@redhat.com>
Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
2024-02-06 08:14:59 +01:00
Peter Xu 488c84acb4 migration/multifd: Optimize sender side to be lockless
When reviewing my attempt to refactor send_prepare(), Fabiano suggested we
try dropping the mutex in the multifd code [1].

I thought about that before but I never tried to change the code.  Now
maybe it's time to give it a stab.  This only optimizes the sender side.

The trick here is that multifd has a clear producer/consumer model: the
migration main thread publishes requests (either pending_job or
pending_sync), while the multifd sender threads are the consumers.  Here
we don't have a lot of complicated data sharing, and the jobs can
logically be submitted locklessly.

Arm the code with atomic weapons.  Two things worth mentioning:

  - For multifd_send_pages(): we can use qatomic_load_acquire() when trying
  to find a free channel, but that's expensive if we attach one ACQUIRE per
  channel.  Instead, keep the qatomic_read() on reading the pending_job
  flag as we already do, and use one smp_mb_acquire() after the loop
  to guarantee the memory ordering.

  - For pending_sync: it doesn't require any extra data since now
  p->flags is never touched; it should be safe to not use a memory
  barrier.  That's different from pending_job.

Provide rich comments for all the lockless operations to state how they are
paired.  With that, we can remove the mutex.
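
A sketch of the multifd_send_pages() side (abridged):

    /* migration thread: look for a free channel with plain reads */
    for (i = 0; ; i = (i + 1) % migrate_multifd_channels()) {
        p = &multifd_send_state->params[i];
        if (!qatomic_read(&p->pending_job)) {
            break;
        }
    }

    /*
     * One ACQUIRE for the whole loop instead of one per channel; pairs
     * with the store-release the sender thread uses when clearing
     * pending_job after consuming a job.
     */
    smp_mb_acquire();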

[1] https://lore.kernel.org/r/87o7d1jlu5.fsf@suse.de

Suggested-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-24-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-06 11:05:49 +08:00
Richard Henderson 23c5692abc tcg/tci: Support TCG_COND_TST{EQ,NE}
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-02-05 22:45:41 +00:00
Richard Henderson 585b7a4247 tcg/s390x: Support TCG_COND_TST{EQ,NE}
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-02-05 22:45:41 +00:00
Thomas Huth 6a41a62171 docs/about: Deprecate the old "power5+" and "power7+" CPU names
For consistency we should drop the names with a "+" in them in the
long run.

Message-ID: <20240117141054.73841-3-thuth@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
2024-02-05 14:21:21 +01:00
Thomas Huth 5bfb75f152 target/ppc/cpu-models: Rename power5+ and power7+ for new QOM naming rules
The character "+" is now forbidden in QOM device names (see commit
b447378e12 - "Limit type names to alphanumerical and some few special
characters"). For the "power5+" and "power7+" CPU names, there is
currently a hack in type_name_is_valid() to still allow them for
compatibility reasons. However, there is a much nicer solution for this:
Simply use aliases! This way we can still support the old names without
the need for the ugly hack in type_name_is_valid().

Message-ID: <20240117141054.73841-2-thuth@redhat.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
2024-02-05 14:21:21 +01:00
Sven Schnelle 8b09b7fe47 hw/scsi/lsi53c895a: add missing decrement of reentrancy counter
When the maximum count of SCRIPTS instructions is reached, the code
stops execution and returns, but fails to decrement the reentrancy
counter. This effectively renders the SCSI controller unusable
because on the next entry the reentrancy counter is still above the limit.
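
A sketch of the fix, assuming the counter introduced by the earlier CVE
fix is named reentrancy_level:

    static void lsi_execute_script(LSIState *s)
    {
        /* ... */
        s->reentrancy_level++;
        /* ... */
        if (insn_processed > LSI_MAX_INSN) {
            /* runaway SCRIPTS loop: stop executing */
            /* ... */
            s->reentrancy_level--;   /* the previously missing decrement */
            return;
        }
        /* ... */
    }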

This bug was seen on HP-UX 10.20 which seems to trigger SCRIPTS
loops.

Fixes: b987718bbb ("hw/scsi/lsi53c895a: Fix reentrancy issues in the LSI controller (CVE-2023-0330)")
Signed-off-by: Sven Schnelle <svens@stackframe.org>
Message-ID: <20240128202214.2644768-1-svens@stackframe.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Tested-by: Helge Deller <deller@gmx.de>
Signed-off-by: Thomas Huth <thuth@redhat.com>
2024-02-05 14:21:21 +01:00
Peter Xu 98ea497d8b migration/multifd: Fix MultiFDSendParams.packet_num race
As correctly reported by Fabiano [1] (which, per Fabiano, traces back to
Elena's initial report in Oct 2023), the way MultiFDSendParams.packet_num
is assigned and stored is buggy.  Consider two consecutive operations: (1)
queue a job onto multifd send thread X, then (2) queue another sync request
onto the same send thread X.  MultiFDSendParams.packet_num will then be
assigned twice, and the first assignment can already get lost.

To avoid that, we move the packet_num assignment from p->packet_num into
where the thread will fill in the packet.  Use atomic operations to protect
the field, making sure there's no race.
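
A sketch of the new assignment site (field layout assumed):

    /* in multifd_send_fill_packet(): claim a unique number at fill time */
    uint64_t packet_num = qatomic_fetch_inc(&multifd_send_state->packet_num);
    packet->packet_num = cpu_to_be64(packet_num);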

Note that atomic fetch_add() may not be good for scaling purposes;
however, multifd should be fine, as the number of threads should normally
not go beyond 16.  Let's leave that concern for later and fix the issue
first.

There's also a trick needed to make it always work for the uint64_t packet
number even on 32-bit hosts.  Switch to uintptr_t for now to simplify the
case.  It will make the packet number overflow more easily on 32-bit, but
that shouldn't be a major concern for now, as 32-bit systems are not the
main audience for the performance concerns multifd wants to address.

We also need to move the multifd_send_state definition up, so that
multifd_send_fill_packet() can reference it.

[1] https://lore.kernel.org/r/87o7d1jlu5.fsf@suse.de

Reported-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-23-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-05 14:42:11 +08:00
Peter Xu cde85c37ca migration/multifd: Stick with send/recv on function names
Most of the multifd code uses send/recv to represent the two sides, but
some rare cases use save/load.

Since send/recv is the majority, replace the save/load use cases with
send/recv globally.  Now we have reached a consensus on the naming.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-22-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-05 14:42:11 +08:00
Peter Xu 5e6ea8a1d6 migration/multifd: Cleanup multifd_load_cleanup()
Use similar logic to clean up the recv side.

Note that multifd_recv_terminate_threads() may need some rework similar
to the sender side's, but let's leave that for later.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-21-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-05 14:42:10 +08:00
Peter Xu 12808db3b8 migration/multifd: Cleanup multifd_save_cleanup()
Shrink the function by moving relevant work into helpers: move the thread
join()s into multifd_send_terminate_threads(), then create two more helpers
to cover channel/state cleanups.

Add a TODO entry for the thread termination process because p->running is
still buggy.  We need to fix it at some point, but that is not covered yet.

Suggested-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-20-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-05 14:42:10 +08:00
Peter Xu f88f86c4ee migration/multifd: Rewrite multifd_queue_page()
The current multifd_queue_page() is not easy to read and follow.  It is
not good for a few reasons:

  - There is no helper at all to show what exactly a condition means; in
  short, readability is low.

  - It relies on pages->ramblock being cleared to detect an empty queue.
  That slightly overloads the ramblock pointer, per Fabiano [1], which I
  also agree with.

  - It contains a self-recursion, even though that is not necessary.

Rewrite this function.  We add some comments to make it even clearer on
what it does.

[1] https://lore.kernel.org/r/87wmrpjzew.fsf@suse.de

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-19-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-05 14:42:10 +08:00
Peter Xu 3b40964a86 migration/multifd: Change retval of multifd_send_pages()
Using int is overkill when there are only two options.  Change it to a
boolean.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-18-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-05 14:42:10 +08:00
Peter Xu d6556d174a migration/multifd: Change retval of multifd_queue_page()
Using int is overkill when there are only two options.  Change it to a
boolean.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-17-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-05 14:42:10 +08:00
Peter Xu 3ab4441d97 migration/multifd: Split multifd_send_terminate_threads()
Split multifd_send_terminate_threads() into two functions:

  - multifd_send_set_error(): used when an error happened on the sender
    side, set error and quit state only

  - multifd_send_terminate_threads(): used only by the main thread to kick
    all multifd send threads out of sleep, for the last recycling.

Use multifd_send_set_error() in the three old call sites where only the
error will be set.

Use multifd_send_terminate_threads() in the last one where the main thread
will kick the multifd threads at last in multifd_save_cleanup().

Both helpers will need to set quitting=1.
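
A sketch of the split (field names hypothetical):

    static void multifd_send_set_error(Error *err)
    {
        /* error path: record the error and set the quit state only */
        migrate_set_error(migrate_get_current(), err);
        qatomic_set(&multifd_send_state->exiting, 1);
    }

    static void multifd_send_terminate_threads(void)
    {
        int i;

        /* main thread only: kick every sender out of its sleep for
         * the final recycling */
        qatomic_set(&multifd_send_state->exiting, 1);
        for (i = 0; i < migrate_multifd_channels(); i++) {
            qemu_sem_post(&multifd_send_state->params[i].sem);
        }
    }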

Suggested-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-16-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-05 14:42:10 +08:00
Peter Xu 859ebaf346 migration/multifd: Forbid spurious wakeups
Multifd's logic is now designed to have no spurious wakeups.  I still
remember a talk with Juan, and he seemed to agree we should drop it now;
if my memory is right, it was there because multifd used to hit spurious
wakeups while still being debugged.

Let's drop it and see what can explode, as long as it's not reaching
soft-freeze.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-15-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-05 14:42:10 +08:00
Peter Xu 25a1f87875 migration/multifd: Move header prepare/fill into send_prepare()
This patch redefines the interface of ->send_prepare().  It further
simplifies multifd_send_thread(), especially on the zero copy path.

Now with the new interface, we require the hook to do all the work of
preparing the IOVs to send.  Once it completes, the IOVs should be ready
to be dumped into the specific multifd QIOChannel later.

So now the API looks like:

  p->pages ----------->  send_prepare() -------------> IOVs

This also prepares for the case where the input can be extended beyond
p->pages.  But that's for later.

This patch will achieve a similar goal to what Fabiano used to propose here:

https://lore.kernel.org/r/20240126221943.26628-1-farosas@suse.de

However the send() interface may not be necessary.  I'm boldly attaching a
"Co-developed-by" for Fabiano.

Co-developed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-14-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-05 14:42:10 +08:00
Peter Xu 452b205702 migration/multifd: multifd_send_prepare_header()
Introduce a helper multifd_send_prepare_header() to setup the header packet
for multifd sender.

It's fine to set up IOV[0] _before_ send_prepare() because the packet
buffer is already ready, even if the content is still to be filled in.
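
The helper is small enough to sketch in full (field names per
MultiFDSendParams):

    static inline void multifd_send_prepare_header(MultiFDSendParams *p)
    {
        /* IOV[0] always carries the packet header; the header contents
         * can still be filled in afterwards */
        p->iov[0].iov_len = p->packet_len;
        p->iov[0].iov_base = p->packet;
        p->iovs_num = 1;
    }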

With this helper, we can already slightly clean up the zero copy path.

Note that I explicitly put it into multifd.h, because I want it inlined
directly into multifd*.c where necessary later.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-13-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-05 14:42:10 +08:00
Peter Xu 8a9ef17380 migration/multifd: Move trace_multifd_send|recv()
Move them into fill/unfill of packets.  With that, we can further clean up
the send/recv thread procedure, and remove one more temp var.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-12-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-05 14:42:10 +08:00
Peter Xu db7e1cc510 migration/multifd: Move total_normal_pages accounting
Just like the previous patch, move the accounting for total_normal_pages on
both src/dst sides into the packet fill/unfill procedures.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-11-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-05 14:42:10 +08:00
Peter Xu 05b7ec1890 migration/multifd: Rename p->num_packets and clean it up
This field, no matter whether on the src or dest side, is only used for
debugging purposes.

They could even be removed already, except that they still more or less
provide some accounting on "how many packets are sent/recved by this
thread".  The other, more important one is called packet_num, which is
embedded in the multifd packet headers (MultiFDPacket_t).

So let's keep them for now, but make them much easier to understand by
doing the following:

  - Rename both of them to packets_sent / packets_recved; the old
  name (num_packets) is way too confusing when we already have
  MultiFDPacket_t.packets_num.

  - Avoid worrying about the "initial packet": we know we will send it,
  and that's good enough.  Whether the accounting starts at 0 or at 1
  won't matter a great deal.

  - Move them to where we send/recv the packets.  They're:

    - multifd_send_fill_packet() for senders.
    - multifd_recv_unfill_packet() for receivers.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-10-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-05 14:42:10 +08:00
Peter Xu 83c560fb42 migration/multifd: Drop pages->num check in sender thread
Now with a split SYNC handler, we always have pages->num set for
pending_job==true.  Assert it instead.
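
A sketch of the sender-thread change:

    if (qatomic_read(&p->pending_job)) {
        /* SYNC has its own handler now, so a queued job always
         * carries pages */
        assert(pages->num);
        /* ... */
    }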

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-9-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-05 14:42:10 +08:00