xemu/migration
Peter Xu 6621883f93 migration: Fix potential race on postcopy_qemufile_src
The postcopy_qemufile_src object should be owned by one thread at a time:
either the main thread (e.g. at the beginning or at the end of migration)
or the return path thread (during a preempt-enabled postcopy migration).
Otherwise, access to the object might be racy.

postcopy_preempt_shutdown_file() is potentially racy because it is called
on the main thread at the end phase of migration, when the return path
thread has not yet been recycled; the recycle happens in
await_return_path_close_on_source(), which comes after this point.

This means it is logically possible for the main thread and the return path
thread to operate on the same qemufile at the same time, and I don't think
qemufile is thread safe at all.

postcopy_preempt_shutdown_file() used to be needed because that's where we
send EOS to dest so that dest can safely shut down the preempt thread.

To avoid the possible race, remove the only place where it can happen.
Instead, we figure out another way to safely close the preempt thread on
dest.

The core idea during postcopy on deciding "when to stop" is that dest will
send a postcopy SHUT message to src, telling src that all data is there.
Hence it may be better to shut down the dest preempt thread directly on the
dest node.

This patch proposes such a way: we change postcopy_prio_thread_created
into PreemptThreadStatus, so that we kick the preempt thread on dest qemu
by a sequence of:

  mis->preempt_thread_status = PREEMPT_THREAD_QUIT;
  qemu_file_shutdown(mis->postcopy_qemufile_dst);

Here shutdown() is probably the easiest way so far to kick the preempt
thread out of a blocked qemu_get_be64().  The thread then reads
preempt_thread_status to make sure the wakeup is not a network failure but
a deliberate request to quit the thread.
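
The same pattern can be sketched outside QEMU. The example below is only an
illustration under stated assumptions, not the code in this patch: it uses
plain POSIX sockets and pthreads in place of QEMUFile, and the names
DemoThreadStatus, DemoState, demo_worker() and demo_kick_worker() are
hypothetical. It shows a worker blocked in read() being woken by shutdown(),
then checking a status flag to tell a requested quit from a channel failure.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for the patch's PreemptThreadStatus. */
typedef enum {
    DEMO_THREAD_NONE = 0,
    DEMO_THREAD_CREATED,
    DEMO_THREAD_QUIT,
} DemoThreadStatus;

typedef struct {
    int fd;                          /* receiving end of the channel */
    _Atomic DemoThreadStatus status; /* written by main, read by worker */
    int clean_exit;                  /* 1 = requested quit, 0 = failure */
} DemoState;

/* Worker: blocks reading 8-byte headers, like a blocked qemu_get_be64(). */
static void *demo_worker(void *opaque)
{
    DemoState *s = opaque;
    uint64_t header;

    for (;;) {
        ssize_t n = read(s->fd, &header, sizeof(header));
        if (n == (ssize_t)sizeof(header)) {
            continue; /* a real worker would handle the page here */
        }
        /* EOF, error or short read: decide why we were woken up. */
        s->clean_exit = (atomic_load(&s->status) == DEMO_THREAD_QUIT);
        return NULL;
    }
}

/* Returns 1 if the worker saw a requested quit, 0 if not, -1 on setup error. */
int demo_kick_worker(void)
{
    int sv[2];
    DemoState s;
    pthread_t tid;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) {
        return -1;
    }
    s.fd = sv[0];
    s.clean_exit = 0;
    atomic_init(&s.status, DEMO_THREAD_CREATED);

    if (pthread_create(&tid, NULL, demo_worker, &s) != 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    /* Order matters: publish the status first, then kick the reader. */
    atomic_store(&s.status, DEMO_THREAD_QUIT);
    shutdown(s.fd, SHUT_RDWR);

    /* clean_exit is safe to read only after the join. */
    assert(pthread_join(tid, NULL) == 0);
    close(sv[0]);
    close(sv[1]);
    return s.clean_exit;
}
```

Setting the status before calling shutdown() is what lets the worker
classify the wakeup: if the order were reversed, the worker could return
from read() and still observe the old status, and would treat the quit as a
failure.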

We could have avoided the extra status and relied on the migration status
alone.  The problem is that postcopy_ram_incoming_cleanup() is called early
enough that we're still in POSTCOPY_ACTIVE no matter what.  So keep it
simple and introduce the status.

One flag, x-preempt-pre-7-2, is added to keep the old pre-7.2 behavior of
postcopy preempt.

Fixes: 9358982744 ("migration: Send requested page directly in rp-return thread")
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-04-12 21:44:38 +02:00
block-dirty-bitmap.c migration: Rename res_{postcopy,precopy}_only 2023-02-15 20:04:30 +01:00
block.c migration/block: replace uses of blk_nb_sectors that do not check result 2023-04-11 16:40:53 +02:00
block.h migration: disable auto-converge during bulk block migration 2017-09-27 11:27:14 +01:00
channel-block.c io: Add support for MSG_PEEK for socket channel 2023-02-06 19:22:56 +01:00
channel-block.h migration: introduce a QIOChannel impl for BlockDriverState VMState 2022-06-22 19:33:43 +01:00
channel.c migration: check magic value for deciding the mapping of channels 2023-02-06 19:22:57 +01:00
channel.h migration: check magic value for deciding the mapping of channels 2023-02-06 19:22:57 +01:00
colo-failover.c migration/colo: Improve an x-colo-lost-heartbeat error message 2023-02-23 14:10:17 +01:00
colo.c error: Drop superfluous #include "qapi/qmp/qerror.h" 2023-02-23 13:56:14 +01:00
dirtyrate.c *: Add missing includes of qemu/error-report.h 2023-03-22 15:06:57 +00:00
dirtyrate.h migration/dirtyrate: Refactor dirty page rate calculation 2022-07-20 12:15:08 +01:00
exec.c *: Add missing includes of qemu/error-report.h 2023-03-22 15:06:57 +00:00
exec.h migration: Export exec.c functions in its own file 2017-06-01 18:49:22 +02:00
fd.c monitor: Use getter/setter functions for cur_mon 2020-10-09 07:08:19 +02:00
fd.h migration: Fix fd protocol for incoming defer 2019-06-05 12:43:55 +02:00
global_state.c migration: Silence compiler warning in global_state_store_running() 2020-10-02 12:28:48 +01:00
meson.build migration: Introduce interface query-migrationthreads 2023-02-06 19:22:57 +01:00
migration-hmp-cmds.c error: Drop superfluous #include "qapi/qmp/qerror.h" 2023-02-23 13:56:14 +01:00
migration.c migration: Fix potential race on postcopy_qemufile_src 2023-04-12 21:44:38 +02:00
migration.h migration: Fix potential race on postcopy_qemufile_src 2023-04-12 21:44:38 +02:00
multifd-zlib.c multifd: Create page_size fields into both MultiFD{Recv,Send}Params 2022-12-15 10:30:37 +01:00
multifd-zstd.c multifd: Create page_size fields into both MultiFD{Recv,Send}Params 2022-12-15 10:30:37 +01:00
multifd.c migration/multifd: correct multifd_send_thread to trace the flags 2023-03-16 16:07:07 +01:00
multifd.h migration/multifd: Move load_cleanup inside incoming_state_destroy 2023-02-13 03:45:40 +01:00
page_cache.c migration: Fix cache_init()'s "Failed to allocate" error messages 2021-02-08 11:19:51 +00:00
page_cache.h migration: Clean up signed vs. unsigned XBZRLE cache-size 2021-02-08 11:19:51 +00:00
postcopy-ram.c migration: Fix potential race on postcopy_qemufile_src 2023-04-12 21:44:38 +02:00
postcopy-ram.h migration: Postpone postcopy preempt channel to be after main 2023-02-11 16:51:09 +01:00
qemu-file.c migration/qemu-file: Add qemu_file_get_to_fd() 2023-02-15 19:09:25 +01:00
qemu-file.h migration/qemu-file: Add qemu_file_get_to_fd() 2023-02-15 19:09:25 +01:00
ram.c migration: Rename res_{postcopy,precopy}_only 2023-02-15 20:04:30 +01:00
ram.h migration: Use atomic ops properly for page accountings 2022-12-15 10:30:37 +01:00
rdma.c migration/rdma: Remove deprecated variable rdma_return_path 2023-03-16 16:07:07 +01:00
rdma.h migration: Export rdma.c functions in its own file 2017-06-01 18:49:23 +02:00
savevm.c migration: Rename res_{postcopy,precopy}_only 2023-02-15 20:04:30 +01:00
savevm.h migration: Rename res_{postcopy,precopy}_only 2023-02-15 20:04:30 +01:00
socket.c migration: Postcopy preemption preparation on channel creation 2022-07-20 12:15:08 +01:00
socket.h migration: Postcopy preemption preparation on channel creation 2022-07-20 12:15:08 +01:00
target.c migration: fix populate_vfio_info 2023-03-16 16:07:07 +01:00
threadinfo.c migration: Introduce interface query-migrationthreads 2023-02-06 19:22:57 +01:00
threadinfo.h migration: Introduce interface query-migrationthreads 2023-02-06 19:22:57 +01:00
tls.c cleanup: Tweak and re-run return_directly.cocci 2022-12-14 16:19:35 +01:00
tls.h migration: Add helpers to detect TLS capability 2022-07-20 12:15:08 +01:00
trace-events migration: Remove unused res_compatible 2023-02-15 20:04:30 +01:00
trace.h trace: switch position of headers to what Meson requires 2020-08-21 06:18:24 -04:00
vmstate-types.c Move CPU softfloat unions to cpu-float.h 2022-04-06 14:31:43 +02:00
vmstate.c migration: Add canary to VMSTATE_END_OF_LIST 2023-02-06 19:22:56 +01:00
xbzrle.c migration/xbzrle: fix out-of-bounds write with axv512 2023-03-16 16:07:07 +01:00
xbzrle.h AVX512 support for xbzrle_encode_buffer 2023-02-11 16:51:09 +01:00
yank_functions.c migration: Move the yank unregister of channel_close out 2021-07-26 12:45:03 +01:00
yank_functions.h migration: Move the yank unregister of channel_close out 2021-07-26 12:45:03 +01:00