xemu/block
Vladimir Sementsov-Ogievskiy 1dc4718d84 block/nbd: use non-blocking connect: fix vm hang on connect()
This makes nbd's connection_co yield during reconnects, so that
reconnect doesn't block the main thread. This is very important in
case of an unavailable nbd server host: connect() call may take a long
time, blocking the main thread (and due to reconnect, it will hang
again and again with small gaps of working time during pauses between
connection attempts).

Realization notes:

 - We don't want to implement non-blocking connect() over non-blocking
 socket, because getaddrinfo() doesn't have portable non-blocking
 realization anyway, so let's just use a thread for both getaddrinfo()
 and connect().

 - We can't use qio_channel_socket_connect_async (which behaves
 similarly and starts a thread to execute connect() call), as it's relying
 on someone iterating main loop (g_main_loop_run() or something like
 this), which is not always the case.

 - We can't use thread_pool_submit_co API, as thread pool waits for all
 threads to finish (but we don't want to wait for blocking reconnect
 attempt on shutdown.

 So, we just create the thread by hand. Some additional difficulties
 are:

 - We want our connect to avoid blocking drained sections and aio context
 switches. To achieve this, we make it possible to "cancel" synchronous
 wait for the connect (which is a coroutine yield actually), still,
 the thread continues in background, and if successful, its result may be
 reused on next reconnect attempt.

 - We don't want to wait for reconnect on shutdown, so there is
 CONNECT_THREAD_RUNNING_DETACHED thread state, which means that the block
 layer is no longer interested in a result, and thread should close new
 connected socket on finish and free the state.

How to reproduce the bug, fixed with this commit:

1. Create an image on node1:
   qemu-img create -f qcow2 xx 100M

2. Start NBD server on node1:
   qemu-nbd xx

3. Start vm with second nbd disk on node2, like this:

  ./x86_64-softmmu/qemu-system-x86_64 -nodefaults -drive \
    file=/work/images/cent7.qcow2 -drive file=nbd+tcp://192.168.100.2 \
    -vnc :0 -qmp stdio -m 2G -enable-kvm -vga std

4. Access the vm through vnc (or some other way?), and check that NBD
   drive works:

   dd if=/dev/sdb of=/dev/null bs=1M count=10

   - the command should succeed.

5. Now, let's trigger nbd-reconnect loop in Qemu process. For this:

5.1 Kill NBD server on node1

5.2 run "dd if=/dev/sdb of=/dev/null bs=1M count=10" in the guest
    again. The command should fail and a lot of error messages about
    failing disk may appear as well.

    Now NBD client driver in Qemu tries to reconnect.
    Still, VM works well.

6. Make node1 unavailable on NBD port, so connect() from node2 will
   last for a long time:

   On node1 (Note, that 10809 is just a default NBD port):

   sudo iptables -A INPUT -p tcp --dport 10809 -j DROP

   After some time the guest hangs, and you may check in gdb that Qemu
   hangs in connect() call, issued from the main thread. This is the
   BUG.

7. Don't forget to drop iptables rule from your node1:

   sudo iptables -D INPUT -p tcp --dport 10809 -j DROP

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20200812145237.4396-1-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: minor wording and formatting tweaks]
Signed-off-by: Eric Blake <eblake@redhat.com>
2020-09-02 16:47:23 -05:00
..
monitor meson: convert block 2020-08-21 06:30:18 -04:00
accounting.c block: add empty account cookie type 2019-10-10 10:56:18 +02:00
aio_task.c block: introduce aio task pool 2019-10-10 10:56:17 +02:00
amend.c block/amend: Check whether the node exists 2020-07-27 12:37:25 +02:00
backup-top.c block: Drop @child_class from bdrv_child_perm() 2020-05-18 19:05:25 +02:00
backup-top.h block: introduce backup-top filter driver 2019-10-10 10:56:18 +02:00
backup.c backup: Make sure that source and target size match 2020-05-08 13:26:35 +02:00
blkdebug.c error: Eliminate error_propagate() with Coccinelle, part 2 2020-07-10 15:18:08 +02:00
blklogwrites.c error: Eliminate error_propagate() with Coccinelle, part 2 2020-07-10 15:18:08 +02:00
blkreplay.c block: Use bdrv_default_perms() 2020-05-18 19:05:25 +02:00
blkverify.c error: Eliminate error_propagate() with Coccinelle, part 2 2020-07-10 15:18:08 +02:00
block-backend.c block: fix bdrv_aio_cancel() for ENOMEDIUM requests 2020-07-21 12:00:38 +02:00
block-copy.c block/block-copy: always align copied region to cluster size 2020-08-10 17:12:46 +02:00
bochs.c block: Use bdrv_default_perms() 2020-05-18 19:05:25 +02:00
cloop.c block: Use bdrv_default_perms() 2020-05-18 19:05:25 +02:00
commit.c block: Drop @child_class from bdrv_child_perm() 2020-05-18 19:05:25 +02:00
copy-on-read.c block: Drop @child_class from bdrv_child_perm() 2020-05-18 19:05:25 +02:00
create.c block/create: Do not abort if a block driver is not available 2019-09-13 12:18:37 +02:00
crypto.c block/crypto: disallow write sharing by default 2020-07-21 10:49:02 +02:00
crypto.h block/crypto: implement the encryption key management 2020-07-06 08:49:28 +02:00
curl.c error: Eliminate error_propagate() with Coccinelle, part 1 2020-07-10 15:18:08 +02:00
dirty-bitmap.c block/dirty-bitmap: add bdrv_has_named_bitmaps helper 2020-05-28 13:15:22 -05:00
dmg-bz2.c Include qemu-common.h exactly where needed 2019-06-12 13:20:20 +02:00
dmg-lzfse.c block: adding lzfse decompressing support as a module. 2018-12-14 11:52:40 +01:00
dmg.c block: Use bdrv_default_perms() 2020-05-18 19:05:25 +02:00
dmg.h Include qemu-common.h exactly where needed 2019-06-12 13:20:20 +02:00
file-posix.c file-posix: Handle `EINVAL` fallocate return value 2020-07-21 16:28:57 +02:00
file-win32.c error: Eliminate error_propagate() with Coccinelle, part 2 2020-07-10 15:18:08 +02:00
filter-compress.c block: Use bdrv_default_perms() 2020-05-18 19:05:25 +02:00
gluster.c error: Reduce unnecessary error propagation 2020-07-10 15:18:08 +02:00
io.c block: Fix bdrv_aligned_p*v() for qiov_offset != 0 2020-07-28 15:28:47 +02:00
io_uring.c io_uring: use io_uring_cq_ready() to check for ready cqes 2020-06-05 09:54:48 +01:00
iscsi-opts.c Include qemu/module.h where needed, drop it from qemu-common.h 2019-06-12 13:18:33 +02:00
iscsi.c iscsi: return -EIO when sense fields are meaningless 2020-07-10 18:02:23 -04:00
linux-aio.c misc: Replace zero-length arrays with flexible array member (automatic) 2020-03-16 22:07:42 +01:00
meson.build block: always link with zlib 2020-09-01 01:51:51 -04:00
mirror.c block: Drop @child_class from bdrv_child_perm() 2020-05-18 19:05:25 +02:00
nbd.c block/nbd: use non-blocking connect: fix vm hang on connect() 2020-09-02 16:47:23 -05:00
nfs.c qapi: Smooth another visitor error checking pattern 2020-07-10 15:18:08 +02:00
null.c replay: add BH oneshot event for block layer 2019-10-14 17:12:48 +02:00
nvme.c block/nvme: support nested aio_poll() 2020-06-23 15:46:08 +01:00
parallels.c error: Avoid error_propagate() after migrate_add_blocker() 2020-07-10 15:18:08 +02:00
parallels.h Clean up includes 2018-02-09 05:05:11 +01:00
qapi-sysemu.c block: Move system emulator QMP commands to block/qapi-sysemu.c 2020-03-06 17:15:38 +01:00
qapi.c block: Fix VM size field width in snapshot dump 2020-02-20 16:43:42 +01:00
qcow.c qcow: Tolerate backing_fmt= 2020-07-14 15:18:59 +02:00
qcow2-bitmap.c qcow2: Release read-only bitmaps when inactivated 2020-08-03 08:59:37 -05:00
qcow2-cache.c core: replace getpagesize() with qemu_real_host_page_size 2019-10-26 15:38:06 +02:00
qcow2-cluster.c qcow2: Assert that expand_zero_clusters_in_l1() does not support subclusters 2020-08-25 10:20:15 +02:00
qcow2-refcount.c qcow2: Add subcluster support to check_refcounts_l2() 2020-08-25 08:33:20 +02:00
qcow2-snapshot.c qcow2: Allow resize of images with internal snapshots 2020-05-05 13:17:36 +02:00
qcow2-threads.c qcow2: add zstd cluster compression 2020-05-13 14:20:31 +02:00
qcow2.c qcow2: Allow preallocation and backing files if extended_l2 is set 2020-08-25 09:20:04 +02:00
qcow2.h qcow2: Add the 'extended_l2' option and the QCOW2_INCOMPAT_EXTL2 bit 2020-08-25 09:19:55 +02:00
qed-check.c block/qed: add missed coroutine_fn markers 2019-04-30 15:29:00 +02:00
qed-cluster.c qed: protect table cache with CoMutex 2017-07-17 11:34:11 +08:00
qed-l2-cache.c qed: protect table cache with CoMutex 2017-07-17 11:34:11 +08:00
qed-table.c block/qed: add missed coroutine_fn markers 2019-04-30 15:29:00 +02:00
qed.c qapi: Smooth another visitor error checking pattern 2020-07-10 15:18:08 +02:00
qed.h qed: Simplify backing reads 2020-07-06 10:34:14 +02:00
quorum.c error: Reduce unnecessary error propagation 2020-07-10 15:18:08 +02:00
raw-format.c error: Eliminate error_propagate() with Coccinelle, part 2 2020-07-10 15:18:08 +02:00
rbd.c qapi: Smooth another visitor error checking pattern 2020-07-10 15:18:08 +02:00
replication.c error: Reduce unnecessary error propagation 2020-07-10 15:18:08 +02:00
sheepdog.c sheepdog: Add trivial backing_fmt support 2020-07-14 15:18:59 +02:00
snapshot.c block/snapshot: rename Error ** parameter to more common errp 2019-12-18 08:43:19 +01:00
ssh.c qapi: Smooth another visitor error checking pattern 2020-07-10 15:18:08 +02:00
stream.c block: Add support to warn on backing file change without format 2020-07-14 15:18:59 +02:00
throttle-groups.c throttle-groups: Move ThrottleGroup typedef to header 2020-08-27 14:04:54 -04:00
throttle.c error: Eliminate error_propagate() with Coccinelle, part 2 2020-07-10 15:18:08 +02:00
trace-events qcow2: Make Qcow2AioTask store the full host offset 2020-08-25 08:33:20 +02:00
trace.h trace: switch position of headers to what Meson requires 2020-08-21 06:18:24 -04:00
vdi.c error: Avoid error_propagate() after migrate_add_blocker() 2020-07-10 15:18:08 +02:00
vhdx-endian.c Include qemu-common.h exactly where needed 2019-06-12 13:20:20 +02:00
vhdx-log.c block: Add flags to bdrv(_co)_truncate() 2020-04-30 17:51:07 +02:00
vhdx.c error: Avoid error_propagate() after migrate_add_blocker() 2020-07-10 15:18:08 +02:00
vhdx.h block/vhdx: Use IEC binary prefixes for size constants 2019-04-30 15:29:00 +02:00
vmdk.c block/vmdk: Remove superfluous breaks 2020-09-01 08:37:28 +02:00
vpc.c error: Avoid error_propagate() after migrate_add_blocker() 2020-07-10 15:18:08 +02:00
vvfat.c error: Avoid error_propagate() after migrate_add_blocker() 2020-07-10 15:18:08 +02:00
win32-aio.c Include qemu/module.h where needed, drop it from qemu-common.h 2019-06-12 13:18:33 +02:00
write-threshold.c qapi: Drop qapi_event_send_FOO()'s Error ** argument 2018-08-28 18:21:38 +02:00