From beb71c1c02ed05a705f5af3e0395a35cfa7bb02b Mon Sep 17 00:00:00 2001
From: Max Reitz
Date: Tue, 11 Aug 2020 10:41:50 +0200
Subject: [PATCH 1/6] iotests/059: Fix reference output

As of the patch to flush qemu-img's "Formatting" message before the
error message, 059 has been broken for vmdk. Fix it.

Fixes: 4e2f4418784da09cb106264340241856cd2846df
       ("qemu-img: Flush stdout before before potential stderr messages")
Signed-off-by: Max Reitz
Message-Id: <20200811084150.326377-1-mreitz@redhat.com>
Reviewed-by: Eric Blake
Signed-off-by: Eric Blake
---
 tests/qemu-iotests/059.out | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/059.out b/tests/qemu-iotests/059.out
index 6d127e28d8..2b83c0c8b6 100644
--- a/tests/qemu-iotests/059.out
+++ b/tests/qemu-iotests/059.out
@@ -19,8 +19,8 @@ file format: IMGFMT
 virtual size: 2 GiB (2147483648 bytes)
 
 === Testing monolithicFlat with zeroed_grain ===
-qemu-img: TEST_DIR/t.IMGFMT: Flat image can't enable zeroed grain
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=2147483648
+qemu-img: TEST_DIR/t.IMGFMT: Flat image can't enable zeroed grain
 
 === Testing big twoGbMaxExtentFlat ===
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824000

From 985d7f150c5d6ca0266c4c2844b54806311e0a40 Mon Sep 17 00:00:00 2001
From: Max Reitz
Date: Tue, 11 Aug 2020 10:08:30 +0200
Subject: [PATCH 2/6] iotests/259: Fix reference output

The error message has changed recently, breaking the test. Fix it.

Fixes: a2b333c01880f56056d50c238834d62e32001e54
       ("block: nbd: Fix convert qcow2 compressed to nbd")
Signed-off-by: Max Reitz
Message-Id: <20200811080830.289136-1-mreitz@redhat.com>
Reviewed-by: Eric Blake
Signed-off-by: Eric Blake
---
 tests/qemu-iotests/259.out | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/259.out b/tests/qemu-iotests/259.out
index e27b9ff38d..1aaadfda4e 100644
--- a/tests/qemu-iotests/259.out
+++ b/tests/qemu-iotests/259.out
@@ -10,5 +10,5 @@ disk size: unavailable
 
 --- Testing creation for which the node would need to grow ---
 Formatting 'TEST_DIR/t.IMGFMT', fmt=qcow2 size=67108864 preallocation=metadata
-qemu-img: TEST_DIR/t.IMGFMT: Could not resize image: Image format driver does not support resize
+qemu-img: TEST_DIR/t.IMGFMT: Could not resize image: Cannot grow NBD nodes
 *** done

From 1dc4718d849e1a1fe665ce5241ed79048cfa2cfc Mon Sep 17 00:00:00 2001
From: Vladimir Sementsov-Ogievskiy
Date: Wed, 12 Aug 2020 17:52:37 +0300
Subject: [PATCH 3/6] block/nbd: use non-blocking connect: fix vm hang on connect()

This makes nbd's connection_co yield during reconnects, so that
reconnect doesn't block the main thread. This is very important in
case of an unavailable nbd server host: connect() call may take a long
time, blocking the main thread (and due to reconnect, it will hang
again and again with small gaps of working time during pauses between
connection attempts).

Implementation notes:

 - We don't want to implement non-blocking connect() over non-blocking
   socket, because getaddrinfo() doesn't have portable non-blocking
   implementation anyway, so let's just use a thread for both
   getaddrinfo() and connect().

 - We can't use qio_channel_socket_connect_async (which behaves
   similarly and starts a thread to execute connect() call), as it's
   relying on someone iterating main loop (g_main_loop_run() or
   something like this), which is not always the case.

 - We can't use thread_pool_submit_co API, as thread pool waits for all
   threads to finish (but we don't want to wait for blocking reconnect
   attempt on shutdown).

 So, we just create the thread by hand. Some additional difficulties
 are:

 - We want our connect to avoid blocking drained sections and aio
   context switches. To achieve this, we make it possible to "cancel"
   synchronous wait for the connect (which is a coroutine yield
   actually), still, the thread continues in background, and if
   successful, its result may be reused on next reconnect attempt.

 - We don't want to wait for reconnect on shutdown, so there is
   CONNECT_THREAD_RUNNING_DETACHED thread state, which means that the
   block layer is no longer interested in a result, and thread should
   close new connected socket on finish and free the state.

How to reproduce the bug, fixed with this commit:

1. Create an image on node1:
   qemu-img create -f qcow2 xx 100M

2. Start NBD server on node1:
   qemu-nbd xx

3. Start vm with second nbd disk on node2, like this:

  ./x86_64-softmmu/qemu-system-x86_64 -nodefaults -drive \
    file=/work/images/cent7.qcow2 -drive file=nbd+tcp://192.168.100.2 \
    -vnc :0 -qmp stdio -m 2G -enable-kvm -vga std

4. Access the vm through vnc (or some other way?), and check that NBD
   drive works:

   dd if=/dev/sdb of=/dev/null bs=1M count=10

   - the command should succeed.

5. Now, let's trigger nbd-reconnect loop in Qemu process. For this:

5.1 Kill NBD server on node1

5.2 run "dd if=/dev/sdb of=/dev/null bs=1M count=10" in the guest
    again. The command should fail and a lot of error messages about
    failing disk may appear as well.

    Now NBD client driver in Qemu tries to reconnect.
    Still, VM works well.

6. Make node1 unavailable on NBD port, so connect() from node2 will
   last for a long time:

   On node1 (Note, that 10809 is just a default NBD port):

   sudo iptables -A INPUT -p tcp --dport 10809 -j DROP

   After some time the guest hangs, and you may check in gdb that Qemu
   hangs in connect() call, issued from the main thread.
   This is the BUG.

7. Don't forget to drop iptables rule from your node1:

   sudo iptables -D INPUT -p tcp --dport 10809 -j DROP

Signed-off-by: Vladimir Sementsov-Ogievskiy
Message-Id: <20200812145237.4396-1-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake
[eblake: minor wording and formatting tweaks]
Signed-off-by: Eric Blake
---
 block/nbd.c | 266 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 265 insertions(+), 1 deletion(-)

diff --git a/block/nbd.c b/block/nbd.c
index 7bb881fef4..9daf003bea 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -38,6 +38,7 @@
 
 #include "qapi/qapi-visit-sockets.h"
 #include "qapi/qmp/qstring.h"
+#include "qapi/clone-visitor.h"
 
 #include "block/qdict.h"
 #include "block/nbd.h"
@@ -62,6 +63,47 @@ typedef enum NBDClientState {
     NBD_CLIENT_QUIT
 } NBDClientState;
 
+typedef enum NBDConnectThreadState {
+    /* No thread, no pending results */
+    CONNECT_THREAD_NONE,
+
+    /* Thread is running, no results for now */
+    CONNECT_THREAD_RUNNING,
+
+    /*
+     * Thread is running, but requestor exited. Thread should close
+     * the new socket and free the connect state on exit.
+     */
+    CONNECT_THREAD_RUNNING_DETACHED,
+
+    /* Thread finished, results are stored in a state */
+    CONNECT_THREAD_FAIL,
+    CONNECT_THREAD_SUCCESS
+} NBDConnectThreadState;
+
+typedef struct NBDConnectThread {
+    /* Initialization constants */
+    SocketAddress *saddr; /* address to connect to */
+    /*
+     * Bottom half to schedule on completion. Scheduled only if bh_ctx is not
+     * NULL
+     */
+    QEMUBHFunc *bh_func;
+    void *bh_opaque;
+
+    /*
+     * Result of last attempt. Valid in FAIL and SUCCESS states.
+     * If you want to steal error, don't forget to set pointer to NULL.
+     */
+    QIOChannelSocket *sioc;
+    Error *err;
+
+    /* state and bh_ctx are protected by mutex */
+    QemuMutex mutex;
+    NBDConnectThreadState state; /* current state of the thread */
+    AioContext *bh_ctx; /* where to schedule bh (NULL means don't schedule) */
+} NBDConnectThread;
+
 typedef struct BDRVNBDState {
     QIOChannelSocket *sioc; /* The master data channel */
     QIOChannel *ioc; /* The current I/O channel which may differ (eg TLS) */
@@ -91,10 +133,17 @@ typedef struct BDRVNBDState {
     QCryptoTLSCreds *tlscreds;
     const char *hostname;
     char *x_dirty_bitmap;
+
+    bool wait_connect;
+    NBDConnectThread *connect_thread;
 } BDRVNBDState;
 
 static QIOChannelSocket *nbd_establish_connection(SocketAddress *saddr,
                                                   Error **errp);
+static QIOChannelSocket *nbd_co_establish_connection(BlockDriverState *bs,
+                                                     Error **errp);
+static void nbd_co_establish_connection_cancel(BlockDriverState *bs,
+                                               bool detach);
 static int nbd_client_handshake(BlockDriverState *bs, QIOChannelSocket *sioc,
                                 Error **errp);
 
@@ -191,6 +240,8 @@ static void coroutine_fn nbd_client_co_drain_begin(BlockDriverState *bs)
     if (s->connection_co_sleep_ns_state) {
         qemu_co_sleep_wake(s->connection_co_sleep_ns_state);
     }
+
+    nbd_co_establish_connection_cancel(bs, false);
 }
 
 static void coroutine_fn nbd_client_co_drain_end(BlockDriverState *bs)
@@ -223,6 +274,7 @@ static void nbd_teardown_connection(BlockDriverState *bs)
         if (s->connection_co_sleep_ns_state) {
             qemu_co_sleep_wake(s->connection_co_sleep_ns_state);
         }
+        nbd_co_establish_connection_cancel(bs, true);
     }
     if (qemu_in_coroutine()) {
         s->teardown_co = qemu_coroutine_self();
@@ -246,6 +298,216 @@ static bool nbd_client_connecting_wait(BDRVNBDState *s)
     return s->state == NBD_CLIENT_CONNECTING_WAIT;
 }
 
+static void connect_bh(void *opaque)
+{
+    BDRVNBDState *state = opaque;
+
+    assert(state->wait_connect);
+    state->wait_connect = false;
+    aio_co_wake(state->connection_co);
+}
+
+static void nbd_init_connect_thread(BDRVNBDState *s)
+{
+    s->connect_thread = g_new(NBDConnectThread, 1);
+
+    *s->connect_thread = (NBDConnectThread) {
+        .saddr = QAPI_CLONE(SocketAddress, s->saddr),
+        .state = CONNECT_THREAD_NONE,
+        .bh_func = connect_bh,
+        .bh_opaque = s,
+    };
+
+    qemu_mutex_init(&s->connect_thread->mutex);
+}
+
+static void nbd_free_connect_thread(NBDConnectThread *thr)
+{
+    if (thr->sioc) {
+        qio_channel_close(QIO_CHANNEL(thr->sioc), NULL);
+    }
+    error_free(thr->err);
+    qapi_free_SocketAddress(thr->saddr);
+    g_free(thr);
+}
+
+static void *connect_thread_func(void *opaque)
+{
+    NBDConnectThread *thr = opaque;
+    int ret;
+    bool do_free = false;
+
+    thr->sioc = qio_channel_socket_new();
+
+    error_free(thr->err);
+    thr->err = NULL;
+    ret = qio_channel_socket_connect_sync(thr->sioc, thr->saddr, &thr->err);
+    if (ret < 0) {
+        object_unref(OBJECT(thr->sioc));
+        thr->sioc = NULL;
+    }
+
+    qemu_mutex_lock(&thr->mutex);
+
+    switch (thr->state) {
+    case CONNECT_THREAD_RUNNING:
+        thr->state = ret < 0 ? CONNECT_THREAD_FAIL : CONNECT_THREAD_SUCCESS;
+        if (thr->bh_ctx) {
+            aio_bh_schedule_oneshot(thr->bh_ctx, thr->bh_func, thr->bh_opaque);
+
+            /* play safe, don't reuse bh_ctx on further connection attempts */
+            thr->bh_ctx = NULL;
+        }
+        break;
+    case CONNECT_THREAD_RUNNING_DETACHED:
+        do_free = true;
+        break;
+    default:
+        abort();
+    }
+
+    qemu_mutex_unlock(&thr->mutex);
+
+    if (do_free) {
+        nbd_free_connect_thread(thr);
+    }
+
+    return NULL;
+}
+
+static QIOChannelSocket *coroutine_fn
+nbd_co_establish_connection(BlockDriverState *bs, Error **errp)
+{
+    QemuThread thread;
+    BDRVNBDState *s = bs->opaque;
+    QIOChannelSocket *res;
+    NBDConnectThread *thr = s->connect_thread;
+
+    qemu_mutex_lock(&thr->mutex);
+
+    switch (thr->state) {
+    case CONNECT_THREAD_FAIL:
+    case CONNECT_THREAD_NONE:
+        error_free(thr->err);
+        thr->err = NULL;
+        thr->state = CONNECT_THREAD_RUNNING;
+        qemu_thread_create(&thread, "nbd-connect",
+                           connect_thread_func, thr, QEMU_THREAD_DETACHED);
+        break;
+    case CONNECT_THREAD_SUCCESS:
+        /* Previous attempt finally succeeded in background */
+        thr->state = CONNECT_THREAD_NONE;
+        res = thr->sioc;
+        thr->sioc = NULL;
+        qemu_mutex_unlock(&thr->mutex);
+        return res;
+    case CONNECT_THREAD_RUNNING:
+        /* Already running, will wait */
+        break;
+    default:
+        abort();
+    }
+
+    thr->bh_ctx = qemu_get_current_aio_context();
+
+    qemu_mutex_unlock(&thr->mutex);
+
+
+    /*
+     * We are going to wait for connect-thread finish, but
+     * nbd_client_co_drain_begin() can interrupt.
+     *
+     * Note that wait_connect variable is not visible for connect-thread. It
+     * doesn't need mutex protection, it used only inside home aio context of
+     * bs.
+     */
+    s->wait_connect = true;
+    qemu_coroutine_yield();
+
+    qemu_mutex_lock(&thr->mutex);
+
+    switch (thr->state) {
+    case CONNECT_THREAD_SUCCESS:
+    case CONNECT_THREAD_FAIL:
+        thr->state = CONNECT_THREAD_NONE;
+        error_propagate(errp, thr->err);
+        thr->err = NULL;
+        res = thr->sioc;
+        thr->sioc = NULL;
+        break;
+    case CONNECT_THREAD_RUNNING:
+    case CONNECT_THREAD_RUNNING_DETACHED:
+        /*
+         * Obviously, drained section wants to start. Report the attempt as
+         * failed. Still connect thread is executing in background, and its
+         * result may be used for next connection attempt.
+         */
+        res = NULL;
+        error_setg(errp, "Connection attempt cancelled by other operation");
+        break;
+
+    case CONNECT_THREAD_NONE:
+        /*
+         * Impossible. We've seen this thread running. So it should be
+         * running or at least give some results.
+         */
+        abort();
+
+    default:
+        abort();
+    }
+
+    qemu_mutex_unlock(&thr->mutex);
+
+    return res;
+}
+
+/*
+ * nbd_co_establish_connection_cancel
+ * Cancel nbd_co_establish_connection asynchronously: it will finish soon, to
+ * allow drained section to begin.
+ *
+ * If detach is true, also cleanup the state (or if thread is running, move it
+ * to CONNECT_THREAD_RUNNING_DETACHED state). s->connect_thread becomes NULL if
+ * detach is true.
+ */
+static void nbd_co_establish_connection_cancel(BlockDriverState *bs,
+                                               bool detach)
+{
+    BDRVNBDState *s = bs->opaque;
+    NBDConnectThread *thr = s->connect_thread;
+    bool wake = false;
+    bool do_free = false;
+
+    qemu_mutex_lock(&thr->mutex);
+
+    if (thr->state == CONNECT_THREAD_RUNNING) {
+        /* We can cancel only in running state, when bh is not yet scheduled */
+        thr->bh_ctx = NULL;
+        if (s->wait_connect) {
+            s->wait_connect = false;
+            wake = true;
+        }
+        if (detach) {
+            thr->state = CONNECT_THREAD_RUNNING_DETACHED;
+            s->connect_thread = NULL;
+        }
+    } else if (detach) {
+        do_free = true;
+    }
+
+    qemu_mutex_unlock(&thr->mutex);
+
+    if (do_free) {
+        nbd_free_connect_thread(thr);
+        s->connect_thread = NULL;
+    }
+
+    if (wake) {
+        aio_co_wake(s->connection_co);
+    }
+}
+
 static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState *s)
 {
     int ret;
@@ -289,7 +551,7 @@ static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState *s)
         s->ioc = NULL;
     }
 
-    sioc = nbd_establish_connection(s->saddr, &local_err);
+    sioc = nbd_co_establish_connection(s->bs, &local_err);
     if (!sioc) {
         ret = -ECONNREFUSED;
         goto out;
@@ -1946,6 +2208,8 @@ static int nbd_open(BlockDriverState *bs, QDict *options, int flags,
     /* successfully connected */
     s->state = NBD_CLIENT_CONNECTED;
 
+    nbd_init_connect_thread(s);
+
     s->connection_co = qemu_coroutine_create(nbd_connection_entry, s);
     bdrv_inc_in_flight(bs);
     aio_co_schedule(bdrv_get_aio_context(bs), s->connection_co);

From 98c5d2e7010a60eddeabd057c9e0cd4e3a08f85f Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Daniel=20P=2E=20Berrang=C3=A9?=
Date: Tue, 25 Aug 2020 11:38:48 +0100
Subject: [PATCH 4/6] block: add missing socket_init() calls to tools
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Any tool that uses sockets needs to call socket_init() in order to work
on the Windows platform.

Reviewed-by: Eric Blake
Signed-off-by: Daniel P. Berrangé
Message-Id: <20200825103850.119911-2-berrange@redhat.com>
Signed-off-by: Eric Blake
---
 qemu-img.c | 2 ++
 qemu-io.c  | 2 ++
 qemu-nbd.c | 1 +
 3 files changed, 5 insertions(+)

diff --git a/qemu-img.c b/qemu-img.c
index 5308773811..eb2fc1f862 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -41,6 +41,7 @@
 #include "qemu/log.h"
 #include "qemu/main-loop.h"
 #include "qemu/module.h"
+#include "qemu/sockets.h"
 #include "qemu/units.h"
 #include "qom/object_interfaces.h"
 #include "sysemu/block-backend.h"
@@ -5410,6 +5411,7 @@ int main(int argc, char **argv)
     signal(SIGPIPE, SIG_IGN);
 #endif
 
+    socket_init();
     error_init(argv[0]);
     module_call_init(MODULE_INIT_TRACE);
     qemu_init_exec_dir(argv[0]);
diff --git a/qemu-io.c b/qemu-io.c
index 3adc5a7d0d..7cc832b3d6 100644
--- a/qemu-io.c
+++ b/qemu-io.c
@@ -25,6 +25,7 @@
 #include "qemu/config-file.h"
 #include "qemu/readline.h"
 #include "qemu/log.h"
+#include "qemu/sockets.h"
 #include "qapi/qmp/qstring.h"
 #include "qapi/qmp/qdict.h"
 #include "qom/object_interfaces.h"
@@ -542,6 +543,7 @@ int main(int argc, char **argv)
     signal(SIGPIPE, SIG_IGN);
 #endif
 
+    socket_init();
     error_init(argv[0]);
     module_call_init(MODULE_INIT_TRACE);
     qemu_init_exec_dir(argv[0]);
diff --git a/qemu-nbd.c b/qemu-nbd.c
index d2657b8db5..b102874f0f 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -599,6 +599,7 @@ int main(int argc, char **argv)
     signal(SIGPIPE, SIG_IGN);
 #endif
 
+    socket_init();
     error_init(argv[0]);
     module_call_init(MODULE_INIT_TRACE);
     qcrypto_init(&error_fatal);

From 6e64dd572aa548aa6664ed02c6901d691f6a10ba Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Daniel=20P=2E=20Berrang=C3=A9?=
Date: Tue, 25 Aug 2020 11:38:49 +0100
Subject: [PATCH 5/6] nbd: skip SIGTERM handler if NBD device support is not built
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The termsig_handler function is used by the client thread handling the
host NBD device connection to do a graceful shutdown.
IOW, if we have disabled NBD device support at compile time, we don't
need the SIGTERM handler. This fixes a build issue for Windows.

Signed-off-by: Daniel P. Berrangé
Message-Id: <20200825103850.119911-3-berrange@redhat.com>
Reviewed-by: Eric Blake
Signed-off-by: Eric Blake
---
 qemu-nbd.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/qemu-nbd.c b/qemu-nbd.c
index b102874f0f..dc6ef089af 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -155,12 +155,13 @@ QEMU_COPYRIGHT "\n"
     , name);
 }
 
+#if HAVE_NBD_DEVICE
 static void termsig_handler(int signum)
 {
     atomic_cmpxchg(&state, RUNNING, TERMINATE);
     qemu_notify_event();
 }
-
+#endif /* HAVE_NBD_DEVICE */
 
 static int qemu_nbd_client_list(SocketAddress *saddr, QCryptoTLSCreds *tls,
                                 const char *hostname)
@@ -587,6 +588,7 @@ int main(int argc, char **argv)
     unsigned socket_activation;
     const char *pid_file_name = NULL;
 
+#if HAVE_NBD_DEVICE
     /* The client thread uses SIGTERM to interrupt the server. A signal
      * handler ensures that "qemu-nbd -v -c" exits with a nice status code.
      */
@@ -594,6 +596,7 @@ int main(int argc, char **argv)
     memset(&sa_sigterm, 0, sizeof(sa_sigterm));
     sa_sigterm.sa_handler = termsig_handler;
     sigaction(SIGTERM, &sa_sigterm, NULL);
+#endif /* HAVE_NBD_DEVICE */
 
 #ifdef CONFIG_POSIX
     signal(SIGPIPE, SIG_IGN);

From eb705985f43d631438a318f1146eac61ae10d273 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Daniel=20P=2E=20Berrang=C3=A9?=
Date: Tue, 25 Aug 2020 11:38:50 +0100
Subject: [PATCH 6/6] nbd: disable signals and forking on Windows builds
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Disabling these parts is sufficient to get the qemu-nbd program
compiling in a Windows build.

Signed-off-by: Daniel P. Berrangé
Message-Id: <20200825103850.119911-4-berrange@redhat.com>
Reviewed-by: Eric Blake
Signed-off-by: Eric Blake
---
 meson.build | 7 ++-----
 qemu-nbd.c  | 5 +++++
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/meson.build b/meson.build
index 55c7d2318c..5aaa364730 100644
--- a/meson.build
+++ b/meson.build
@@ -1095,12 +1095,9 @@ if have_tools
              dependencies: [authz, block, crypto, io, qom, qemuutil], install: true)
   qemu_io = executable('qemu-io', files('qemu-io.c'),
              dependencies: [block, qemuutil], install: true)
-  qemu_block_tools += [qemu_img, qemu_io]
-  if targetos != 'windows'
-    qemu_nbd = executable('qemu-nbd', files('qemu-nbd.c'),
+  qemu_nbd = executable('qemu-nbd', files('qemu-nbd.c'),
                dependencies: [block, qemuutil], install: true)
-    qemu_block_tools += [qemu_nbd]
-  endif
+  qemu_block_tools += [qemu_img, qemu_io, qemu_nbd]
 
   subdir('storage-daemon')
   subdir('contrib/rdmacm-mux')
diff --git a/qemu-nbd.c b/qemu-nbd.c
index dc6ef089af..33476a1000 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -899,6 +899,7 @@ int main(int argc, char **argv)
 #endif
 
     if ((device && !verbose) || fork_process) {
+#ifndef WIN32
         int stderr_fd[2];
         pid_t pid;
         int ret;
@@ -962,6 +963,10 @@ int main(int argc, char **argv)
              */
             exit(errors);
         }
+#else /* WIN32 */
+        error_report("Unable to fork into background on Windows hosts");
+        exit(EXIT_FAILURE);
+#endif /* WIN32 */
     }
 
     if (device != NULL && sockpath == NULL) {
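
Note on PATCH 3/6: the ownership rule behind
CONNECT_THREAD_RUNNING_DETACHED may be easier to follow with the locking
and socket I/O stripped away. The following is a minimal sketch, not code
from the series. It reuses the state names from block/nbd.c, but the
Transition struct and the helpers on_thread_finished() and on_cancel()
are hypothetical names introduced only for illustration. It models which
side has to free the shared connect state after each event.

```c
#include <assert.h>
#include <stdbool.h>

/* Same state names as NBDConnectThreadState in the patch. */
typedef enum {
    CONNECT_THREAD_NONE,
    CONNECT_THREAD_RUNNING,
    CONNECT_THREAD_RUNNING_DETACHED,
    CONNECT_THREAD_FAIL,
    CONNECT_THREAD_SUCCESS
} ConnectState;

/* Hypothetical result type: the next state and who must clean up. */
typedef struct {
    ConnectState next;  /* state after the event */
    bool free_state;    /* true if the caller of the helper must free */
} Transition;

/*
 * Models the tail of connect_thread_func(): the thread has returned
 * from connect() with result ret (negative means failure).
 */
Transition on_thread_finished(ConnectState cur, int ret)
{
    Transition t = { cur, false };

    switch (cur) {
    case CONNECT_THREAD_RUNNING:
        /* Block layer still owns the state; just publish the result. */
        t.next = ret < 0 ? CONNECT_THREAD_FAIL : CONNECT_THREAD_SUCCESS;
        break;
    case CONNECT_THREAD_RUNNING_DETACHED:
        /* Nobody is interested any more; the thread frees the state. */
        t.free_state = true;
        break;
    default:
        assert(!"thread finished in a state where no thread can exist");
    }
    return t;
}

/*
 * Models nbd_co_establish_connection_cancel(): detach is true on
 * teardown and false when a drained section begins.
 */
Transition on_cancel(ConnectState cur, bool detach)
{
    Transition t = { cur, false };

    if (cur == CONNECT_THREAD_RUNNING) {
        if (detach) {
            /* Thread is still busy; hand ownership over to it. */
            t.next = CONNECT_THREAD_RUNNING_DETACHED;
        }
    } else if (detach) {
        /* No thread is running; the block layer frees immediately. */
        t.free_state = true;
    }
    return t;
}
```

Under these assumptions, on_cancel(CONNECT_THREAD_RUNNING, true) moves to
CONNECT_THREAD_RUNNING_DETACHED without freeing, and a later
on_thread_finished() on that state reports free_state, so exactly one
side frees the state in every path.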