ShuriZma/xemu - xemu

Commit Graph

Author	SHA1	Message	Date
Viktor Prutyanov	ebd979c74e	block/file-win32: add reopen handlers Make 'qemu-img commit' work on Windows. Command 'commit' requires reopening backing file in RW mode. So, add reopen prepare/commit/abort handlers and change dwShareMode for CreateFile call in order to allow further read/write reopening. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/418 Suggested-by: Hanna Reitz <hreitz@redhat.com> Signed-off-by: Viktor Prutyanov <viktor.prutyanov@phystech.edu> Tested-by: Helge Konetzka <hk@zapateado.de> Message-Id: <20210825173625.19415-1-viktor.prutyanov@phystech.edu> Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2021-09-01 14:38:08 +02:00
Fabrice Fontaine	28031d5c74	block/export/fuse.c: fix fuse-lseek on uclibc or musl Include linux/fs.h to avoid the following build failure on uclibc or musl raised since version 6.0.0: ../block/export/fuse.c: In function 'fuse_lseek': ../block/export/fuse.c:641:19: error: 'SEEK_HOLE' undeclared (first use in this function) 641 \| if (whence != SEEK_HOLE && whence != SEEK_DATA) { \| ^~~~~~~~~ ../block/export/fuse.c:641:19: note: each undeclared identifier is reported only once for each function it appears in ../block/export/fuse.c:641:42: error: 'SEEK_DATA' undeclared (first use in this function); did you mean 'SEEK_SET'? 641 \| if (whence != SEEK_HOLE && whence != SEEK_DATA) { \| ^~~~~~~~~ \| SEEK_SET Fixes: - http://autobuild.buildroot.org/results/33c90ebf04997f4d3557cfa66abc9cf9a3076137 Signed-off-by: Fabrice Fontaine <fontaine.fabrice@gmail.com> Message-Id: <20210827220301.272887-1-fontaine.fabrice@gmail.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2021-09-01 14:38:08 +02:00
Vladimir Sementsov-Ogievskiy	abde8ac2a5	block/block-copy: block_copy_state_new(): drop extra arguments The only caller pass copy_range and compress both false. Let's just drop these arguments. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210824083856.17408-35-vsementsov@virtuozzo.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2021-09-01 14:38:08 +02:00
Vladimir Sementsov-Ogievskiy	751cec7a26	block/copy-before-write: make public block driver Finally, copy-before-write gets own .bdrv_open and .bdrv_close handlers, block_init() call and becomes available through bdrv_open(). To achieve this: - cbw_init gets unused flags argument and becomes cbw_open - block_copy_state_free() call moved to new cbw_close() - in bdrv_cbw_append: - options are completed with driver and node-name, and we can simply use bdrv_insert_node() to do both open and drained replacing - in bdrv_cbw_drop: - cbw_close() is now responsible for freeing s->bcs, so don't do it here Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210824083856.17408-22-vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2021-09-01 14:03:47 +02:00
Vladimir Sementsov-Ogievskiy	201b4bb6c7	block/block-copy: make setting progress optional Now block-copy will crash if user don't set progress meter by block_copy_set_progress_meter(). copy-before-write filter will be used in separate of backup job, and it doesn't want any progress meter (for now). So, allow not setting it. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210824083856.17408-21-vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2021-09-01 14:03:47 +02:00
Vladimir Sementsov-Ogievskiy	06e0a9c164	block/copy-before-write: initialize block-copy bitmap We are going to publish copy-before-write filter to be used in separate of backup. Future step would support bitmap for the filter. But let's start from full set bitmap. We have to modify backup, as bitmap is first initialized by copy-before-write filter, and then backup modifies it. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210824083856.17408-20-vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2021-09-01 14:03:47 +02:00
Vladimir Sementsov-Ogievskiy	f44fd7399c	block/copy-before-write: cbw_init(): use options One more step closer to .bdrv_open(): use options instead of plain arguments. Move to bdrv_open_child() calls, native for drive open handlers. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20210824083856.17408-19-vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2021-09-01 14:03:47 +02:00
Vladimir Sementsov-Ogievskiy	4c1e992bf2	block/copy-before-write: bdrv_cbw_append(): drop unused compress arg Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20210824083856.17408-18-vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2021-09-01 14:03:47 +02:00
Vladimir Sementsov-Ogievskiy	5a50742674	block/copy-before-write: cbw_init(): use file child after attaching In the next commit we'll get rid of source argument of cbw_init(). Prepare to it now, to make next commit simpler: move the code block that uses source below attaching the child and use bs->file->bs instead of source variable. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210824083856.17408-17-vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2021-09-01 14:03:47 +02:00
Vladimir Sementsov-Ogievskiy	fe7ea40c0e	block/copy-before-write: cbw_init(): rename variables One more step closer to real .bdrv_open() handler: use more usual names for bs being initialized and its state. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210824083856.17408-16-vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2021-09-01 14:03:47 +02:00
Vladimir Sementsov-Ogievskiy	1f0cacb967	block/copy-before-write: introduce cbw_init() Move part of bdrv_cbw_append() to new function cbw_open(). It's an intermediate step for adding normal .bdrv_open() handler to the filter. With this commit no logic is changed, but we have a function which will be turned into .bdrv_open() handler in future commit. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210824083856.17408-15-vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2021-09-01 14:03:47 +02:00
Vladimir Sementsov-Ogievskiy	7ddbce2dec	block/copy-before-write: bdrv_cbw_append(): replace child at last Refactor the function to replace child at last. Thus we don't need to revert it and code is simplified. block-copy state initialization being done before replacing the child doesn't need any drained section. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210824083856.17408-14-vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2021-09-01 14:03:47 +02:00
Vladimir Sementsov-Ogievskiy	3c1e63277e	block/copy-before-write: use file child instead of backing We are going to publish copy-before-write filter, and there no public backing-child-based filter in Qemu. No reason to create a precedent, so let's refactor copy-before-write filter instead. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210824083856.17408-13-vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2021-09-01 14:03:47 +02:00
Vladimir Sementsov-Ogievskiy	451532311a	block/copy-before-write: drop extra bdrv_unref on failure path bdrv_attach_child() do bdrv_unref() on failure, so we shouldn't do it by hand here. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210824083856.17408-12-vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2021-09-01 14:03:47 +02:00
Vladimir Sementsov-Ogievskiy	3860c02019	block/copy-before-write: relax permission requirements when no parents We are going to publish copy-before-write filter. So, user should be able to create it with blockdev-add first, specifying both filtered and target children. And then do blockdev-reopen, to actually insert the filter where needed. Currently, filter unshares write permission unconditionally on source node. It's good, but it will not allow to do blockdev-add. So, let's relax restrictions when filter doesn't have any parent. Test output is modified, as now permission conflict happens only when job creates a blk parent for filter node. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210824083856.17408-11-vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2021-09-01 14:03:47 +02:00
Vladimir Sementsov-Ogievskiy	b518e9e9ef	block/backup: move cluster size calculation to block-copy The main consumer of cluster-size is block-copy. Let's calculate it here instead of passing through backup-top. We are going to publish copy-before-write filter soon, so it will be created through options. But we don't want for now to make explicit option for cluster-size, let's continue to calculate it automatically. So, now is the time to get rid of cluster_size argument for bdrv_cbw_append(). Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210824083856.17408-10-vsementsov@virtuozzo.com> [hreitz: Add qemu/error-report.h include to block/block-copy.c] Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2021-09-01 14:03:11 +02:00
Vladimir Sementsov-Ogievskiy	2a6511dfeb	block/backup: set copy_range and compress after filter insertion We are going to publish copy-before-write filter, so it would be initialized through options. Still we don't want to publish compress and copy-range options, as 1. Modern way to enable compression is to use compress filter. 2. For copy-range it's unclean how to make proper interface: - it's has experimental prefix for backup job anyway - the whole BackupPerf structure doesn't make sense for the filter So, let's just add copy-range possibility to the filter later if needed. Still, we are going to continue support for compression and experimental copy-range in backup job. So, set these options after filter creation. Note, that we can drop "compress" argument of bdrv_cbw_append() now, as well as "perf". The only reason not doing so is that now, when I prepare this patch the big series around it is already reviewed and I want to avoid extra rebase conflicts to simplify review of the following version. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20210824083856.17408-9-vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2021-09-01 12:57:31 +02:00
Vladimir Sementsov-Ogievskiy	f8b9504bac	block/block-copy: introduce block_copy_set_copy_opts() We'll need a possibility to set compress and use_copy_range options after initialization of the state. So make corresponding part of block_copy_state_new() separate and public. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210824083856.17408-8-vsementsov@virtuozzo.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2021-09-01 12:57:31 +02:00
Vladimir Sementsov-Ogievskiy	49577723d4	block-copy: move detecting fleecing scheme to block-copy We want to simplify initialization interface of copy-before-write filter as we are going to make it public. So, let's detect fleecing scheme exactly in block-copy code, to not pass this information through extra levels. Why not just set BDRV_REQ_SERIALISING unconditionally: because we are going to implement new more efficient fleecing scheme which will not rely on backing feature. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20210824083856.17408-7-vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2021-09-01 12:57:31 +02:00
Vladimir Sementsov-Ogievskiy	d003e0aece	block: rename backup-top to copy-before-write We are going to convert backup_top to full featured public filter, which can be used in separate of backup job. Start from renaming from "how it used" to "what it does". While updating comments in 283 iotest, drop and rephrase also things about ".active", as this field is now dropped, and filter doesn't have "inactive" mode. Note that this change may be considered as incompatible interface change, as backup-top filter format name was visible through query-block and query-named-block-nodes. Still, consider the following reasoning: 1. backup-top was never documented, so if someone depends on format name (for driver that can't be used other than it is automatically inserted on backup job start), it's a kind of "undocumented feature use". So I think we are free to change it. 2. There is a hope, that there is no such users: it's a lot more native to give a good node-name to backup-top filter if need to operate with it somehow, and don't touch format name. 3. Another "incompatible" change in further commit would be moving copy-before-write filter from using backing child to file child. And this is even more reasonable than renaming: for now all public filters are file-child based. So, it's a risky change, but risk seems small and good interface worth it. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210824083856.17408-6-vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2021-09-01 12:57:31 +02:00
Vladimir Sementsov-Ogievskiy	ed089506ee	block: introduce blk_replace_bs Add function to change bs inside blk. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210824083856.17408-3-vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2021-09-01 12:57:31 +02:00
Stefan Hajnoczi	b68ce82409	raw-format: drop WRITE and RESIZE child perms when possible The following command-line fails due to a permissions conflict: $ qemu-storage-daemon \ --blockdev driver=nvme,node-name=nvme0,device=0000:08:00.0,namespace=1 \ --blockdev driver=raw,node-name=l1-1,file=nvme0,offset=0,size=1073741824 \ --blockdev driver=raw,node-name=l1-2,file=nvme0,offset=1073741824,size=1073741824 \ --nbd-server addr.type=unix,addr.path=/tmp/nbd.sock,max-connections=2 \ --export type=nbd,id=nbd-l1-1,node-name=l1-1,name=l1-1,writable=on \ --export type=nbd,id=nbd-l1-2,node-name=l1-2,name=l1-2,writable=on qemu-storage-daemon: --export type=nbd,id=nbd-l1-1,node-name=l1-1,name=l1-1,writable=on: Permission conflict on node 'nvme0': permissions 'resize' are both required by node 'l1-1' (uses node 'nvme0' as 'file' child) and unshared by node 'l1-2' (uses node 'nvme0' as 'file' child). The problem is that block/raw-format.c relies on bdrv_default_perms() to set permissions on the nvme node. The default permissions add RESIZE in anticipation of a format driver like qcow2 that needs to grow the image file. This fails because RESIZE is unshared, so we cannot get the RESIZE permission. Max Reitz pointed out that block/crypto.c already handles this case by implementing a custom ->bdrv_child_perm() function that adjusts the result of bdrv_default_perms(). This patch takes the same approach in block/raw-format.c so that RESIZE is only required if it's actually necessary (e.g. the parent is qcow2). Cc: Max Reitz <mreitz@redhat.com> Cc: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20210726122839.822900-1-stefanha@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2021-09-01 12:57:31 +02:00
Mao Zhongyi	8cca0bd289	block/monitor: Consolidate hmp_handle_error calls to reduce redundant code Signed-off-by: Mao Zhongyi <maozhongyi@cmss.chinamobile.com> Message-Id: <20210802062507.347555-1-maozhongyi@cmss.chinamobile.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2021-09-01 12:57:31 +02:00
Fabrice Fontaine	50482fda98	block/export/fuse.c: fix musl build Fix the following build failure on musl raised since version 6.0.0 and `4ca37a96a7` because musl does not define FALLOC_FL_ZERO_RANGE: ../block/export/fuse.c: In function 'fuse_fallocate': ../block/export/fuse.c:563:23: error: 'FALLOC_FL_ZERO_RANGE' undeclared (first use in this function) 563 \| } else if (mode & FALLOC_FL_ZERO_RANGE) { \| ^~~~~~~~~~~~~~~~~~~~ Fixes: - http://autobuild.buildroot.org/results/b96e3d364fd1f8bbfb18904a742e73327d308f64 Signed-off-by: Fabrice Fontaine <fontaine.fabrice@gmail.com> Message-Id: <20210809095101.1101336-1-fontaine.fabrice@gmail.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Hanna Reitz <hreitz@redhat.com>	2021-08-09 17:19:27 +02:00
Kevin Wolf	87ab880252	block: Fix in_flight leak in request padding error path When bdrv_pad_request() fails in bdrv_co_preadv_part(), bs->in_flight has been increased, but is never decreased again. This leads to a hang when trying to drain the block node. This bug was observed with Windows guests which issue a request that fully uses IOV_MAX during installation, so that when padding is necessary (O_DIRECT with a 4k sector size block device on the host), adding another entry causes failure. Call bdrv_dec_in_flight() to fix this. There is a larger problem to solve here because this request shouldn't even fail, but Windows doesn't seem to care and with this minimal fix the installation succeeds. So given that we're already in freeze, let's take this minimal fix for 6.1. Fixes: `98ca45494f` Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1972079 Reported-by: Qing Wang <qinwang@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20210727154923.91067-1-kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-08-03 15:43:30 +02:00
Fabian Ebner	54caccb365	block/io_uring: resubmit when result is -EAGAIN Linux SCSI can throw spurious -EAGAIN in some corner cases in its completion path, which will end up being the result in the completed io_uring request. Resubmitting such requests should allow block jobs to complete, even if such spurious errors are encountered. Co-authored-by: Stefan Hajnoczi <stefanha@gmail.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Fabian Ebner <f.ebner@proxmox.com> Message-id: 20210729091029.65369-1-f.ebner@proxmox.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2021-07-29 17:14:55 +01:00
Philippe Mathieu-Daudé	15a730e7a3	block/nvme: Fix VFIO_MAP_DMA failed: No space left on device When the NVMe block driver was introduced (see commit `bdd6a90a9e`, January 2018), Linux VFIO_IOMMU_MAP_DMA ioctl was only returning -ENOMEM in case of error. The driver was correctly handling the error path to recycle its volatile IOVA mappings. To fix CVE-2019-3882, Linux commit 492855939bdb ("vfio/type1: Limit DMA mappings per container", April 2019) added the -ENOSPC error to signal the user exhausted the DMA mappings available for a container. The block driver started to mis-behave: qemu-system-x86_64: VFIO_MAP_DMA failed: No space left on device (qemu) (qemu) info status VM status: paused (io-error) (qemu) c VFIO_MAP_DMA failed: No space left on device (qemu) c VFIO_MAP_DMA failed: No space left on device (The VM is not resumable from here, hence stuck.) Fix by handling the new -ENOSPC error (when DMA mappings are exhausted) without any distinction to the current -ENOMEM error, so we don't change the behavior on old kernels where the CVE-2019-3882 fix is not present. An easy way to reproduce this bug is to restrict the DMA mapping limit (65535 by default) when loading the VFIO IOMMU module: # modprobe vfio_iommu_type1 dma_entry_limit=666 Cc: qemu-stable@nongnu.org Cc: Fam Zheng <fam@euphon.net> Cc: Maxim Levitsky <mlevitsk@redhat.com> Cc: Alex Williamson <alex.williamson@redhat.com> Reported-by: Michal Prívozník <mprivozn@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20210723195843.1032825-1-philmd@redhat.com Fixes: `bdd6a90a9e` ("block: Add VFIO based NVMe driver") Buglink: https://bugs.launchpad.net/qemu/+bug/1863333 Resolves: https://gitlab.com/qemu-project/qemu/-/issues/65 Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2021-07-26 09:38:12 +01:00
Eric Blake	94075c28ee	iotests: Improve and rename test 291 to qemu-img-bitmap Enhance the test to demonstrate existing less-than-stellar behavior of qemu-img with a qcow2 image containing an inconsistent bitmap: we don't diagnose the problem until after copying the entire image (a potentially long time), and when we do diagnose the failure, we still end up leaving an empty bitmap in the destination. This mess will be cleaned up in the next patch. While at it, rename the test now that we support useful iotest names, and fix a missing newline in the error message thus exposed. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20210709153951.2801666-2-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Nir Soffer <nsoffer@redhat.com>	2021-07-21 14:14:41 -05:00
Stefano Garzarella	d7ddd0a161	linux-aio: limit the batch size using `aio-max-batch` parameter When there are multiple queues attached to the same AIO context, some requests may experience high latency, since in the worst case the AIO engine queue is only flushed when it is full (MAX_EVENTS) or there are no more queues plugged. Commit `2558cb8dd4` ("linux-aio: increasing MAX_EVENTS to a larger hardcoded value") changed MAX_EVENTS from 128 to 1024, to increase the number of in-flight requests. But this change also increased the potential maximum batch to 1024 elements. When there is a single queue attached to the AIO context, the issue is mitigated from laio_io_unplug() that will flush the queue every time is invoked since there can't be others queue plugged. Let's use the new `aio-max-batch` IOThread parameter to mitigate this issue, limiting the number of requests in a batch. We also define a default value (32): this value is obtained running some benchmarks and it represents a good tradeoff between the latency increase while a request is queued and the cost of the io_submit(2) system call. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-id: 20210721094211.69853-4-sgarzare@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2021-07-21 13:47:50 +01:00
Max Reitz	8573823f3b	block/export: Conditionally ignore set-context error When invoking block-export-add with some iothread and fixed-iothread=false, and changing the node's iothread fails, the error is supposed to be ignored. However, it is still stored in errp, which is wrong. If a second error occurs, the "errp must be NULL" assertion in error_setv() fails: qemu-system-x86_64: ../util/error.c:59: error_setv: Assertion `*errp == NULL' failed. So if fixed-iothread=false, we should ignore the error by passing NULL to bdrv_try_set_aio_context(). Fixes: `f51d23c80a` ("block/export: add iothread and fixed-iothread options") Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210624083825.29224-2-mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-07-20 16:49:31 +02:00
Vladimir Sementsov-Ogievskiy	6af72274ef	block/vvfat: fix: drop backing Most probably this fake backing child doesn't work anyway (see notes about it in `a8a4d15c1c`). Still, since `25f78d9e2d` drivers are required to set .supports_backing if they want to call bdrv_set_backing_hd, so now vvfat just doesn't work because of this check. Let's finally drop this fake backing file. Fixes: `25f78d9e2d` Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210715124853.13335-1-vsementsov@virtuozzo.com> Tested-by: John Arbuckle <programmingkidx@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-07-20 16:30:20 +02:00
Lukas Straub	c2cf0ecab5	replication: Remove workaround Remove the workaround introduced in commit `6ecbc6c526` "replication: Avoid blk_make_empty() on read-only child". It is not needed anymore since s->hidden_disk is guaranteed to be writable when secondary_do_checkpoint() runs. Because replication_start(), _do_checkpoint() and _stop() are only called by COLO migration code and COLO-migration activates all disks via bdrv_invalidate_cache_all() before it calls these functions. Signed-off-by: Lukas Straub <lukasstraub2@web.de> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <d3acfad43879e9f376bffa7dd797ae74d0a7c81a.1626619393.git.lukasstraub2@web.de> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-07-20 16:11:53 +02:00
Lukas Straub	3b78420bb1	replication: Properly attach children The replication driver needs access to the children block-nodes of it's child so it can issue bdrv_make_empty() and bdrv_co_pwritev() to manage the replication. However, it does this by directly copying the BdrvChilds, which is wrong. Fix this by properly attaching the block-nodes with bdrv_attach_child() and requesting the required permissions. This ultimatively fixes a potential crash in replication_co_writev(), because it may write to s->secondary_disk if it is in state BLOCK_REPLICATION_FAILOVER_FAILED, without requesting write permissions first. And now the workaround in secondary_do_checkpoint() can be removed. Signed-off-by: Lukas Straub <lukasstraub2@web.de> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <5d0539d729afb8072d0d7cde977c5066285591b4.1626619393.git.lukasstraub2@web.de> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-07-20 16:11:53 +02:00
Lukas Straub	a990a42b39	replication: Reduce usage of s->hidden_disk and s->secondary_disk In preparation for the next patch, initialize s->hidden_disk and s->secondary_disk later and replace access to them with local variables in the places where they aren't initialized yet. Signed-off-by: Lukas Straub <lukasstraub2@web.de> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <1eb9dc179267207d9c7eccaeb30761758e32e9ab.1626619393.git.lukasstraub2@web.de> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-07-20 16:11:53 +02:00
Lukas Straub	1e12ecfd2c	replication: Remove s->active_disk s->active_disk is bs->file. Remove it and use local variables instead. Signed-off-by: Lukas Straub <lukasstraub2@web.de> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <2534f867ea9be5b666dfce19744b7d4e2b96c976.1626619393.git.lukasstraub2@web.de> Reviewed-by: Zhang Chen <chen.zhang@intel.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-07-20 16:08:38 +02:00
Vladimir Sementsov-Ogievskiy	d44dae1a7c	block/mirror: fix active mirror dead-lock in mirror_wait_on_conflicts It's possible that requests start to wait each other in mirror_wait_on_conflicts(). To avoid it let's use same technique as in block/io.c in bdrv_wait_serialising_requests_locked() / bdrv_find_conflicting_request(): don't wait on intersecting request if it is already waiting for some other request. For details of the dead-lock look at testIntersectingActiveIO() test-case which we actually fixing now. Fixes: `d06107ade0` Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210702211636.228981-4-vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-07-20 13:14:45 +02:00
Vladimir Sementsov-Ogievskiy	ead3f1bff9	block/mirror: set .co for active-write MirrorOp objects This field is unused, but it very helpful for debugging. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210702211636.228981-2-vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-07-20 13:14:45 +02:00
Emanuele Giuseppe Esposito	36109bff17	blkdebug: protect rules and suspended_reqs with a lock First, categorize the structure fields to identify what needs to be protected and what doesn't. We essentially need to protect only .state, and the 3 lists in BDRVBlkdebugState. Then, add the lock and mark the functions accordingly. Co-developed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20210614082931.24925-7-eesposit@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-07-19 17:38:38 +02:00
Emanuele Giuseppe Esposito	4153b553bd	block/blkdebug: remove new_state field and instead use a local variable There seems to be no benefit in using a field. Replace it with a local variable, and move the state update before the yields. The state update has do be done before the yields because now using a local variable does not allow the new updated state to be visible by the other yields. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20210614082931.24925-6-eesposit@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-07-19 17:38:38 +02:00
Emanuele Giuseppe Esposito	2196c341f7	blkdebug: do not suspend in the middle of QLIST_FOREACH_SAFE That would be unsafe in case a rule other than the current one is removed while the coroutine has yielded. Keep FOREACH_SAFE because suspend_request deletes the current rule. After this patch, all matching rules are deleted before suspending the coroutine, rather than just one. This doesn't affect the existing testcases. Use actions_count to see how many yield to issue. Co-developed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210614082931.24925-5-eesposit@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-07-19 17:38:38 +02:00
Emanuele Giuseppe Esposito	51a463680d	blkdebug: track all actions Add a counter for each action that a rule can trigger. This is mainly used to keep track of how many coroutine_yield() we need to perform after processing all rules in the list. Co-developed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210614082931.24925-4-eesposit@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-07-19 17:38:38 +02:00
Emanuele Giuseppe Esposito	f48ff5af13	blkdebug: move post-resume handling to resume_req_by_tag We want to move qemu_coroutine_yield() after the loop on rules, because QLIST_FOREACH_SAFE is wrong if the rule list is modified while the coroutine has yielded. Therefore move the suspended request to the heap and clean it up from the remove side. All that is left is for blkdebug_debug_event to handle the yielding. Co-developed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210614082931.24925-3-eesposit@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-07-19 17:38:38 +02:00
Emanuele Giuseppe Esposito	69d0690c10	blkdebug: refactor removal of a suspended request Extract to a separate function. Do not rely on FOREACH_SAFE, which is only "safe" if the current node is removed---not if another node is removed. Instead, just walk the entire list from the beginning when asked to resume all suspended requests with a given tag. Co-developed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210614082931.24925-2-eesposit@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-07-19 17:38:38 +02:00
Lukas Straub	0b9cd6b947	nbd: register yank function earlier Although unlikely, qemu might hang in nbd_send_request(). Allow recovery in this case by registering the yank function before calling it. Signed-off-by: Lukas Straub <lukasstraub2@web.de> Message-Id: <20210704000730.1befb596@gecko.fritz.box> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-07-12 11:24:00 -05:00
Peter Maydell	d1987c8114	* More SVM fixes (Lara) * Module annotation database (Gerd) * Memory leak fixes (myself) * Build fixes (myself) * --with-devices-* support (Alex) -----BEGIN PGP SIGNATURE----- iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmDoeBgUHHBib256aW5p QHJlZGhhdC5jb20ACgkQv/vSX3jHroMtFAgAippmxRt3lt+tcdSrCOZlKmxW6veK nUidtzfH5uE8vQsh5Q98WCEq871C/C+St1gK+q2H/MLrJeAqZD39DV+SKTuZ6Tcp 3jL0iYC+oO0OjkHppDQTUDweF9KrsAW1WEeNz2th1OUDSjBXuXbZ+N497taouX18 p2UN0gKNsOO2/QFrKL5KO7vSC56eBGoZz6gKtw/7dDtJBtizf1xKBRHW43b+CnQJ mHLs7Tj6oMC+vnMHkUKLH/6za3WJF1XHs5fp2isRgqoOSP8m0r6CMg8JnFIvmQf/ tbLospKSWqcgD5C5PlFm2wSOjdU7zuPKM7wchhKrrEIvdDPhXaKrlpwi5Q== =GFX1 -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/bonzini-gitlab/tags/for-upstream' into staging * More SVM fixes (Lara) * Module annotation database (Gerd) * Memory leak fixes (myself) * Build fixes (myself) * --with-devices-* support (Alex) # gpg: Signature made Fri 09 Jul 2021 17:23:52 BST # gpg: using RSA key F13338574B662389866C7682BFFBD25F78C7AE83 # gpg: issuer "pbonzini@redhat.com" # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full] # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [full] # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * remotes/bonzini-gitlab/tags/for-upstream: (48 commits) meson: Use input/output for entitlements target configure: allow the selection of alternate config in the build configs: rename default-configs to configs and reorganise hw/arm: move CONFIG_V7M out of default-devices hw/arm: add dependency on OR_IRQ for XLNX_VERSAL meson: Introduce target-specific Kconfig meson: switch function tests from compilation to linking vl: fix leak of qdict_crumple return value target/i386: fix exceptions for MOV to DR target/i386: Added DR6 and DR7 consistency checks target/i386: Added MSRPM and IOPM size check monitor/tcg: move tcg hmp commands to accel/tcg, register them dynamically usb: build usb-host as module monitor/usb: register 'info usbhost' dynamically usb: drop usb_host_dev_is_scsi_storage hook monitor: allow register hmp commands accel: build tcg modular accel: add tcg module annotations accel: build qtest modular accel: add qtest module annotations ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2021-07-11 22:20:51 +01:00
Peter Maydell	42e1d798a6	Block layer patches - Make blockdev-reopen stable - Remove deprecated qemu-img backing file without format - rbd: Convert to coroutines and add write zeroes support - rbd: Updated MAINTAINERS - export/fuse: Allow other users access to the export - vhost-user: Fix backends without multiqueue support - Fix drive-backup transaction endless drained section -----BEGIN PGP SIGNATURE----- iQJFBAABCAAvFiEE3D3rFZqa+V09dFb+fwmycsiPL9YFAmDoRdIRHGt3b2xmQHJl ZGhhdC5jb20ACgkQfwmycsiPL9bvgQ/+Ogq24n1UOQc8FEKRYfyhajNToQ9ofzWN iLiblSGx2QDq+CauD3qdu6z7DLlqEXeoM4NYM462oIPumptQj+9XZt7ftfh6FLWW 4yJEbjfnVKOba+vFdJ+E0DStwnPaxYdnrPGd53cwHZfbZh4ZmkpTM350mzHHiLTb KYKOgWd+UHZbkYeCVNYTGe30SRBiKeAecTpsVZ5HVhe7LstjByuy5stk8dytLpdV YqdKOToZfOp77XiHr8YcLLp1HHBGlr5hw73V4SDas0beCp7hqtnAqsTYyXBue4xO 4zfD4Gujr5JVOCb0crDTyOmOQY5E+y2dqFoOUF00D5AoN2vj4nfQ9ESkbqlE9BVh mgJ1izSokYlN2X8rIwGXNR5fbxRmxxfkAA4rScNRytj1KxDHyrDxrp/k8YFemxSQ qwgb/FBm0fcr69evPRzovKwZFhcyPremksluHQE4rZZ66qBQ2cGuDJPE7PWVTpPH 67JCrIVK/O6n5p+4ilFHmQQ3aP3ol0frMFcboYVRchJ2MhIDTsfFL3F/tTK8hy86 AmrrdQ1BQIAoKNOKnAmOSOUdExM55OcfPmX69+AhEk2GeWP6kgz5Pks4H3qCiKGf YoRk8F1V+N4q+C0mFFovB61bNQ6COIlBuzmD9EtmpDD/Ta3Wib+3ZnoGVIdPS+OI jyj+qJxd9z4= =kH+r -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging Block layer patches - Make blockdev-reopen stable - Remove deprecated qemu-img backing file without format - rbd: Convert to coroutines and add write zeroes support - rbd: Updated MAINTAINERS - export/fuse: Allow other users access to the export - vhost-user: Fix backends without multiqueue support - Fix drive-backup transaction endless drained section # gpg: Signature made Fri 09 Jul 2021 13:49:22 BST # gpg: using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6 # gpg: issuer "kwolf@redhat.com" # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full] # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * remotes/kevin/tags/for-upstream: (28 commits) block: Make blockdev-reopen stable API iotests: Test reopening multiple devices at the same time block: Support multiple reopening with x-blockdev-reopen block: Acquire AioContexts during bdrv_reopen_multiple() block: Add bdrv_reopen_queue_free() qcow2: Fix dangling pointer after reopen for 'file' qemu-img: Improve error for rebase without backing format qemu-img: Require -F with -b backing image qcow2: Prohibit backing file changes in 'qemu-img amend' blockdev: fix drive-backup transaction endless drained section vhost-user: Fix backends without multiqueue support MAINTAINERS: add block/rbd.c reviewer block/rbd: fix type of task->complete iotests/fuse-allow-other: Test allow-other iotests/308: Test +w on read-only FUSE exports export/fuse: Let permissions be adjustable export/fuse: Give SET_ATTR_SIZE its own branch export/fuse: Add allow-other option export/fuse: Pass default_permissions for mount util/uri: do not check argument of uri_free() ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2021-07-10 19:55:21 +01:00
Gerd Hoffmann	f8ade0dc01	modules: add block module annotations Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Reviewed-by: Jose R. Ziviani <jziviani@suse.de> Message-Id: <20210624103836.2382472-14-kraxel@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-07-09 18:20:27 +02:00
Paolo Bonzini	63a7f85306	meson: fix missing preprocessor symbols While most libraries do not need a CONFIG_* symbol because the "when:" clauses are enough, some do. Add them back or stop using them if possible. In the case of libpmem, the statement to add the CONFIG_* symbol was still in configure, but could not be triggered because it checked for "no" instead of "disabled" (and it would be wrong anyway since the test for the library has not been done yet). Reported-by: Li Zhijian <lizhijian@cn.fujitsu.com> Fixes: `587d59d6cc` ("configure, meson: convert virgl detection to meson", 2021-07-06) Fixes: `83ef16821a` ("configure, meson: convert libdaxctl detection to meson", 2021-07-06) Fixes: `e36e8c70f6` ("configure, meson: convert libpmem detection to meson", 2021-07-06) Fixes: `53c22b68e3` ("configure, meson: convert liburing detection to meson", 2021-07-06) Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-07-09 18:19:00 +02:00
Kevin Wolf	6cf42ca2f9	block: Acquire AioContexts during bdrv_reopen_multiple() As the BlockReopenQueue can contain nodes in multiple AioContexts, only one of which may be locked when AIO_WAIT_WHILE() can be called, we can't let the caller lock the right contexts. Instead, individually lock the AioContext of a single node when iterating the queue. Reintroduce bdrv_reopen() as a wrapper for reopening a single node that drains the node and temporarily drops the AioContext lock for bdrv_reopen_multiple(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210708114709.206487-4-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-07-09 13:19:11 +02:00
Kevin Wolf	bcfd86d6a6	qcow2: Fix dangling pointer after reopen for 'file' Without an external data file, s->data_file is a second pointer with the same value as bs->file. When changing bs->file to a different BdrvChild and freeing the old BdrvChild, s->data_file must also be updated, otherwise it points to freed memory and causes crashes. This problem was caught by iotests case 245. Fixes: df2b7086f169239ebad5d150efa29c9bb6d4f820 Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20210708114709.206487-2-kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-07-09 13:19:11 +02:00
Eric Blake	5a385bf5c5	qcow2: Prohibit backing file changes in 'qemu-img amend' This was deprecated back in `bc5ee6da7` (qcow2: Deprecate use of qemu-img amend to change backing file), and no one in the meantime has given any reasons why it should be supported. Time to make change attempts a hard error (but for convenience, specifying the _same_ backing chain is not forbidden). Update a couple of iotests to match. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20210503213600.569128-2-eblake@redhat.com> Reviewed-by: Connor Kuehl <ckuehl@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-07-09 12:26:05 +02:00
Peter Lieven	64cc845bdb	block/rbd: fix type of task->complete task->complete is a bool not an integer. Signed-off-by: Peter Lieven <pl@kamp.de> Message-Id: <20210707180449.32665-1-pl@kamp.de> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-07-09 12:26:05 +02:00
Max Reitz	6aeeaed29c	export/fuse: Let permissions be adjustable Allow changing the file mode, UID, and GID through SETATTR. Without allow_other, UID and GID are not allowed to be changed, because it would not make sense. Also, changing group or others' permissions is not allowed either. For read-only exports, +w cannot be set. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210625142317.271673-5-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-07-09 12:26:05 +02:00
Max Reitz	9bad96a8cc	export/fuse: Give SET_ATTR_SIZE its own branch In order to support changing other attributes than the file size in fuse_setattr(), we have to give each its own independent branch. This also applies to the only attribute we do support right now. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20210625142317.271673-4-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-07-09 12:26:05 +02:00
Max Reitz	8fc54f9428	export/fuse: Add allow-other option Without the allow_other mount option, no user (not even root) but the one who started qemu/the storage daemon can access the export. Allow users to configure the export such that such accesses are possible. While allow_other is probably what users want, we cannot make it an unconditional default, because passing it is only possible (for non-root users) if the global fuse.conf configuration file allows it. Thus, the default is an 'auto' mode, in which we first try with allow_other, and then fall back to without. FuseExport.allow_other reports whether allow_other was actually used as a mount option or not. Currently, this information is not used, but a future patch will let this field decide whether e.g. an export's UID and GID can be changed through chmod. One notable thing about 'auto' mode is that libfuse may print error messages directly to stderr, and so may fusermount (which it executes). Our export code cannot really filter or hide them. Therefore, if 'auto' fails its first attempt and has to fall back, fusermount will print an error message that mounting with allow_other failed. This behavior necessitates a change to iotest 308, namely we need to filter out this error message (because if the first attempt at mounting with allow_other succeeds, there will be no such message). Furthermore, common.rc's _make_test_img should use allow-other=off for FUSE exports, because iotests generally do not need to access images from other users, so allow-other=on or allow-other=auto have no advantage. OTOH, allow-other=on will not work on systems where user_allow_other is disabled, and with allow-other=auto, we get said error message that we would need to filter out again. Just disabling allow-other is simplest. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210625142317.271673-3-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-07-09 12:26:05 +02:00
Max Reitz	2c7dd057aa	export/fuse: Pass default_permissions for mount We do not do any permission checks in fuse_open(), so let the kernel do them. We already let fuse_getattr() report the proper UNIX permissions, so this should work the way we want. This causes a change in 308's reference output, because now opening a non-writable export with O_RDWR fails already, instead of only actually attempting to write to it. (That is an improvement.) Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210625142317.271673-2-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-07-09 12:26:05 +02:00
Heinrich Schuchardt	c2615bdfbd	util/uri: do not check argument of uri_free() uri_free() checks if its argument is NULL in uri_clean() and g_free(). There is no need to check the argument before the call. Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de> Message-Id: <20210629063602.4239-1-xypron.glpk@gmx.de> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Richard W.M. Jones <rjones@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-07-09 12:26:05 +02:00
Peter Lieven	eb06cbab7e	block/rbd: drop qemu_rbd_refresh_limits librbd supports 1 byte alignment for all aio operations. Currently, there is no API call to query limits from the Ceph ObjectStore backend. So drop the bdrv_refresh_limits completely until there is such an API call. Signed-off-by: Peter Lieven <pl@kamp.de> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Message-Id: <20210702172356.11574-7-idryomov@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-07-09 12:26:05 +02:00
Peter Lieven	c56ac27d2a	block/rbd: add write zeroes support This patch wittingly sets BDRV_REQ_NO_FALLBACK and silently ignores BDRV_REQ_MAY_UNMAP for older librbd versions. The rationale for this is as follows (citing Ilya Dryomov current RBD maintainer): ---8<--- a) remove the BDRV_REQ_MAY_UNMAP check in qemu_rbd_co_pwrite_zeroes() and as a consequence always unmap if librbd is too old It's not clear what qemu's expectation is but in general Write Zeroes is allowed to unmap. The only guarantee is that subsequent reads return zeroes, everything else is a hint. This is how it is specified in the kernel and in the NVMe spec. In particular, block/nvme.c implements it as follows: if (flags & BDRV_REQ_MAY_UNMAP) { cdw12 \|= (1 << 25); } This sets the Deallocate bit. But if it's not set, the device may still deallocate: """ If the Deallocate bit (CDW12.DEAC) is set to '1' in a Write Zeroes command, and the namespace supports clearing all bytes to 0h in the values read (e.g., bits 2:0 in the DLFEAT field are set to 001b) from a deallocated logical block and its metadata (excluding protection information), then for each specified logical block, the controller: - should deallocate that logical block; ... If the Deallocate bit is cleared to '0' in a Write Zeroes command, and the namespace supports clearing all bytes to 0h in the values read (e.g., bits 2:0 in the DLFEAT field are set to 001b) from a deallocated logical block and its metadata (excluding protection information), then, for each specified logical block, the controller: - may deallocate that logical block; """ https://nvmexpress.org/wp-content/uploads/NVM-Express-NVM-Command-Set-Specification-2021.06.02-Ratified-1.pdf b) set BDRV_REQ_NO_FALLBACK in supported_zero_flags Again, it's not clear what qemu expects here, but without it we end up in a ridiculous situation where specifying the "don't allow slow fallback" switch immediately fails all efficient zeroing requests on a device where Write Zeroes is always efficient: $ qemu-io -c 'help write' \| grep -- '-[zun]' -n, -- with -z, don't allow slow fallback -u, -- with -z, allow unmapping -z, -- write zeroes using blk_co_pwrite_zeroes $ qemu-io -f rbd -c 'write -z -u -n 0 1M' rbd:foo/bar write failed: Operation not supported --->8--- Signed-off-by: Peter Lieven <pl@kamp.de> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Message-Id: <20210702172356.11574-6-idryomov@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-07-09 12:26:05 +02:00
Peter Lieven	c3e5fac534	block/rbd: migrate from aio to coroutines Signed-off-by: Peter Lieven <pl@kamp.de> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Message-Id: <20210702172356.11574-5-idryomov@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-07-09 12:26:05 +02:00
Peter Lieven	6d9214189e	block/rbd: update s->image_size in qemu_rbd_getlength While at it just call rbd_get_size and avoid rbd_image_info_t. Signed-off-by: Peter Lieven <pl@kamp.de> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Message-Id: <20210702172356.11574-4-idryomov@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-07-09 12:26:05 +02:00
Peter Lieven	832a93dcb8	block/rbd: store object_size in BDRVRBDState Signed-off-by: Peter Lieven <pl@kamp.de> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Message-Id: <20210702172356.11574-3-idryomov@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-07-09 12:26:05 +02:00
Peter Lieven	48672ac058	block/rbd: bump librbd requirement to luminous release Ceph Luminous (version 12.2.z) is almost 4 years old at this point. Bump the requirement to get rid of the ifdef'ry in the code. Qemu 6.1 dropped the support for RHEL-7 which was the last supported OS that required an older librbd. Signed-off-by: Peter Lieven <pl@kamp.de> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Message-Id: <20210702172356.11574-2-idryomov@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-07-09 12:26:05 +02:00
Or Ozeri	42e4ac9ef5	block/rbd: Add support for rbd image encryption Starting from ceph Pacific, RBD has built-in support for image-level encryption. Currently supported formats are LUKS version 1 and 2. There are 2 new relevant librbd APIs for controlling encryption, both expect an open image context: rbd_encryption_format: formats an image (i.e. writes the LUKS header) rbd_encryption_load: loads encryptor/decryptor to the image IO stack This commit extends the qemu rbd driver API to support the above. Signed-off-by: Or Ozeri <oro@il.ibm.com> Message-Id: <20210627114635.39326-1-oro@il.ibm.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-07-09 12:26:05 +02:00
Akihiko Odaki	9f460c64e1	block/io: Merge discard request alignments Signed-off-by: Akihiko Odaki <akihiko.odaki@gmail.com> Message-id: 20210705130458.97642-3-akihiko.odaki@gmail.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2021-07-06 14:28:55 +01:00
Akihiko Odaki	0dfc7af2b2	block/file-posix: Optimize for macOS This commit introduces "punch hole" operation and optimizes transfer block size for macOS. Thanks to Konstantin Nazarov for detailed analysis of a flaw in an old version of this change: https://gist.github.com/akihikodaki/87df4149e7ca87f18dc56807ec5a1bc5#gistcomment-3654667 Signed-off-by: Akihiko Odaki <akihiko.odaki@gmail.com> Message-id: 20210705130458.97642-1-akihiko.odaki@gmail.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2021-07-06 14:28:55 +01:00
Peter Maydell	9c2647f750	Block layer patches - Supporting changing 'file' in x-blockdev-reopen - ssh: add support for sha256 host key fingerprints - vhost-user-blk: Implement reconnection during realize - introduce QEMU_AUTO_VFREE - Don't require password of encrypted backing file for image creation - Code cleanups -----BEGIN PGP SIGNATURE----- iQJFBAABCAAvFiEE3D3rFZqa+V09dFb+fwmycsiPL9YFAmDclTcRHGt3b2xmQHJl ZGhhdC5jb20ACgkQfwmycsiPL9bzaw/+PYQ9vPG+ZROWl633TUOQu7IYGZynXCET ZHlV2JlXnFH8QoO8A53U72cgVg+GlwDpiMCCtjEGMG1yfMBNe+DXR1wFUMDne1Gs qIFX4gIVpPGDi3gPeQvefLCwoN8VXwIxvCJj40YR9BY0cqQR6joI81kjVlqKemqB cHG4qJHmnphihSLep/dd2BRInkeHWXWU63jK4d7ctdCwHZPNDf0u0FEraxEqS/d0 5tfdIl3haghhoDRRamyh5bLZBCW1KlpfTR98RdspcwpSlXq/FgV5N9CFa15XSYfQ rinSClSIOpLzG90+tBihROVBXvbugO5qZVQTG0Yg1tt4FGG8Cmiqf9MNXC5yctNg WnaQQipx/37deafGA4jqorZBJd1R87JLJTBFTpkB47XAFq/ltqsTDhrrfdS+jail Fd+qyqWg0Jx3JjdhSUpHvDKBBsErjoxtoyQIGakSreXGmj2UY6BFmGii7lnLZNLo +E81C7exnkCIGKkOHy+y9DkpVY/PEJKCG7uwcyy+F2qOqGUOxKLuZomWcLodo6Vf /eJ/UsLJt6HhXhXq/1ZZHmaORn8Lft1yr/9azoGXZ7er+jZcbEkhbcZmET+Y6ykq Vox/GmLkhyVkM96MA0lMW5hHPWUbF29m9Jmq3nNfvFWBcILEs4uWSlbd0M2oAmWj ung9sKIV/8s= =aB0a -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging Block layer patches - Supporting changing 'file' in x-blockdev-reopen - ssh: add support for sha256 host key fingerprints - vhost-user-blk: Implement reconnection during realize - introduce QEMU_AUTO_VFREE - Don't require password of encrypted backing file for image creation - Code cleanups # gpg: Signature made Wed 30 Jun 2021 17:00:55 BST # gpg: using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6 # gpg: issuer "kwolf@redhat.com" # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full] # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * remotes/kevin/tags/for-upstream: (24 commits) vhost-user-blk: Implement reconnection during realize vhost-user-blk: Factor out vhost_user_blk_realize_connect() vhost: Distinguish errors in vhost_dev_get_config() vhost-user-blk: Add Error parameter to vhost_user_blk_start() vhost: Return 0/-errno in vhost_dev_init() vhost: Distinguish errors in vhost_backend_init() vhost: Add Error parameter to vhost_dev_init() block/ssh: add support for sha256 host key fingerprints block/commit: use QEMU_AUTO_VFREE introduce QEMU_AUTO_VFREE iotests: Test replacing files with x-blockdev-reopen block: Allow changing bs->file on reopen block: BDRVReopenState: drop replace_backing_bs field block: move supports_backing check to bdrv_set_file_or_backing_noperm() block: bdrv_reopen_parse_backing(): simplify handling implicit filters block: bdrv_reopen_parse_backing(): don't check frozen child block: bdrv_reopen_parse_backing(): don't check aio context block: introduce bdrv_set_file_or_backing_noperm() block: introduce bdrv_remove_file_or_backing_child() block: comment graph-modifying function not updating permissions ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2021-07-02 11:46:32 +01:00
Daniel P. Berrangé	bf783261f0	block/ssh: add support for sha256 host key fingerprints Currently the SSH block driver supports MD5 and SHA1 for host key fingerprints. This is a cryptographically sensitive operation and so these hash algorithms are inadequate by modern standards. This adds support for SHA256 which has been supported in libssh since the 0.8.1 release. Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20210622115156.138458-1-berrange@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Acked-by: Richard W.M. Jones <rjones@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-06-30 12:45:32 +02:00
Philippe Mathieu-Daudé	7b3b616838	block/nbd: Use qcrypto_tls_creds_check_endpoint() Avoid accessing QCryptoTLSCreds internals by using the qcrypto_tls_creds_check_endpoint() helper. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>	2021-06-29 18:29:47 +01:00
Vladimir Sementsov-Ogievskiy	7170170866	block/commit: use QEMU_AUTO_VFREE Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210628121133.193984-3-vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-06-29 16:51:21 +02:00
Eric Blake	97efa8698e	block: Move read-only check during truncation earlier No need to start a tracked request that will always fail. The choice to check read-only after bdrv_inc_in_flight() predates `1bc5f09f2e` (block: Use tracked request for truncate), but waiting for serializing requests can make the effect more noticeable. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20210609163034.997943-1-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-06-29 16:51:00 +02:00
Peter Maydell	6512fa497c	* Some Meson test conversions * KVM dirty page ring buffer fix * KVM TSC scaling support * Fixes for SG_IO with /dev/sdX devices * (Non)support for host devices on iOS * -smp cleanups -----BEGIN PGP SIGNATURE----- iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmDV5TIUHHBib256aW5p QHJlZGhhdC5jb20ACgkQv/vSX3jHroNySgf9HMnAtLWp36p2ie74o4rrW9x3Ojrm fuCq2i3q3nBhEKqqiyp+QQJGubE44mXEZQYtX89tOfSFgg7o6SLIoAcQQskr+In6 f9I1jjpSVTls0AaGUO+iRn9KiTzeMWeo1l6Wht+2mfBL5XpNLaLLu/T49uPhjlvN zFi5blgILxIYMqMCD1joDBnIiqqDozr0p7QzRZD8re25sRhg0NHQxyIh3OxBPpJ9 3Jhy1Us0cDWrwvPbxz6S5N0zesLu1ojtojVPy6iKjyHSv+6eiE6bHyIbS8duG5+H zBC1THOsUV3X1UvPAjuSNlgfNeobGAzmxSJ/evLgWWkpkx1mLtsnL5RARQ== =YoOL -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/bonzini-gitlab/tags/for-upstream' into staging * Some Meson test conversions * KVM dirty page ring buffer fix * KVM TSC scaling support * Fixes for SG_IO with /dev/sdX devices * (Non)support for host devices on iOS * -smp cleanups # gpg: Signature made Fri 25 Jun 2021 15:16:18 BST # gpg: using RSA key F13338574B662389866C7682BFFBD25F78C7AE83 # gpg: issuer "pbonzini@redhat.com" # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full] # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [full] # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * remotes/bonzini-gitlab/tags/for-upstream: (28 commits) machine: reject -smp dies!=1 for non-PC machines machine: pass QAPI struct to mc->smp_parse machine: add error propagation to mc->smp_parse machine: move common smp_parse code to caller machine: move dies from X86MachineState to CpuTopology file-posix: handle EINTR during ioctl block: detect DKIOCGETBLOCKCOUNT/SIZE before use block: try BSD disk size ioctls one after another block: check for sys/disk.h block: feature detection for host block support file-posix: try BLKSECTGET on block devices too, do not round to power of 2 block: add max_hw_transfer to BlockLimits block-backend: align max_transfer to request alignment osdep: provide ROUND_DOWN macro scsi-generic: pass max_segments via max_iov field in BlockLimits file-posix: fix max_iov for /dev/sg devices KVM: Fix dirty ring mmap incorrect size due to renaming accident configure, meson: convert libusbredir detection to meson configure, meson: convert libcacard detection to meson configure, meson: convert libusb detection to meson ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2021-06-28 21:04:22 +01:00
Peter Maydell	9e654e1019	block: Make block-copy API thread-safe -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEi5wmzbL9FHyIDoahVh8kwfGfefsFAmDVzrgACgkQVh8kwfGf efuoKRAAsHqE46P2xwjjPROtwZC6HP/Ny5xfbF2CglfC4eX4ff89dwWSagjHfBig 1TQjHemOo5Fs8gyBGec0WPn0HaBVFthq75VGObt+QkHy7+owGDmtVghkXnEO8z2c gldPoVeOXAbU4DZXgITQYfN1ljCOdrdKQMrMYmKqXLmw/vMzAfKlBKqsOFQiyVys 4egn25QxyiOh+T29zyVwmGVABaH2TIuJqkDr+iMh4IypYZf0BlqJRw++kLhGdxGq RIpXQiXExy3lRm8htuh+GDAigSXNz93XKU3ZHe8RPBtPfdS0siGWwVTW2nLsH+Rc vfUvfkqBSGcFaFjGHg/eNGvoMTYAZKHx8yq72voqrfFIPtz3NAoS2ahz/hIU/NiL YLOmOcRZVx+xJ5lxjaxi/SvbTHVtZoBym/Aje/YiT3v4A8rziG6BeBUP9ec/bk+D YYxMZNfzW2jVq0Vl4TZyYmV8e/H8Ha3HbLLJip3tiLXsBIjajfNti9iJ/xac5NzD jV5pR27yIXilYHPR7GCYaMRp5LJv8uGiW704yAt2dwBizqonm7cgTg5Fcx0HFDyk +HyPwu/TGI3cEF7dl+8V+AmwKug+jFj3VFVh5UMtf4bqNeliYNx+QpCUuWPuPMFc U1nXWYBtcklD4JNPERUgUW7x2tmAJN5dGDY0LAa2brrOnKVTTwU= =JXBO -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/vsementsov/tags/pull-jobs-2021-06-25' into staging block: Make block-copy API thread-safe # gpg: Signature made Fri 25 Jun 2021 13:40:24 BST # gpg: using RSA key 8B9C26CDB2FD147C880E86A1561F24C1F19F79FB # gpg: Good signature from "Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>" [unknown] # gpg: WARNING: This key is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: 8B9C 26CD B2FD 147C 880E 86A1 561F 24C1 F19F 79FB * remotes/vsementsov/tags/pull-jobs-2021-06-25: block-copy: atomic .cancelled and .finished fields in BlockCopyCallState block-copy: add CoMutex lock block-copy: move progress_set_remaining in block_copy_task_end block-copy: streamline choice of copy_range vs. read/write block-copy: small refactor in block_copy_task_entry and block_copy_common co-shared-resource: protect with a mutex progressmeter: protect with a mutex blockjob: let ratelimit handle a speed of 0 block-copy: let ratelimit handle a speed of 0 ratelimit: treat zero speed as unlimited Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2021-06-28 18:58:19 +01:00
Emanuele Giuseppe Esposito	149009bef4	block-copy: atomic .cancelled and .finished fields in BlockCopyCallState By adding acquire/release pairs, we ensure that .ret and .error_is_read fields are written by block_copy_dirty_clusters before .finished is true, and that they are read by API user after .finished is true. The atomic here are necessary because the fields are concurrently modified in coroutines, and read outside. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20210624072043.180494-6-eesposit@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2021-06-25 14:33:51 +03:00
Emanuele Giuseppe Esposito	d0c389d2ce	block-copy: add CoMutex lock Group various structures fields, to better understand what we need to protect with a lock and what doesn't need it. Then, add a CoMutex to protect concurrent access of block-copy data structures. This mutex also protects .copy_bitmap, because its thread-safe API does not prevent it from assigning two tasks to the same bitmap region. Exceptions to the lock: - .sleep_state is handled in the series "coroutine: new sleep/wake API" and thus here left as TODO. - .finished, .cancelled and reads to .ret and .error_is_read will be protected in the following patch, because are used also outside coroutines. - .skip_unallocated is atomic. Including it under the mutex would increase the critical sections and make them also much more complex. We can have it as atomic since it is only written from outside and read by block-copy coroutines. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20210624072043.180494-5-eesposit@redhat.com> [vsementsov: fix typo in comment] Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2021-06-25 14:33:39 +03:00
Emanuele Giuseppe Esposito	e3dd339fee	block-copy: move progress_set_remaining in block_copy_task_end Moving this function in task_end ensures to update the progress anyways, even if there is an error. It also helps in next patch, allowing task_end to have only one critical section. Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20210624072043.180494-4-eesposit@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2021-06-25 14:33:35 +03:00
Paolo Bonzini	05d5e12b24	block-copy: streamline choice of copy_range vs. read/write Put the logic to determine the copy size in a separate function, so that there is a simple state machine for the possible methods of copying data from one BlockDriverState to the other. Use .method instead of .copy_range as in-out argument, and include also .zeroes as an additional copy method. While at it, store the common computation of block_copy_max_transfer into a new field of BlockCopyState, and make sure that we always obey max_transfer; that's more efficient even for the COPY_RANGE_READ_WRITE case. Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210624072043.180494-3-eesposit@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2021-06-25 14:33:33 +03:00
Emanuele Giuseppe Esposito	c6a3e3df30	block-copy: small refactor in block_copy_task_entry and block_copy_common Use a local variable instead of referencing BlockCopyState through a BlockCopyCallState or BlockCopyTask every time. This is in preparation for next patches. No functional change intended. Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20210624072043.180494-2-eesposit@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2021-06-25 14:32:09 +03:00
Emanuele Giuseppe Esposito	a7b4f8fc09	progressmeter: protect with a mutex Progressmeter is protected by the AioContext mutex, which is taken by the block jobs and their caller (like blockdev). We would like to remove the dependency of block layer code on the AioContext mutex, since most drivers and the core I/O code are already not relying on it. Create a new C file to implement the ProgressMeter API, but keep the struct as public, to avoid forcing allocation on the heap. Also add a mutex to be able to provide an accurate snapshot of the progress values to the caller. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20210614081130.22134-5-eesposit@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2021-06-25 14:24:24 +03:00
Paolo Bonzini	ca657c99e6	block-copy: let ratelimit handle a speed of 0 Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20210614081130.22134-3-eesposit@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2021-06-25 14:24:16 +03:00
Paolo Bonzini	bd80936a4f	file-posix: handle EINTR during ioctl Similar to other handle_aiocb_* functions, handle_aiocb_ioctl needs to cater for the possibility that ioctl is interrupted by a signal. Otherwise, the I/O is incorrectly reported as a failure to the guest. Reported-by: Gordon Watson <gwatson@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-25 10:54:13 +02:00
Joelle van Dyne	09e20abdda	block: detect DKIOCGETBLOCKCOUNT/SIZE before use iOS hosts do not have these defined so we fallback to the default behaviour. Co-authored-by: Warner Losh <imp@bsdimp.com> Signed-off-by: Joelle van Dyne <j@getutm.app> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-25 10:54:13 +02:00
Paolo Bonzini	267cd53f5f	block: try BSD disk size ioctls one after another Try all the possible ioctls for disk size as long as they are supported, to keep the #if ladder simple. Extracted and cleaned up from a patch by Joelle van Dyne and Warner Losh. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-25 10:54:13 +02:00
Joelle van Dyne	14176c8d05	block: feature detection for host block support On Darwin (iOS), there are no system level APIs for directly accessing host block devices. We detect this at configure time. Signed-off-by: Joelle van Dyne <j@getutm.app> Message-Id: <20210315180341.31638-2-j@getutm.app> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-25 10:54:13 +02:00
Paolo Bonzini	18473467d5	file-posix: try BLKSECTGET on block devices too, do not round to power of 2 bs->sg is only true for character devices, but block devices can also be used with scsi-block and scsi-generic. Unfortunately BLKSECTGET returns bytes in an int for /dev/sgN devices, and sectors in a short for block devices, so account for that in the code. The maximum transfer also need not be a power of 2 (for example I have seen disks with 1280 KiB maximum transfer) so there's no need to pass the result through pow2floor. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-25 10:54:13 +02:00
Paolo Bonzini	24b36e9813	block: add max_hw_transfer to BlockLimits For block host devices, I/O can happen through either the kernel file descriptor I/O system calls (preadv/pwritev, io_submit, io_uring) or the SCSI passthrough ioctl SG_IO. In the latter case, the size of each transfer can be limited by the HBA, while for file descriptor I/O the kernel is able to split and merge I/O in smaller pieces as needed. Applying the HBA limits to file descriptor I/O results in more system calls and suboptimal performance, so this patch splits the max_transfer limit in two: max_transfer remains valid and is used in general, while max_hw_transfer is limited to the maximum hardware size. max_hw_transfer can then be included by the scsi-generic driver in the block limits page, to ensure that the stricter hardware limit is used. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-25 10:54:13 +02:00
Paolo Bonzini	b99f7fa08a	block-backend: align max_transfer to request alignment Block device requests must be aligned to bs->bl.request_alignment. It makes sense for drivers to align bs->bl.max_transfer the same way; however when there is no specified limit, blk_get_max_transfer just returns INT_MAX. Since the contract of the function does not specify that INT_MAX means "no maximum", just align the outcome of the function (whether INT_MAX or bs->bl.max_transfer) before returning it. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-25 10:54:13 +02:00
Paolo Bonzini	01ef8185b8	scsi-generic: pass max_segments via max_iov field in BlockLimits I/O to a disk via read/write is not limited by the number of segments allowed by the host adapter; the kernel can split requests if needed, and the limit imposed by the host adapter can be very low (256k or so) to avoid that SG_IO returns EINVAL if memory is heavily fragmented. Since this value is only interesting for SG_IO-based I/O, do not include it in the max_transfer and only take it into account when patching the block limits VPD page in the scsi-generic device. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2021-06-25 10:54:12 +02:00
Paolo Bonzini	8ad5ab6148	file-posix: fix max_iov for /dev/sg devices Even though it was only called for devices that have bs->sg set (which must be character devices), sg_get_max_segments looked at /sys/dev/block which only works for block devices. On Linux the sg driver has its own way to provide the maximum number of iovecs in a scatter/gather list, so add support for it. The block device path is kept because it will be reinstated in the next patches. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2021-06-25 10:54:12 +02:00
Max Reitz	32a9a245d7	block/snapshot: Clarify goto fallback behavior In the bdrv_snapshot_goto() fallback code, we work with a pointer to either bs->file or bs->backing. We detach that child, close the node (with .bdrv_close()), apply the snapshot on the child node, and then re-open the node (with .bdrv_open()). In order for .bdrv_open() to attach the same child node that we had before, we pass "file={child-node}" or "backing={child-node}" to it. Therefore, when .bdrv_open() has returned success, we can assume that bs->file or bs->backing (respectively) points to our original child again. This is verified by an assertion. All of this is not immediately clear from a quick glance at the code, so add a comment to the assertion what it is for, and why it is valid. It certainly confused Coverity. Reported-by: Coverity (CID 1452774) Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210503095418.31521-1-mreitz@redhat.com> [mreitz: s/close/detach/] Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2021-06-24 09:49:04 +02:00
Vladimir Sementsov-Ogievskiy	bbfb7c2f35	block/nbd: safer transition to receiving request req->receiving is a flag of request being in one concrete yield point in nbd_co_do_receive_one_chunk(). Such kind of boolean flag is always better to unset before scheduling the coroutine, to avoid double scheduling. So, let's be more careful. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210610100802.5888-33-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-06-18 12:21:22 -05:00
Vladimir Sementsov-Ogievskiy	91e0998f5a	block/nbd: add nbd_client_connected() helper We already have two similar helpers for other state. Let's add another one for convenience. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210610100802.5888-32-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-06-18 12:21:22 -05:00
Vladimir Sementsov-Ogievskiy	a71d597b98	block/nbd: reuse nbd_co_do_establish_connection() in nbd_open() The only last step we need to reuse the function is coroutine-wrapper. nbd_open() may be called from non-coroutine context. So, generate the wrapper and use it. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210610100802.5888-31-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-06-18 12:21:22 -05:00
Vladimir Sementsov-Ogievskiy	97cf89259e	nbd/client-connection: add option for non-blocking connection attempt We'll need a possibility of non-blocking nbd_co_establish_connection(), so that it returns immediately, and it returns success only if a connections was previously established in background. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210610100802.5888-30-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-06-18 12:21:22 -05:00
Vladimir Sementsov-Ogievskiy	51edbf537d	block/nbd: split nbd_co_do_establish_connection out of nbd_reconnect_attempt Split out the part that we want to reuse for nbd_open(). Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210610100802.5888-29-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-06-18 12:21:21 -05:00
Vladimir Sementsov-Ogievskiy	43cb34dede	nbd/client-connection: return only one io channel block/nbd doesn't need underlying sioc channel anymore. So, we can update nbd/client-connection interface to return only one top-most io channel, which is more straight forward. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210610100802.5888-27-vsementsov@virtuozzo.com> [eblake: squash in Vladimir's fixes for uninit usage caught by clang] Signed-off-by: Eric Blake <eblake@redhat.com>	2021-06-18 12:20:53 -05:00
Vladimir Sementsov-Ogievskiy	95a078ea3e	block/nbd: drop BDRVNBDState::sioc Currently sioc pointer is used just to pass from socket-connection to nbd negotiation. Drop the field, and use local variables instead. With next commit we'll update nbd/client-connection.c to behave appropriately (return only top-most ioc, not two channels). Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210610100802.5888-26-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-06-18 10:59:54 -05:00
Vladimir Sementsov-Ogievskiy	c2405af0e4	block/nbd: don't touch s->sioc in nbd_teardown_connection() Negotiation during reconnect is now done in a thread, and s->sioc is not available during negotiation. Negotiation in thread will be cancelled by nbd_client_connection_release() called from nbd_clear_bdrvstate(). So, we don't need this code chunk anymore. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210610100802.5888-25-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-06-18 10:59:54 -05:00
Vladimir Sementsov-Ogievskiy	6d2b0332d3	block/nbd: use negotiation of NBDClientConnection Now that we can opt in to negotiation as part of the client connection thread, use that to simplify connection_co. This is another step on the way to moving all reconnect code into NBDClientConnection. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210610100802.5888-24-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-06-18 10:59:54 -05:00
Vladimir Sementsov-Ogievskiy	e9ba7788b0	block/nbd: split nbd_handle_updated_info out of nbd_client_handshake() To be reused in the following patch. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Roman Kagan <rvkagan@yandex-team.ru> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210610100802.5888-23-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-06-18 10:59:54 -05:00
Vladimir Sementsov-Ogievskiy	130d49baa5	nbd/client-connection: add possibility of negotiation Add arguments and logic to support nbd negotiation in the same thread after successful connection. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210610100802.5888-20-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-06-18 10:59:53 -05:00
Vladimir Sementsov-Ogievskiy	5276c87c12	nbd: move connection code from block/nbd to nbd/client-connection We now have bs-independent connection API, which consists of four functions: nbd_client_connection_new() nbd_client_connection_release() nbd_co_establish_connection() nbd_co_establish_connection_cancel() Move them to a separate file together with NBDClientConnection structure which becomes private to the new API. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210610100802.5888-18-vsementsov@virtuozzo.com> [eblake: comment tweaks] Signed-off-by: Eric Blake <eblake@redhat.com>	2021-06-18 10:59:53 -05:00
Vladimir Sementsov-Ogievskiy	248d470198	block/nbd: introduce nbd_client_connection_release() This is a last step of creating bs-independent nbd connection interface. With next commit we can finally move it to separate file. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210610100802.5888-17-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-06-18 10:59:53 -05:00
Vladimir Sementsov-Ogievskiy	f68729747d	block/nbd: introduce nbd_client_connection_new() This is a step of creating bs-independent nbd connection interface. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Roman Kagan <rvkagan@yandex-team.ru> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210610100802.5888-16-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-06-18 10:59:53 -05:00
Vladimir Sementsov-Ogievskiy	90ddc64fb2	block/nbd: rename NBDConnectThread to NBDClientConnection We are going to move the connection code to its own file, and want clear names and APIs first. The structure is shared between user and (possibly) several runs of connect-thread. So it's wrong to call it "thread". Let's rename to something more generic. Appropriately rename connect_thread and thr variables to conn. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Roman Kagan <rvkagan@yandex-team.ru> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210610100802.5888-15-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-06-18 10:59:53 -05:00
Vladimir Sementsov-Ogievskiy	c3e7730485	block/nbd: make nbd_co_establish_connection_cancel() bs-independent nbd_co_establish_connection_cancel() actually needs only pointer to NBDConnectThread. So, make it clean. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Roman Kagan <rvkagan@yandex-team.ru> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210610100802.5888-14-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-06-18 10:59:53 -05:00
Vladimir Sementsov-Ogievskiy	d33833d7af	block/nbd: bs-independent interface for nbd_co_establish_connection() We are going to split connection code to a separate file. Now we are ready to give nbd_co_establish_connection() clean and bs-independent interface. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Roman Kagan <rvkagan@yandex-team.ru> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210610100802.5888-13-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-06-18 10:59:53 -05:00
Vladimir Sementsov-Ogievskiy	b8e8a3d116	block/nbd: drop thr->state We don't need all these states. The code refactored to use two boolean variables looks simpler. While moving the comment in nbd_co_establish_connection() rework it to give better information. Also, we are going to move the connection code to separate file and mentioning drained section would be confusing. Improve also the comment in NBDConnectThread, while dropping removed state names from it. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210610100802.5888-12-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: comment tweak] Signed-off-by: Eric Blake <eblake@redhat.com>	2021-06-18 10:59:53 -05:00
Vladimir Sementsov-Ogievskiy	08ea55d068	block/nbd: simplify waking of nbd_co_establish_connection() Instead of managing connect_bh, bh_ctx, and wait_connect fields, we can use a single link to the waiting coroutine with proper mutex protection. So new logic is: nbd_co_establish_connection() sets wait_co under the mutex, releases the mutex, then yield()s. Note that wait_co may be scheduled by the thread immediately after unlocking the mutex. Still, the main thread (or iothread) will not reach the code for entering the coroutine until the yield(), so we are safe. connect_thread_func() and nbd_co_establish_connection_cancel() do the following to handle wait_co: Under the mutex, if thr->wait_co is not NULL, make it NULL and schedule it. This way, we avoid scheduling the coroutine twice. Still scheduling is a bit different: In connect_thread_func() we can just call aio_co_wake under mutex, after commit [async: the main AioContext is only "current" if under the BQL] we are sure that aio_co_wake() will not try to acquire the aio context and do qemu_aio_coroutine_enter() but simply schedule the coroutine by aio_co_schedule(). nbd_co_establish_connection_cancel() will be called from non-coroutine context in further patch and will be able to go through qemu_aio_coroutine_enter() path of aio_co_wake(). So keep current behavior of waking the coroutine after the critical section. Also, this commit reduces the dependence of nbd_co_establish_connection() on the internals of bs (we now use a generic pointer to the coroutine, instead of direct use of s->connection_co). This is a step towards splitting the connection API out of nbd.c. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210610100802.5888-11-vsementsov@virtuozzo.com> Reviewied-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-06-18 10:59:53 -05:00
Vladimir Sementsov-Ogievskiy	2def3edb4b	block/nbd: BDRVNBDState: drop unused connect_err and connect_status These fields are write-only. Drop them. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Roman Kagan <rvkagan@yandex-team.ru> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210610100802.5888-10-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-06-18 10:59:53 -05:00
Vladimir Sementsov-Ogievskiy	2a25def4be	block/nbd: nbd_client_handshake(): fix leak of s->ioc Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Roman Kagan <rvkagan@yandex-team.ru> Message-Id: <20210610100802.5888-9-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-06-18 10:59:53 -05:00
Roman Kagan	e8b35bf5dc	block/nbd: ensure ->connection_thread is always valid Simplify lifetime management of BDRVNBDState->connect_thread by delaying the possible cleanup of it until the BDRVNBDState itself goes away. This also reverts `0267101af6` "block/nbd: fix possible use after free of s->connect_thread" as now s->connect_thread can't be cleared until the very end. Signed-off-by: Roman Kagan <rvkagan@yandex-team.ru> [vsementsov: rebase, revert `0267101af6` changes] Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> [eblake: tweak comment] Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210610100802.5888-8-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-06-18 10:59:53 -05:00
Vladimir Sementsov-Ogievskiy	6cc702beac	block/nbd: call socket_address_parse_named_fd() in advance Detecting monitor by current coroutine works bad when we are not in coroutine context. And that's exactly so in nbd reconnect code, where qio_channel_socket_connect_sync() is called from thread. Monitor is needed only to parse named file descriptor. So, let's just parse it during nbd_open(), so that all further users of s->saddr don't need to access monitor. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210610100802.5888-7-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-06-18 10:59:53 -05:00
Vladimir Sementsov-Ogievskiy	fb392b548e	block/nbd: connect_thread_func(): do qio_channel_set_delay(false) nbd_open() does it (through nbd_establish_connection()). Actually we lost that call on reconnect path in `1dc4718d84` "block/nbd: use non-blocking connect: fix vm hang on connect()" when we have introduced reconnect thread. Fixes: `1dc4718d84` Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210610100802.5888-5-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-06-18 10:59:53 -05:00
Vladimir Sementsov-Ogievskiy	bbba1c376b	block/nbd: fix how state is cleared on nbd_open() failure paths We have two "return error" paths in nbd_open() after nbd_process_options(). Actually we should call nbd_clear_bdrvstate() on these paths. Interesting that nbd_process_options() calls nbd_clear_bdrvstate() by itself. Let's fix leaks and refactor things to be more obvious: - intialize yank at top of nbd_open() - move yank cleanup to nbd_clear_bdrvstate() - refactor nbd_open() so that all failure paths except for yank-register goes through nbd_clear_bdrvstate() Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Roman Kagan <rvkagan@yandex-team.ru> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210610100802.5888-4-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-06-18 10:59:53 -05:00
Roman Kagan	3687ad4903	block/nbd: fix channel object leak nbd_free_connect_thread leaks the channel object if it hasn't been stolen. Unref it and fix the leak. Signed-off-by: Roman Kagan <rvkagan@yandex-team.ru> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210610100802.5888-3-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-06-18 10:59:52 -05:00
Daniel P. Berrangé	39683553f9	block: use GDateTime for formatting timestamp when dumping snapshot info The GDateTime APIs provided by GLib avoid portability pitfalls, such as some platforms where 'struct timeval.tv_sec' field is still 'long' instead of 'time_t'. When combined with automatic cleanup, GDateTime often results in simpler code too. Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>	2021-06-14 13:28:50 +01:00
Daniel P. Berrangé	99be1ac366	block: remove duplicate trace.h include Reviewed-by: Connor Kuehl <ckuehl@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>	2021-06-14 13:28:50 +01:00
Daniel P. Berrangé	60ff2ae2a2	block: add trace point when fdatasync fails A flush failure is a critical failure scenario for some operations. For example, it will prevent migration from completing, as it will make vm_stop() report an error. Thus it is important to have a trace point present for debugging. Reviewed-by: Connor Kuehl <ckuehl@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>	2021-06-14 13:28:50 +01:00
Daniel P. Berrangé	c7ddc8821d	block: preserve errno from fdatasync failures When fdatasync() fails on a file backend we set a flag that short-circuits any future attempts to call fdatasync(). The first failure returns the true errno, but the later short- circuited calls return a generic EIO. The latter is unhelpful because fdatasync() can return a variety of errnos, including EACCESS. Reviewed-by: Connor Kuehl <ckuehl@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>	2021-06-14 13:28:50 +01:00
Paolo Bonzini	7fa1c63553	iscsi: link libm into the module Depending on the configuration of QEMU, some binaries might not need libm at all. In that case libiscsi, which uses exp(), will fail to load. Link it in the module explicitly. Reported-by: Yi Sun <yisun@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-04 13:47:07 +02:00
Paolo Bonzini	96acfb1f25	meson: allow optional dependencies for block modules Right now all dependencies for block modules are passed to module_ss.add(when: ...), so they are mandatory. In the next patch we will need to add a libm dependency to a module, but libm does not exist on all systems. So, modify the creation of module_ss and modsrc so that dependencies can also be passed to module_ss.add(if_true: ...). While touching the array, remove the useless dependency of the curl module on glib. glib is always linked in QEMU and in fact all other block modules also need it, but they don't have to specify it. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-04 13:47:07 +02:00
Peter Maydell	8e6dad2028	Block layer patches - NBD server: Fix crashes related to switching between AioContexts - file-posix: Workaround for discard/write_zeroes on buggy filesystems - Follow-up fixes for the reopen vs. permission changes - quorum: Fix error handling for flush - block-copy: Refactor copy_range handling - docs: Describe how to use 'null-co' block driver -----BEGIN PGP SIGNATURE----- iQJFBAABCAAvFiEE3D3rFZqa+V09dFb+fwmycsiPL9YFAmC3iy8RHGt3b2xmQHJl ZGhhdC5jb20ACgkQfwmycsiPL9ZP0hAAuh07CFWLzHCcRC7PBzekSPfzRYYBLDSW EObJ1Ov4mvz8UZoP6BDJ5QVzLPhel6hXkxTd83B1D7t/Dq+yJYR0z8Kv3USpaVJ4 2U26SsoGQM8BmtVDL1Q8tQ5eDWQ4ykxNx6F2/lKBe1EH1lfaun04Xj1rNh7jpilo nNmKMDDI1UOkH0lKDR3tqfEV0XQE7o+ZKfPlIbvYMjXk9ZPKUHfjNPGVdCLQVnqH VJI01hF7eEx1ykSMdlC+TzNoVGG+mCBokGuW0JlUvOpX6FcGnAlxXQx8u53c1I8O lggZV8b2IbrNBUVwXQHLLrXxjOo+u54Ct9y/gXUTAj8qai+9jVRp60Y1pnCyeIeu DzFx10xwy04PGleb7AAZ4dT8du2+PTkuyQ9KmlvQ2U4IUcgW124CrDeO7XYr1aif hCTJPeEDrC9YNU6AQ8rLXrYUtkumSm2zUzU5nZ//i5WH41475/vsmgP5A+Jr457A Xu0yiI2Gqkr9CNsP9ZzMkNj03oIBhPFuGxiwibLQsy/6UVnaDYS0+rQ8FXYnF5+K iEpgXe3vKTWxM097kzJMBTDVRMXRa75NtK7KWXMDgVpHTbcv1t1otsn+6dfv+B55 ULJM1ETsyYS0T6BqNglvdytsraSt7JgSF+ZLHbYk3KVDshwnq/0ksgSqHNNA14ca kYTzHhMgo5w= =gSq/ -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging Block layer patches - NBD server: Fix crashes related to switching between AioContexts - file-posix: Workaround for discard/write_zeroes on buggy filesystems - Follow-up fixes for the reopen vs. permission changes - quorum: Fix error handling for flush - block-copy: Refactor copy_range handling - docs: Describe how to use 'null-co' block driver # gpg: Signature made Wed 02 Jun 2021 14:44:15 BST # gpg: using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6 # gpg: issuer "kwolf@redhat.com" # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full] # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * remotes/kevin/tags/for-upstream: docs/secure-coding-practices: Describe how to use 'null-co' block driver block-copy: refactor copy_range handling block-copy: fix block_copy_task_entry() progress update nbd/server: Use drained block ops to quiesce the server block-backend: add drained_poll block: improve permission conflict error message block: simplify bdrv_child_user_desc() block/vvfat: inherit child_vvfat_qcow from child_of_bds block: improve bdrv_child_get_parent_desc() block-backend: improve blk_root_get_parent_desc() block: document child argument of bdrv_attach_child_common() block/file-posix: Try other fallbacks after invalid FALLOC_FL_ZERO_RANGE block/file-posix: Fix problem with fallocate(PUNCH_HOLE) on GPFS block: drop BlockBackendRootState::read_only block: drop BlockDriverState::read_only block: consistently use bdrv_is_read_only() block/vvfat: fix vvfat_child_perm crash block/vvfat: child_vvfat_qcow: add .get_parent_aio_context, fix crash qemu-io-cmds: assert that we don't have .perm requested in no-blk case block/quorum: Provide .bdrv_co_flush instead of .bdrv_co_flush_to_disk Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2021-06-02 19:34:03 +01:00
Vladimir Sementsov-Ogievskiy	bed9523471	block-copy: refactor copy_range handling Currently we update s->use_copy_range and s->copy_size in block_copy_do_copy(). It's not very good: 1. block_copy_do_copy() is intended to be a simple function, that wraps bdrv_co_<io> functions for need of block copy. That's why we don't pass BlockCopyTask into it. So, block_copy_do_copy() is bad place for manipulation with generic state of block-copy process 2. We are going to make block-copy thread-safe. So, it's good to move manipulation with state of block-copy to the places where we'll need critical sections anyway, to not introduce extra synchronization primitives in block_copy_do_copy(). Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210528141628.44287-3-vsementsov@virtuozzo.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-06-02 14:23:20 +02:00
Vladimir Sementsov-Ogievskiy	8146b357d0	block-copy: fix block_copy_task_entry() progress update Don't report successful progress on failure, when call_state->ret is set. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210528141628.44287-2-vsementsov@virtuozzo.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-06-02 14:23:20 +02:00
Sergio Lopez	095cc4d0f6	block-backend: add drained_poll Allow block backends to poll their devices/users to check if they have been quiesced when entering a drained section. This will be used in the next patch to wait for the NBD server to be completely quiesced. Suggested-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Sergio Lopez <slp@redhat.com> Message-Id: <20210602060552.17433-2-slp@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-06-02 14:23:20 +02:00
Vladimir Sementsov-Ogievskiy	8081f064e4	block/vvfat: inherit child_vvfat_qcow from child_of_bds Recently we've fixed a crash by adding .get_parent_aio_context handler to child_vvfat_qcow. Now we want it to support .get_parent_desc as well. child_vvfat_qcow wants to implement own .inherit_options, it's not bad. But omitting all other handlers is a bad idea. Let's inherit the class from child_of_bds instead, similar to chain_child_class and detach_by_driver_cb_class in test-bdrv-drain.c. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210601075218.79249-5-vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-06-02 14:23:20 +02:00
Vladimir Sementsov-Ogievskiy	fd240a184b	block-backend: improve blk_root_get_parent_desc() We have different types of parents: block nodes, block backends and jobs. So, it makes sense to specify type together with name. While being here also use g_autofree. iotest 307 output is updated. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-Id: <20210601075218.79249-3-vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-06-02 14:23:20 +02:00
Thomas Huth	fa95e9fbab	block/file-posix: Try other fallbacks after invalid FALLOC_FL_ZERO_RANGE If fallocate(... FALLOC_FL_ZERO_RANGE ...) returns EINVAL, it's likely an indication that the file system is buggy and does not implement unaligned accesses right. We still might be lucky with the other fallback fallocate() calls later in this function, though, so we should not return immediately and try the others first. Since FALLOC_FL_ZERO_RANGE could also return EINVAL if the file descriptor is not a regular file, we ignore this filesystem bug silently, without printing an error message for the user. Signed-off-by: Thomas Huth <thuth@redhat.com> Message-Id: <20210527172020.847617-3-thuth@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-06-02 14:23:20 +02:00
Thomas Huth	73ebf29729	block/file-posix: Fix problem with fallocate(PUNCH_HOLE) on GPFS A customer reported that running qemu-img convert -t none -O qcow2 -f qcow2 input.qcow2 output.qcow2 fails for them with the following error message when the images are stored on a GPFS file system : qemu-img: error while writing sector 0: Invalid argument After analyzing the strace output, it seems like the problem is in handle_aiocb_write_zeroes(): The call to fallocate(FALLOC_FL_PUNCH_HOLE) returns EINVAL, which can apparently happen if the file system has a different idea of the granularity of the operation. It's arguably a bug in GPFS, since the PUNCH_HOLE mode should not result in EINVAL according to the man-page of fallocate(), but the file system is out there in production and so we have to deal with it. In commit `294682cc3a` ("block: workaround for unaligned byte range in fallocate()") we also already applied the a work-around for the same problem to the earlier fallocate(FALLOC_FL_ZERO_RANGE) call, so do it now similar with the PUNCH_HOLE call. But instead of silently catching and returning -ENOTSUP (which causes the caller to fall back to writing zeroes), let's rather inform the user once about the buggy file system and try the other fallback instead. Signed-off-by: Thomas Huth <thuth@redhat.com> Message-Id: <20210527172020.847617-2-thuth@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-06-02 14:23:20 +02:00
Vladimir Sementsov-Ogievskiy	260242a833	block: drop BlockBackendRootState::read_only Instead of keeping additional boolean field, let's store the information in BDRV_O_RDWR bit of BlockBackendRootState::open_flags. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210527154056.70294-4-vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-06-02 14:23:20 +02:00
Vladimir Sementsov-Ogievskiy	307261b243	block: consistently use bdrv_is_read_only() It's better to use accessor function instead of bs->read_only directly. In some places use bdrv_is_writable() instead of checking both BDRV_O_RDWR set and BDRV_O_INACTIVE not set. In bdrv_open_common() it's a bit strange to add one more variable, but we are going to drop bs->read_only in the next patch, so new ro local variable substitutes it here. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210527154056.70294-2-vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-06-02 14:23:20 +02:00
Vladimir Sementsov-Ogievskiy	39df2c6d57	block/vvfat: fix vvfat_child_perm crash It's wrong to rely on s->qcow in vvfat_child_perm, as on permission update during bdrv_open_child() call this field is not set yet. Still prior to `aa5a04c7db`, it didn't crash, as bdrv_open_child passed NULL as child to bdrv_child_perm(), and NULL was equal to NULL in assertion (still, it was bad guarantee for child being s->qcow, not backing :). Since `aa5a04c7db` "add bdrv_attach_child_noperm" bdrv_refresh_perms called on parent node when attaching child, and new correct child pointer is passed to .bdrv_child_perm. Still, s->qcow is NULL at the moment. Let's rely only on role instead. Without that fix, ./build/qemu-system-x86_64 -usb -device usb-storage,drive=fat16 \ -drive \ file=fat:rw:fat-type=16:"<path of a host folder>",id=fat16,format=raw,if=none crashes: (gdb) bt 0 raise () at /lib64/libc.so.6 1 abort () at /lib64/libc.so.6 2 _nl_load_domain.cold () at /lib64/libc.so.6 3 annobin_assert.c_end () at /lib64/libc.so.6 4 vvfat_child_perm (bs=0x559186f3d690, c=0x559186f1ed20, role=3, reopen_queue=0x0, perm=0, shared=31, nperm=0x7ffe56f28298, nshared=0x7ffe56f282a0) at ../block/vvfat.c:3214 5 bdrv_child_perm (bs=0x559186f3d690, child_bs=0x559186f60190, c=0x559186f1ed20, role=3, reopen_queue=0x0, parent_perm=0, parent_shared=31, nperm=0x7ffe56f28298, nshared=0x7ffe56f282a0) at ../block.c:2094 6 bdrv_node_refresh_perm (bs=0x559186f3d690, q=0x0, tran=0x559186f65850, errp=0x7ffe56f28530) at ../block.c:2336 7 bdrv_list_refresh_perms (list=0x559186db5b90 = {...}, q=0x0, tran=0x559186f65850, errp=0x7ffe56f28530) at ../block.c:2358 8 bdrv_refresh_perms (bs=0x559186f3d690, errp=0x7ffe56f28530) at ../block.c:2419 9 bdrv_attach_child (parent_bs=0x559186f3d690, child_bs=0x559186f60190, child_name=0x559184d83e3d "write-target", child_class=0x5591852f3b00 <child_vvfat_qcow>, child_role=3, errp=0x7ffe56f28530) at ../block.c:2959 10 bdrv_open_child (filename=0x559186f5cb80 "/var/tmp/vl.7WYmFU", options=0x559186f66c20, bdref_key=0x559184d83e3d "write-target", parent=0x559186f3d690, child_class=0x5591852f3b00 <child_vvfat_qcow>, child_role=3, allow_none=false, errp=0x7ffe56f28530) at ../block.c:3351 11 enable_write_target (bs=0x559186f3d690, errp=0x7ffe56f28530) at ../block/vvfat.c:3177 12 vvfat_open (bs=0x559186f3d690, options=0x559186f42db0, flags=155650, errp=0x7ffe56f28530) at ../block/vvfat.c:1236 13 bdrv_open_driver (bs=0x559186f3d690, drv=0x5591853d97e0 <bdrv_vvfat>, node_name=0x0, options=0x559186f42db0, open_flags=155650, errp=0x7ffe56f28640) at ../block.c:1557 14 bdrv_open_common (bs=0x559186f3d690, file=0x0, options=0x559186f42db0, errp=0x7ffe56f28640) at ../block.c:1833 ... (gdb) fr 4 #4 vvfat_child_perm (bs=0x559186f3d690, c=0x559186f1ed20, role=3, reopen_queue=0x0, perm=0, shared=31, nperm=0x7ffe56f28298, nshared=0x7ffe56f282a0) at ../block/vvfat.c:3214 3214 assert(c == s->qcow \|\| (role & BDRV_CHILD_COW)); (gdb) p role $1 = 3 # BDRV_CHILD_DATA \| BDRV_CHILD_METADATA (gdb) p c $2 = {bs = 0x559186f60190, name = 0x559186f669d0 "write-target", klass = 0x5591852f3b00 <child_vvfat_qcow>, role = 3, opaque = 0x559186f3d690, perm = 3, shared_perm = 4, frozen = false, parent_quiesce_counter = 0, next = {le_next = 0x0, le_prev = 0x559186f41818}, next_parent = {le_next = 0x0, le_prev = 0x559186f64320}} (gdb) p s->qcow $3 = (BdrvChild ) 0x0 Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210524101257.119377-3-vsementsov@virtuozzo.com> Tested-by: John Arbuckle <programmingkidx@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-06-02 14:23:20 +02:00
Vladimir Sementsov-Ogievskiy	fb62b58896	block/vvfat: child_vvfat_qcow: add .get_parent_aio_context, fix crash Commit `3ca1f32257` "block: BdrvChildClass: add .get_parent_aio_context handler" introduced new handler and commit `228ca37e12` "block: drop ctx argument from bdrv_root_attach_child" made a generic use of it. But `3ca1f32257` didn't update child_vvfat_qcow. Fix that. Before that fix the command ./build/qemu-system-x86_64 -usb -device usb-storage,drive=fat16 \ -drive file=fat:rw:fat-type=16:"<path of a host folder>",id=fat16,format=raw,if=none crashes: 1 bdrv_child_get_parent_aio_context (c=0x559d62426d20) at ../block.c:1440 2 bdrv_attach_child_common (child_bs=0x559d62468190, child_name=0x559d606f9e3d "write-target", child_class=0x559d60c58d20 <child_vvfat_qcow>, child_role=3, perm=3, shared_perm=4, opaque=0x559d62445690, child=0x7ffc74c2acc8, tran=0x559d6246ddd0, errp=0x7ffc74c2ae60) at ../block.c:2795 3 bdrv_attach_child_noperm (parent_bs=0x559d62445690, child_bs=0x559d62468190, child_name=0x559d606f9e3d "write-target", child_class=0x559d60c58d20 <child_vvfat_qcow>, child_role=3, child=0x7ffc74c2acc8, tran=0x559d6246ddd0, errp=0x7ffc74c2ae60) at ../block.c:2855 4 bdrv_attach_child (parent_bs=0x559d62445690, child_bs=0x559d62468190, child_name=0x559d606f9e3d "write-target", child_class=0x559d60c58d20 <child_vvfat_qcow>, child_role=3, errp=0x7ffc74c2ae60) at ../block.c:2953 5 bdrv_open_child (filename=0x559d62464b80 "/var/tmp/vl.h3TIS4", options=0x559d6246ec20, bdref_key=0x559d606f9e3d "write-target", parent=0x559d62445690, child_class=0x559d60c58d20 <child_vvfat_qcow>, child_role=3, allow_none=false, errp=0x7ffc74c2ae60) at ../block.c:3351 6 enable_write_target (bs=0x559d62445690, errp=0x7ffc74c2ae60) at ../block/vvfat.c:3176 7 vvfat_open (bs=0x559d62445690, options=0x559d6244adb0, flags=155650, errp=0x7ffc74c2ae60) at ../block/vvfat.c:1236 8 bdrv_open_driver (bs=0x559d62445690, drv=0x559d60d4f7e0 <bdrv_vvfat>, node_name=0x0, options=0x559d6244adb0, open_flags=155650, errp=0x7ffc74c2af70) at ../block.c:1557 9 bdrv_open_common (bs=0x559d62445690, file=0x0, options=0x559d6244adb0, errp=0x7ffc74c2af70) at ... (gdb) fr 1 #1 0x0000559d603ea3bf in bdrv_child_get_parent_aio_context (c=0x559d62426d20) at ../block.c:1440 1440 return c->klass->get_parent_aio_context(c); (gdb) p c->klass $1 = (const BdrvChildClass ) 0x559d60c58d20 <child_vvfat_qcow> (gdb) p c->klass->get_parent_aio_context $2 = (AioContext ()(BdrvChild )) 0x0 Fixes: `3ca1f32257` Fixes: `228ca37e12` Reported-by: John Arbuckle <programmingkidx@gmail.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210524101257.119377-2-vsementsov@virtuozzo.com> Tested-by: John Arbuckle <programmingkidx@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-06-02 14:23:20 +02:00
Lukas Straub	5529b02da2	block/quorum: Provide .bdrv_co_flush instead of .bdrv_co_flush_to_disk The quorum block driver uses a custom flush callback to handle the case when some children return io errors. In that case it still returns success if enough children are healthy. However, it provides it as the .bdrv_co_flush_to_disk callback, not as .bdrv_co_flush. This causes the block layer to do it's own generic flushing for the children instead, which doesn't handle errors properly. Fix this by providing .bdrv_co_flush instead of .bdrv_co_flush_to_disk so the block layer uses the custom flush callback. Signed-off-by: Lukas Straub <lukasstraub2@web.de> Reported-by: Minghao Yuan <meeho@qq.com> Message-Id: <20210518134214.11ccf05f@gecko.fritz.box> Tested-by: Zhang Chen <chen.zhang@intel.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-06-02 14:23:20 +02:00
Thomas Huth	b4c10fc6fe	block/ssh: Bump minimum libssh version to 0.8.7 It has been over two years since RHEL-8 was released, and thus per the platform build policy, we no longer need to support RHEL-7 as a build target. So from the RHEL-7 perspective, we do not have to support libssh v0.7 anymore now. Let's look at the versions from other distributions and operating systems - according to repology.org, current shipping versions are: RHEL-8: 0.9.4 Debian Buster: 0.8.7 openSUSE Leap 15.2: 0.8.7 Ubuntu LTS 18.04: 0.8.0 * Ubuntu LTS 20.04: 0.9.3 FreeBSD: 0.9.5 Fedora 33: 0.9.5 Fedora 34: 0.9.5 OpenBSD: 0.9.5 macOS HomeBrew: 0.9.5 HaikuPorts: 0.9.5 * The version of libssh in Ubuntu 18.04 claims to be 0.8.0 from the name of the package, but in reality it is a 0.7 patched up as a Frankenstein monster with patches from the 0.8 development branch. This gave us some headaches in the past already and so it never worked with QEMU. All attempts to get it supported have failed in the past, patches for QEMU have never been merged and a request to Ubuntu to fix it in their 18.04 distro has been ignored: https://bugs.launchpad.net/ubuntu/+source/libssh/+bug/1847514 Thus we really should ignore the libssh in Ubuntu 18.04 in QEMU, too. Fix it by bumping the minimum libssh version to something that is greater than 0.8.0 now. Debian Buster and openSUSE Leap have the oldest version and so 0.8.7 is the new minimum. Signed-off-by: Thomas Huth <thuth@redhat.com> Tested-by: Richard W.M. Jones <rjones@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Acked-by: Richard W.M. Jones <rjones@redhat.com> Message-Id: <20210519155859.344569-1-thuth@redhat.com> Signed-off-by: Thomas Huth <thuth@redhat.com>	2021-06-02 06:51:09 +02:00
Stefano Garzarella	d0fb9657a3	docs: fix references to docs/devel/tracing.rst Commit `e50caf4a5c` ("tracing: convert documentation to rST") converted docs/devel/tracing.txt to docs/devel/tracing.rst. We still have several references to the old file, so let's fix them with the following command: sed -i s/tracing.txt/tracing.rst/ $(git grep -l docs/devel/tracing.txt) Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20210517151702.109066-2-sgarzare@redhat.com> Signed-off-by: Thomas Huth <thuth@redhat.com>	2021-06-02 06:51:09 +02:00
Paolo Bonzini	b02629550d	replication: move include out of root directory The replication.h file is included from migration/colo.c and tests/unit/test-replication.c, so it should be in include/. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-05-26 14:49:46 +02:00
Paolo Bonzini	29a6ea24eb	coroutine-sleep: replace QemuCoSleepState pointer with struct in the API Right now, users of qemu_co_sleep_ns_wakeable are simply passing a pointer to QemuCoSleepState by reference to the function. But QemuCoSleepState really is just a Coroutine*; making the content of the struct public is just as efficient and lets us skip the user_state_pointer indirection. Since the usage is changed, take the occasion to rename the struct to QemuCoSleep. Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20210517100548.28806-6-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2021-05-21 18:22:33 +01:00
Paolo Bonzini	eaee072085	coroutine-sleep: allow qemu_co_sleep_wake that wakes nothing All callers of qemu_co_sleep_wake are checking whether they are passing a NULL argument inside the pointer-to-pointer: do the check in qemu_co_sleep_wake itself. As a side effect, qemu_co_sleep_wake can be called more than once and it will only wake the coroutine once; after the first time, the argument will be set to NULL via sleep_state->user_state_pointer. However, this would not be safe unless co_sleep_cb keeps using the QemuCoSleepState directly, so make it go through the pointer-to-pointer instead. Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20210517100548.28806-4-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2021-05-21 18:22:33 +01:00
Stefan Hajnoczi	1b0b2e6d06	block/export: improve vu_blk_sect_range_ok() The checks in vu_blk_sect_range_ok() assume VIRTIO_BLK_SECTOR_SIZE is equal to BDRV_SECTOR_SIZE. This is true, but let's add a QEMU_BUILD_BUG_ON() to make it explicit. We might as well check that the request buffer size is a multiple of VIRTIO_BLK_SECTOR_SIZE while we're at it. Suggested-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20210331142727.391465-1-stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-05-18 11:08:13 +02:00
Vladimir Sementsov-Ogievskiy	38b4409647	qcow2: set bdi->is_dirty Set bdi->is_dirty, so that qemu-img info could show dirty flag. After this commit the following check will show '"dirty-flag": true': ./build/qemu-img create -f qcow2 -o lazy_refcounts=on x 1M ./build/qemu-io x qemu-io> write 0 1M After "write" command success, kill the qemu-io process: kill -9 <qemu-io pid> ./build/qemu-img info --output=json x This will show '"dirty-flag": true' among other things. (before this commit it shows '"dirty-flag": false') Note, that qcow2's dirty-bit is not a "dirty bit for the image". It only protects qcow2 lazy refcounts feature. So, there are a lot of conditions when qcow2 session may be not closed correctly, but bit is 0. Still, when bit is set, the last session is definitely not finished correctly and it's better to report it. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210504160656.462836-1-vsementsov@virtuozzo.com> Tested-by: Kirill Tkhai <ktkhai@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-05-18 11:08:13 +02:00
Vladimir Sementsov-Ogievskiy	c61ebf362d	write-threshold: deal with includes "qemu/typedefs.h" is enough for include/block/write-threshold.h header with forward declaration of BlockDriverState. Also drop extra includes from block/write-threshold.c and tests/unit/test-write-threshold.c Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210506090621.11848-9-vsementsov@virtuozzo.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-05-14 16:14:10 +02:00
Vladimir Sementsov-Ogievskiy	2e0e9cbd89	block/write-threshold: drop extra APIs bdrv_write_threshold_exceeded() is unused. bdrv_write_threshold_is_set() is used only to double check the value of bs->write_threshold_offset in tests. No real sense in it (both tests do check real value with help of bdrv_write_threshold_get()) Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210506090621.11848-5-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> [mreitz: Adjusted commit message as per Eric's suggestion] Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-05-14 16:14:10 +02:00
Vladimir Sementsov-Ogievskiy	ad578c56d5	block: drop write notifiers They are unused now. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210506090621.11848-3-vsementsov@virtuozzo.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-05-14 16:14:10 +02:00
Vladimir Sementsov-Ogievskiy	94783301b8	block/write-threshold: don't use write notifiers write-notifiers are used only for write-threshold. New code for such purpose should create filters. Let's better special-case write-threshold and drop write notifiers at all. (Actually, write-threshold is special-cased anyway, as the only user of write-notifiers) So, create a new direct interface for bdrv_co_write_req_prepare() and drop all write-notifier related logic from write-threshold.c. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210506090621.11848-2-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> [mreitz: Adjusted comment as per Eric's suggestion] Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-05-14 16:14:10 +02:00
Vladimir Sementsov-Ogievskiy	bcc8584c83	block/copy-on-read: use bdrv_drop_filter() and drop s->active Now, after huge update of block graph permission update algorithm, we don't need this workaround with active state of the filter. Drop it and use new smart bdrv_drop_filter() function. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210506194143.394141-1-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-05-14 16:14:10 +02:00
Vladimir Sementsov-Ogievskiy	9c785cd714	mirror: stop cancelling in-flight requests on non-force cancel in READY If mirror is READY than cancel operation is not discarding the whole result of the operation, but instead it's a documented way get a point-in-time snapshot of source disk. So, we should not cancel any requests if mirror is READ and force=false. Let's fix that case. Note, that bug that we have before this commit is not critical, as the only .bdrv_cancel_in_flight implementation is nbd_cancel_in_flight() and it cancels only requests waiting for reconnection, so it should be rare case. Fixes: `521ff8b779` Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210421075858.40197-1-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-05-14 16:14:10 +02:00
Vladimir Sementsov-Ogievskiy	78632a3d16	monitor: hmp_qemu_io: acquire aio contex, fix crash Max reported the following bug: $ ./qemu-img create -f raw src.img 1G $ ./qemu-img create -f raw dst.img 1G $ (echo ' {"execute":"qmp_capabilities"} {"execute":"blockdev-mirror", "arguments":{"job-id":"mirror", "device":"source", "target":"target", "sync":"full", "filter-node-name":"mirror-top"}} '; sleep 3; echo ' {"execute":"human-monitor-command", "arguments":{"command-line": "qemu-io mirror-top \"write 0 1G\""}}') \ \| x86_64-softmmu/qemu-system-x86_64 \ -qmp stdio \ -blockdev file,node-name=source,filename=src.img \ -blockdev file,node-name=target,filename=dst.img \ -object iothread,id=iothr0 \ -device virtio-blk,drive=source,iothread=iothr0 crashes: 0 raise () at /usr/lib/libc.so.6 1 abort () at /usr/lib/libc.so.6 2 error_exit (err=<optimized out>, msg=msg@entry=0x55fbb1634790 <__func__.27> "qemu_mutex_unlock_impl") at ../util/qemu-thread-posix.c:37 3 qemu_mutex_unlock_impl (mutex=mutex@entry=0x55fbb25ab6e0, file=file@entry=0x55fbb1636957 "../util/async.c", line=line@entry=650) at ../util/qemu-thread-posix.c:109 4 aio_context_release (ctx=ctx@entry=0x55fbb25ab680) at ../util/async.c:650 5 bdrv_do_drained_begin (bs=bs@entry=0x55fbb3a87000, recursive=recursive@entry=false, parent=parent@entry=0x0, ignore_bds_parents=ignore_bds_parents@entry=false, poll=poll@entry=true) at ../block/io.c:441 6 bdrv_do_drained_begin (poll=true, ignore_bds_parents=false, parent=0x0, recursive=false, bs=0x55fbb3a87000) at ../block/io.c:448 7 blk_drain (blk=0x55fbb26c5a00) at ../block/block-backend.c:1718 8 blk_unref (blk=0x55fbb26c5a00) at ../block/block-backend.c:498 9 blk_unref (blk=0x55fbb26c5a00) at ../block/block-backend.c:491 10 hmp_qemu_io (mon=0x7fffaf3fc7d0, qdict=<optimized out>) at ../block/monitor/block-hmp-cmds.c:628 man pthread_mutex_unlock ... EPERM The mutex type is PTHREAD_MUTEX_ERRORCHECK or PTHREAD_MUTEX_RECURSIVE, or the mutex is a robust mutex, and the current thread does not own the mutex. So, thread doesn't own the mutex. And we have iothread here. Next, note that AIO_WAIT_WHILE() documents that ctx must be acquired exactly once by caller. But where is it acquired in the call stack? Seems nowhere. qemuio_command do acquire aio context.. But we need context acquired around blk_unref() as well and actually around blk_insert_bs() too. Let's refactor qemuio_command so that it doesn't acquire aio context but callers do that instead. This way we can cleanly acquire aio context in hmp_qemu_io() around all three calls. Reported-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210423134233.51495-1-vsementsov@virtuozzo.com> [mreitz: Fixed comment] Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-05-14 16:14:10 +02:00
Connor Kuehl	2b99cfce08	block/rbd: Add an escape-aware strchr helper Sometimes the parser needs to further split a token it has collected from the token input stream. Right now, it does a cursory check to see if the relevant characters appear in the token to determine if it should break it down further. However, qemu_rbd_next_tok() will escape characters as it removes tokens from the token stream and plain strchr() won't. This can make the initial strchr() check slightly misleading since it implies qemu_rbd_next_tok() will find the token and split on it, except the reality is that qemu_rbd_next_tok() will pass over it if it is escaped. Use a custom strchr to avoid mixing escaped and unescaped string operations. Furthermore, this code is identical to how qemu_rbd_next_tok() seeks its next token, so incorporate this custom strchr into the body of that function to reduce duplication. Reported-by: Han Han <hhan@redhat.com> Fixes: https://bugzilla.redhat.com/1873913 Signed-off-by: Connor Kuehl <ckuehl@redhat.com> Message-Id: <20210421212343.85524-3-ckuehl@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-05-14 16:14:10 +02:00
Markus Armbruster	09ec85176e	block: Drop the sheepdog block driver It was deprecated in commit `e1c4269763`, v5.2.0. See that commit message for rationale. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20210501075747.3293186-1-armbru@redhat.com> ACKed-by: Peter Krempa <pkrempa@redhat.com>	2021-05-12 17:42:23 +02:00
Peter Maydell	4cc10cae64	* NetBSD NVMM support * RateLimit mutex * Prepare for Meson 0.57 upgrade -----BEGIN PGP SIGNATURE----- iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmCROukUHHBib256aW5p QHJlZGhhdC5jb20ACgkQv/vSX3jHroOFXgf/ThwuBCbwC6pwoHpZzFXHdJRXIqHa iKTqjCLymz9NQBRTaMeG5CWjXl4o9syHLzEXLQxuQaynHK8AjbyeMSllBVLzBUme TU9AY3qwLShRJm3XGXkuUilFE+IR8FXWFgrTOsZXgbT+JQlkCgiuhCRqfAcDEgi/ F5SNqlMzPNvF6G0FY9DFBBkoKF4YWROx25SgNl3fxgWwC94px/a22BXTVpOxaClZ HE/H+kbJH5sD2dOJR5cqbgFg7eBemNdxO3tSbR6WoP9pcvVPx0Dgh5hUJb5+pUXY fV5O5zZ+CdyNjWM4yAHg0y8kOlnqrLwv7pH+NdqWFaWiZ9uCSrVFR13ejQ== =sKO4 -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/bonzini-gitlab/tags/for-upstream' into staging * NetBSD NVMM support * RateLimit mutex * Prepare for Meson 0.57 upgrade # gpg: Signature made Tue 04 May 2021 13:15:37 BST # gpg: using RSA key F13338574B662389866C7682BFFBD25F78C7AE83 # gpg: issuer "pbonzini@redhat.com" # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full] # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [full] # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * remotes/bonzini-gitlab/tags/for-upstream: glib-compat: accept G_TEST_SLOW environment variable gitlab-ci: use --meson=internal for CFI jobs configure: handle meson options that have changed type configure: reindent meson invocation slirp: add configure option to disable smbd ratelimit: protect with a mutex Add NVMM Accelerator: add maintainers for NetBSD/NVMM Add NVMM accelerator: acceleration enlightenments Add NVMM accelerator: x86 CPU support Add NVMM accelerator: configure and build logic oslib-win32: do not rely on macro to get redefined function name Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2021-05-06 18:56:17 +01:00
Paolo Bonzini	4951967d84	ratelimit: protect with a mutex Right now, rate limiting is protected by the AioContext mutex, which is taken for example both by the block jobs and by qmp_block_job_set_speed (via find_block_job). We would like to remove the dependency of block layer code on the AioContext mutex, since most drivers and the core I/O code are already not relying on it. However, there is no existing lock that can easily be taken by both ratelimit_set_speed and ratelimit_calculate_delay, especially because the latter might run in coroutine context (and therefore under a CoMutex) but the former will not. Since concurrent calls to ratelimit_calculate_delay are not possible, one idea could be to use a seqlock to get a snapshot of slice_ns and slice_quota. But for now keep it simple, and just add a mutex to the RateLimit struct; block jobs are generally not performance critical to the point of optimizing the clock cycles spent in synchronization. This also requires the introduction of init/destroy functions, so add them to the two users of ratelimit.h. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-05-04 14:15:35 +02:00
Thomas Huth	4c386f8064	Do not include sysemu/sysemu.h if it's not really necessary Stop including sysemu/sysemu.h in files that don't need it. Signed-off-by: Thomas Huth <thuth@redhat.com> Message-Id: <20210416171314.2074665-2-thuth@redhat.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>	2021-05-02 17:24:50 +02:00
Kevin Wolf	35b7f4abd5	block: Add BDRV_O_NO_SHARE for blk_new_open() Normally, blk_new_open() just shares all permissions. This was fine originally when permissions only protected against uses in the same process because no other part of the code would actually get to access the block nodes opened with blk_new_open(). However, since we use it for file locking now, unsharing permissions becomes desirable. Add a new BDRV_O_NO_SHARE flag that is used in blk_new_open() to unshare any permissions that can be unshared. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20210422164344.283389-2-kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-04-30 12:27:48 +02:00
Vladimir Sementsov-Ogievskiy	72373e40fb	block: bdrv_reopen_multiple: refresh permissions on updated graph Move bdrv_reopen_multiple to new paradigm of permission update: first update graph relations, then do refresh the permissions. We have to modify reopen process in file-posix driver: with new scheme we don't have prepared permissions in raw_reopen_prepare(), so we should reconfigure fd in raw_check_perm(). Still this seems more native and simple anyway. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20210428151804.439460-31-vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-04-30 12:27:48 +02:00
Vladimir Sementsov-Ogievskiy	1e4c797c75	block: make bdrv_refresh_limits() to be a transaction action To be used in further commit. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20210428151804.439460-28-vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-04-30 12:27:48 +02:00
Vladimir Sementsov-Ogievskiy	b75d64b329	block/backup-top: drop .active We don't need this workaround anymore: bdrv_append is already smart enough and we can use new bdrv_drop_filter(). This commit efficiently reverts also recent `705dde27c6`, which checked .active on io path. Still it said that the problem should be theoretical. And the logic of filter removement is changed anyway. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20210428151804.439460-25-vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-04-30 12:27:48 +02:00
Vladimir Sementsov-Ogievskiy	228ca37e12	block: drop ctx argument from bdrv_root_attach_child Passing parent aio context is redundant, as child_class and parent opaque pointer are enough to retrieve it. Drop the argument and use new bdrv_child_get_parent_aio_context() interface. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20210428151804.439460-7-vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-04-30 12:27:47 +02:00
Vladimir Sementsov-Ogievskiy	3ca1f32257	block: BdrvChildClass: add .get_parent_aio_context handler Add new handler to get aio context and implement it in all child classes. Add corresponding public interface to be used soon. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20210428151804.439460-6-vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-04-30 12:27:47 +02:00
Vladimir Sementsov-Ogievskiy	ae9d441706	block: bdrv_append(): don't consume reference We have too much comments for this feature. It seems better just don't do it. Most of real users (tests don't count) have to create additional reference. Drop also comment in external_snapshot_prepare: - bdrv_append doesn't "remove" old bs in common sense, it sounds strange - the fact that bdrv_append can fail is obvious from the context - the fact that we must rollback all changes in transaction abort is known (it's the direct role of abort) Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20210428151804.439460-5-vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-04-30 12:27:47 +02:00
Vladimir Sementsov-Ogievskiy	0267101af6	block/nbd: fix possible use after free of s->connect_thread If on nbd_close() we detach the thread (in nbd_co_establish_connection_cancel() thr->state becomes CONNECT_THREAD_RUNNING_DETACHED), after that point we should not use s->connect_thread (which is set to NULL), as running thread may free it at any time. Still nbd_co_establish_connection() does exactly this: it saves s->connect_thread to local variable (just for better code style) and use it even after yield point, when thread may be already detached. Fix that. Also check thr to be non-NULL on nbd_co_establish_connection() start for safety. After this patch "case CONNECT_THREAD_RUNNING_DETACHED" becomes impossible in the second switch in nbd_co_establish_connection(). Still, don't add extra abort() just before the release. If it somehow possible to reach this "case:" it won't hurt. Anyway, good refactoring of all this reconnect mess will come soon. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210406155114.1057355-1-vsementsov@virtuozzo.com> Reviewed-by: Roman Kagan <rvkagan@yandex-team.ru> Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-04-13 15:35:12 +02:00
Max Reitz	00769414cd	mirror: Do not enter a paused job on completion Currently, it is impossible to complete jobs on standby (i.e. paused ready jobs), but actually the only thing in mirror_complete() that does not work quite well with a paused job is the job_enter() at the end. If we make it conditional, this function works just fine even if the mirror job is paused. So technically this is a no-op, but obviously the intention is to accept block-job-complete even for jobs on standby, which we need this patch for first. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210409120422.144040-3-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-04-09 18:00:29 +02:00
Max Reitz	c41f5b96ee	mirror: Move open_backing_file to exit_common This is a graph change and therefore should be done in job-finalize (which is what invokes mirror_exit_common()). Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210409120422.144040-2-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-04-09 18:00:29 +02:00
Stefano Garzarella	b084b420d9	block/rbd: fix memory leak in qemu_rbd_co_create_opts() When we allocate 'q_namespace', we forgot to set 'has_q_namespace' to true. This can cause several issues, including a memory leak, since qapi_free_BlockdevCreateOptions() does not deallocate that memory, as reported by valgrind: 13 bytes in 1 blocks are definitely lost in loss record 7 of 96 at 0x4839809: malloc (vg_replace_malloc.c:307) by 0x48CEBB8: g_malloc (in /usr/lib64/libglib-2.0.so.0.6600.8) by 0x48E3FE3: g_strdup (in /usr/lib64/libglib-2.0.so.0.6600.8) by 0x180010: qemu_rbd_co_create_opts (rbd.c:446) by 0x1AE72C: bdrv_create_co_entry (block.c:492) by 0x241902: coroutine_trampoline (coroutine-ucontext.c:173) by 0x57530AF: ??? (in /usr/lib64/libc-2.32.so) by 0x1FFEFFFA6F: ??? Fix setting 'has_q_namespace' to true when we allocate 'q_namespace'. Fixes: `19ae9ae014` ("block/rbd: Add support for ceph namespaces") Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20210329150129.121182-3-sgarzare@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-04-09 18:00:29 +02:00
Stefano Garzarella	c1c1f6cf51	block/rbd: fix memory leak in qemu_rbd_connect() In qemu_rbd_connect(), 'mon_host' is allocated by qemu_rbd_mon_host() using g_strjoinv(), but it's only freed in the error path, leaking memory in the success path as reported by valgrind: 80 bytes in 4 blocks are definitely lost in loss record 5,028 of 6,516 at 0x4839809: malloc (vg_replace_malloc.c:307) by 0x5315BB8: g_malloc (in /usr/lib64/libglib-2.0.so.0.6600.8) by 0x532B6FF: g_strjoinv (in /usr/lib64/libglib-2.0.so.0.6600.8) by 0x87D07E: qemu_rbd_mon_host (rbd.c:538) by 0x87D07E: qemu_rbd_connect (rbd.c:562) by 0x87E1CE: qemu_rbd_open (rbd.c:740) by 0x840EB1: bdrv_open_driver (block.c:1528) by 0x8453A9: bdrv_open_common (block.c:1802) by 0x8453A9: bdrv_open_inherit (block.c:3444) by 0x8464C2: bdrv_open (block.c:3537) by 0x8108CD: qmp_blockdev_add (blockdev.c:3569) by 0x8EA61B: qmp_marshal_blockdev_add (qapi-commands-block-core.c:1086) by 0x90B528: do_qmp_dispatch_bh (qmp-dispatch.c:131) by 0x907EA4: aio_bh_poll (async.c:164) Fix freeing 'mon_host' also when qemu_rbd_connect() ends correctly. Fixes: `0a55679b4a` Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20210329150129.121182-2-sgarzare@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-04-09 18:00:29 +02:00
David Edmondson	07ee2ab4fd	block/vdi: Don't assume that blocks are larger than VdiHeader Given that the block size is read from the header of the VDI file, a wide variety of sizes might be seen. Rather than re-using a block sized memory region when writing the VDI header, allocate an appropriately sized buffer. Signed-off-by: David Edmondson <david.edmondson@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Max Reitz <mreitz@redhat.com> Message-id: 20210325112941.365238-3-pbonzini@redhat.com Message-Id: <20210309144015.557477-3-david.edmondson@oracle.com> Acked-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2021-03-31 10:44:21 +01:00
David Edmondson	574b8304cf	block/vdi: When writing new bmap entry fails, don't leak the buffer If a new bitmap entry is allocated, requiring the entire block to be written, avoiding leaking the buffer allocated for the block should the write fail. Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: David Edmondson <david.edmondson@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Max Reitz <mreitz@redhat.com> Message-id: 20210325112941.365238-2-pbonzini@redhat.com Message-Id: <20210309144015.557477-2-david.edmondson@oracle.com> Acked-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2021-03-31 10:44:21 +01:00
Max Reitz	484108293d	qcow2: Force preallocation with data-file-raw Setting the qcow2 data-file-raw bit means that you can ignore the qcow2 metadata when reading from the external data file. It does not mean that you have to ignore it, though. Therefore, the data read must be the same regardless of whether you interpret the metadata or whether you ignore it, and thus the L1/L2 tables must all be present and give a 1:1 mapping. This patch changes 244's output: First, the qcow2 file is larger right after creation, because of metadata preallocation. Second, the qemu-img map output changes: Everything that was not explicitly discarded or zeroed is now a data area. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210326145509.163455-2-mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2021-03-30 13:02:10 +02:00
Max Reitz	53431b9086	block/mirror: Fix mirror_top's permissions mirror_top currently shares all permissions, and takes only the WRITE permission (if some parent has taken that permission, too). That is wrong, though; mirror_top is a filter, so it should take permissions like any other filter does. For example, if the parent needs CONSISTENT_READ, we need to take that, too, and if it cannot share the WRITE permission, we cannot share it either. The exception is when mirror_top is used for active commit, where we cannot take CONSISTENT_READ (because it is deliberately unshared above the base node) and where we must share WRITE (so that it is shared for all images in the backing chain, so the mirror job can take it for the target BB). Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210211172242.146671-2-mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2021-03-29 18:09:00 +02:00
Pavel Dovgalyuk	ad0ce64279	qcow2: use external virtual timers Regular virtual timers are used to emulate timings related to vCPU and peripheral states. QCOW2 uses timers to clean the cache. These timers should have external flag. In the opposite case they affect the execution and it can't be recorded and replayed. This patch adds external flag to the timer for qcow2 cache clean. Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <161700516327.1141158.8366564693714562536.stgit@pasha-ThinkPad-X280> Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-03-29 18:04:29 +02:00
Markus Armbruster	bdabafc683	block: Remove monitor command block_passwd Command block_passwd always fails since Commit `c01c214b69` "block: remove all encryption handling APIs" (v2.10.0) turned block_passwd into a stub that always fails, and hardcoded encryption_key_missing to false in query-named-block-nodes and query-block. Commit `ad1324e044` "block: remove 'encryption_key_missing' flag from QAPI" just landed. Complete the cleanup job: remove block_passwd. Cc: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20210323101951.3686029-1-armbru@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>	2021-03-23 22:31:56 +01:00
Stefan Hajnoczi	6f4b1996b4	block/export: disable VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD for now The vhost-user in-flight shmfd feature has not been tested with qemu-storage-daemon's vhost-user-blk server. Disable this optional feature for now because it requires MFD_ALLOW_SEALING, which is not available in some CI environments. If we need this feature in the future it can be re-enabled after testing. Reported-by: Peter Maydell <peter.maydell@linaro.org> Cc: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20210309094106.196911-2-stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-03-19 10:15:06 +01:00
Max Reitz	0f418a2076	curl: Disconnect sockets from CURLState When a curl transfer is finished, that does not mean that CURL lets go of all the sockets it used for it. We therefore must not free a CURLSocket object before CURL has invoked curl_sock_cb() to tell us to remove it. Otherwise, we may get a use-after-free, as described in this bug report: https://bugs.launchpad.net/qemu/+bug/1916501 (Reproducer from that report: $ qemu-img convert -f qcow2 -O raw \ https://download.cirros-cloud.net/0.4.0/cirros-0.4.0-x86_64-disk.img \ out.img ) (Alternatively, it might seem logical to force-drop all sockets that have been used for a state when the respective transfer is done, kind of like it is done now, but including unsetting the AIO handlers. Unfortunately, doing so makes the driver just hang instead of crashing, which seems to evidence that CURL still uses those sockets.) Make the CURLSocket object independent of "its" CURLState by putting all sockets into a hash table belonging to the BDRVCURLState instead of a list that belongs to a CURLState. Do not touch any sockets in curl_clean_state(). Testing, it seems like all sockets are indeed gone by the time the curl BDS is closed, so it seems like there really was no point in freeing any socket just because a transfer is done. libcurl does invoke curl_sock_cb() with CURL_POLL_REMOVE for every socket it has. Buglink: https://bugs.launchpad.net/qemu/+bug/1916501 Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210309130541.37540-3-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-03-19 10:15:06 +01:00
Max Reitz	3663dca461	curl: Store BDRVCURLState pointer in CURLSocket A socket does not really belong to any specific state. We do not need to store a pointer to "its" state in it, a pointer to the common BDRVCURLState is sufficient. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210309130541.37540-2-mreitz@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-03-19 10:15:06 +01:00
Kevin Wolf	1bf26076d6	stream: Don't crash when node permission is denied The image streaming block job restricts shared permissions of the nodes it accesses. This can obviously fail when other users already got these permissions. &error_abort is therefore wrong and can crash. Handle these errors gracefully and just fail starting the block job. Reported-by: Nini Gu <ngu@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20210309173451.45152-1-kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-03-19 10:15:06 +01:00
Daniel P. Berrangé	8d17adf34f	block: remove support for using "file" driver with block/char devices The 'host_device' and 'host_cdrom' drivers must be used instead. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>	2021-03-18 09:22:55 +00:00
Daniel P. Berrangé	e67d8e2928	block: remove 'dirty-bitmaps' field from 'BlockInfo' struct The same data is available in the 'BlockDeviceInfo' struct. Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>	2021-03-18 09:22:55 +00:00
Daniel P. Berrangé	81cbfd5088	block: remove dirty bitmaps 'status' field The same information is available via the 'recording' and 'busy' fields. Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>	2021-03-18 09:22:55 +00:00
Daniel P. Berrangé	ad1324e044	block: remove 'encryption_key_missing' flag from QAPI This has been hardcoded to "false" since 2.10.0, since secrets required to unlock block devices are now always provided up front instead of using interactive prompts. Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>	2021-03-18 09:22:55 +00:00
Peter Maydell	6f34661b6c	Pull request -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEEzS913cjjpNwuT1Fz8ww4vT8vvjwFAmBJQHkSHGxhdXJlbnRA dml2aWVyLmV1AAoJEPMMOL0/L748EdsP/2U2CGTM95tjDunTs9uZV/7zM6PWt85M vAPItNVU2jYPfzmaJN8twrzlj0PEDhvB9Q+OJjE4HEGxEbPcdblLg/R6Zs/EaWuY N6oKHPXnOnHb+e80UUJdiAq+Y5RUnJbb5L3ArycnVzBgws+Oj3DtqjB2VDccY4C/ Gkt23tZ7ikU4958e5VBqW2NUUrr+BQO0mqsW+sbbeE3WPj75NQc6srvS3TWvsg7W OYEyVYwm52/q2W/1a3Knfv/YO6UU9NGMpGyDLD2kwQwKbgUWYLW2BiWVwOAUldo9 De3nfKbKnFezLCZAZro20lfCa/aKwNGCOXWzlrKxqUQCmGYUx7gM1+3ahrSd5N0v zUgLdZm7O428ZHL6GujWGLA1UwwzpM9X3P3yo4c0S1J6fHypbI6a9jtewrUFvFgP TuQ7dp6cn2DTBYUcsrWilPHbTZMADYQNRD/xUtKqalYBEWy3FX5W75+OYBJKKh+X Qip68m6JBzgkszXhCcu6xlLb8ynZJr2VsHvtvIgf4NnLqNOIEgVLcMtoMZT8DPrp rIoRc5oUFz8zj5lHnJuLADBUvlCMqoCCoU3h2aqHwH8a7RGb180f+82BW9aBcb2u Jk+WgAhBUjWBBC97ReFgrINUD/qZRXVoOq8LthTuQSSyr/i1zq+oLM1F0EDXcMDm ssATku2IxL24 =moUF -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/vivier2/tags/trivial-branch-for-6.0-pull-request' into staging Pull request # gpg: Signature made Wed 10 Mar 2021 21:56:09 GMT # gpg: using RSA key CD2F75DDC8E3A4DC2E4F5173F30C38BD3F2FBE3C # gpg: issuer "laurent@vivier.eu" # gpg: Good signature from "Laurent Vivier <lvivier@redhat.com>" [full] # gpg: aka "Laurent Vivier <laurent@vivier.eu>" [full] # gpg: aka "Laurent Vivier (Red Hat) <lvivier@redhat.com>" [full] # Primary key fingerprint: CD2F 75DD C8E3 A4DC 2E4F 5173 F30C 38BD 3F2F BE3C * remotes/vivier2/tags/trivial-branch-for-6.0-pull-request: (22 commits) sysemu: Let VMChangeStateHandler take boolean 'running' argument sysemu/runstate: Let runstate_is_running() return bool hw/lm32/Kconfig: Have MILKYMIST select LM32_DEVICES hw/lm32/Kconfig: Rename CONFIG_LM32 -> CONFIG_LM32_DEVICES hw/lm32/Kconfig: Introduce CONFIG_LM32_EVR for lm32-evr/uclinux boards qemu-common.h: Update copyright string to 2021 tests/fp/fp-test: Replace the word 'blacklist' qemu-options: Replace the word 'blacklist' seccomp: Replace the word 'blacklist' scripts/tracetool: Replace the word 'whitelist' ui: Replace the word 'whitelist' virtio-gpu: Adjust code space style exec/memory: Use struct Object typedef fuzz-test: remove unneccessary debugging flags net: Use id_generate() in the network subsystem, too MAINTAINERS: Fix the location of tools manuals vhost_user_gpu: Drop dead check for g_malloc() failure backends/dbus-vmstate: Fix short read error handling target/hexagon/gen_tcg_funcs: Fix a typo hw/elf_ops: Fix a typo ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2021-03-11 18:55:27 +00:00
Peter Maydell	9abda42bf2	nbd patches for 2021-03-09 - Add Vladimir as NBD co-maintainer - Fix reporting of holes in NBD_CMD_BLOCK_STATUS - Improve command-line parsing accuracy of large numbers (anything going through qemu_strtosz), including the deprecation of hex+suffix - Improve some error reporting in the block layer -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEccLMIrHEYCkn0vOqp6FrSiUnQ2oFAmBHlmIACgkQp6FrSiUn Q2q2cQgAqJWNb4J/ShjvzocDDPzJ0iBitFbg0huFPfbt4DScubEZo5wBJG7vOhOW hIHrWCRzGvRgsn0tcSfrgFaegmHKrLgjkibM7ou8ni9NC1kUBd3R/3FBNIMxhYf7 Q8Kfspl0LRfMJDKF9jdCnQ4Gxcd6h2OIYZqiWVg8V4Tc8WdCpIVOah7e7wjuW8bT vgZvfboUWm5AmIF9j/MxuMn+HFZ4ArSuFVL80ZaXlD00vRra7u3HZ8pUfcOlOujg 7HeouM1E5j3NNE6aZSN++x/EQ3sg0zmirbWUCcgAyRfdRkAmB15uh2PUzPxEIJKH UHUIW5LvNtz2+yzOAz2yK29OE523Yg== =blE1 -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/ericb/tags/pull-nbd-2021-03-09' into staging nbd patches for 2021-03-09 - Add Vladimir as NBD co-maintainer - Fix reporting of holes in NBD_CMD_BLOCK_STATUS - Improve command-line parsing accuracy of large numbers (anything going through qemu_strtosz), including the deprecation of hex+suffix - Improve some error reporting in the block layer # gpg: Signature made Tue 09 Mar 2021 15:38:10 GMT # gpg: using RSA key 71C2CC22B1C4602927D2F3AAA7A16B4A2527436A # gpg: Good signature from "Eric Blake <eblake@redhat.com>" [full] # gpg: aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>" [full] # gpg: aka "[jpeg image of size 6874]" [full] # Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2 F3AA A7A1 6B4A 2527 436A * remotes/ericb/tags/pull-nbd-2021-03-09: block/qcow2: refactor qcow2_update_options_prepare error paths block/qed: bdrv_qed_do_open: deal with errp block/qcow2: simplify qcow2_co_invalidate_cache() block/qcow2: read_cache_sizes: return status value block/qcow2-bitmap: return status from qcow2_store_persistent_dirty_bitmaps block/qcow2-bitmap: improve qcow2_load_dirty_bitmaps() interface block/qcow2: qcow2_get_specific_info(): drop error propagation blockjob: return status from block_job_set_speed() block/mirror: drop extra error propagation in commit_active_start() block: drop extra error propagation for bdrv_set_backing_hd blockdev: fix drive_backup_prepare() missed error block: check return value of bdrv_open_child and drop error propagation utils: Deprecate hex-with-suffix sizes utils: Improve qemu_strtosz() to have 64 bits of precision utils: Enhance testsuite for do_strtosz() nbd: server: Report holes for raw images MAINTAINERS: add Vladimir as co-maintainer of NBD Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2021-03-11 13:57:08 +00:00
Philippe Mathieu-Daudé	538f049704	sysemu: Let VMChangeStateHandler take boolean 'running' argument The 'running' argument from VMChangeStateHandler does not require other value than 0 / 1. Make it a plain boolean. Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Acked-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <20210111152020.1422021-3-philmd@redhat.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>	2021-03-09 23:13:57 +01:00
Peter Maydell	a557b00469	Block layer patches: - qemu-storage-daemon: add --pidfile option - qemu-storage-daemon: CLI error messages include the option name now - vhost-user-blk export: Misc fixes - docs: Improvements for qemu-storage-daemon documentation - parallels: load bitmap extension - backup-top: Don't crash on post-finalize accesses - Improve error messages related to node-name options - iotests improvements -----BEGIN PGP SIGNATURE----- iQJFBAABCAAvFiEE3D3rFZqa+V09dFb+fwmycsiPL9YFAmBGWHURHGt3b2xmQHJl ZGhhdC5jb20ACgkQfwmycsiPL9ZpyxAAk0gRiayMUidSzgvzU/CeUhzBsC4ayEkn dLtTZ8hl7cW/w3GjDK1Wri4MANRN/0YHjiLSzO38lfVpK0z8SJr5aU4CwhRlOKVm VWgx+OLlV4Azht9fMNF4SwUXgXhl7pUNiFMNnomb++gvqhjMCedDZcWlnVKhbuQ+ O3TKGO4tToSGaXP85jCM4xukw5HZ//4QMYg6MH0gDk8ahfE2MhyTHz64oDp412os qhxvc4bU2S5xGLaBfLGhsc6VPQFKjblG704P/Y73zeoxq12A0L2Ru98WvrNaXw7Z m54jJUINiDkJ7ZOl6W04zdeiLvs3BOrNe+7mxawOTmdkBsLOKErrhrTO1gJmHHmX kJLWEh9VYWxVbvE7C3KQt9bclR6wt+aPup4X1XE8pHtocPVONVq5bvctrVgxgK0b btN06NcK+2jQxcQkG4MnBJ8S41qmxHyIEQlQWKyUWXvKt6zsFU/NuWKMQrAfYZZi 5J+RPU/fB073LY4lpAgou0OP1/RIvQmi5zWzjWm/Qbp3JpgC+azcYvxn7UU7J71P +u8IEQ4+Q9s0gvXQAh/U8AQg2eOqAwEAyFUJl9wpPN56O03dbI8KyCV5ECIRJu49 CC8uKlJxZkbw9ZBs11SAmm/0J64WcNb2AMWxDPC8Z6oQbVaRRRznoRwRP2H6odUu uBolS43+5cI= =eAjH -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging Block layer patches: - qemu-storage-daemon: add --pidfile option - qemu-storage-daemon: CLI error messages include the option name now - vhost-user-blk export: Misc fixes - docs: Improvements for qemu-storage-daemon documentation - parallels: load bitmap extension - backup-top: Don't crash on post-finalize accesses - Improve error messages related to node-name options - iotests improvements # gpg: Signature made Mon 08 Mar 2021 17:01:41 GMT # gpg: using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6 # gpg: issuer "kwolf@redhat.com" # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full] # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * remotes/kevin/tags/for-upstream: (30 commits) blockdev: Clarify error messages pertaining to 'node-name' block: Clarify error messages pertaining to 'node-name' docs: qsd: Explain --export nbd,name=... default MAINTAINERS: update parallels block driver iotests: add parallels-read-bitmap test iotests.py: add unarchive_sample_image() helper parallels: support bitmap extension for read-only mode block/parallels: BDRVParallelsState: add cluster_size field parallels.txt: fix bitmap L1 table description qcow2-bitmap: make bytes_covered_by_bitmap_cluster() public block/export: port virtio-blk read/write range check block/export: port virtio-blk discard/write zeroes input validation block/export: fix vhost-user-blk export sector number calculation block/export: use VIRTIO_BLK_SECTOR_BITS block/export: fix blk_size double byteswap libqtest: add qtest_remove_abrt_handler() libqtest: add qtest_kill_qemu() libqtest: add qtest_socket_server() vhost-user-blk: fix blkcfg->num_queues endianness docs: replace insecure /tmp examples in qsd docs ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2021-03-09 21:31:18 +00:00
Vladimir Sementsov-Ogievskiy	1184b41101	block/qcow2: refactor qcow2_update_options_prepare error paths Keep setting ret close to setting errp and don't merge different error paths into one. This way it's more obvious that we don't return error without setting errp. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-Id: <20210202124956.63146-15-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-03-08 16:04:46 -06:00
Vladimir Sementsov-Ogievskiy	15ce94a68c	block/qed: bdrv_qed_do_open: deal with errp Always set errp on failure. The generic bdrv_open_driver supports driver functions which can return a negative value but forget to set errp. That's a strange thing. Let's improve bdrv_qed_do_open to not behave this way. This allows the simplification of code in bdrv_qed_co_invalidate_cache(). Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Greg Kurz <groug@kaod.org> Message-Id: <20210202124956.63146-14-vsementsov@virtuozzo.com> [eblake: commit message grammar tweak] Signed-off-by: Eric Blake <eblake@redhat.com>	2021-03-08 16:03:32 -06:00
Vladimir Sementsov-Ogievskiy	e6247c9c9f	block/qcow2: simplify qcow2_co_invalidate_cache() qcow2_do_open correctly sets errp on each failure path. So, we can simplify code in qcow2_co_invalidate_cache() and drop explicit error propagation. Add ERRP_GUARD() as mandated by the documentation in include/qapi/error.h so that error_prepend() is actually called even if errp is &error_fatal. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Greg Kurz <groug@kaod.org> Message-Id: <20210202124956.63146-13-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-03-08 16:03:27 -06:00
Vladimir Sementsov-Ogievskiy	772c4cad13	block/qcow2: read_cache_sizes: return status value It's better to return status together with setting errp. It allows to reduce error propagation. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-Id: <20210202124956.63146-12-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-03-08 16:03:23 -06:00
Vladimir Sementsov-Ogievskiy	526e31de99	block/qcow2-bitmap: return status from qcow2_store_persistent_dirty_bitmaps It's better to return status together with setting errp. It makes possible to avoid error propagation. While being here, put ERRP_GUARD() to fix error_prepend(errp, ...) usage inside qcow2_store_persistent_dirty_bitmaps() (see the comment above ERRP_GUARD() definition in include/qapi/error.h) Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-Id: <20210202124956.63146-11-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-03-08 16:03:21 -06:00
Vladimir Sementsov-Ogievskiy	0c1e9d2a9a	block/qcow2-bitmap: improve qcow2_load_dirty_bitmaps() interface It's recommended for bool functions with errp to return true on success and false on failure. Non-standard interfaces don't help to understand the code. The change is also needed to reduce error propagation. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Greg Kurz <groug@kaod.org> Message-Id: <20210202124956.63146-10-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-03-08 16:01:47 -06:00
Vladimir Sementsov-Ogievskiy	83bad8cbf5	block/qcow2: qcow2_get_specific_info(): drop error propagation Don't use error propagation in qcow2_get_specific_info(). For this refactor qcow2_get_bitmap_info_list, its current interface is rather weird. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210202124956.63146-9-vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> [eblake: separate local 'tail' variable from 'info_list' parameter] Signed-off-by: Eric Blake <eblake@redhat.com>	2021-03-08 15:16:11 -06:00
Vladimir Sementsov-Ogievskiy	eb5becc18f	block/mirror: drop extra error propagation in commit_active_start() Let's check return value of mirror_start_job to check for failure instead of local_err. Rename ret to job, as ret is usually integer variable. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-Id: <20210202124956.63146-7-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-03-08 15:14:14 -06:00
Vladimir Sementsov-Ogievskiy	bc52024959	block: check return value of bdrv_open_child and drop error propagation This patch is generated by cocci script: @@ symbol bdrv_open_child, errp, local_err; expression file; @@ file = bdrv_open_child(..., - &local_err + errp ); - if (local_err) + if (!file) { ... - error_propagate(errp, local_err); ... } with command spatch --sp-file x.cocci --macro-file scripts/cocci-macro-file.h \ --in-place --no-show-diff --max-width 80 --use-gitgrep block Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-Id: <20210202124956.63146-4-vsementsov@virtuozzo.com> [eblake: fix qcow2_do_open() to use ERRP_GUARD, necessary as the only caller to pass allow_none=true] Signed-off-by: Eric Blake <eblake@redhat.com>	2021-03-08 15:07:09 -06:00
Vladimir Sementsov-Ogievskiy	baefd97700	parallels: support bitmap extension for read-only mode Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210224104707.88430-5-vsementsov@virtuozzo.com> Reviewed-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-03-08 14:56:55 +01:00
Vladimir Sementsov-Ogievskiy	e0b5207f54	block/parallels: BDRVParallelsState: add cluster_size field We are going to use it in more places, calculating "s->tracks << BDRV_SECTOR_BITS" doesn't look good. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210224104707.88430-4-vsementsov@virtuozzo.com> Reviewed-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-03-08 14:56:54 +01:00
Vladimir Sementsov-Ogievskiy	35f428ba39	qcow2-bitmap: make bytes_covered_by_bitmap_cluster() public Rename bytes_covered_by_bitmap_cluster() to bdrv_dirty_bitmap_serialization_coverage() and make it public. It is needed as we are going to share it with bitmap loading in parallels format. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Denis V. Lunev <den@openvz.org> Message-Id: <20210224104707.88430-2-vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-03-08 14:56:54 +01:00
Stefan Hajnoczi	05ae4e674e	block/export: port virtio-blk read/write range check Check that the sector number and byte count are valid. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20210223144653.811468-13-stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-03-08 14:56:54 +01:00
Stefan Hajnoczi	db4eadf9f1	block/export: port virtio-blk discard/write zeroes input validation Validate discard/write zeroes the same way we do for virtio-blk. Some of these checks are mandated by the VIRTIO specification, others are internal to QEMU. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20210223144653.811468-11-stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-03-08 14:56:54 +01:00
Stefan Hajnoczi	e44362ce31	block/export: fix vhost-user-blk export sector number calculation The driver is supposed to honor the blk_size field but the protocol still uses 512-byte sector numbers. It is incorrect to multiply req->sector_num by blk_size. VIRTIO 1.1 5.2.5 Device Initialization says: blk_size can be read to determine the optimal sector size for the driver to use. This does not affect the units used in the protocol (always 512 bytes), but awareness of the correct value can affect performance. Fixes: `3578389bcf` ("block/export: vhost-user block device backend server") Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20210223144653.811468-10-stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-03-08 14:56:54 +01:00
Stefan Hajnoczi	524bac0744	block/export: use VIRTIO_BLK_SECTOR_BITS Use VIRTIO_BLK_SECTOR_BITS and VIRTIO_BLK_SECTOR_SIZE when dealing with virtio-blk sector numbers. Although the values happen to be the same as BDRV_SECTOR_BITS and BDRV_SECTOR_SIZE, they are conceptually different. This makes it clearer when we are dealing with virtio-blk sector units. Use VIRTIO_BLK_SECTOR_BITS in vu_blk_initialize_config(). Later patches will use it the new constants the virtqueue request processing code path. Suggested-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20210223144653.811468-9-stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-03-08 14:56:54 +01:00
Stefan Hajnoczi	a4f1542af5	block/export: fix blk_size double byteswap The config->blk_size field is little-endian. Use the native-endian blk_size variable to avoid double byteswapping. Fixes: `11f60f7eae` ("block/export: make vhost-user-blk config space little-endian") Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20210223144653.811468-8-stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-03-08 14:56:54 +01:00
Max Reitz	705dde27c6	backup-top: Refuse I/O in inactive state When the backup-top node transitions from active to inactive in bdrv_backup_top_drop(), the BlockCopyState is freed and the filtered child is removed, so the node effectively becomes unusable. However, noone told its I/O functions this, so they will happily continue accessing bs->backing and s->bcs. Prevent that by aborting early when s->active is false. (After the preceding patch, the node should be gone after bdrv_backup_top_drop(), so this should largely be a theoretical problem. But still, better to be safe than sorry, and also I think it just makes sense to check s->active in the I/O functions.) Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210219153348.41861-3-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-03-08 14:55:18 +01:00
Max Reitz	bdc4c4c5e3	backup: Remove nodes from job in .clean() The block job holds a reference to the backup-top node (because it is passed as the main job BDS to block_job_create()). Therefore, bdrv_backup_top_drop() cannot delete the backup-top node (replacing it by its child does not affect the job parent, because that has .stay_at_node set). That is a problem, because all of its I/O functions assume the BlockCopyState (s->bcs) to be valid and that it has a filtered child; but after bdrv_backup_top_drop(), neither of those things are true. It does not make sense to add new parents to backup-top after backup_clean(), so we should detach it from the job before bdrv_backup_top_drop(). Because there is no function to do that for a single node, just detach all of the job's nodes -- the job does not do anything past backup_clean() anyway. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210219153348.41861-2-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-03-08 14:55:18 +01:00
Paolo Bonzini	f7544edcd3	qemu-config: add error propagation to qemu_config_parse This enables some simplification of vl.c via error_fatal, and improves error messages. Before: $ ./qemu-system-x86_64 -readconfig . qemu-system-x86_64: error reading file qemu-system-x86_64: -readconfig .: read config .: Invalid argument $ /usr/libexec/qemu-kvm -readconfig foo qemu-kvm: -readconfig foo: read config foo: No such file or directory After: $ ./qemu-system-x86_64 -readconfig . qemu-system-x86_64: -readconfig .: Cannot read config file: Is a directory $ ./qemu-system-x86_64 -readconfig foo qemu-system-x86_64: -readconfig foo: Could not open 'foo': No such file or directory Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20210226170816.231173-1-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-03-06 11:41:54 +01:00
Maxim Levitsky	6094cbeb72	block: qcow2: remove the created file on initialization error If the qcow initialization fails, we should remove the file if it was already created, to avoid leaving stale files around. We already do this for luks raw images. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-Id: <20201217170904.946013-4-mlevitsk@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-02-15 15:10:14 +01:00
Maxim Levitsky	a890f08e58	block: add bdrv_co_delete_file_noerr This function wraps bdrv_co_delete_file for the common case of removing a file, which was just created by format driver, on an error condition. It hides the -ENOTSUPP error, and reports all other errors otherwise. Use it in luks driver Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-Id: <20201217170904.946013-3-mlevitsk@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-02-15 15:10:14 +01:00
Maxim Levitsky	dcb6699512	crypto: luks: Fix tiny memory leak When the underlying block device doesn't support the bdrv_co_delete_file interface, an 'Error' object was leaked. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201217170904.946013-2-mlevitsk@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-02-15 15:10:14 +01:00
Peter Maydell	392b9a74b9	bitmaps patches for 2021-02-12 - add 'transform' member to manipulate bitmaps across migration - work towards better error handling during bdrv_open -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEccLMIrHEYCkn0vOqp6FrSiUnQ2oFAmAnDQsACgkQp6FrSiUn Q2qc5Qf/SKVdpX4j7OnHF6sBuf/8LVWz4KazSqEU0ohazBJmafgJpH2EA5pXMXR4 frZDWeanGmhj1MjMkta/++uvEBU/TMpW2z98mZvjErteXdnRQAlII/hOCI+QZJvg viQ5t1EyrkyXzUePOjs+AwqA5KHWbCKt6QqyItQ78HvI23sw/fuvHj0G67KbVzXZ VcSrVr0J7PXnZV/hWfg+C+Nn9Ro9tsVdn79awLYVQ7/SDro3hzylpcHMQaHMK2oe mX4D2kNq7s21E27Zb6vlknUhQPkMdETk0gfEbpn7sTVMEc58GRLC7Tqfx7l0JIFK 5izVyA5vndKVxDGYPkbDK6VL2uDg4A== =+Epy -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/ericb/tags/pull-bitmaps-2021-02-12' into staging bitmaps patches for 2021-02-12 - add 'transform' member to manipulate bitmaps across migration - work towards better error handling during bdrv_open # gpg: Signature made Fri 12 Feb 2021 23:19:39 GMT # gpg: using RSA key 71C2CC22B1C4602927D2F3AAA7A16B4A2527436A # gpg: Good signature from "Eric Blake <eblake@redhat.com>" [full] # gpg: aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>" [full] # gpg: aka "[jpeg image of size 6874]" [full] # Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2 F3AA A7A1 6B4A 2527 436A * remotes/ericb/tags/pull-bitmaps-2021-02-12: block: use return status of bdrv_append() block: return status from bdrv_append and friends qemu-iotests: 300: Add test case for modifying persistence of bitmap migration: dirty-bitmap: Allow control of bitmap persistence migration: dirty-bitmap: Use struct for alias map inner members Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2021-02-13 21:26:00 +00:00
Vladimir Sementsov-Ogievskiy	934aee14d3	block: use return status of bdrv_append() Now bdrv_append returns status and we can drop all the local_err things around it. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-Id: <20210202124956.63146-3-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-02-12 15:39:44 -06:00
Vladimir Sementsov-Ogievskiy	ff789bf5a9	block/backup: implement .cancel job handler Cancel in-flight io on target to not waste the time. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210205163720.887197-10-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-02-12 12:17:08 -06:00
Vladimir Sementsov-Ogievskiy	521ff8b779	block/mirror: implement .cancel job handler Cancel in-flight io on target to not waste the time. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210205163720.887197-6-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-02-12 11:29:40 -06:00
Vladimir Sementsov-Ogievskiy	3fc1ec3725	block/raw-format: implement .bdrv_cancel_in_flight handler We are going to cancel in-flight requests on mirror nbd target on job cancel. Still nbd is often used not directly but as raw-format child. So, add pass-through handler here. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210205163720.887197-4-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-02-12 09:45:18 -06:00
Vladimir Sementsov-Ogievskiy	c4f7f24e1f	block/nbd: implement .bdrv_cancel_in_flight Just stop waiting for connection in existing requests. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210205163720.887197-3-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-02-12 09:45:18 -06:00
Vladimir Sementsov-Ogievskiy	bd54669a4a	block: add new BlockDriver handler: bdrv_cancel_in_flight It will be used to stop retrying NBD requests on mirror cancel. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210205163720.887197-2-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-02-12 09:45:18 -06:00
Daniel P. Berrangé	3d3e9b1f66	block: rename and alter bdrv_all_find_snapshot semantics Currently bdrv_all_find_snapshot() will return 0 if it finds a snapshot, -1 if an error occurs, or if it fails to find a snapshot. New callers to be added want to distinguish between the error scenario and failing to find a snapshot. Rename it to bdrv_all_has_snapshot and make it return -1 on error, 0 if no snapshot is found and 1 if snapshot is found. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20210204124834.774401-7-berrange@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-02-08 11:19:51 +00:00
Daniel P. Berrangé	c22d644ca7	block: allow specifying name of block device for vmstate storage Currently the vmstate will be stored in the first block device that supports snapshots. Historically this would have usually been the root device, but with UEFI it might be the variable store. There needs to be a way to override the choice of block device to store the state in. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20210204124834.774401-6-berrange@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-02-08 11:19:51 +00:00
Daniel P. Berrangé	cf3a74c94f	block: add ability to specify list of blockdevs during snapshot When running snapshot operations, there are various rules for which blockdevs are included/excluded. While this provides reasonable default behaviour, there are scenarios that are not well handled by the default logic. Some of the conditions do not have a single correct answer. Thus there needs to be a way for the mgmt app to provide an explicit list of blockdevs to perform snapshots across. This can be achieved by passing a list of node names that should be used. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20210204124834.774401-5-berrange@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-02-08 11:19:51 +00:00
Daniel P. Berrangé	e26f98e209	block: push error reporting into bdrv_all__snapshot functions The bdrv_all__snapshot functions return a BlockDriverState pointer for the invalid backend, which the callers then use to report an error message. In some cases multiple callers are reporting the same error message, but with slightly different text. In the future there will be more error scenarios for some of these methods, which will benefit from fine grained error message reporting. So it is helpful to push error reporting down a level. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> [PMD: Initialize variables] Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20210204124834.774401-2-berrange@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-02-08 11:19:51 +00:00
Roman Kagan	ddde5ee769	block/nbd: only enter connection coroutine if it's present When an NBD block driver state is moved from one aio_context to another (e.g. when doing a drain in a migration thread), nbd_client_attach_aio_context_bh is executed that enters the connection coroutine. However, the assumption that ->connection_co is always present here appears incorrect: the connection may have encountered an error other than -EIO in the underlying transport, and thus may have decided to quit rather than keep trying to reconnect, and therefore it may have terminated the connection coroutine. As a result an attempt to reassign the client in this state (NBD_CLIENT_QUIT) to a different aio_context leads to a null pointer dereference: #0 qio_channel_detach_aio_context (ioc=0x0) at /build/qemu-gYtjVn/qemu-5.0.1/io/channel.c:452 #1 0x0000562a242824b3 in bdrv_detach_aio_context (bs=0x562a268d6a00) at /build/qemu-gYtjVn/qemu-5.0.1/block.c:6151 #2 bdrv_set_aio_context_ignore (bs=bs@entry=0x562a268d6a00, new_context=new_context@entry=0x562a260c9580, ignore=ignore@entry=0x7feeadc9b780) at /build/qemu-gYtjVn/qemu-5.0.1/block.c:6230 #3 0x0000562a24282969 in bdrv_child_try_set_aio_context (bs=bs@entry=0x562a268d6a00, ctx=0x562a260c9580, ignore_child=<optimized out>, errp=<optimized out>) at /build/qemu-gYtjVn/qemu-5.0.1/block.c:6332 #4 0x0000562a242bb7db in blk_do_set_aio_context (blk=0x562a2735d0d0, new_context=0x562a260c9580, update_root_node=update_root_node@entry=true, errp=errp@entry=0x0) at /build/qemu-gYtjVn/qemu-5.0.1/block/block-backend.c:1989 #5 0x0000562a242be0bd in blk_set_aio_context (blk=<optimized out>, new_context=<optimized out>, errp=errp@entry=0x0) at /build/qemu-gYtjVn/qemu-5.0.1/block/block-backend.c:2010 #6 0x0000562a23fbd953 in virtio_blk_data_plane_stop (vdev=<optimized out>) at /build/qemu-gYtjVn/qemu-5.0.1/hw/block/dataplane/virtio-blk.c:292 #7 0x0000562a241fc7bf in virtio_bus_stop_ioeventfd (bus=0x562a260dbf08) at /build/qemu-gYtjVn/qemu-5.0.1/hw/virtio/virtio-bus.c:245 #8 0x0000562a23fefb2e in virtio_vmstate_change (opaque=0x562a260dbf90, running=0, state=<optimized out>) at /build/qemu-gYtjVn/qemu-5.0.1/hw/virtio/virtio.c:3220 #9 0x0000562a2402ebfd in vm_state_notify (running=running@entry=0, state=state@entry=RUN_STATE_FINISH_MIGRATE) at /build/qemu-gYtjVn/qemu-5.0.1/softmmu/vl.c:1275 #10 0x0000562a23f7bc02 in do_vm_stop (state=RUN_STATE_FINISH_MIGRATE, send_stop=<optimized out>) at /build/qemu-gYtjVn/qemu-5.0.1/cpus.c:1032 #11 0x0000562a24209765 in migration_completion (s=0x562a260e83a0) at /build/qemu-gYtjVn/qemu-5.0.1/migration/migration.c:2914 #12 migration_iteration_run (s=0x562a260e83a0) at /build/qemu-gYtjVn/qemu-5.0.1/migration/migration.c:3275 #13 migration_thread (opaque=opaque@entry=0x562a260e83a0) at /build/qemu-gYtjVn/qemu-5.0.1/migration/migration.c:3439 #14 0x0000562a2435ca96 in qemu_thread_start (args=<optimized out>) at /build/qemu-gYtjVn/qemu-5.0.1/util/qemu-thread-posix.c:519 #15 0x00007feed31466ba in start_thread (arg=0x7feeadc9c700) at pthread_create.c:333 #16 0x00007feed2e7c41d in __GI___sysctl (name=0x0, nlen=608471908, oldval=0x562a2452b138, oldlenp=0x0, newval=0x562a2452c5e0 <__func__.28102>, newlen=0) at ../sysdeps/unix/sysv/linux/sysctl.c:30 #17 0x0000000000000000 in ?? () Fix it by checking that the connection coroutine is non-null before trying to enter it. If it is null, no entering is needed, as the connection is probably going down anyway. Signed-off-by: Roman Kagan <rvkagan@yandex-team.ru> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210129073859.683063-3-rvkagan@yandex-team.ru> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-02-03 08:17:12 -06:00
Roman Kagan	3b5e4db673	block/nbd: only detach existing iochannel from aio_context When the reconnect in NBD client is in progress, the iochannel used for NBD connection doesn't exist. Therefore an attempt to detach it from the aio_context of the parent BlockDriverState results in a NULL pointer dereference. The problem is triggerable, in particular, when an outgoing migration is about to finish, and stopping the dataplane tries to move the BlockDriverState from the iothread aio_context to the main loop. If the NBD connection is lost before this point, and the NBD client has entered the reconnect procedure, QEMU crashes: #0 qemu_aio_coroutine_enter (ctx=0x5618056c7580, co=0x0) at /build/qemu-6MF7tq/qemu-5.0.1/util/qemu-coroutine.c:109 #1 0x00005618034b1b68 in nbd_client_attach_aio_context_bh ( opaque=0x561805ed4c00) at /build/qemu-6MF7tq/qemu-5.0.1/block/nbd.c:164 #2 0x000056180353116b in aio_wait_bh (opaque=0x7f60e1e63700) at /build/qemu-6MF7tq/qemu-5.0.1/util/aio-wait.c:55 #3 0x0000561803530633 in aio_bh_call (bh=0x7f60d40a7e80) at /build/qemu-6MF7tq/qemu-5.0.1/util/async.c:136 #4 aio_bh_poll (ctx=ctx@entry=0x5618056c7580) at /build/qemu-6MF7tq/qemu-5.0.1/util/async.c:164 #5 0x0000561803533e5a in aio_poll (ctx=ctx@entry=0x5618056c7580, blocking=blocking@entry=true) at /build/qemu-6MF7tq/qemu-5.0.1/util/aio-posix.c:650 #6 0x000056180353128d in aio_wait_bh_oneshot (ctx=0x5618056c7580, cb=<optimized out>, opaque=<optimized out>) at /build/qemu-6MF7tq/qemu-5.0.1/util/aio-wait.c:71 #7 0x000056180345c50a in bdrv_attach_aio_context (new_context=0x5618056c7580, bs=0x561805ed4c00) at /build/qemu-6MF7tq/qemu-5.0.1/block.c:6172 #8 bdrv_set_aio_context_ignore (bs=bs@entry=0x561805ed4c00, new_context=new_context@entry=0x5618056c7580, ignore=ignore@entry=0x7f60e1e63780) at /build/qemu-6MF7tq/qemu-5.0.1/block.c:6237 #9 0x000056180345c969 in bdrv_child_try_set_aio_context ( bs=bs@entry=0x561805ed4c00, ctx=0x5618056c7580, ignore_child=<optimized out>, errp=<optimized out>) at /build/qemu-6MF7tq/qemu-5.0.1/block.c:6332 #10 0x00005618034957db in blk_do_set_aio_context (blk=0x56180695b3f0, new_context=0x5618056c7580, update_root_node=update_root_node@entry=true, errp=errp@entry=0x0) at /build/qemu-6MF7tq/qemu-5.0.1/block/block-backend.c:1989 #11 0x00005618034980bd in blk_set_aio_context (blk=<optimized out>, new_context=<optimized out>, errp=errp@entry=0x0) at /build/qemu-6MF7tq/qemu-5.0.1/block/block-backend.c:2010 #12 0x0000561803197953 in virtio_blk_data_plane_stop (vdev=<optimized out>) at /build/qemu-6MF7tq/qemu-5.0.1/hw/block/dataplane/virtio-blk.c:292 #13 0x00005618033d67bf in virtio_bus_stop_ioeventfd (bus=0x5618056d9f08) at /build/qemu-6MF7tq/qemu-5.0.1/hw/virtio/virtio-bus.c:245 #14 0x00005618031c9b2e in virtio_vmstate_change (opaque=0x5618056d9f90, running=0, state=<optimized out>) at /build/qemu-6MF7tq/qemu-5.0.1/hw/virtio/virtio.c:3220 #15 0x0000561803208bfd in vm_state_notify (running=running@entry=0, state=state@entry=RUN_STATE_FINISH_MIGRATE) at /build/qemu-6MF7tq/qemu-5.0.1/softmmu/vl.c:1275 #16 0x0000561803155c02 in do_vm_stop (state=RUN_STATE_FINISH_MIGRATE, send_stop=<optimized out>) at /build/qemu-6MF7tq/qemu-5.0.1/cpus.c:1032 #17 0x00005618033e3765 in migration_completion (s=0x5618056e6960) at /build/qemu-6MF7tq/qemu-5.0.1/migration/migration.c:2914 #18 migration_iteration_run (s=0x5618056e6960) at /build/qemu-6MF7tq/qemu-5.0.1/migration/migration.c:3275 #19 migration_thread (opaque=opaque@entry=0x5618056e6960) at /build/qemu-6MF7tq/qemu-5.0.1/migration/migration.c:3439 #20 0x0000561803536ad6 in qemu_thread_start (args=<optimized out>) at /build/qemu-6MF7tq/qemu-5.0.1/util/qemu-thread-posix.c:519 #21 0x00007f61085d06ba in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #22 0x00007f610830641d in sysctl () from /lib/x86_64-linux-gnu/libc.so.6 #23 0x0000000000000000 in ?? () Fix it by checking that the iochannel is non-null before trying to detach it from the aio_context. If it is null, no detaching is needed, and it will get reattached in the proper aio_context once the connection is reestablished. Signed-off-by: Roman Kagan <rvkagan@yandex-team.ru> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210129073859.683063-2-rvkagan@yandex-team.ru> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-02-03 08:17:12 -06:00
Vladimir Sementsov-Ogievskiy	a5215b8fdf	block/io: use int64_t bytes in copy_range We are generally moving to int64_t for both offset and bytes parameters on all io paths. Main motivation is realization of 64-bit write_zeroes operation for fast zeroing large disk chunks, up to the whole disk. We chose signed type, to be consistent with off_t (which is signed) and with possibility for signed return type (where negative value means error). So, convert now copy_range parameters which are already 64bit to signed type. It's safe as we don't work with requests overflowing BDRV_MAX_LENGTH (which is less than INT64_MAX), and do check the requests in bdrv_co_copy_range_internal() (by bdrv_check_request32(), which calls bdrv_check_request()). Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201211183934.169161-17-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-02-03 08:17:12 -06:00
Vladimir Sementsov-Ogievskiy	e9e52efdc5	block/io: support int64_t bytes in read/write wrappers We are generally moving to int64_t for both offset and bytes parameters on all io paths. Main motivation is realization of 64-bit write_zeroes operation for fast zeroing large disk chunks, up to the whole disk. We chose signed type, to be consistent with off_t (which is signed) and with possibility for signed return type (where negative value means error). Now, since bdrv_co_preadv_part() and bdrv_co_pwritev_part() have been updated, update all their wrappers. For all of them type of 'bytes' is widening, so callers are safe. We have update request_fn in blkverify.c simultaneously. Still it's just a pointer to one of bdrv_co_pwritev() or bdrv_co_preadv(), and type is widening for callers of the request_fn anyway. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201211183934.169161-16-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: grammar tweak] Signed-off-by: Eric Blake <eblake@redhat.com>	2021-02-03 08:17:12 -06:00
Vladimir Sementsov-Ogievskiy	37e9403ea8	block/io: support int64_t bytes in bdrv_co_p{read,write}v_part() We are generally moving to int64_t for both offset and bytes parameters on all io paths. Main motivation is realization of 64-bit write_zeroes operation for fast zeroing large disk chunks, up to the whole disk. We chose signed type, to be consistent with off_t (which is signed) and with possibility for signed return type (where negative value means error). So, prepare bdrv_co_preadv_part() and bdrv_co_pwritev_part() and their remaining dependencies now. bdrv_pad_request() is updated simultaneously, as pointer to bytes passed to it both from bdrv_co_pwritev_part() and bdrv_co_preadv_part(). So, all callers of bdrv_pad_request() are updated to pass 64bit bytes. bdrv_pad_request() is already good for 64bit requests, add corresponding assertion. Look at bdrv_co_preadv_part() and bdrv_co_pwritev_part(). Type is widening, so callers are safe. Let's look inside the functions. In bdrv_co_preadv_part() and bdrv_aligned_pwritev() we only pass bytes to other already int64_t interfaces (and some obviously safe calculations), it's OK. In bdrv_co_do_zero_pwritev() aligned_bytes may become large now, still it's passed to bdrv_aligned_pwritev which supports int64_t bytes. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201211183934.169161-15-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-02-03 08:17:11 -06:00
Vladimir Sementsov-Ogievskiy	8b0c5d7659	block/io: support int64_t bytes in bdrv_aligned_preadv() We are generally moving to int64_t for both offset and bytes parameters on all io paths. Main motivation is realization of 64-bit write_zeroes operation for fast zeroing large disk chunks, up to the whole disk. We chose signed type, to be consistent with off_t (which is signed) and with possibility for signed return type (where negative value means error). So, prepare bdrv_aligned_preadv() now. Make the bytes variable in bdrv_padding_rmw_read() int64_t, as it is only used for pass-through to bdrv_aligned_preadv(). All bdrv_aligned_preadv() callers are safe as type is widening. Let's look inside: - add a new-style assertion that request is good. - callees bdrv_is_allocated(), bdrv_co_do_copy_on_readv() supports int64_t bytes - conversion of bytes_remaining is OK, as we never have requests overflowing BDRV_MAX_LENGTH - looping through bytes_remaining is ok, num is updated to int64_t - for bdrv_driver_preadv we have same limit of max_transfer - qemu_iovec_memset is OK, as bytes+qiov_offset should not overflow qiov->size anyway (thanks to bdrv_check_qiov_request()) Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201211183934.169161-14-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: grammar tweak] Signed-off-by: Eric Blake <eblake@redhat.com>	2021-02-03 08:17:11 -06:00
Vladimir Sementsov-Ogievskiy	9df5afbdd1	block/io: support int64_t bytes in bdrv_co_do_copy_on_readv() We are generally moving to int64_t for both offset and bytes parameters on all io paths. Main motivation is realization of 64-bit write_zeroes operation for fast zeroing large disk chunks, up to the whole disk. We chose signed type, to be consistent with off_t (which is signed) and with possibility for signed return type (where negative value means error). So, prepare bdrv_co_do_copy_on_readv() now. 'bytes' type widening, so callers are safe. Look at the function itself: bytes, skip_bytes and progress become int64_t. bdrv_round_to_clusters() is OK, cluster_bytes now may be large. trace_bdrv_co_do_copy_on_readv() is OK looping through cluster_bytes is still OK. pnum is still capped to max_transfer, and to MAX_BOUNCE_BUFFER when we are going to do COR operation. Therefor calculations in qemu_iovec_from_buf() and bdrv_driver_preadv() should not change. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201211183934.169161-13-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-02-03 08:17:11 -06:00
Vladimir Sementsov-Ogievskiy	fcfd9ade68	block/io: support int64_t bytes in bdrv_aligned_pwritev() We are generally moving to int64_t for both offset and bytes parameters on all io paths. Main motivation is realization of 64-bit write_zeroes operation for fast zeroing large disk chunks, up to the whole disk. We chose signed type, to be consistent with off_t (which is signed) and with possibility for signed return type (where negative value means error). So, prepare bdrv_aligned_pwritev() now and convert the dependencies: bdrv_co_write_req_prepare() and bdrv_co_write_req_finish() to signed type bytes. Conversion of bdrv_co_write_req_prepare() and bdrv_co_write_req_finish() is definitely safe, as all requests in block/io must not overflow BDRV_MAX_LENGTH. Still add assertions. For bdrv_aligned_pwritev() 'bytes' type is widened, so callers are safe. Let's check usage of the parameter inside the function. Passing to bdrv_co_write_req_prepare() and bdrv_co_write_req_finish() is OK. Passing to qemu_iovec_* is OK after new assertion. All other callees are already updated to int64_t. Checking alignment is not changed, offset + bytes and qiov_offset + bytes calculations are safe (thanks to new assertions). max_transfer is kept to be int for now. It has a default of INT_MAX here, and some drivers may rely on it. It's to be refactored later. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201211183934.169161-12-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-02-03 08:16:03 -06:00
Vladimir Sementsov-Ogievskiy	5ae07b1410	block/io: support int64_t bytes in bdrv_co_do_pwrite_zeroes() We are generally moving to int64_t for both offset and bytes parameters on all io paths. Main motivation is realization of 64-bit write_zeroes operation for fast zeroing large disk chunks, up to the whole disk. We chose signed type, to be consistent with off_t (which is signed) and with possibility for signed return type (where negative value means error). So, prepare bdrv_co_do_pwrite_zeroes() now. Callers are safe, as converting int to int64_t is safe. Concentrate on 'bytes' usage in the function (thx to Eric Blake): compute 'int tail' via % 'int alignment' - safe fragmentation loop 'int num' - still fragments with a cap on max_transfer use of 'num' within the loop MIN(bytes, max_transfer) as well as %alignment - still works, so calculations in if (head) {} are safe clamp size by 'int max_write_zeroes' - safe drv->bdrv_co_pwrite_zeroes(int) - safe because of clamping clamp size by 'int max_transfer' - safe buf allocation is still clamped to max_transfer qemu_iovec_init_buf(size_t) - safe because of clamping bdrv_driver_pwritev(uint64_t) - safe Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201211183934.169161-11-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-02-03 08:16:03 -06:00
Vladimir Sementsov-Ogievskiy	17abcbeee2	block/io: use int64_t bytes in driver wrappers We are generally moving to int64_t for both offset and bytes parameters on all io paths. Main motivation is realization of 64-bit write_zeroes operation for fast zeroing large disk chunks, up to the whole disk. We chose signed type, to be consistent with off_t (which is signed) and with possibility for signed return type (where negative value means error). So, convert driver wrappers parameters which are already 64bit to signed type. Requests in block/io.c must never exceed BDRV_MAX_LENGTH (which is less than INT64_MAX), which makes the conversion to signed 64bit type safe. Add corresponding assertions. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201211183934.169161-10-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-02-03 08:16:03 -06:00
Eric Blake	8024726459	block: use int64_t as bytes type in tracked requests We are generally moving to int64_t for both offset and bytes parameters on all io paths. Main motivation is realization of 64-bit write_zeroes operation for fast zeroing large disk chunks, up to the whole disk. We chose signed type, to be consistent with off_t (which is signed) and with possibility for signed return type (where negative value means error). All requests in block/io must not overflow BDRV_MAX_LENGTH, all external users of BdrvTrackedRequest already have corresponding assertions, so we are safe. Add some assertions still. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201211183934.169161-9-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-02-03 08:14:15 -06:00
Vladimir Sementsov-Ogievskiy	63f4ad1186	block/io: improve bdrv_check_request: check qiov too Operations with qiov add more restrictions on bytes, let's cover it. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201211183934.169161-8-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-02-03 08:14:00 -06:00
Vladimir Sementsov-Ogievskiy	801625e69d	block/throttle-groups: throttle_group_co_io_limits_intercept(): 64bit bytes The function is called from 64bit io handlers, and bytes is just passed to throttle_account() which is 64bit too (unsigned though). So, let's convert intermediate argument to 64bit too. This patch is a first in the 64-bit-blocklayer series, so we are generally moving to int64_t for both offset and bytes parameters on all io paths. Main motivation is realization of 64-bit write_zeroes operation for fast zeroing large disk chunks, up to the whole disk. We chose signed type, to be consistent with off_t (which is signed) and with possibility for signed return type (where negative value means error). Patch-correctness audit by Eric Blake: Caller has 32-bit, this patch now causes widening which is safe: block/block-backend.c: blk_do_preadv() passes 'unsigned int' block/block-backend.c: blk_do_pwritev_part() passes 'unsigned int' block/throttle.c: throttle_co_pwrite_zeroes() passes 'int' block/throttle.c: throttle_co_pdiscard() passes 'int' Caller has 64-bit, this patch fixes potential bug where pre-patch could narrow, except it's easy enough to trace that callers are still capped at 2G actions: block/throttle.c: throttle_co_preadv() passes 'uint64_t' block/throttle.c: throttle_co_pwritev() passes 'uint64_t' Implementation in question: block/throttle-groups.c throttle_group_co_io_limits_intercept() takes 'unsigned int bytes' and uses it: argument to util/throttle.c throttle_account(uint64_t) All safe: it patches a latent bug, and does not introduce any 64-bit gotchas once throttle_co_p{read,write}v are relaxed, and assuming throttle_account() is not buggy. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-Id: <20201211183934.169161-7-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-02-03 08:14:00 -06:00
Vladimir Sementsov-Ogievskiy	98ca45494f	block/io: bdrv_pad_request(): support qemu_iovec_init_extended failure Make bdrv_pad_request() honest: return error if qemu_iovec_init_extended() failed. Update also bdrv_padding_destroy() to clean the structure for safety. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201211183934.169161-6-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-02-03 08:14:00 -06:00
Vladimir Sementsov-Ogievskiy	f0deecff82	block/io: refactor bdrv_pad_request(): move bdrv_pad_request() up Prepare for the following patch when bdrv_pad_request() will be able to fail. Update the comments. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201211183934.169161-5-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: grammar tweak] Signed-off-by: Eric Blake <eblake@redhat.com>	2021-02-03 08:00:52 -06:00
Vladimir Sementsov-Ogievskiy	a56ed80c42	block: fix theoretical overflow in bdrv_init_padding() Calculation of sum may theoretically overflow, so use 64bit type and add some good assertions. Use int64_t constantly. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201211183934.169161-4-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: tweak assertion order] Signed-off-by: Eric Blake <eblake@redhat.com>	2021-02-03 08:00:33 -06:00
Vladimir Sementsov-Ogievskiy	4c002cef0e	util/iov: make qemu_iovec_init_extended() honest Actually, we can't extend the io vector in all cases. Handle possible MAX_IOV and size_t overflows. For now add assertion to callers (actually they rely on success anyway) and fix them in the following patch. Add also some additional good assertions to qemu_iovec_init_slice() while being here. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201211183934.169161-3-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2021-02-03 08:00:33 -06:00
Vladimir Sementsov-Ogievskiy	69b55e03f7	block: refactor bdrv_check_request: add errp It's better to pass &error_abort than just assert that result is 0: on crash, we'll immediately see the reason in the backtrace. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201211183934.169161-2-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: fix iotest 206 fallout] Signed-off-by: Eric Blake <eblake@redhat.com>	2021-02-03 08:00:33 -06:00
Kevin Wolf	26513a0174	block: Fix VM size column width in bdrv_snapshot_dump() size_to_str() can return a size like "4.24 MiB", with a single digit integer part and two fractional digits. This is eight characters, but commit `b39847a5` changed the format string to only reserve seven characters for the column. This can result in unaligned columns, which in turn changes the output of iotests case 267 because exceeding the column size defeats the attempt to filter the size out of the output (observed with the ppc64 emulator). The resulting change is only a whitespace change, but since commit `f203080b` this is enough for iotests to consider the test failed. Taking a character away from the tag name column and adding it to the VM size column doesn't change anything in the common case (the tag name is left justified, the VM size is right justified), but fixes this case. Fixes: `b39847a505` Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20210202155911.179865-1-kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-02-02 17:23:55 +01:00
Philippe Mathieu-Daudé	fcc8672aca	block/nvme: Trace NVMe spec version supported by the controller NVMe controllers implement different versions of the spec, and different features of it. It is useful to gather this information when debugging. Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20210127212137.3482291-3-philmd@redhat.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-02-02 17:05:38 +01:00
Philippe Mathieu-Daudé	97b709f32e	block/nvme: Properly display doorbell stride length in trace event Commit `15b2260bef` ("block/nvme: Trace controller capabilities") misunderstood the doorbell stride value from the datasheet, use the correct one. The 'doorbell_scale' variable used few lines later is correct. Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20210127212137.3482291-2-philmd@redhat.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-02-02 17:05:38 +01:00
Peter Maydell	7e7eb9f852	QAPI patches patches for 2021-01-28 -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEENUvIs9frKmtoZ05fOHC0AOuRhlMFAmASY10SHGFybWJydUBy ZWRoYXQuY29tAAoJEDhwtADrkYZT4M4P+gKN64+WaErotLHHsqtiA0aoTwbTFXin OEyR5du0+PjX96qYHHV+ZDn5uxxKI57/SRNooPndjU63sYgAbNApfsu+wUDZC844 qMSlrmyw2+Lw1EIykoLXK49+pEDU0XpVIciL5+zEdtCgjiJRjrOOJ/JRBcKoQNHn UArGNQ8y0D+0i8uXyJjyvQeHdz6KUr9sX1vqwRGMt9axEMDJks0+Si4Zg3z2wlWJ Sc3WsXEhikxK1qkF2/6VsopOgNGB0UUvV6q1GO6ngdqag1Hb6mACzSv9mtIShGjh a2MISBhxF8h4wfO8U5TiS9vBgYR3elA3kRGsn4FOfD3sSilt/SWLPHWXdlO1aL2E TollRPtYBqn2YIYQP1SEp7NIqaWC/QaGkP/mH8Jvv0YlL64RK879lv6KiHKzfvI7 HBD7WGZBwMQqPczuw308tqDTQPKUsPDYoEJAFRywkLry86wL8DBOlkQ0lWUjF06s UQk/i09nhrcNLo0GbmgAOHUVj4m03zLyMW/fYmsQ8xe9/b6GBwJvtm2v5wKwg0HE ixxj4oBIk5YV5Xwt7DKLkT0voPAAgNK13a6ywzbyfsigwaJaO9tLtZ0PMuaT9kgs b/OBdeeIYpFdIT/DlcMWpIFi53VYe0McX8MmprHcMZb1133wk5Z5gk+FAWLMifrw 2ltmoUPoB1dC =djiE -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/armbru/tags/pull-qapi-2021-01-28' into staging QAPI patches patches for 2021-01-28 # gpg: Signature made Thu 28 Jan 2021 07:10:21 GMT # gpg: using RSA key 354BC8B3D7EB2A6B68674E5F3870B400EB918653 # gpg: issuer "armbru@redhat.com" # gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" [full] # gpg: aka "Markus Armbruster <armbru@pond.sub.org>" [full] # Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867 4E5F 3870 B400 EB91 8653 * remotes/armbru/tags/pull-qapi-2021-01-28: qapi: More complex uses of QAPI_LIST_APPEND qapi: Use QAPI_LIST_APPEND in trivial cases qapi: Introduce QAPI_LIST_APPEND qapi: A couple more QAPI_LIST_PREPEND() stragglers net: Clarify early exit condition Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2021-01-28 22:43:18 +00:00
Eric Blake	95b3a8c8a8	qapi: More complex uses of QAPI_LIST_APPEND These cases require a bit more thought to review; in each case, the code was appending to a list, but not with a FOOList **tail variable. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210113221013.390592-6-eblake@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> [Flawed change to qmp_guest_network_get_interfaces() dropped] Signed-off-by: Markus Armbruster <armbru@redhat.com>	2021-01-28 08:08:45 +01:00
Eric Blake	c3033fd372	qapi: Use QAPI_LIST_APPEND in trivial cases The easiest spots to use QAPI_LIST_APPEND are where we already have an obvious pointer to the tail of a list. While at it, consistently use the variable name 'tail' for that purpose. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20210113221013.390592-5-eblake@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>	2021-01-28 08:08:45 +01:00
Kevin Wolf	86b1cf3227	block: Separate blk_is_writable() and blk_supports_write_perm() Currently, blk_is_read_only() tells whether a given BlockBackend can only be used in read-only mode because its root node is read-only. Some callers actually try to answer a slightly different question: Is the BlockBackend configured to be writable, by taking write permissions on the root node? This can differ, for example, for CD-ROM devices which don't take write permissions, but may be backed by a writable image file. scsi-cd allows write requests to the drive if blk_is_read_only() returns false. However, the write request will immediately run into an assertion failure because the write permission is missing. This patch introduces separate functions for both questions. blk_supports_write_perm() answers the question whether the block node/image file can support writable devices, whereas blk_is_writable() tells whether the BlockBackend is currently configured to be writable. All calls of blk_is_read_only() are converted to one of the two new functions. Fixes: https://bugs.launchpad.net/bugs/1906693 Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20210118123448.307825-2-kwolf@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2021-01-27 20:45:20 +01:00
David Edmondson	797e3e3805	block: report errno when flock fcntl fails When a call to fcntl(2) for the purpose of adding file locks fails with an error other than EAGAIN or EACCES, report the error returned by fcntl. EAGAIN or EACCES are elided as they are considered to be common failures, indicating that a conflicting lock is held by another process. No errors are elided when removing file locks. Signed-off-by: David Edmondson <david.edmondson@oracle.com> Message-Id: <20210113164447.2545785-1-david.edmondson@oracle.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-01-26 14:36:37 +01:00
Vladimir Sementsov-Ogievskiy	143a6384f5	block/block-copy: drop unused argument of block_copy() Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210116214705.822267-21-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-01-26 14:36:37 +01:00
Vladimir Sementsov-Ogievskiy	5b49c2bdc1	block/block-copy: drop unused block_copy_set_progress_callback() Drop unused code. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210116214705.822267-20-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-01-26 14:36:37 +01:00
Vladimir Sementsov-Ogievskiy	71eed4cebe	backup: move to block-copy This brings async request handling and block-status driven chunk sizes to backup out of the box, which improves backup performance. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210116214705.822267-18-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-01-26 14:36:37 +01:00
Vladimir Sementsov-Ogievskiy	511e7d31bf	block/backup: drop extra gotos from backup_run() Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210116214705.822267-17-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-01-26 14:36:37 +01:00
Vladimir Sementsov-Ogievskiy	d51590fc3e	block/block-copy: make progress_bytes_callback optional We are going to stop use of this callback in the following commit. Still the callback handling code will be dropped in a separate commit. So, for now let's make it optional. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210116214705.822267-16-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-01-26 14:36:37 +01:00
Vladimir Sementsov-Ogievskiy	2c59fd833a	qapi: backup: add max-chunk and max-workers to x-perf struct Add new parameters to configure future backup features. The patch doesn't introduce aio backup requests (so we actually have only one worker) neither requests larger than one cluster. Still, formally we satisfy these maximums anyway, so add the parameters now, to facilitate further patch which will really change backup job behavior. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210116214705.822267-11-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-01-26 14:36:37 +01:00
Vladimir Sementsov-Ogievskiy	a6d23d56df	block/block-copy: add block_copy_cancel Add function to cancel running async block-copy call. It will be used in backup. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210116214705.822267-8-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-01-26 14:36:37 +01:00
Vladimir Sementsov-Ogievskiy	7e032df0ea	block/block-copy: add ratelimit to block-copy We are going to directly use one async block-copy operation for backup job, so we need rate limiter. We want to maintain current backup behavior: only background copying is limited and copy-before-write operations only participate in limit calculation. Therefore we need one rate limiter for block-copy state and boolean flag for block-copy call state for actual limitation. Note, that we can't just calculate each chunk in limiter after successful copying: it will not save us from starting a lot of async sub-requests which will exceed limit too much. Instead let's use the following scheme on sub-request creation: 1. If at the moment limit is not exceeded, create the request and account it immediately. 2. If at the moment limit is already exceeded, drop create sub-request and handle limit instead (by sleep). With this approach we'll never exceed the limit more than by one sub-request (which pretty much matches current backup behavior). Note also, that if there is in-flight block-copy async call, block_copy_kick() should be used after set-speed to apply new setup faster. For that block_copy_kick() published in this patch. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210116214705.822267-7-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-01-26 14:36:37 +01:00
Vladimir Sementsov-Ogievskiy	2e099a9d29	block/block-copy: add list of all call-states It simplifies debugging. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210116214705.822267-6-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-01-26 14:36:37 +01:00
Vladimir Sementsov-Ogievskiy	26be9d62dd	block/block-copy: add max_chunk and max_workers parameters They will be used for backup. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210116214705.822267-5-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-01-26 14:36:37 +01:00
Vladimir Sementsov-Ogievskiy	de4641b46b	block/block-copy: implement block_copy_async We'll need async block-copy invocation to use in backup directly. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210116214705.822267-4-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-01-26 14:36:37 +01:00
Vladimir Sementsov-Ogievskiy	3b8c2329b5	block/block-copy: More explicit call_state Refactor common path to use BlockCopyCallState pointer as parameter, to prepare it for use in asynchronous block-copy (at least, we'll need to run block-copy in a coroutine, passing the whole parameters as one pointer). Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210116214705.822267-3-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-01-26 14:36:37 +01:00
Vladimir Sementsov-Ogievskiy	86c6a3b690	qapi: backup: add perf.use-copy-range parameter Experiments show, that copy_range is not always making things faster. So, to make experimentation simpler, let's add a parameter. Some more perf parameters will be added soon, so here is a new struct. For now, add new backup qmp parameter with x- prefix for the following reasons: - We are going to add more performance parameters, some will be related to the whole block-copy process, some only to background copying in backup (ignored for copy-before-write operations). - On the other hand, we are going to use block-copy interface in other block jobs, which will need performance options as well.. And it should be the same structure or at least somehow related. So, there are too much unclean things about how the interface and now we need the new options mostly for testing. Let's keep them experimental for a while. In do_backup_common() new x-perf parameter handled in a way to make further options addition simpler. We add use-copy-range with default=true, and we'll change the default in further patch, after moving backup to use block-copy. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210116214705.822267-2-vsementsov@virtuozzo.com> [mreitz: s/5\.2/6.0/] Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-01-26 14:36:37 +01:00
Andrey Shinkevich	205736f488	block: apply COR-filter to block-stream jobs This patch completes the series with the COR-filter applied to block-stream operations. Adding the filter makes it possible in future implement discarding copied regions in backing files during the block-stream job, to reduce the disk overuse (we need control on permissions). Also, the filter now is smart enough to do copy-on-read with specified base, so we have benefit on guest reads even when doing block-stream of the part of the backing chain. Several iotests are slightly modified due to filter insertion. Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201216061703.70908-14-vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-01-26 14:36:37 +01:00
Vladimir Sementsov-Ogievskiy	0f6c94988a	block/stream: add s->target_bs Add a direct link to target bs for convenience and to simplify following commit which will insert COR filter above target bs. This is a part of original commit written by Andrey. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20201216061703.70908-13-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-01-26 14:36:37 +01:00
Vladimir Sementsov-Ogievskiy	7f4a396d76	qapi: block-stream: add "bottom" argument The code already don't freeze base node and we try to make it prepared for the situation when base node is changed during the operation. In other words, block-stream doesn't own base node. Let's introduce a new interface which should replace the current one, which will in better relations with the code. Specifying bottom node instead of base, and requiring it to be non-filter gives us the following benefits: - drop difference between above_base and base_overlay, which will be renamed to just bottom, when old interface dropped - clean way to work with parallel streams/commits on the same backing chain, which otherwise become a problem when we introduce a filter for stream job - cleaner interface. Nobody will surprised the fact that base node may disappear during block-stream, when there is no word about "base" in the interface. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201216061703.70908-11-vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-01-26 14:36:37 +01:00
Andrey Shinkevich	000e5a1cda	stream: rework backing-file changing Stream in stream_prepare calls bdrv_change_backing_file() to change backing-file in the metadata of bs. It may use either backing-file parameter given by user or just take filename of base on job start. Backing file format is determined by base on job finish. There are some problems with this design, we solve only two by this patch: 1. Consider scenario with backing-file unset. Current concept of stream supports changing of the base during the job (we don't freeze link to the base). So, we should not save base filename at job start, - let's determine name of the base on job finish. 2. Using direct base to determine filename and format is not very good: base node may be a filter, so its filename may be JSON, and format_name is not good for storing into qcow2 metadata as backing file format. - let's use unfiltered_base Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> [vsementsov: change commit subject, change logic in stream_prepare] Message-Id: <20201216061703.70908-10-vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-01-26 14:36:37 +01:00
Andrey Shinkevich	e275458b29	copy-on-read: skip non-guest reads if no copy needed If the flag BDRV_REQ_PREFETCH was set, skip idling read/write operations in COR-driver. It can be taken into account for the COR-algorithms optimization. That check is being made during the block stream job by the moment. Add the BDRV_REQ_PREFETCH flag to the supported_read_flags of the COR-filter. block: Modify the comment for the flag BDRV_REQ_PREFETCH as we are going to use it alone and pass it to the COR-filter driver for further processing. Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20201216061703.70908-9-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-01-26 14:36:37 +01:00
Andrey Shinkevich	897dd0ec4f	block: include supported_read_flags into BDS structure Add the new member supported_read_flags to the BlockDriverState structure. It will control the flags set for copy-on-read operations. Make the block generic layer evaluate supported read flags before they go to a block driver. Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> [vsementsov: use assert instead of abort] Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20201216061703.70908-8-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-01-26 14:36:37 +01:00
Andrey Shinkevich	e4c8fddde7	qapi: copy-on-read filter: add 'bottom' option Add an option to limit copy-on-read operations to specified sub-chain of backing-chain, to make copy-on-read filter useful for block-stream job. Suggested-by: Max Reitz <mreitz@redhat.com> Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> [vsementsov: change subject, modified to freeze the chain, do some fixes] Message-Id: <20201216061703.70908-6-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-01-26 11:26:54 +01:00
Andrey Shinkevich	880747a887	qapi: add filter-node-name to block-stream Provide the possibility to pass the 'filter-node-name' parameter to the block-stream job as it is done for the commit block job. Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> [vsementsov: comment indentation, s/Since: 5.2/Since: 6.0/] Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20201216061703.70908-5-vsementsov@virtuozzo.com> [mreitz: s/commit/stream/] Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-01-26 11:26:54 +01:00
Andrey Shinkevich	16e09a21af	copy-on-read: add filter drop function Provide API for the COR-filter removal. Also, drop the filter child permissions for an inactive state when the filter node is being removed. To insert the filter, the block generic layer function bdrv_insert_node() can be used. The new function bdrv_cor_filter_drop() may be considered as an intermediate solution before the QEMU permission update system has overhauled. Then we are able to implement the API function bdrv_remove_node() on the block generic layer. Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20201216061703.70908-4-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-01-26 11:26:54 +01:00
Andrey Shinkevich	1252e03b8e	copy-on-read: support preadv/pwritev_part functions Add support for the recently introduced functions bdrv_co_preadv_part() and bdrv_co_pwritev_part() to the COR-filter driver. Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201216061703.70908-2-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2021-01-26 11:26:54 +01:00
Peter Maydell	45240eed4f	Yank patches patches for 2021-01-13 -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEENUvIs9frKmtoZ05fOHC0AOuRhlMFAl/+vJoSHGFybWJydUBy ZWRoYXQuY29tAAoJEDhwtADrkYZTyv8P/3Dqb9sM8p2WIeoUt5KjgkgWto8anCXM /vzAvVdOrPcLXgHF1HOEkcYkp5ZDzFb1PP+LNWsbIB82HmGUGfA7CiOpCpXEoDJs Z9OYR3K8W5fvSKYTI/m+s7d+9aYqRZajI6ON5M4Eqem0ZwV93/SZMBHOcs1GmvIR diXIztaWVgHjU1Q37MlvJTM4lLN3RH1kTWEdfp3dkNMO6HxBet0B1g7xwaPxKrgb 4y8D/kk9TA1m4wnwrr9s1l3UnfDiZ7mSfsEsXMcTMmQQrAtonD/xiX2YFsHwf6+U 9cX9BCG2XP65t3ynY5goddcjVX3R6SuP4YWSgYUpJGrSx+GVxXrtF0ZumgiCk+4T uv8sOkJdPe9/aRy86UkNzJx7V50nCZnJ2if9neukVjHGW4Hw4txcYV5/7ZuuDqKl tF5NdF/LcHEZLKkBt+4g4TpJ7vxEmJc8/ukn9niLT381SkRBFnnP+Bnl9taSlHOY xNtQJ3Jrcd/cvBjAPCKgtE4Fx6wzQG3c7Yg4WHxbZzcYZBPp4fifUTLCK5XKHqhb rlqCQIO9DzGz5tqOgG7hWmlodSafGiDsHo9tJVpyF5pSSUv3A4KzX2xW4FZZLKJn 7uBrcV0bLmR4tyw+fr+u2EW0ClYrs/JxeXAnsnTp9JrzkXILf5RjEuK0Sc1ZIuZW cmuPa8027ybj =fLO1 -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/armbru/tags/pull-yank-2021-01-13' into staging Yank patches patches for 2021-01-13 # gpg: Signature made Wed 13 Jan 2021 09:25:46 GMT # gpg: using RSA key 354BC8B3D7EB2A6B68674E5F3870B400EB918653 # gpg: issuer "armbru@redhat.com" # gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" [full] # gpg: aka "Markus Armbruster <armbru@pond.sub.org>" [full] # Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867 4E5F 3870 B400 EB91 8653 * remotes/armbru/tags/pull-yank-2021-01-13: tests/test-char.c: Wait for the chardev to connect in char_socket_client_dupid_test io: Document qmp oob suitability of qio_channel_shutdown and io_shutdown io/channel-tls.c: make qio_channel_tls_shutdown thread-safe migration: Add yank feature chardev/char-socket.c: Add yank feature block/nbd.c: Add yank feature Introduce yank feature Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2021-01-13 14:19:24 +00:00
Lukas Straub	fee091cdff	block/nbd.c: Add yank feature Register a yank function which shuts down the socket and sets s->state = NBD_CLIENT_QUIT. This is the same behaviour as if an error occured. Signed-off-by: Lukas Straub <lukasstraub2@web.de> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <b73eb07db6d1fcd00667beb13ae6117260f002c3.1609167865.git.lukasstraub2@web.de> Signed-off-by: Markus Armbruster <armbru@redhat.com>	2021-01-13 10:21:17 +01:00
Roman Bolshakov	3eacf70bb5	meson: Propagate gnutls dependency crypto/tlscreds.h includes GnuTLS headers if CONFIG_GNUTLS is set, but GNUTLS_CFLAGS, that describe include path, are not propagated transitively to all users of crypto and build fails if GnuTLS headers reside in non-standard directory (which is a case for homebrew on Apple Silicon). Signed-off-by: Roman Bolshakov <r.bolshakov@yadro.com> Message-Id: <20210102125213.41279-1-r.bolshakov@yadro.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-01-12 12:38:03 +01:00
Peter Maydell	729cc68373	Remove superfluous timer_del() calls This commit is the result of running the timer-del-timer-free.cocci script on the whole source tree. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Acked-by: Corey Minyard <cminyard@mvista.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20201215154107.3255-4-peter.maydell@linaro.org	2021-01-08 15:13:38 +00:00
Paolo Bonzini	9db405a335	libiscsi: convert to meson Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-01-02 21:03:37 +01:00
Paolo Bonzini	8e4e2b551d	curl: remove compatibility code, require 7.29.0 cURL 7.16.0 was released in October 2006. Just remove code that is in all likelihood not being used anywhere, and require the oldest version found in currently supported distros, which is 7.29.0 from CentOS 7. pkg-config is enough for QEMU, since it does not need extra information such as the path for certicate authorities. All supported platforms today will all have pkg-config for curl, so we can drop curl-config. Suggested-by: Daniel Berrangé <berrange@redhat.com> Reviewed-by: Daniel Berrangé <berrange@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-01-02 21:03:37 +01:00
Paolo Bonzini	2f2a376a42	meson: use dependency to gate block modules This allows converting the dependencies to meson options one by one. Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-01-02 21:03:36 +01:00
Peter Maydell	1f7c02797f	QAPI patches patches for 2020-12-19 -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEENUvIs9frKmtoZ05fOHC0AOuRhlMFAl/dynUSHGFybWJydUBy ZWRoYXQuY29tAAoJEDhwtADrkYZT3igP/3bWwsKR5vKVsDUTmMfrhcgaFvQiaYoG F29Bond8Xy0Zd0gl7OWh/5jKL0vGlrEVPrKfYLUjMnfkeRec/pOkIB2oOmIxpnPs 9zi4kh2hQ3dEoRBuvSnnZzedetYPTuCpWMIjlztkgfxgcimqm8TPNVSxRaSApjC3 Y8108wGwBWVf2C0rhKO9E2xA51uo6khy05i1psUtqUlC+PuDQ/OwzQHM2dnWdDB6 kUwBDK17nhL6WwsYqCyKLSiDModReYfDiY8GS5MDLo74dzwXiatEefCR7+sbM4xq eX/SBoqoeS1jLPNuCryNeGNKvNA2KAbEJTnbQA2NxBXHgZ9/1SxVZFxuPp4nDMSQ N7BDuDI8YtJE479RjT/ZzRG65xadGBSe/HXkXM9mZwh1zitop8SVZ9fArFBHvNzw Y5zAv3fQd54+87psffg4dYFK0wGmqTabLEEuVzM8KIVqcAdYA2yC2b2EHy+vsxuq GMkr0WaA6Sq2gthXmzdTjmUPuHdan/NIhuV6d66SbPNH2oH31piptFxuznyFWSKV isciFFdUrkg5QrF8DSt2nmdwMFf8QGbszqP8QIGMzhJCCS9GXIiGG8f149++q8X8 HO1lFAdLQJdrDwCYmfx36tOvi2rS/rcoTGgvg66UX3xKko1ruoxR1ZWcS54obJN6 vEQDZ+PxubDg =vGLy -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/armbru/tags/pull-qapi-2020-12-19' into staging QAPI patches patches for 2020-12-19 # gpg: Signature made Sat 19 Dec 2020 09:40:05 GMT # gpg: using RSA key 354BC8B3D7EB2A6B68674E5F3870B400EB918653 # gpg: issuer "armbru@redhat.com" # gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" [full] # gpg: aka "Markus Armbruster <armbru@pond.sub.org>" [full] # Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867 4E5F 3870 B400 EB91 8653 * remotes/armbru/tags/pull-qapi-2020-12-19: (33 commits) qobject: Make QString immutable block: Use GString instead of QString to build filenames keyval: Use GString to accumulate value strings json: Use GString instead of QString to accumulate strings migration: Replace migration's JSON writer by the general one qobject: Factor JSON writer out of qobject_to_json() qobject: Factor quoted_str() out of to_json() qobject: Drop qstring_get_try_str() qobject: Drop qobject_get_try_str() Revert "qobject: let object_property_get_str() use new API" block: Avoid qobject_get_try_str() qmp: Fix tracing of non-string command IDs qobject: Move internals to qobject-internal.h hw/rdma: Replace QList by GQueue Revert "qstring: add qstring_free()" qobject: Change qobject_to_json()'s value to GString qobject: Use GString instead of QString to accumulate JSON qobject: Make qobject_to_json_pretty() take a pretty argument monitor: Use GString instead of QString for output buffer hmp: Simplify how qmp_human_monitor_command() gets output ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2021-01-01 14:33:03 +00:00
Peter Maydell	26f6b15e26	Block patches: - New block filter: preallocate (which, on writes beyond an image file's end, allocates big chunks of data so that such post-EOF writes will occur less frequently) - write-zeroes and block-status support for Quorum - Implementation of truncate for the nvme block driver similarly to the existing implementations for host block devices and iscsi devices - Block layer refactoring: Drop the tighten_restrictions concept in the block permission functions - iotest fixes -----BEGIN PGP SIGNATURE----- iQFGBAABCAAwFiEEkb62CjDbPohX0Rgp9AfbAGHVz0AFAl/cwIoSHG1yZWl0ekBy ZWRoYXQuY29tAAoJEPQH2wBh1c9AnvMH+gOnZCwEUKWuBxGX3Wjb/kqV1OuhAhcP IVrKLRnqdarCYMQ9M4SZL6pedfsujHA7vClTV7NTrenXBsEIradBQ59ztQ0oDirS 4ipIjVtNqj7m86l+IRZDq5HlwOYwwFnWogmLo2bcmNJGLpPQQfrhL2vRJ1wLgFYk WjeAVlkkYcHnTIDvs4ne9WRSlxGVBWJ4X5nSlRdZqeyUcMY9v4wL4P9Wc4ZuORmq /5HRcT5JKGaT2bAueaqAGEdtPFGbazEP5uU7MTTK/fueDKIRAXO2d0gqhANtOOJQ 7hMmKhwOPOOhrrpCVi9nxsVwdCOHfurV0km6cOs+Iprm/Wm2UtuS/A8= =z+7k -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/maxreitz/tags/pull-block-2020-12-18' into staging Block patches: - New block filter: preallocate (which, on writes beyond an image file's end, allocates big chunks of data so that such post-EOF writes will occur less frequently) - write-zeroes and block-status support for Quorum - Implementation of truncate for the nvme block driver similarly to the existing implementations for host block devices and iscsi devices - Block layer refactoring: Drop the tighten_restrictions concept in the block permission functions - iotest fixes # gpg: Signature made Fri 18 Dec 2020 14:45:30 GMT # gpg: using RSA key 91BEB60A30DB3E8857D11829F407DB0061D5CF40 # gpg: issuer "mreitz@redhat.com" # gpg: Good signature from "Max Reitz <mreitz@redhat.com>" [full] # Primary key fingerprint: 91BE B60A 30DB 3E88 57D1 1829 F407 DB00 61D5 CF40 * remotes/maxreitz/tags/pull-block-2020-12-18: (30 commits) iotests: Fix _send_qemu_cmd with bash 5.1 iotests/102: Pass $QEMU_HANDLE to _send_qemu_cmd block/nvme: Implement fake truncate() coroutine quorum: Implement bdrv_co_pwrite_zeroes() quorum: Implement bdrv_co_block_status() scripts/simplebench: add bench_prealloc.py simplebench/results_to_text: make executable simplebench/results_to_text: add difference line to the table simplebench/results_to_text: improve view of the table simplebench: move results_to_text() into separate file simplebench: rename ascii() to results_to_text() scripts/simplebench: use standard deviation for +- error scripts/simplebench: support iops scripts/simplebench: fix grammar: s/successed/succeeded/ iotests: add 298 to test new preallocate filter driver iotests.py: execute_setup_common(): add required_fmts argument iotests: qemu_io_silent: support --image-opts qemu-io: add preallocate mode parameter for truncate command block: introduce preallocate filter block: bdrv_check_perm(): process children anyway ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-12-31 23:26:46 +00:00
Markus Armbruster	eab3a4678b	qobject: Change qobject_to_json()'s value to GString qobject_to_json() and qobject_to_json_pretty() build a GString, then covert it to QString. Just one of the callers actually needs a QString: qemu_rbd_parse_filename(). A few others need a string they can modify: qmp_send_response(), qga's send_response(), to_json_str(), and qmp_fd_vsend_fds(). The remainder just need a string. Change qobject_to_json() and qobject_to_json_pretty() to return the GString. qemu_rbd_parse_filename() now has to convert to QString. All others save a QString temporary. to_json_str() actually becomes a bit simpler, because GString provides more convenient modification functions. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20201211171152.146877-6-armbru@redhat.com>	2020-12-19 10:38:43 +01:00
Eric Blake	54aa3de72e	qapi: Use QAPI_LIST_PREPEND() where possible Anywhere we create a list of just one item or by prepending items (typically because order doesn't matter), we can use QAPI_LIST_PREPEND(). But places where we must keep the list in order by appending remain open-coded until later patches. Note that as a side effect, this also performs a cleanup of two minor issues in qga/commands-posix.c: the old code was performing new = g_malloc0(sizeof(*ret)); which 1) is confusing because you have to verify whether 'new' and 'ret' are variables with the same type, and 2) would conflict with C++ compilation (not an actual problem for this file, but makes copy-and-paste harder). Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20201113011340.463563-5-eblake@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> [Straightforward conflicts due to commit `a8aa94b5f8` "qga: update schema for guest-get-disks 'dependents' field" and commit `a10b453a52` "target/mips: Move mips_cpu_add_definition() from helper.c to cpu.c" resolved. Commit message tweaked.] Signed-off-by: Markus Armbruster <armbru@redhat.com>	2020-12-19 10:20:14 +01:00
Markus Armbruster	be7c5ddd0d	block/vpc: Use sizeof() instead of HEADER_SIZE for footer size Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20201217162003.1102738-10-armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-12-18 12:43:30 +01:00
Markus Armbruster	a3d2761719	block/vpc: Pass footer buffers as VHDFooter * instead of uint8_t * Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20201217162003.1102738-9-armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-12-18 12:43:28 +01:00
Markus Armbruster	275734e479	block/vpc: Pad VHDFooter, replace uint8_t[] buffers Pad VHDFooter as specified in the "Virtual Hard Disk Image Format Specification" version 1.0[]. Change footer buffers from uint8_t[HEADER_SIZE] to VHDFooter. Their size remains the same. The VHDFooter variables pointing to a VHDFooter variable right next to it are now silly. Eliminate them, and shorten the remaining variables' names. Most variables pointing to s->footer are now also silly. Eliminate them, too. [*] http://download.microsoft.com/download/f/f/e/ffef50a5-07dd-4cf8-aaa3-442c0673a029/Virtual%20Hard%20Disk%20Format%20Spec_10_18_06.doc Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20201217162003.1102738-8-armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-12-18 12:43:26 +01:00
Markus Armbruster	3d6101a3f2	block/vpc: Use sizeof() instead of 1024 for dynamic header size Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20201217162003.1102738-7-armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-12-18 12:43:23 +01:00
Markus Armbruster	e326f0783e	block/vpc: Pad VHDDynDiskHeader, replace uint8_t[] buffers Pad VHDDynDiskHeader as specified in the "Virtual Hard Disk Image Format Specification" version 1.0[]. Change dynamic disk header buffers from uint8_t[1024] to VHDDynDiskHeader. Their size remains the same. The VHDDynDiskHeader variables pointing to a VHDDynDiskHeader variable right next to it are now silly. Eliminate them. [*] http://download.microsoft.com/download/f/f/e/ffef50a5-07dd-4cf8-aaa3-442c0673a029/Virtual%20Hard%20Disk%20Format%20Spec_10_18_06.doc Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20201217162003.1102738-6-armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-12-18 12:43:18 +01:00
Markus Armbruster	7550379ded	block/vpc: Make vpc_checksum() take void * Some of the next commits will checksum structs. Change vpc_checksum() to take void * instead of uint8_t, to save us pointless casts to uint8_t *. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20201217162003.1102738-5-armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-12-18 12:43:16 +01:00
Markus Armbruster	a18dc3a14d	block/vpc: Don't abuse the footer buffer for dynamic header create_dynamic_disk() takes a buffer holding the footer as first argument. It writes out the footer (512 bytes), then reuses the buffer to initialize and write out the dynamic header (1024 bytes). Works, because the caller passes a buffer that is large enough for both purposes. I hate that. Use a separate buffer for the dynamic header, and adjust the caller's buffer. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20201217162003.1102738-4-armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-12-18 12:43:14 +01:00
Markus Armbruster	b0ce8cb0e8	block/vpc: Don't abuse the footer buffer as BAT sector buffer create_dynamic_disk() takes a buffer holding the footer as first argument. It writes out the footer (512 bytes), then reuses the buffer to initialize and write out the dynamic header (1024 bytes), then reuses it again to initialize and write out BAT sectors (512). Works, because the caller passes a buffer that is large enough for all three purposes. I hate that. Use a separate buffer for writing out BAT sectors. The next commit will do the same for the dynamic header. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20201217162003.1102738-3-armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-12-18 12:43:06 +01:00
Markus Armbruster	02df95c4a1	block/vpc: Make vpc_open() read the full dynamic header The dynamic header's size is 1024 bytes. vpc_open() reads only the 512 bytes of the dynamic header into buf[]. Works, because it doesn't actually access the second half. However, a colleague told me that GCC 11 warns: ../block/vpc.c:358:51: error: array subscript 'struct VHDDynDiskHeader[0]' is partly outside array bounds of 'uint8_t[512]' [-Werror=array-bounds] Clean up to read the full header. Rename buf[] to dyndisk_header_buf[] while there. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20201217162003.1102738-2-armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-12-18 12:42:34 +01:00
Philippe Mathieu-Daudé	c8807c5edc	block/nvme: Implement fake truncate() coroutine NVMe drive cannot be shrunk. Since commit `c80d8b06cf` we can use the @exact parameter (set to false) to return success if the block device is larger than the requested offset (even if we can not be shrunk). Use this parameter to implement the NVMe truncate() coroutine, similarly how it is done for the iscsi and file-posix drivers (see commit `82325ae5f2` "Evaluate @exact in protocol drivers"). Reported-by: Xueqiang Wei <xuwei@redhat.com> Suggested-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20201210125202.858656-1-philmd@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-12-18 12:35:55 +01:00
Alberto Garcia	5cddb2e95f	quorum: Implement bdrv_co_pwrite_zeroes() This simply calls bdrv_co_pwrite_zeroes() in all children. bs->supported_zero_flags is also set to the flags that are supported by all children. Signed-off-by: Alberto Garcia <berto@igalia.com> Message-Id: <2f09c842781fe336b4c2e40036bba577b7430190.1605286097.git.berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-12-18 12:35:55 +01:00
Alberto Garcia	ef9bba1484	quorum: Implement bdrv_co_block_status() The quorum driver does not implement bdrv_co_block_status() and because of that it always reports to contain data even if all its children are known to be empty. One consequence of this is that if we for example create a quorum with a size of 10GB and we mirror it to a new image the operation will write 10GB of actual zeroes to the destination image wasting a lot of time and disk space. Since a quorum has an arbitrary number of children of potentially different formats there is no way to report all possible allocation status flags in a way that makes sense, so this implementation only reports when a given region is known to contain zeroes (BDRV_BLOCK_ZERO) or not (BDRV_BLOCK_DATA). If all children agree that a region contains zeroes then we can return BDRV_BLOCK_ZERO using the smallest size reported by the children (because all agree that a region of at least that size contains zeroes). If at least one child disagrees we have to return BDRV_BLOCK_DATA. In this case we use the largest of the sizes reported by the children that didn't return BDRV_BLOCK_ZERO (because we know that there won't be an agreement for at least that size). Signed-off-by: Alberto Garcia <berto@igalia.com> Tested-by: Tao Xu <tao3.xu@intel.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <db83149afcf0f793effc8878089d29af4c46ffe1.1605286097.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-12-18 12:35:55 +01:00
Vladimir Sementsov-Ogievskiy	33fa2222eb	block: introduce preallocate filter It's intended to be inserted between format and protocol nodes to preallocate additional space (expanding protocol file) on writes crossing EOF. It improves performance for file-systems with slow allocation. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201021145859.11201-9-vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> [mreitz: Two comment fixes, and bumped the version from 5.2 to 6.0] Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-12-18 12:35:55 +01:00
Vladimir Sementsov-Ogievskiy	d1a764d126	block: introduce BDRV_REQ_NO_WAIT flag Add flag to make serialising request no wait: if there are conflicting requests, just return error immediately. It's will be used in upcoming preallocate filter. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20201021145859.11201-7-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-12-18 12:35:55 +01:00
Vladimir Sementsov-Ogievskiy	8ac5aab255	block: bdrv_mark_request_serialising: split non-waiting function We'll need a separate function, which will only "mark" request serialising with specified align but not wait for conflicting requests. So, it will be like old bdrv_mark_request_serialising(), before merging bdrv_wait_serialising_requests_locked() into it. To reduce the possible mess, let's do the following: Public function that does both marking and waiting will be called bdrv_make_request_serialising, and private function which will only "mark" will be called tracked_request_set_serialising(). Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20201021145859.11201-6-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-12-18 12:35:55 +01:00
Vladimir Sementsov-Ogievskiy	ec1c886831	block/io: bdrv_wait_serialising_requests_locked: drop extra bs arg bs is linked in req, so no needs to pass it separately. Most of tracked-requests API doesn't have bs argument. Actually, after this patch only tracked_request_begin has it, but it's for purpose. While being here, also add a comment about what "_locked" is. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20201021145859.11201-5-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-12-18 12:35:55 +01:00
Vladimir Sementsov-Ogievskiy	3183937ff9	block/io: split out bdrv_find_conflicting_request To be reused in separate. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20201021145859.11201-4-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-12-18 12:35:55 +01:00
Vladimir Sementsov-Ogievskiy	2e36da62cf	block/io.c: drop assertion on double waiting for request serialisation The comments states, that on misaligned request we should have already been waiting. But for bdrv_padding_rmw_read, we called bdrv_mark_request_serialising with align = request_alignment, and now we serialise with align = cluster_size. So we may have to wait again with larger alignment. Note, that the only user of BDRV_REQ_SERIALISING is backup which issues cluster-aligned requests, so seems the assertion should not fire for now. But it's wrong anyway. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20201021145859.11201-3-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-12-18 12:35:55 +01:00
Peter Lieven	182454dc63	block/nfs: fix int overflow in nfs_client_open_qdict nfs_client_open returns the file size in sectors. This effectively makes it impossible to open files larger than 1TB. Fixes: `c22a034545` Cc: qemu-stable@nongnu.org Signed-off-by: Peter Lieven <pl@kamp.de> Message-Id: <20201209121735.16437-1-pl@kamp.de> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-12-18 11:48:39 +01:00
Pan Nengyuan	cb8d0851f1	block/file-posix: fix a possible undefined behavior local_err is not initialized to NULL, it will cause a assert error as below: qemu/util/error.c:59: error_setv: Assertion `*errp == NULL' failed. Fixes: `c644751069` Reported-by: Euler Robot <euler.robot@huawei.com> Signed-off-by: Pan Nengyuan <pannengyuan@huawei.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Li Qiang <liq3ea@gmail.com> Signed-off-by: Chen Qun <kuhn.chenqun@huawei.com> Message-Id: <20201023061218.2080844-8-kuhn.chenqun@huawei.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>	2020-12-13 23:56:16 +01:00
Kevin Wolf	960d5fb3e8	block: Fix deadlock in bdrv_co_yield_to_drain() If bdrv_co_yield_to_drain() is called for draining a block node that runs in a different AioContext, it keeps that AioContext locked while it yields and schedules a BH in the AioContext to do the actual drain. As long as executing the BH is the very next thing that the event loop of the node's AioContext does, this actually happens to work, but when it tries to execute something else that wants to take the AioContext lock, it will deadlock. (In the bug report, this other thing is a virtio-scsi device running virtio_scsi_data_plane_handle_cmd().) Instead, always drop the AioContext lock across the yield and reacquire it only when the coroutine is reentered. The BH needs to unconditionally take the lock for itself now. This fixes the 'block_resize' QMP command on a block node that runs in an iothread. Cc: qemu-stable@nongnu.org Fixes: `eb94b81a94` Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1903511 Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20201203172311.68232-4-kwolf@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-12-11 17:52:40 +01:00
Vladimir Sementsov-Ogievskiy	8b1170012b	block: introduce BDRV_MAX_LENGTH We are going to modify block layer to work with 64bit requests. And first step is moving to int64_t type for both offset and bytes arguments in all block request related functions. It's mostly safe (when widening signed or unsigned int to int64_t), but switching from uint64_t is questionable. So, let's first establish the set of requests we want to work with. First signed int64_t should be enough, as off_t is signed anyway. Then, obviously offset + bytes should not overflow. And most interesting: (offset + bytes) being aligned up should not overflow as well. Aligned to what alignment? First thing that comes in mind is bs->bl.request_alignment, as we align up request to this alignment. But there is another thing: look at bdrv_mark_request_serialising(). It aligns request up to some given alignment. And this parameter may be bdrv_get_cluster_size(), which is often a lot greater than bs->bl.request_alignment. Note also, that bdrv_mark_request_serialising() uses signed int64_t for calculations. So, actually, we already depend on some restrictions. Happily, bdrv_get_cluster_size() returns int and bs->bl.request_alignment has 32bit unsigned type, but defined to be a power of 2 less than INT_MAX. So, we may establish, that INT_MAX is absolute maximum for any kind of alignment that may occur with the request. Note, that bdrv_get_cluster_size() is not documented to return power of 2, still bdrv_mark_request_serialising() behaves like it is. Also, backup uses bdi.cluster_size and is not prepared to it not being power of 2. So, let's establish that Qemu supports only power-of-2 clusters and alignments. So, alignment can't be greater than 2^30. Finally to be safe with calculations, to not calculate different maximums for different nodes (depending on cluster size and request_alignment), let's simply set QEMU_ALIGN_DOWN(INT64_MAX, 2^30) as absolute maximum bytes length for Qemu. Actually, it's not much less than INT64_MAX. OK, then, let's apply it to block/io. Let's consider all block/io entry points of offset/bytes: 4 bytes/offset interface functions: bdrv_co_preadv_part(), bdrv_co_pwritev_part(), bdrv_co_copy_range_internal() and bdrv_co_pdiscard() and we check them all with bdrv_check_request(). We also have one entry point with only offset: bdrv_co_truncate(). Check the offset. And one public structure: BdrvTrackedRequest. Happily, it has only three external users: file-posix.c: adopted by this patch write-threshold.c: only read fields test-write-threshold.c: sets obviously small constant values Better is to make the structure private and add corresponding interfaces.. Still it's not obvious what kind of interface is needed for file-posix.c. Let's keep it public but add corresponding assertions. After this patch we'll convert functions in block/io.c to int64_t bytes and offset parameters. We can assume that offset/bytes pair always satisfy new restrictions, and make corresponding assertions where needed. If we reach some offset/bytes point in block/io.c missing bdrv_check_request() it is considered a bug. As well, if block/io.c modifies a offset/bytes request, expanding it more then aligning up to request_alignment, it's a bug too. For all io requests except for discard we keep for now old restriction of 32bit request length. iotest 206 output error message changed, as now test disk size is larger than new limit. Add one more test case with new maximum disk size to cover too-big-L1 case. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201203222713.13507-5-vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-12-11 17:52:40 +01:00
Vladimir Sementsov-Ogievskiy	f4dad307ef	block/io: bdrv_check_byte_request(): drop bdrv_is_inserted() Move bdrv_is_inserted() calls into callers. We are going to make bdrv_check_byte_request() a clean thing. bdrv_is_inserted() is not about checking the request, it's about checking the bs. So, it should be separate. With this patch we probably change error path for some failure scenarios. But depending on the fact that querying too big request on empty cdrom (or corrupted qcow2 node with no drv) will result in EIO and not ENOMEDIUM would be very strange. More over, we are going to move to 64bit requests, so larger requests will be allowed anyway. More over, keeping in mind that cdrom is the only driver that has .bdrv_is_inserted() handler it's strange that we should care so much about it in generic block layer, intuitively we should just do read and write, and cdrom driver should return correct errors if it is not inserted. But it's a work for another series. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201203222713.13507-4-vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-12-11 17:52:40 +01:00
Vladimir Sementsov-Ogievskiy	33985614bd	block/io: bdrv_refresh_limits(): use ERRP_GUARD This simplifies following commit. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201203222713.13507-3-vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-12-11 17:52:40 +01:00
Vladimir Sementsov-Ogievskiy	9b100af30f	block/file-posix: fix workaround in raw_do_pwrite_zeroes() We should not set overlap_bytes: 1. Don't worry: it is calculated by bdrv_mark_request_serialising() and will be equal to or greater than bytes anyway. 2. If the request was already aligned up to some greater alignment, than we may break things: we reduce overlap_bytes, and further bdrv_mark_request_serialising() may not help, as it will not restore old bigger alignment. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201203222713.13507-2-vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-12-11 17:52:40 +01:00
Li Feng	eb43ea16dc	file-posix: check the use_lock before setting the file lock The scenario is that when accessing a volume on an NFS filesystem without supporting the file lock, Qemu will complain "Failed to lock byte 100", even when setting the file.locking = off. We should do file lock related operations only when the file.locking is enabled, otherwise, the syscall of 'fcntl' will return non-zero. Signed-off-by: Li Feng <fengli@smartx.com> Message-Id: <1607341446-85506-1-git-send-email-fengli@smartx.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-12-11 17:52:40 +01:00
Max Reitz	df4ea7091b	fuse: Implement hole detection through lseek This is a relatively new feature in libfuse (available since 3.8.0, which was released in November 2019), so we have to add a dedicated check whether it is available before making use of it. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20201027190600.192171-7-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-12-11 17:52:40 +01:00
Max Reitz	4ca37a96a7	fuse: (Partially) implement fallocate() This allows allocating areas after the (old) EOF as part of a growing resize, writing zeroes, and discarding. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20201027190600.192171-6-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-12-11 17:52:40 +01:00
Max Reitz	4fba06d594	fuse: Allow growable exports These will behave more like normal files in that writes beyond the EOF will automatically grow the export size. As an optimization, keep the RESIZE permission for growable exports so we do not have to take it for every post-EOF write. (This permission is not released when the export is destroyed, because at that point the BlockBackend is destroyed altogether anyway.) Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20201027190600.192171-5-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-12-11 17:52:40 +01:00
Max Reitz	41429e3d79	fuse: Implement standard FUSE operations This makes the export actually useful instead of only producing errors whenever it is accessed. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20201027190600.192171-4-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-12-11 17:52:39 +01:00
Max Reitz	0c9b70d590	fuse: Allow exporting BDSs via FUSE block-export-add type=fuse allows mounting block graph nodes via FUSE on some existing regular file. That file should then appears like a raw disk image, and accesses to it result in accesses to the exported BDS. Right now, we only implement the necessary block export functions to set it up and shut it down. We do not implement any access functions, so accessing the mount point only results in errors. This will be addressed by a followup patch. We keep a hash table of exported mount points, because we want to be able to detect when users try to use a mount point twice. This is because we invoke stat() to check whether the given mount point is a regular file, but if that file is served by ourselves (because it is already used as a mount point), then this stat() would have to be served by ourselves, too, which is impossible to do while we (as the caller) are waiting for it to settle. Therefore, keep track of mount point paths to at least catch the most obvious instances of that problem. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20201027190600.192171-3-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-12-11 17:52:39 +01:00
Gan Qixin	c208b0ef96	block/iscsi: Use lock guard macros Replace manual lock()/unlock() calls with lock guard macros (QEMU_LOCK_GUARD/WITH_QEMU_LOCK_GUARD) in block/iscsi. Signed-off-by: Gan Qixin <ganqixin@huawei.com> Message-Id: <20201203075055.127773-5-ganqixin@huawei.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-12-11 17:52:39 +01:00
Gan Qixin	3af613ebdb	block/throttle-groups: Use lock guard macros Replace manual lock()/unlock() calls with lock guard macros (QEMU_LOCK_GUARD/WITH_QEMU_LOCK_GUARD) in block/throttle-groups. Signed-off-by: Gan Qixin <ganqixin@huawei.com> Message-Id: <20201203075055.127773-4-ganqixin@huawei.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-12-11 17:52:39 +01:00
Gan Qixin	f5056b70e6	block/curl: Use lock guard macros Replace manual lock()/unlock() calls with lock guard macros (QEMU_LOCK_GUARD/WITH_QEMU_LOCK_GUARD) in block/curl. Signed-off-by: Gan Qixin <ganqixin@huawei.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20201203075055.127773-3-ganqixin@huawei.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-12-11 17:52:39 +01:00
Gan Qixin	c37c973660	block/accounting: Use lock guard macros Replace manual lock()/unlock() calls with lock guard macros (QEMU_LOCK_GUARD/WITH_QEMU_LOCK_GUARD) in block/accounting. Signed-off-by: Gan Qixin <ganqixin@huawei.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20201203075055.127773-2-ganqixin@huawei.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-12-11 17:52:39 +01:00
Markus Armbruster	6cc0667d9b	Tweak a few "Parameter 'NAME' expects THING" error message Change to "expects a THING" where that's an obvious improvement Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20201113082626.2725812-11-armbru@redhat.com>	2020-12-10 17:16:44 +01:00
Stefan Hajnoczi	552c2c4c10	block/export: avoid g_return_val_if() input validation Do not validate input with g_return_val_if(). This API is intended for checking programming errors and is compiled out with -DG_DISABLE_CHECKS. Use an explicit if statement for input validation so it cannot accidentally be compiled out. Suggested-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20201118091644.199527-5-stefanha@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2020-12-09 13:04:17 -05:00
Marc-André Lureau	0df750e9d3	libvhost-user: make it a meson subproject By making libvhost-user a subproject, check it builds standalone (without the global QEMU cflags etc). Note that the library still relies on QEMU include/qemu/atomic.h and linux_headers/. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20201125100640.366523-6-marcandre.lureau@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2020-12-08 13:48:58 -05:00
Maxim Levitsky	c8bf9a9169	qcow2: Fix corruption on write_zeroes with MAY_UNMAP Commit `205fa50750` ("qcow2: Add subcluster support to zero_in_l2_slice()") introduced a subtle change to code in zero_in_l2_slice: It swapped the order of 1. qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice); 2. set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO); 3. qcow2_free_any_clusters(bs, old_offset, 1, QCOW2_DISCARD_REQUEST); To 1. qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice); 2. qcow2_free_any_clusters(bs, old_offset, 1, QCOW2_DISCARD_REQUEST); 3. set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO); It seems harmless, however the call to qcow2_free_any_clusters can trigger a cache flush which can mark the L2 table as clean, and assuming that this was the last write to it, a stale version of it will remain on the disk. Now we have a valid L2 entry pointing to a freed cluster. Oops. Fixes: `205fa50750` ("qcow2: Add subcluster support to zero_in_l2_slice()") Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> [ kwolf: Fixed to restore the correct original order from before 205fa50750; added comments like in discard_in_l2_slice(). ] Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20201124092815.39056-1-kwolf@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-11-24 11:29:41 +01:00
Peter Maydell	683685e72d	Pull request for 5.2 NVMe fixes to solve IOMMU issues on non-x86 and error message/tracing improvements. Elena Afanasova's ioeventfd fixes are also included. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEhpWov9P5fNqsNXdanKSrs4Grc8gFAl+ixjgACgkQnKSrs4Gr c8iZYgf+OB2eAGsdZO97fKh6VUUoRKa+BgWKuh37Cfpp3q+dLuIFMSKfU/UgprLc aowt6uTFfwudDV9KltUB2EiXIzpuf7JhMNOiDRkyEvYSj4KHRPsQmFCd35Nrjezy VvxSGafe2Z60Qnvcx+iGeMATSFX9YTcTZeHttC07v7dWn/yEK3b1hobcmjCcwWeR Ud8pjMyh5E2z/NpW8E669/byJf9iahx3LSQxSWt+9PVTPuftAB0Suu+m6svz1wvk sjVfIbtVWCp2BdGf5U6a2rEqF3+kIcFkfHp+MwgE0EdMz1wfjudaPl13a0C4DSun PSt9E+Ct5BTrDUvqCHvQDOaFiMZTPg== =Poyb -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/stefanha-gitlab/tags/block-pull-request' into staging Pull request for 5.2 NVMe fixes to solve IOMMU issues on non-x86 and error message/tracing improvements. Elena Afanasova's ioeventfd fixes are also included. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> # gpg: Signature made Wed 04 Nov 2020 15:18:16 GMT # gpg: using RSA key 8695A8BFD3F97CDAAC35775A9CA4ABB381AB73C8 # gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>" [full] # gpg: aka "Stefan Hajnoczi <stefanha@gmail.com>" [full] # Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35 775A 9CA4 ABB3 81AB 73C8 * remotes/stefanha-gitlab/tags/block-pull-request: (33 commits) util/vfio-helpers: Assert offset is aligned to page size util/vfio-helpers: Convert vfio_dump_mapping to trace events util/vfio-helpers: Improve DMA trace events util/vfio-helpers: Trace where BARs are mapped util/vfio-helpers: Trace PCI BAR region info util/vfio-helpers: Trace PCI I/O config accesses util/vfio-helpers: Improve reporting unsupported IOMMU type block/nvme: Fix nvme_submit_command() on big-endian host block/nvme: Fix use of write-only doorbells page on Aarch64 arch block/nvme: Align iov's va and size on host page size block/nvme: Change size and alignment of prp_list_pages block/nvme: Change size and alignment of queue block/nvme: Change size and alignment of IDENTIFY response buffer block/nvme: Correct minimum device page size block/nvme: Set request_alignment at initialization block/nvme: Simplify nvme_cmd_sync() block/nvme: Simplify ADMIN queue access block/nvme: Correctly initialize Admin Queue Attributes block/nvme: Use definitions instead of magic values in add_io_queue() block/nvme: Introduce Completion Queue definitions ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-11-23 13:03:13 +00:00
Peter Maydell	c8e5c4b246	Patches for 5.2.0-rc2: - quorum: Fix crash with rewrite-corrupted and without read-write user - io_uring: do not use pointer after free - file-posix: Use fallback path for -EBUSY from FALLOC_FL_PUNCH_HOLE - iotests: Fix failure on Python 3.9 due to use of a deprecated function - char-stdio: Fix QMP default for 'signal' -----BEGIN PGP SIGNATURE----- iQJFBAABCAAvFiEE3D3rFZqa+V09dFb+fwmycsiPL9YFAl+zt1URHGt3b2xmQHJl ZGhhdC5jb20ACgkQfwmycsiPL9ZxyxAAu8GIOeAb7atQvc+KpeBTUG4A+tfAXkC+ iUYdIpFeWWgmGf7myu3nlaAkeTDk6qHalmzkGRHi3yhX4eNIh5Sdff1YwPcZwf+q GLIqFFTW0z1Bd36N8G7Mkf04nKX4QTHqp6THHtSt9jNs56h5OP3axPXVA/3v9y8B 4ZAkOOvwnwO+U94crhy5y5pX/Vwafv/Dz4DH9hEupE+EI9AuzjZLBrS+sgkxjhmu gvHpDSqm6NXwWQA5a24J6NzCy3n/Fw/rqmnoOrN8eRz+4DSCMVDnTDDEMFLa/UoK Ci7AqWfG/MnQ4GrGsOx80KJhAFLTmI60vfnUizKtEjL/HJyK5PDyM+VxHz+P/Tkq 4hqQsHEsll4mAQiKCrrKOOXhn+YC4DhY/5O1EzEfhqfUjI+BFE9iC7LuqQevwKPL gytup7eoZjIHMtnKwY1B2ApAqHtodswjHkefcjEcvSlhqGi/BvwuWmeYlFXmA3r0 YO8fvbYJrwHwJy7CzMb5Rgs2461QGERmXoCsBxLAiqXU9rhpOZ6gKXIjjlYojZ8M W0kqbaccTRPuhooFdEQ9RTPSkX7AX2bI0nOoPxfz3YD/siw35YwnUkJqvQbckvJd vpPkCL5jt3d9sfO0z1xjSH2ey9bevSReYpCsk+kIZl7V2XoDAW0Nbi0Td3pW4j6x dEkg/+sjF+o= =0pFF -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging Patches for 5.2.0-rc2: - quorum: Fix crash with rewrite-corrupted and without read-write user - io_uring: do not use pointer after free - file-posix: Use fallback path for -EBUSY from FALLOC_FL_PUNCH_HOLE - iotests: Fix failure on Python 3.9 due to use of a deprecated function - char-stdio: Fix QMP default for 'signal' # gpg: Signature made Tue 17 Nov 2020 11:43:17 GMT # gpg: using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6 # gpg: issuer "kwolf@redhat.com" # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full] # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * remotes/kevin/tags/for-upstream: iotests/081: Test rewrite-corrupted without WRITE iotests/081: Filter image format after testdir quorum: Require WRITE perm with rewrite-corrupted io_uring: do not use pointer after free file-posix: allow -EBUSY errors during write zeros on raw block devices iotests: Replace deprecated ConfigParser.readfp() char-stdio: Fix QMP default for 'signal' Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-11-17 15:58:51 +00:00
Peter Maydell	1c7ab0930a	pc,vhost: fixes Fixes all over the place. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> -----BEGIN PGP SIGNATURE----- iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAl+zlRgPHG1zdEByZWRo YXQuY29tAAoJECgfDbjSjVRpmf8H/0BEjxnINJCN12Te+Mot8K9fjwc0zE0SUuYY 25LogfJMCfVy0SZk0ZQV9z33GEL5XyMlXQjEpLmlX4d3mOBLcbutI6UVLhu8+Ixj 89+jFphxIQPDOpA7BnPOD4AJ6TlhbewZ41QBR/J/qv946HayFW9QCAUywuj6H80m T3lw0FmPkd6/YupUdUm0pPgJjowckGis+cAa9UkTlqp8jpzFur28N02fE0L6QO3Z lR6zsk4yEvsVoeXSkEkmSqZGNcwoQCf4BhmDuD7lBLZ0LBvmd37CCoakStpdnQPH Swunmf7Q1H6LRtF7s8ZKXBB/ecVnss3kFTFj5KWx3fJH2SJuHG8= =v205 -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging pc,vhost: fixes Fixes all over the place. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> # gpg: Signature made Tue 17 Nov 2020 09:17:12 GMT # gpg: using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469 # gpg: issuer "mst@redhat.com" # gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full] # gpg: aka "Michael S. Tsirkin <mst@redhat.com>" [full] # Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67 # Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469 * remotes/mst/tags/for_upstream: vhost-user-blk/scsi: Fix broken error handling for socket call contrib/libvhost-user: Fix bad printf format specifiers hw/i386/acpi-build: Fix maybe-uninitialized error when ACPI hotplug off configure: mark vhost-user Linux-only vhost-user-blk-server: depend on CONFIG_VHOST_USER meson: move vhost_user_blk_server to meson.build vhost-user: fix VHOST_USER_ADD/REM_MEM_REG truncation Signed-off-by: Peter Maydell <peter.maydell@linaro.org> # Conflicts: # meson.build	2020-11-17 11:50:11 +00:00
Max Reitz	9ca5b0e842	quorum: Require WRITE perm with rewrite-corrupted Using rewrite-corrupted means quorum may issue writes to its children just from receiving read requests from its parents. Thus, it must take the WRITE permission when rewrite-corrupted is used. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20201113211718.261671-2-mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-11-17 12:38:28 +01:00
Paolo Bonzini	bd89f93603	io_uring: do not use pointer after free Even though only the pointer value is only printed, it is untidy and Coverity complains. Cc: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20201113154102.1460459-1-pbonzini@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-11-17 12:26:48 +01:00
Maxim Levitsky	ece4fa9152	file-posix: allow -EBUSY errors during write zeros on raw block devices On Linux, fallocate(fd, FALLOC_FL_PUNCH_HOLE) when it is used on a block device, without O_DIRECT can return -EBUSY if it races with another write to the same page. Since this is rare and discard is not a critical operation, ignore this error Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20201111153913.41840-2-mlevitsk@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-11-17 12:26:48 +01:00
Chetan Pant	61f3c91a67	nomaintainer: Fix Lesser GPL version number There is no "version 2" of the "Lesser" General Public License. It is either "GPL version 2.0" or "Lesser GPL version 2.1". This patch replaces all occurrences of "Lesser GPL version 2" with "Lesser GPL version 2.1" in comment section. This patch contains all the files, whose maintainer I could not get from ‘get_maintainer.pl’ script. Signed-off-by: Chetan Pant <chetan4windows@gmail.com> Message-Id: <20201023124424.20177-1-chetan4windows@gmail.com> Reviewed-by: Thomas Huth <thuth@redhat.com> [thuth: Adapted exec.c and qdev-monitor.c to new location] Signed-off-by: Thomas Huth <thuth@redhat.com>	2020-11-15 17:04:40 +01:00
Stefan Hajnoczi	e5e856c1eb	meson: move vhost_user_blk_server to meson.build The --enable/disable-vhost-user-blk-server options were implemented in ./configure. There has been confusion about them and part of the problem is that the shell syntax used for setting the default value is not easy to read. Move the option over to meson where the conditions are easier to understand: have_vhost_user_blk_server = (targetos == 'linux') if get_option('vhost_user_blk_server').enabled() if targetos != 'linux' error('vhost_user_blk_server requires linux') endif elif get_option('vhost_user_blk_server').disabled() or not have_system have_vhost_user_blk_server = false endif This patch does not change behavior. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20201110171121.1265142-2-stefanha@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>	2020-11-12 09:19:40 -05:00
shiliyang	5f14f31d2b	block: Fix some code style problems, "foo* bar" should be "foo bar" There have some code style problems be found when read the block driver code. So I fixes some problems of this error, ERROR: "foo bar" should be "foo *bar". Signed-off-by: Liyang Shi <shiliyang@huawei.com> Reported-by: Euler Robot <euler.robot@huawei.com> Message-Id: <3211f389-6d22-46c1-4a16-e6a2ba66f070@huawei.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-11-09 18:42:47 +01:00
Yonggang Luo	c63b0201ae	block: Fixes nfs compiling error on msys2/mingw These compiling errors are fixed: ../block/nfs.c:27:10: fatal error: poll.h: No such file or directory 27 \| #include <poll.h> \| ^~~~~~~~ compilation terminated. ../block/nfs.c:63:5: error: unknown type name 'blkcnt_t' 63 \| blkcnt_t st_blocks; \| ^~~~~~~~ ../block/nfs.c: In function 'nfs_client_open': ../block/nfs.c:550:27: error: 'struct _stat64' has no member named 'st_blocks' 550 \| client->st_blocks = st.st_blocks; \| ^ ../block/nfs.c: In function 'nfs_get_allocated_file_size': ../block/nfs.c:751:41: error: 'struct _stat64' has no member named 'st_blocks' 751 \| return (task.ret < 0 ? task.ret : st.st_blocks * 512); \| ^ ../block/nfs.c: In function 'nfs_reopen_prepare': ../block/nfs.c:805:31: error: 'struct _stat64' has no member named 'st_blocks' 805 \| client->st_blocks = st.st_blocks; \| ^ ../block/nfs.c: In function 'nfs_get_allocated_file_size': ../block/nfs.c:752:1: error: control reaches end of non-void function [-Werror=return-type] 752 \| } \| ^ On msys2/mingw, there is no st_blocks in struct _stat64 yet, we disable the usage of it on msys2/mingw, and create a typedef long long blkcnt_t; for further implementation Signed-off-by: Yonggang Luo <luoyonggang@gmail.com> Message-Id: <20201105123116.674-2-luoyonggang@gmail.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-11-09 15:44:21 +01:00
Alberto Garcia	3441ad4bc4	qcow2: Document and enforce the QCowL2Meta invariants The QCowL2Meta structure is used to store information about a part of a write request that touches clusters that need changes in their L2 entries. This happens with newly-allocated clusters or subclusters. This structure has changed a bit since it was first created and its current documentation is not quite up-to-date. A write request can span a region consisting of a combination of clusters of different types, and qcow2_alloc_host_offset() can repeatedly call handle_copied() and handle_alloc() to add more clusters to the mix as long as they all are contiguous on the image file. Because of this a write request has a list of QCowL2Meta structures, one for each part of the request that needs changes in the L2 metadata. Each one of them spans nb_clusters and has two copy-on-write regions located immediately before and after the middle region touched by that part of the write request. Even when those regions themselves are empty their offsets must be correct because they are used to know the location of the middle region. This was not always the case but it is not a problem anymore because the only two places where QCowL2Meta structures are created (calculate_l2_meta() and qcow2_co_truncate()) ensure that the copy-on-write regions are correctly defined, and so do assertions like the ones in perform_cow(). The conditional initialization of the 'written_to' variable is therefore unnecessary and is removed by this patch. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201007161323.4667-1-berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-11-09 15:44:21 +01:00
AlexChen	3d86af858e	block: Remove unused include The "qemu-common.h" include is not used, remove it. Reported-by: Euler Robot <euler.robot@huawei.com> Signed-off-by: AlexChen <alex.chen@huawei.com> Message-Id: <5F8FFB94.3030209@huawei.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-11-09 15:44:21 +01:00
Peter Maydell	85c3ed4417	pc,pci,vhost,virtio: fixes Lots of fixes all over the place. virtio-mem and virtio-iommu patches are kind of fixes but it seems better to just make them behave sanely than try to educate users about the limitations ... Signed-off-by: Michael S. Tsirkin <mst@redhat.com> -----BEGIN PGP SIGNATURE----- iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAl+i9YMPHG1zdEByZWRo YXQuY29tAAoJECgfDbjSjVRpySQH/Ru/sxB9PncR1HsqSf0HC0tt/EMKgyZTXEwQ FITcjkCvBDS98a1VUvvZbjzTEDEZNnoUv94MjdLeBoptJ7GtK6nPoI6Ke0p1Zqbe mlY2BCb0FpN8FE+mthjAI03mhw6o8Qo/OPtyISQzUxCVVqUHL5TRAVAQdeidoK8n RBQ4WogwM/h7wI0d9GGgSxAON8IRQnBYImtzJieBb6zeScwKVFTWI1tqBdOyFN0/ AhzQiNZuhZ7a1XGJIsxmWB1NK2kcXNJuOF0ANh4coIHR0JzmH3xRy+Jnf5e3dYsw LI23DUZPSTJJXAwKPucyTG7RTX8F55N9DVHC9KDRD6Ntq1oreJ4= =pcbN -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging pc,pci,vhost,virtio: fixes Lots of fixes all over the place. virtio-mem and virtio-iommu patches are kind of fixes but it seems better to just make them behave sanely than try to educate users about the limitations ... Signed-off-by: Michael S. Tsirkin <mst@redhat.com> # gpg: Signature made Wed 04 Nov 2020 18:40:03 GMT # gpg: using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469 # gpg: issuer "mst@redhat.com" # gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full] # gpg: aka "Michael S. Tsirkin <mst@redhat.com>" [full] # Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67 # Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469 * remotes/mst/tags/for_upstream: (31 commits) contrib/vhost-user-blk: fix get_config() information leak block/export: fix vhost-user-blk get_config() information leak block/export: make vhost-user-blk config space little-endian configure: introduce --enable-vhost-user-blk-server libvhost-user: follow QEMU comment style vhost-blk: set features before setting inflight feature Revert "vhost-blk: set features before setting inflight feature" net: Add vhost-vdpa in show_netdevs() vhost-vdpa: Add qemu_close in vhost_vdpa_cleanup vfio: Don't issue full 2^64 unmap virtio-iommu: Set supported page size mask vfio: Set IOMMU page size as per host supported page size memory: Add interface to set iommu page size mask virtio-iommu: Add notify_flag_changed() memory region callback virtio-iommu: Add replay() memory region callback virtio-iommu: Call memory notifiers in attach/detach virtio-iommu: Add memory notifiers for map/unmap virtio-iommu: Store memory region in endpoint struct virtio-iommu: Fix virtio_iommu_mr() hw/smbios: Fix leaked fd in save_opt_one() error path ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-11-05 15:16:43 +00:00
Stefan Hajnoczi	f8ffcb2bda	block/export: fix vhost-user-blk get_config() information leak Refuse get_config() requests in excess of sizeof(struct virtio_blk_config). Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20201027173528.213464-5-stefanha@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2020-11-03 16:39:05 -05:00
Stefan Hajnoczi	11f60f7eae	block/export: make vhost-user-blk config space little-endian VIRTIO 1.0 devices have little-endian configuration space. The vhost-user-blk-server.c code already uses little-endian for virtqueue processing but not for the configuration space fields. Fix this so the vhost-user-blk export works on big-endian hosts. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20201027173528.213464-4-stefanha@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2020-11-03 16:39:05 -05:00
Stefan Hajnoczi	bc15e44cb2	configure: introduce --enable-vhost-user-blk-server Make it possible to compile out the vhost-user-blk server. It is enabled by default on Linux. Note that vhost-user-server.c depends on libvhost-user, which requires CONFIG_LINUX. The CONFIG_VHOST_USER dependency was erroneous since that option controls vhost-user frontends (previously known as "master") and not device backends (previously known as "slave"). Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20201027173528.213464-3-stefanha@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2020-11-03 16:39:05 -05:00
Philippe Mathieu-Daudé	a0546a7b6f	block/nvme: Fix nvme_submit_command() on big-endian host The Completion Queue Command Identifier is a 16-bit value, so nvme_submit_command() is unlikely to work on big-endian hosts, as the relevant bits are truncated. Fix by using the correct byte-swap function. Fixes: `bdd6a90a9e` ("block: Add VFIO based NVMe driver") Reported-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20201029093306.1063879-25-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>	2020-11-03 19:06:22 +00:00
Philippe Mathieu-Daudé	4b19e9b815	block/nvme: Fix use of write-only doorbells page on Aarch64 arch qemu_vfio_pci_map_bar() calls mmap(), and mmap(2) states: 'offset' must be a multiple of the page size as returned by sysconf(_SC_PAGE_SIZE). In commit `f68453237b` we started to use an offset of 4K which broke this contract on Aarch64 arch. Fix by mapping at offset 0, and and accessing doorbells at offset=4K. Fixes: `f68453237b` ("block/nvme: Map doorbells pages write-only") Reported-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20201029093306.1063879-24-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>	2020-11-03 19:06:22 +00:00
Eric Auger	9e13d59884	block/nvme: Align iov's va and size on host page size Make sure iov's va and size are properly aligned on the host page size. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20201029093306.1063879-23-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>	2020-11-03 19:06:22 +00:00
Eric Auger	f8fd3ebac3	block/nvme: Change size and alignment of prp_list_pages In preparation of 64kB host page support, let's change the size and alignment of the prp_list_pages so that the VFIO DMA MAP succeeds with 64kB host page size. We align on the host page size. Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20201029093306.1063879-22-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>	2020-11-03 19:06:22 +00:00
Eric Auger	2387aaced7	block/nvme: Change size and alignment of queue In preparation of 64kB host page support, let's change the size and alignment of the queue so that the VFIO DMA MAP succeeds. We align on the host page size. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20201029093306.1063879-21-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>	2020-11-03 19:06:22 +00:00
Eric Auger	0aecd06049	block/nvme: Change size and alignment of IDENTIFY response buffer In preparation of 64kB host page support, let's change the size and alignment of the IDENTIFY command response buffer so that the VFIO DMA MAP succeeds. We align on the host page size. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20201029093306.1063879-20-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>	2020-11-03 19:06:22 +00:00
Philippe Mathieu-Daudé	a652a3ec69	block/nvme: Correct minimum device page size While trying to simplify the code using a macro, we forgot the 12-bit shift... Correct that. Fixes: `fad1eb6886` ("block/nvme: Use register definitions from 'block/nvme.h'") Reported-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20201029093306.1063879-19-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>	2020-11-03 19:06:22 +00:00
Philippe Mathieu-Daudé	c8228ac355	block/nvme: Set request_alignment at initialization Commit `bdd6a90a9e` ("block: Add VFIO based NVMe driver") sets the request_alignment in nvme_refresh_limits(). For consistency, also set it during initialization. Reported-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20201029093306.1063879-18-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>	2020-11-03 19:06:21 +00:00
Philippe Mathieu-Daudé	08d5406798	block/nvme: Simplify nvme_cmd_sync() As all commands use the ADMIN queue, it is pointless to pass it as argument each time. Remove the argument, and rename the function as nvme_admin_cmd_sync() to make this new behavior clearer. Reviewed-by: Eric Auger <eric.auger@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20201029093306.1063879-17-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>	2020-11-03 19:06:21 +00:00
Philippe Mathieu-Daudé	52b75ea8ec	block/nvme: Simplify ADMIN queue access We don't need to dereference from BDRVNVMeState each time. Use a NVMeQueuePair pointer on the admin queue. The nvme_init() becomes easier to review, matching the style of nvme_add_io_queue(). Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20201029093306.1063879-16-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>	2020-11-03 19:06:21 +00:00
Philippe Mathieu-Daudé	3c363c073e	block/nvme: Correctly initialize Admin Queue Attributes From the specification chapter 3.1.8 "AQA - Admin Queue Attributes" the Admin Submission Queue Size field is a 0’s based value: Admin Submission Queue Size (ASQS): Defines the size of the Admin Submission Queue in entries. Enabling a controller while this field is cleared to 00h produces undefined results. The minimum size of the Admin Submission Queue is two entries. The maximum size of the Admin Submission Queue is 4096 entries. This is a 0’s based value. This bug has never been hit because the device initialization uses a single command synchronously :) Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20201029093306.1063879-15-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>	2020-11-03 19:06:21 +00:00
Philippe Mathieu-Daudé	76a24781cc	block/nvme: Use definitions instead of magic values in add_io_queue() Replace magic values by definitions, and simplifiy since the number of queues will never reach 64K. Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20201029093306.1063879-14-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>	2020-11-03 19:06:21 +00:00
Philippe Mathieu-Daudé	dfa9c6c656	block/nvme: Make nvme_init_queue() return boolean indicating error Just for consistency, following the example documented since commit `e3fe3988d7` ("error: Document Error API usage rules"), return a boolean value indicating an error is set or not. Directly pass errp as the local_err is not requested in our case. This simplifies a bit nvme_create_queue_pair(). Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20201029093306.1063879-12-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>	2020-11-03 19:06:21 +00:00
Philippe Mathieu-Daudé	7a5f00dde3	block/nvme: Make nvme_identify() return boolean indicating error Just for consistency, following the example documented since commit `e3fe3988d7` ("error: Document Error API usage rules"), return a boolean value indicating an error is set or not. Directly pass errp as the local_err is not requested in our case. Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20201029093306.1063879-11-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>	2020-11-03 19:06:21 +00:00
Philippe Mathieu-Daudé	1b539bd6db	block/nvme: Use unsigned integer for queue counter/size We can not have negative queue count/size/index, use unsigned type. Rename 'nr_queues' as 'queue_count' to match the spec naming. Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20201029093306.1063879-10-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>	2020-11-03 19:06:21 +00:00
Philippe Mathieu-Daudé	3214b0f094	block/nvme: Move definitions before structure declarations To be able to use some definitions in structure declarations, move them earlier. No logical change. Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20201029093306.1063879-9-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>	2020-11-03 19:06:21 +00:00
Philippe Mathieu-Daudé	6e1e9ff2d3	block/nvme: Trace queue pair creation/deletion Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20201029093306.1063879-8-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>	2020-11-03 19:06:20 +00:00
Philippe Mathieu-Daudé	51e98b6d21	block/nvme: Improve nvme_free_req_queue_wait() trace information What we want to trace is the block driver state and the queue index. Suggested-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20201029093306.1063879-7-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>	2020-11-03 19:06:20 +00:00
Philippe Mathieu-Daudé	1c914cd120	block/nvme: Trace nvme_poll_queue() per queue As we want to enable multiple queues, report the event in each nvme_poll_queue() call, rather than once in the callback calling nvme_poll_queues(). Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20201029093306.1063879-6-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>	2020-11-03 19:06:20 +00:00
Philippe Mathieu-Daudé	15b2260bef	block/nvme: Trace controller capabilities Controllers have different capabilities and report them in the CAP register. We are particularly interested by the page size limits. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20201029093306.1063879-5-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>	2020-11-03 19:06:20 +00:00
Philippe Mathieu-Daudé	58ad6ae0cb	block/nvme: Report warning with warn_report() Instead of displaying warning on stderr, use warn_report() which also displays it on the monitor. Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20201029093306.1063879-4-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>	2020-11-03 19:06:20 +00:00
Philippe Mathieu-Daudé	8526e39e99	block/nvme: Use hex format to display offset in trace events Use the same format used for the hw/vfio/ trace events. Suggested-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20201029093306.1063879-3-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>	2020-11-03 19:06:20 +00:00
AlexChen	c9eb2f3e38	block/vvfat: Fix bad printf format specifiers We should use printf format specifier "%u" instead of "%d" for argument of type "unsigned int". In addition, fix two error format problems found by checkpatch.pl: ERROR: space required after that ',' (ctx:VxV) + fprintf(stderr,"%s attributes=0x%02x begin=%u size=%d\n", ^ ERROR: line over 90 characters + fprintf(stderr, "%d, %s (%u, %d)\n", i, commit->path ? commit->path : "(null)", commit->param.rename.cluster, commit->action); Reported-by: Euler Robot <euler.robot@huawei.com> Signed-off-by: Alex Chen <alex.chen@huawei.com> Message-Id: <5FA12620.6030705@huawei.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-11-03 16:24:56 +01:00
Eric Blake	dbc7b01492	nbd: Add 'qemu-nbd -A' to expose allocation depth Allow the server to expose an additional metacontext to be requested by savvy clients. qemu-nbd adds a new option -A to expose the qemu:allocation-depth metacontext through NBD_CMD_BLOCK_STATUS; this can also be set via QMP when using block-export-add. qemu as client is hacked into viewing the key aspects of this new context by abusing the already-experimental x-dirty-bitmap option to collapse all depths greater than 2, which results in a tri-state value visible in the output of 'qemu-img map --output=json' (yes, that means x-dirty-bitmap is now a bit of a misnomer, but I didn't feel like renaming it as it would introduce a needless break of back-compat, even though we make no compat guarantees with x- members): unallocated (depth 0) => "zero":false, "data":true local (depth 1) => "zero":false, "data":false backing (depth 2+) => "zero":true, "data":true libnbd as client is probably a nicer way to get at the information without having to decipher such hacks in qemu as client. ;) Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20201027050556.269064-11-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2020-10-30 15:22:00 -05:00
Eric Blake	a92b1b065e	block: Return depth level during bdrv_is_allocated_above When checking for allocation across a chain, it's already easy to count the depth within the chain at which the allocation is found. Instead of throwing that information away, return it to the caller. Existing callers only cared about allocated/non-allocated, but having a depth available will be used by NBD in the next patch. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20201027050556.269064-9-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> [eblake: rebase to master] Signed-off-by: Eric Blake <eblake@redhat.com>	2020-10-30 15:21:23 -05:00
Greg Kurz	1a6d3bd229	block: End quiescent sections when a BDS is deleted If a BDS gets deleted during blk_drain_all(), it might miss a call to bdrv_do_drained_end(). This means missing a call to aio_enable_external() and the AIO context remains disabled for ever. This can cause a device to become irresponsive and to disrupt the guest execution, ie. hang, loop forever or worse. This scenario is quite easy to encounter with virtio-scsi on POWER when punching multiple blockdev-create QMP commands while the guest is booting and it is still running the SLOF firmware. This happens because SLOF disables/re-enables PCI devices multiple times via IO/MEM/MASTER bits of PCI_COMMAND register after the initial probe/feature negotiation, as it tends to work with a single device at a time at various stages like probing and running block/network bootloaders without doing a full reset in-between. This naturally generates many dataplane stops and starts, and thus many drain sections that can race with blockdev_create_run(). In the end, SLOF bails out. It is somehow reproducible on x86 but it requires to generate articial dataplane start/stop activity with stop/cont QMP commands. In this case, seabios ends up looping for ever, waiting for the virtio-scsi device to send a response to a command it never received. Add a helper that pairs all previously called bdrv_do_drained_begin() with a bdrv_do_drained_end() and call it from bdrv_close(). While at it, update the "/bdrv-drain/graph-change/drain_all" test in test-bdrv-drain so that it can catch the issue. BugId: https://bugzilla.redhat.com/show_bug.cgi?id=1874441 Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <160346526998.272601.9045392804399803158.stgit@bahia.lan> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-27 15:26:20 +01:00
Alberto Garcia	46cd1e8a47	qcow2: Skip copy-on-write when allocating a zero cluster Since commit `c8bb23cbdb` when a write request results in a new allocation QEMU first tries to see if the rest of the cluster outside the written area contains only zeroes. In that case, instead of doing a normal copy-on-write operation and writing explicit zero buffers to disk, the code zeroes the whole cluster efficiently using pwrite_zeroes() with BDRV_REQ_NO_FALLBACK. This improves performance very significantly but it only happens when we are writing to an area that was completely unallocated before. Zero clusters (QCOW2_CLUSTER_ZERO_*) are treated like normal clusters and are therefore slower to allocate. This happens because the code uses bdrv_is_allocated_above() rather bdrv_block_status_above(). The former is not as accurate for this purpose but it is faster. However in the case of qcow2 the underlying call does already report zero clusters just fine so there is no reason why we cannot use that information. After testing 4KB writes on an image that only contains zero clusters this patch results in almost five times more IOPS. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <6d77cab968c501c44d6e1089b9bc91b04170b49e.1603731354.git.berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-27 15:26:20 +01:00
Alberto Garcia	d40f4a565a	qcow2: Report BDRV_BLOCK_ZERO more accurately in bdrv_co_block_status() If a BlockDriverState supports backing files but has none then any unallocated area reads back as zeroes. bdrv_co_block_status() is only reporting this is if want_zero is true, but this is an inexpensive test and there is no reason not to do it in all cases. Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <66fa0914a0e2b727ab6d1b63ca773d7cd29a9a9e.1603731354.git.berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-27 15:26:20 +01:00
Peter Maydell	a95e0396c8	* fix --disable-tcg builds (Claudio) * Fixes for macOS --enable-modules build and OpenBSD curses/iconv detection (myself) * Start preparing for meson 0.56 (myself) * Move directory configuration to meson (myself) * Start untangling qemu_init (myself) * Windows fixes (Sunil) * Remove -no-kbm (Thomas) -----BEGIN PGP SIGNATURE----- iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAl+WrxEUHHBib256aW5p QHJlZGhhdC5jb20ACgkQv/vSX3jHroNQAggAqfucqEQvz6s+DCPv2u572diyMvhe Y7vmaQF0qYKoAvy5OLqGlqXVsn8lwf19zJWo9Z7k4qNefWl84ii0J/kEmnolzTGq 7Z0CRSnGbNQy9YedYXuymaR3E0VY+6lsPnzIpufQISzQRdjzT8OQ51DMAhc04oQl saXsts7y+om+tzvW2JFGtNsfFRUjcRKqjIAVfwneBXFW9TRD2epvYxz/S0o+XJwF eSiINvTqDxxPyy6XJykC46xf/TTfReHv6fQgTn7Jw3TQuo4m7qXLi5Vj8W1erZJv t3xhZNabt813T6ztNcAAuJ0srIn55Ac7Fuq3/1ecgeVD08ntmabe4WhKRg== =931x -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/bonzini-gitlab/tags/for-upstream' into staging * fix --disable-tcg builds (Claudio) * Fixes for macOS --enable-modules build and OpenBSD curses/iconv detection (myself) * Start preparing for meson 0.56 (myself) * Move directory configuration to meson (myself) * Start untangling qemu_init (myself) * Windows fixes (Sunil) * Remove -no-kbm (Thomas) # gpg: Signature made Mon 26 Oct 2020 11:12:17 GMT # gpg: using RSA key F13338574B662389866C7682BFFBD25F78C7AE83 # gpg: issuer "pbonzini@redhat.com" # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full] # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [full] # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * remotes/bonzini-gitlab/tags/for-upstream: machine: move SMP initialization from vl.c machine: move UP defaults to class_base_init machine: remove deprecated -machine enforce-config-section option win32: boot broken when bind & data dir are the same WHPX: Fix WHPX build break configure: move install_blobs from configure to meson configure: remove unused variable from config-host.mak configure: move directory options from config-host.mak to meson configure: allow configuring localedir Makefile: separate meson rerun from the rest of the ninja invocation Remove deprecated -no-kvm option replay: do not build if TCG is not available qtest: unbreak non-TCG builds in bios-tables-test hw/core/qdev-clock: add a reference on aliased clocks do not use colons in test names meson: rewrite curses/iconv test build: fix macOS --enable-modules build Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-10-26 15:49:11 +00:00
Vladimir Sementsov-Ogievskiy	7e7e510077	block/io: fix bdrv_is_allocated_above bdrv_is_allocated_above wrongly handles short backing files: it reports after-EOF space as UNALLOCATED which is wrong, as on read the data is generated on the level of short backing file (if all overlays have unallocated areas at that place). Reusing bdrv_common_block_status_above fixes the issue and unifies code path. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20200924194003.22080-5-vsementsov@virtuozzo.com [Fix s/has/have/ as suggested by Eric Blake. Fix s/area/areas/. --Stefan] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-10-23 13:42:16 +01:00
Vladimir Sementsov-Ogievskiy	624f27bbe9	block/io: bdrv_common_block_status_above: support bs == base We are going to reuse bdrv_common_block_status_above in bdrv_is_allocated_above. bdrv_is_allocated_above may be called with include_base == false and still bs == base (for ex. from img_rebase()). So, support this corner case. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20200924194003.22080-4-vsementsov@virtuozzo.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-10-23 13:42:16 +01:00
Vladimir Sementsov-Ogievskiy	3555a43261	block/io: bdrv_common_block_status_above: support include_base In order to reuse bdrv_common_block_status_above in bdrv_is_allocated_above, let's support include_base parameter. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20200924194003.22080-3-vsementsov@virtuozzo.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-10-23 13:42:16 +01:00
Vladimir Sementsov-Ogievskiy	67c095c8b8	block/io: fix bdrv_co_block_status_above bdrv_co_block_status_above has several design problems with handling short backing files: 1. With want_zeros=true, it may return ret with BDRV_BLOCK_ZERO but without BDRV_BLOCK_ALLOCATED flag, when actually short backing file which produces these after-EOF zeros is inside requested backing sequence. 2. With want_zero=false, it may return pnum=0 prior to actual EOF, because of EOF of short backing file. Fix these things, making logic about short backing files clearer. With fixed bdrv_block_status_above we also have to improve is_zero in qcow2 code, otherwise iotest 154 will fail, because with this patch we stop to merge zeros of different types (produced by fully unallocated in the whole backing chain regions vs produced by short backing files). Note also, that this patch leaves for another day the general problem around block-status: misuse of BDRV_BLOCK_ALLOCATED as is-fs-allocated vs go-to-backing. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20200924194003.22080-2-vsementsov@virtuozzo.com [Fix s/comes/come/ as suggested by Eric Blake --Stefan] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-10-23 13:42:16 +01:00
Stefan Hajnoczi	d9b495f9c6	block/export: add vhost-user-blk multi-queue support Allow the number of queues to be configured using --export vhost-user-blk,num-queues=N. This setting should match the QEMU --device vhost-user-blk-pci,num-queues=N setting but QEMU vhost-user-blk.c lowers its own value if the vhost-user-blk backend offers fewer queues than QEMU. The vhost-user-blk-server.c code is already capable of multi-queue. All virtqueue processing runs in the same AioContext. No new locking is needed. Add the num-queues=N option and set the VIRTIO_BLK_F_MQ feature bit. Note that the feature bit only announces the presence of the num_queues configuration space field. It does not promise that there is more than 1 virtqueue, so we can set it unconditionally. I tested multi-queue by running a random read fio test with numjobs=4 on an -smp 4 guest. After the benchmark finished the guest /proc/interrupts file showed activity on all 4 virtio-blk MSI-X. The /sys/block/vda/mq/ directory shows that Linux blk-mq has 4 queues configured. An automated test is included in the next commit. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Markus Armbruster <armbru@redhat.com> Message-id: 20201001144604.559733-2-stefanha@redhat.com [Fixed accidental tab characters as suggested by Markus Armbruster --Stefan] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-10-23 13:42:16 +01:00
Stefan Hajnoczi	f51d23c80a	block/export: add iothread and fixed-iothread options Make it possible to specify the iothread where the export will run. By default the block node can be moved to other AioContexts later and the export will follow. The fixed-iothread option forces strict behavior that prevents changing AioContext while the export is active. See the QAPI docs for details. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20200929125516.186715-5-stefanha@redhat.com [Fix stray '#' character in block-export.json and add missing "(since: 5.2)" as suggested by Eric Blake. --Stefan] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-10-23 13:42:16 +01:00
Stefan Hajnoczi	cbc20bfb8f	block: move block exports to libblockdev Block exports are used by softmmu, qemu-storage-daemon, and qemu-nbd. They are not used by other programs and are not otherwise needed in libblock. Undo the recent move of blockdev-nbd.c from blockdev_ss into block_ss. Since bdrv_close_all() (libblock) calls blk_exp_close_all() (libblockdev) a stub function is required.. Make qemu-nbd.c use signal handling utility functions instead of duplicating the code. This helps because os-posix.c is in libblockdev and it depends on a qemu_system_killed() symbol that qemu-nbd.c lacks. Once we use the signal handling utility functions we also end up providing the necessary symbol. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20200929125516.186715-4-stefanha@redhat.com [Fixed s/ndb/nbd/ typo in commit description as suggested by Eric Blake --Stefan] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-10-23 13:42:16 +01:00
Stefan Hajnoczi	3a213f83d9	util/vhost-user-server: use static library in meson.build Don't compile contrib/libvhost-user/libvhost-user.c again. Instead build the static library once and then reuse it throughout QEMU. Also switch from CONFIG_LINUX to CONFIG_VHOST_USER, which is what the vhost-user tools (vhost-user-gpu, etc) do. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20200924151549.913737-14-stefanha@redhat.com [Added CONFIG_LINUX again because libvhost-user doesn't build on macOS. --Stefan] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-10-23 13:42:16 +01:00
Stefan Hajnoczi	80a06cc52b	util/vhost-user-server: move header to include/ Headers used by other subsystems are located in include/. Also add the vhost-user-server and vhost-user-blk-server headers to MAINTAINERS. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20200924151549.913737-13-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-10-23 13:42:16 +01:00
Stefan Hajnoczi	90fc91d50b	block/export: convert vhost-user-blk server to block export API Use the new QAPI block exports API instead of defining our own QOM objects. This is a large change because the lifecycle of VuBlockDev needs to follow BlockExportDriver. QOM properties are replaced by QAPI options objects. VuBlockDev is renamed VuBlkExport and contains a BlockExport field. Several fields can be dropped since BlockExport already has equivalents. The file names and meson build integration will be adjusted in a future patch. libvhost-user should probably be built as a static library that is linked into QEMU instead of as a .c file that results in duplicate compilation. The new command-line syntax is: $ qemu-storage-daemon \ --blockdev file,node-name=drive0,filename=test.img \ --export vhost-user-blk,node-name=drive0,id=export0,unix-socket=/tmp/vhost-user-blk.sock Note that unix-socket is optional because we may wish to accept chardevs too in the future. Markus noted that supported address families are not explicit in the QAPI schema. It is unlikely that support for more address families will be added since file descriptor passing is required and few address families support it. If a new address family needs to be added, then the QAPI 'features' syntax can be used to advertize them. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Markus Armbruster <armbru@redhat.com> Message-id: 20200924151549.913737-12-stefanha@redhat.com [Skip test on big-endian host architectures because this device doesn't support them yet (as already mentioned in a code comment). --Stefan] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-10-23 13:42:16 +01:00
Stefan Hajnoczi	0534b1b227	block/export: report flush errors Propagate the flush return value since errors are possible. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20200924151549.913737-11-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-10-23 13:42:16 +01:00
Stefan Hajnoczi	7185c85776	util/vhost-user-server: rework vu_client_trip() coroutine lifecycle The vu_client_trip() coroutine is leaked during AioContext switching. It is also unsafe to destroy the vu_dev in panic_cb() since its callers still access it in some cases. Rework the lifecycle to solve these safety issues. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20200924151549.913737-10-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-10-23 13:42:16 +01:00
Stefan Hajnoczi	47ba680466	util/vhost-user-server: drop unused DevicePanicNotifier The device panic notifier callback is not used. Drop it. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20200924151549.913737-7-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-10-23 13:42:16 +01:00
Stefan Hajnoczi	df6af7ce77	block/export: consolidate request structs into VuBlockReq Only one struct is needed per request. Drop req_data and the separate VuBlockReq instance. Instead let vu_queue_pop() allocate everything at once. This fixes the req_data memory leak in vu_block_virtio_process_req(). Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20200924151549.913737-6-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-10-23 13:42:16 +01:00
Coiby Xu	3578389bcf	block/export: vhost-user block device backend server By making use of libvhost-user, block device drive can be shared to the connected vhost-user client. Only one client can connect to the server one time. Since vhost-user-server needs a block drive to be created first, delay the creation of this object. Suggested-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Coiby Xu <coiby.xu@gmail.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-id: 20200918080912.321299-6-coiby.xu@gmail.com [Shorten "vhost_user_blk_server" string to "vhost_user_blk" to avoid the following compiler warning: ../block/export/vhost-user-blk-server.c:178:50: error: ‘%s’ directive output truncated writing 21 bytes into a region of size 20 [-Werror=format-truncation=] and fix "Invalid size %ld ..." ssize_t format string arguments for 32-bit hosts. --Stefan] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-10-23 13:42:16 +01:00
Philippe Mathieu-Daudé	f25e7ab2b0	block/nvme: Add driver statistics for access alignment and hw errors Keep statistics of some hardware errors, and number of aligned/unaligned I/O accesses. QMP example booting a full RHEL 8.3 aarch64 guest: { "execute": "query-blockstats" } { "return": [ { "device": "", "node-name": "drive0", "stats": { "flush_total_time_ns": 6026948, "wr_highest_offset": 3383991230464, "wr_total_time_ns": 807450995, "failed_wr_operations": 0, "failed_rd_operations": 0, "wr_merged": 3, "wr_bytes": 50133504, "failed_unmap_operations": 0, "failed_flush_operations": 0, "account_invalid": false, "rd_total_time_ns": 1846979900, "flush_operations": 130, "wr_operations": 659, "rd_merged": 1192, "rd_bytes": 218244096, "account_failed": false, "idle_time_ns": 2678641497, "rd_operations": 7406, }, "driver-specific": { "driver": "nvme", "completion-errors": 0, "unaligned-accesses": 2959, "aligned-accesses": 4477 }, "qdev": "/machine/peripheral-anon/device[0]/virtio-backend" } ] } Suggested-by: Stefan Hajnoczi <stefanha@gmail.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Acked-by: Markus Armbruster <armbru@redhat.com> Message-id: 20201001162939.1567915-1-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-10-23 13:42:16 +01:00
Claudio Fontana	9b1c911654	replay: do not build if TCG is not available this fixes non-TCG builds broken recently by replay reverse debugging. Stub the needed functions in stub/, splitting roughly between functions needed only by system emulation, by system emulation and tools, and by everyone. This includes duplicating some code in replay/, and puts the logic for non-replay related events in the replay/ module (+ the stubs), so this should be revisited in the future. Surprisingly, only _one_ qtest was affected by this, ide-test.c, which resulted in a buzz as the bh events were never delivered, and the bh never executed. Many other subsystems _should_ have been affected. This fixes the immediate issue, however a better way to group replay functionality to TCG-only code could be developed in the long term. Signed-off-by: Claudio Fontana <cfontana@suse.de> Message-Id: <20201013192123.22632-4-cfontana@suse.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-10-22 11:53:54 -04:00
Daniel P. Berrangé	e1c4269763	block: deprecate the sheepdog block driver This thread from a little over a year ago: http://lists.wpkg.org/pipermail/sheepdog/2019-March/thread.html states that sheepdog is no longer actively developed. The only mentioned users are some companies who are said to have it for legacy reasons with plans to replace it by Ceph. There is talk about cutting out existing features to turn it into a simple demo of how to write a distributed block service. There is no evidence of anyone working on that idea: https://github.com/sheepdog/sheepdog/commits/master No real commits to git since Jan 2018, and before then just some minor technical debt cleanup. There is essentially no activity on the mailing list aside from patches to QEMU that get CC'd due to our MAINTAINERS entry. Fedora packages for sheepdog failed to build from upstream source because of the more strict linker that no longer merges duplicate global symbols. Fedora patches it to add the missing "extern" annotations and presumably other distros do to, but upstream source remains broken. There is only basic compile testing, no functional testing of the driver. Since there are no build pre-requisites the sheepdog driver is currently enabled unconditionally. This would result in configure issuing a deprecation warning by default for all users. Thus the configure default is changed to disable it, requiring users to pass --enable-sheepdog to build the driver. Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20201002113243.2347710-3-berrange@redhat.com> Reviewed-by: Neal Gompa <ngompa13@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-15 16:06:28 +02:00
Elena Afanasova	5b4c95d0a3	block/blkdebug: fix memory leak Spotted by PVS-Studio Signed-off-by: Elena Afanasova <eafanasova@gmail.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <1e903f928eb3da332cc95e2a6f87243bd9fe66e4.camel@gmail.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>	2020-10-13 13:33:46 +02:00
Christian Borntraeger	cd466702f0	vmdk: fix maybe uninitialized warnings Fedora 32 gcc 10 seems to give false positives: Compiling C object libblock.fa.p/block_vmdk.c.o ../block/vmdk.c: In function ‘vmdk_parse_extents’: ../block/vmdk.c:587:5: error: ‘extent’ may be used uninitialized in this function [-Werror=maybe-uninitialized] 587 \| g_free(extent->l1_table); \| ^~~~~~~~~~~~~~~~~~~~~~~~ ../block/vmdk.c:754:17: note: ‘extent’ was declared here 754 \| VmdkExtent extent; \| ^~~~~~ ../block/vmdk.c:620:11: error: ‘extent’ may be used uninitialized in this function [-Werror=maybe-uninitialized] 620 \| ret = vmdk_init_tables(bs, extent, errp); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ../block/vmdk.c:598:17: note: ‘extent’ was declared here 598 \| VmdkExtent extent; \| ^~~~~~ ../block/vmdk.c:1178:39: error: ‘extent’ may be used uninitialized in this function [-Werror=maybe-uninitialized] 1178 \| extent->flat_start_offset = flat_offset << 9; \| ~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~ ../block/vmdk.c: In function ‘vmdk_open_vmdk4’: ../block/vmdk.c:581:22: error: ‘extent’ may be used uninitialized in this function [-Werror=maybe-uninitialized] 581 \| extent->l2_cache = \| ~~~~~~~~~~~~~~~~~^ 582 \| g_malloc(extent->entry_size * extent->l2_size * L2_CACHE_SIZE); \| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ../block/vmdk.c:872:17: note: ‘extent’ was declared here 872 \| VmdkExtent extent; \| ^~~~~~ ../block/vmdk.c: In function ‘vmdk_open’: ../block/vmdk.c:620:11: error: ‘extent’ may be used uninitialized in this function [-Werror=maybe-uninitialized] 620 \| ret = vmdk_init_tables(bs, extent, errp); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ../block/vmdk.c:598:17: note: ‘extent’ was declared here 598 \| VmdkExtent extent; \| ^~~~~~ cc1: all warnings being treated as errors make: *** [Makefile.ninja:884: libblock.fa.p/block_vmdk.c.o] Error 1 fix them by assigning a default value. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Fam Zheng <fam@euphon.net> Message-Id: <20200930155859.303148-2-borntraeger@de.ibm.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>	2020-10-13 13:33:45 +02:00
Vladimir Sementsov-Ogievskiy	99d72dba1c	block/nbd: nbd_co_reconnect_loop(): don't connect if drained In a recent commit `12c75e20a2` we've improved nbd_co_reconnect_loop() to not make drain wait for additional sleep. Similarly, we shouldn't try to connect, if previous sleep was interrupted by drain begin, otherwise drain_begin will have to wait for the whole connection attempt. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200903190301.367620-5-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2020-10-09 15:04:32 -05:00
Vladimir Sementsov-Ogievskiy	46f56631b5	block/nbd: fix reconnect-delay reconnect-delay has a design flaw: we handle it in the same loop where we do connection attempt. So, reconnect-delay may be exceeded by unpredictable time of connection attempt. Let's instead use separate timer. How to reproduce the bug: 1. Create an image on node1: qemu-img create -f qcow2 xx 100M 2. Start NBD server on node1: qemu-nbd xx 3. On node2 start qemu-io: ./build/qemu-io --image-opts \ driver=nbd,server.type=inet,server.host=192.168.100.5,server.port=10809,reconnect-delay=15 4. Type 'read 0 512' in qemu-io interface to check that connection works Be careful: you should make steps 5-7 in a short time, less than 15 seconds. 5. Kill nbd server on node1 6. Run 'read 0 512' in qemu-io interface again, to be sure that nbd client goes to reconnect loop. 7. On node1 run the following command sudo iptables -A INPUT -p tcp --dport 10809 -j DROP This will make the connect() call of qemu-io at node2 take a long time. And you'll see that read command in qemu-io will hang for a long time, more than 15 seconds specified by reconnect-delay parameter. It's the bug. 8. Don't forget to drop iptables rule on node1: sudo iptables -D INPUT -p tcp --dport 10809 -j DROP Important note: Step [5] is necessary to reproduce _this_ bug. If we miss step [5], the read command (step 6) will hang for a long time and this commit doesn't help, because there will be not long connect() to unreachable host, but long sendmsg() to unreachable host, which should be fixed by enabling and adjusting keep-alive on the socket, which is a thing for further patch set. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200903190301.367620-4-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2020-10-09 15:04:32 -05:00
Vladimir Sementsov-Ogievskiy	8a509afd72	block/nbd: correctly use qio_channel_detach_aio_context when needed Don't use nbd_client_detach_aio_context() driver handler where we want to finalize the connection. We should directly use qio_channel_detach_aio_context() in such cases. Driver handler may (and will) contain another things, unrelated to the qio channel. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200903190301.367620-3-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2020-10-09 15:04:32 -05:00
Vladimir Sementsov-Ogievskiy	8c517de24a	block/nbd: fix drain dead-lock because of nbd reconnect-delay We pause reconnect process during drained section. So, if we have some requests, waiting for reconnect we should cancel them, otherwise they deadlock the drained section. How to reproduce: 1. Create an image: qemu-img create -f qcow2 xx 100M 2. Start NBD server: qemu-nbd xx 3. Start vm with second nbd disk on node2, like this: ./build/x86_64-softmmu/qemu-system-x86_64 -nodefaults -drive \ file=/work/images/cent7.qcow2 -drive \ driver=nbd,server.type=inet,server.host=192.168.100.5,server.port=10809,reconnect-delay=60 \ -vnc :0 -m 2G -enable-kvm -vga std 4. Access the vm through vnc (or some other way?), and check that NBD drive works: dd if=/dev/sdb of=/dev/null bs=1M count=10 - the command should succeed. 5. Now, kill the nbd server, and run dd in the guest again: dd if=/dev/sdb of=/dev/null bs=1M count=10 Now Qemu is trying to reconnect, and dd-generated requests are waiting for the connection (they will wait up to 60 seconds (see reconnect-delay option above) and than fail). But suddenly, vm may totally hang in the deadlock. You may need to increase reconnect-delay period to catch the dead-lock. VM doesn't respond because drain dead-lock happens in cpu thread with global mutex taken. That's not good thing by itself and is not fixed by this commit (true way is using iothreads). Still this commit fixes drain dead-lock itself. Note: probably, we can instead continue to reconnect during drained section. To achieve this, we may move negotiation to the connect thread to make it independent of bs aio context. But expanding drained section doesn't seem good anyway. So, let's now fix the bug the simplest way. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200903190301.367620-2-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2020-10-09 15:04:32 -05:00
Peter Maydell	f2687fdb75	* Reverse debugging (Pavel) * CFLAGS cleanup (Paolo) * ASLR fix (Mark) * cpus.c refactoring (Claudio) -----BEGIN PGP SIGNATURE----- iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAl98EB0UHHBib256aW5p QHJlZGhhdC5jb20ACgkQv/vSX3jHroOCsQf9G7EUAK1zcEOx20LtDdXFrk4tjsRp S83OGdihWe8SM+XiY9BfqsBbXdByqF+SitePOV3feGK0mOP5vtJIL7/2DLrtFTeF wOeARRA9ePVb7hcL5oXAQeE3bXrX8wq8Qtw9xAoHdw5JAEVmKIEJS6AL5Eu3M2Fh pvdBoV84pOm2/ARS3eRstRyW8gCC8rdLDlNsVDtCbYdNVq+VdkzR0l5Phc8JDx1M Qjdl1KpN6ZkuN8M6tnaQNTb9IUVu5c1tu5jdR6JdLUqAWp1wYZJ6r2jSatZWfLR3 H+gzFsDoLPfCjZ3IhfZyvzF5leSZmdbFfzI0tHS1UJ/ZZYjutDvlPlbyYA== =Jys5 -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/bonzini-gitlab/tags/for-upstream' into staging * Reverse debugging (Pavel) * CFLAGS cleanup (Paolo) * ASLR fix (Mark) * cpus.c refactoring (Claudio) # gpg: Signature made Tue 06 Oct 2020 07:35:09 BST # gpg: using RSA key F13338574B662389866C7682BFFBD25F78C7AE83 # gpg: issuer "pbonzini@redhat.com" # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full] # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [full] # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * remotes/bonzini-gitlab/tags/for-upstream: (37 commits) tests/acceptance: add reverse debugging test replay: create temporary snapshot at debugger connection replay: describe reverse debugging in docs/replay.txt gdbstub: add reverse continue support in replay mode gdbstub: add reverse step support in replay mode replay: flush rr queue before loading the vmstate replay: implement replay-seek command replay: introduce breakpoint at the specified step replay: introduce info hmp/qmp command qapi: introduce replay.json for record/replay-related stuff migration: introduce icount field for snapshots qcow2: introduce icount field for snapshots replay: provide an accessor for rr filename replay: don't record interrupt poll configure: don't enable ASLR for --enable-debug Windows builds configure: consistently pass CFLAGS/CXXFLAGS/LDFLAGS to meson configure: do not clobber environment CFLAGS/CXXFLAGS/LDFLAGS dtc: Convert Makefile bits to meson bits slirp: Convert Makefile bits to meson bits accel/tcg: use current_machine as it is always set for softmmu ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-10-06 15:04:10 +01:00
Pavel Dovgalyuk	b39847a505	migration: introduce icount field for snapshots Saving icount as a parameters of the snapshot allows navigation between them in the execution replay scenario. This information can be used for finding a specific snapshot for proceeding the recorded execution to the specific moment of the time. E.g., 'reverse step' action (introduced in one of the following patches) needs to load the nearest snapshot which is prior to the current moment of time. This patch also updates snapshot test which verifies qemu monitor output. Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru> Acked-by: Markus Armbruster <armbru@redhat.com> Acked-by: Kevin Wolf <kwolf@redhat.com> -- v4 changes: - squashed format update with test output update v7 changes: - introduced the spaces between the fields in snapshot info output - updated the test to match new field widths Message-Id: <160174518865.12451.14327573383978752463.stgit@pasha-ThinkPad-X280> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-10-06 08:34:49 +02:00
Pavel Dovgalyuk	bbacffc5f7	qcow2: introduce icount field for snapshots This patch introduces the icount field for saving within the snapshot. It is required for navigation between the snapshots in record/replay mode. Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru> Acked-by: Kevin Wolf <kwolf@redhat.com> -- v7 changes: - also fix the test which checks qcow2 snapshot extra data Message-Id: <160174518284.12451.2301137308458777398.stgit@pasha-ThinkPad-X280> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-10-06 08:34:49 +02:00
Vladimir Sementsov-Ogievskiy	b33b354f3a	block/io: refactor save/load vmstate Like for read/write in a previous commit, drop extra indirection layer, generate directly bdrv_readv_vmstate() and bdrv_writev_vmstate(). Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20200924185414.28642-8-vsementsov@virtuozzo.com>	2020-10-05 10:59:42 +01:00
Vladimir Sementsov-Ogievskiy	fae2681add	block: drop bdrv_prwv Now that we are not maintaining boilerplate code for coroutine wrappers, there is no more sense in keeping the extra indirection layer of bdrv_prwv(). Let's drop it and instead generate pure bdrv_preadv() and bdrv_pwritev(). Currently, bdrv_pwritev() and bdrv_preadv() are returning bytes on success, auto generated functions will instead return zero, as their _co_ prototype. Still, it's simple to make the conversion safe: the only external user of bdrv_pwritev() is test-bdrv-drain, and it is comfortable enough with bdrv_co_pwritev() instead. So prototypes are moved to local block/coroutines.h. Next, the only internal use is bdrv_pread() and bdrv_pwrite(), which are modified to return bytes on success. Of course, it would be great to convert bdrv_pread() and bdrv_pwrite() to return 0 on success. But this requires audit (and probably conversion) of all their users, let's leave it for another day refactoring. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20200924185414.28642-7-vsementsov@virtuozzo.com>	2020-10-05 10:59:42 +01:00
Vladimir Sementsov-Ogievskiy	9bb4b066cc	block: generate coroutine-wrapper code Use code generation implemented in previous commit to generated coroutine wrappers in block.c and block/io.c Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20200924185414.28642-6-vsementsov@virtuozzo.com>	2020-10-05 10:59:42 +01:00
Vladimir Sementsov-Ogievskiy	aaaa20b69b	scripts: add block-coroutine-wrapper.py We have a very frequent pattern of creating a coroutine from a function with several arguments: - create a structure to pack parameters - create _entry function to call original function taking parameters from struct - do different magic to handle completion: set ret to NOT_DONE or EINPROGRESS or use separate bool field - fill the struct and create coroutine from _entry function with this struct as a parameter - do coroutine enter and BDRV_POLL_WHILE loop Let's reduce code duplication by generating coroutine wrappers. This patch adds scripts/block-coroutine-wrapper.py together with some friends, which will generate functions with declared prototypes marked by the 'generated_co_wrapper' specifier. The usage of new code generation is as follows: 1. define the coroutine function somewhere int coroutine_fn bdrv_co_NAME(...) {...} 2. declare in some header file int generated_co_wrapper bdrv_NAME(...); with same list of parameters (generated_co_wrapper is defined in "include/block/block.h"). 3. Make sure the block_gen_c declaration in block/meson.build mentions the file with your marker function. Still, no function is now marked, this work is for the following commit. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200924185414.28642-5-vsementsov@virtuozzo.com> [Added encoding='utf-8' to open() calls as requested by Vladimir. Fixed typo and grammar issues pointed out by Eric Blake. Removed clang-format dependency that caused build test issues. --Stefan] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-10-05 10:59:06 +01:00
Vladimir Sementsov-Ogievskiy	21c2283ebc	block: declare some coroutine functions in block/coroutines.h We are going to keep coroutine-wrappers code (structure-packing parameters, BDRV_POLL wrapper functions) in separate auto-generated files. So, we'll need a header with declaration of original _co_ functions, for those which are static now. As well, we'll need declarations for wrapper functions. Do these declarations now, as a preparation step. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20200924185414.28642-4-vsementsov@virtuozzo.com>	2020-10-05 09:35:52 +01:00
Vladimir Sementsov-Ogievskiy	f9e694cb32	block/io: refactor coroutine wrappers Most of our coroutine wrappers already follow this convention: We have 'coroutine_fn bdrv_co_<something>(<normal argument list>)' as the core function, and a wrapper 'bdrv_<something>(<same argument list>)' which does parameter packing and calls bdrv_run_co(). The only outsiders are the bdrv_prwv_co and bdrv_common_block_status_above wrappers. Let's refactor them to behave as the others, it simplifies further conversion of coroutine wrappers. This patch adds an indirection layer, but it will be compensated by a further commit, which will drop bdrv_co_prwv together with the is_write logic, to keep the read and write paths separate. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20200924185414.28642-3-vsementsov@virtuozzo.com>	2020-10-05 09:35:52 +01:00
Philippe Mathieu-Daudé	eefffb0244	block/nvme: Replace magic value by SCALE_MS definition Use self-explicit SCALE_MS definition instead of magic value (missed in similar commit `e4f310fe7f`). Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20200922083821.578519-7-philmd@redhat.com>	2020-10-05 09:35:52 +01:00
Philippe Mathieu-Daudé	fad1eb6886	block/nvme: Use register definitions from 'block/nvme.h' Use the NVMe register definitions from "block/nvme.h" which ease a bit reviewing the code while matching the datasheet. Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20200922083821.578519-6-philmd@redhat.com>	2020-10-05 09:35:52 +01:00
Philippe Mathieu-Daudé	9406e0d97e	block/nvme: Drop NVMeRegs structure, directly use NvmeBar NVMeRegs only contains NvmeBar. Simplify the code by using NvmeBar directly. This triggers a checkpatch.pl error: ERROR: Use of volatile is usually wrong, please add a comment #30: FILE: block/nvme.c:691: + volatile NvmeBar *regs; This is a false positive as in our case we are using I/O registers, so the 'volatile' use is justified. Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20200922083821.578519-5-philmd@redhat.com>	2020-10-05 09:35:52 +01:00
Philippe Mathieu-Daudé	37d7a45abd	block/nvme: Reduce I/O registers scope We only access the I/O register in nvme_init(). Remove the reference in BDRVNVMeState and reduce its scope. Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20200922083821.578519-4-philmd@redhat.com>	2020-10-05 09:35:52 +01:00
Philippe Mathieu-Daudé	f68453237b	block/nvme: Map doorbells pages write-only Per the datasheet sections 3.1.13/3.1.14: "The host should not read the doorbell registers." As we don't need read access, map the doorbells with write-only permission. We keep a reference to this mapped address in the BDRVNVMeState structure. Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20200922083821.578519-3-philmd@redhat.com>	2020-10-05 09:35:52 +01:00
Philippe Mathieu-Daudé	b02c01a513	util/vfio-helpers: Pass page protections to qemu_vfio_pci_map_bar() Pages are currently mapped READ/WRITE. To be able to use different protections, add a new argument to qemu_vfio_pci_map_bar(). Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20200922083821.578519-2-philmd@redhat.com>	2020-10-05 09:35:52 +01:00
Alberto Garcia	c508c73dca	qcow2: Use L1E_SIZE in qcow2_write_l1_entry() We overlooked these in `02b1ecfa10` Signed-off-by: Alberto Garcia <berto@igalia.com> Message-Id: <20200928162333.14998-1-berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-02 15:46:40 +02:00
Kevin Wolf	30dbc81d31	block/export: Move writable to BlockExportOptions The 'writable' option is a basic option that will probably be applicable to most if not all export types that we will implement. Move it from NBD to the generic BlockExport layer. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200924152717.287415-26-kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-02 15:46:40 +02:00
Kevin Wolf	8cade320c8	block/export: Add query-block-exports This adds a simple QMP command to query the list of block exports. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200924152717.287415-25-kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-02 15:46:40 +02:00
Kevin Wolf	331170e073	block/export: Create BlockBackend in blk_exp_add() Every export type will need a BlockBackend, so creating it centrally in blk_exp_add() instead of the .create driver callback avoids duplication. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200924152717.287415-24-kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-02 15:46:40 +02:00
Kevin Wolf	37a4f70cea	block/export: Move blk to BlockExport Every block export has a BlockBackend representing the disk that is exported. It should live in BlockExport therefore. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200924152717.287415-23-kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-02 15:46:40 +02:00
Kevin Wolf	1a9f7a804f	block/export: Add BLOCK_EXPORT_DELETED event Clients may want to know when an export has finally disappeard (block-export-del returns earlier than that in the general case), so add a QAPI event for it. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200924152717.287415-22-kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-02 15:46:40 +02:00
Kevin Wolf	3c3bc462ad	block/export: Add block-export-del Implement a new QMP command block-export-del and make nbd-server-remove a wrapper around it. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200924152717.287415-21-kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-02 15:46:40 +02:00
Kevin Wolf	3859ad36f0	block/export: Move strong user reference to block_exports The reference owned by the user/monitor that is created when adding the export and dropped when removing it was tied to the 'exports' list in nbd/server.c. Every block export will have a user reference, so move it to the block export level and tie it to the 'block_exports' list in block/export/export.c instead. This is necessary for introducing a QMP command for removing exports. Note that exports are present in block_exports even after the user has requested shutdown. This is different from NBD's exports where exports are immediately removed on a shutdown request, even if they are still in the process of shutting down. In order to avoid that the user still interacts with an export that is shutting down (and possibly removes it a second time), we need to remember if the user actually still owns it. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200924152717.287415-20-kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-02 15:46:40 +02:00
Kevin Wolf	d53be9ce55	block/export: Add 'id' option to block-export-add We'll need an id to identify block exports in monitor commands. This adds one. Note that this is different from the 'name' option in the NBD server, which is the externally visible export name. While block export ids need to be unique in the whole process, export names must be unique only for the same server. Different export types or (potentially in the future) multiple NBD servers can have the same export name externally, but still need different block export ids internally. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200924152717.287415-19-kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-02 15:46:40 +02:00
Kevin Wolf	bc4ee65b8c	block/export: Add blk_exp_close_all(_type) This adds a function to shut down all block exports, and another one to shut down the block exports of a single type. The latter is used for now when stopping the NBD server. As soon as we implement support for multiple NBD servers, we'll need a per-server list of exports and it will be replaced by a function using that. As a side effect, the BlockExport layer has a list tracking all existing exports now. closed_exports loses its only user and can go away. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200924152717.287415-18-kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-02 15:46:40 +02:00
Kevin Wolf	a6ff798966	block/export: Allocate BlockExport in blk_exp_add() Instead of letting the driver allocate and return the BlockExport object, allocate it already in blk_exp_add() and pass it. This allows us to initialise the generic part before calling into the driver so that the driver can just use these values instead of having to parse the options a second time. For symmetry, move freeing the BlockExport to blk_exp_unref(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200924152717.287415-17-kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-02 15:46:40 +02:00
Kevin Wolf	b6076afcab	block/export: Add node-name to BlockExportOptions Every block export needs a block node to export, so add a 'node-name' option to BlockExportOptions and remove the replaced option 'device' from BlockExportOptionsNbd. To maintain compatibility in nbd-server-add, BlockExportOptionsNbd needs to be wrapped by a new type NbdServerAddOptions that adds 'device' back because nbd-server-add doesn't use the BlockExportOptions base type at all (so even without changing it to a 'node-name' option in block-export-add, this compatibility code would be necessary). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200924152717.287415-16-kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-02 15:46:40 +02:00
Kevin Wolf	8612c68673	block/export: Move AioContext from NBDExport to BlockExport Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200924152717.287415-15-kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-02 15:46:40 +02:00
Kevin Wolf	c69de1bef5	block/export: Move refcount from NBDExport to BlockExport Having a refcount makes sense for all types of block exports. It is also a prerequisite for keeping a list of all exports at the BlockExport level. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200924152717.287415-14-kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-02 15:46:40 +02:00
Kevin Wolf	1c8222b014	nbd: Add max-connections to nbd-server-start This is a QMP equivalent of qemu-nbd's --shared option, limiting the maximum number of clients that can attach at the same time. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200924152717.287415-9-kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-02 15:46:40 +02:00
Kevin Wolf	9b562c646b	block/export: Remove magic from block-export-add nbd-server-add tries to be convenient and adds two questionable features that we don't want to share in block-export-add, even for NBD exports: 1. When requesting a writable export of a read-only device, the export is silently downgraded to read-only. This should be an error in the context of block-export-add. 2. When using a BlockBackend name, unplugging the device from the guest will automatically stop the NBD server, too. This may sometimes be what you want, but it could also be very surprising. Let's keep things explicit with block-export-add. If the user wants to stop the export, they should tell us so. Move these things into the nbd-server-add QMP command handler so that they apply only there. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200924152717.287415-8-kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-02 15:46:40 +02:00
Kevin Wolf	56ee86261e	block/export: Add BlockExport infrastructure and block-export-add We want to have a common set of commands for all types of block exports. Currently, this is only NBD, but we're going to add more types. This patch adds the basic BlockExport and BlockExportDriver structs and a QMP command block-export-add that creates a new export based on the given BlockExportOptions. qmp_nbd_server_add() becomes a wrapper around qmp_block_export_add(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200924152717.287415-5-kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-02 15:46:40 +02:00
Kevin Wolf	143ea7670c	qapi: Rename BlockExport to BlockExportOptions The name BlockExport will be used for the struct containing the runtime state of block exports, so change the name of export creation options. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200924152717.287415-4-kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-02 15:46:40 +02:00
Kevin Wolf	5daa6bfd8e	qapi: Create block-export module Move all block export related types and commands from block-core to the new QAPI module block-export. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200924152717.287415-3-kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-02 15:46:40 +02:00
Philippe Mathieu-Daudé	74f2e02766	block/sheepdog: Replace magic val by NANOSECONDS_PER_SECOND definition Use self-explicit NANOSECONDS_PER_SECOND definition instead of magic value. Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200921110145.520944-1-philmd@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-10-02 15:46:40 +02:00
Philippe Mathieu-Daudé	f68c01470b	qapi: Restrict query-uuid command to machine code Only qemu-system-FOO and qemu-storage-daemon provide QMP monitors, therefore such declarations and definitions are irrelevant for user-mode emulation. Restricting the query-uuid command to machine.json pulls less QAPI-generated code into user-mode. Acked-by: Markus Armbruster <armbru@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200913195348.1064154-6-philmd@redhat.com> [Commit message tweaked] Signed-off-by: Markus Armbruster <armbru@redhat.com>	2020-09-29 15:41:35 +02:00
Stefan Hajnoczi	d73415a315	qemu/atomic.h: rename atomic_ to qatomic_ clang's C11 atomic_fetch_() functions only take a C11 atomic type pointer argument. QEMU uses direct types (int, etc) and this causes a compiler error when a QEMU code calls these functions in a source file that also included <stdatomic.h> via a system header file: $ CC=clang CXX=clang++ ./configure ... && make ../util/async.c:79:17: error: address argument to atomic operation must be a pointer to _Atomic type ('unsigned int ' invalid) Avoid using atomic_*() names in QEMU's atomic.h since that namespace is used by <stdatomic.h>. Prefix QEMU's APIs with 'q' so that atomic.h and <stdatomic.h> can co-exist. I checked /usr/include on my machine and searched GitHub for existing "qatomic_" users but there seem to be none. This patch was generated using: $ git grep -h -o '\<atomic$64$\?_[a-z0-9_]\+' include/qemu/atomic.h \| \ sort -u >/tmp/changed_identifiers $ for identifier in $(</tmp/changed_identifiers); do sed -i "s%\<$identifier\>%q$identifier%g" \ $(git grep -I -l "\<$identifier\>") done I manually fixed line-wrap issues and misaligned rST tables. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20200923105646.47864-1-stefanha@redhat.com>	2020-09-23 16:07:44 +01:00
Daniel P. Berrangé	b18a24a9f8	block/file: switch to use qemu_open/qemu_create for improved errors Currently at startup if using cache=none on a filesystem lacking O_DIRECT such as tmpfs, at startup QEMU prints qemu-system-x86_64: -drive file=/tmp/foo.img,cache=none: file system may not support O_DIRECT qemu-system-x86_64: -drive file=/tmp/foo.img,cache=none: Could not open '/tmp/foo.img': Invalid argument while at QMP level the hint is missing, so QEMU reports just "error": { "class": "GenericError", "desc": "Could not open '/tmp/foo.img': Invalid argument" } which is close to useless for the end user trying to figure out what they did wrong. With this change at startup QEMU prints qemu-system-x86_64: -drive file=/tmp/foo.img,cache=none: Unable to open '/tmp/foo.img': filesystem does not support O_DIRECT while at the QMP level QEMU reports a massively more informative "error": { "class": "GenericError", "desc": "Unable to open '/tmp/foo.img': filesystem does not support O_DIRECT" } Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>	2020-09-16 10:33:48 +01:00
Daniel P. Berrangé	448058aa99	util: rename qemu_open() to qemu_open_old() We want to introduce a new version of qemu_open() that uses an Error object for reporting problems and make this it the preferred interface. Rename the existing method to release the namespace for the new impl. Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>	2020-09-16 10:33:48 +01:00
Stefano Garzarella	7bae7c805d	block/rbd: add 'namespace' to qemu_rbd_strong_runtime_opts[] Commit `19ae9ae014` ("block/rbd: Add support for ceph namespaces") introduced namespace support for RBD, but we forgot to add the new 'namespace' options to qemu_rbd_strong_runtime_opts[]. The 'namespace' is used to identify the image, so it is a strong option since it can changes the data of a BDS. Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1821528 Fixes: `19ae9ae014` ("block/rbd: Add support for ceph namespaces") Cc: Florian Florensa <fflorensa@online.net> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20200914190553.74871-1-sgarzare@redhat.com> Reviewed-by: Jason Dillaman <dillaman@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-09-15 11:31:10 +02:00
Alberto Garcia	bfd0989acf	qcow2: Convert qcow2_alloc_cluster_offset() into qcow2_alloc_host_offset() qcow2_alloc_cluster_offset() takes an (unaligned) guest offset and returns the (aligned) offset of the corresponding cluster in the qcow2 image. In practice none of the callers need to know where the cluster starts so this patch makes the function calculate and return the final host offset directly. The function is also renamed accordingly. See `388e581615` for a similar change to qcow2_get_cluster_offset(). Signed-off-by: Alberto Garcia <berto@igalia.com> Message-Id: <9bfef50ec9200d752413be4fc2aeb22a28378817.1599833007.git.berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-09-15 11:31:10 +02:00
Alberto Garcia	8e958260c5	qcow2: Make preallocate_co() resize the image to the correct size This function preallocates metadata structures and then extends the image to its new size, but that new size calculation is wrong because it doesn't take into account that the host_offset variable is always cluster-aligned. This problem can be reproduced with preallocation=metadata when the original size is not cluster-aligned but the new size is. In this case the final image size will be shorter than expected. qemu-img create -f qcow2 img.qcow2 31k qemu-img resize --preallocation=metadata img.qcow2 128k Signed-off-by: Alberto Garcia <berto@igalia.com> Message-Id: <adeb8b059917b141d5f5b3bd2a016262d3052c79.1599833007.git.berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> [mreitz: Mark compat=0.10 unsupported for iotest 125] Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-09-15 11:30:36 +02:00
John Snow	c1dadda02c	block/qcow: remove runtime opts Introduced by `d85f4222b4`, These were seemingly never used at all. Signed-off-by: John Snow <jsnow@redhat.com> Message-Id: <20200806211345.2925343-3-jsnow@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-09-15 11:05:13 +02:00
John Snow	30b70f070f	block/rbd: remove runtime_opts This saw its last use in `4bfb274165`. Signed-off-by: John Snow <jsnow@redhat.com> Message-Id: <20200806211345.2925343-2-jsnow@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-09-15 11:05:13 +02:00
Alberto Garcia	580384d637	qcow2: Return the original error code in qcow2_co_pwrite_zeroes() This function checks the current status of a (sub)cluster in order to see if an unaligned 'write zeroes' request can be done efficiently by simply updating the L2 metadata and without having to write actual zeroes to disk. If the situation does not allow using the fast path then the function returns -ENOTSUP and the caller falls back to writing zeroes. If can happen however that the aforementioned check returns an actual error code so in this case we should pass it to the caller. Signed-off-by: Alberto Garcia <berto@igalia.com> Message-Id: <20200909123739.719-1-berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-09-15 11:05:13 +02:00
Alberto Garcia	3fec237fca	qcow2: Make qcow2_free_any_clusters() free only one cluster This function takes an L2 entry and a number of clusters to free. Although in principle it can free any type of cluster (using the L2 entry to determine its type) in practice the API is broken because compressed clusters have a variable size and there is no way to free more than one without having the L2 entry of each one of them. The good news all callers are passing nb_clusters=1 so we can simply get rid of that parameter. Signed-off-by: Alberto Garcia <berto@igalia.com> Message-Id: <77cea0f4616f921d37e971b3c5b18a2faa24b173.1599573989.git.berto@igalia.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-09-15 11:05:13 +02:00
Alberto Garcia	1a52b73dba	qcow2: Handle QCowL2Meta on error in preallocate_co() If qcow2_alloc_cluster_offset() or qcow2_alloc_cluster_link_l2() fail then this function simply returns the error code, potentially leaking the QCowL2Meta structure and leaving stale items in s->cluster_allocs. A second problem is that this function calls qcow2_free_any_clusters() on failure but passing a host cluster offset instead of an L2 entry. Luckily for normal uncompressed clusters a raw offset also works like a valid L2 entry so it works just the same, but we should be using qcow2_free_clusters() instead. This patch fixes both problems by using qcow2_handle_l2meta(). Signed-off-by: Alberto Garcia <berto@igalia.com> Message-Id: <cd3a6b9abd43f9c0b60be413d760f0cacc67eb66.1599573989.git.berto@igalia.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-09-15 11:05:13 +02:00
Swapnil Ingle	83a6a90009	block/vhdx: Support vhdx image only with 512 bytes logical sector size block/vhdx uses qemu block layer where sector size is always 512 bytes. This may have issues with 4K logical sector sized vhdx image. For e.g qemu-img convert on such images fails with following assert: $qemu-img convert -f vhdx -O raw 4KTest1.vhdx test.raw qemu-img: util/iov.c:388: qiov_slice: Assertion `offset + len <= qiov->size' failed. Aborted This patch adds an check to return ENOTSUP for vhdx images which have logical sector size other than 512 bytes. Signed-off-by: Swapnil Ingle <swapnil.ingle@nutanix.com> Message-Id: <1596794594-44531-1-git-send-email-swapnil.ingle@nutanix.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-09-15 11:05:13 +02:00
Alberto Garcia	2b60c5b996	qcow2: Rewrite the documentation of qcow2_alloc_cluster_offset() The current text corresponds to an earlier, simpler version of this function and it does not explain how it works now. Signed-off-by: Alberto Garcia <berto@igalia.com> Message-Id: <bb5bd06f07c5a05b0818611de0d06ec5b66c8df3.1599150873.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-09-15 11:05:13 +02:00
Alberto Garcia	f7bd5bba1b	qcow2: Don't check nb_clusters when removing l2meta from the list In the past, when a new cluster was allocated the l2meta structure was a variable in the stack so it was necessary to have a way to tell whether it had been initialized and contained valid data or not. The nb_clusters field was used for this purpose. Since commit `f50f88b9fe` this is no longer the case, l2meta (nowadays a pointer to a list) is only allocated when needed and nb_clusters is guaranteed to be > 0 so this check is unnecessary. Signed-off-by: Alberto Garcia <berto@igalia.com> Message-Id: <ab0b67c29c7ba26e598db35f12aa5ab5982539c1.1599150873.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-09-15 11:05:13 +02:00
Alberto Garcia	184581fa4d	qcow2: Fix removal of list members from BDRVQcow2State.cluster_allocs When a write request needs to allocate new clusters (or change the L2 bitmap of existing ones) a QCowL2Meta structure is created so the L2 metadata can be later updated and any copy-on-write can be performed if necessary. A write request can span a region consisting of an arbitrary combination of previously unallocated and allocated clusters, and if the unallocated ones can be put contiguous to the existing ones then QEMU will do so in order to minimize the number of write operations. In practice this means that a write request has not just one but a number of QCowL2Meta structures. All of them are added to the cluster_allocs list that is stored in BDRVQcow2State and is used to detect overlapping requests. After the write request finishes all its associated QCowL2Meta are removed from that list. calculate_l2_meta() takes care of creating and putting those structures in the list, and qcow2_handle_l2meta() takes care of removing them. The problem is that the error path in handle_alloc() also tries to remove an item in that list, a remnant from the time when this was handled there (that code would not even be correct anymore because it only removes one struct and not all the ones from the same write request). This can trigger a double removal of the same item from the list, causing a crash. This is not easy to reproduce in practice because it requires that do_alloc_cluster_offset() fails after a successful previous allocation during the same write request, but it can be reproduced with the included test case. Signed-off-by: Alberto Garcia <berto@igalia.com> Message-Id: <3440a1c4d53c4fe48312b478c96accb338cbef7c.1599150873.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-09-15 11:05:13 +02:00
Alberto Garcia	02b1ecfa10	qcow2: Use macros for the L1, refcount and bitmap table entry sizes This patch replaces instances of sizeof(uint64_t) in the qcow2 driver with macros that indicate what those sizes are actually referring to. Signed-off-by: Alberto Garcia <berto@igalia.com> Message-Id: <20200828110828.13833-1-berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-09-15 11:05:12 +02:00
Lukas Straub	5eb9a3c7b0	block/quorum.c: stable children names If we remove the child with the highest index from the quorum, decrement s->next_child_index. This way we get stable children names as long as we only remove the last child. Signed-off-by: Lukas Straub <lukasstraub2@web.de> Fixes: https://bugs.launchpad.net/bugs/1881231 Reviewed-by: Zhang Chen <chen.zhang@intel.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-Id: <5d5f930424c1c770754041aa8ad6421dc4e2b58e.1596536719.git.lukasstraub2@web.de> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-09-15 11:05:12 +02:00
Peter Maydell	2499453eb1	Block layer patches: - qemu-img create: Fail gracefully when backing file is an empty string - Fixes related to filter block nodes ("Deal with filters" series) - block/nvme: Various cleanups required to use multiple queues - block/nvme: Use NvmeBar structure from "block/nvme.h" - file-win32: Fix "locking" option - iotests: Allow running from different directory -----BEGIN PGP SIGNATURE----- iQJFBAABCAAvFiEE3D3rFZqa+V09dFb+fwmycsiPL9YFAl9Z7bcRHGt3b2xmQHJl ZGhhdC5jb20ACgkQfwmycsiPL9ZS6w//bos+A0RfRRF0YFWkIBLQWxqzKcGvMJ8W XWv3mFzd47UaDgRYwVnCC3CR6bLYEINISngZ3geA4jI1+w7AtYKDOO0HN32dUg+D ZrNMn02701CA6qkmpxJ+yjsrl9ltR3jYe0me4Wr39Pvdexa2pl/e+M4Vas6FhkYL ghAwNThypscGCrFjAlz3ru2Sc/K+sPWrGoqkzr+SWvsm9wy4vb8aLxr8Yy50x/zc CqALS9SQ/YA93BCVi9CzPkVyV3ioA0kg/y38WvLtAQ9GZ3m/ekMro3WvdYsRsFCN LGXsuwFig+U7Kd7lJrCS9TLnlTJstNGqPq9jEoV5cThPvGknFfMvVOzRmmP7tzqT YRcPRy39z44OoLKa3kyg3aF38BTxt+9gPqBnivKMr9j9EecMvPsXXHRvF+lP+LsP j753Ih561hX6FurcjX8pc9GOM2cQA0GjlyL77UTTAmLZyFXP/8e55oQbBuYTylc/ Xlvmc/T+yEGiEGTnK+FxgDAiUaxbCCM9cDVStJjTvsIq43dwXb48g1onDsGZ5eDf j9lmAD6TJxHNOB5ErNsDPODf4/D1wJ9t9WVF8UZp9ArfPHRdxMzT7Q4LvetaDmVl +hQC9cgTq8Qd8LwSqbKEYua4L6iGbmLAT7/N6htq5L1eVLg76/tLg/tKSwh/vKAY yzPmyHaVK84= =gaaW -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging Block layer patches: - qemu-img create: Fail gracefully when backing file is an empty string - Fixes related to filter block nodes ("Deal with filters" series) - block/nvme: Various cleanups required to use multiple queues - block/nvme: Use NvmeBar structure from "block/nvme.h" - file-win32: Fix "locking" option - iotests: Allow running from different directory # gpg: Signature made Thu 10 Sep 2020 10:11:19 BST # gpg: using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6 # gpg: issuer "kwolf@redhat.com" # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full] # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * remotes/kevin/tags/for-upstream: (65 commits) block/qcow2-cluster: Add missing "fallthrough" annotation block/nvme: Pair doorbell registers block/nvme: Use generic NvmeBar structure block/nvme: Group controller registers in NVMeRegs structure file-win32: Fix "locking" option iotests: Allow running from different directory iotests: Test committing to overridden backing iotests: Add test for commit in sub directory iotests: Add filter mirror test cases iotests: Add filter commit test cases iotests: Let complete_and_wait() work with commit iotests: Test that qcow2's data-file is flushed block: Leave BDS.backing_{file,format} constant block: Inline bdrv_co_block_status_from_*() blockdev: Fix active commit choice block: Drop backing_bs() qemu-img: Use child access functions nbd: Use CAF when looking for dirty bitmap commit: Deal with filters backup: Deal with filters ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-09-11 14:47:49 +01:00
Thomas Huth	b9be6faed1	block/qcow2-cluster: Add missing "fallthrough" annotation When compiling with -Werror=implicit-fallthrough, the compiler currently complains: ../../devel/qemu/block/qcow2-cluster.c: In function ‘cluster_needs_new_alloc’: ../../devel/qemu/block/qcow2-cluster.c:1320:12: error: this statement may fall through [-Werror=implicit-fallthrough=] if (l2_entry & QCOW_OFLAG_COPIED) { ^ ../../devel/qemu/block/qcow2-cluster.c:1323:5: note: here case QCOW2_CLUSTER_UNALLOCATED: ^~~~ It's quite obvious that the fallthrough is intended here, so let's add a comment to silence the compiler warning. Signed-off-by: Thomas Huth <thuth@redhat.com> Message-Id: <20200908070028.193298-1-thuth@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-09-10 11:11:13 +02:00
Philippe Mathieu-Daudé	e5ff22ba9f	block/nvme: Pair doorbell registers For each queue doorbell registers are paired as: - Submission Queue Tail Doorbell - Completion Queue Head Doorbell Reflect that in the NVMeRegs structure, and adapt nvme_create_queue_pair() accordingly. Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200904124130.583838-4-philmd@redhat.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Fam Zheng <fam@euphon.net> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-09-10 11:11:13 +02:00
Philippe Mathieu-Daudé	c7100f0a0b	block/nvme: Use generic NvmeBar structure Commit `f3c507adcd` ("NVMe: Initial commit for new storage interface") introduced the NvmeBar structure. Unfortunately in commit `bdd6a90a9e` ("block: Add VFIO based NVMe driver") we duplicated it. Apparently in commit `a3d9a352d4` ("block: Move NVMe constants to a separate header") we tried to unify headers but forgot to remove the structure declared in the block/nvme.c source file. Do it now, and remove the structure size check which is redundant with the header check added in commit `74e18435c0` ("hw/block/nvme: Align I/O BAR to 4 KiB"). Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200904124130.583838-3-philmd@redhat.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Fam Zheng <fam@euphon.net> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-09-10 11:11:13 +02:00
Philippe Mathieu-Daudé	0ea32f34ce	block/nvme: Group controller registers in NVMeRegs structure We want to use the NvmeBar structure from "block/nvme.h" in the next commit. As a preliminary step, group all the NVMe controller registers in the 'ctrl' field, keeping the doorbells registers out of it. Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200904124130.583838-2-philmd@redhat.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Fam Zheng <fam@euphon.net> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-09-10 11:11:12 +02:00
Kevin Wolf	3b079ac0ff	file-win32: Fix "locking" option The intended behaviour was that locking=off/auto work and have no effect (to remain compatible with file-posix), whereas locking=on would return an error. Unfortunately, the code forgot to remove "locking" from the options QDict, so any attempt to use the option would fail. Replace the option parsing code for "locking" with something that is part of the raw_runtime_opts QemuOptsList (so it is properly removed from the QDict) and looks more like file-posix. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200907092739.9988-1-kwolf@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-09-10 11:11:12 +02:00
Markus Armbruster	b15e402fc8	trace-events: Fix attribution of trace points to source Some trace points are attributed to the wrong source file. Happens when we neglect to update trace-events for code motion, or add events in the wrong place, or misspell the file name. Clean up with help of scripts/cleanup-trace-events.pl. Funnies requiring manual post-processing: * accel/tcg/cputlb.c trace points are in trace-events. * block.c and blockdev.c trace points are in block/trace-events. * hw/block/nvme.c uses the preprocessor to hide its trace point use from cleanup-trace-events.pl. * hw/tpm/tpm_spapr.c uses pseudo trace point tpm_spapr_show_buffer to guard debug code. * include/hw/xen/xen_common.h trace points are in hw/xen/trace-events. * linux-user/trace-events abbreviates a tedious list of filenames to /signal.c. net/colo-compare and net/filter-rewriter.c use pseudo trace points colo_compare_miscompare and colo_filter_rewriter_debug to guard debug code. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20200806141334.3646302-5-armbru@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-09-09 17:17:58 +01:00
Markus Armbruster	6ec9379870	trace-events: Delete unused trace points Tracked down with the help of scripts/cleanup-trace-events.pl. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-id: 20200806141334.3646302-4-armbru@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-09-09 17:17:02 +01:00
Max Reitz	0b877d09df	block: Leave BDS.backing_{file,format} constant Parts of the block layer treat BDS.backing_file as if it were whatever the image header says (i.e., if it is a relative path, it is relative to the overlay), other parts treat it like a cache for bs->backing->bs->filename (relative paths are relative to the CWD). Considering bs->backing->bs->filename exists, let us make it mean the former. Among other things, this now allows the user to specify a base when using qemu-img to commit an image file in a directory that is not the CWD (assuming, everything uses relative filenames). Before this patch: $ ./qemu-img create -f qcow2 foo/bot.qcow2 1M $ ./qemu-img create -f qcow2 -b bot.qcow2 foo/mid.qcow2 $ ./qemu-img create -f qcow2 -b mid.qcow2 foo/top.qcow2 $ ./qemu-img commit -b mid.qcow2 foo/top.qcow2 qemu-img: Did not find 'mid.qcow2' in the backing chain of 'foo/top.qcow2' $ ./qemu-img commit -b foo/mid.qcow2 foo/top.qcow2 qemu-img: Did not find 'foo/mid.qcow2' in the backing chain of 'foo/top.qcow2' $ ./qemu-img commit -b $PWD/foo/mid.qcow2 foo/top.qcow2 qemu-img: Did not find '[...]/foo/mid.qcow2' in the backing chain of 'foo/top.qcow2' After this patch: $ ./qemu-img commit -b mid.qcow2 foo/top.qcow2 Image committed. $ ./qemu-img commit -b foo/mid.qcow2 foo/top.qcow2 qemu-img: Did not find 'foo/mid.qcow2' in the backing chain of 'foo/top.qcow2' $ ./qemu-img commit -b $PWD/foo/mid.qcow2 foo/top.qcow2 Image committed. With this change, bdrv_find_backing_image() must look at whether the user has overridden a BDS's backing file. If so, it can no longer use bs->backing_file, but must instead compare the given filename against the backing node's filename directly. Note that this changes the QAPI output for a node's backing_file. We had very inconsistent output there (sometimes what the image header said, sometimes the actual filename of the backing image). This inconsistent output was effectively useless, so we have to decide one way or the other. Considering that bs->backing_file usually at runtime contained the path to the image relative to qemu's CWD (or absolute), this patch changes QAPI's backing_file to always report the bs->backing->bs->filename from now on. If you want to receive the image header information, you have to refer to full-backing-filename. This necessitates a change to iotest 228. The interesting information it really wanted is the image header, and it can get that now, but it has to use full-backing-filename instead of backing_file. Because of this patch's changes to bs->backing_file's behavior, we also need some reference output changes. Along with the changes to bs->backing_file, stop updating BDS.backing_format in bdrv_backing_attach() as well. This way, ImageInfo's backing-filename and backing-filename-format fields will represent what the image header says and nothing else. iotest 245 changes in behavior: With the backing node no longer overriding the parent node's backing_file string, you can now omit the @backing option when reopening a node with neither a default nor a current backing file even if it used to have a backing node at some point. 273 also changes: The base image is opened without a format layer, so ImageInfo.backing-filename-format used to report "file" for the base image's overlay after blockdev-snapshot. However, the image header never says "file" anywhere, so it now reports $IMGFMT. Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-09-07 12:31:31 +02:00
Max Reitz	549ec0d978	block: Inline bdrv_co_block_status_from_*() With bdrv_filter_bs(), we can easily handle this default filter behavior in bdrv_co_block_status(). blkdebug wants to have an additional assertion, so it keeps its own implementation, except bdrv_co_block_status_from_file() needs to be inlined there. Suggested-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:31 +02:00
Max Reitz	9a71b9de3f	commit: Deal with filters This includes some permission limiting (for example, we only need to take the RESIZE permission if the base is smaller than the top). Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-09-07 12:31:31 +02:00
Max Reitz	2b088c60bb	backup: Deal with filters Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:31 +02:00
Max Reitz	3f072a7fb7	mirror: Deal with filters This includes some permission limiting (for example, we only need to take the RESIZE permission for active commits where the base is smaller than the top). base_overlay is introduced so we can query bdrv_is_allocated_above() on it - we cannot do that with base itself, because a filter's block_status is the same as its child node, so if there are filters on base, bdrv_is_allocated_above() on base would return information including base. Use this opportunity to rename qmp_drive_mirror()'s "source" BDS to "target_backing_bs", because that is what it really refers to. Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-09-07 12:31:31 +02:00
Max Reitz	c6f6d8462c	block-copy: Use CAF to find sync=top base Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:31 +02:00
Max Reitz	0a7585dbba	block: Use child access functions for QAPI queries query-block, query-named-block-nodes, and query-blockstats now return any filtered child under "backing", not just bs->backing or COW children. This is so that filters do not interrupt the reported backing chain. This changes the output for iotest 184, as the throttled node now appears as a backing child. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:31 +02:00
Max Reitz	3f26191c73	block: Report data child for query-blockstats It makes no sense to report the block stats of a purely metadata-storing child in query-blockstats. So if the primary child does not have any data, try to find a unique data-storing child. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:31 +02:00
Max Reitz	07cd7b659a	block/null: Implement bdrv_get_allocated_file_size It is trivial, so we might as well do it. Remove _filter_actual_image_size from iotest 184, so we get to see the result in its reference output. Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-09-07 12:31:31 +02:00
Max Reitz	c8af87573f	block/snapshot: Fix fallback If the top node's driver does not provide snapshot functionality and we want to fall back to a node down the chain, we need to snapshot all non-COW children. For simplicity's sake, just do not fall back if there is more than one such child. Furthermore, we really only can fall back to bs->file and bs->backing, because bdrv_snapshot_goto() has to modify the child link (notably, set it to NULL). Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:31 +02:00
Max Reitz	c4db2e25df	block: Use CAF in bdrv_co_rw_vmstate() If a node whose driver does not provide VM state functions has a metadata child, the VM state should probably go there; if it is a filter, the VM state should probably go there. It follows that we should generally go down to the primary child. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:31 +02:00
Max Reitz	66b129ac5e	block: Iterate over children in refresh_limits Instead of looking at just bs->file and bs->backing, we should look at all children that could end up receiving forwarded requests. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:31 +02:00
Max Reitz	fb787f02a6	vmdk: Drop vmdk_co_flush() Before HEAD^, we needed this because bdrv_co_flush() by itself would only flush bs->file. With HEAD^, bdrv_co_flush() will flush all children on which a WRITE or WRITE_UNCHANGED permission has been taken. Thus, vmdk no longer needs to do it itself. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:31 +02:00
Max Reitz	883833e29c	block: Flush all children in generic code If the driver does not support .bdrv_co_flush() so bdrv_co_flush() itself has to flush the children of the given node, it should not flush just bs->file->bs, but in fact all children that might have been written to (judging from the permissions taken on them). This is a bug fix for qcow2 images with an external data file, as they so far did not flush that data_file node. In any case, the BLKDBG_EVENT() should be emitted on the primary child, because that is where a blkdebug node would be if there is any. Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:31 +02:00
Max Reitz	23b93525a2	block: Use bdrv_cow_child() in bdrv_co_truncate() The condition modified here is not about potentially filtered children, but only about COW sources (i.e. traditional backing files). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:31 +02:00
Max Reitz	67acfd2188	stream: Deal with filters Because of the (not so recent anymore) changes that make the stream job independent of the base node and instead track the node above it, we have to split that "bottom" node into two cases: The bottom COW node, and the node directly above the base node (which may be an R/W filter or the bottom COW node). Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-09-07 12:31:31 +02:00
Max Reitz	cb8503159a	block: Use CAFs in block status functions Use the child access functions in the block status inquiry functions as appropriate. Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-09-07 12:31:31 +02:00
Max Reitz	93393e698c	block: Use bdrv_filter_(bs\|child) where obvious Places that use patterns like if (bs->drv->is_filter && bs->file) { ... something about bs->file->bs ... } should be BlockDriverState *filtered = bdrv_filter_bs(bs); if (filtered) { ... something about @filtered ... } instead. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:31 +02:00
Max Reitz	4935e8be22	copy-on-read: Support compressed writes Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:31 +02:00
Max Reitz	e7e754aec3	throttle: Support compressed writes Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:31 +02:00
Max Reitz	8b8277cdb0	block: Drop bdrv_is_encrypted() The original purpose of bdrv_is_encrypted() was to inquire whether a BDS can be used without the user entering a password or not. It has not been used for that purpose for quite some time. Actually, it is not even fit for that purpose, because to answer that question, it would have recursively query all of the given node's children. So now we have to decide in which direction we want to fix bdrv_is_encrypted(): Recursively query all children, or drop it and just use bs->encrypted to get the current node's status? Nowadays, its only purpose is to report through bdrv_query_image_info() whether the given image is encrypted or not. For this purpose, it is probably more interesting to see whether a given node itself is encrypted or not (otherwise, a management application cannot discern for certain which nodes are really encrypted and which just have encrypted children). Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:30 +02:00
Philippe Mathieu-Daudé	b111b3fcde	block/nvme: Use an array of EventNotifier In preparation of using multiple IRQ (thus multiple eventfds) make BDRVNVMeState::irq_notifier an array (for now of a single element, the admin queue notifier). Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200821195359.1285345-16-philmd@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:30 +02:00
Philippe Mathieu-Daudé	7a1fb2ef40	block/nvme: Extract nvme_poll_queue() As we want to do per-queue polling, extract the nvme_poll_queue() method which operates on a single queue. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200821195359.1285345-15-philmd@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:30 +02:00
Philippe Mathieu-Daudé	0a28b02ef9	block/nvme: Simplify nvme_create_queue_pair() arguments nvme_create_queue_pair() doesn't require BlockDriverState anymore. Replace it by BDRVNVMeState and AioContext to simplify. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200821195359.1285345-14-philmd@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:30 +02:00
Philippe Mathieu-Daudé	073a06978c	block/nvme: Replace BDRV_POLL_WHILE by AIO_WAIT_WHILE BDRV_POLL_WHILE() is defined as: #define BDRV_POLL_WHILE(bs, cond) ({ \ BlockDriverState *bs_ = (bs); \ AIO_WAIT_WHILE(bdrv_get_aio_context(bs_), \ cond); }) As we will remove the BlockDriverState use in the next commit, start by using the exploded version of BDRV_POLL_WHILE(). Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200821195359.1285345-13-philmd@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:30 +02:00
Philippe Mathieu-Daudé	3a6d34d066	block/nvme: Simplify nvme_init_queue() arguments nvme_init_queue() doesn't require BlockDriverState anymore. Replace it by BDRVNVMeState to simplify. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200821195359.1285345-12-philmd@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:30 +02:00
Philippe Mathieu-Daudé	38e1f8186f	block/nvme: Replace qemu_try_blockalign(bs) by qemu_try_memalign(pg_sz) qemu_try_blockalign() is a generic API that call back to the block driver to return its page alignment. As we call from within the very same driver, we already know to page alignment stored in our state. Remove indirections and use the value from BDRVNVMeState. This change is required to later remove the BlockDriverState argument, to make nvme_init_queue() per hardware, and not per block driver. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200821195359.1285345-11-philmd@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:30 +02:00
Philippe Mathieu-Daudé	2ed846930d	block/nvme: Replace qemu_try_blockalign0 by qemu_try_blockalign/memset In the next commit we'll get rid of qemu_try_blockalign(). To ease review, first replace qemu_try_blockalign0() by explicit calls to qemu_try_blockalign() and memset(). Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200821195359.1285345-10-philmd@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:31:30 +02:00
Philippe Mathieu-Daudé	7d3b214ae4	block/nvme: Use union of NvmeIdCtrl / NvmeIdNs structures We allocate an unique chunk of memory then use it for two different structures. By using an union, we make it clear the data is overlapping (and we can remove the casts). Suggested-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200821195359.1285345-9-philmd@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:24:53 +02:00
Philippe Mathieu-Daudé	4d98093937	block/nvme: Rename local variable We are going to modify the code in the next commit. Renaming the 'resp' variable to 'id' first makes the next commit easier to review. No logical changes. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200821195359.1285345-8-philmd@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:23:55 +02:00
Philippe Mathieu-Daudé	c8edbfb2cc	block/nvme: Use common error path in nvme_add_io_queue() Rearrange nvme_add_io_queue() by using a common error path. This will be proven useful in few commits where we add IRQ notification to the IO queues. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200821195359.1285345-7-philmd@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:23:55 +02:00
Philippe Mathieu-Daudé	bf6ce5ec6d	block/nvme: Improve error message when IO queue creation failed Do not use the same error message for different failures. Display a different error whether it is the CQ or the SQ. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200821195359.1285345-6-philmd@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:23:55 +02:00
Philippe Mathieu-Daudé	73159e52e6	block/nvme: Define INDEX macros to ease code review Use definitions instead of '0' or '1' indexes. Also this will be useful when using multi-queues later. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200821195359.1285345-5-philmd@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:23:55 +02:00
Philippe Mathieu-Daudé	0ea45f76eb	block/nvme: Let nvme_create_queue_pair() fail gracefully As nvme_create_queue_pair() is allowed to fail, replace the alloc() calls by try_alloc() to avoid aborting QEMU. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200821195359.1285345-4-philmd@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:23:55 +02:00
Philippe Mathieu-Daudé	e266f52cfb	block/nvme: Avoid further processing if trace event not enabled Avoid further processing if TRACE_NVME_SUBMIT_COMMAND_RAW is not enabled. This is an untested intend of performance optimization. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200821195359.1285345-3-philmd@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:23:55 +02:00
Philippe Mathieu-Daudé	e4f310fe7f	block/nvme: Replace magic value by SCALE_MS definition Use self-explicit SCALE_MS definition instead of magic value. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200821195359.1285345-2-philmd@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-09-07 12:23:55 +02:00
Peter Maydell	df8176274a	nbd patches for 2020-09-02 - fix a few iotests affected by earlier nbd changes - avoid blocking qemu by nbd client in connect() - build qemu-nbd for mingw -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEccLMIrHEYCkn0vOqp6FrSiUnQ2oFAl9QFB8ACgkQp6FrSiUn Q2pTwggArPN7dRwm1KD9jL8X6PV01uhLuLRFzrrofFX22Blroj5XPbR24BdKeN8V uDSFGzzoz2Vx3vKPHyZdx7bbeEL0pF9dzjZX6JwzT0McbVuge5aG2zC/ARdfc4PN 1Yf2FB/nY8Xt5G12usu3FIz7JBoQNm4mlPnqVqf7t0LQxgUFvO7F2LernyqEOYKS uSpXHNFqddZcax7etXeldJOlSGJBQTaCmplrJbw2ilVhLZJD+0OglY4SAsrrenN+ gb/KD4REjhtQsmoTaqthGCnGXipEoJYDfEOMhbkl0UK+9Mx0t+3cj3tflG0eGldM ERk1d4d7VSlyqIy7w43v3+IB8M99ag== =7Cn5 -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/ericb/tags/pull-nbd-2020-09-02' into staging nbd patches for 2020-09-02 - fix a few iotests affected by earlier nbd changes - avoid blocking qemu by nbd client in connect() - build qemu-nbd for mingw # gpg: Signature made Wed 02 Sep 2020 22:52:31 BST # gpg: using RSA key 71C2CC22B1C4602927D2F3AAA7A16B4A2527436A # gpg: Good signature from "Eric Blake <eblake@redhat.com>" [full] # gpg: aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>" [full] # gpg: aka "[jpeg image of size 6874]" [full] # Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2 F3AA A7A1 6B4A 2527 436A * remotes/ericb/tags/pull-nbd-2020-09-02: nbd: disable signals and forking on Windows builds nbd: skip SIGTERM handler if NBD device support is not built block: add missing socket_init() calls to tools block/nbd: use non-blocking connect: fix vm hang on connect() iotests/259: Fix reference output iotests/059: Fix reference output Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-09-03 21:35:01 +01:00
Vladimir Sementsov-Ogievskiy	1dc4718d84	block/nbd: use non-blocking connect: fix vm hang on connect() This makes nbd's connection_co yield during reconnects, so that reconnect doesn't block the main thread. This is very important in case of an unavailable nbd server host: connect() call may take a long time, blocking the main thread (and due to reconnect, it will hang again and again with small gaps of working time during pauses between connection attempts). Realization notes: - We don't want to implement non-blocking connect() over non-blocking socket, because getaddrinfo() doesn't have portable non-blocking realization anyway, so let's just use a thread for both getaddrinfo() and connect(). - We can't use qio_channel_socket_connect_async (which behaves similarly and starts a thread to execute connect() call), as it's relying on someone iterating main loop (g_main_loop_run() or something like this), which is not always the case. - We can't use thread_pool_submit_co API, as thread pool waits for all threads to finish (but we don't want to wait for blocking reconnect attempt on shutdown. So, we just create the thread by hand. Some additional difficulties are: - We want our connect to avoid blocking drained sections and aio context switches. To achieve this, we make it possible to "cancel" synchronous wait for the connect (which is a coroutine yield actually), still, the thread continues in background, and if successful, its result may be reused on next reconnect attempt. - We don't want to wait for reconnect on shutdown, so there is CONNECT_THREAD_RUNNING_DETACHED thread state, which means that the block layer is no longer interested in a result, and thread should close new connected socket on finish and free the state. How to reproduce the bug, fixed with this commit: 1. Create an image on node1: qemu-img create -f qcow2 xx 100M 2. Start NBD server on node1: qemu-nbd xx 3. Start vm with second nbd disk on node2, like this: ./x86_64-softmmu/qemu-system-x86_64 -nodefaults -drive \ file=/work/images/cent7.qcow2 -drive file=nbd+tcp://192.168.100.2 \ -vnc :0 -qmp stdio -m 2G -enable-kvm -vga std 4. Access the vm through vnc (or some other way?), and check that NBD drive works: dd if=/dev/sdb of=/dev/null bs=1M count=10 - the command should succeed. 5. Now, let's trigger nbd-reconnect loop in Qemu process. For this: 5.1 Kill NBD server on node1 5.2 run "dd if=/dev/sdb of=/dev/null bs=1M count=10" in the guest again. The command should fail and a lot of error messages about failing disk may appear as well. Now NBD client driver in Qemu tries to reconnect. Still, VM works well. 6. Make node1 unavailable on NBD port, so connect() from node2 will last for a long time: On node1 (Note, that 10809 is just a default NBD port): sudo iptables -A INPUT -p tcp --dport 10809 -j DROP After some time the guest hangs, and you may check in gdb that Qemu hangs in connect() call, issued from the main thread. This is the BUG. 7. Don't forget to drop iptables rule from your node1: sudo iptables -D INPUT -p tcp --dport 10809 -j DROP Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200812145237.4396-1-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: minor wording and formatting tweaks] Signed-off-by: Eric Blake <eblake@redhat.com>	2020-09-02 16:47:23 -05:00
Peter Maydell	e4d8b7c1a9	qemu-nvme -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE28EdLTc7SjdV9QLsYlFWYQpPbMAFAl9Pro4ACgkQYlFWYQpP bMC1CRAAkawTI4mMcOfI3smFoMeiY8kZJWXJUBXfHbMJ4asaoIjTkH/lXRXBw7KQ sH5tB9CuOums3VjagkZ0Sw6R/kP1LbyJTAwq/pwOXwYRDc/E3zQpMblkIHH1boIM Bxl814hw3hBqV+D0wgeKpl83lbiOpd10Cbpdb/xNKat6qVquLGurSGKgA7jNuF4s oPTPtfZpyH9LUr4DV+sL+fGX6vaCdSFZPZUhJqwFfx79+r3+YiHGLAE6fgsdGDJt 2RSSKMqBe2gg0BY5ToW9L55BsLnwMMrAZnGzEkeZvRKqm0JZBXQsERa61p4VEAJf uYkSEqOwsKjXQNTdDEekyH67AkgXaoqG0hiiOcgoLsla7C0zROtoKcfVM/+WC0LT T0/bfgubmoDV8kLzPuOV8xOGxjfbu4Qlxy1JsIC6BU4zBQvpDwOeTx3MUWaCUfvk YmDMEhZWGcZ3RBLrgQmzm4ZKMtGdYXnGQz5dwVkRRfghQs2fl5ZmUjGR7MqKe18n 4K0nzhPiXbOTlqvLVvzVlrBzdc8ECAs1kVoJF7C3LwRmXbT2N/fUhZP/nYpeM2Hj DQNmA8KpXMKae2+2iDnQNWbvdpz3SiHD6dK7A1bEsdoG0L60xfyeAF+JuPiESUnd OAhf+muxKiInv2k5GNh7mDZPWM6nDepf/PZP6ohc7dKxVam7N2M= =Y23H -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/nvme/tags/pull-nvme-20200902' into staging qemu-nvme # gpg: Signature made Wed 02 Sep 2020 15:39:10 BST # gpg: using RSA key DBC11D2D373B4A3755F502EC625156610A4F6CC0 # gpg: Good signature from "Keith Busch <kbusch@kernel.org>" [unknown] # gpg: aka "Keith Busch <keith.busch@gmail.com>" [unknown] # gpg: aka "Keith Busch <keith.busch@intel.com>" [unknown] # gpg: WARNING: This key is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: DBC1 1D2D 373B 4A37 55F5 02EC 6251 5661 0A4F 6CC0 * remotes/nvme/tags/pull-nvme-20200902: (39 commits) hw/block/nvme: remove explicit qsg/iov parameters hw/block/nvme: use preallocated qsg/iov in nvme_dma_prp hw/block/nvme: consolidate qsg/iov clearing hw/block/nvme: add ns/cmd references in NvmeRequest hw/block/nvme: be consistent about zeros vs zeroes hw/block/nvme: add check for mdts hw/block/nvme: refactor request bounds checking hw/block/nvme: verify validity of prp lists in the cmb hw/block/nvme: add request mapping helper hw/block/nvme: add tracing to nvme_map_prp hw/block/nvme: refactor dma read/write hw/block/nvme: destroy request iov before reuse hw/block/nvme: remove redundant has_sg member hw/block/nvme: replace dma_acct with blk_acct equivalent hw/block/nvme: add mapping helpers hw/block/nvme: memset preallocated requests structures hw/block/nvme: bump supported version to v1.3 hw/block/nvme: provide the mandatory subnqn field hw/block/nvme: enforce valid queue creation sequence hw/block/nvme: reject invalid nsid values in active namespace id list ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-09-02 21:20:20 +01:00
Klaus Jensen	69265150aa	hw/block/nvme: be consistent about zeros vs zeroes The NVM Express specification generally uses 'zeroes' and not 'zeros', so let us align with it. Cc: Fam Zheng <fam@euphon.net> Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>	2020-09-02 08:48:50 +02:00
Klaus Jensen	c26f217370	hw/block/nvme: bump spec data structures to v1.3 Add missing fields in the Identify Controller and Identify Namespace data structures to bring them in line with NVMe v1.3. This also adds data structures and defines for SGL support which requires a couple of trivial changes to the nvme block driver as well. Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Acked-by: Fam Zheng <fam@euphon.net> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Message-Id: <20200706061303.246057-2-its@irrelevant.dk>	2020-09-02 08:48:50 +02:00
Peter Maydell	887adde81d	meson fixes: * bump submodule to 0.55.1 * SDL, pixman and zlib fixes * firmwarepath fix * fix firmware builds meson related: * move install to Meson * move NSIS to Meson * do not make meson use cmake * add description to options -----BEGIN PGP SIGNATURE----- iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAl9OcpcUHHBib256aW5p QHJlZGhhdC5jb20ACgkQv/vSX3jHroM9+Af+InfjEFtsoubgQA2L7B1sHeksINOI nMAw9plmJSX0Qabp0PJDrMcBiLZlbdFGCm/88heTyDGRtbhIXLUrE2J0dyV4R4nR OSyuTgDna75QLxy6k1dIh6qVAtcj2hg+CxaJrUf2Ix6A1d1PAfWoweNXSF1LBmJf pyQ39eZXStI7/bkmwgTY3qK1gwjEaskvf68fTp1hgTN0VHwUb23/nucPaizbVF+a A0nWprPQaTjWhAFQ2jJesBbhN6FtD3EnOZs56JOSQ9J7W6uDnw8a+tOdEDbSYKe2 DdEUEjcz2cqAMCiWrzOMgxB8T885H+8yE6jgEmPHWaUUORQADMcBDMihNA== =NGLI -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/bonzini-gitlab/tags/for-upstream' into staging meson fixes: * bump submodule to 0.55.1 * SDL, pixman and zlib fixes * firmwarepath fix * fix firmware builds meson related: * move install to Meson * move NSIS to Meson * do not make meson use cmake * add description to options # gpg: Signature made Tue 01 Sep 2020 17:11:03 BST # gpg: using RSA key F13338574B662389866C7682BFFBD25F78C7AE83 # gpg: issuer "pbonzini@redhat.com" # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full] # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [full] # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * remotes/bonzini-gitlab/tags/for-upstream: (26 commits) Makefile: Fix in-tree clean/distclean Makefile: Add back TAGS/ctags/cscope rules meson: add description to options build: fix recurse-all target meson: use pkg-config method to find dependencies configure: do not include ${prefix} in firmwarepath meson: add pixman dependency to UI modules meson: add pixman dependency to chardev/baum module meson: add NSIS building meson: use meson mandir instead of qemu_mandir meson: pass docdir option meson: use meson datadir instead of qemu_datadir meson: pass qemu_suffix option configure: build docdir like other suffixed directories configure: always /-seperate directory from qemu_suffix configure: rename confsuffix option meson: move zlib detection to meson build-sys: remove install target from Makefile meson: install $localstatedir/run for qga meson: install desktop file ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-09-01 22:50:23 +01:00
Liao Pingfang	f181ab4ba5	block/vmdk: Remove superfluous breaks Remove superfluous breaks, as there is a "return" before them. Signed-off-by: Liao Pingfang <liao.pingfang@zte.com.cn> Signed-off-by: Yi Wang <wang.yi59@zte.com.cn> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Thomas Huth <thuth@redhat.com> Message-Id: <1594631107-36574-1-git-send-email-wang.yi59@zte.com.cn> Signed-off-by: Laurent Vivier <laurent@vivier.eu>	2020-09-01 08:37:28 +02:00
Paolo Bonzini	a10c8516ed	block: always link with zlib The qcow2 driver needs the zlib dependency. While emulators provided it through the migration code, this is not true of the tools. Move the dependency from the qcow1 rule directly into block_ss so that it is included unconditionally. Fixes build with --disable-qcow1. Reported-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Cc: qemu-block@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-09-01 01:51:51 -04:00
Eduardo Habkost	7c9dcd6cab	throttle-groups: Move ThrottleGroup typedef to header Move typedef closer to the type check macros, to make it easier to convert the code to OBJECT_DEFINE_TYPE() in the future. Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Tested-By: Roman Bolshakov <r.bolshakov@yadro.com> Message-Id: <20200825192110.3528606-17-ehabkost@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2020-08-27 14:04:54 -04:00
Alberto Garcia	7bbb59202a	qcow2: Assert that expand_zero_clusters_in_l1() does not support subclusters This function is only used by qcow2_expand_zero_clusters() to downgrade a qcow2 image to a previous version. This would require transforming all extended L2 entries into normal L2 entries but this is not a simple task and there are no plans to implement this at the moment. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <15e65112b4144381b4d8c0bdf8fb76b0d813e3d1.1594396418.git.berto@igalia.com> [mreitz: Fixed comment style] Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 10:20:15 +02:00
Alberto Garcia	2118771ddf	qcow2: Allow preallocation and backing files if extended_l2 is set Traditional qcow2 images don't allow preallocation if a backing file is set. This is because once a cluster is allocated there is no way to tell that its data should be read from the backing file. Extended L2 entries have individual allocation bits for each subcluster, and therefore it is perfectly possible to have an allocated cluster with all its subclusters unallocated. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <6d5b0f38e7dc5f2f31d8cab1cb92044e9909aece.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 09:20:04 +02:00
Alberto Garcia	7be2025258	qcow2: Add the 'extended_l2' option and the QCOW2_INCOMPAT_EXTL2 bit Now that the implementation of subclusters is complete we can finally add the necessary options to create and read images with this feature, which we call "extended L2 entries". Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <6476caaa73216bd05b7bb2d504a20415e1665176.1594396418.git.berto@igalia.com> [mreitz: %s/5\.1/5.2/; fixed 302's and 303's reference output] Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 09:19:55 +02:00
Alberto Garcia	40dee94320	qcow2: Add prealloc field to QCowL2Meta This field allows us to indicate that the L2 metadata update does not come from a write request with actual data but from a preallocation request. For traditional images this does not make any difference, but for images with extended L2 entries this means that the clusters are allocated normally in the L2 table but individual subclusters are marked as unallocated. This will allow preallocating images that have a backing file. There is one special case: when we resize an existing image we can also request that the new clusters are preallocated. If the image already had a backing file then we have to hide any possible stale data and zero out the new clusters (see commit `955c7d6687` for more details). In this case the subclusters cannot be left as unallocated so the L2 bitmap must be updated. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <960d4c444a4f5a870e2b47e5da322a73cd9a2f5a.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	0dd07b298f	qcow2: Add subcluster support to qcow2_measure() Extended L2 entries are bigger than normal L2 entries so this has an impact on the amount of metadata needed for a qcow2 file. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <7efae2efd5e36b42d2570743a12576d68ce53685.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	a6841a2de6	qcow2: Add subcluster support to qcow2_co_pwrite_zeroes() This works now at the subcluster level and pwrite_zeroes_alignment is updated accordingly. qcow2_cluster_zeroize() is turned into qcow2_subcluster_zeroize() with the following changes: - The request can now be subcluster-aligned. - The cluster-aligned body of the request is still zeroized using zero_in_l2_slice() as before. - The subcluster-aligned head and tail of the request are zeroized with the new zero_l2_subclusters() function. There is just one thing to take into account for a possible future improvement: compressed clusters cannot be partially zeroized so zero_l2_subclusters() on the head or the tail can return -ENOTSUP. This makes the caller repeat the complete request and write actual zeroes to disk. This is sub-optimal because 1) if the head area was compressed we would still be able to use the fast path for the body and possibly the tail. 2) if the tail area was compressed we are writing zeroes to the head and the body areas, which are already zeroized. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <17e05e2ee7e12f10dcf012da81e83ebe27eb3bef.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	bf4a66eed4	qcow2: Add subcluster support to handle_alloc_space() The bdrv_co_pwrite_zeroes() call here fills complete clusters with zeroes, but it can happen that some subclusters are not part of the write request or the copy-on-write. This patch makes sure that only the affected subclusters are overwritten. A potential improvement would be to also fill with zeroes the other subclusters if we can guarantee that we are not overwriting existing data. However this would waste more disk space, so we should first evaluate if it's really worth doing. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <b3dc97e8e2240ddb5191a4f930e8fc9653f94621.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	ff4cdec7f6	qcow2: Clear the L2 bitmap when allocating a compressed cluster Compressed clusters always have the bitmap part of the extended L2 entry set to 0. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <04455b3de5dfeb9d1cfe1fc7b02d7060a6e09710.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	aca00cd971	qcow2: Update L2 bitmap in qcow2_alloc_cluster_link_l2() The L2 bitmap needs to be updated after each write to indicate what new subclusters are now allocated. This needs to happen even if the cluster was already allocated and the L2 entry was otherwise valid. In some cases however a write operation doesn't need change the L2 bitmap (because all affected subclusters were already allocated). This is detected in calculate_l2_meta(), and qcow2_alloc_cluster_link_l2() is never called in those cases. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <0875620d49f44320334b6a91c73b3f301f975f38.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	fc2e6528d5	qcow2: Add subcluster support to check_refcounts_l2() The offset field of an uncompressed cluster's L2 entry must be aligned to the cluster size, otherwise it is invalid. If the cluster has no data then it means that the offset points to a preallocation, so we can clear the offset field without affecting the guest-visible data. This is what 'qemu-img check' does when run in repair mode. On traditional qcow2 images this can only happen when QCOW_OFLAG_ZERO is set, and repairing such entries turns the clusters from ZERO_ALLOC into ZERO_PLAIN. Extended L2 entries have no ZERO_ALLOC clusters and no QCOW_OFLAG_ZERO but the idea is the same: if none of the subclusters are allocated then we can clear the offset field and leave the bitmap untouched. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <9f4ed1d0a34b0a545b032c31ecd8c14734065342.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	a68cd70326	qcow2: Add subcluster support to discard_in_l2_slice() Two things need to be taken into account here: 1) With full_discard == true the L2 entry must be cleared completely. This also includes the L2 bitmap if the image has extended L2 entries. 2) With full_discard == false we have to make the discarded cluster read back as zeroes. With normal L2 entries this is done with the QCOW_OFLAG_ZERO bit, whereas with extended L2 entries this is done with the individual 'all zeroes' bits for each subcluster. Note however that QCOW_OFLAG_ZERO is not supported in v2 qcow2 images so, if there is a backing file, discard cannot guarantee that the image will read back as zeroes. If this is important for the caller it should forbid it as qcow2_co_pdiscard() does (see `80f5c01183` for more details). Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <5ef8274e628aa3ab559bfac467abf488534f2b76.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	205fa50750	qcow2: Add subcluster support to zero_in_l2_slice() The QCOW_OFLAG_ZERO bit that indicates that a cluster reads as zeroes is only used in standard L2 entries. Extended L2 entries use individual 'all zeroes' bits for each subcluster. This must be taken into account when updating the L2 entry and also when deciding that an existing entry does not need to be updated. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <b61d61606d8c9b367bd641ab37351ddb9172799a.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	3f9c6b3b1f	qcow2: Add subcluster support to qcow2_get_host_offset() The logic of this function remains pretty much the same, except that it uses count_contiguous_subclusters(), which combines the logic of count_contiguous_clusters() / count_contiguous_clusters_unallocated() and checks individual subclusters. qcow2_cluster_to_subcluster_type() is not necessary as a separate function anymore so it's inlined into its caller. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <d2193fd48653a350d80f0eca1c67b1d9053fb2f3.1594396418.git.berto@igalia.com> [mreitz: Initialize expected_type to anything] Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	d53ec3d8d8	qcow2: Add subcluster support to calculate_l2_meta() If an image has subclusters then there are more copy-on-write scenarios that we need to consider. Let's say we have a write request from the middle of subcluster #3 until the end of the cluster: 1) If we are writing to a newly allocated cluster then we need copy-on-write. The previous contents of subclusters #0 to #3 must be copied to the new cluster. We can optimize this process by skipping all leading unallocated or zero subclusters (the status of those skipped subclusters will be reflected in the new L2 bitmap). 2) If we are overwriting an existing cluster: 2.1) If subcluster #3 is unallocated or has the all-zeroes bit set then we need copy-on-write (on subcluster #3 only). 2.2) If subcluster #3 was already allocated then there is no need for any copy-on-write. However we still need to update the L2 bitmap to reflect possible changes in the allocation status of subclusters #4 to #31. Because of this, this function checks if all the overwritten subclusters are already allocated and in this case it returns without creating a new QCowL2Meta structure. After all these changes l2meta_cow_start() and l2meta_cow_end() are not necessarily cluster-aligned anymore. We need to update the calculation of old_start and old_end in handle_dependencies() to guarantee that no two requests try to write on the same cluster. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <4292dd56e4446d386a2fe307311737a711c00708.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	97490a143e	qcow2: Handle QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC When dealing with subcluster types there is a new value called QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC that has no equivalent in QCow2ClusterType. This patch handles that value in all places where subcluster types are processed. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <bf09e2e2439a468a901bb96ace411eed9ee50295.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	10dabdc596	qcow2: Replace QCOW2_CLUSTER_* with QCOW2_SUBCLUSTER_* In order to support extended L2 entries some functions of the qcow2 driver need to start dealing with subclusters instead of clusters. qcow2_get_host_offset() is modified to return the subcluster type instead of the cluster type, and all callers are updated to replace all values of QCow2ClusterType with their QCow2SubclusterType equivalents. This patch only changes the data types, there are no semantic changes. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <f6c29737c295f32cbee74c903c30b01820363b34.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	ca4a0bb81b	qcow2: Add cluster type parameter to qcow2_get_host_offset() This function returns an integer that can be either an error code or a cluster type (a value from the QCow2ClusterType enum). We are going to start using subcluster types instead of cluster types in some functions so it's better to use the exact data types instead of integers for clarity and in order to detect errors more easily. This patch makes qcow2_get_host_offset() return 0 on success and puts the returned cluster type in a separate parameter. There are no semantic changes. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <396b6eab1859a271551dcd7dcba77f8934aa3c3f.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	c94d037825	qcow2: Add qcow2_cluster_is_allocated() This helper function tells us if a cluster is allocated (that is, there is an associated host offset for it). Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <6d8771c5c79cbdc6c519875a5078e1cc85856d63.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	70d1cbae03	qcow2: Add qcow2_get_subcluster_range_type() There are situations in which we want to know how many contiguous subclusters of the same type there are in a given cluster. This can be done by simply iterating over the subclusters and repeatedly calling qcow2_get_subcluster_type() for each one of them. However once we determined the type of a subcluster we can check the rest efficiently by counting the number of adjacent ones (or zeroes) in the bitmap. This is what this function does. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <db917263d568ec6ffb4a41cac3c9100f96bf6c18.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	34905d8eb1	qcow2: Add QCow2SubclusterType and qcow2_get_subcluster_type() This patch adds QCow2SubclusterType, which is the subcluster-level version of QCow2ClusterType. All QCOW2_SUBCLUSTER_* values have the the same meaning as their QCOW2_CLUSTER_* equivalents (when they exist). See below for details and caveats. In images without extended L2 entries clusters are treated as having exactly one subcluster so it is possible to replace one data type with the other while keeping the exact same semantics. With extended L2 entries there are new possible values, and every subcluster in the same cluster can obviously have a different QCow2SubclusterType so functions need to be adapted to work on the subcluster level. There are several things that have to be taken into account: a) QCOW2_SUBCLUSTER_COMPRESSED means that the whole cluster is compressed. We do not support compression at the subcluster level. b) There are two different values for unallocated subclusters: QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN which means that the whole cluster is unallocated, and QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC which means that the cluster is allocated but the subcluster is not. The latter can only happen in images with extended L2 entries. c) QCOW2_SUBCLUSTER_INVALID is used to detect the cases where an L2 entry has a value that violates the specification. The caller is responsible for handling these situations. To prevent compatibility problems with images that have invalid values but are currently being read by QEMU without causing side effects, QCOW2_SUBCLUSTER_INVALID is only returned for images with extended L2 entries. qcow2_cluster_to_subcluster_type() is added as a separate function from qcow2_get_subcluster_type(), but this is only temporary and both will be merged in a subsequent patch. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <26ef38e270f25851c98b51278852b4c4a7f97e69.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	39a9f0a50e	qcow2: Update get/set_l2_entry() and add get/set_l2_bitmap() Extended L2 entries are 128-bit wide: 64 bits for the entry itself and 64 bits for the subcluster allocation bitmap. In order to support them correctly get/set_l2_entry() need to be updated so they take the entry width into account in order to calculate the correct offset. This patch also adds the get/set_l2_bitmap() functions that are used to access the bitmaps. For convenience we allow calling get_l2_bitmap() on images without subclusters. In this case the returned value is always 0 and has no meaning. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <6ee0f81ae3329c991de125618b3675e1e46acdbb.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	c8fd8554d9	qcow2: Add l2_entry_size() qcow2 images with subclusters have 128-bit L2 entries. The first 64 bits contain the same information as traditional images and the last 64 bits form a bitmap with the status of each individual subcluster. Because of that we cannot assume that L2 entries are sizeof(uint64_t) anymore. This function returns the proper value for the image. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <d34d578bd0380e739e2dde3e8dd6187d3d249fa9.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	3e71981592	qcow2: Add offset_into_subcluster() and size_to_subclusters() Like offset_into_cluster() and size_to_clusters(), but for subclusters. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <3cc2390dcdef3d234d47c741b708bd8734490862.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	a53e8b7202	qcow2: Add offset_to_sc_index() For a given offset, return the subcluster number within its cluster (i.e. with 32 subclusters per cluster it returns a number between 0 and 31). Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <56e3e4ac0d827c6a2f5f259106c5ddb7c4ca2653.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	d0346b5591	qcow2: Add subcluster-related fields to BDRVQcow2State This patch adds the following new fields to BDRVQcow2State: - subclusters_per_cluster: Number of subclusters in a cluster - subcluster_size: The size of each subcluster, in bytes - subcluster_bits: No. of bits so 1 << subcluster_bits = subcluster_size Images without subclusters are treated as if they had exactly one subcluster per cluster (i.e. subcluster_size = cluster_size). Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <55bfeac86b092fa2c9d182a95cbeb479ff7eca4f.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	a3c7d91625	qcow2: Add dummy has_subclusters() function This function will be used by the qcow2 code to check if an image has subclusters or not. At the moment this simply returns false. Once all patches needed for subcluster support are ready then QEMU will be able to create and read images with subclusters and this function will return the actual value. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <905526221083581a1b7057bca1585487661c5c13.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	12c6aebedf	qcow2: Add get_l2_entry() and set_l2_entry() The size of an L2 entry is 64 bits, but if we want to have subclusters we need extended L2 entries. This means that we have to access L2 tables and slices differently depending on whether an image has extended L2 entries or not. This patch replaces all l2_slice[] accesses with calls to get_l2_entry() and set_l2_entry(). Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <9586363531fec125ba1386e561762d3e4224e9fc.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	57538c864f	qcow2: Process QCOW2_CLUSTER_ZERO_ALLOC clusters in handle_copied() When writing to a qcow2 file there are two functions that take a virtual offset and return a host offset, possibly allocating new clusters if necessary: - handle_copied() looks for normal data clusters that are already allocated and have a reference count of 1. In those clusters we can simply write the data and there is no need to perform any copy-on-write. - handle_alloc() looks for clusters that do need copy-on-write, either because they haven't been allocated yet, because their reference count is != 1 or because they are ZERO_ALLOC clusters. The ZERO_ALLOC case is a bit special because those are clusters that are already allocated and they could perfectly be dealt with in handle_copied() (as long as copy-on-write is performed when required). In fact, there is extra code specifically for them in handle_alloc() that tries to reuse the existing allocation if possible and frees them otherwise. This patch changes the handling of ZERO_ALLOC clusters so the semantics of these two functions are now like this: - handle_copied() looks for clusters that are already allocated and which we can overwrite (NORMAL and ZERO_ALLOC clusters with a reference count of 1). - handle_alloc() looks for clusters for which we need a new allocation (all other cases). One important difference after this change is that clusters found in handle_copied() may now require copy-on-write, but this will be necessary anyway once we add support for subclusters. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <eb17fc938f6be7be2e8d8ff42763d2c19241f866.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	c1587d877e	qcow2: Split cluster_needs_cow() out of count_cow_clusters() We are going to need it in other places. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <65e5d9627ca2ebe7e62deaeddf60949c33067d9d.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	8f91d6906c	qcow2: Add calculate_l2_meta() handle_alloc() creates a QCowL2Meta structure in order to update the image metadata and perform the necessary copy-on-write operations. This patch moves that code to a separate function so it can be used from other places. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <e5bc4a648dac31972bfa7a0e554be8064be78799.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	388e581615	qcow2: Convert qcow2_get_cluster_offset() into qcow2_get_host_offset() qcow2_get_cluster_offset() takes an (unaligned) guest offset and returns the (aligned) offset of the corresponding cluster in the qcow2 image. In practice none of the callers need to know where the cluster starts so this patch makes the function calculate and return the final host offset directly. The function is also renamed accordingly. There is a pre-existing exception with compressed clusters: in this case the function returns the complete cluster descriptor (containing the offset and size of the compressed data). This does not change with this patch but it is now documented. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <ffae6cdc5ca8950e8280ac0f696dcc376cb07095.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Alberto Garcia	9c4269d54b	qcow2: Make Qcow2AioTask store the full host offset The file_cluster_offset field of Qcow2AioTask stores a cluster-aligned host offset. In practice this is not very useful because all users() of this structure need the final host offset into the cluster, which they calculate using host_offset = file_cluster_offset + offset_into_cluster(s, offset) There is no reason why Qcow2AioTask cannot store host_offset directly and that is what this patch does. () compressed clusters are the exception: in this case what file_cluster_offset was storing was the full compressed cluster descriptor (offset + size). This does not change with this patch but it is documented now. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <07c4b15c644dcf06c9459f98846ac1c4ea96e26f.1594396418.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-25 08:33:20 +02:00
Marc-André Lureau	5e5733e599	meson: convert block Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-08-21 06:30:18 -04:00
Paolo Bonzini	243af0225a	trace: switch position of headers to what Meson requires Meson doesn't enjoy the same flexibility we have with Make in choosing the include path. In particular the tracing headers are using $(build_root)/$(<D). In order to keep the include directives unchanged, the simplest solution is to generate headers with patterns like "trace/trace-audio.h" and place forwarding headers in the source tree such that for example "audio/trace.h" includes "trace/trace-audio.h". This patch is too ugly to be applied to the Makefiles now. It's only a way to separate the changes to the tracing header files from the Meson rewrite of the tracing logic. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-08-21 06:18:24 -04:00
Stefan Reiter	7661a886a1	block/block-copy: always align copied region to cluster size Since commit `42ac214406` (block/block-copy: refactor task creation) block_copy_task_create calculates the area to be copied via bdrv_dirty_bitmap_next_dirty_area, but that can return an unaligned byte count if the image's last cluster end is not aligned to the bitmap's granularity. Always ALIGN_UP the resulting bytes value to satisfy block_copy_do_copy, which requires the 'bytes' parameter to be aligned to cluster size. Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Stefan Reiter <s.reiter@proxmox.com> Message-Id: <20200810095523.15071-1-s.reiter@proxmox.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-08-10 17:12:46 +02:00
Tuguoyi	348fcc4f7a	qcow2-cluster: Fix integer left shift error in qcow2_alloc_cluster_link_l2() When calculating the offset, the result of left shift operation will be promoted to type int64 automatically because the left operand of + operator is uint64_t. but the result after integer promotion may be produce an error value for us and trigger the following asserting error. For example, consider i=0x2000, cluster_bits=18, the result of left shift operation will be 0x80000000. Cause argument i is of signed integer type, the result is automatically promoted to 0xffffffff80000000 which is not we expected The way to trigger the assertion error: qemu-img create -f qcow2 -o preallocation=full,cluster_size=256k tmpdisk 10G This patch fix it by casting @i to uint64_t before doing left shift operation Signed-off-by: Guoyi Tu <tu.guoyi@h3c.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 81ba90fe0c014f269621c283269b42ad@h3c.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-08-05 14:56:11 +01:00
Max Reitz	fe16c7ddf8	qcow2: Release read-only bitmaps when inactivated During migration, we release all bitmaps after storing them on disk, as long as they are (1) stored on disk, (2) not read-only, and (3) consistent. (2) seems arbitrary, though. The reason we do not release them is because we do not write them, as there is no need to; and then we just forget about all bitmaps that we have not written to the file. However, read-only persistent bitmaps are still in the file and in sync with their in-memory representation, so we may as well release them just like any R/W bitmap that we have updated. It leads to actual problems, too: After migration, letting the source continue may result in an error if there were any bitmaps on read-only nodes (such as backing images), because those have not been released by bdrv_inactive_all(), but bdrv_invalidate_cache_all() attempts to reload them (which fails, because they are still present in memory). Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200730120234.49288-2-mreitz@redhat.com> Tested-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2020-08-03 08:59:37 -05:00
Peter Maydell	5045be872d	nbd patches for 2020-07-28 - fix NBD handling of trim/zero requests larger than 2G - allow no-op resizes on NBD (in turn fixing qemu-img convert -c into NBD) - several deadlock fixes when using NBD reconnect -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEccLMIrHEYCkn0vOqp6FrSiUnQ2oFAl8gPV4ACgkQp6FrSiUn Q2ozdQgAiDHaHG2NX4jmduID7677/XhsLoVl1MV7UZnU+y9qQ2p+Mbsw1oMneu8P Dtfgx/mlWVGu68gn31f4xVq74VTZH6p3IGV7PMcYZ50xbESoFs6CYUwUWUp1GeC3 +kPOl0EpLvm1W/V93sKmg8FflGmNiJHNkfl/ddfk0gs6Z3EfjkmGJt7IP/pv1UCs 4icWvCJsqw2z8TnEwtTpMX5HZlWth1x37lUOShlPL5kA5hZqU+zYU/bYB5iKx+16 MebYg7C7CXYCCtH9cDH/swUWhOdQLkywA6yBAwc1zENsKy84aIAJIUls/Ji0q6CY A4s5c0FovLBuMDd9oLr0kJbkJQeVZA== =DD6l -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/ericb/tags/pull-nbd-2020-07-28' into staging nbd patches for 2020-07-28 - fix NBD handling of trim/zero requests larger than 2G - allow no-op resizes on NBD (in turn fixing qemu-img convert -c into NBD) - several deadlock fixes when using NBD reconnect # gpg: Signature made Tue 28 Jul 2020 15:59:42 BST # gpg: using RSA key 71C2CC22B1C4602927D2F3AAA7A16B4A2527436A # gpg: Good signature from "Eric Blake <eblake@redhat.com>" [full] # gpg: aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>" [full] # gpg: aka "[jpeg image of size 6874]" [full] # Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2 F3AA A7A1 6B4A 2527 436A * remotes/ericb/tags/pull-nbd-2020-07-28: block/nbd: nbd_co_reconnect_loop(): don't sleep if drained block/nbd: on shutdown terminate connection attempt block/nbd: allow drain during reconnect attempt block/nbd: split nbd_establish_connection out of nbd_client_connect iotests: Test convert to qcow2 compressed to NBD iotests: Add more qemu_img helpers iotests: Make qemu_nbd_popen() a contextmanager block: nbd: Fix convert qcow2 compressed to nbd nbd: Fix large trim/zero requests Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-07-28 20:43:03 +01:00
Peter Maydell	0c4fa5bc1a	Block patches for 5.1.0: - Fix block I/O for split transfers - Fix iotest 197 for non-qcow2 formats -----BEGIN PGP SIGNATURE----- iQFGBAABCAAwFiEEkb62CjDbPohX0Rgp9AfbAGHVz0AFAl8gK/gSHG1yZWl0ekBy ZWRoYXQuY29tAAoJEPQH2wBh1c9AR+kIALv+Z/A6SPpsAHjpyuRbluuhznfqPuiX mIVX0qNhsFBDAUVw1tOkMtfxOIvuaQW/QWzM0UPaHqB/I4ckzE6Dp98ys9uwHPdq ez23blWvBuB3P3y2ZBAYhhRlCqt3w4uI/lIJMu7VZBghXxj3fGcuTnLlWx8gb1IH 74MiBX8XPt532FiFTnpzxgns8NYkZY8mF6zduGqBPx6bPmdNdDfqAhL68Fv8uKJA k4dVH6ffPLZD+RrCz9GL5rsYQ6NR6tfyEoRMPqtJznhtzWwu5h5EF3p46VkcKheI k0axygEBAr9JbeCwbIK3a4hjQ7eaFQ6j9JR+lPZBRaDbLHv/xGNNuvw= =C4Lq -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/maxreitz/tags/pull-block-2020-07-28' into staging Block patches for 5.1.0: - Fix block I/O for split transfers - Fix iotest 197 for non-qcow2 formats # gpg: Signature made Tue 28 Jul 2020 14:45:28 BST # gpg: using RSA key 91BEB60A30DB3E8857D11829F407DB0061D5CF40 # gpg: issuer "mreitz@redhat.com" # gpg: Good signature from "Max Reitz <mreitz@redhat.com>" [full] # Primary key fingerprint: 91BE B60A 30DB 3E88 57D1 1829 F407 DB00 61D5 CF40 * remotes/maxreitz/tags/pull-block-2020-07-28: iotests/197: Fix for non-qcow2 formats iotests/028: Add test for cross-base-EOF reads block: Fix bdrv_aligned_p*v() for qiov_offset != 0 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-07-28 18:00:21 +01:00
Vladimir Sementsov-Ogievskiy	12c75e20a2	block/nbd: nbd_co_reconnect_loop(): don't sleep if drained We try to go to wakeable sleep, so that, if drain begins it will break the sleep. But what if nbd_client_co_drain_begin() already called and s->drained is already true? We'll go to sleep, and drain will have to wait for the whole timeout. Let's improve it. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200727184751.15704-5-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2020-07-28 09:54:43 -05:00
Vladimir Sementsov-Ogievskiy	fbeb3e63b3	block/nbd: on shutdown terminate connection attempt On shutdown nbd driver may be in a connecting state. We should shutdown it as well, otherwise we may hang in nbd_teardown_connection, waiting for conneciton_co to finish in BDRV_POLL_WHILE(bs, s->connection_co) loop if remote server is down. How to reproduce the dead lock: 1. Create nbd-fault-injector.conf with the following contents: [inject-error "mega1"] event=data io=readwrite when=before 2. In one terminal run nbd-fault-injector in a loop, like this: n=1; while true; do echo $n; ((n++)); ./nbd-fault-injector.py 127.0.0.1:10000 nbd-fault-injector.conf; done 3. In another terminal run qemu-io in a loop, like this: n=1; while true; do echo $n; ((n++)); ./qemu-io -c 'read 0 512' nbd://127.0.0.1:10000; done After some time, qemu-io will hang. Note, that this hang may be triggered by another bug, so the whole case is fixed only together with commit "block/nbd: allow drain during reconnect attempt". Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200727184751.15704-4-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2020-07-28 09:54:43 -05:00
Vladimir Sementsov-Ogievskiy	dd1ec1a4af	block/nbd: allow drain during reconnect attempt It should be safe to reenter qio_channel_yield() on io/channel read/write path, so it's safe to reduce in_flight and allow attaching new aio context. And no problem to allow drain itself: connection attempt is not a guest request. Moreover, if remote server is down, we can hang in negotiation, blocking drain section and provoking a dead lock. How to reproduce the dead lock: 1. Create nbd-fault-injector.conf with the following contents: [inject-error "mega1"] event=data io=readwrite when=before 2. In one terminal run nbd-fault-injector in a loop, like this: n=1; while true; do echo $n; ((n++)); ./nbd-fault-injector.py 127.0.0.1:10000 nbd-fault-injector.conf; done 3. In another terminal run qemu-io in a loop, like this: n=1; while true; do echo $n; ((n++)); ./qemu-io -c 'read 0 512' nbd://127.0.0.1:10000; done After some time, qemu-io will hang trying to drain, for example, like this: #3 aio_poll (ctx=0x55f006bdd890, blocking=true) at util/aio-posix.c:600 #4 bdrv_do_drained_begin (bs=0x55f006bea710, recursive=false, parent=0x0, ignore_bds_parents=false, poll=true) at block/io.c:427 #5 bdrv_drained_begin (bs=0x55f006bea710) at block/io.c:433 #6 blk_drain (blk=0x55f006befc80) at block/block-backend.c:1710 #7 blk_unref (blk=0x55f006befc80) at block/block-backend.c:498 #8 bdrv_open_inherit (filename=0x7fffba1563bc "nbd+tcp://127.0.0.1:10000", reference=0x0, options=0x55f006be86d0, flags=24578, parent=0x0, child_class=0x0, child_role=0, errp=0x7fffba154620) at block.c:3491 #9 bdrv_open (filename=0x7fffba1563bc "nbd+tcp://127.0.0.1:10000", reference=0x0, options=0x0, flags=16386, errp=0x7fffba154620) at block.c:3513 #10 blk_new_open (filename=0x7fffba1563bc "nbd+tcp://127.0.0.1:10000", reference=0x0, options=0x0, flags=16386, errp=0x7fffba154620) at block/block-backend.c:421 And connection_co stack like this: #0 qemu_coroutine_switch (from_=0x55f006bf2650, to_=0x7fe96e07d918, action=COROUTINE_YIELD) at util/coroutine-ucontext.c:302 #1 qemu_coroutine_yield () at util/qemu-coroutine.c:193 #2 qio_channel_yield (ioc=0x55f006bb3c20, condition=G_IO_IN) at io/channel.c:472 #3 qio_channel_readv_all_eof (ioc=0x55f006bb3c20, iov=0x7fe96d729bf0, niov=1, errp=0x7fe96d729eb0) at io/channel.c:110 #4 qio_channel_readv_all (ioc=0x55f006bb3c20, iov=0x7fe96d729bf0, niov=1, errp=0x7fe96d729eb0) at io/channel.c:143 #5 qio_channel_read_all (ioc=0x55f006bb3c20, buf=0x7fe96d729d28 "\300.\366\004\360U", buflen=8, errp=0x7fe96d729eb0) at io/channel.c:247 #6 nbd_read (ioc=0x55f006bb3c20, buffer=0x7fe96d729d28, size=8, desc=0x55f004f69644 "initial magic", errp=0x7fe96d729eb0) at /work/src/qemu/master/include/block/nbd.h:365 #7 nbd_read64 (ioc=0x55f006bb3c20, val=0x7fe96d729d28, desc=0x55f004f69644 "initial magic", errp=0x7fe96d729eb0) at /work/src/qemu/master/include/block/nbd.h:391 #8 nbd_start_negotiate (aio_context=0x55f006bdd890, ioc=0x55f006bb3c20, tlscreds=0x0, hostname=0x0, outioc=0x55f006bf19f8, structured_reply=true, zeroes=0x7fe96d729dca, errp=0x7fe96d729eb0) at nbd/client.c:904 #9 nbd_receive_negotiate (aio_context=0x55f006bdd890, ioc=0x55f006bb3c20, tlscreds=0x0, hostname=0x0, outioc=0x55f006bf19f8, info=0x55f006bf1a00, errp=0x7fe96d729eb0) at nbd/client.c:1032 #10 nbd_client_connect (bs=0x55f006bea710, errp=0x7fe96d729eb0) at block/nbd.c:1460 #11 nbd_reconnect_attempt (s=0x55f006bf19f0) at block/nbd.c:287 #12 nbd_co_reconnect_loop (s=0x55f006bf19f0) at block/nbd.c:309 #13 nbd_connection_entry (opaque=0x55f006bf19f0) at block/nbd.c:360 #14 coroutine_trampoline (i0=113190480, i1=22000) at util/coroutine-ucontext.c:173 Note, that the hang may be triggered by another bug, so the whole case is fixed only together with commit "block/nbd: on shutdown terminate connection attempt". Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200727184751.15704-3-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2020-07-28 09:54:43 -05:00
Vladimir Sementsov-Ogievskiy	fa35591b9c	block/nbd: split nbd_establish_connection out of nbd_client_connect We are going to implement non-blocking version of nbd_establish_connection, which for a while will be used only for nbd_reconnect_attempt, not for nbd_open, so we need to call it separately. Refactor nbd_reconnect_attempt in a way which makes next commit simpler. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200727184751.15704-2-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2020-07-28 09:54:43 -05:00
Nir Soffer	a2b333c018	block: nbd: Fix convert qcow2 compressed to nbd When converting to qcow2 compressed format, the last step is a special zero length compressed write, ending in a call to bdrv_co_truncate(). This call always fails for the nbd driver since it does not implement bdrv_co_truncate(). For block devices, which have the same limits, the call succeeds since the file driver implements bdrv_co_truncate(). If the caller asked to truncate to the same or smaller size with exact=false, the truncate succeeds. Implement the same logic for nbd. Example failing without this change: In one shell start qemu-nbd: $ truncate -s 1g test.tar $ qemu-nbd --socket=/tmp/nbd.sock --persistent --format=raw --offset 1536 test.tar In another shell convert an image to qcow2 compressed via NBD: $ echo "disk data" > disk.raw $ truncate -s 1g disk.raw $ qemu-img convert -f raw -O qcow2 -c disk1.raw nbd+unix:///?socket=/tmp/nbd.sock; echo $? 1 qemu-img failed, but the conversion was successful: $ qemu-img info nbd+unix:///?socket=/tmp/nbd.sock image: nbd+unix://?socket=/tmp/nbd.sock file format: qcow2 virtual size: 1 GiB (1073741824 bytes) ... $ qemu-img check nbd+unix:///?socket=/tmp/nbd.sock No errors were found on the image. 1/16384 = 0.01% allocated, 100.00% fragmented, 100.00% compressed clusters Image end offset: 393216 $ qemu-img compare disk.raw nbd+unix:///?socket=/tmp/nbd.sock Images are identical. Fixes: https://bugzilla.redhat.com/1860627 Signed-off-by: Nir Soffer <nsoffer@redhat.com> Message-Id: <20200727215846.395443-2-nsoffer@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> [eblake: typo fixes] Signed-off-by: Eric Blake <eblake@redhat.com>	2020-07-28 09:54:19 -05:00
Peter Maydell	2649915121	bitmaps patches for 2020-07-27 - Improve handling of various post-copy bitmap migration scenarios. A lost bitmap should merely mean that the next backup must be full rather than incremental, rather than abruptly breaking the entire guest migration. - Associated iotest improvements -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEccLMIrHEYCkn0vOqp6FrSiUnQ2oFAl8fPRkACgkQp6FrSiUn Q2qanQf/dRTrqZ7/hs8aENySf44o0dBzOLZr+FBcrqEj2sd0c6jPzV2X5CVtnA1v gBgKJJGLpti3mSeNQDbaXZIQrsesBAuxvJsc6vZ9npDCdMYnK/qPE3Zfw1bx12qR cb39ba28P4izgs216h92ZACtUewnvjkxyJgN7zfmCJdNcwZINMUItAS183tSbQjn n39Wb7a+umsRgV9HQv/6cXlQIPqFMyAOl5kkzV3evuw7EBoHFnNq4cjPrUnjkqiD xf2pcSomaedYd37SpvoH57JxfL3z/90OBcuXhFvbqFk4FgQ63rJ32nRve2ZbIDI0 XPbohnYjYoFv6Xs/jtTzctZCbZ+jTg== =1dmz -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/ericb/tags/pull-bitmaps-2020-07-27' into staging bitmaps patches for 2020-07-27 - Improve handling of various post-copy bitmap migration scenarios. A lost bitmap should merely mean that the next backup must be full rather than incremental, rather than abruptly breaking the entire guest migration. - Associated iotest improvements # gpg: Signature made Mon 27 Jul 2020 21:46:17 BST # gpg: using RSA key 71C2CC22B1C4602927D2F3AAA7A16B4A2527436A # gpg: Good signature from "Eric Blake <eblake@redhat.com>" [full] # gpg: aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>" [full] # gpg: aka "[jpeg image of size 6874]" [full] # Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2 F3AA A7A1 6B4A 2527 436A * remotes/ericb/tags/pull-bitmaps-2020-07-27: (24 commits) migration: Fix typos in bitmap migration comments iotests: Adjust which migration tests are quick qemu-iotests/199: add source-killed case to bitmaps postcopy qemu-iotests/199: add early shutdown case to bitmaps postcopy qemu-iotests/199: check persistent bitmaps qemu-iotests/199: prepare for new test-cases addition migration/savevm: don't worry if bitmap migration postcopy failed migration/block-dirty-bitmap: cancel migration on shutdown migration/block-dirty-bitmap: relax error handling in incoming part migration/block-dirty-bitmap: keep bitmap state for all bitmaps migration/block-dirty-bitmap: simplify dirty_bitmap_load_complete migration/block-dirty-bitmap: rename finish_lock to just lock migration/block-dirty-bitmap: refactor state global variables migration/block-dirty-bitmap: move mutex init to dirty_bitmap_mig_init migration/block-dirty-bitmap: rename dirty_bitmap_mig_cleanup migration/block-dirty-bitmap: rename state structure types migration/block-dirty-bitmap: fix dirty_bitmap_mig_before_vm_start qemu-iotests/199: increase postcopy period qemu-iotests/199: change discard patterns qemu-iotests/199: improve performance: set bitmap by discard ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-07-28 14:38:17 +01:00
Max Reitz	134b7dec6e	block: Fix bdrv_aligned_p*v() for qiov_offset != 0 Since these functions take a @qiov_offset, they must always take it into account when working with @qiov. There are a couple of places where they do not, but they should. Fixes: `65cd4424b9` ("block/io: bdrv_aligned_preadv: use and support qiov_offset") Fixes: `28c4da2869` ("block/io: bdrv_aligned_pwritev: use and support qiov_offset") Reported-by: Claudio Fontana <cfontana@suse.de> Reported-by: Bruce Rogers <brogers@suse.com> Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200728120806.265916-2-mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Tested-by: Claudio Fontana <cfontana@suse.de> Tested-by: Bruce Rogers <brogers@suse.com>	2020-07-28 15:28:47 +02:00
Andrey Shinkevich	8098969cf2	qcow2: Fix capitalization of header extension constant. Make the capitalization of the hexadecimal numbers consistent for the QCOW2 header extension constants in docs/interop/qcow2.txt. Suggested-by: Eric Blake <eblake@redhat.com> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <1594973699-781898-2-git-send-email-andrey.shinkevich@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2020-07-27 15:39:58 -05:00
Max Reitz	984c367814	block/amend: Check whether the node exists We should check whether the user-specified node-name actually refers to a node. The simplest way to do that is to use bdrv_lookup_bs() instead of bdrv_find_node() (the former wraps the latter, and produces an error message if necessary). Reported-by: Coverity (CID 1430268) Fixes: `ced914d0ab` Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200710095037.10885-1-mreitz@redhat.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>	2020-07-27 12:37:25 +02:00
Peter Maydell	0c1fd2f41f	Block layer patches: - file-posix: Handle `EINVAL` fallocate return value - qemu-img convert -n: Keep qcow2 v2 target sparse -----BEGIN PGP SIGNATURE----- iQJFBAABCAAvFiEE3D3rFZqa+V09dFb+fwmycsiPL9YFAl8XDZgRHGt3b2xmQHJl ZGhhdC5jb20ACgkQfwmycsiPL9bpFQ/9EnS4iV9w0KW/NuJ4FIVdBD/VZFzokDLi 1vXVVEjoxAxxiP8KlGM9HRi5NtvOMgzKhNGias0wOFiBorx8Ppfc+3sqwygc2dnw Vbl/od2D7xQZkddnp4Upo70m+eWRW6xaxX+lAcl6iS3gBPDwExLaYfBN8lFUyRrs T4C0miD+abEEyL3C5A4cEZJ7CIs0n7AqZkqgytWA7clwy79VgDSuMOgP6DOP1tGH 1uK4gMCB0xbn+PHk96lXPORcLwDBOP0PIluo/zBmffzsEZN1Lv5ddVmxMQWSivin UmAbpeEtSw9Py5lRVmLSBYvolVOUleE/Rlzad2iue2be5/G8VP8xiRYMp9mUVpLO +LPMUd9NRkPx7wjUJMPKF0G9FgVO7R0+9J6rC33aKBj2XAlxY6qQlqUN2Jo11/fK 2+9AkU7WVqx3vuW2Zz7wjq3Rjvpg/sK+V3P3Cm6HTwwwPbEwv8GcFe6eKdvJrZ9K hhwiFSUOd90OUAdKOQXKMFSZ/t1TrZhdX882Hvth11/AlQAUY4cxQbSKcc2nrvLu Axk0Va3haOD+ReRTs8W/iYNdrXGZmbr3MCkNiK3QSvnrdj602ompco7xyDTX1/qH 6Hu28q7jUG3p3cApLQIZVjmogfqcGU7SWIY4lp9HZqtGP0z+pmWg46UNqzlKLyv5 Y/fVHHshlRU= =QhOf -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging Block layer patches: - file-posix: Handle `EINVAL` fallocate return value - qemu-img convert -n: Keep qcow2 v2 target sparse # gpg: Signature made Tue 21 Jul 2020 16:45:28 BST # gpg: using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6 # gpg: issuer "kwolf@redhat.com" # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full] # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * remotes/kevin/tags/for-upstream: iotests: Test sparseness for qemu-img convert -n qcow2: Implement v2 zero writes with discard if possible file-posix: Handle `EINVAL` fallocate return value Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-07-21 19:25:48 +01:00
Peter Maydell	b50dab9eca	QOM patches for 2020-07-21 -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEENUvIs9frKmtoZ05fOHC0AOuRhlMFAl8XDGsSHGFybWJydUBy ZWRoYXQuY29tAAoJEDhwtADrkYZTPTYP/11AHdREDvz+KDl7AXtndxE9wo6GPGUy GhMSXODgOm6a4/nk4xQQR7MU+57C2IMnYbAAdrNXA3YVHjUPMURVsAeTsl9ruchC 1JiNrtXKgaWv5P8WeNzuXcH9B7n3UZ/mV4FH8v0FhjZhsH6EHP0zOhgNScPyQurz AsmYN5hSoSz8jRFQ8xlNzBhNrYX5dG6fOs4M6yBRUgzaCuRcSn/+cgjngfAmcH4k F0KK+rBQA+KUAKYp0QUaUcbD5oj/6KaQMsfqsFROR8m+AAJrhkP2ffmcDCHfCW8A SQsp8k2SkdHXNgRoDJSea1TYMitCFrTBDv+MfBWOLH4ewQrGAzvsuG5o3FlnVeN7 CKHkkxqOEifJ/vnThVBTIsqxf8/HefDXi1B5BXSKYSvnPKnnh8HCZ7VxvAsNGZiR epr4gEGBCcb9/bXFUmVVPz6H+lUWORzhF4P0spNJwH5BT9FLybWgJt9o4KH+pxYA DL4GF5vOkBgIhgUR+vn535vik7M38u6gsB8m22s2FRkZmIxTpxp9eH2ehHGSoVYM Yl1kVzJmFMPakl0gG1dMmM4+DJGRTHLCfBFS4pzMs9DaCNHUF3CB8FjXQs3NIFCS XGnChbri/wF83DEueTIrUAqF2w0XgEy55aVBOZkmFT6DXPXbx+Y3Q/AomaktBLyv FFUe9SfMn63P =HTNj -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/armbru/tags/pull-qom-2020-07-21' into staging QOM patches for 2020-07-21 # gpg: Signature made Tue 21 Jul 2020 16:40:27 BST # gpg: using RSA key 354BC8B3D7EB2A6B68674E5F3870B400EB918653 # gpg: issuer "armbru@redhat.com" # gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" [full] # gpg: aka "Markus Armbruster <armbru@pond.sub.org>" [full] # Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867 4E5F 3870 B400 EB91 8653 * remotes/armbru/tags/pull-qom-2020-07-21: qom: Make info qom-tree sort children more efficiently qom: Document object_get_canonical_path() returns malloced string qom: Change object_get_canonical_path_component() not to malloc Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-07-21 18:31:52 +01:00
Kevin Wolf	61b3043965	qcow2: Implement v2 zero writes with discard if possible qcow2 version 2 images don't support the zero flag for clusters, so for write_zeroes requests, we return -ENOTSUP and get explicit zero buffer writes. If the image doesn't have a backing file, we can do better: Just discard the respective clusters. This is relevant for 'qemu-img convert -O qcow2 -n', where qemu-img has to assume that the existing target image may contain any data, so it has to write zeroes. Without this patch, this results in a fully allocated target image, even if the source image was empty. Reported-by: Nir Soffer <nsoffer@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200721135520.72355-2-kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-07-21 16:28:57 +02:00
Antoine Damhet	bae127d4dc	file-posix: Handle `EINVAL` fallocate return value The `detect-zeroes=unmap` option may issue unaligned `FALLOC_FL_PUNCH_HOLE` requests, raw block devices can (and will) return `EINVAL`, qemu should then write the zeroes to the blockdev instead of issuing an `IO_ERROR`. The problem can be reprodced like this: $ qemu-io -c 'write -P 0 42 1234' --image-opts driver=host_device,filename=/dev/loop0,detect-zeroes=unmap write failed: Invalid argument Signed-off-by: Antoine Damhet <antoine.damhet@blade-group.com> Message-Id: <20200717135603.51180-1-antoine.damhet@blade-group.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-07-21 16:28:57 +02:00
Markus Armbruster	7a309cc95b	qom: Change object_get_canonical_path_component() not to malloc object_get_canonical_path_component() returns a malloced copy of a property name on success, null on failure. 19 of its 25 callers immediately free the returned copy. Change object_get_canonical_path_component() to return the property name directly. Since modifying the name would be wrong, adjust the return type to const char *. Drop the free from the 19 callers become simpler, add the g_strdup() to the other six. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20200714160202.3121879-4-armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Li Qiang <liq3ea@gmail.com>	2020-07-21 16:23:43 +02:00
Stefan Hajnoczi	1d719ddc35	block: fix bdrv_aio_cancel() for ENOMEDIUM requests bdrv_aio_cancel() calls aio_poll() on the AioContext for the given I/O request until it has completed. ENOMEDIUM requests are special because there is no BlockDriverState when the drive has no medium! Define a .get_aio_context() function for BlkAioEmAIOCB requests so that bdrv_aio_cancel() can find the AioContext where the completion BH is pending. Without this function bdrv_aio_cancel() aborts on ENOMEDIUM requests! libFuzzer triggered the following assertion: cat << EOF \| qemu-system-i386 -M pc-q35-5.0 \ -nographic -monitor none -serial none \ -qtest stdio -trace ide\* outl 0xcf8 0x8000fa24 outl 0xcfc 0xe106c000 outl 0xcf8 0x8000fa04 outw 0xcfc 0x7 outl 0xcf8 0x8000fb20 write 0x0 0x3 0x2780e7 write 0xe106c22c 0xd 0x1130c218021130c218021130c2 write 0xe106c218 0x15 0x110010110010110010110010110010110010110010 EOF ide_exec_cmd IDE exec cmd: bus 0x56170a77a2b8; state 0x56170a77a340; cmd 0xe7 ide_reset IDEstate 0x56170a77a340 Aborted (core dumped) (gdb) bt #1 0x00007ffff4f93895 in abort () at /lib64/libc.so.6 #2 0x0000555555dc6c00 in bdrv_aio_cancel (acb=0x555556765550) at block/io.c:2745 #3 0x0000555555dac202 in blk_aio_cancel (acb=0x555556765550) at block/block-backend.c:1546 #4 0x0000555555b1bd74 in ide_reset (s=0x555557213340) at hw/ide/core.c:1318 #5 0x0000555555b1e3a1 in ide_bus_reset (bus=0x5555572132b8) at hw/ide/core.c:2422 #6 0x0000555555b2aa27 in ahci_reset_port (s=0x55555720eb50, port=2) at hw/ide/ahci.c:650 #7 0x0000555555b29fd7 in ahci_port_write (s=0x55555720eb50, port=2, offset=44, val=16) at hw/ide/ahci.c:360 #8 0x0000555555b2a564 in ahci_mem_write (opaque=0x55555720eb50, addr=556, val=16, size=1) at hw/ide/ahci.c:513 #9 0x000055555598415b in memory_region_write_accessor (mr=0x55555720eb80, addr=556, value=0x7fffffffb838, size=1, shift=0, mask=255, attrs=...) at softmmu/memory.c:483 Looking at bdrv_aio_cancel: 2728 /* async I/Os / 2729 2730 void bdrv_aio_cancel(BlockAIOCB acb) 2731 { 2732 qemu_aio_ref(acb); 2733 bdrv_aio_cancel_async(acb); 2734 while (acb->refcnt > 1) { 2735 if (acb->aiocb_info->get_aio_context) { 2736 aio_poll(acb->aiocb_info->get_aio_context(acb), true); 2737 } else if (acb->bs) { 2738 /* qemu_aio_ref and qemu_aio_unref are not thread-safe, so 2739 * assert that we're not using an I/O thread. Thread-safe 2740 * code should use bdrv_aio_cancel_async exclusively. 2741 */ 2742 assert(bdrv_get_aio_context(acb->bs) == qemu_get_aio_context()); 2743 aio_poll(bdrv_get_aio_context(acb->bs), true); 2744 } else { 2745 abort(); <=============== 2746 } 2747 } 2748 qemu_aio_unref(acb); 2749 } Fixes: `02c50efe08` ("block: Add bdrv_aio_cancel_async") Reported-by: Alexander Bulekov <alxndr@bu.edu> Buglink: https://bugs.launchpad.net/qemu/+bug/1878255 Originally-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20200720100141.129739-1-stefanha@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-07-21 12:00:38 +02:00
Maxim Levitsky	662d0c5392	block/crypto: disallow write sharing by default My commit 'block/crypto: implement the encryption key management' accidently allowed raw luks images to be shared between different qemu processes without share-rw=on explicit override. Fix that. Fixes: `bbfdae91fb` ("block/crypto: implement the encryption key management") Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1857490 Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20200719122059.59843-2-mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-07-21 10:49:02 +02:00
Kevin Wolf	a8c5cf27c9	file-posix: Fix leaked fd in raw_open_common() error path Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200717105426.51134-4-kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-07-17 14:20:57 +02:00
Kevin Wolf	bca5283bd4	file-posix: Fix check_hdev_writable() with auto-read-only For Linux block devices, being able to open the device read-write doesn't necessarily mean that the device is actually writable (one example is a read-only LV, as you get with lvchange -pr <device>). We have check_hdev_writable() to check this condition and fail opening the image read-write if it's not actually writable. However, this check doesn't take auto-read-only into account, but results in a hard failure instead of downgrading to read-only where possible. Fix this and do the writable check not based on BDRV_O_RDWR, but only when this actually results in opening the file read-write. A second check is inserted in raw_reconfigure_getfd() to have the same check when dynamic auto-read-only upgrades an image file from read-only to read-write. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200717105426.51134-3-kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-07-17 14:20:57 +02:00
Kevin Wolf	20eaf1bf6e	file-posix: Move check_hdev_writable() up We'll need to call it in raw_open_common(), so move the function to avoid a forward declaration. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200717105426.51134-2-kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-07-17 14:20:57 +02:00
Kevin Wolf	5edc85571e	file-posix: Allow byte-aligned O_DIRECT with NFS Since commit `a6b257a08e` ('file-posix: Handle undetectable alignment'), we assume that if we open a file with O_DIRECT and alignment probing returns 1, we just couldn't find out the real alignment requirement because some filesystems make the requirement only for allocated blocks. In this case, a safe default of 4k is used. This is too strict for NFS, which does actually allow byte-aligned requests even with O_DIRECT. Because we can't distinguish both cases with generic code, let's just look at the file system magic and disable s->needs_alignment for NFS. This way, O_DIRECT can still be used on NFS for images that are not aligned to 4k. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200716142601.111237-3-kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-07-17 14:20:57 +02:00
Marc-André Lureau	a08464521c	Remove VXHS block device The vxhs code doesn't compile since v2.12.0. There's no point in fixing and then adding CI for a config that our users have demonstrated that they do not use; better to just remove it. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20200711065926.2204721-1-marcandre.lureau@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-07-17 14:20:57 +02:00
Peter Maydell	d2628b1eb7	Block layer patches: - file-posix: Mitigate file fragmentation with extent size hints - Tighten qemu-img rules on missing backing format - qemu-img map: Don't limit block status request size - Fix crash with virtio-scsi and iothreads -----BEGIN PGP SIGNATURE----- iQJFBAABCAAvFiEE3D3rFZqa+V09dFb+fwmycsiPL9YFAl8NsgMRHGt3b2xmQHJl ZGhhdC5jb20ACgkQfwmycsiPL9Z0tA//eqauxD7cTEpwrtLNrRtpiBtMG64BBpxz QfkURzB38bMVahHlwq3Gt7Zcov8V4V7vxK66h688Z/fhw3vmqIeVe8+P6+Y5s9FL jil8lewHuLTa6xELeugoV7SZXH8AAh1W2fQmiR7EPiOmpSE0wf7C5IShVlX8A04E r0n09+61qGjRIe1hNTwTtldqQEfx6UGnxQWcQb81JUPA1lZhX3cnPg/j94Bofr+m v/DbVTfsmUtTMjc0PdU7n4DKTWu8OS5B/X0unF21rTtO//cYBrhAeY3ax2jbFBWi CIZK8HLI5m9/HFyltql1LOsd+B5TtfnXMfSdvDh2jaVUlto7wTeTnWU1fv4wxUB5 hk7XgJo/y203ebFNHpTmW8tvLfGTP8uqCVfOEFxzjy+JHGrarlbWkwL2LMOFFAZ2 s2WcwlfqiYGFTG4+OFdhPf9qPWKSqMr+jTdZJTse64/c6+YXWHk+pP9lfYEUOgSi OYwdQUY9uiZ1K13q5Tif2TbFvs+c118xdTgVhAV7VtfPnWc3c647dX7iaq8Szknc IT93670Iqf/PzEj+L7XUbbLIIsAcmxD0sr7QAQEt7bfiYIDRIQLiVPyzXplETFg2 SEkvtqBovm84ct7pWQzqA6lFvr3oIFDNquR40XFGozHNnlBeNi5s7pXQnqUBLElr wDDuEi+z5QM= =DB0q -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging Block layer patches: - file-posix: Mitigate file fragmentation with extent size hints - Tighten qemu-img rules on missing backing format - qemu-img map: Don't limit block status request size - Fix crash with virtio-scsi and iothreads # gpg: Signature made Tue 14 Jul 2020 14:24:19 BST # gpg: using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6 # gpg: issuer "kwolf@redhat.com" # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full] # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * remotes/kevin/tags/for-upstream: block: Avoid stale pointer dereference in blk_get_aio_context() qemu-img: Deprecate use of -b without -F block: Add support to warn on backing file change without format iotests: Specify explicit backing format where sensible qcow2: Deprecate use of qemu-img amend to change backing file block: Error if backing file fails during creation without -u qcow: Tolerate backing_fmt= vmdk: Add trivial backing_fmt support sheepdog: Add trivial backing_fmt support block: Finish deprecation of 'qemu-img convert -n -o' qemu-img: Flush stdout before before potential stderr messages file-posix: Mitigate file fragmentation with extent size hints iotests/059: Filter out disk size with more standard filter qemu-img map: Don't limit block status request size iotests: Simplify _filter_img_create() a bit Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-07-14 19:39:52 +01:00
Greg Kurz	e6cada9231	block: Avoid stale pointer dereference in blk_get_aio_context() It is possible for blk_remove_bs() to race with blk_drain_all(), causing the latter to dereference a stale blk->root pointer: blk_remove_bs(blk) bdrv_root_unref_child(blk->root) child_bs = blk->root->bs bdrv_detach_child(blk->root) ... g_free(blk->root) <============== blk->root becomes stale bdrv_unref(child_bs) <============ yield at some point A blk_drain_all() can be triggered by some guest action in the meantime, eg. on POWER, SLOF might disable bus mastering on a virtio-scsi-pci device: virtio_write_config() virtio_pci_stop_ioeventfd() virtio_bus_stop_ioeventfd() virtio_scsi_dataplane_stop() blk_drain_all() blk_get_aio_context() bs = blk->root ? blk->root->bs : NULL ^^^^^^^^^ stale Then, depending on one's luck, QEMU either crashes with SEGV or hits the assertion in blk_get_aio_context(). blk->root is set by blk_insert_bs() which calls bdrv_root_attach_child() first. The blk_remove_bs() function should rollback the changes made by blk_insert_bs() in the opposite order (or it should be documented somewhere why this isn't the case). Clear blk->root before calling bdrv_root_unref_child() in blk_remove_bs(). Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <159430264541.389456.11925072456012783045.stgit@bahia.lan> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-07-14 15:24:15 +02:00
Eric Blake	e54ee1b385	block: Add support to warn on backing file change without format For now, this is a mechanical addition; all callers pass false. But the next patch will use it to improve 'qemu-img rebase -u' when selecting a backing file with no format. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com> Message-Id: <20200706203954.341758-10-eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-07-14 15:18:59 +02:00
Eric Blake	bc5ee6da71	qcow2: Deprecate use of qemu-img amend to change backing file The use of 'qemu-img amend' to change qcow2 backing files is not tested very well. In particular, our implementation has a bug where if a new backing file is provided without a format, then the prior format is blindly reused, even if this results in data corruption, but this is not caught by iotests. There are also situations where amending other options needs access to the original backing file (for example, on a downgrade to a v2 image, knowing whether a v3 zero cluster must be allocated or may be left unallocated depends on knowing whether the backing file already reads as zero), but the command line does not have a nice way to tell us both the backing file to use for opening the image as well as the backing file to install after the operation is complete. Even if we do allow changing the backing file, it is redundant with the existing ability to change backing files via 'qemu-img rebase -u'. It is time to deprecate this support (leaving the existing behavior intact, even if it is buggy), and at a point in the future, require the use of only 'qemu-img rebase' for adjusting backing chain relations, saving 'qemu-img amend' for changes unrelated to the backing chain. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20200706203954.341758-8-eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-07-14 15:18:59 +02:00
Eric Blake	344acbd62f	qcow: Tolerate backing_fmt= qcow has no space in the metadata to store a backing format, and there are existing qcow images backed both by raw or by other formats (usually qcow) images, reliant on probing to tell the difference. On the bright side, because we probe every time, raw files are marked as probed and we thus forbid a commit action into the backing file where guest-controlled contents could change the result of the probe next time around (the iotest added here proves that). Still, allowing the user to specify the backing format during creation, even if we can't record it, is a good thing. This patch blindly allows any value that resolves to a known driver, even if the user's request is a mismatch from what probing finds; then the next patch will further enhance things to verify that the user's request matches what we actually probe. With this and the next patch in place, we will finally be ready to deprecate the creation of images where a backing format was not explicitly specified by the user. Note that this is only for QemuOpts usage; there is no change to the QAPI to allow a format through -blockdev. Add a new iotest 301 just for qcow, to demonstrate the latest behavior, and to make it easier to show the improvements made in the next patch. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20200706203954.341758-6-eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-07-14 15:18:59 +02:00
Eric Blake	d51a814cf4	vmdk: Add trivial backing_fmt support vmdk already requires that if backing_file is present, that it be another vmdk image (see vmdk_co_do_create). Meanwhile, we want to move towards always being explicit about the backing format for other drivers where it matters. So for convenience, make qemu-img create -F vmdk work, while rejecting all other explicit formats (note that this is only for QemuOpts usage; there is no change to the QAPI to allow a format through -blockdev). Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20200706203954.341758-5-eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-07-14 15:18:59 +02:00
Eric Blake	80fa43e7df	sheepdog: Add trivial backing_fmt support Sheepdog already requires that if backing_file is present, that it be another sheepdog image (see sd_co_create). Meanwhile, we want to move towards always being explicit about the backing format for other drivers where it matters. So for convenience, make qemu-img create -F sheepdog work, while rejecting all other explicit formats (note that this is only for QemuOpts usage; there is no change to the QAPI to allow a format through -blockdev). Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20200706203954.341758-4-eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-07-14 15:18:59 +02:00
Kevin Wolf	ffa244c84a	file-posix: Mitigate file fragmentation with extent size hints Especially when O_DIRECT is used with image files so that the page cache indirection can't cause a merge of allocating requests, the file will fragment on the file system layer, with a potentially very small fragment size (this depends on the requests the guest sent). On Linux, fragmentation can be reduced by setting an extent size hint when creating the file (at least on XFS, it can't be set any more after the first extent has been allocated), basically giving raw files a "cluster size" for allocation. This adds a create option to set the extent size hint, and changes the default from not setting a hint to setting it to 1 MB. The main reason why qcow2 defaults to smaller cluster sizes is that COW becomes more expensive, which is not an issue with raw files, so we can choose a larger size. The tradeoff here is only potentially wasted disk space. For qcow2 (or other image formats) over file-posix, the advantage should even be greater because they grow sequentially without leaving holes, so there won't be wasted space. Setting even larger extent size hints for such images may make sense. This can be done with the new option, but let's keep the default conservative for now. The effect is very visible with a test that intentionally creates a badly fragmented file with qemu-img bench (the time difference while creating the file is already remarkable) and then looks at the number of extents and the time a simple "qemu-img map" takes. Without an extent size hint: $ ./qemu-img create -f raw -o extent_size_hint=0 ~/tmp/test.raw 10G Formatting '/home/kwolf/tmp/test.raw', fmt=raw size=10737418240 extent_size_hint=0 $ ./qemu-img bench -f raw -t none -n -w ~/tmp/test.raw -c 1000000 -S 8192 -o 0 Sending 1000000 write requests, 4096 bytes each, 64 in parallel (starting at offset 0, step size 8192) Run completed in 25.848 seconds. $ ./qemu-img bench -f raw -t none -n -w ~/tmp/test.raw -c 1000000 -S 8192 -o 4096 Sending 1000000 write requests, 4096 bytes each, 64 in parallel (starting at offset 4096, step size 8192) Run completed in 19.616 seconds. $ filefrag ~/tmp/test.raw /home/kwolf/tmp/test.raw: 2000000 extents found $ time ./qemu-img map ~/tmp/test.raw Offset Length Mapped to File 0 0x1e8480000 0 /home/kwolf/tmp/test.raw real 0m1,279s user 0m0,043s sys 0m1,226s With the new default extent size hint of 1 MB: $ ./qemu-img create -f raw -o extent_size_hint=1M ~/tmp/test.raw 10G Formatting '/home/kwolf/tmp/test.raw', fmt=raw size=10737418240 extent_size_hint=1048576 $ ./qemu-img bench -f raw -t none -n -w ~/tmp/test.raw -c 1000000 -S 8192 -o 0 Sending 1000000 write requests, 4096 bytes each, 64 in parallel (starting at offset 0, step size 8192) Run completed in 11.833 seconds. $ ./qemu-img bench -f raw -t none -n -w ~/tmp/test.raw -c 1000000 -S 8192 -o 4096 Sending 1000000 write requests, 4096 bytes each, 64 in parallel (starting at offset 4096, step size 8192) Run completed in 10.155 seconds. $ filefrag ~/tmp/test.raw /home/kwolf/tmp/test.raw: 178 extents found $ time ./qemu-img map ~/tmp/test.raw Offset Length Mapped to File 0 0x1e8480000 0 /home/kwolf/tmp/test.raw real 0m0,061s user 0m0,040s sys 0m0,014s Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200707142329.48303-1-kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-07-14 15:18:59 +02:00
Eric Blake	00d69986da	nbd: Avoid off-by-one in long export name truncation When snprintf returns the same value as the buffer size, the final byte was truncated to ensure a NUL terminator. Fortunately, such long export names are unusual enough, with no real impact other than what is displayed to the user. Fixes: `5c86bdf120` Reported-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20200622210355.414941-1-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2020-07-13 09:01:01 -05:00
Xie Yongji	c58daf76a6	iscsi: return -EIO when sense fields are meaningless When an I/O request failed, now we only return correct value on scsi check condition. We should also have a default errno such as -EIO in other case. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Message-Id: <20200701105444.3226-2-xieyongji@bytedance.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-07-10 18:02:23 -04:00
Xie Yongji	dd3b00202a	iscsi: handle check condition status in retry loop The handling of check condition was incorrect because we would only do it after retries exceed maximum. Fixes: `8c460269aa` ("iscsi: base all handling of check condition on scsi_sense_to_errno") Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Message-Id: <20200701105444.3226-1-xieyongji@bytedance.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-07-10 18:02:23 -04:00
Vladimir Sementsov-Ogievskiy	795d946d07	nbd: Use ERRP_GUARD() If we want to check error after errp-function call, we need to introduce local_err and then propagate it to errp. Instead, use the ERRP_GUARD() macro, benefits are: 1. No need of explicit error_propagate call 2. No need of explicit local_err variable: use errp directly 3. ERRP_GUARD() leaves errp as is if it's not NULL or &error_fatal, this means that we don't break error_abort (we'll abort on error_set, not on error_propagate) If we want to add some info to errp (by error_prepend() or error_append_hint()), we must use the ERRP_GUARD() macro. Otherwise, this info will not be added when errp == &error_fatal (the program will exit prior to the error_append_hint() or error_prepend() call). Fix several such cases, e.g. in nbd_read(). This commit is generated by command sed -n '/^Network Block Device (NBD)$/,/^$/{s/^F: //p}' \ MAINTAINERS \| \ xargs git ls-files \| grep '\.[hc]$' \| \ xargs spatch \ --sp-file scripts/coccinelle/errp-guard.cocci \ --macro-file scripts/cocci-macro-file.h \ --in-place --no-show-diff --max-width 80 Reported-by: Kevin Wolf <kwolf@redhat.com> Reported-by: Greg Kurz <groug@kaod.org> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> [Commit message tweaked] Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20200707165037.1026246-8-armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> [ERRP_AUTO_PROPAGATE() renamed to ERRP_GUARD(), and auto-propagated-errp.cocci to errp-guard.cocci. Commit message tweaked again.]	2020-07-10 15:18:09 +02:00
Markus Armbruster	386f6c07d2	error: Avoid error_propagate() after migrate_add_blocker() When migrate_add_blocker(blocker, &errp) is followed by error_propagate(errp, err), we can often just as well do migrate_add_blocker(..., errp). Do that with this Coccinelle script: @@ expression blocker, err, errp; expression ret; @@ - ret = migrate_add_blocker(blocker, &err); - if (err) { + ret = migrate_add_blocker(blocker, errp); + if (ret < 0) { ... when != err; - error_propagate(errp, err); ... } @@ expression blocker, err, errp; @@ - migrate_add_blocker(blocker, &err); - if (err) { + if (migrate_add_blocker(blocker, errp) < 0) { ... when != err; - error_propagate(errp, err); ... } Double-check @err is not used afterwards. Dereferencing it would be use after free, but checking whether it's null would be legitimate. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200707160613.848843-43-armbru@redhat.com>	2020-07-10 15:18:08 +02:00
Markus Armbruster	b11a093c60	qapi: Smooth another visitor error checking pattern Convert visit_type_FOO(v, ..., &ptr, &err); ... if (err) { ... } to visit_type_FOO(v, ..., &ptr, errp); ... if (!ptr) { ... } for functions that set @ptr to non-null / null on success / error. Eliminate error_propagate() that are now unnecessary. Delete @err that are now unused. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200707160613.848843-40-armbru@redhat.com>	2020-07-10 15:18:08 +02:00
Markus Armbruster	4bc6d7ee0e	block/parallels: Simplify parallels_open() after previous commit Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200707160613.848843-39-armbru@redhat.com>	2020-07-10 15:18:08 +02:00
Markus Armbruster	a5f9b9df25	error: Reduce unnecessary error propagation When all we do with an Error we receive into a local variable is propagating to somewhere else, we can just as well receive it there right away, even when we need to keep error_propagate() for other error paths. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200707160613.848843-38-armbru@redhat.com>	2020-07-10 15:18:08 +02:00
Markus Armbruster	992861fb1e	error: Eliminate error_propagate() manually When all we do with an Error we receive into a local variable is propagating to somewhere else, we can just as well receive it there right away. The previous two commits did that for sufficiently simple cases with Coccinelle. Do it for several more manually. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200707160613.848843-37-armbru@redhat.com>	2020-07-10 15:18:08 +02:00
Markus Armbruster	af175e85f9	error: Eliminate error_propagate() with Coccinelle, part 2 When all we do with an Error we receive into a local variable is propagating to somewhere else, we can just as well receive it there right away. The previous commit did that with a Coccinelle script I consider fairly trustworthy. This commit uses the same script with the matching of return taken out, i.e. we convert if (!foo(..., &err)) { ... error_propagate(errp, err); ... } to if (!foo(..., errp)) { ... ... } This is unsound: @err could still be read between afterwards. I don't know how to express "no read of @err without an intervening write" in Coccinelle. Instead, I manually double-checked for uses of @err. Suboptimal line breaks tweaked manually. qdev_realize() simplified further to placate scripts/checkpatch.pl. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200707160613.848843-36-armbru@redhat.com>	2020-07-10 15:18:08 +02:00
Markus Armbruster	668f62ec62	error: Eliminate error_propagate() with Coccinelle, part 1 When all we do with an Error we receive into a local variable is propagating to somewhere else, we can just as well receive it there right away. Convert if (!foo(..., &err)) { ... error_propagate(errp, err); ... return ... } to if (!foo(..., errp)) { ... ... return ... } where nothing else needs @err. Coccinelle script: @rule1 forall@ identifier fun, err, errp, lbl; expression list args, args2; binary operator op; constant c1, c2; symbol false; @@ if ( ( - fun(args, &err, args2) + fun(args, errp, args2) \| - !fun(args, &err, args2) + !fun(args, errp, args2) \| - fun(args, &err, args2) op c1 + fun(args, errp, args2) op c1 ) ) { ... when != err when != lbl: when strict - error_propagate(errp, err); ... when != err ( return; \| return c2; \| return false; ) } @rule2 forall@ identifier fun, err, errp, lbl; expression list args, args2; expression var; binary operator op; constant c1, c2; symbol false; @@ - var = fun(args, &err, args2); + var = fun(args, errp, args2); ... when != err if ( ( var \| !var \| var op c1 ) ) { ... when != err when != lbl: when strict - error_propagate(errp, err); ... when != err ( return; \| return c2; \| return false; \| return var; ) } @depends on rule1 \|\| rule2@ identifier err; @@ - Error *err = NULL; ... when != err Not exactly elegant, I'm afraid. The "when != lbl:" is necessary to avoid transforming if (fun(args, &err)) { goto out } ... out: error_propagate(errp, err); even though other paths to label out still need the error_propagate(). For an actual example, see sclp_realize(). Without the "when strict", Coccinelle transforms vfio_msix_setup(), incorrectly. I don't know what exactly "when strict" does, only that it helps here. The match of return is narrower than what I want, but I can't figure out how to express "return where the operand doesn't use @err". For an example where it's too narrow, see vfio_intx_enable(). Silently fails to convert hw/arm/armsse.c, because Coccinelle gets confused by ARMSSE being used both as typedef and function-like macro there. Converted manually. Line breaks tidied up manually. One nested declaration of @local_err deleted manually. Preexisting unwanted blank line dropped in hw/riscv/sifive_e.c. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200707160613.848843-35-armbru@redhat.com>	2020-07-10 15:18:08 +02:00
Markus Armbruster	dcfe480544	error: Avoid unnecessary error_propagate() after error_setg() Replace error_setg(&err, ...); error_propagate(errp, err); by error_setg(errp, ...); Related pattern: if (...) { error_setg(&err, ...); goto out; } ... out: error_propagate(errp, err); return; When all paths to label out are that way, replace by if (...) { error_setg(errp, ...); return; } and delete the label along with the error_propagate(). When we have at most one other path that actually needs to propagate, and maybe one at the end that where propagation is unnecessary, e.g. foo(..., &err); if (err) { goto out; } ... bar(..., &err); out: error_propagate(errp, err); return; move the error_propagate() to where it's needed, like if (...) { foo(..., &err); error_propagate(errp, err); return; } ... bar(..., errp); return; and transform the error_setg() as above. In some places, the transformation results in obviously unnecessary error_propagate(). The next few commits will eliminate them. Bonus: the elimination of gotos will make later patches in this series easier to review. Candidates for conversion tracked down with this Coccinelle script: @@ identifier err, errp; expression list args; @@ - error_setg(&err, args); + error_setg(errp, args); ... when != err error_propagate(errp, err); Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200707160613.848843-34-armbru@redhat.com>	2020-07-10 15:18:08 +02:00
Markus Armbruster	14217038bc	qapi: Use returned bool to check for failure, manual part The previous commit used Coccinelle to convert from checking the Error object to checking the return value. Convert a few more manually. Also tweak control flow in places to conform to the conventional "if error bail out" pattern. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200707160613.848843-20-armbru@redhat.com>	2020-07-10 15:18:08 +02:00
Markus Armbruster	62a35aaa31	qapi: Use returned bool to check for failure, Coccinelle part The previous commit enables conversion of visit_foo(..., &err); if (err) { ... } to if (!visit_foo(..., errp)) { ... } for visitor functions that now return true / false on success / error. Coccinelle script: @@ identifier fun =~ "check_list\|input_type_enum\|lv_start_struct\|lv_type_bool\|lv_type_int64\|lv_type_str\|lv_type_uint64\|output_type_enum\|parse_type_bool\|parse_type_int64\|parse_type_null\|parse_type_number\|parse_type_size\|parse_type_str\|parse_type_uint64\|print_type_bool\|print_type_int64\|print_type_null\|print_type_number\|print_type_size\|print_type_str\|print_type_uint64\|qapi_clone_start_alternate\|qapi_clone_start_list\|qapi_clone_start_struct\|qapi_clone_type_bool\|qapi_clone_type_int64\|qapi_clone_type_null\|qapi_clone_type_number\|qapi_clone_type_str\|qapi_clone_type_uint64\|qapi_dealloc_start_list\|qapi_dealloc_start_struct\|qapi_dealloc_type_anything\|qapi_dealloc_type_bool\|qapi_dealloc_type_int64\|qapi_dealloc_type_null\|qapi_dealloc_type_number\|qapi_dealloc_type_str\|qapi_dealloc_type_uint64\|qobject_input_check_list\|qobject_input_check_struct\|qobject_input_start_alternate\|qobject_input_start_list\|qobject_input_start_struct\|qobject_input_type_any\|qobject_input_type_bool\|qobject_input_type_bool_keyval\|qobject_input_type_int64\|qobject_input_type_int64_keyval\|qobject_input_type_null\|qobject_input_type_number\|qobject_input_type_number_keyval\|qobject_input_type_size_keyval\|qobject_input_type_str\|qobject_input_type_str_keyval\|qobject_input_type_uint64\|qobject_input_type_uint64_keyval\|qobject_output_start_list\|qobject_output_start_struct\|qobject_output_type_any\|qobject_output_type_bool\|qobject_output_type_int64\|qobject_output_type_null\|qobject_output_type_number\|qobject_output_type_str\|qobject_output_type_uint64\|start_list\|visit_check_list\|visit_check_struct\|visit_start_alternate\|visit_start_list\|visit_start_struct\|visit_type_."; expression list args; typedef Error; Error err; @@ - fun(args, &err); - if (err) + if (!fun(args, &err)) { ... } A few line breaks tidied up manually. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200707160613.848843-19-armbru@redhat.com>	2020-07-10 15:18:08 +02:00
Markus Armbruster	235e59cf03	qemu-option: Use returned bool to check for failure The previous commit enables conversion of foo(..., &err); if (err) { ... } to if (!foo(..., &err)) { ... } for QemuOpts functions that now return true / false on success / error. Coccinelle script: @@ identifier fun = { opts_do_parse, parse_option_bool, parse_option_number, parse_option_size, qemu_opt_parse, qemu_opt_rename, qemu_opt_set, qemu_opt_set_bool, qemu_opt_set_number, qemu_opts_absorb_qdict, qemu_opts_do_parse, qemu_opts_from_qdict_entry, qemu_opts_set, qemu_opts_validate }; expression list args, args2; typedef Error; Error *err; @@ - fun(args, &err, args2); - if (err) + if (!fun(args, &err, args2)) { ... } A few line breaks tidied up manually. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200707160613.848843-15-armbru@redhat.com> [Conflict with commit `0b6786a9c1` "block/amend: refactor qcow2 amend options" resolved by rerunning Coccinelle on master's version]	2020-07-10 15:17:35 +02:00
Markus Armbruster	c6ecec43b2	qemu-option: Check return value instead of @err where convenient Convert uses like opts = qemu_opts_create(..., &err); if (err) { ... } to opts = qemu_opts_create(..., errp); if (!opts) { ... } Eliminate error_propagate() that are now unnecessary. Delete @err that are now unused. Note that we can't drop parallels_open()'s error_propagate() here. We continue to execute it even in the converted case. It's a no-op then: local_err is null. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Greg Kurz <groug@kaod.org> Message-Id: <20200707160613.848843-8-armbru@redhat.com>	2020-07-10 15:01:06 +02:00
Eric Blake	365fed5111	qed: Simplify backing reads The other four drivers that support backing files (qcow, qcow2, parallels, vmdk) all rely on the block layer to populate zeroes when reading beyond EOF of a short backing file. We can simplify the qed code by doing likewise. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200528094405.145708-11-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-07-06 10:34:14 +02:00
Vladimir Sementsov-Ogievskiy	a2adbbf603	block: drop unallocated_blocks_are_zero Currently this field only set by qed and qcow2. But in fact, all backing-supporting formats (parallels, qcow, qcow2, qed, vmdk) share these semantics: on unallocated blocks, if there is no backing file they just memset the buffer with zeroes. So, document this behavior for .supports_backing and drop .unallocated_blocks_are_zero Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200528094405.145708-10-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-07-06 10:34:14 +02:00
Vladimir Sementsov-Ogievskiy	cdf9ebf18f	block/vhdx: drop unallocated_blocks_are_zero vhdx doesn't have .bdrv_co_block_status handler, so DATA\|ALLOCATED is always assumed for it in bdrv_co_block_status(). unallocated_blocks_are_zero is useless (it doesn't affect the only user of the field: bdrv_co_block_status()), drop it. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200528094405.145708-9-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-07-06 10:34:14 +02:00
Vladimir Sementsov-Ogievskiy	ac9185603e	block/file-posix: drop unallocated_blocks_are_zero raw_co_block_status() in block/file-posix.c never returns 0, so unallocated_blocks_are_zero is useless (it doesn't affect the only user of the field: bdrv_co_block_status()). Drop it. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200528094405.145708-8-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-07-06 10:34:14 +02:00
Vladimir Sementsov-Ogievskiy	32d293c8c6	block/iscsi: drop unallocated_blocks_are_zero We set bdi->unallocated_blocks_are_zero = iscsilun->lbprz, but iscsi_co_block_status doesn't return 0 in case of iscsilun->lbprz, it returns ZERO when appropriate. So actually unallocated_blocks_are_zero is useless (it doesn't affect the only user of the field: bdrv_co_block_status()). Drop it now. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200528094405.145708-7-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-07-06 10:34:14 +02:00
Vladimir Sementsov-Ogievskiy	74036395ea	block/crypto: drop unallocated_blocks_are_zero It's false by default, no needs to set it. We are going to drop this variable at all, so drop it now here, it doesn't hurt. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200528094405.145708-6-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-07-06 10:34:14 +02:00
Vladimir Sementsov-Ogievskiy	2c060c0f50	block/vpc: return ZERO block-status when appropriate In case when get_image_offset() returns -1, we do zero out the corresponding chunk of qiov. So, this should be reported as ZERO. Note that this changes visible output of "qemu-img map --output=json" and "qemu-io -c map" commands. For qemu-img map, the change is obvious: we just mark as zero what is really zero. For qemu-io it's less obvious: what was unallocated now is allocated. There is an inconsistency in understanding of unallocated regions in Qemu: backing-supporting format-drivers return 0 block-status to report go-to-backing logic for this area. Some protocol-drivers (iscsi) return 0 to report fs-unallocated-non-zero status (i.e., don't occupy space on disk, read result is undefined). BDRV_BLOCK_ALLOCATED is defined as something more close to go-to-backing logic. Still it is calculated as ZERO \| DATA, so 0 from iscsi is treated as unallocated. It doesn't influence backing-chain behavior, as iscsi can't have backing file. But it does influence "qemu-io -c map". We should solve this inconsistency at some future point. Now, let's just make backing-not-supporting format drivers (vdi in the previous patch and vpc now) to behave more like backing-supporting drivers and not report 0 block-status. More over, returning ZERO status is absolutely valid thing, and again, corresponds to how the other format-drivers (backing-supporting) work. After block-status update, it never reports 0, so setting unallocated_blocks_are_zero doesn't make sense (as the only user of it is bdrv_co_block_status and it checks unallocated_blocks_are_zero only for unallocated areas). Drop it. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200528094405.145708-5-vsementsov@virtuozzo.com> [mreitz: qemu-io -c map as used by iotest 146 now reports everything as allocated; in order to make the test do something useful, we use qemu-img map --output=json now] Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-07-06 10:32:38 +02:00
Vladimir Sementsov-Ogievskiy	2ea0332f42	block/vdi: return ZERO block-status when appropriate In case of !VDI_IS_ALLOCATED[], we do zero out the corresponding chunk of qiov. So, this should be reported as ZERO. Note that this changes visible output of "qemu-img map --output=json" and "qemu-io -c map" commands. For qemu-img map, the change is obvious: we just mark as zero what is really zero. For qemu-io it's less obvious: what was unallocated now is allocated. There is an inconsistency in understanding of unallocated regions in Qemu: backing-supporting format-drivers return 0 block-status to report go-to-backing logic for this area. Some protocol-drivers (iscsi) return 0 to report fs-unallocated-non-zero status (i.e., don't occupy space on disk, read result is undefined). BDRV_BLOCK_ALLOCATED is defined as something more close to go-to-backing logic. Still it is calculated as ZERO \| DATA, so 0 from iscsi is treated as unallocated. It doesn't influence backing-chain behavior, as iscsi can't have backing file. But it does influence "qemu-io -c map". We should solve this inconsistency at some future point. Now, let's just make backing-not-supporting format drivers (vdi at this patch and vpc with the following) to behave more like backing-supporting drivers and not report 0 block-status. More over, returning ZERO status is absolutely valid thing, and again, corresponds to how the other format-drivers (backing-supporting) work. After block-status update, it never reports 0, so setting unallocated_blocks_are_zero doesn't make sense (as the only user of it is bdrv_co_block_status and it checks unallocated_blocks_are_zero only for unallocated areas). Drop it. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200528094405.145708-4-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-07-06 08:49:28 +02:00
Vladimir Sementsov-Ogievskiy	7b1efe996c	block: inline bdrv_unallocated_blocks_are_zero() The function has only one user: bdrv_co_block_status(). Inline it to simplify reviewing of the following patches, which will finally drop unallocated_blocks_are_zero field too. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200528094405.145708-3-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-07-06 08:49:28 +02:00
Maxim Levitsky	8ea1613d91	block/qcow2: implement blockdev-amend Currently the implementation only supports amending the encryption options, unlike the qemu-img version Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200608094030.670121-14-mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-07-06 08:49:28 +02:00
Maxim Levitsky	30da9dd88a	block/crypto: implement blockdev-amend Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200608094030.670121-13-mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-07-06 08:49:28 +02:00
Maxim Levitsky	ced914d0ab	block/core: add generic infrastructure for x-blockdev-amend qmp command blockdev-amend will be used similiar to blockdev-create to allow on the fly changes of the structure of the format based block devices. Current plan is to first support encryption keyslot management for luks based formats (raw and embedded in qcow2) Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20200608094030.670121-12-mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-07-06 08:49:28 +02:00
Maxim Levitsky	90766d9db9	block/qcow2: extend qemu-img amend interface with crypto options Now that we have all the infrastructure in place, wire it in the qcow2 driver and expose this to the user. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200608094030.670121-9-mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-07-06 08:49:28 +02:00
Maxim Levitsky	bbfdae91fb	block/crypto: implement the encryption key management This implements the encryption key management using the generic code in qcrypto layer and exposes it to the user via qemu-img This code adds another 'write_func' because the initialization write_func works directly on the underlying file, and amend works on instance of luks device. This commit also adds a 'hack/workaround' I and Kevin Wolf (thanks) made to make the driver both support write sharing (to avoid breaking the users), and be safe against concurrent metadata update (the keyslots) Eventually the write sharing for luks driver will be deprecated and removed together with this hack. The hack is that we ask (as a format driver) for BLK_PERM_CONSISTENT_READ and then when we want to update the keys, we unshare that permission. So if someone else has the image open, even readonly, encryption key update will fail gracefully. Also thanks to Daniel Berrange for the idea of unsharing read, rather that write permission which allows to avoid cases when the other user had opened the image read-only. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200608094030.670121-8-mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-07-06 08:49:28 +02:00
Maxim Levitsky	e0d0ddc591	block/crypto: rename two functions rename the write_func to create_write_func, and init_func to create_init_func. This is preparation for other write_func that will be used to update the encryption keys. No functional changes Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20200608094030.670121-7-mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-07-06 08:49:28 +02:00
Maxim Levitsky	0b6786a9c1	block/amend: refactor qcow2 amend options Some qcow2 create options can't be used for amend. Remove them from the qcow2 create options and add generic logic to detect such options in qemu-img Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> [mreitz: Dropped some iotests reference output hunks that became unnecessary thanks to "iotests: Make _filter_img_create more active"] Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200625125548.870061-12-mreitz@redhat.com>	2020-07-06 08:49:28 +02:00
Maxim Levitsky	df373fb0a3	block/amend: separate amend and create options for qemu-img Some options are only useful for creation (or hard to be amended, like cluster size for qcow2), while some other options are only useful for amend, like upcoming keyslot management options for luks Since currently only qcow2 supports amend, move all its options to a common macro and then include it in each action option list. In future it might be useful to remove some options which are not supported anyway from amend list, which currently cause an error message if amended. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200608094030.670121-5-mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-07-06 08:49:28 +02:00
Maxim Levitsky	a3579bfa0a	block/amend: add 'force' option 'force' option will be used for some unsafe amend operations. This includes things like erasing last keyslot in luks based formats (which destroys the data, unless the master key is backed up by external means), but that _might_ be desired result. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200608094030.670121-4-mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-07-06 08:49:28 +02:00
Maxim Levitsky	43cbd06df2	qcrypto/core: add generic infrastructure for crypto options amendment This will be used first to implement luks keyslot management. block_crypto_amend_opts_init will be used to convert qemu-img cmdline to QCryptoBlockAmendOptions Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20200608094030.670121-2-mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-07-06 08:49:28 +02:00
Alberto Garcia	a5675f3901	qcow2: Fix preallocation on images with unaligned sizes When resizing an image with qcow2_co_truncate() using the falloc or full preallocation modes the code assumes that both the old and new sizes are cluster-aligned. There are two problems with this: 1) The calculation of how many clusters are involved does not always get the right result. Example: creating a 60KB image and resizing it (with preallocation=full) to 80KB won't allocate the second cluster. 2) No copy-on-write is performed, so in the previous example if there is a backing file then the first 60KB of the first cluster won't be filled with data from the backing file. This patch fixes both issues. Signed-off-by: Alberto Garcia <berto@igalia.com> Message-Id: <20200617140036.20311-1-berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-07-06 08:33:06 +02:00
Vladimir Sementsov-Ogievskiy	e8de7ba9ea	block/block-copy: block_copy_dirty_clusters: fix failure check ret may be > 0 on success path at this point. Fix assertion, which may crash currently. Fixes: `4ce5dd3e9b` Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200526181347.489557-1-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-07-06 08:33:06 +02:00
Kevin Wolf	3dfa23b9ef	vvfat: Fix array_remove_slice() array_remove_slice() calls array_roll() with array->next - 1 as the destination index. This is only correct for count == 1, otherwise we're writing past the end of the array. array->next - count would be correct. However, this is the only place ever calling array_roll(), so this rather complicated operation isn't even necessary. Fix the problem and simplify the code by replacing it with a single memmove() call. array_roll() can now be removed. Reported-by: Nathan Huckleberry <nhuck15@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200623175534.38286-3-kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-07-03 09:37:03 +02:00
Kevin Wolf	c79e243ed6	vvfat: Check that updated filenames are valid FAT allows only a restricted set of characters in file names, and for some of the illegal characters, it's actually important that we catch them: If filenames can contain '/', the guest can construct filenames containing "../" and escape from the assigned vvfat directory. The same problem could arise if ".." was ever accepted as a literal filename. Fix this by adding a check that all filenames are valid in check_directory_consistency(). Reported-by: Nathan Huckleberry <nhuck15@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200623175534.38286-2-kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-07-03 09:37:03 +02:00
Stefan Hajnoczi	7838c67f22	block/nvme: support nested aio_poll() QEMU block drivers are supposed to support aio_poll() from I/O completion callback functions. This means completion processing must be re-entrant. The standard approach is to schedule a BH during completion processing and cancel it at the end of processing. If aio_poll() is invoked by a callback function then the BH will run. The BH continues the suspended completion processing. All of this means that request A's cb() can synchronously wait for request B to complete. Previously the nvme block driver would hang because it didn't process completions from nested aio_poll(). Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Sergio Lopez <slp@redhat.com> Message-id: 20200617132201.1832152-8-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-06-23 15:46:08 +01:00
Stefan Hajnoczi	b75fd5f554	block/nvme: keep BDRVNVMeState pointer in NVMeQueuePair Passing around both BDRVNVMeState and NVMeQueuePair is unwieldy. Reduce the number of function arguments by keeping the BDRVNVMeState pointer in NVMeQueuePair. This will come in handly when a BH is introduced in a later patch and only one argument can be passed to it. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Sergio Lopez <slp@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20200617132201.1832152-7-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-06-23 15:46:08 +01:00
Stefan Hajnoczi	a5db74f324	block/nvme: clarify that free_req_queue is protected by q->lock Existing users access free_req_queue under q->lock. Document this. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Sergio Lopez <slp@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20200617132201.1832152-6-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-06-23 15:46:08 +01:00
Stefan Hajnoczi	1086e95da1	block/nvme: switch to a NVMeRequest freelist There are three issues with the current NVMeRequest->busy field: 1. The busy field is accidentally accessed outside q->lock when request submission fails. 2. Waiters on free_req_queue are not woken when a request is returned early due to submission failure. 2. Finding a free request involves scanning all requests. This makes request submission O(n^2). Switch to an O(1) freelist that is always accessed under the lock. Also differentiate between NVME_QUEUE_SIZE, the actual SQ/CQ size, and NVME_NUM_REQS, the number of usable requests. This makes the code simpler than using NVME_QUEUE_SIZE everywhere and having to keep in mind that one slot is reserved. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Sergio Lopez <slp@redhat.com> Message-id: 20200617132201.1832152-5-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-06-23 15:46:08 +01:00
Stefan Hajnoczi	04b3fb39c8	block/nvme: don't access CQE after moving cq.head Do not access a CQE after incrementing q->cq.head and releasing q->lock. It is unlikely that this causes problems in practice but it's a latent bug. The reason why it should be safe at the moment is that completion processing is not re-entrant and the CQ doorbell isn't written until the end of nvme_process_completion(). Make this change now because QEMU expects completion processing to be re-entrant and later patches will do that. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Sergio Lopez <slp@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20200617132201.1832152-4-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-06-23 15:46:08 +01:00
Stefan Hajnoczi	d38253cf8b	block/nvme: drop tautologous assertion nvme_process_completion() explicitly checks cid so the assertion that follows is always true: if (cid == 0 \|\| cid > NVME_QUEUE_SIZE) { ... continue; } assert(cid <= NVME_QUEUE_SIZE); Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Sergio Lopez <slp@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20200617132201.1832152-3-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-06-23 15:46:08 +01:00
Stefan Hajnoczi	2446e0e2e9	block/nvme: poll queues without q->lock A lot of CPU time is spent simply locking/unlocking q->lock during polling. Check for completion outside the lock to make q->lock disappear from the profile. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Sergio Lopez <slp@redhat.com> Message-id: 20200617132201.1832152-2-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-06-23 15:46:08 +01:00
Eric Blake	f17d684770	qcow2: Tweak comments on qcow2_get_persistent_dirty_bitmap_size For now, we don't have persistent bitmaps in any other formats, but that might not be true in the future. Make it obvious that our incoming parameter is not necessarily a qcow2 image, and therefore is limited to just the bdrv_dirty_bitmap_* API calls (rather than probing into qcow2 internals). Suggested-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20200608190821.3293867-1-eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-06-17 14:53:39 +02:00
Eric Blake	e37adbebd1	block: Refactor subdirectory recursion during make Rather than listing block/monitor from the top-level Makefile.objs, we should instead list monitor from block/Makefile.objs. Suggested-by: Kevin Wolf <kwolf@redhat.com> Fixes: `bb4e58c613` Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20200608173339.3244211-1-eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-06-17 14:53:39 +02:00
Eric Blake	5c86bdf120	block: Call attention to truncation of long NBD exports Commit `93676c88` relaxed our NBD client code to request export names up to the NBD protocol maximum of 4096 bytes without NUL terminator, even though the block layer can't store anything longer than 4096 bytes including NUL terminator for display to the user. Since this means there are some export names where we have to truncate things, we can at least try to make the truncation a bit more obvious for the user. Note that in spite of the truncated display name, we can still communicate with an NBD server using such a long export name; this was deemed nicer than refusing to even connect to such a server (since the server may not be under our control, and since determining our actual length limits gets tricky when nbd://host:port/export and nbd+unix:///export?socket=/path are themselves variable-length expansions beyond the export name but count towards the block layer name length). Reported-by: Xueqiang Wei <xuwei@redhat.com> Fixes: https://bugzilla.redhat.com/1843684 Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200610163741.3745251-3-eblake@redhat.com>	2020-06-10 12:58:59 -05:00
Vladimir Sementsov-Ogievskiy	7d2410cea1	block: Factor out bdrv_run_co() We have a few bdrv_*() functions that can either spawn a new coroutine and wait for it with BDRV_POLL_WHILE() or use a fastpath if they are alreeady running in a coroutine. All of them duplicate basically the same code. Factor the common code into a new function bdrv_run_co(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20200520144901.16589-1-vsementsov@virtuozzo.com [Factor out bdrv_run_co_entry too] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-06-05 09:54:48 +01:00
Stefano Garzarella	769335ecb1	io_uring: use io_uring_cq_ready() to check for ready cqes In qemu_luring_poll_cb() we are not using the cqe peeked from the CQ ring. We are using io_uring_peek_cqe() only to see if there are cqes ready, so we can replace it with io_uring_cq_ready(). Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-id: 20200519134942.118178-1-sgarzare@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-06-05 09:54:48 +01:00
Stefano Garzarella	b4e44c9944	io_uring: retry io_uring_submit() if it fails with errno=EINTR As recently documented [1], io_uring_enter(2) syscall can return an error (errno=EINTR) if the operation was interrupted by a delivery of a signal before it could complete. This should happen when IORING_ENTER_GETEVENTS flag is used, for example during io_uring_submit_and_wait() or during io_uring_submit() when IORING_SETUP_IOPOLL is enabled. We shouldn't have this problem for now, but it's better to prevent it. [1] `344355ec66` Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-id: 20200519133041.112138-1-sgarzare@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-06-05 09:54:48 +01:00
Eric Blake	5d72c68b49	qcow2: Expose bitmaps' size during measure It's useful to know how much space can be occupied by qcow2 persistent bitmaps, even though such metadata is unrelated to the guest-visible data. Report this value as an additional QMP field, present when measuring an existing image and output format that both support bitmaps. Update iotest 178 and 190 to updated output, as well as new coverage in 190 demonstrating non-zero values made possible with the recently-added qemu-img bitmap command (see `3b51ab4b`). The new 'bitmaps size:' field is displayed automatically as part of 'qemu-img measure' any time it is present in QMP (that is, any time both the source image being measured and destination format support bitmaps, even if the measurement is 0 because there are no bitmaps present). If the field is absent, it means that no bitmaps can be copied (source, destination, or both lack bitmaps, including when measuring based on size rather than on a source image). This behavior is compatible with an upcoming patch adding 'qemu-img convert --bitmaps': that command will fail in the same situations where this patch omits the field. The addition of a new field demonstrates why we should always zero-initialize qapi C structs; while the qcow2 driver still fully populates all fields, the raw and crypto drivers had to be tweaked to avoid uninitialized data. Consideration was also given towards having a 'qemu-img measure --bitmaps' which errors out when bitmaps are not possible, and otherwise sums the bitmaps into the existing allocation totals rather than displaying as a separate field, as a potential convenience factor. But this was ultimately decided to be more complexity than necessary when the QMP interface was sufficient enough with bitmaps remaining a separate field. See also: https://bugzilla.redhat.com/1779904 Reported-by: Nir Soffer <nsoffer@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20200521192137.1120211-3-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2020-05-28 13:16:16 -05:00
Vladimir Sementsov-Ogievskiy	7ae89a0de9	block/dirty-bitmap: add bdrv_has_named_bitmaps helper To be used for bitmap migration in further commit. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200521220648.3255-3-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2020-05-28 13:15:22 -05:00
Eric Blake	bb4e58c613	blockdev: Split off basic bitmap operations for qemu-img Upcoming patches want to add some basic bitmap manipulation abilities to qemu-img. But blockdev.o is too heavyweight to link into qemu-img (among other things, it would drag in block jobs and transaction support - qemu-img does offline manipulation, where atomicity is less important because there are no concurrent modifications to compete with), so it's time to split off the bare bones of what we will need into a new file block/monitor/bitmap-qmp-cmds.o. This is sufficient to expose 6 QMP commands for use by qemu-img (add, remove, clear, enable, disable, merge), as well as move the three helper functions touched in the previous patch. Regarding MAINTAINERS, the new file is automatically part of block core, but also makes sense as related to other dirty bitmap files. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200513011648.166876-6-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2020-05-19 10:32:14 -05:00
Eric Blake	ef893b5c84	block: Make it easier to learn which BDS support bitmaps Upcoming patches will enhance bitmap support in qemu-img, but in doing so, it turns out to be nice to suppress output when persistent bitmaps make no sense (such as on a qcow2 v2 image). Add a hook to make this easier to query. This patch adds a new callback .bdrv_supports_persistent_dirty_bitmap, rather than trying to shoehorn the answer in via existing callbacks. In particular, while it might have been possible to overload .bdrv_co_can_store_new_dirty_bitmap to special-case a NULL input to answer whether any persistent bitmaps are supported, that is at odds with whether a particular bitmap can be stored (for example, even on an image that supports persistent bitmaps but has currently filled up the maximum number of bitmaps, attempts to store another one should fail); and the new functionality doesn't require coroutine safety. Similarly, we could have added one more piece of information to .bdrv_get_info, but then again, most callers to that function tend to already discard extraneous information, and making it a catch-all rather than a series of dedicated scalar queries hasn't really simplified life. In the future, when we improve the ability to look up bitmaps through a filter, we will probably also want to teach the block layer to automatically let filters pass this request on through. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20200513011648.166876-4-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2020-05-19 10:32:14 -05:00
Philippe Mathieu-Daudé	d7eca54222	block/block-copy: Simplify block_copy_do_copy() block_copy_do_copy() is static, only used in block_copy_task_entry with the error_is_read argument set. No need to check for it, simplify. Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200507121129.29760-3-philmd@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-18 19:05:25 +02:00
Philippe Mathieu-Daudé	c78dd00e35	block/block-copy: Fix uninitialized variable in block_copy_task_entry Fix when building with -Os: CC block/block-copy.o block/block-copy.c: In function ‘block_copy_task_entry’: block/block-copy.c:428:38: error: ‘error_is_read’ may be used uninitialized in this function [-Werror=maybe-uninitialized] 428 \| t->call_state->error_is_read = error_is_read; \| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~ Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200507121129.29760-2-philmd@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-18 19:05:25 +02:00
Max Reitz	e5d8a40685	block: Drop @child_class from bdrv_child_perm() Implementations should decide the necessary permissions based on @role. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200513110544.176672-35-mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-18 19:05:25 +02:00
Max Reitz	1f38f04eac	block: Pass BdrvChildRole in remaining cases These calls have no real use for the child role yet, but it will not harm to give one. Notably, the bdrv_root_attach_child() call in blockjob.c is left unmodified because there is not much the generic BlockJob object wants from its children. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200513110544.176672-34-mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-18 19:05:25 +02:00
Max Reitz	69dca43d6b	block: Use bdrv_default_perms() bdrv_default_perms() can decide which permission profile to use based on the BdrvChildRole, so block drivers do not need to select it explicitly. The blkverify driver now no longer shares the WRITE permission for the image to verify. We thus have to adjust two places in test-block-iothread not to take it. (Note that in theory, blkverify should behave like quorum in this regard and share neither WRITE nor RESIZE for both of its children. In practice, it does not really matter, because blkverify is used only for debugging, so we might as well keep its permissions rather liberal.) Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200513110544.176672-30-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-18 19:05:25 +02:00
Max Reitz	58944401d6	block: Use child_of_bds in remaining places Replace child_file by child_of_bds in all remaining places (excluding tests). Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200513110544.176672-28-mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-18 19:05:25 +02:00
Max Reitz	b3af2af43b	block: Make filter drivers use child_of_bds Note that some filters have secondary children, namely blkverify (the image to be verified) and blklogwrites (the log). This patch does not touch those children. Note that for blkverify, the filtered child should not be format-probed. While there is nothing enforcing this here, in practice, it will not be: blkverify implements .bdrv_file_open. The block layer ensures (and in fact, asserts) that BDRV_O_PROTOCOL is set for every BDS whose driver implements .bdrv_file_open. This flag will then be bequeathed to blkverify's children, and they will thus (by default) not be probed either. ("By default" refers to the fact that blkverify's other child (the non-filtered one) will have BDRV_O_PROTOCOL force-unset, because that is what happens for all non-filtered children of non-format drivers.) Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200513110544.176672-27-mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-18 19:05:25 +02:00
Max Reitz	8b1869daad	block: Make format drivers use child_of_bds Commonly, they need to pass the BDRV_CHILD_IMAGE set as the BdrvChildRole; but there are exceptions for drivers with external data files (qcow2 and vmdk). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200513110544.176672-26-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-18 19:05:25 +02:00
Max Reitz	25191e5ff0	block: Make backing files child_of_bds children Make all parents of backing files pass the appropriate BdrvChildRole. By doing so, we can switch their BdrvChildClass over to the generic child_of_bds, which will do the right thing when given a correct BdrvChildRole. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200513110544.176672-24-mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-18 19:05:25 +02:00
Max Reitz	36ee58d13b	block: Switch child_format users to child_of_bds Both users (quorum and blkverify) use child_format for not-really-filtered children, so the appropriate BdrvChildRole in both cases is DATA. (Note that this will cause bdrv_inherited_options() to force-allow format probing.) Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200513110544.176672-22-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-18 19:05:25 +02:00
Max Reitz	500e243420	raw-format: Split raw_read_options() Split raw_read_options() into one function that actually just reads the options, and another that applies them. This will allow us to detect whether the user has specified any options before attaching the file child (so we can decide on its role based on the options). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200513110544.176672-21-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-18 19:05:25 +02:00
Max Reitz	3cdc69d31b	block: Pass parent_is_format to .inherit_options() We plan to unify the generic .inherit_options() functions. The resulting common function will need to decide whether to force-enable format probing, force-disable it, or leave it as-is. To make this decision, it will need to know whether the parent node is a format node or not (because we never want format probing if the parent is a format node already (except for the backing chain)). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200513110544.176672-9-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-18 19:05:25 +02:00
Max Reitz	272c02eaef	block: Pass BdrvChildRole to .inherit_options() For now, all callers (effectively) pass 0 and no callee evaluates thie value. Later patches will change both. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200513110544.176672-8-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-18 19:05:25 +02:00
Max Reitz	bf8e925eb5	block: Pass BdrvChildRole to bdrv_child_perm() For now, all callers pass 0 and no callee evaluates this value. Later patches will change both. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200513110544.176672-7-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-18 19:05:25 +02:00
Max Reitz	258b776515	block: Add BdrvChildRole to BdrvChild For now, it is always set to 0. Later patches in this series will ensure that all callers pass an appropriate combination of flags. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200513110544.176672-6-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-18 19:05:25 +02:00
Max Reitz	bd86fb990c	block: Rename BdrvChildRole to BdrvChildClass This structure nearly only contains parent callbacks for child state changes. It cannot really reflect a child's role, because different roles may overlap (as we will see when real roles are introduced), and because parents can have custom callbacks even when the child fulfills a standard role. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-Id: <20200513110544.176672-4-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-18 19:05:25 +02:00
Max Reitz	d67066d8bc	block: Add BlockDriver.is_format We want to unify child_format and child_file at some point. One of the important things that set format drivers apart from other drivers is that they do not expect other format nodes under them (except in the backing chain), i.e. we must not probe formats inside of formats. That means we need something on which to distinguish format drivers from others, and hence this flag. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-Id: <20200513110544.176672-3-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-18 19:05:25 +02:00
Max Reitz	6540fd153c	block: Mark commit, mirror, blkreplay as filters The commit, mirror, and blkreplay block nodes are filters, so they should be marked as such. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200513110544.176672-2-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-18 19:05:25 +02:00
Max Reitz	f844ec01b3	block: Use bdrv_make_empty() where possible Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200429141126.85159-3-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-18 19:05:25 +02:00
Kevin Wolf	6ecbc6c526	replication: Avoid blk_make_empty() on read-only child This is just a bandaid to keep tests/test-replication working after bdrv_make_empty() starts to assert that we're not trying to call it on a read-only child. For the real solution in the future, replication should not steal the BdrvChild from its backing file (this is never correct to do!), but instead have its own child node references, with the appropriate permissions. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-18 19:05:25 +02:00
Max Reitz	2d97fde439	block: Use blk_make_empty() after commits bdrv_commit() already has a BlockBackend pointing to the BDS that we want to empty, it just has the wrong permissions. qemu-img commit has no BlockBackend pointing to the old backing file yet, but introducing one is simple. After this commit, bdrv_make_empty() is the only remaining caller of BlockDriver.bdrv_make_empty(). Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200429141126.85159-5-mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> [kwolf: Fixed up reference output for 098] Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-18 19:05:25 +02:00
Max Reitz	2b7bbdbdef	block: Add blk_make_empty() Two callers of BlockDriver.bdrv_make_empty() remain that should not call this method directly. Both do not have access to a BdrvChild, but they can use a BlockBackend, so we add this function that lets them use it. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200429141126.85159-4-mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-18 19:05:25 +02:00
Lukas Straub	e140f4b7b8	block/replication.c: Avoid cancelling the job twice If qemu in colo secondary mode is stopped, it crashes because s->backup_job is canceled twice: First with job_cancel_sync_all() in qemu_cleanup() and then in replication_stop(). Fix this by assigning NULL to s->backup_job when the job completes so replication_stop() and replication_do_checkpoint() won't touch the job. Signed-off-by: Lukas Straub <lukasstraub2@web.de> Message-Id: <20200511090801.7ed5d8f3@luklap> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-18 19:05:25 +02:00
Kevin Wolf	e83dd6808c	mirror: Make sure that source and target size match If the target is shorter than the source, mirror would copy data until it reaches the end of the target and then fail with an I/O error when trying to write past the end. If the target is longer than the source, the mirror job would complete successfully, but the target wouldn't actually be an accurate copy of the source image (it would contain some additional garbage at the end). Fix this by checking that both images have the same size when the job starts. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200511135825.219437-4-kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-18 19:05:24 +02:00
Markus Armbruster	d2623129a7	qom: Drop parameter @errp of object_property_add() & friends The only way object_property_add() can fail is when a property with the same name already exists. Since our property names are all hardcoded, failure is a programming error, and the appropriate way to handle it is passing &error_abort. Same for its variants, except for object_property_add_child(), which additionally fails when the child already has a parent. Parentage is also under program control, so this is a programming error, too. We have a bit over 500 callers. Almost half of them pass &error_abort, slightly fewer ignore errors, one test case handles errors, and the remaining few callers pass them to their own callers. The previous few commits demonstrated once again that ignoring programming errors is a bad idea. Of the few ones that pass on errors, several violate the Error API. The Error ** argument must be NULL, &error_abort, &error_fatal, or a pointer to a variable containing NULL. Passing an argument of the latter kind twice without clearing it in between is wrong: if the first call sets an error, it no longer points to NULL for the second call. ich9_pm_add_properties(), sparc32_ledma_realize(), sparc32_dma_realize(), xilinx_axidma_realize(), xilinx_enet_realize() are wrong that way. When the one appropriate choice of argument is &error_abort, letting users pick the argument is a bad idea. Drop parameter @errp and assert the preconditions instead. There's one exception to "duplicate property name is a programming error": the way object_property_add() implements the magic (and undocumented) "automatic arrayification". Don't drop @errp there. Instead, rename object_property_add() to object_property_try_add(), and add the obvious wrapper object_property_add(). Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20200505152926.18877-15-armbru@redhat.com> [Two semantic rebase conflicts resolved]	2020-05-15 07:07:58 +02:00
Vladimir Sementsov-Ogievskiy	fc9aefc8c0	block/block-copy: fix use-after-free of task pointer Obviously, we should g_free the task after trace point and offset update. Reported-by: Coverity (CID 1428756) Fixes: `4ce5dd3e9b` Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200507183800.22626-1-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-05-13 14:20:31 +02:00
Denis Plotnikov	d298ac10ad	qcow2: add zstd cluster compression zstd significantly reduces cluster compression time. It provides better compression performance maintaining the same level of the compression ratio in comparison with zlib, which, at the moment, is the only compression method available. The performance test results: Test compresses and decompresses qemu qcow2 image with just installed rhel-7.6 guest. Image cluster size: 64K. Image on disk size: 2.2G The test was conducted with brd disk to reduce the influence of disk subsystem to the test results. The results is given in seconds. compress cmd: time ./qemu-img convert -O qcow2 -c -o compression_type=[zlib\|zstd] src.img [zlib\|zstd]_compressed.img decompress cmd time ./qemu-img convert -O qcow2 [zlib\|zstd]_compressed.img uncompressed.img compression decompression zlib zstd zlib zstd ------------------------------------------------------------ real 65.5 16.3 (-75 %) 1.9 1.6 (-16 %) user 65.0 15.8 5.3 2.5 sys 3.3 0.2 2.0 2.0 Both ZLIB and ZSTD gave the same compression ratio: 1.57 compressed image size in both cases: 1.4G Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com> QAPI part: Acked-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20200507082521.29210-4-dplotnikov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-05-13 14:20:31 +02:00
Denis Plotnikov	25dd077d1d	qcow2: rework the cluster compression routine The patch enables processing the image compression type defined for the image and chooses an appropriate method for image clusters (de)compression. Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200507082521.29210-3-dplotnikov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-05-13 14:20:31 +02:00
Denis Plotnikov	572ad9783f	qcow2: introduce compression type feature The patch adds some preparation parts for incompatible compression type feature to qcow2 allowing the use different compression methods for image clusters (de)compressing. It is implied that the compression type is set on the image creation and can be changed only later by image conversion, thus compression type defines the only compression algorithm used for the image, and thus, for all image clusters. The goal of the feature is to add support of other compression methods to qcow2. For example, ZSTD which is more effective on compression than ZLIB. The default compression is ZLIB. Images created with ZLIB compression type are backward compatible with older qemu versions. Adding of the compression type breaks a number of tests because now the compression type is reported on image creation and there are some changes in the qcow2 header in size and offsets. The tests are fixed in the following ways: * filter out compression_type for many tests * fix header size, feature table size and backing file offset affected tests: 031, 036, 061, 080 header_size +=8: 1 byte compression type 7 bytes padding feature_table += 48: incompatible feature compression type backing_file_offset += 56 (8 + 48 -> header_change + feature_table_change) * add "compression type" for test output matching when it isn't filtered affected tests: 049, 060, 061, 065, 082, 085, 144, 182, 185, 198, 206, 242, 255, 274, 280 Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> QAPI part: Acked-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20200507082521.29210-2-dplotnikov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-05-13 14:20:31 +02:00
Eric Blake	47e0b38a13	block: Drop unused .bdrv_has_zero_init_truncate Now that there are no clients of bdrv_has_zero_init_truncate, none of the drivers need to worry about providing it. What's more, this eliminates a source of some confusion: a literal reading of the documentation as written in `ceaca56f` and implemented in commit `1dcaf527` claims that a driver which returns 0 for bdrv_has_zero_init_truncate() must not return 1 for bdrv_has_zero_init(); this condition was violated for parallels, qcow, and sometimes for vdi, although in practice it did not matter since those drivers also lacked .bdrv_co_truncate. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20200428202905.770727-10-eblake@redhat.com> Acked-by: Richard W.M. Jones <rjones@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-08 13:26:35 +02:00
Eric Blake	dbc636e791	vhdx: Rework truncation logic The vhdx driver uses truncation for image growth, with a special case for blocks that already read as zero but which are only being partially written. But with a bit of rearranging, it's just as easy to defer the decision on whether truncation resulted in zeroes to the actual allocation attempt, reducing the number of places that still use bdrv_has_zero_init_truncate. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20200428202905.770727-9-eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-08 13:26:35 +02:00
Eric Blake	bda4cdcbb9	parallels: Rework truncation logic The parallels driver tries to use truncation for image growth, but can only do so when reads are guaranteed as zero. Now that we have a way to request zero contents from truncation, we can defer the decision to actual allocation attempts rather than up front, reducing the number of places that still use bdrv_has_zero_init_truncate. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20200428202905.770727-8-eblake@redhat.com> Reviewed-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-08 13:26:35 +02:00
Eric Blake	be9c9404db	ssh: Support BDRV_REQ_ZERO_WRITE for truncate Our .bdrv_has_zero_init_truncate can detect when the remote side always zero fills; we can reuse that same knowledge to implement BDRV_REQ_ZERO_WRITE by ignoring it when the server gives it to us for free. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20200428202905.770727-7-eblake@redhat.com> Reviewed-by: Richard W.M. Jones <rjones@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-08 13:26:35 +02:00
Eric Blake	fec00559e7	sheepdog: Support BDRV_REQ_ZERO_WRITE for truncate Our .bdrv_has_zero_init_truncate always returns 1 because sheepdog always 0-fills; we can use that same knowledge to implement BDRV_REQ_ZERO_WRITE by ignoring it. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20200428202905.770727-6-eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-08 13:26:35 +02:00
Eric Blake	2f98910d5b	rbd: Support BDRV_REQ_ZERO_WRITE for truncate Our .bdrv_has_zero_init_truncate always returns 1 because rbd always 0-fills; we can use that same knowledge to implement BDRV_REQ_ZERO_WRITE by ignoring it. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20200428202905.770727-5-eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-08 13:26:35 +02:00
Eric Blake	8f23aaf5d6	nfs: Support BDRV_REQ_ZERO_WRITE for truncate Our .bdrv_has_zero_init_truncate returns 1 if we detect that the OS always 0-fills; we can use that same knowledge to implement BDRV_REQ_ZERO_WRITE by ignoring it when the OS gives it to us for free. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20200428202905.770727-4-eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-08 13:26:35 +02:00
Eric Blake	8e51979504	file-win32: Support BDRV_REQ_ZERO_WRITE for truncate When using bdrv_file, .bdrv_has_zero_init_truncate always returns 1; therefore, we can behave just like file-posix, and always implement BDRV_REQ_ZERO_WRITE by ignoring it since the OS gives it to us for free (note that file-posix.c had to use an 'if' because it shared code between regular files and block devices, but in file-win32.c, bdrv_host_device uses a separate .bdrv_file_open). Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20200428202905.770727-3-eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-08 13:26:35 +02:00
Eric Blake	5e09bcee5b	gluster: Drop useless has_zero_init callback block.c already defaults to 0 if we don't provide a callback; there's no need to write a callback that always fails. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-Id: <20200428202905.770727-2-eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-08 13:26:35 +02:00
Max Reitz	4b96fa3846	qcow2: Fix preallocation on block devices Calling bdrv_getlength() to get the pre-truncate file size will not really work on block devices, because they have always the same length, and trying to write beyond it will fail with a rather cryptic error message. Instead, we should use qcow2_get_last_cluster() and bdrv_getlength() only as a fallback. Before this patch: $ truncate -s 1G test.img $ sudo losetup -f --show test.img /dev/loop0 $ sudo qemu-img create -f qcow2 -o preallocation=full /dev/loop0 64M Formatting '/dev/loop0', fmt=qcow2 size=67108864 cluster_size=65536 preallocation=full lazy_refcounts=off refcount_bits=16 qemu-img: /dev/loop0: Could not resize image: Failed to resize refcount structures: No space left on device With this patch: $ sudo qemu-img create -f qcow2 -o preallocation=full /dev/loop0 64M Formatting '/dev/loop0', fmt=qcow2 size=67108864 cluster_size=65536 preallocation=full lazy_refcounts=off refcount_bits=16 qemu-img: /dev/loop0: Could not resize image: Failed to resize underlying file: Preallocation mode 'full' unsupported for this non-regular file So as you can see, it still fails, but now the problem is missing support on the block device level, so we at least get a better error message. Note that we cannot preallocate block devices on truncate by design, because we do not know what area to preallocate. Their length is always the same, the truncate operation does not change it. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200505141801.1096763-1-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-08 13:26:35 +02:00
Kevin Wolf	958a04bd32	backup: Make sure that source and target size match Since the introduction of a backup filter node in commit `00e30f05d`, the backup block job crashes when the target image is smaller than the source image because it will try to write after the end of the target node without having BLK_PERM_RESIZE. (Previously, the BlockBackend layer would have caught this and errored out gracefully.) We can fix this and even do better than the old behaviour: Check that source and target have the same image size at the start of the block job and unshare BLK_PERM_RESIZE. (This permission was already unshared before the same commit `00e30f05d`, but the BlockBackend that was used to make the restriction was removed without a replacement.) This will immediately error out when starting the job instead of only when writing to a block that doesn't exist in the target. Longer target than source would technically work because we would never write to blocks that don't exist, but semantically these are invalid, too, because a backup is supposed to create a copy, not just an image that starts with a copy. Fixes: `00e30f05de` Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1778593 Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200430142755.315494-4-kwolf@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-08 13:26:35 +02:00
Kevin Wolf	58226634c4	backup: Improve error for bdrv_getlength() failure bdrv_get_device_name() will be an empty string with modern management tools that don't use -drive. Use bdrv_get_device_or_node_name() instead so that the node name is used if the BlockBackend is anonymous. While at it, start with upper case to make the message consistent with the rest of the function. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-Id: <20200430142755.315494-3-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-08 13:26:35 +02:00
Kevin Wolf	2758be056b	vmdk: Flush only once in vmdk_L2update() If we have a backup L2 table, we currently flush once after writing to the active L2 table and again after writing to the backup table. A single flush is enough and makes things a little less slow. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200430133007.170335-6-kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-08 13:26:35 +02:00
Kevin Wolf	78cae78dbc	vmdk: Don't update L2 table for zero write on zero cluster If a cluster is already zeroed, we don't have to call vmdk_L2update(), which is rather slow because it flushes the image file. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200430133007.170335-5-kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-08 13:26:35 +02:00
Kevin Wolf	4823cde58e	vmdk: Fix partial overwrite of zero cluster When overwriting a zero cluster, we must not perform copy-on-write from the backing file, but from a zeroed buffer. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200430133007.170335-4-kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-08 13:26:35 +02:00
Kevin Wolf	2821c1cc0f	vmdk: Fix zero cluster allocation m_data must contain valid data even for zero clusters when no cluster was allocated in the image file. Without this, zero writes segfault with images that have zeroed_grain=on. For zero writes, we don't want to allocate a cluster in the image file even in compressed files. Fixes: `524089bce4` Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200430133007.170335-3-kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-08 13:26:35 +02:00
Kevin Wolf	4dc20e6465	vmdk: Rename VmdkMetaData.valid to new_allocation m_data is used for zero clusters even though valid == 0. It really only means that a new cluster was allocated in the image file. Rename it to reflect this. While at it, change it from int to bool, too. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200430133007.170335-2-kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-08 13:26:35 +02:00
Alberto Garcia	e4d7019e1a	qcow2: Avoid integer wraparound in qcow2_co_truncate() After commit `f01643fb8b` when an image is extended and BDRV_REQ_ZERO_WRITE is set then the new clusters are zeroized. The code however does not detect correctly situations when the old and the new end of the image are within the same cluster. The problem can be reproduced with these steps: qemu-img create -f qcow2 backing.qcow2 1M qemu-img create -f qcow2 -F qcow2 -b backing.qcow2 top.qcow2 qemu-img resize --shrink top.qcow2 520k qemu-img resize top.qcow2 567k In the last step offset - zero_start causes an integer wraparound. Signed-off-by: Alberto Garcia <berto@igalia.com> Message-Id: <20200504155217.10325-1-berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-05-08 13:26:35 +02:00
Maxim Levitsky	3d1900a471	block: luks: better error message when creating too large files Currently if you attampt to create too large file with luks you get the following error message: Formatting 'test.luks', fmt=luks size=17592186044416 key-secret=sec0 qemu-img: test.luks: Could not resize file: File too large While for raw format the error message is qemu-img: test.img: The image size is too large for file format 'raw' The reason for this is that qemu-img checks for errono of the failure, and presents the later error when it is -EFBIG However crypto generic code 'swallows' the errno and replaces it with -EIO. As an attempt to make it better, we can make luks driver, detect -EFBIG and in this case present a better error message, which is what this patch does The new error message is: qemu-img: error creating test.luks: The requested file size is too large Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1534898 Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>	2020-05-07 12:52:33 +01:00
Peter Maydell	ea1329bb3a	Block patches: - Asynchronous copying for block-copy (i.e., the backup job) - Allow resizing of qcow2 images when they have internal snapshots - iotests: Logging improvements for Python tests - iotest 153 fix, and block comment cleanups -----BEGIN PGP SIGNATURE----- iQFGBAABCAAwFiEEkb62CjDbPohX0Rgp9AfbAGHVz0AFAl6xYpoSHG1yZWl0ekBy ZWRoYXQuY29tAAoJEPQH2wBh1c9AF7IH/j1wr6m8OrZtdAoebVqFQn2buydT+kGP DP4IVfJ4YoTwmZCBxoR5ZuH6kWTWgUDz1w5x7U5A70tVLqK+RwCnAxrlz19s6rVP ACp1d6GO7iLAEH58KRsvSvy7OTvpKzEXP8tS8kDsK58xl8m65vSBLIt9xUpZMql2 VAiftyu7MYGpObEoy2SnTuhM5H9pPcNiuwATGLNrJqdmi+bW8sIKiV3+yND7s1cz lWqVHaWq0XUfBaSG4sx6BqBeBl+cchv0XyRrwNJefY3lVLxIiizwxmb7UYBRSBrY XIKGOl1qZNE0AETuF2wh7XzTJuHe7Asash5dIIip0sqtbiToVlSxQwY= =vtg/ -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/maxreitz/tags/pull-block-2020-05-05' into staging Block patches: - Asynchronous copying for block-copy (i.e., the backup job) - Allow resizing of qcow2 images when they have internal snapshots - iotests: Logging improvements for Python tests - iotest 153 fix, and block comment cleanups # gpg: Signature made Tue 05 May 2020 13:56:58 BST # gpg: using RSA key 91BEB60A30DB3E8857D11829F407DB0061D5CF40 # gpg: issuer "mreitz@redhat.com" # gpg: Good signature from "Max Reitz <mreitz@redhat.com>" [full] # Primary key fingerprint: 91BE B60A 30DB 3E88 57D1 1829 F407 DB00 61D5 CF40 * remotes/maxreitz/tags/pull-block-2020-05-05: (24 commits) block/block-copy: use aio-task-pool API block/block-copy: refactor task creation block/block-copy: add state pointer to BlockCopyTask block/block-copy: alloc task on each iteration block/block-copy: rename in-flight requests to tasks Fix iotest 153 block: Comment cleanups qcow2: Tweak comment about bitmaps vs. resize qcow2: Allow resize of images with internal snapshots block: Add blk_new_with_bs() helper iotests: use python logging for iotests.log() iotests: Mark verify functions as private iotest 258: use script_main iotests: add script_initialize iotests: add hmp helper with logging iotests: limit line length to 79 chars iotests: touch up log function signature iotests: drop pre-Python 3.4 compatibility code iotests: alphabetize standard imports iotests: add pylintrc file ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-05-05 16:46:37 +01:00
Peter Maydell	f19d118bed	nbd patches for 2020-05-04 - reduce client-side fragmentation of NBD trim and status requests - fix iotest 41 when run in deep tree - fix socket activation in qemu-nbd -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEccLMIrHEYCkn0vOqp6FrSiUnQ2oFAl6whTUACgkQp6FrSiUn Q2r4nAf7BtGSFMkUu6nWYeq+Ggg+Xwmz2FLAzWTK/rccGDC44c9ETzOIbWEddo6X FHpU07VXdLW1h2M7ox8lQVo0DZEFxTRBYTPtUtjB7izfkAs4CkYeElJsZAPAZKgU GsKqa3RM6uXubsQaXXXjMFCGlYgqi1dVkmkgtPebt7evSe0ATlTfYfd0y9gb5f9C cbHD3CVcGKQe4ZtNcSBpTzOvXJSrBZznyCyhBO2qmVXTynt/5Ygog+Ulq3DHZsPX UkRkTPohKA0BhXuS7wD49danlzCLiTlvswr62fAncM1+AJTbmIa+apy3SwiOkwMh Aawq5vDtaFV+HEBKbMC0QRhgtoEe1w== =ExlI -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/ericb/tags/pull-nbd-2020-05-04' into staging nbd patches for 2020-05-04 - reduce client-side fragmentation of NBD trim and status requests - fix iotest 41 when run in deep tree - fix socket activation in qemu-nbd # gpg: Signature made Mon 04 May 2020 22:12:21 BST # gpg: using RSA key 71C2CC22B1C4602927D2F3AAA7A16B4A2527436A # gpg: Good signature from "Eric Blake <eblake@redhat.com>" [full] # gpg: aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>" [full] # gpg: aka "[jpeg image of size 6874]" [full] # Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2 F3AA A7A1 6B4A 2527 436A * remotes/ericb/tags/pull-nbd-2020-05-04: block/nbd-client: drop max_block restriction from discard block/nbd-client: drop max_block restriction from block_status iotests/041: Fix NBD socket path tools: Fix use of fcntl(F_SETFD) during socket activation Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-05-05 15:47:44 +01:00
Peter Maydell	a2261b2754	trivial patches (20200504) Silent static analyzer warning Remove dead assignments Support -chardev serial on macOS Update MAINTAINERS Some cosmetic changes -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEEzS913cjjpNwuT1Fz8ww4vT8vvjwFAl6wOI4SHGxhdXJlbnRA dml2aWVyLmV1AAoJEPMMOL0/L748p7UQAIFSNN0FrDV+K7i8qqq0X+JrS+dNOHNm DSpOf8IaGm/BezzL6XirXBVpFxg9iB5DQVLsjP1kUggO7rbBO0blx5H5eOPhnXZj xg60kLN16ty7NZ/WPS1G9jF4nDsjz0ZUtCXb0OXsuGJIOrsmN2r/lxdJwcjHZaqJ RzbcCSFXlvL0g7mOakJinMJH5r/nWCiUoEYsikhP10DcvuSBoCnjr+LYV6Ef02G0 Y5lgKN2G0EAMgWTJaL3gIF27zS8QLDNll+eO+PIU5K4yo75/wRCKr4e3PpErZlf6 B+hCAAPnXCpDKw+8sK2z+9OZXUGe1hQ8LHNgNNM921C66f+vLLXpIDTAECihM4K4 0wThYlFDwT4j+PMHFNlzIobGMtb33ui8m40lepMt/YOVFqY4tr8u3MLhHkVDo2+8 sNuOOWLXAoFOYyRqgTeVJvZvMUFQqtDiftghw1BR55TyIpDWjvLYRqae5CI+MGXs 6YylZVHGzVjMVptxvivvIQ735Nq8LaKq7N8Cb7uvcbRaCki39BsxXVPZx4p6NdwN dMndUOz/y75dNlRMDjK8l/oRFPJa/p1Yz8mZhl0uVOO6JeJhBwYmk+WkQ7g/GHZb Rx15HnVWRu6C/Icbw4kqZYyqrgl5lykS8aAWURePdpjzKY77rY1H71FesMhjifRN ZGgfUdWI88M4 =ibgH -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/vivier2/tags/trivial-branch-for-5.1-pull-request' into staging trivial patches (20200504) Silent static analyzer warning Remove dead assignments Support -chardev serial on macOS Update MAINTAINERS Some cosmetic changes # gpg: Signature made Mon 04 May 2020 16:45:18 BST # gpg: using RSA key CD2F75DDC8E3A4DC2E4F5173F30C38BD3F2FBE3C # gpg: issuer "laurent@vivier.eu" # gpg: Good signature from "Laurent Vivier <lvivier@redhat.com>" [full] # gpg: aka "Laurent Vivier <laurent@vivier.eu>" [full] # gpg: aka "Laurent Vivier (Red Hat) <lvivier@redhat.com>" [full] # Primary key fingerprint: CD2F 75DD C8E3 A4DC 2E4F 5173 F30C 38BD 3F2F BE3C * remotes/vivier2/tags/trivial-branch-for-5.1-pull-request: hw/timer/pxa2xx_timer: Add assertion to silent static analyzer warning hw/timer/stm32f2xx_timer: Remove dead assignment hw/gpio/aspeed_gpio: Remove dead assignment hw/isa/i82378: Remove dead assignment hw/ide/sii3112: Remove dead assignment hw/input/adb-kbd: Remove dead assignment hw/i2c/pm_smbus: Remove dead assignment blockdev: Remove dead assignment block: Avoid dead assignment Compress lines for immediate return chardev: Add macOS to list of OSes that support -chardev serial MAINTAINERS: Update Keith Busch's email address elf_ops: Don't try to g_mapped_file_unref(NULL) hw/mem/pc-dimm: Fix line over 80 characters warning hw/mem/pc-dimm: Print slot number on error at pc_dimm_pre_plug() MAINTAINERS: Mark the LatticeMico32 target as orphan timer/exynos4210_mct: Remove redundant statement in exynos4210_mct_write() display/blizzard: use extract16() for fix clang analyzer warning in blizzard_draw_line16_32() scsi/esp-pci: add g_assert() for fix clang analyzer warning in esp_pci_io_write() Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-05-05 14:03:28 +01:00
Vladimir Sementsov-Ogievskiy	4ce5dd3e9b	block/block-copy: use aio-task-pool API Run block_copy iterations in parallel in aio tasks. Changes: - BlockCopyTask becomes aio task structure. Add zeroes field to pass it to block_copy_do_copy - add call state - it's a state of one call of block_copy(), shared between parallel tasks. For now used only to keep information about first error: is it read or not. - convert block_copy_dirty_clusters to aio-task loop. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200429130847.28124-6-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-05-05 14:03:28 +02:00
Vladimir Sementsov-Ogievskiy	42ac214406	block/block-copy: refactor task creation Instead of just relying on the comment "Called only on full-dirty region" in block_copy_task_create() let's move initial dirty area search directly to block_copy_task_create(). Let's also use effective bdrv_dirty_bitmap_next_dirty_area instead of looping through all non-dirty clusters. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200429130847.28124-5-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-05-05 14:03:28 +02:00
Vladimir Sementsov-Ogievskiy	1348a65774	block/block-copy: add state pointer to BlockCopyTask We are going to use aio-task-pool API, so we'll need state pointer in BlockCopyTask anyway. Add it now and use where possible. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200429130847.28124-4-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-05-05 14:03:28 +02:00
Vladimir Sementsov-Ogievskiy	f13e60a973	block/block-copy: alloc task on each iteration We are going to use aio-task-pool API, so tasks will be handled in parallel. We need therefore separate allocated task on each iteration. Introduce this logic now. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200429130847.28124-3-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-05-05 14:03:28 +02:00
Vladimir Sementsov-Ogievskiy	e9407785cc	block/block-copy: rename in-flight requests to tasks We are going to use aio-task-pool API and extend in-flight request structure to be a successor of AioTask, so rename things appropriately. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200429130847.28124-2-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-05-05 14:03:28 +02:00
Eric Blake	f464906951	block: Comment cleanups It's been a while since we got rid of the sector-based bdrv_read and bdrv_write (commit `2e11d756`); let's finish the job on a few remaining comments. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20200428213807.776655-1-eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-05-05 13:17:36 +02:00
Eric Blake	ee1244a2e9	qcow2: Tweak comment about bitmaps vs. resize Our comment did not actually match the code. Rewrite the comment to be less sensitive to any future changes to qcow2-bitmap.c that might implement scenarios that we currently reject. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200428192648.749066-4-eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-05-05 13:17:36 +02:00
Eric Blake	7fa140abf6	qcow2: Allow resize of images with internal snapshots We originally refused to allow resize of images with internal snapshots because the v2 image format did not require the tracking of snapshot size, making it impossible to safely revert to a snapshot with a different size than the current view of the image. But the snapshot size tracking was rectified in v3, and our recent fixes to qemu-img amend (see `0a85af35`) guarantee that we always have a valid snapshot size. Thus, we no longer need to artificially limit image resizes, but it does become one more thing that would prevent a downgrade back to v2. And now that we support different-sized snapshots, it's also easy to fix reverting to a snapshot to apply the new size. Upgrade iotest 61 to cover this (we previously had NO coverage of refusal to resize while snapshots exist). Note that the amend process can fail but still have effects: in particular, since we break things into upgrade, resize, downgrade, a failure during resize does not roll back changes made during upgrade, nor does failure in downgrade roll back a resize. But this situation is pre-existing even without this patch; and without journaling, the best we could do is minimize the chance of partial failure by collecting all changes prior to doing any writes - which adds a lot of complexity but could still fail with EIO. On the other hand, we are careful that even if we have partial modification but then fail, the image is left viable (that is, we are careful to sequence things so that after each successful cluster write, there may be transient leaked clusters but no corrupt metadata). And complicating the code to make it more transaction-like is not worth the effort: a user can always request multiple 'qemu-img amend' changing one thing each, if they need finer-grained control over detecting the first failure than what they get by letting qemu decide how to sequence multiple changes. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200428192648.749066-3-eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-05-05 13:17:36 +02:00
Eric Blake	a3aeeab557	block: Add blk_new_with_bs() helper There are several callers that need to create a new block backend from an existing BDS; make the task slightly easier with a common helper routine. Suggested-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20200424190903.522087-2-eblake@redhat.com> [mreitz: Set @ret only in error paths, see https://lists.nongnu.org/archive/html/qemu-block/2020-04/msg01216.html] Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200428192648.749066-2-eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-05-05 13:17:36 +02:00
Vladimir Sementsov-Ogievskiy	714eb0dbc5	block/nbd-client: drop max_block restriction from discard The NBD spec was updated (see nbd.git commit 9f30fedb) so that max_block doesn't relate to NBD_CMD_TRIM. So, drop the restriction. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200401150112.9557-3-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: tweak commit message to call out NBD commit] Signed-off-by: Eric Blake <eblake@redhat.com>	2020-05-04 15:16:46 -05:00
Vladimir Sementsov-Ogievskiy	6bf792b464	block/nbd-client: drop max_block restriction from block_status The NBD spec was updated (see nbd.git commit 9f30fedb) so that max_block doesn't relate to NBD_CMD_BLOCK_STATUS. So, drop the restriction. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200401150112.9557-2-vsementsov@virtuozzo.com> [eblake: tweak commit message to call out NBD commit] Signed-off-by: Eric Blake <eblake@redhat.com>	2020-05-04 15:13:14 -05:00
Daniel Brodsky	6e8a355de6	lockable: replaced locks with lock guard macros where appropriate - ran regexp "qemu_mutex_lock$.$.\n.*if" to find targets - replaced result with QEMU_LOCK_GUARD if all unlocks at function end - replaced result with WITH_QEMU_LOCK_GUARD if unlock not at end Signed-off-by: Daniel Brodsky <dnbrdsky@gmail.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-id: 20200404042108.389635-3-dnbrdsky@gmail.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-05-04 16:07:43 +01:00
Simran Singhal	b3ac2b94cd	Compress lines for immediate return Compress two lines into a single line if immediate return statement is found. It also remove variables progress, val, data, ret and sock as they are no longer needed. Remove space between function "mixer_load" and '(' to fix the checkpatch.pl error:- ERROR: space prohibited between function name and open parenthesis '(' Done using following coccinelle script: @@ local idexpression ret; expression e; @@ -ret = +return e; -return ret; Signed-off-by: Simran Singhal <singhalsimran0@gmail.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20200401165314.GA3213@simran-Inspiron-5558> [lv: in handle_aiocb_write_zeroes_unmap() move "int ret" inside the #ifdef] Signed-off-by: Laurent Vivier <laurent@vivier.eu>	2020-05-04 14:43:22 +02:00
Kevin Wolf	eb8a0cf3ba	qcow2: Forward ZERO_WRITE flag for full preallocation The BDRV_REQ_ZERO_WRITE is currently implemented in a way that first the image is possibly preallocated and then the zero flag is added to all clusters. This means that a copy-on-write operation may be needed when writing to these clusters, despite having used preallocation, negating one of the major benefits of preallocation. Instead, try to forward the BDRV_REQ_ZERO_WRITE to the protocol driver, and if the protocol driver can ensure that the new area reads as zeros, we can skip setting the zero flag in the qcow2 layer. Unfortunately, the same approach doesn't work for metadata preallocation, so we'll still set the zero flag there. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200424142701.67053-1-kwolf@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-04-30 17:51:07 +02:00
Kevin Wolf	955c7d6687	block: truncate: Don't make backing file data visible When extending the size of an image that has a backing file larger than its old size, make sure that the backing file data doesn't become visible in the guest, but the added area is properly zeroed out. Consider the following scenario where the overlay is shorter than its backing file: base.qcow2: AAAAAAAA overlay.qcow2: BBBB When resizing (extending) overlay.qcow2, the new blocks should not stay unallocated and make the additional As from base.qcow2 visible like before this patch, but zeros should be read. A similar case happens with the various variants of a commit job when an intermediate file is short (- for unallocated): base.qcow2: A-A-AAAA mid.qcow2: BB-B top.qcow2: C--C--C- After commit top.qcow2 to mid.qcow2, the following happens: mid.qcow2: CB-C00C0 (correct result) mid.qcow2: CB-C--C- (before this fix) Without the fix, blocks that previously read as zeros on top.qcow2 suddenly turn into A. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200424125448.63318-8-kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-04-30 17:51:07 +02:00
Kevin Wolf	2f0c6e7a65	file-posix: Support BDRV_REQ_ZERO_WRITE for truncate For regular files, we always get BDRV_REQ_ZERO_WRITE behaviour from the OS, so we can advertise the flag and just ignore it. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200424125448.63318-7-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-04-30 17:51:07 +02:00
Kevin Wolf	1ddaabaecb	raw-format: Support BDRV_REQ_ZERO_WRITE for truncate The raw format driver can simply forward the flag and let its bs->file child take care of actually providing the zeros. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200424125448.63318-6-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-04-30 17:51:07 +02:00
Kevin Wolf	f01643fb8b	qcow2: Support BDRV_REQ_ZERO_WRITE for truncate If BDRV_REQ_ZERO_WRITE is set and we're extending the image, calling qcow2_cluster_zeroize() with flags=0 does the right thing: It doesn't undo any previous preallocation, but just adds the zero flag to all relevant L2 entries. If an external data file is in use, a write_zeroes request to the data file is made instead. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200424125448.63318-5-kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-04-30 17:51:07 +02:00
Kevin Wolf	8c6242b6f3	block-backend: Add flags to blk_truncate() Now that node level interface bdrv_truncate() supports passing request flags to the block driver, expose this on the BlockBackend level, too. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200424125448.63318-4-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-04-30 17:51:07 +02:00
Kevin Wolf	7b8e485742	block: Add flags to bdrv(_co)_truncate() Now that block drivers can support flags for .bdrv_co_truncate, expose the parameter in the node level interfaces bdrv_co_truncate() and bdrv_truncate(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200424125448.63318-3-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-04-30 17:51:07 +02:00
Kevin Wolf	92b92799dc	block: Add flags to BlockDriver.bdrv_co_truncate() This adds a new BdrvRequestFlags parameter to the .bdrv_co_truncate() driver callbacks, and a supported_truncate_flags field in BlockDriverState that allows drivers to advertise support for request flags in the context of truncate. For now, we always pass 0 and no drivers declare support for any flag. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200424125448.63318-2-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-04-30 17:51:07 +02:00
Markus Armbruster	1f5842487a	qapi: Only input visitors can actually fail The previous few commits have made this more obvious, and removed the one exception. Time to clarify the documentation, and drop dead error checking. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200424084338.26803-13-armbru@redhat.com>	2020-04-30 07:26:40 +02:00
Markus Armbruster	77ed971b9d	block/file-posix: Fix check_cache_dropped() error handling The Error ** argument must be NULL, &error_abort, &error_fatal, or a pointer to a variable containing NULL. Passing an argument of the latter kind twice without clearing it in between is wrong: if the first call sets an error, it no longer points to NULL for the second call. check_cache_dropped() calls error_setg() in a loop. It fails to break the loop in one instance. If a subsequent iteration error_setg()s again, it trips error_setv()'s assertion. Fix it to break the loop. Fixes: `31be8a2a97` Cc: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200422130719.28225-3-armbru@redhat.com>	2020-04-29 08:01:52 +02:00
Philippe Mathieu-Daudé	78ee6bd048	various: Remove suspicious '\' character outside of #define in C code Fixes the following coccinelle warnings: $ spatch --sp-file --verbose-parsing ... \ scripts/coccinelle/remove_local_err.cocci ... SUSPICIOUS: a \ character appears outside of a #define at ./target/ppc/translate_init.inc.c:5213 SUSPICIOUS: a \ character appears outside of a #define at ./target/ppc/translate_init.inc.c:5261 SUSPICIOUS: a \ character appears outside of a #define at ./target/microblaze/cpu.c:166 SUSPICIOUS: a \ character appears outside of a #define at ./target/microblaze/cpu.c:167 SUSPICIOUS: a \ character appears outside of a #define at ./target/microblaze/cpu.c:169 SUSPICIOUS: a \ character appears outside of a #define at ./target/microblaze/cpu.c:170 SUSPICIOUS: a \ character appears outside of a #define at ./target/microblaze/cpu.c:171 SUSPICIOUS: a \ character appears outside of a #define at ./target/microblaze/cpu.c:172 SUSPICIOUS: a \ character appears outside of a #define at ./target/microblaze/cpu.c:173 SUSPICIOUS: a \ character appears outside of a #define at ./target/i386/cpu.c:5787 SUSPICIOUS: a \ character appears outside of a #define at ./target/i386/cpu.c:5789 SUSPICIOUS: a \ character appears outside of a #define at ./target/i386/cpu.c:5800 SUSPICIOUS: a \ character appears outside of a #define at ./target/i386/cpu.c:5801 SUSPICIOUS: a \ character appears outside of a #define at ./target/i386/cpu.c:5802 SUSPICIOUS: a \ character appears outside of a #define at ./target/i386/cpu.c:5804 SUSPICIOUS: a \ character appears outside of a #define at ./target/i386/cpu.c:5805 SUSPICIOUS: a \ character appears outside of a #define at ./target/i386/cpu.c:5806 SUSPICIOUS: a \ character appears outside of a #define at ./target/i386/cpu.c:6329 SUSPICIOUS: a \ character appears outside of a #define at ./hw/sd/sdhci.c:1133 SUSPICIOUS: a \ character appears outside of a #define at ./hw/scsi/scsi-disk.c:3081 SUSPICIOUS: a \ character appears outside of a #define at ./hw/net/virtio-net.c:1529 SUSPICIOUS: a \ character appears outside of a #define at ./hw/riscv/sifive_u.c:468 SUSPICIOUS: a \ character appears outside of a #define at ./dump/dump.c:1895 SUSPICIOUS: a \ character appears outside of a #define at ./block/vhdx.c:2209 SUSPICIOUS: a \ character appears outside of a #define at ./block/vhdx.c:2215 SUSPICIOUS: a \ character appears outside of a #define at ./block/vhdx.c:2221 SUSPICIOUS: a \ character appears outside of a #define at ./block/vhdx.c:2222 SUSPICIOUS: a \ character appears outside of a #define at ./block/replication.c:172 SUSPICIOUS: a \ character appears outside of a #define at ./block/replication.c:173 Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-Id: <20200412223619.11284-2-f4bug@amsat.org> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Markus Armbruster <armbru@redhat.com>	2020-04-29 08:01:51 +02:00
Chen Qun	ff0507c239	block/iscsi:fix heap-buffer-overflow in iscsi_aio_ioctl_cb There is an overflow, the source 'datain.data[2]' is 100 bytes, but the 'ss' is 252 bytes.This may cause a security issue because we can access a lot of unrelated memory data. The len for sbp copy data should take the minimum of mx_sb_len and sb_len_wr, not the maximum. If we use iscsi device for VM backend storage, ASAN show stack: READ of size 252 at 0xfffd149dcfc4 thread T0 #0 0xaaad433d0d34 in __asan_memcpy (aarch64-softmmu/qemu-system-aarch64+0x2cb0d34) #1 0xaaad45f9d6d0 in iscsi_aio_ioctl_cb /qemu/block/iscsi.c:996:9 #2 0xfffd1af0e2dc (/usr/lib64/iscsi/libiscsi.so.8+0xe2dc) #3 0xfffd1af0d174 (/usr/lib64/iscsi/libiscsi.so.8+0xd174) #4 0xfffd1af19fac (/usr/lib64/iscsi/libiscsi.so.8+0x19fac) #5 0xaaad45f9acc8 in iscsi_process_read /qemu/block/iscsi.c:403:5 #6 0xaaad4623733c in aio_dispatch_handler /qemu/util/aio-posix.c:467:9 #7 0xaaad4622f350 in aio_dispatch_handlers /qemu/util/aio-posix.c:510:20 #8 0xaaad4622f350 in aio_dispatch /qemu/util/aio-posix.c:520 #9 0xaaad46215944 in aio_ctx_dispatch /qemu/util/async.c:298:5 #10 0xfffd1bed12f4 in g_main_context_dispatch (/lib64/libglib-2.0.so.0+0x512f4) #11 0xaaad46227de0 in glib_pollfds_poll /qemu/util/main-loop.c:219:9 #12 0xaaad46227de0 in os_host_main_loop_wait /qemu/util/main-loop.c:242 #13 0xaaad46227de0 in main_loop_wait /qemu/util/main-loop.c:518 #14 0xaaad43d9d60c in qemu_main_loop /qemu/softmmu/vl.c:1662:9 #15 0xaaad4607a5b0 in main /qemu/softmmu/main.c:49:5 #16 0xfffd1a460b9c in __libc_start_main (/lib64/libc.so.6+0x20b9c) #17 0xaaad43320740 in _start (aarch64-softmmu/qemu-system-aarch64+0x2c00740) 0xfffd149dcfc4 is located 0 bytes to the right of 100-byte region [0xfffd149dcf60,0xfffd149dcfc4) allocated by thread T0 here: #0 0xaaad433d1e70 in __interceptor_malloc (aarch64-softmmu/qemu-system-aarch64+0x2cb1e70) #1 0xfffd1af0e254 (/usr/lib64/iscsi/libiscsi.so.8+0xe254) #2 0xfffd1af0d174 (/usr/lib64/iscsi/libiscsi.so.8+0xd174) #3 0xfffd1af19fac (/usr/lib64/iscsi/libiscsi.so.8+0x19fac) #4 0xaaad45f9acc8 in iscsi_process_read /qemu/block/iscsi.c:403:5 #5 0xaaad4623733c in aio_dispatch_handler /qemu/util/aio-posix.c:467:9 #6 0xaaad4622f350 in aio_dispatch_handlers /qemu/util/aio-posix.c:510:20 #7 0xaaad4622f350 in aio_dispatch /qemu/util/aio-posix.c:520 #8 0xaaad46215944 in aio_ctx_dispatch /qemu/util/async.c:298:5 #9 0xfffd1bed12f4 in g_main_context_dispatch (/lib64/libglib-2.0.so.0+0x512f4) #10 0xaaad46227de0 in glib_pollfds_poll /qemu/util/main-loop.c:219:9 #11 0xaaad46227de0 in os_host_main_loop_wait /qemu/util/main-loop.c:242 #12 0xaaad46227de0 in main_loop_wait /qemu/util/main-loop.c:518 #13 0xaaad43d9d60c in qemu_main_loop /qemu/softmmu/vl.c:1662:9 #14 0xaaad4607a5b0 in main /qemu/softmmu/main.c:49:5 #15 0xfffd1a460b9c in __libc_start_main (/lib64/libc.so.6+0x20b9c) #16 0xaaad43320740 in _start (aarch64-softmmu/qemu-system-aarch64+0x2c00740) Reported-by: Euler Robot <euler.robot@huawei.com> Signed-off-by: Chen Qun <kuhn.chenqun@huawei.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20200418062602.10776-1-kuhn.chenqun@huawei.com Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-04-20 11:31:46 +01:00
Peter Maydell	2f37b0222c	Block layer patches: - Fix crashes and hangs related to iothreads, bdrv_drain and block jobs: - Fix some AIO context locking in jobs - Fix blk->in_flight during blk_wait_while_drained() - vpc: Don't round up already aligned BAT sizes -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJejI1UAAoJEH8JsnLIjy/WQhEQAIF2zkkdbT77VTMLvFmPU36J hPR2RSzd5egNtfn3zm7vGZJ3FEZb5SEAV5R9CnwVAJeQnPUll7bjkPsoynrEUFU2 PSoFe8lKuYwoAOOSjqakASpKx/Xs3sctKc4fgRWr5/dbWFCC77Q+qxHiKO4KYfeh g00jsjeCLMVLop3r9uMDovlA81mJUIP5axSjSH7akgzLC+ZCfH2RFFu62D7wYv9w UgQM/NZt2YuLFm+BeQLB4K2BK+PZBCN1trPj+h11ecUwm9b1XZZfO/7R0T8Wrljc 49fd5Z/GombCnyxuLCr6QrhLZcr8FffLJXQgkdHTkWUKQXqZUfmqLjVIAbli0ZiC hP8jE5EgdAXpBrGeM2oH+dk0iQSBGTIMaGlhDwxO32xOAQ8TKpRiMZMfwGHqB+vn m/EPoHLHL6vQGxISfGjj4k3fnSP74nvRrS656MhLuG03SkPZacyTZQpTkErg2imW AeU6g4afvvLtJBoF/YYA5Qhff1Ux5eh9jactIW1DRf/Q4tTc4ioTU25560Le4eGZ kex/AwIcf9P47eTUCP8L6iNKz8RU7bV4g9vl9zz7fQm3i9GEhly88XvOppnXRUvT XdkfdlSmUZ9vue6rAfgsL5fQIHtsGRfH90nT11/IW1X4baOImtcQBWg3xZdR4zPS W2H0J01PlSKE8l/OWQlo =oODf -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging Block layer patches: - Fix crashes and hangs related to iothreads, bdrv_drain and block jobs: - Fix some AIO context locking in jobs - Fix blk->in_flight during blk_wait_while_drained() - vpc: Don't round up already aligned BAT sizes # gpg: Signature made Tue 07 Apr 2020 15:25:24 BST # gpg: using RSA key 7F09B272C88F2FD6 # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full] # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * remotes/kevin/tags/for-upstream: vpc: Don't round up already aligned BAT sizes block: Fix blk->in_flight during blk_wait_while_drained() block: Increase BB.in_flight for coroutine and sync interfaces block-backend: Reorder flush/pdiscard function definitions backup: don't acquire aio_context in backup_clean replication: assert we own context before job_cancel_sync job: take each job's lock individually in job_txn_apply Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-04-07 19:12:45 +01:00
Kevin Wolf	3f6de653b9	vpc: Don't round up already aligned BAT sizes As reported on Launchpad, Azure apparently doesn't accept images for upload that are not both aligned to 1 MB blocks and have a BAT size that matches the image size exactly. As far as I can tell, there is no real reason why we create a BAT that is one entry longer than necessary for aligned image sizes, so change that. (Even though the condition is only mentioned as "should" in the spec and previous products accepted larger BATs - but we'll try to maintain compatibility with as many of Microsoft's ever-changing interpretations of the VHD spec as possible.) Fixes: https://bugs.launchpad.net/bugs/1870098 Reported-by: Tobias Witek Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200402093603.2369-1-kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-04-07 15:42:08 +02:00
Kevin Wolf	7f16476fab	block: Fix blk->in_flight during blk_wait_while_drained() Waiting in blk_wait_while_drained() while blk->in_flight is increased for the current request is wrong because it will cause the drain operation to deadlock. This patch makes sure that blk_wait_while_drained() is called with blk->in_flight increased exactly once for the current request, and that it temporarily decreases the counter while it waits. Fixes: `cf3129323f` Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200407121259.21350-4-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-04-07 15:40:57 +02:00
Kevin Wolf	fbb92b6798	block: Increase BB.in_flight for coroutine and sync interfaces External callers of blk_co_() and of the synchronous blk_() functions don't currently increase the BlockBackend.in_flight counter, but calls from blk_aio_() do, so there is an inconsistency whether the counter has been increased or not. This patch moves the actual operations to static functions that can later know they will always be called with in_flight increased exactly once, even for external callers using the blk_co_() coroutine interfaces. If the public blk_co_*() interface is unused, remove it. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200407121259.21350-3-kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-04-07 15:40:41 +02:00
Kevin Wolf	564806c529	block-backend: Reorder flush/pdiscard function definitions Move all variants of the flush/pdiscard functions to a single place and put the blk_co_*() version first because it is called by all other variants (and will become static in the next patch). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200407121259.21350-2-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-04-07 15:40:28 +02:00
Stefan Reiter	eca0f3524a	backup: don't acquire aio_context in backup_clean All code-paths leading to backup_clean (via job_clean) have the job's context already acquired. The job's context is guaranteed to be the same as the one used by backup_top via backup_job_create. Since the previous logic effectively acquired the lock twice, this broke cleanup of backups for disks using IO threads, since the BDRV_POLL_WHILE in bdrv_backup_top_drop -> bdrv_do_drained_begin would only release the lock once, thus deadlocking with the IO thread. This is a partial revert of `0abf258171`. Signed-off-by: Stefan Reiter <s.reiter@proxmox.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200407115651.69472-4-s.reiter@proxmox.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-04-07 14:34:47 +02:00
Stefan Reiter	08558e3325	replication: assert we own context before job_cancel_sync job_cancel_sync requires the job's lock to be held, all other callers already do this (replication_stop, drive_backup_abort, blockdev_backup_abort, job_cancel_sync_all, cancel_common). In this case we're in a BlockDriver handler, so we already have a lock, just assert that it is the same as the one used for the commit_job. Signed-off-by: Stefan Reiter <s.reiter@proxmox.com> Message-Id: <20200407115651.69472-3-s.reiter@proxmox.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-04-07 14:34:47 +02:00
Alberto Garcia	fb43d2d46e	qcow2: Check request size in qcow2_co_pwritev_compressed_part() When issuing a compressed write request the number of bytes must be a multiple of the cluster size or reach the end of the last cluster. With the current code such requests are allowed and we hit an assertion: $ qemu-img create -f qcow2 img.qcow2 1M $ qemu-io -c 'write -c 0 32k' img.qcow2 qemu-io: block/qcow2.c:4257: qcow2_co_pwritev_compressed_task: Assertion `bytes == s->cluster_size \|\| (bytes < s->cluster_size && (offset + bytes == bs->total_sectors << BDRV_SECTOR_BITS))' failed. Aborted This patch fixes a regression introduced in `0d483dce38` Signed-off-by: Alberto Garcia <berto@igalia.com> Message-Id: <20200406143401.26854-1-berto@igalia.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-04-07 13:51:09 +02:00
Alberto Garcia	80f5c01183	qcow2: Forbid discard in qcow2 v2 images with backing files A discard request deallocates the selected clusters so they read back as zeroes. This is done by clearing the cluster offset field and setting QCOW_OFLAG_ZERO in the L2 entry. This flag is however only supported when qcow_version >= 3. In older images the cluster is simply deallocated, exposing any possible stale data from the backing file. Since discard is an advisory operation it's safer to simply forbid it in this scenario. Note that we are adding this check to qcow2_co_pdiscard() and not to qcow2_cluster_discard() or discard_in_l2_slice() because the last two are also used by qcow2_snapshot_create() to discard the clusters used by the VM state. In this case there's no risk of exposing stale data to the guest and we really want that the clusters are always discarded. Signed-off-by: Alberto Garcia <berto@igalia.com> Message-Id: <20200331114345.29993-1-berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-04-07 13:51:09 +02:00
Kevin Wolf	df74b1d3df	qcow2: Remove unused fields from BDRVQcow2State These fields were already removed in commit `c3c10f72`, but then commit `b58deb34` revived them probably due to bad merge conflict resolution. They are still unused, so remove them again. Fixes: `b58deb344d` Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200326170757.12344-1-kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-03-27 14:47:23 +01:00
Kevin Wolf	ce8cabbd17	mirror: Wait only for in-flight operations mirror_wait_for_free_in_flight_slot() just picks a random operation to wait for. However, a MirrorOp is already in s->ops_in_flight when mirror_co_read() waits for free slots, so if not enough slots are immediately available, an operation can end up waiting for itself, or two or more operations can wait for each other to complete, which results in a hang. Fix this by adding a flag to MirrorOp that tells us if the request is already in flight (and therefore occupies slots that it will later free), and picking only such operations for waiting. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1794692 Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200326153628.4869-3-kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-03-27 14:47:23 +01:00
Kevin Wolf	9178f4fe5f	Revert "mirror: Don't let an operation wait for itself" This reverts commit `7e6c4ff792`. The fix was incomplete as it only protected against requests waiting for themselves, but not against requests waiting for each other. We need a different solution. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200326153628.4869-2-kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-03-27 14:47:23 +01:00
Chen Qun	34afc5c298	block/iscsi:use the flags in iscsi_open() prevent Clang warning Clang static code analyzer show warning: block/iscsi.c:1920:9: warning: Value stored to 'flags' is never read flags &= ~BDRV_O_RDWR; ^ ~~~~~~~~~~~~ In iscsi_allocmap_init() only checks BDRV_O_NOCACHE, which is the same in both of flags and bs->open_flags. We can use the flags instead bs->open_flags to prevent Clang warning. Reported-by: Euler Robot <euler.robot@huawei.com> Signed-off-by: Chen Qun <kuhn.chenqun@huawei.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200311032927.35092-1-kuhn.chenqun@huawei.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-03-27 14:47:23 +01:00
Eric Blake	ed04991063	sheepdog: Consistently set bdrv_has_zero_init_truncate block_int.h claims that .bdrv_has_zero_init must return 0 if .bdrv_has_zero_init_truncate does likewise; but this is violated if only the former callback is provided if .bdrv_co_truncate also exists. When adding the latter callback, it was mistakenly added to only one of the three possible sheepdog instantiations. Fixes: `1dcaf527` Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20200324174233.1622067-5-eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-03-26 14:44:33 +01:00
Eric Blake	e7be13ad3f	qcow2: Avoid feature name extension on small cluster size As the feature name table can be quite large (over 9k if all 64 bits of all three feature fields have names; a mere 8 features leaves only 8 bytes for a backing file name in a 512-byte cluster), it is unwise to emit this optional header in images with small cluster sizes. Update iotest 036 to skip running on small cluster sizes; meanwhile, note that iotest 061 never passed on alternative cluster sizes (however, I limited this patch to tests with output affected by adding feature names, rather than auditing for other tests that are not robust to alternative cluster sizes). Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-Id: <20200324174233.1622067-4-eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-03-26 14:44:33 +01:00
Eric Blake	bb40ebce2c	qcow2: List autoclear bit names in header The feature table is supposed to advertise the name of all feature bits that we support; however, we forgot to update the table for autoclear bits. While at it, move the table to read-only memory in code, and tweak the qcow2 spec to name the second autoclear bit. Update iotests that are affected by the longer header length. Fixes: `88ddffae` Fixes: `93c24936` Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200324174233.1622067-3-eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-03-26 14:44:33 +01:00
Eric Blake	a951a631b9	qcow2: Comment typo fixes Various trivial typos noticed while working on this file. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-Id: <20200324174233.1622067-2-eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-03-26 14:44:33 +01:00
Maxim Levitsky	5a5e7f8cd8	block: trickle down the fallback image creation function use to the block drivers Instead of checking the .bdrv_co_create_opts to see if we need the fallback, just implement the .bdrv_co_create_opts in the drivers that need it. This way we don't break various places that need to know if the underlying protocol/format really supports image creation, and this way we still allow some drivers to not support image creation. Fixes: `fd17146cd9` Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1816007 Note that technically this driver reverts the image creation fallback for the vxhs driver since I don't have a means to test it, and IMHO it is better to leave it not supported as it was prior to generic image creation patches. Also drop iscsi_create_opts which was left accidentally. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20200326011218.29230-3-mlevitsk@redhat.com> Reviewed-by: Denis V. Lunev <den@openvz.org> [mreitz: Fixed alignment, and moved bdrv_co_create_opts_simple() and bdrv_create_opts_simple from block.h into block_int.h] Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-03-26 14:44:33 +01:00
Maxim Levitsky	b92902dfea	block: pass BlockDriver reference to the .bdrv_co_create This will allow the reuse of a single generic .bdrv_co_create implementation for several drivers. No functional changes. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20200326011218.29230-2-mlevitsk@redhat.com> Reviewed-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-03-26 14:44:33 +01:00
Vladimir Sementsov-Ogievskiy	66c8672d24	block/mirror: fix use after free of local_err local_err is used again in mirror_exit_common() after bdrv_set_backing_hd(), so we must zero it. Otherwise try to set non-NULL local_err will crash. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200324153630.11882-3-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-03-26 14:44:32 +01:00
Vladimir Sementsov-Ogievskiy	808cf3cb6a	block/qcow2: zero data_file child after free data_file being NULL doesn't seem to be a correct state, but it's better than dead pointer and simpler to debug. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200316060631.30052-3-vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-03-24 11:41:46 +01:00
Eric Blake	71eaec2e8c	block: Avoid memleak on qcow2 image info failure If we fail to get bitmap info, we must not leak the encryption info. Fixes: `b8968c875f` Fixes: Coverity CID 1421894 Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20200320183620.1112123-1-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Tested-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-03-24 11:41:46 +01:00
Peter Maydell	e6d567db23	Pull request -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE+ber27ys35W+dsvQfe+BBqr8OQ4FAl5yg0AACgkQfe+BBqr8 OQ7uWBAArnxhM0kej5NAQiGsC3sHDOGUszKhsz8qwSOYQ0NkOH9495ksklOuw8T9 v/7CAp4rqvma00nL5D+6Srjc9Pe+9BdIz3aa0xvi+tkXvRvFKcWoZlLyNpK+SCbB FwT5sAzoZ004cijG3xsV8U/1i/jEWiRk0AV8USuGymTkQh0M9XS/MHVlrv0HJU9k 2tLz7FGHBmEGhXFXKjviNsrQfvHl2AzIKIYnOcK6OhF6wevJzRz8ShsYcmhyN7V3 rBm0IDiJxJ54a6oZ8FyukkqL5gj+mZDEoGXnspvWkmixqjmNw0iAWgyWNk8vnBue ROphtzZ7xvATu560vIXnlj9q+new5h1ZEb2Q1Nl6WL0PFHJ0vEmRHnpSj+lUTHFD aEwLL72qk12yZFPwEO9V+tO15mE+9vYbbsfe8SXVIlH52ec+OHGQ3/ytulL2zEDz vVJYy37wOhp8f5TFpdmFXi8ovHV7a8g3Vr6Hh1XmcEW0o/GjEgunjW7V+JHSJlDe XBp5IXsz0+zWn2hCxtjUzX3qBDIswyR89qR/NRJKpuu1D5VSaqVwM3vd5dCgDLHA lnCCi8kcq9rsnRNyeBb0t8d7c6e1HSW8OIHL4OC515VUFdrKSEyxprzDyIMPifDg 9B9ImLXg3CgOQVOXPU/0mbh4beTIeI7uJrmgzvzs3vZc+dLG+H8= =xYlZ -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/jnsnow/tags/bitmaps-pull-request' into staging Pull request # gpg: Signature made Wed 18 Mar 2020 20:23:28 GMT # gpg: using RSA key F9B7ABDBBCACDF95BE76CBD07DEF8106AAFC390E # gpg: Good signature from "John Snow (John Huston) <jsnow@redhat.com>" [full] # Primary key fingerprint: FAEB 9711 A12C F475 812F 18F2 88A9 064D 1835 61EB # Subkey fingerprint: F9B7 ABDB BCAC DF95 BE76 CBD0 7DEF 8106 AAFC 390E * remotes/jnsnow/tags/bitmaps-pull-request: block/qcow2-bitmap: use bdrv_dirty_bitmap_next_dirty nbd/server: use bdrv_dirty_bitmap_next_dirty_area nbd/server: introduce NBDExtentArray block/dirty-bitmap: improve _next_dirty_area API block/dirty-bitmap: add _next_dirty API block/dirty-bitmap: switch _next_dirty_area and _next_zero to int64_t hbitmap: drop meta bitmaps as they are unused hbitmap: unpublish hbitmap_iter_skip_words hbitmap: move hbitmap_iter_next_word to hbitmap.c hbitmap: assert that we don't create bitmap larger than INT64_MAX build: Silence clang warning on older glib autoptr usage Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-03-19 15:31:09 +00:00
Vladimir Sementsov-Ogievskiy	2d00cbd8e2	block/qcow2-bitmap: use bdrv_dirty_bitmap_next_dirty store_bitmap_data() loop does bdrv_set_dirty_iter() on each iteration, which means that we actually don't need iterator itself and we can use simpler bitmap API. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20200205112041.6003-11-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>	2020-03-18 14:03:46 -04:00
Vladimir Sementsov-Ogievskiy	299ea9ff01	block/dirty-bitmap: improve _next_dirty_area API Firstly, _next_dirty_area is for scenarios when we may contiguously search for next dirty area inside some limited region, so it is more comfortable to specify "end" which should not be recalculated on each iteration. Secondly, let's add a possibility to limit resulting area size, not limiting searching area. This will be used in NBD code in further commit. (Note that now bdrv_dirty_bitmap_next_dirty_area is unused) Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20200205112041.6003-8-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>	2020-03-18 14:03:46 -04:00
Vladimir Sementsov-Ogievskiy	9399c54b75	block/dirty-bitmap: add _next_dirty API We have bdrv_dirty_bitmap_next_zero, let's add corresponding bdrv_dirty_bitmap_next_dirty, which is more comfortable to use than bitmap iterators in some cases. For test modify test_hbitmap_next_zero_check_range to check both next_zero and next_dirty and add some new checks. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20200205112041.6003-7-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>	2020-03-18 14:03:46 -04:00
Vladimir Sementsov-Ogievskiy	642700fda0	block/dirty-bitmap: switch _next_dirty_area and _next_zero to int64_t We are going to introduce bdrv_dirty_bitmap_next_dirty so that same variable may be used to store its return value and to be its parameter, so it would int64_t. Similarly, we are going to refactor hbitmap_next_dirty_area to use hbitmap_next_dirty together with hbitmap_next_zero, therefore we want hbitmap_next_zero parameter type to be int64_t too. So, for convenience update all parameters of _next_zero and _next_dirty_area to be int64_t. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20200205112041.6003-6-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>	2020-03-18 14:03:46 -04:00
Peter Maydell	cf4b64406c	Error reporting patches for 2020-03-17 -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEENUvIs9frKmtoZ05fOHC0AOuRhlMFAl5w+zkSHGFybWJydUBy ZWRoYXQuY29tAAoJEDhwtADrkYZTaeAQALPrnwX3g9/HLm2YHc1P0TB1eTenBqen K204sRW53waxzm4g9trb8P4Nzmp8r1oGmZfPriVzB3ykoW2Kzfu+4oa95+YT+exk H4XSQfCvCp1e/ZShkx5rY9Kg1gSgWhQ00MNwz8puHUsHtcp5dMTkmYqL4hzgWnA0 TwV7w06+6kLP4fRglIc5X7BVggBKosmMPfvjg/KYUe12Z3moSSQZA5dyEp5VAVl9 MNFJpryWVek6+Z8UFiQ3CMmR/H2UVI0liDlU1aZsR9pcyjiuJxrBEwboVO5qY3N7 lraKg+CVdiK7rn21bs6wAFOk08eG8VqZMeTb7HU6KJ6FIP2KopwvRXIEmNgo2C/C xU3XRl5oyRtaAOKSnwOBzEhZZ+wTRp2RcMzFS6p7URm5R3LNfB1dlqE7yE5z4lcl EgdbMLy4LiMkKwUPrVGBwzZNDO6ywVjFWUcHze9Dyb3z1ciWhwEENaIGe0CU3lhG ii+GxTzMTGoeJ2HE2hRmGTLACNt7a/we88aDY0kDLeVz5rq80oa+xckqV/oG3XpN v/imWHMugdsUwmQshUrT0JQq+BCnuwiHc82pm0X8bTqtJ6TmoIYhxuJkh040QIxt 5ymFfAMz7ysc+50JY7OEVRI/8YQPyCaZmst/D42dicWUU9NdasWcIx+kCmK3LOjj 0/Nb4vfX3xgN =vpk3 -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/armbru/tags/pull-error-2020-03-17' into staging Error reporting patches for 2020-03-17 # gpg: Signature made Tue 17 Mar 2020 16:30:49 GMT # gpg: using RSA key 354BC8B3D7EB2A6B68674E5F3870B400EB918653 # gpg: issuer "armbru@redhat.com" # gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" [full] # gpg: aka "Markus Armbruster <armbru@pond.sub.org>" [full] # Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867 4E5F 3870 B400 EB91 8653 * remotes/armbru/tags/pull-error-2020-03-17: hw/sd/ssi-sd: fix error handling in ssi_sd_realize xen-block: Use one Error * variable instead of two hw/misc/ivshmem: Use one Error * variable instead of two Use &error_abort instead of separate assert() Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-03-18 17:57:40 +00:00
Peter Maydell	d649689a8e	* Bugfixes all over the place * get/set_uint cleanups (Felipe) * Lock guard support (Stefan) * MemoryRegion ownership cleanup (Philippe) * AVX512 optimization for buffer_is_zero (Robert) -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQEcBAABAgAGBQJecOZiAAoJEL/70l94x66DgGkH/jpY4IgqlSAAWCgaxfe1n1vg ahSzSLrC8wiJq2Jxbmxn+5BbH6BxQ9ibflsY5bvCY/sTb7UlOFCPkFhQ2iUgplkw ciB5UfgCA6OHpKEhpHhXtzlybtNOlxXNWYJ1SrcVXbRES8f7XdhMKs15mnJJuOOE k/tuZo/44yZRJl0Cv+nkvIFcCVgyu1q0Lln/1MMPngY2r9gt893cY9feTBSSWgnp +7HZr5TXI7mcIytczFKzbdujlG4391DGejKX66IIxGcWg9vXS7TwAStzH1vSKVfJ 73SKZBoCU5gpHHHC+dqVyouMerV+UE+WQPNtF+LCsNgJBw/2NXc1ZgDrtz1OI2c= =+LRX -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging * Bugfixes all over the place * get/set_uint cleanups (Felipe) * Lock guard support (Stefan) * MemoryRegion ownership cleanup (Philippe) * AVX512 optimization for buffer_is_zero (Robert) # gpg: Signature made Tue 17 Mar 2020 15:01:54 GMT # gpg: using RSA key BFFBD25F78C7AE83 # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full] # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [full] # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * remotes/bonzini/tags/for-upstream: (62 commits) hw/arm: Let devices own the MemoryRegion they create hw/arm: Remove unnecessary memory_region_set_readonly() on ROM alias hw/ppc/ppc405: Use memory_region_init_rom() with read-only regions hw/arm/stm32: Use memory_region_init_rom() with read-only regions hw/char: Let devices own the MemoryRegion they create hw/riscv: Let devices own the MemoryRegion they create hw/dma: Let devices own the MemoryRegion they create hw/display: Let devices own the MemoryRegion they create hw/core: Let devices own the MemoryRegion they create scripts/cocci: Patch to let devices own their MemoryRegions scripts/cocci: Patch to remove unnecessary memory_region_set_readonly() scripts/cocci: Patch to detect potential use of memory_region_init_rom hw/sparc: Use memory_region_init_rom() with read-only regions hw/sh4: Use memory_region_init_rom() with read-only regions hw/riscv: Use memory_region_init_rom() with read-only regions hw/ppc: Use memory_region_init_rom() with read-only regions hw/pci-host: Use memory_region_init_rom() with read-only regions hw/net: Use memory_region_init_rom() with read-only regions hw/m68k: Use memory_region_init_rom() with read-only regions hw/display: Use memory_region_init_rom() with read-only regions ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-03-17 18:33:05 +00:00
Markus Armbruster	20ac582d0c	Use &error_abort instead of separate assert() Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20200313170517.22480-2-armbru@redhat.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Acked-by: Alexander Bulekov <alxndr@bu.edu> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> [Unused Error *variable deleted]	2020-03-17 16:05:40 +01:00
Philippe Mathieu-Daudé	880a7817c1	misc: Replace zero-length arrays with flexible array member (manual) Description copied from Linux kernel commit from Gustavo A. R. Silva (see [3]): --v-- description start --v-- The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member [1], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being unadvertenly introduced [2] to the Linux codebase from now on. --^-- description end --^-- Do the similar housekeeping in the QEMU codebase (which uses C99 since commit `7be41675f7`). All these instances of code were found with the help of the following command (then manual analysis, without modifying structures only having a single flexible array member, such QEDTable in block/qed.h): git grep -F '[0];' [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=76497732932f [3] https://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux.git/commit/?id=17642a2fbd2c1 Inspired-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-03-16 22:07:42 +01:00
Philippe Mathieu-Daudé	f7795e4096	misc: Replace zero-length arrays with flexible array member (automatic) Description copied from Linux kernel commit from Gustavo A. R. Silva (see [3]): --v-- description start --v-- The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member [1], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being unadvertenly introduced [2] to the Linux codebase from now on. --^-- description end --^-- Do the similar housekeeping in the QEMU codebase (which uses C99 since commit `7be41675f7`). All these instances of code were found with the help of the following Coccinelle script: @@ identifier s, m, a; type t, T; @@ struct s { ... t m; - T a[0]; + T a[]; }; @@ identifier s, m, a; type t, T; @@ struct s { ... t m; - T a[0]; + T a[]; } QEMU_PACKED; [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=76497732932f [3] https://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux.git/commit/?id=17642a2fbd2c1 Inspired-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-03-16 22:07:42 +01:00
Vladimir Sementsov-Ogievskiy	4ab78b1918	block/io: fix bdrv_co_do_copy_on_readv Prior to `1143ec5ebf` it was OK to qemu_iovec_from_buf() from aligned-up buffer to original qiov, as qemu_iovec_from_buf() will stop at qiov end anyway. But after `1143ec5ebf` we assume that bdrv_co_do_copy_on_readv works on part of original qiov, defined by qiov_offset and bytes. So we must not touch qiov behind qiov_offset+bytes bound. Fix it. Cc: qemu-stable@nongnu.org # v4.2 Fixes: `1143ec5ebf` Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20200312081949.5350-1-vsementsov@virtuozzo.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-03-16 11:46:11 +00:00
Peter Maydell	49780a582d	Block layer patches: - Relax restrictions for blockdev-snapshot (allows libvirt to do live storage migration with blockdev-mirror) - luks: Delete created files when block_crypto_co_create_opts_luks fails - Fix memleaks in qmp_object_add -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJeaQYTAAoJEH8JsnLIjy/WQwYP/3pzAjqVecL3dGmnPWAkBqCV CFpxT2nIMe+xCBvWQBoeekHsFJ7GQf4E1WVNRZgoAh9VQkvkajZsVNn8Auo2Veq2 c7w/R4xf/Wet2hKGVRS0JXwbg69U5BbpcF7E2DRNfp+CvaDCafHSDNeGTb3hFUjT x1jwhK6VqfY7+LHU0B0QmX2KA66nDx1p8l8HJQYd1MlCKAbj8kv/swEbqBJn32hA 32CIYfC4VCqkW5va1eOjd3Kyi/ugkFCHTI8+mOa45/BFBzIiKfCsDaFHh/DI59QB qcDKkUcO3+W788vCKgJQGnG070TwKPx2OnjhxFKiEGaoX3Sz+AY4wUvf3mfFl8GM zYqTdOy4Xh0ckvA6JCS0jtAKmANkeEGqnECAgub22Z+kyOzqC05B1FkYwqYDcFXY atWKm5Vr47jgD6Oq6O0OpZaZrAUWOfoBmq4ErnrBEHuW5329NEInmjYwxednK+43 CwU/lSdX7ujRSsjS8Xi1dHS4pxHK/mg51dInL44zGFUayegiLPgA8cuESE0mHOfZ 67X14rxu6D4Y5r0L+w7rsSGjByR29VynE1McL9fZ1Wp29JHaQ5fjdG6GMXEwYxmV R0YNXe85FAlNgqj0Bme+fR2YPxZ48NqHIMOvFFStNHyfD0qQN0TtT1iarfEcBR0u jm8MnSoIDLkRXEgBW9bW =ociU -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging Block layer patches: - Relax restrictions for blockdev-snapshot (allows libvirt to do live storage migration with blockdev-mirror) - luks: Delete created files when block_crypto_co_create_opts_luks fails - Fix memleaks in qmp_object_add # gpg: Signature made Wed 11 Mar 2020 15:38:59 GMT # gpg: using RSA key 7F09B272C88F2FD6 # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full] # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * remotes/kevin/tags/for-upstream: qemu-iotests: adding LUKS cleanup for non-UTF8 secret error crypto.c: cleanup created file when block_crypto_co_create_opts_luks fails block.c: adding bdrv_co_delete_file block: introducing 'bdrv_co_delete_file' interface tests/qemu-iotests: Fix socket_scm_helper build path qapi: Add '@allow-write-only-overlay' feature for 'blockdev-snapshot' iotests: Add iothread cases to 155 block: Fix cross-AioContext blockdev-snapshot iotests: Test mirror with temporarily disabled target backing file iotests: Fix run_job() with use_log=False block: Relax restrictions for blockdev-snapshot block: Make bdrv_get_cumulative_perm() public qom-qmp-cmds: fix two memleaks in qmp_object_add Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-03-12 16:51:26 +00:00
Daniel Henrique Barboza	1bba30da24	crypto.c: cleanup created file when block_crypto_co_create_opts_luks fails When using a non-UTF8 secret to create a volume using qemu-img, the following error happens: $ qemu-img create -f luks --object secret,id=vol_1_encrypt0,file=vol_resize_pool.vol_1.secret.qzVQrI -o key-secret=vol_1_encrypt0 /var/tmp/pool_target/vol_1 10240K Formatting '/var/tmp/pool_target/vol_1', fmt=luks size=10485760 key-secret=vol_1_encrypt0 qemu-img: /var/tmp/pool_target/vol_1: Data from secret vol_1_encrypt0 is not valid UTF-8 However, the created file '/var/tmp/pool_target/vol_1' is left behind in the file system after the failure. This behavior can be observed when creating the volume using Libvirt, via 'virsh vol-create', and then getting "volume target path already exist" errors when trying to re-create the volume. The volume file is created inside block_crypto_co_create_opts_luks(), in block/crypto.c. If the bdrv_create_file() call is successful but any succeeding step fails, the existing 'fail' label does not take into account the created file, leaving it behind. This patch changes block_crypto_co_create_opts_luks() to delete 'filename' in case of failure. A failure in this point means that the volume is now truncated/corrupted, so even if 'filename' was an existing volume before calling qemu-img, it is now unusable. Deleting the file it is not much worse than leaving it in the filesystem in this scenario, and we don't have to deal with checking the file pre-existence in the code. in our case, block_crypto_co_create_generic calls qcrypto_block_create, which calls qcrypto_block_luks_create, and this function fails when calling qcrypto_secret_lookup_as_utf8. Reported-by: Srikanth Aithal <bssrikanth@in.ibm.com> Suggested-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20200130213907.2830642-4-danielhb413@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-03-11 15:54:38 +01:00
Daniel Henrique Barboza	9bffae14df	block: introducing 'bdrv_co_delete_file' interface Adding to Block Drivers the capability of being able to clean up its created files can be useful in certain situations. For the LUKS driver, for instance, a failure in one of its authentication steps can leave files in the host that weren't there before. This patch adds the 'bdrv_co_delete_file' interface to block drivers and add it to the 'file' driver in file-posix.c. The implementation is given by 'raw_co_delete_file'. Suggested-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20200130213907.2830642-2-danielhb413@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-03-11 15:54:38 +01:00
Vladimir Sementsov-Ogievskiy	397f4e9d83	block/block-copy: hide structure definitions Hide structure definitions and add explicit API instead, to keep an eye on the scope of the shared fields. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200311103004.7649-10-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-03-11 12:42:30 +01:00
Vladimir Sementsov-Ogievskiy	5332e5d210	block/block-copy: reduce intersecting request lock Currently, block_copy operation lock the whole requested region. But there is no reason to lock clusters, which are already copied, it will disturb other parallel block_copy requests for no reason. Let's instead do the following: Lock only sub-region, which we are going to operate on. Then, after copying all dirty sub-regions, we should wait for intersecting requests block-copy, if they failed, we should retry these new dirty clusters. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Message-Id: <20200311103004.7649-9-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-03-11 12:42:30 +01:00
Vladimir Sementsov-Ogievskiy	8719091f9d	block/block-copy: rename start to offset in interfaces offset/bytes pair is more usual naming in block layer, let's use it. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200311103004.7649-8-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-03-11 12:42:30 +01:00
Vladimir Sementsov-Ogievskiy	dafaf13593	block/block-copy: refactor interfaces to use bytes instead of end We have a lot of "chunk_end - start" invocations, let's switch to bytes/cur_bytes scheme instead. While being here, improve check on block_copy_do_copy parameters to not overflow when calculating nbytes and use int64_t for bytes in block_copy for consistency. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200311103004.7649-7-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-03-11 12:42:30 +01:00
Vladimir Sementsov-Ogievskiy	17187cb646	block/block-copy: factor out find_conflicting_inflight_req Split find_conflicting_inflight_req to be used separately. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200311103004.7649-6-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-03-11 12:42:30 +01:00
Vladimir Sementsov-Ogievskiy	2d57511a88	block/block-copy: use block_status Use bdrv_block_status_above to chose effective chunk size and to handle zeroes effectively. This substitutes checking for just being allocated or not, and drops old code path for it. Assistance by backup job is dropped too, as caching block-status information is more difficult than just caching is-allocated information in our dirty bitmap, and backup job is not good place for this caching anyway. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200311103004.7649-5-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-03-11 12:42:30 +01:00
Vladimir Sementsov-Ogievskiy	9d31bc53fa	block/block-copy: specialcase first copy_range request In block_copy_do_copy we fallback to read+write if copy_range failed. In this case copy_size is larger than defined for buffered IO, and there is corresponding commit. Still, backup copies data cluster by cluster, and most of requests are limited to one cluster anyway, so the only source of this one bad-limited request is copy-before-write operation. Further patch will move backup to use block_copy directly, than for cases where copy_range is not supported, first request will be oversized in each backup. It's not good, let's change it now. Fix is simple: just limit first copy_range request like buffer-based request. If it succeed, set larger copy_range limit. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200311103004.7649-4-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-03-11 12:42:30 +01:00
Vladimir Sementsov-Ogievskiy	d0ebeca14a	block/block-copy: fix progress calculation Assume we have two regions, A and B, and region B is in-flight now, region A is not yet touched, but it is unallocated and should be skipped. Correspondingly, as progress we have total = A + B current = 0 If we reset unallocated region A and call progress_reset_callback, it will calculate 0 bytes dirty in the bitmap and call job_progress_set_remaining, which will set total = current + 0 = 0 + 0 = 0 So, B bytes are actually removed from total accounting. When job finishes we'll have total = 0 current = B , which doesn't sound good. This is because we didn't considered in-flight bytes, actually when calculating remaining, we should have set (in_flight + dirty_bytes) as remaining, not only dirty_bytes. To fix it, let's refactor progress calculation, moving it to block-copy itself instead of fixing callback. And, of course, track in_flight bytes count. We still have to keep one callback, to maintain backup job bytes_read calculation, but it will go on soon, when we turn the whole backup process into one block_copy call. Cc: qemu-stable@nongnu.org Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Message-Id: <20200311103004.7649-3-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-03-11 12:42:30 +01:00
Vladimir Sementsov-Ogievskiy	e7266570f2	block/qcow2-threads: fix qcow2_decompress On success path we return what inflate() returns instead of 0. And it most probably works for Z_STREAM_END as it is positive, but is definitely broken for Z_BUF_ERROR. While being here, switch to errno return code, to be closer to qcow2_compress API (and usual expectations). Revert condition in if to be more positive. Drop dead initialization of ret. Cc: qemu-stable@nongnu.org # v4.0 Fixes: `341926ab83` Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200302150930.16218-1-vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Ján Tomko <jtomko@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-03-11 12:42:30 +01:00
Pan Nengyuan	4aebf0f0da	block/qcow2: do free crypto_opts in qcow2_close() 'crypto_opts' forgot to free in qcow2_close(), this patch fix the bellow leak stack: Direct leak of 24 byte(s) in 1 object(s) allocated from: #0 0x7f0edd81f970 in __interceptor_calloc (/lib64/libasan.so.5+0xef970) #1 0x7f0edc6d149d in g_malloc0 (/lib64/libglib-2.0.so.0+0x5249d) #2 0x55d7eaede63d in qobject_input_start_struct /mnt/sdb/qemu-new/qemu_test/qemu/qapi/qobject-input-visitor.c:295 #3 0x55d7eaed78b8 in visit_start_struct /mnt/sdb/qemu-new/qemu_test/qemu/qapi/qapi-visit-core.c:49 #4 0x55d7eaf5140b in visit_type_QCryptoBlockOpenOptions qapi/qapi-visit-crypto.c:290 #5 0x55d7eae43af3 in block_crypto_open_opts_init /mnt/sdb/qemu-new/qemu_test/qemu/block/crypto.c:163 #6 0x55d7eacd2924 in qcow2_update_options_prepare /mnt/sdb/qemu-new/qemu_test/qemu/block/qcow2.c:1148 #7 0x55d7eacd33f7 in qcow2_update_options /mnt/sdb/qemu-new/qemu_test/qemu/block/qcow2.c:1232 #8 0x55d7eacd9680 in qcow2_do_open /mnt/sdb/qemu-new/qemu_test/qemu/block/qcow2.c:1512 #9 0x55d7eacdc55e in qcow2_open_entry /mnt/sdb/qemu-new/qemu_test/qemu/block/qcow2.c:1792 #10 0x55d7eacdc8fe in qcow2_open /mnt/sdb/qemu-new/qemu_test/qemu/block/qcow2.c:1819 #11 0x55d7eac3742d in bdrv_open_driver /mnt/sdb/qemu-new/qemu_test/qemu/block.c:1317 #12 0x55d7eac3e990 in bdrv_open_common /mnt/sdb/qemu-new/qemu_test/qemu/block.c:1575 #13 0x55d7eac4442c in bdrv_open_inherit /mnt/sdb/qemu-new/qemu_test/qemu/block.c:3126 #14 0x55d7eac45c3f in bdrv_open /mnt/sdb/qemu-new/qemu_test/qemu/block.c:3219 #15 0x55d7ead8e8a4 in blk_new_open /mnt/sdb/qemu-new/qemu_test/qemu/block/block-backend.c:397 #16 0x55d7eacde74c in qcow2_co_create /mnt/sdb/qemu-new/qemu_test/qemu/block/qcow2.c:3534 #17 0x55d7eacdfa6d in qcow2_co_create_opts /mnt/sdb/qemu-new/qemu_test/qemu/block/qcow2.c:3668 #18 0x55d7eac1c678 in bdrv_create_co_entry /mnt/sdb/qemu-new/qemu_test/qemu/block.c:485 #19 0x55d7eb0024d2 in coroutine_trampoline /mnt/sdb/qemu-new/qemu_test/qemu/util/coroutine-ucontext.c:115 Reported-by: Euler Robot <euler.robot@huawei.com> Signed-off-by: Pan Nengyuan <pannengyuan@huawei.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200227012950.12256-2-pannengyuan@huawei.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-03-11 12:42:30 +01:00
David Edmondson	69032253c3	block/curl: HTTP header field names are case insensitive RFC 7230 section 3.2 indicates that HTTP header field names are case insensitive. Signed-off-by: David Edmondson <david.edmondson@oracle.com> Message-Id: <20200224101310.101169-3-david.edmondson@oracle.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-03-11 12:42:29 +01:00
David Edmondson	7788a31939	block/curl: HTTP header fields allow whitespace around values RFC 7230 section 3.2 indicates that whitespace is permitted between the field name and field value and after the field value. Signed-off-by: David Edmondson <david.edmondson@oracle.com> Message-Id: <20200224101310.101169-2-david.edmondson@oracle.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-03-11 12:42:29 +01:00
Stefan Hajnoczi	a9da6e49d8	luks: implement .bdrv_measure() Add qemu-img measure support in the "luks" block driver. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200221112522.1497712-3-stefanha@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-03-11 12:42:29 +01:00
Stefan Hajnoczi	6d49d3a859	luks: extract qcrypto_block_calculate_payload_offset() The qcow2 .bdrv_measure() code calculates the crypto payload offset. This logic really belongs in crypto/block.c where it can be reused by other image formats. The "luks" block driver will need this same logic in order to implement .bdrv_measure(), so extract the qcrypto_block_calculate_payload_offset() function now. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200221112522.1497712-2-stefanha@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-03-11 12:42:29 +01:00
Maxim Levitsky	89802d5ae7	monitor/hmp: Move hmp_drive_add_node to block-hmp-cmds.c Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20200308092440.23564-12-mlevitsk@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-03-09 18:20:22 +00:00
Maxim Levitsky	2bcad73c4b	monitor/hmp: move hmp_info_block* to block-hmp-cmds.c Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20200308092440.23564-11-mlevitsk@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-03-09 18:20:21 +00:00
Maxim Levitsky	1061f8dd80	monitor/hmp: move remaining hmp_block* functions to block-hmp-cmds.c Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20200308092440.23564-10-mlevitsk@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-03-09 18:20:13 +00:00
Maxim Levitsky	e263120ecc	monitor/hmp: move hmp_nbd_server* to block-hmp-cmds.c Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20200308092440.23564-9-mlevitsk@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-03-09 18:17:58 +00:00
Maxim Levitsky	fce2b91fdf	monitor/hmp: move hmp_snapshot_* to block-hmp-cmds.c hmp_snapshot_blkdev is from GPLv2 version of the hmp-cmds.c thus have to change the licence to GPLv2 Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20200308092440.23564-8-mlevitsk@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-03-09 18:07:50 +00:00
Maxim Levitsky	6b7fbf61fb	monitor/hmp: move hmp_block_job* to block-hmp-cmds.c Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20200308092440.23564-7-mlevitsk@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-03-09 18:07:48 +00:00
Maxim Levitsky	0932e3f23d	monitor/hmp: move hmp_drive_mirror and hmp_drive_backup to block-hmp-cmds.c Moved code was added after 2012-01-13, thus under GPLv2+ Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20200308092440.23564-6-mlevitsk@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Fixed commit message	2020-03-09 18:07:35 +00:00
Maxim Levitsky	a1edae276a	monitor/hmp: move hmp_drive_del and hmp_commit to block-hmp-cmds.c Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20200308092440.23564-5-mlevitsk@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-03-09 18:05:33 +00:00
Maxim Levitsky	a2dde2f221	monitor/hmp: rename device-hotplug.c to block/monitor/block-hmp-cmds.c These days device-hotplug.c only contains the hmp_drive_add In the next patch, rest of hmp_drive* functions will be moved there. Also add block-hmp-cmds.h to contain prototypes of these functions License for block-hmp-cmds.h since it contains the code moved from sysemu.h which lacks license and thus according to LICENSE is under GPLv2+ Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20200308092440.23564-4-mlevitsk@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-03-09 18:05:31 +00:00
Chen Qun	76e91cda07	block/file-posix: Remove redundant statement in raw_handle_perm_lock() Clang static code analyzer show warning: block/file-posix.c:891:9: warning: Value stored to 'op' is never read op = RAW_PL_ABORT; ^ ~~~~~~~~~~~~ Reported-by: Euler Robot <euler.robot@huawei.com> Signed-off-by: Chen Qun <kuhn.chenqun@huawei.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200302130715.29440-5-kuhn.chenqun@huawei.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>	2020-03-09 15:59:31 +01:00
Chen Qun	35c9453592	block/stream: Remove redundant statement in stream_run() Clang static code analyzer show warning: block/stream.c:186:9: warning: Value stored to 'ret' is never read ret = 0; ^ ~ Reported-by: Euler Robot <euler.robot@huawei.com> Signed-off-by: Chen Qun <kuhn.chenqun@huawei.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200302130715.29440-3-kuhn.chenqun@huawei.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>	2020-03-09 15:59:31 +01:00
Florian Florensa	19ae9ae014	block/rbd: Add support for ceph namespaces Starting from ceph Nautilus, RBD has support for namespaces, allowing for finer grain ACLs on images inside a pool, and tenant isolation. In the rbd cli tool documentation, the new image-spec and snap-spec are : - [pool-name/[namespace-name/]]image-name - [pool-name/[namespace-name/]]image-name@snap-name When using an non namespace's enabled qemu, it complains about not finding the image called namespace-name/image-name, thus we only need to parse the image once again to find if there is a '/' in its name, and if there is, use what is before it as the name of the namespace to later pass it to rados_ioctx_set_namespace. rados_ioctx_set_namespace if called with en empty string or a null pointer as the namespace parameters pretty much does nothing, as it then defaults to the default namespace. The namespace is extracted inside qemu_rbd_parse_filename, stored in the qdict, and used in qemu_rbd_connect to make it work with both qemu-img, and qemu itself. Signed-off-by: Florian Florensa <fflorensa@online.net> Message-Id: <20200110111513.321728-2-fflorensa@online.net> Reviewed-by: Jason Dillaman <dillaman@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-03-06 17:21:28 +01:00
Kevin Wolf	14837c6493	qemu-storage-daemon: Add --blockdev option This adds a --blockdev option to the storage daemon that works the same as the -blockdev option of the system emulator. In order to be able to link with blockdev.o, we also need to change stream.o from common-obj to block-obj, which is where all other block jobs already are. In contrast to the system emulator, qemu-storage-daemon options will be processed in the order they are given. The user needs to take care to refer to other objects only after defining them. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200224143008.13362-7-kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-03-06 17:15:38 +01:00
Kevin Wolf	12c929bca2	block: Move system emulator QMP commands to block/qapi-sysemu.c These commands make only sense for system emulators and their implementations call functions that don't exist in tools (e.g. to resolve qdev IDs). Move them out so that blockdev.c can be linked to qemu-storage-daemon. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200224143008.13362-4-kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-03-06 17:15:38 +01:00
Peter Krempa	65eb7c85a3	block/qcow2: Move bitmap reopen into bdrv_reopen_commit_post The bitmap code requires writing the 'file' child when the qcow2 driver is reopened in read-write mode. If the 'file' child is being reopened due to a permissions change, the modification is commited yet when qcow2_reopen_commit is called. This means that any attempt to write the 'file' child will end with EBADFD as the original fd was already closed. Moving bitmap reopening to the new callback which is called after permission modifications are commited fixes this as the file descriptor will be replaced with the correct one. The above problem manifests itself when reopening 'qcow2' format layer which uses a 'file-posix' file child which was opened with the 'auto-read-only' property set. Signed-off-by: Peter Krempa <pkrempa@redhat.com> Message-Id: <db118dbafe1955afbc0a18d3dd220931074ce349.1582893284.git.pkrempa@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-03-06 17:15:37 +01:00
Max Reitz	3ede935fdb	qcow2: Fix alloc_cluster_abort() for pre-existing clusters handle_alloc() reuses preallocated zero clusters. If anything goes wrong during the data write, we do not change their L2 entry, so we must not let qcow2_alloc_cluster_abort() free them. Fixes: `8b24cd1415` Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200225143130.111267-2-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-03-06 17:15:37 +01:00
Lukas Straub	08ddb4eb6d	block/replication.c: Ignore requests after failover After failover the Secondary side of replication shouldn't change state, because it now functions as our primary disk. In replication_start, replication_do_checkpoint, replication_stop, ignore the request if current state is BLOCK_REPLICATION_DONE (sucessful failover) or BLOCK_REPLICATION_FAILOVER (failover in progres i.e. currently merging active and hidden images into the base image). Signed-off-by: Lukas Straub <lukasstraub2@web.de> Reviewed-by: Zhang Chen <chen.zhang@intel.com> Acked-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com>	2020-03-03 18:04:47 +08:00
Pan Nengyuan	8198cf5ef0	block/nbd: fix memory leak in nbd_open() In currently implementation there will be a memory leak when nbd_client_connect() returns error status. Here is an easy way to reproduce: 1. run qemu-iotests as follow and check the result with asan: ./check -raw 143 Following is the asan output backtrack: Direct leak of 40 byte(s) in 1 object(s) allocated from: #0 0x7f629688a560 in calloc (/usr/lib64/libasan.so.3+0xc7560) #1 0x7f6295e7e015 in g_malloc0 (/usr/lib64/libglib-2.0.so.0+0x50015) #2 0x56281dab4642 in qobject_input_start_struct /mnt/sdb/qemu-4.2.0-rc0/qapi/qobject-input-visitor.c:295 #3 0x56281dab1a04 in visit_start_struct /mnt/sdb/qemu-4.2.0-rc0/qapi/qapi-visit-core.c:49 #4 0x56281dad1827 in visit_type_SocketAddress qapi/qapi-visit-sockets.c:386 #5 0x56281da8062f in nbd_config /mnt/sdb/qemu-4.2.0-rc0/block/nbd.c:1716 #6 0x56281da8062f in nbd_process_options /mnt/sdb/qemu-4.2.0-rc0/block/nbd.c:1829 #7 0x56281da8062f in nbd_open /mnt/sdb/qemu-4.2.0-rc0/block/nbd.c:1873 Direct leak of 15 byte(s) in 1 object(s) allocated from: #0 0x7f629688a3a0 in malloc (/usr/lib64/libasan.so.3+0xc73a0) #1 0x7f6295e7dfbd in g_malloc (/usr/lib64/libglib-2.0.so.0+0x4ffbd) #2 0x7f6295e96ace in g_strdup (/usr/lib64/libglib-2.0.so.0+0x68ace) #3 0x56281da804ac in nbd_process_options /mnt/sdb/qemu-4.2.0-rc0/block/nbd.c:1834 #4 0x56281da804ac in nbd_open /mnt/sdb/qemu-4.2.0-rc0/block/nbd.c:1873 Indirect leak of 24 byte(s) in 1 object(s) allocated from: #0 0x7f629688a3a0 in malloc (/usr/lib64/libasan.so.3+0xc73a0) #1 0x7f6295e7dfbd in g_malloc (/usr/lib64/libglib-2.0.so.0+0x4ffbd) #2 0x7f6295e96ace in g_strdup (/usr/lib64/libglib-2.0.so.0+0x68ace) #3 0x56281dab41a3 in qobject_input_type_str_keyval /mnt/sdb/qemu-4.2.0-rc0/qapi/qobject-input-visitor.c:536 #4 0x56281dab2ee9 in visit_type_str /mnt/sdb/qemu-4.2.0-rc0/qapi/qapi-visit-core.c:297 #5 0x56281dad0fa1 in visit_type_UnixSocketAddress_members qapi/qapi-visit-sockets.c:141 #6 0x56281dad17b6 in visit_type_SocketAddress_members qapi/qapi-visit-sockets.c:366 #7 0x56281dad186a in visit_type_SocketAddress qapi/qapi-visit-sockets.c:393 #8 0x56281da8062f in nbd_config /mnt/sdb/qemu-4.2.0-rc0/block/nbd.c:1716 #9 0x56281da8062f in nbd_process_options /mnt/sdb/qemu-4.2.0-rc0/block/nbd.c:1829 #10 0x56281da8062f in nbd_open /mnt/sdb/qemu-4.2.0-rc0/block/nbd.c:1873 Fixes: `8f071c9db5` Reported-by: Euler Robot <euler.robot@huawei.com> Signed-off-by: Pan Nengyuan <pannengyuan@huawei.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Cc: qemu-stable <qemu-stable@nongnu.org> Cc: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <1575517528-44312-3-git-send-email-pannengyuan@huawei.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2020-02-26 17:29:00 -06:00
Pan Nengyuan	7f493662be	block/nbd: extract the common cleanup code The BDRVNBDState cleanup code is common in two places, add nbd_clear_bdrvstate() function to do these cleanups. Suggested-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Pan Nengyuan <pannengyuan@huawei.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <1575517528-44312-2-git-send-email-pannengyuan@huawei.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: fix compilation error and commit message] Signed-off-by: Eric Blake <eblake@redhat.com>	2020-02-26 17:28:08 -06:00
Eric Blake	2485f22fe9	nbd-client: Support leading / in NBD URI The NBD URI specification [1] states that only one leading slash at the beginning of the URI path component is stripped, not all such slashes. This becomes important to a patch I just proposed to nbdkit [2], which would allow the exportname to select a file embedded within an ext2 image: ext2fs demands an absolute pathname beginning with '/', and because qemu was inadvertantly stripping it, my nbdkit patch had to work around the behavior. [1] https://github.com/NetworkBlockDevice/nbd/blob/master/doc/uri.md [2] https://www.redhat.com/archives/libguestfs/2020-February/msg00109.html Note that the qemu bug only affects handling of URIs such as nbd://host:port//abs/path (where '/abs/path' should be the export name); it is still possible to use --image-opts and pass the desired export name with a leading slash directly through JSON even without this patch. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20200212023101.1162686-1-eblake@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>	2020-02-26 14:45:02 -06:00
Max Reitz	804359b8b9	block: Fix VM size field width in snapshot dump When printing the snapshot list (e.g. with qemu-img snapshot -l), the VM size field is only seven characters wide. As of `de38b5005e`, this is not necessarily sufficient: We generally print three digits, and this may require a decimal point. Also, the unit field grew from something as plain as "M" to " MiB". This means that number and unit may take up eight characters in total; but we also want spaces in front. Considering previously the maximum width was four characters and the field width was chosen to be three characters wider, let us adjust the field width to be eleven now. Fixes: `de38b5005e` Buglink: https://bugs.launchpad.net/qemu/+bug/1859989 Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200117105859.241818-2-mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-02-20 16:43:42 +01:00
Max Reitz	80f0900905	iscsi: Drop iscsi_co_create_opts() The generic fallback implementation effectively does the same. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200122164532.178040-5-mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-02-20 16:43:42 +01:00
Max Reitz	87ca3b8fa6	file-posix: Drop hdev_co_create_opts() The generic fallback implementation effectively does the same. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200122164532.178040-4-mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-02-20 16:43:42 +01:00
Max Reitz	78c81a3f10	block/nbd: Fix hang in .bdrv_close() When nbd_close() is called from a coroutine, the connection_co never gets to run, and thus nbd_teardown_connection() hangs. This is because aio_co_enter() only puts the connection_co into the main coroutine's wake-up queue, so this main coroutine needs to yield and wait for connection_co to terminate. Suggested-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200122164532.178040-2-mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-02-20 16:43:42 +01:00
Vladimir Sementsov-Ogievskiy	4bc267a7c7	block/backup-top: fix flags handling backup-top "supports" write-unchanged, by skipping CBW operation in backup_top_co_pwritev. But it forgets to do the same in backup_top_co_pwrite_zeroes, as well as declare support for BDRV_REQ_WRITE_UNCHANGED. Fix this, and, while being here, declare also support for flags supported by source child. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200207161231.32707-1-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-02-20 16:43:42 +01:00
Daniel P. Berrangé	087ab8e775	block: always fill entire LUKS header space with zeros When initializing the LUKS header the size with default encryption parameters will currently be 2068480 bytes. This is rounded up to a multiple of the cluster size, 2081792, with 64k sectors. If the end of the header is not the same as the end of the cluster we fill the extra space with zeros. This was forgetting that not even the space allocated for the header will be fully initialized, as we only write key material for the first key slot. The space left for the other 7 slots is never written to. An optimization to the ref count checking code: commit `a5fff8d4b4` (refs/bisect/bad) Author: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Date: Wed Feb 27 16:14:30 2019 +0300 qcow2-refcount: avoid eating RAM made the assumption that every cluster which was allocated would have at least some data written to it. This was violated by way the LUKS header is only partially written, with much space simply reserved for future use. Depending on the cluster size this problem was masked by the logic which wrote zeros between the end of the LUKS header and the end of the cluster. $ qemu-img create --object secret,id=cluster_encrypt0,data=123456 \ -f qcow2 -o cluster_size=2k,encrypt.iter-time=1,\ encrypt.format=luks,encrypt.key-secret=cluster_encrypt0 \ cluster_size_check.qcow2 100M Formatting 'cluster_size_check.qcow2', fmt=qcow2 size=104857600 encrypt.format=luks encrypt.key-secret=cluster_encrypt0 encrypt.iter-time=1 cluster_size=2048 lazy_refcounts=off refcount_bits=16 $ qemu-img check --object secret,id=cluster_encrypt0,data=redhat \ 'json:{"driver": "qcow2", "encrypt.format": "luks", \ "encrypt.key-secret": "cluster_encrypt0", \ "file.driver": "file", "file.filename": "cluster_size_check.qcow2"}' ERROR: counting reference for region exceeding the end of the file by one cluster or more: offset 0x2000 size 0x1f9000 Leaked cluster 4 refcount=1 reference=0 ...snip... Leaked cluster 130 refcount=1 reference=0 1 errors were found on the image. Data may be corrupted, or further writes to the image may corrupt it. 127 leaked clusters were found on the image. This means waste of disk space, but no harm to data. Image end offset: 268288 The problem only exists when the disk image is entirely empty. Writing data to the disk image payload will solve the problem by causing the end of the file to be extended further. The change fixes it by ensuring that the entire allocated LUKS header region is fully initialized with zeros. The qemu-img check will still fail for any pre-existing disk images created prior to this change, unless at least 1 byte of the payload is written to. Fully writing zeros to the entire LUKS header is a good idea regardless as it ensures that space has been allocated on the host filesystem (or whatever block storage backend is used). Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20200207135520.2669430-1-berrange@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-02-20 16:43:42 +01:00
Peter Krempa	facda5443f	qapi: Allow getting flat output from 'query-named-block-nodes' When a management application manages node names there's no reason to recurse into backing images in the output of query-named-block-nodes. Add a parameter to the command which will return just the top level structs. Signed-off-by: Peter Krempa <pkrempa@redhat.com> Message-Id: <4470f8c779abc404dcf65e375db195cd91a80651.1579509782.git.pkrempa@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> [mreitz: Fixed coding style] Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-02-20 16:43:42 +01:00
Max Reitz	3c7f75b321	quorum: Stop marking it as a filter Quorum is not a filter, for example because it cannot guarantee which of its children will serve the next request. Thus, any of its children may differ from the data visible to quorum's parents. We have other filters with multiple children, but they differ in this aspect: - blkverify quits the whole qemu process if its children differ. As such, we can always skip it when we want to skip it (as a filter node) by going to any of its children. Both have the same data. - replication generally serves requests from bs->file, so this is its only actually filtered child. - Block job filters currently only have one child, but they will probably get more children in the future. Still, they will always have only one actually filtered child. Having "filters" as a dedicated node category only makes sense if you can skip them by going to a one fixed child that always shows the same data as the filter node. Quorum cannot fulfill this, so it is not a filter. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200218103454.296704-13-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-02-18 11:55:40 +01:00
Max Reitz	6e9cc05181	mirror: Double-check immediately before replacing There is no guarantee that we can still replace the node we want to replace at the end of the mirror job. Double-check by calling bdrv_recurse_can_replace(). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200218103454.296704-12-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-02-18 11:55:40 +01:00
Max Reitz	6b4907cf42	block: Remove bdrv_recurse_is_first_non_filter() It no longer has any users. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200218103454.296704-11-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-02-18 11:55:40 +01:00
Max Reitz	a3ed794b36	quorum: Implement .bdrv_recurse_can_replace() Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200218103454.296704-9-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-02-18 11:55:40 +01:00
Max Reitz	998a6b2fc5	blkverify: Implement .bdrv_recurse_can_replace() Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200218103454.296704-8-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-02-18 11:55:39 +01:00
Max Reitz	37a3791b38	quorum: Fix child permissions Quorum cannot share WRITE or RESIZE on its children. Presumably, it only does so because as a filter, it seemed intuitively correct to point its .bdrv_child_perm to bdrv_filter_default_perm(). However, it is not really a filter, and bdrv_filter_default_perm() does not work for it, so we have to provide a custom .bdrv_child_perm implementation. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200218103454.296704-6-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-02-18 11:55:39 +01:00
Philippe Mathieu-Daudé	74e4a8a961	block/io_uring: Remove superfluous semicolon Fixes: `6663a0a337` Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200218094402.26625-5-philmd@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-02-18 10:54:02 +01:00
Kevin Wolf	9ad1e79f3f	commit: Fix is_read for block_job_error_action() block_job_error_action() needs to know if reading from the top node or writing to the base node failed so that it can set the right 'operation' in the BLOCK_JOB_ERROR QMP event. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200214200812.28180-6-kwolf@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-02-18 10:53:56 +01:00
Kevin Wolf	0c42e175fc	commit: Inline commit_populate() commit_populate() is a very short function and only called in a single place. Its return value doesn't tell us whether an error happened while reading or writing, which would be necessary for sending the right data in the BLOCK_JOB_ERROR QMP event. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200214200812.28180-5-kwolf@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-02-18 10:53:56 +01:00
Kevin Wolf	c5507b4d55	commit: Fix argument order for block_job_error_action() The block_job_error_action() error call in the commit job gives the on_err and is_read arguments in the wrong order. Fix this. (Of course, hard-coded is_read = false is wrong, too, but that's a separate problem for a separate patch.) Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200214200812.28180-4-kwolf@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-02-18 10:53:56 +01:00
Kevin Wolf	d71e65ec1d	commit: Remove unused bytes_written The bytes_written variable is only ever written to, it serves no purpose. This has actually been the case since the commit job was first introduced in commit `747ff60263`. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200214200812.28180-3-kwolf@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-02-18 10:53:56 +01:00
Philippe Mathieu-Daudé	5b1405db0f	block/qcow2-bitmap: Remove unneeded variable assignment Fix warning reported by Clang static code analyzer: CC block/qcow2-bitmap.o block/qcow2-bitmap.c:650:5: warning: Value stored to 'ret' is never read ret = -EINVAL; ^ ~~~~~~~ Fixes: `88ddffae8` Reported-by: Clang Static Analyzer Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200215161557.4077-2-philmd@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Ján Tomko <jtomko@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-02-18 10:53:56 +01:00
Kevin Wolf	c3b6658c1a	qcow2: Fix qcow2_alloc_cluster_abort() for external data file For external data file, cluster allocations return an offset in the data file and are not refcounted. In this case, there is nothing to do for qcow2_alloc_cluster_abort(). Freeing the same offset in the qcow2 file is wrong and causes crashes in the better case or image corruption in the worse case. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200211094900.17315-3-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-02-18 10:53:56 +01:00
Kevin Wolf	dea9052ef1	qcow2: update_refcount(): Reset old_table_index after qcow2_cache_put() In the case that update_refcount() frees a refcount block, it evicts it from the metadata cache. Before doing so, however, it returns the currently used refcount block to the cache because it might be the same. Returning the refcount block early means that we need to reset old_table_index so that we reload the refcount block in the next iteration if it is actually still in use. Fixes: `f71c08ea8e` Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200211094900.17315-2-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-02-18 10:53:56 +01:00
Hikaru Nishida	8475ea4854	block/vvfat: Do not unref qcow on closing backing bdrv Before this commit, BDRVVVFATState.qcow is unrefed in write_target_close on closing backing bdrv of vvfat. However, qcow bdrv is opend as a child of vvfat in enable_write_target() so it will be also unrefed on closing vvfat itself. This causes use-after-free of qcow on freeing vvfat which has backing bdrv and qcow bdrv as children in this order because bdrv_close(vvfat) tries to free qcow bdrv after freeing backing bdrv as QLIST_FOREACH_SAFE() loop keeps next pointer, but BdrvChild of qcow is already freed in bdrv_close(backing bdrv). Signed-off-by: Hikaru Nishida <hikarupsp@gmail.com> Message-Id: <20200209175156.85748-1-hikarupsp@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-02-18 10:53:56 +01:00
Alberto Garcia	2d4b5256cf	qcow2: Fix alignment checks in encrypted images I/O requests to encrypted media should be aligned to the sector size used by the underlying encryption method, not to BDRV_SECTOR_SIZE. Fortunately this doesn't break anything at the moment because both existing QCRYPTO_BLOCK_*_SECTOR_SIZE have the same value as BDRV_SECTOR_SIZE. The checks in qcow2_co_preadv_encrypted() are also unnecessary because they are repeated immediately afterwards in qcow2_co_encdec(). Signed-off-by: Alberto Garcia <berto@igalia.com> Message-Id: <20200213171646.15876-1-berto@igalia.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-02-18 10:53:56 +01:00
Kevin Wolf	7e6c4ff792	mirror: Don't let an operation wait for itself mirror_wait_for_free_in_flight_slot() just picks a random operation to wait for. However, when mirror_co_read() waits for free slots, its MirrorOp is already in s->ops_in_flight, so if not enough slots are immediately available, an operation can end up waiting for itself to complete, which results in a hang. Fix this by passing the current MirrorOp and skipping this operation when picking an operation to wait for. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1794692 Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2020-02-18 10:53:56 +01:00
Kevin Wolf	eed325b92c	mirror: Store MirrorOp.co for debuggability If a coroutine is launched, but the coroutine pointer isn't stored anywhere, debugging any problems inside the coroutine is quite hard. Let's store the coroutine pointer of a mirror operation in MirrorOp to have it available in the debugger. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2020-02-18 10:53:56 +01:00
Vladimir Sementsov-Ogievskiy	ac9d00bf7b	block: fix crash on zero-length unaligned write and read Commit `7a3f542fbd` "block/io: refactor padding" occasionally dropped aligning for zero-length request: bdrv_init_padding() blindly return false if bytes == 0, like there is nothing to align. This leads the following command to crash: ./qemu-io --image-opts -c 'write 1 0' \ driver=blkdebug,align=512,image.driver=null-co,image.size=512 >> qemu-io: block/io.c:1955: bdrv_aligned_pwritev: Assertion `(offset & (align - 1)) == 0' failed. >> Aborted (core dumped) Prior to `7a3f542fbd` we does aligning of such zero requests. Instead of recovering this behavior let's just do nothing on such requests as it is useless. Note that driver may have special meaning of zero-length reqeusts, like qcow2_co_pwritev_compressed_part, so we can't skip any zero-length operation. But for unaligned ones, we can't pass it to driver anyway. This commit also fixes crash in iotest 80 running with -nocache: ./check -nocache -qcow2 80 which crashes on same assertion due to trying to read empty extra data in qcow2_do_read_snapshots(). Cc: qemu-stable@nongnu.org # v4.2 Fixes: `7a3f542fbd` Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20200206164245.17781-1-vsementsov@virtuozzo.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-02-07 16:46:59 +00:00
Vladimir Sementsov-Ogievskiy	0df62f45c1	block/backup-top: fix failure path We can't access top after call bdrv_backup_top_drop, as it is already freed at this time. Also, no needs to unref target child by hand, it will be unrefed on bdrv_close() automatically. So, just do bdrv_backup_top_drop if append succeed and one bdrv_unref otherwise. Note, that in !appended case bdrv_unref(top) moved into drained section on source. It doesn't really matter, but just for code simplicity. Fixes: `7df7868b96` Cc: qemu-stable@nongnu.org # v4.2.0 Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20200121142802.21467-2-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-02-06 13:47:45 +01:00
Alberto Garcia	3afea40243	qcow2: Use BDRV_SECTOR_SIZE instead of the hardcoded value This replaces all remaining instances in the qcow2 code. Signed-off-by: Alberto Garcia <berto@igalia.com> Message-id: b5f74b606c2d9873b12d29acdb7fd498029c4025.1579374329.git.berto@igalia.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-02-06 13:47:45 +01:00
Alberto Garcia	25ae71db55	qcow2: Don't require aligned offsets in qcow2_co_copy_range_from() qemu-img's convert_co_copy_range() operates at the sector level and block_copy() operates at the cluster level so this condition is always true, but it is not necessary to restrict this here, so let's leave it to the driver implementation return an error if there is any. Signed-off-by: Alberto Garcia <berto@igalia.com> Message-id: a4264aaee656910c84161a2965f7a501437379ca.1579374329.git.berto@igalia.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-02-06 13:47:45 +01:00
Alberto Garcia	da86f8cbad	qcow2: Use bs->bl.request_alignment when updating an L1 entry When updating an L1 entry the qcow2 driver writes a (512-byte) sector worth of data to avoid a read-modify-write cycle. Instead of always writing 512 bytes we should follow the alignment requirements of the storage backend. (the only exception is when the alignment is larger than the cluster size because then we could be overwriting data after the L1 table) Signed-off-by: Alberto Garcia <berto@igalia.com> Message-id: 71f34d4ae4b367b32fb36134acbf4f4f7ee681f4.1579374329.git.berto@igalia.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-02-06 13:47:45 +01:00
Alberto Garcia	344ffea951	qcow2: Tighten cluster_offset alignment assertions qcow2_alloc_cluster_offset() and qcow2_get_cluster_offset() always return offsets that are cluster-aligned so don't just check that they are sector-aligned. The check in qcow2_co_preadv_task() is also replaced by an assertion for the same reason. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 558ba339965f858bede4c73ce3f50f0c0493597d.1579374329.git.berto@igalia.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-02-06 13:47:45 +01:00
Alberto Garcia	ef97d608c7	qcow2: Don't round the L1 table allocation up to the sector size The L1 table is read from disk using the byte-based bdrv_pread() and is never accessed beyond its last element, so there's no need to allocate more memory than that. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: b2e27214ec7b03a585931bcf383ee1ac3a641a10.1579374329.git.berto@igalia.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-02-06 13:47:45 +01:00
Alberto Garcia	7cdca2e233	qcow2: Use a GString in report_unsupported_feature() This is a bit more efficient than having to allocate and free memory for each item. The default size (60) is enough for all the existing incompatible features or the "Unknown incompatible feature" message. Suggested-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Message-id: 20200115135626.19442-1-berto@igalia.com Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-02-06 13:47:45 +01:00
Alberto Garcia	3a75a870ef	qcow2: Assert that host cluster offsets fit in L2 table entries The standard cluster descriptor in L2 table entries has a field to store the host cluster offset. When we need to get that offset from an entry we use L2E_OFFSET_MASK to ensure that we only use the bits that belong to that field. But while that mask is used every time we read from an L2 entry, it is never used when we write to it. Due to the QCOW_MAX_CLUSTER_OFFSET limit set in the cluster allocation code QEMU can never produce offsets that don't fit in that field so any such offset would indicate a bug in QEMU. Compressed cluster descriptors contain two fields (host cluster offset and size of the compressed data) and the situation with them is similar. In this case the masks are not constant but are stored in the csize_mask and cluster_offset_mask fields of BDRVQcow2State. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20200113161146.20099-1-berto@igalia.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-02-06 13:47:45 +01:00
Aarushi Mehta	daffeb027b	block/io_uring: adds userspace completion polling Signed-off-by: Aarushi Mehta <mehta.aaru20@gmail.com> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20200120141858.587874-11-stefanha@redhat.com Message-Id: <20200120141858.587874-11-stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-01-30 20:59:42 +00:00
Aarushi Mehta	d803f59050	block: add trace events for io_uring Signed-off-by: Aarushi Mehta <mehta.aaru20@gmail.com> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20200120141858.587874-10-stefanha@redhat.com Message-Id: <20200120141858.587874-10-stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-01-30 20:59:42 +00:00
Aarushi Mehta	c644751069	block/file-posix.c: extend to use io_uring Signed-off-by: Aarushi Mehta <mehta.aaru20@gmail.com> Reviewed-by: Maxim Levitsky <maximlevitsky@gmail.com> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20200120141858.587874-9-stefanha@redhat.com Message-Id: <20200120141858.587874-9-stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-01-30 20:59:42 +00:00
Aarushi Mehta	6663a0a337	block/io_uring: implements interfaces for io_uring Aborts when sqe fails to be set as sqes cannot be returned to the ring. Adds slow path for short reads for older kernels Signed-off-by: Aarushi Mehta <mehta.aaru20@gmail.com> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20200120141858.587874-5-stefanha@redhat.com Message-Id: <20200120141858.587874-5-stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-01-30 20:59:41 +00:00
Paolo Bonzini	3ba0e1a00c	block/io: take bs->reqs_lock in bdrv_mark_request_serialising bdrv_mark_request_serialising is writing the overlap_offset and overlap_bytes fields of BdrvTrackedRequest. Take bs->reqs_lock for the whole duration of it, and not just when waiting for serialising requests, so that tracked_request_overlaps does not look at a half-updated request. The new code does not unlock/relock around retries. This is unnecessary because a retry is always preceded by a CoQueue wait, which already releases and reacquires bs->reqs_lock. Reported-by: Peter Lieven <pl@kamp.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 1578495356-46219-4-git-send-email-pbonzini@redhat.com Message-Id: <1578495356-46219-4-git-send-email-pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-01-30 20:59:41 +00:00
Paolo Bonzini	18fbd0dec7	block/io: wait for serialising requests when a request becomes serialising Marking without waiting would not result in actual serialising behavior. Thus, make a call bdrv_mark_request_serialising sufficient for serialisation to happen. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 1578495356-46219-3-git-send-email-pbonzini@redhat.com Message-Id: <1578495356-46219-3-git-send-email-pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-01-30 20:59:41 +00:00
Paolo Bonzini	c53cb42769	block: eliminate BDRV_REQ_NO_SERIALISING It is unused since commit `00e30f0` ("block/backup: use backup-top instead of write notifiers", 2019-10-01), drop it to simplify the code. While at it, drop redundant assertions on flags. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 1578495356-46219-2-git-send-email-pbonzini@redhat.com Message-Id: <1578495356-46219-2-git-send-email-pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-01-30 20:59:41 +00:00
Kevin Wolf	5fbf1d56c2	iscsi: Don't access non-existent scsi_lba_status_descriptor In iscsi_co_block_status(), we may have received num_descriptors == 0 from the iscsi server. Therefore, we can't unconditionally access lbas->descriptors[0]. Add the missing check. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Felipe Franciosi <felipe@nutanix.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Peter Lieven <pl@kamp.de>	2020-01-27 17:19:53 +01:00
Felipe Franciosi	693fd2acdf	iscsi: Cap block count from GET LBA STATUS (CVE-2020-1711) When querying an iSCSI server for the provisioning status of blocks (via GET LBA STATUS), Qemu only validates that the response descriptor zero's LBA matches the one requested. Given the SCSI spec allows servers to respond with the status of blocks beyond the end of the LUN, Qemu may have its heap corrupted by clearing/setting too many bits at the end of its allocmap for the LUN. A malicious guest in control of the iSCSI server could carefully program Qemu's heap (by selectively setting the bitmap) and then smash it. This limits the number of bits that iscsi_co_block_status() will try to update in the allocmap so it can't overflow the bitmap. Fixes: CVE-2020-1711 Cc: qemu-stable@nongnu.org Signed-off-by: Felipe Franciosi <felipe@nutanix.com> Signed-off-by: Peter Turschmid <peter.turschm@nutanix.com> Signed-off-by: Raphael Norwitz <raphael.norwitz@nutanix.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-01-27 17:19:53 +01:00
Eiichi Tsukata	fb574de81b	block/backup: fix memory leak in bdrv_backup_top_append() bdrv_open_driver() allocates bs->opaque according to drv->instance_size. There is no need to allocate it and overwrite opaque in bdrv_backup_top_append(). Reproducer: $ QTEST_QEMU_BINARY=./x86_64-softmmu/qemu-system-x86_64 valgrind -q --leak-check=full tests/test-replication -p /replication/secondary/start ==29792== 24 bytes in 1 blocks are definitely lost in loss record 52 of 226 ==29792== at 0x483AB1A: calloc (vg_replace_malloc.c:762) ==29792== by 0x4B07CE0: g_malloc0 (in /usr/lib64/libglib-2.0.so.0.6000.7) ==29792== by 0x12BAB9: bdrv_open_driver (block.c:1289) ==29792== by 0x12BEA9: bdrv_new_open_driver (block.c:1359) ==29792== by 0x1D15CB: bdrv_backup_top_append (backup-top.c:190) ==29792== by 0x1CC11A: backup_job_create (backup.c:439) ==29792== by 0x1CD542: replication_start (replication.c:544) ==29792== by 0x1401B9: replication_start_all (replication.c:52) ==29792== by 0x128B50: test_secondary_start (test-replication.c:427) ... Fixes: `7df7868b96` ("block: introduce backup-top filter driver") Signed-off-by: Eiichi Tsukata <devel@etsukata.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-01-27 17:19:53 +01:00
Sergio Lopez	0abf258171	block/backup-top: Don't acquire context while dropping top All paths that lead to bdrv_backup_top_drop(), except for the call from backup_clean(), imply that the BDS AioContext has already been acquired, so doing it there too can potentially lead to QEMU hanging on AIO_WAIT_WHILE(). An easy way to trigger this situation is by issuing a two actions transaction, with a proper and a bogus blockdev-backup, so the second one will trigger a rollback. This will trigger a hang with an stack trace like this one: #0 0x00007fb680c75016 in __GI_ppoll (fds=0x55e74580f7c0, nfds=1, timeout=<optimized out>, timeout@entry=0x0, sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:39 #1 0x000055e743386e09 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77 #2 0x000055e743386e09 in qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at util/qemu-timer.c:336 #3 0x000055e743388dc4 in aio_poll (ctx=0x55e7458925d0, blocking=blocking@entry=true) at util/aio-posix.c:669 #4 0x000055e743305dea in bdrv_flush (bs=bs@entry=0x55e74593c0d0) at block/io.c:2878 #5 0x000055e7432be58e in bdrv_close (bs=0x55e74593c0d0) at block.c:4017 #6 0x000055e7432be58e in bdrv_delete (bs=<optimized out>) at block.c:4262 #7 0x000055e7432be58e in bdrv_unref (bs=bs@entry=0x55e74593c0d0) at block.c:5644 #8 0x000055e743316b9b in bdrv_backup_top_drop (bs=bs@entry=0x55e74593c0d0) at block/backup-top.c:273 #9 0x000055e74331461f in backup_job_create (job_id=0x0, bs=bs@entry=0x55e7458d5820, target=target@entry=0x55e74589f640, speed=0, sync_mode=MIRROR_SYNC_MODE_FULL, sync_bitmap=sync_bitmap@entry=0x0, bitmap_mode=BITMAP_SYNC_MODE_ON_SUCCESS, compress=false, filter_node_name=0x0, on_source_error=BLOCKDEV_ON_ERROR_REPORT, on_target_error=BLOCKDEV_ON_ERROR_REPORT, creation_flags=0, cb=0x0, opaque=0x0, txn=0x0, errp=0x7ffddfd1efb0) at block/backup.c:478 #10 0x000055e74315bc52 in do_backup_common (backup=backup@entry=0x55e746c066d0, bs=bs@entry=0x55e7458d5820, target_bs=target_bs@entry=0x55e74589f640, aio_context=aio_context@entry=0x55e7458a91e0, txn=txn@entry=0x0, errp=errp@entry=0x7ffddfd1efb0) at blockdev.c:3580 #11 0x000055e74315c37c in do_blockdev_backup (backup=backup@entry=0x55e746c066d0, txn=0x0, errp=errp@entry=0x7ffddfd1efb0) at /usr/src/debug/qemu-kvm-4.2.0-2.module+el8.2.0+5135+ed3b2489.x86_64/./qapi/qapi-types-block-core.h:1492 #12 0x000055e74315c449 in blockdev_backup_prepare (common=0x55e746a8de90, errp=0x7ffddfd1f018) at blockdev.c:1885 #13 0x000055e743160152 in qmp_transaction (dev_list=<optimized out>, has_props=<optimized out>, props=0x55e7467fe2c0, errp=errp@entry=0x7ffddfd1f088) at blockdev.c:2340 #14 0x000055e743287ff5 in qmp_marshal_transaction (args=<optimized out>, ret=<optimized out>, errp=0x7ffddfd1f0f8) at qapi/qapi-commands-transaction.c:44 #15 0x000055e74333de6c in do_qmp_dispatch (errp=0x7ffddfd1f0f0, allow_oob=<optimized out>, request=<optimized out>, cmds=0x55e743c28d60 <qmp_commands>) at qapi/qmp-dispatch.c:132 #16 0x000055e74333de6c in qmp_dispatch (cmds=0x55e743c28d60 <qmp_commands>, request=<optimized out>, allow_oob=<optimized out>) at qapi/qmp-dispatch.c:175 #17 0x000055e74325c061 in monitor_qmp_dispatch (mon=0x55e745908030, req=<optimized out>) at monitor/qmp.c:145 #18 0x000055e74325c6fa in monitor_qmp_bh_dispatcher (data=<optimized out>) at monitor/qmp.c:234 #19 0x000055e743385866 in aio_bh_call (bh=0x55e745807ae0) at util/async.c:117 #20 0x000055e743385866 in aio_bh_poll (ctx=ctx@entry=0x55e7458067a0) at util/async.c:117 #21 0x000055e743388c54 in aio_dispatch (ctx=0x55e7458067a0) at util/aio-posix.c:459 #22 0x000055e743385742 in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at util/async.c:260 #23 0x00007fb68543e67d in g_main_dispatch (context=0x55e745893a40) at gmain.c:3176 #24 0x00007fb68543e67d in g_main_context_dispatch (context=context@entry=0x55e745893a40) at gmain.c:3829 #25 0x000055e743387d08 in glib_pollfds_poll () at util/main-loop.c:219 #26 0x000055e743387d08 in os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:242 #27 0x000055e743387d08 in main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:518 #28 0x000055e74316a3c1 in main_loop () at vl.c:1828 #29 0x000055e743016a72 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4504 Fix this by not acquiring the AioContext there, and ensuring all paths leading to it have it already acquired (backup_clean()). RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1782111 Signed-off-by: Sergio Lopez <slp@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2020-01-27 17:19:53 +01:00
Wangyong	2558cb8dd4	linux-aio: increasing MAX_EVENTS to a larger hardcoded value Since commit `6040aedddb` "virtio-blk: make queue size configurable",if the user set the queue size to more than 128 ,it will not take effect. That's because linux aio's maximum outstanding requests at a time is always less than or equal to 128. This patch simply increase MAX_EVENTS to a larger hardcoded value of 1024 as a shortterm fix. Signed-off-by: wangyong <wang.yongD@h3c.com> Message-id: faa5781afd354a96a0be152b288f636f@h3c.com Message-Id: <faa5781afd354a96a0be152b288f636f@h3c.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-01-13 16:41:45 +00:00
Max Reitz	503ca1262b	backup-top: Begin drain earlier When dropping backup-top, we need to drain the node before freeing the BlockCopyState. Otherwise, requests may still be in flight and then the assertion in shres_destroy() will fail. (This becomes visible in intermittent failure of 056.) Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20191219182638.104621-1-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-01-06 14:26:23 +01:00
Andrey Shinkevich	0d483dce38	qcow2: Allow writing compressed data of multiple clusters QEMU currently supports writing compressed data of the size equal to one cluster. This patch allows writing QCOW2 compressed data that exceed one cluster. Now, we split buffered data into separate clusters and write them compressed using the block/aio_task API. Suggested-by: Pavel Butsykin <pbutsykin@virtuozzo.com> Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 1575288906-551879-3-git-send-email-andrey.shinkevich@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-01-06 13:43:07 +01:00
Andrey Shinkevich	f41388e0fb	block: introduce compress filter driver Allow writing all the data compressed through the filter driver. The written data will be aligned by the cluster size. Based on the QEMU current implementation, that data can be written to unallocated clusters only. May be used for a backup job. Suggested-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 1575288906-551879-2-git-send-email-andrey.shinkevich@virtuozzo.com [mreitz: Replace NULL bdrv_get_format_name() by "(no format)"] Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-01-06 13:43:07 +01:00
Vladimir Sementsov-Ogievskiy	a1db8733d2	qcow2-bitmaps: fix qcow2_can_store_new_dirty_bitmap qcow2_can_store_new_dirty_bitmap works wrong, as it considers only bitmaps already stored in the qcow2 image and ignores persistent BdrvDirtyBitmap objects. So, let's instead count persistent BdrvDirtyBitmaps. We load all qcow2 bitmaps on open, so there should not be any bitmap in the image for which we don't have BdrvDirtyBitmaps version. If it is - it's a kind of corruption, and no reason to check for corruptions here (open() and close() are better places for it). Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20191014115126.15360-2-vsementsov@virtuozzo.com Reviewed-by: Max Reitz <mreitz@redhat.com> Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-01-06 13:43:06 +01:00
PanNengyuan	88be15a9e1	throttle-groups: fix memory leak in throttle_group_set_limit: This avoid a memory leak when qom-set is called to set throttle_group limits, here is an easy way to reproduce: 1. run qemu-iotests as follow and check the result with asan: ./check -qcow2 184 Following is the asan output backtrack: Direct leak of 912 byte(s) in 3 object(s) allocated from: #0 0xffff8d7ab3c3 in __interceptor_calloc (/lib64/libasan.so.4+0xd33c3) #1 0xffff8d4c31cb in g_malloc0 (/lib64/libglib-2.0.so.0+0x571cb) #2 0x190c857 in qobject_input_start_struct /mnt/sdc/qemu-master/qemu-4.2.0-rc0/qapi/qobject-input-visitor.c:295 #3 0x19070df in visit_start_struct /mnt/sdc/qemu-master/qemu-4.2.0-rc0/qapi/qapi-visit-core.c:49 #4 0x1948b87 in visit_type_ThrottleLimits qapi/qapi-visit-block-core.c:3759 #5 0x17e4aa3 in throttle_group_set_limits /mnt/sdc/qemu-master/qemu-4.2.0-rc0/block/throttle-groups.c:900 #6 0x1650eff in object_property_set /mnt/sdc/qemu-master/qemu-4.2.0-rc0/qom/object.c:1272 #7 0x1658517 in object_property_set_qobject /mnt/sdc/qemu-master/qemu-4.2.0-rc0/qom/qom-qobject.c:26 #8 0x15880bb in qmp_qom_set /mnt/sdc/qemu-master/qemu-4.2.0-rc0/qom/qom-qmp-cmds.c:74 #9 0x157e3e3 in qmp_marshal_qom_set qapi/qapi-commands-qom.c:154 Reported-by: Euler Robot <euler.robot@huawei.com> Signed-off-by: PanNengyuan <pannengyuan@huawei.com> Message-id: 1574835614-42028-1-git-send-email-pannengyuan@huawei.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-01-06 13:43:06 +01:00
Max Reitz	69c6449ff1	blkdebug: Allow taking/unsharing permissions Sometimes it is useful to be able to add a node to the block graph that takes or unshare a certain set of permissions for debugging purposes. This patch adds this capability to blkdebug. (Note that you cannot make blkdebug release or share permissions that it needs to take or cannot share, because this might result in assertion failures in the block layer. But if the blkdebug node has no parents, it will not take any permissions and share everything by default, so you can then freely choose what permissions to take and share.) Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20191108123455.39445-4-mreitz@redhat.com Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2020-01-06 13:43:06 +01:00
Peter Maydell	1010af540b	Block layer patches: - qemu-img: fix info --backing-chain --image-opts - Error out on image creation with conflicting size options - Fix external snapshot with VM state - hmp: Allow using qdev ID for qemu-io command - Misc code cleanup - Many iotests improvements -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJd+7H/AAoJEH8JsnLIjy/WwIoP/igzSCfiH7yOzWexk7J7DUh7 SplXRiBQSQAZ8KD2umUk80UO6/W19rHAl3bLLu50D5uzQ5PNz1wSW2S81JZsj0nQ ErUEr5yxLGAI/zHIw/LrjUSEygbqcZ21R7foNmIy3lInEXf6gGwPQEnZnZgGkH0x v5jwv1HpUpEyetnHunX2cK3JpSNEJCYsV+X1nzDwhjYFuGQq0nlhgUZ8BDt6+fEr b/S3TuHmj/FXMH+5wrSr0LiCKaIyAwPdREvC+61aklMNFuA8YwF33tOaJLoWeQCD qhxFA8jVd8b3dQ0NZXXbliXST5d6AzjpeqbNzsyKze1duVfcfF1RYsC6McojpQ7f 9qFIXPC9IHe3sNkUZpTtvONLnmCeHTc9lwVIyVp9B5KXcmcLedvYM04WqhZ+/eeJ BfgCvXOyF9sL2GaVPFD+98E2DOYG4dI4rmzqn98kQj+RAXDFvvJ+zxuDUC9omE6T pVvxczGPXmnxP2ITRNljJyLVCN1YgE2g9S0GAXNM2i4uOXvhFToEwrJLFytVeVsw UwtYkE0F8KJyqjje5N3HiXclBVa++PK3+jjNFA+3B6XZ9wcZZRS2ZXmzsyNOJ7qG AeIPtrqBCz+QKsHtmS9CwYpvLrPafobHA3WS2z0dsevcWBHYBQBEM2EgXRFC3N3a mymYGxMOsG6+8onwqwjJ =laVP -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging Block layer patches: - qemu-img: fix info --backing-chain --image-opts - Error out on image creation with conflicting size options - Fix external snapshot with VM state - hmp: Allow using qdev ID for qemu-io command - Misc code cleanup - Many iotests improvements # gpg: Signature made Thu 19 Dec 2019 17:23:11 GMT # gpg: using RSA key 7F09B272C88F2FD6 # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full] # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * remotes/kevin/tags/for-upstream: (30 commits) iotests: Test external snapshot with VM state hmp: Allow using qdev ID for qemu-io command block: Activate recursively even for already active nodes iotests: 211: Remove duplication with VM.blockdev_create() iotests: 207: Remove duplication with VM.blockdev_create() iotests: 266: Convert to VM.blockdev_create() iotests: 237: Convert to VM.blockdev_create() iotests: 213: Convert to VM.blockdev_create() iotests: 212: Convert to VM.blockdev_create() iotests: 210: Convert to VM.blockdev_create() iotests: 206: Convert to VM.blockdev_create() iotests: 255: Drop blockdev_create() iotests: Create VM.blockdev_create() qcow2: Move error check of local_err near its assignment iotests: Fix IMGOPTSSYNTAX for nbd iotests/273: Filter format-specific information iotests: Add more "_require_drivers" checks to the shell-based tests MAINTAINERS: fix qcow2-bitmap.c under Dirty Bitmaps header qcow2: Use offset_into_cluster() iotests: Support job-complete in run_job() ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2019-12-20 18:25:32 +00:00
Tuguoyi	66be5c3e78	qcow2: Move error check of local_err near its assignment The local_err check outside of the if block was necessary when it was introduced in commit `d1258dd0c8` because it needed to be executed even if qcow2_load_autoloading_dirty_bitmaps() returned false. After some modifications that all required the error check to remain where it is, commit `9c98f145df` finally moved the qcow2_load_dirty_bitmaps() call into the if block, so now the error check should be there, too. Signed-off-by: Guoyi Tu <tu.guoyi@h3c.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-12-19 14:31:52 +01:00
Alberto Garcia	74e60fb56a	qcow2: Use offset_into_cluster() There's a couple of places left in the qcow2 code that still do the calculation manually, so let's replace them. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-12-18 11:21:16 +01:00
Kevin Wolf	3b65081638	qcow2: Declare BDRV_REQ_NO_FALLBACK supported In the common case, qcow2_co_pwrite_zeroes() already only modifies metadata case, so we're fine with or without BDRV_REQ_NO_FALLBACK set. The only exception is when using an external data file, where the request is passed down to the block driver of the external data file. We are forwarding the BDRV_REQ_NO_FALLBACK flag there, though, so this is fine, too. Declare the flag supported therefore. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com>	2019-12-18 11:21:16 +01:00
Vladimir Sementsov-Ogievskiy	d936613547	nbd: assert that Error** is not NULL in nbd_iter_channel_error All callers of nbd_iter_channel_error() pass the address of a local_err variable, and only call this function if an error has already occurred, using this function to propagate that error. This is already implied by its name (local_err instead of the classic errp), but it is worth additionally stressing this by adding an assertion to make it part of the function contract. The local_err parameter is not here to return information about nbd_iter_channel_error failure. Instead it's assumed to be filled when passed to the function. This is already stressed by its name (local_err, instead of classic errp). Stress it additionally by assertion. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20191205174635.18758-22-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>	2019-12-18 08:43:19 +01:00
Vladimir Sementsov-Ogievskiy	e53a578a8b	block/snapshot: rename Error ** parameter to more common errp Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@Redhat.com> Message-Id: <20191205174635.18758-11-vsementsov@virtuozzo.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>	2019-12-18 08:43:19 +01:00
Vladimir Sementsov-Ogievskiy	f56281abd9	block/qcow2-bitmap: fix crash bug in qcow2_co_remove_persistent_dirty_bitmap Here is double bug: First, return error but not set errp. This may lead to: qmp block-dirty-bitmap-remove may report success when actually failed block-dirty-bitmap-remove used in a transaction will crash, as qmp_transaction will think that it returned success and will call block_dirty_bitmap_remove_commit which will crash, as state->bitmap is NULL Second (like in anecdote), this case is not an error at all. As it is documented in the comment above bdrv_co_remove_persistent_dirty_bitmap definition, absence of bitmap is not an error, and similar case handled at start of qcow2_co_remove_persistent_dirty_bitmap, it returns 0 when there is no bitmaps at all. But when there are some bitmaps, but not the requested one, it return error with errp unset. Fix that. Trigger: 1. create persistent bitmap A 2. shutdown vm (bitmap A is synced) 3. start vm 4. create persistent bitmap B 5. remove bitmap B - it fails (and crashes if in transaction) Potential workaround (rather invasive to ask clients to implement it): 1. create persistent bitmap A 2. shutdown vm 3. start vm 4. create persistent bitmap B 5. remember, that we want to remove bitmap B after vm shutdown ... some other operations ... 6. vm shutdown 7. start vm in stopped mode, and remove all bitmaps marked for removing 8. stop vm Fixes: `b56a1e3175` Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20191205193049.30666-1-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> [eblake: commit message tweaks] Signed-off-by: Eric Blake <eblake@redhat.com>	2019-12-09 09:23:04 -06:00
Markus Armbruster	cb09104ea8	block/file-posix: Fix laio_init() error handling crash bug raw_aio_attach_aio_context() passes uninitialized Error *local_err by reference to laio_init() via aio_setup_linux_aio(). When laio_init() fails, it passes it on to error_setg_errno(), tripping error_setv()'s assertion unless @local_err is null by dumb luck. Fix by initializing @local_err properly. Fixes: `ed6e216171` Cc: Nishanth Aravamudan <naravamudan@digitalocean.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20191130194240.10517-4-armbru@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>	2019-12-02 16:14:41 +01:00
Vladimir Sementsov-Ogievskiy	a505475b95	block/qcow2-bitmap: fix bitmap migration Fix bitmap migration with dirty-bitmaps capability enabled and shared storage. We should ignore IN_USE bitmaps in the image on target, when migrating bitmaps through migration channel, see new comment below. Fixes: `74da6b9435` Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20191125125229.13531-2-vsementsov@virtuozzo.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-11-26 14:18:01 +01:00
Eric Blake	93676c88d7	nbd: Don't send oversize strings Qemu as server currently won't accept export names larger than 256 bytes, nor create dirty bitmap names longer than 1023 bytes, so most uses of qemu as client or server have no reason to get anywhere near the NBD spec maximum of a 4k limit per string. However, we weren't actually enforcing things, ignoring when the remote side violates the protocol on input, and also having several code paths where we send oversize strings on output (for example, qemu-nbd --description could easily send more than 4k). Tighten things up as follows: client: - Perform bounds check on export name and dirty bitmap request prior to handing it to server - Validate that copied server replies are not too long (ignoring NBD_INFO_* replies that are not copied is not too bad) server: - Perform bounds check on export name and description prior to advertising it to client - Reject client name or metadata query that is too long - Adjust things to allow full 4k name limit rather than previous 256 byte limit Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20191114024635.11363-4-eblake@redhat.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2019-11-18 16:01:34 -06:00
Eric Blake	cf7c49cf6a	bitmap: Enforce maximum bitmap name length We document that for qcow2 persistent bitmaps, the name cannot exceed 1023 bytes. It is inconsistent if transient bitmaps do not have to abide by the same limit, and it is unlikely that any existing client even cares about using bitmap names this long. It's time to codify that ALL bitmaps managed by qemu (whether persistent in qcow2 or not) have a documented maximum length. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20191114024635.11363-3-eblake@redhat.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2019-11-18 16:01:34 -06:00
Max Reitz	24552feb6a	qcow2: Fix QCOW2_COMPRESSED_SECTOR_MASK Masks for L2 table entries should have 64 bit. Fixes: `b6c246942b` Buglink: https://bugs.launchpad.net/qemu/+bug/1850000 Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20191028161841.1198-2-mreitz@redhat.com Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-11-07 14:37:46 +01:00
Tuguoyi	570542ecb1	qcow2-bitmap: Fix uint64_t left-shift overflow There are two issues in In check_constraints_on_bitmap(), 1) The sanity check on the granularity will cause uint64_t integer left-shift overflow when cluster_size is 2M and the granularity is BIGGER than 32K. 2) The way to calculate image size that the maximum bitmap supported can map to is a bit incorrect. This patch fix it by add a helper function to calculate the number of bytes needed by a normal bitmap in image and compare it to the maximum bitmap bytes supported by qemu. Fixes: `5f72826e7f` Signed-off-by: Guoyi Tu <tu.guoyi@h3c.com> Message-id: 4ba40cd1e7ee4a708b40899952e49f22@h3c.com Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-11-07 14:37:33 +01:00
Max Reitz	292d06b925	block/file-posix: Let post-EOF fallocate serialize The XFS kernel driver has a bug that may cause data corruption for qcow2 images as of qemu commit `c8bb23cbdb`. We can work around it by treating post-EOF fallocates as serializing up until infinity (INT64_MAX in practice). Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20191101152510.11719-4-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-11-04 09:33:51 +01:00
Max Reitz	c28107e9e5	block: Add bdrv_co_get_self_request() Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20191101152510.11719-3-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-11-04 09:32:51 +01:00
Max Reitz	304d9d7f03	block: Make wait/mark serialising requests public Make both bdrv_mark_request_serialising() and bdrv_wait_serialising_requests() public so they can be used from block drivers. Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20191101152510.11719-2-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-11-04 09:29:15 +01:00
Vladimir Sementsov-Ogievskiy	dcfbece684	block/block-copy: fix s->copy_size for compressed cluster `0e2402452f` allowed writes larger than cluster, but that's unsupported for compressed write. Fix it. Fixes: `0e2402452f` Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20191029150934.26416-1-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-11-04 09:21:45 +01:00
Peter Maydell	aaffb85335	Block patches for softfreeze: - iotest patches - Improve performance of the mirror block job in write-blocking mode - Limit memory usage for the backup block job - Add discard and write-zeroes support to the NVMe host block driver - Fix a bug in the mirror job - Prevent the qcow2 driver from creating technically non-compliant qcow2 v3 images (where there is not enough extra data for snapshot table entries) - Allow callers of bdrv_truncate() (etc.) to determine whether the file must be resized to the exact given size or whether it is OK for block devices not to shrink -----BEGIN PGP SIGNATURE----- iQFGBAABCAAwFiEEkb62CjDbPohX0Rgp9AfbAGHVz0AFAl2224ESHG1yZWl0ekBy ZWRoYXQuY29tAAoJEPQH2wBh1c9AeXMH/RXKEX4BZYMRKCe41P18tJC9Bl2x0T20 YeOsZVvpARlr7o/36BF2kGFF4MnL0OQ+9ELuyROX865rk/VL2rWqnHDE5oQM889a dFwMs+0zvNbig3iLNcw0H5OkE2mrdM+a1EUdn/lBe/39Z8dPqPxRGqIYHq38Ugdu emwSy1nWen7o0f71HRJfyVtI3KcrzXx71FrA/FY2yL/eHz+zRYGZj2SpAdFPkXP/ lgaz+m0tWhnSW1QzEOXB0Gh69ULt/DczCinYmv5qUY1noW5TPPtiDNCQTts5O4ba oJsR3AJv5/l9m65JTmiyQSqnQfPcstrQ5FqOcSnP637cfqUFyWsvdks= =L7v1 -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/maxreitz/tags/pull-block-2019-10-28' into staging Block patches for softfreeze: - iotest patches - Improve performance of the mirror block job in write-blocking mode - Limit memory usage for the backup block job - Add discard and write-zeroes support to the NVMe host block driver - Fix a bug in the mirror job - Prevent the qcow2 driver from creating technically non-compliant qcow2 v3 images (where there is not enough extra data for snapshot table entries) - Allow callers of bdrv_truncate() (etc.) to determine whether the file must be resized to the exact given size or whether it is OK for block devices not to shrink # gpg: Signature made Mon 28 Oct 2019 12:13:53 GMT # gpg: using RSA key 91BEB60A30DB3E8857D11829F407DB0061D5CF40 # gpg: issuer "mreitz@redhat.com" # gpg: Good signature from "Max Reitz <mreitz@redhat.com>" [full] # Primary key fingerprint: 91BE B60A 30DB 3E88 57D1 1829 F407 DB00 61D5 CF40 * remotes/maxreitz/tags/pull-block-2019-10-28: (69 commits) qemu-iotests: restrict 264 to qcow2 only Revert "qemu-img: Check post-truncation size" block: Pass truncate exact=true where reasonable block: Let format drivers pass @exact block: Evaluate @exact in protocol drivers block: Add @exact parameter to bdrv_co_truncate() block: Do not truncate file node when formatting block/cor: Drop cor_co_truncate() block: Handle filter truncation like native impl. iotests: Test qcow2's snapshot table handling iotests: Add peek_file* functions qcow2: Fix v3 snapshot table entry compliancy qcow2: Repair snapshot table with too many entries qcow2: Fix overly long snapshot tables qcow2: Keep track of the snapshot table length qcow2: Fix broken snapshot table entries qcow2: Add qcow2_check_fix_snapshot_table() qcow2: Separate qcow2_check_read_snapshot_table() qcow2: Write v3-compliant snapshot list on upgrade qcow2: Put qcow2_upgrade() into its own function ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2019-10-28 14:40:01 +00:00
Max Reitz	e8d04f9237	block: Pass truncate exact=true where reasonable This is a change in behavior, so all instances need a good justification. The comments added here should explain my reasoning. qed already had a comment that suggests it always expected bdrv_truncate()/blk_truncate() to behave as if exact=true were passed (`c743849bee` came eight months before `55b949c847`), so it was simply broken until now. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190918095144.955-8-mreitz@redhat.com Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> [mreitz: Changed comment in qed.c to explain why a new QED file must be empty, as requested and suggested by Maxim] Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 12:08:45 +01:00
Max Reitz	e61a28a9b6	block: Let format drivers pass @exact When truncating a format node, the @exact parameter is generally handled simply by virtue of the format storing the new size in the image metadata. Such formats do not need to pass on the parameter to their file nodes. There are exceptions, though: - raw and crypto cannot store the image size, and thus must pass on @exact. - When using qcow2 with an external data file, it just makes sense to keep its size in sync with the qcow2 virtual disk (because the external data file is the virtual disk). Therefore, we should pass @exact when truncating it. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190918095144.955-7-mreitz@redhat.com Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 12:05:30 +01:00
Max Reitz	82325ae5f2	block: Evaluate @exact in protocol drivers We have two protocol drivers that return success when trying to shrink a block device even though they cannot shrink it. This behavior is now only allowed with exact=false, so they should return an error with exact=true. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190918095144.955-6-mreitz@redhat.com Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 12:05:24 +01:00
Max Reitz	c80d8b06cf	block: Add @exact parameter to bdrv_co_truncate() We have two drivers (iscsi and file-posix) that (in some cases) return success from their .bdrv_co_truncate() implementation if the block device is larger than the requested offset, but cannot be shrunk. Some callers do not want that behavior, so this patch adds a new parameter that they can use to turn off that behavior. This patch just adds the parameter and lets the block/io.c and block/block-backend.c functions pass it around. All other callers always pass false and none of the implementations evaluate it, so that this patch does not change existing behavior. Future patches take care of that. Suggested-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190918095144.955-5-mreitz@redhat.com Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 12:00:07 +01:00
Max Reitz	26536c7fc2	block: Do not truncate file node when formatting There is no reason why the format drivers need to truncate the protocol node when formatting it. When using the old .bdrv_co_create_ops() interface, the file will be created with no size option anyway, which generally gives it a size of 0. (Exceptions are block devices, which cannot be truncated anyway.) When using blockdev-create, the user must have given the file node some size anyway, so there is no reason why we should override that. qed is an exception, it needs the file to start completely empty (as explained by `c743849bee`). Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190918095144.955-4-mreitz@redhat.com Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:59:57 +01:00
Max Reitz	bb8160eb78	block/cor: Drop cor_co_truncate() No other filter driver has a .bdrv_co_truncate() implementation, and there is no need to because the general block layer code can handle it just as well. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190918095144.955-3-mreitz@redhat.com Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:59:51 +01:00
Max Reitz	6b7e8f8b1c	block: Handle filter truncation like native impl. Make the filter truncation (passing it through to bs->file) a first-class citizen and handle it exactly as if it was the filter driver's native implementation of .bdrv_co_truncate(). I do not see a reason not to, it makes the code a bit shorter, and may be even more correct because this gets us to finish the write_req that we prepared before (may be important to e.g. bring dirty bitmaps to the correct size). Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190918095144.955-2-mreitz@redhat.com Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:59:45 +01:00
Max Reitz	e40e6e88f6	qcow2: Fix v3 snapshot table entry compliancy qcow2 v3 images require every snapshot table entry to have at least 16 bytes of extra data. If they do not, let qemu-img check -r all fix it. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20191011152814.14791-15-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:54:09 +01:00
Max Reitz	d2b1d1ec73	qcow2: Repair snapshot table with too many entries The user cannot choose which snapshots are removed. This is fine because we have chosen the maximum snapshot table size to be so large (65536 entries) that it cannot be reasonably reached. If the snapshot table exceeds this size, the image has probably been corrupted in some way; in this case, it is most important to just make the image usable such that the user can copy off at least the active layer. (Also note that the snapshots will be removed only with "-r all", so a plain "check" or "check -r leaks" will not delete any data.) Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20191011152814.14791-14-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:54:08 +01:00
Max Reitz	099febf3ac	qcow2: Fix overly long snapshot tables We currently refuse to open qcow2 images with overly long snapshot tables. This patch makes qemu-img check -r all drop all offending entries past what we deem acceptable. The user cannot choose which snapshots are removed. This is fine because we have chosen the maximum snapshot table size to be so large (64 MB) that it cannot be reasonably reached. If the snapshot table exceeds this size, the image has probably been corrupted in some way; in this case, it is most important to just make the image usable such that the user can copy off at least the active layer. (Also note that the snapshots will be removed only with "-r all", so a plain "check" or "check -r leaks" will not delete any data.) Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20191011152814.14791-13-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:54:04 +01:00
Max Reitz	624143355c	qcow2: Keep track of the snapshot table length When repairing the snapshot table, we truncate entries that have too much extra data. This frees up space that we do not have to count towards the snapshot table size. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20191011152814.14791-12-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:54:02 +01:00
Max Reitz	f91f1f159b	qcow2: Fix broken snapshot table entries The only case where we currently reject snapshot table entries is when they have too much extra data. Fix them with qemu-img check -r all by counting it as a corruption, reducing their extra_data_size, and then letting qcow2_check_fix_snapshot_table() do the rest. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20191011152814.14791-11-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:54:02 +01:00
Max Reitz	fe446b5da2	qcow2: Add qcow2_check_fix_snapshot_table() qcow2_check_read_snapshot_table() can perform consistency checks, but it cannot fix everything. Specifically, it cannot allocate new clusters, because that should wait until the refcount structures are known to be consistent (i.e., after qcow2_check_refcounts()). Thus, it cannot call qcow2_write_snapshots(). Do that in qcow2_check_fix_snapshot_table(), which is called after qcow2_check_refcounts(). Currently, there is nothing that would set result->corruptions, so this is a no-op. A follow-up patch will change that. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20191011152814.14791-10-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:54:01 +01:00
Max Reitz	8bc584fe03	qcow2: Separate qcow2_check_read_snapshot_table() Reading the snapshot table can fail. That is a problem when we want to repair the image. Therefore, stop reading the snapshot table in qcow2_do_open() in check mode. Instead, add a new function qcow2_check_read_snapshot_table() that reads the snapshot table at a later point. In the future, we want to handle errors here and fix them. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20191011152814.14791-9-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:54:00 +01:00
Max Reitz	0a85af351d	qcow2: Write v3-compliant snapshot list on upgrade qcow2 v3 requires every snapshot table entry to have two extra data fields: The 64-bit VM state size, and the virtual disk size. Both are optional for v2 images, so they may not be present. qcow2_upgrade() therefore should update the snapshot table to ensure all entries have these extra data fields. Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1727347 Reported-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20191011152814.14791-8-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:53:52 +01:00
Max Reitz	722efb0c7c	qcow2: Put qcow2_upgrade() into its own function This does not make sense right now, but it will make sense once we need to do more than to just update s->qcow_version. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20191011152814.14791-7-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:53:20 +01:00
Max Reitz	e0314b56b2	qcow2: Make qcow2_write_snapshots() public Updating the snapshot list will be useful when upgrading a v2 image to v3, so we will need to call this function in qcow2.c. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20191011152814.14791-6-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:53:17 +01:00
Max Reitz	fcf9a6b728	qcow2: Keep unknown extra snapshot data The qcow2 specification says to ignore unknown extra data fields in snapshot table entries. Currently, we discard it whenever we update the image, which is a bit different from "ignore". This patch makes the qcow2 driver keep all unknown extra data fields when updating an image's snapshot table. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20191011152814.14791-5-mreitz@redhat.com [mreitz: Adjusted comments as proposed by Eric] Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:52:57 +01:00
Max Reitz	ecf6c7c0c1	qcow2: Add Error ** to qcow2_read_snapshots() Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20191011152814.14791-4-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:51:09 +01:00
Max Reitz	d8fa8442ad	qcow2: Use endof() Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20191011152814.14791-3-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:51:08 +01:00
Max Reitz	f93c3add3a	mirror: Do not dereference invalid pointers mirror_exit_common() may be called twice (if it is called from mirror_prepare() and fails, it will be called from mirror_abort() again). In such a case, many of the pointers in the MirrorBlockJob object will already be freed. This can be seen most reliably for s->target, which is set to NULL (and then dereferenced by blk_bs()). Cc: qemu-stable@nongnu.org Fixes: `737efc1eda` Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20191014153931.20699-2-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:49:37 +01:00
Maxim Levitsky	e87a09d625	block/nvme: add support for discard Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-id: 20190913133627.28450-3-mlevitsk@redhat.com Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:34:35 +01:00
Maxim Levitsky	e0dd95e373	block/nvme: add support for write zeros Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-id: 20190913133627.28450-2-mlevitsk@redhat.com Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:34:30 +01:00
Vladimir Sementsov-Ogievskiy	0e2402452f	block/block-copy: increase buffered copy request No reason to limit buffered copy to one cluster. Let's allow up to 1 MiB. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20191022111805.3432-7-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:22:31 +01:00
Vladimir Sementsov-Ogievskiy	7f739d0e53	block/block-copy: add memory limit Currently total allocation for parallel requests to block-copy instance is unlimited. Let's limit it to 128 MiB. For now block-copy is used only in backup, so actually we limit total allocation for backup job. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20191022111805.3432-6-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:22:31 +01:00
Vladimir Sementsov-Ogievskiy	e332a726da	block/block-copy: refactor copying Merge copying code into one function block_copy_do_copy, which only calls bdrv_ io functions and don't do any synchronization (like dirty bitmap set/reset). Refactor block_copy() function so that it takes full decision about size of chunk to be copied and does all the synchronization (checking intersecting requests, set/reset dirty bitmaps). It will help: - introduce parallel processing of block_copy iterations: we need to calculate chunk size, start async chunk copying and go to the next iteration - simplify synchronization improvement (like memory limiting in further commit and reducing critical section (now we lock the whole requested range, when actually we need to lock only dirty region which we handle at the moment)) Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20191022111805.3432-4-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:22:31 +01:00
Vladimir Sementsov-Ogievskiy	b3b7036afb	block/block-copy: limit copy_range_size to 16 MiB Large copy range may imply memory allocation and large io effort, so using 2G copy range request may be bad idea. Let's limit it to 16 MiB. It also helps the following patch to refactor copy-with-offload fallback to copy-with-bounce-buffer. Note, that total memory usage of backup is still not limited, it will be fixed in further commit. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20191022111805.3432-3-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:22:31 +01:00
Vladimir Sementsov-Ogievskiy	3816edd2cb	block/block-copy: allocate buffer in block_copy_with_bounce_buffer Move bounce_buffer allocation block_copy_with_bounce_buffer. This commit simplifies further work on implementing copying by larger chunks (of different size) and further asynchronous handling of block_copy iterations (with help of block/aio_task API). Allocation works fast, a lot faster than disk io, so it's not a problem that we now allocate/free bounce_buffer more times. And we anyway will have to allocate several bounce_buffers for parallel execution of loop iterations in future. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20191022111805.3432-2-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:22:31 +01:00
Vladimir Sementsov-Ogievskiy	994b44ab20	Revert "mirror: Only mirror granularity-aligned chunks" This reverts commit `9adc1cb49a`. "mirror: Only mirror granularity-aligned chunks" Since previous commit unaligned chunks are supported by do_sync_target_write. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20191011090711.19940-6-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:22:30 +01:00
Vladimir Sementsov-Ogievskiy	dbdf699cad	block/mirror: support unaligned write in active mirror Prior `9adc1cb49a` do_sync_target_write had a bug: it reset aligned-up region in the dirty bitmap, which means that we may not copy some bytes and assume them copied, which actually leads to producing corrupted target. So `9adc1cb49a` forced dirty bitmap granularity to be request_alignment for mirror-top filter, so we are not working with unaligned requests. However forcing large alignment obviously decreases performance of unaligned requests. This commit provides another solution for the problem: if unaligned padding is already dirty, we can safely ignore it, as 1. It's dirty, it will be copied by mirror_iteration anyway 2. It's dirty, so skipping it now we don't increase dirtiness of the bitmap and therefore don't damage "synchronicity" of the write-blocking mirror. If unaligned padding is not dirty, we just write it, no reason to touch dirty bitmap if we succeed (on failure we'll set the whole region ofcourse, but we loss "synchronicity" on failure anyway). Note: we need to disable dirty_bitmap, otherwise we will not be able to see in do_sync_target_write bitmap state before current operation. We may of course check dirty bitmap before the operation in bdrv_mirror_top_do_write and remember it, but we don't need active dirty bitmap for write-blocking mirror anyway. New code-path is unused until the following commit reverts `9adc1cb49a`. Suggested-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20191011090711.19940-5-vsementsov@virtuozzo.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:22:30 +01:00
Vladimir Sementsov-Ogievskiy	b30168647f	block/block-backend: add blk_co_pwritev_part Add blk write function with qiov_offset parameter. It's needed for the following commit. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20191011090711.19940-4-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:22:30 +01:00
Vladimir Sementsov-Ogievskiy	5c511ac375	block/mirror: simplify do_sync_target_write do_sync_target_write is called from bdrv_mirror_top_do_write after write/discard operation, all inside active_write/active_write_settle protecting us from mirror iteration. So the whole area is dirty for sure, no reason to examine dirty bitmap. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20191011090711.19940-3-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-28 11:22:30 +01:00
Wei Yang	038adc2f58	core: replace getpagesize() with qemu_real_host_page_size There are three page size in qemu: real host page size host page size target page size All of them have dedicate variable to represent. For the last two, we use the same form in the whole qemu project, while for the first one we use two forms: qemu_real_host_page_size and getpagesize(). qemu_real_host_page_size is defined to be a replacement of getpagesize(), so let it serve the role. [Note] Not fully tested for some arch or device. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Message-Id: <20191013021145.16011-3-richardw.yang@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2019-10-26 15:38:06 +02:00
Kevin Wolf	5e97855052	qcow2: Fix corruption bug in qcow2_detect_metadata_preallocation() qcow2_detect_metadata_preallocation() calls qcow2_get_refcount() which requires s->lock to be taken to protect its accesses to the refcount table and refcount blocks. However, nothing in this code path actually took the lock. This could cause the same cache entry to be used by two requests at the same time, for different tables at different offsets, resulting in image corruption. As it would be preferable to base the detection on consistent data (even though it's just heuristics), let's take the lock not only around the qcow2_get_refcount() calls, but around the whole function. This patch takes the lock in qcow2_co_block_status() earlier and asserts in qcow2_detect_metadata_preallocation() that we hold the lock. Fixes: `69f47505ee` Cc: qemu-stable@nongnu.org Reported-by: Michael Weiser <michael.weiser@gmx.de> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Tested-by: Michael Weiser <michael.weiser@gmx.de> Reviewed-by: Michael Weiser <michael.weiser@gmx.de> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2019-10-25 15:18:55 +02:00
Vladimir Sementsov-Ogievskiy	8ccf458af5	block/backup: drop dead code from backup_job_create After commit `00e30f05de`, there is no more "goto error" points after job creation, so after "error:" @job is always NULL and we don't need roll-back job creation. Reported-by: Coverity (CID 1406402) Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-10-25 15:15:01 +02:00
Vladimir Sementsov-Ogievskiy	f7651539d8	block/nbd: nbd reconnect Implement reconnect. To achieve this: 1. add new modes: connecting-wait: means, that reconnecting is in progress, and there were small number of reconnect attempts, so all requests are waiting for the connection. connecting-nowait: reconnecting is in progress, there were a lot of attempts of reconnect, all requests will return errors. two old modes are used too: connected: normal state quit: exiting after fatal error or on close Possible transitions are: * -> quit connecting-* -> connected connecting-wait -> connecting-nowait (transition is done after reconnect-delay seconds in connecting-wait mode) connected -> connecting-wait 2. Implement reconnect in connection_co. So, in connecting-* mode, connection_co, tries to reconnect unlimited times. 3. Retry nbd queries on channel error, if we are in connecting-wait state. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20191009084158.15614-3-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>	2019-10-22 09:22:07 -05:00
Vladimir Sementsov-Ogievskiy	4dd09f6223	qcow2-bitmap: move bitmap reopen-rw code to qcow2_reopen_commit The only reason I can imagine for this strange code at the very-end of bdrv_reopen_commit is the fact that bs->read_only updated after calling drv->bdrv_reopen_commit in bdrv_reopen_commit. And in the same time, prior to previous commit, qcow2_reopen_bitmaps_rw did a wrong check for being writable, when actually it only need writable file child not self. So, as it's fixed, let's move things to correct place. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Acked-by: Max Reitz <mreitz@redhat.com> Message-id: 20190927122355.7344-10-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-10-17 17:53:28 -04:00
Vladimir Sementsov-Ogievskiy	f6333cbf8b	block/qcow2-bitmap: fix and improve qcow2_reopen_bitmaps_rw - Correct check for write access to file child, and in correct place (only if we want to write). - Support reopen rw -> rw (which will be used in following commit), for example, !bdrv_dirty_bitmap_readonly() is not a corruption if bitmap is marked IN_USE in the image. - Consider unexpected bitmap as a corruption and check other combinations of in-image and in-RAM bitmaps. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20190927122355.7344-9-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-10-17 17:53:28 -04:00
Vladimir Sementsov-Ogievskiy	644ddbb754	block/qcow2-bitmap: do not remove bitmaps on reopen-ro qcow2_reopen_bitmaps_ro wants to store bitmaps and then mark them all readonly. But the latter don't work, as qcow2_store_persistent_dirty_bitmaps removes bitmaps after storing. It's OK for inactivation but bad idea for reopen-ro. And this leads to the following bug: Assume we have persistent bitmap 'bitmap0'. Create external snapshot bitmap0 is stored and therefore removed Commit snapshot now we have no bitmaps Do some writes from guest () they are not marked in bitmap Shutdown Start bitmap0 is loaded as valid, but it is actually broken! It misses writes () Incremental backup it will be inconsistent So, let's stop removing bitmaps on reopen-ro. But don't rejoice: reopening bitmaps to rw is broken too, so the whole scenario will not work after this patch and we can't enable corresponding test cases in 260 iotests still. Reopening bitmaps rw will be fixed in the following patches. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20190927122355.7344-7-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-10-17 17:02:32 -04:00
Vladimir Sementsov-Ogievskiy	bd429a884c	block/qcow2-bitmap: drop qcow2_reopen_bitmaps_rw_hint() The function is unused, drop it. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20190927122355.7344-6-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-10-17 17:02:32 -04:00
Vladimir Sementsov-Ogievskiy	f88676c149	block/qcow2-bitmap: get rid of bdrv_has_changed_persistent_bitmaps Firstly, no reason to optimize failure path. Then, function name is ambiguous: it checks for readonly and similar things, but someone may think that it will ignore normal bitmaps which was just unchanged, and this is in bad relation with the fact that we should drop IN_USE flag for unchanged bitmaps in the image. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20190927122355.7344-5-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-10-17 17:02:32 -04:00
Vladimir Sementsov-Ogievskiy	ef9041a7b8	block/dirty-bitmap: refactor bdrv_dirty_bitmap_next bdrv_dirty_bitmap_next is always used in same pattern. So, split it into _next and _first, instead of combining two functions into one and add FOR_EACH_DIRTY_BITMAP macro. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20190916141911.5255-5-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-10-17 17:02:32 -04:00
Vladimir Sementsov-Ogievskiy	1e63830160	block/dirty-bitmap: drop BdrvDirtyBitmap.mutex mutex field is just a pointer to bs->dirty_bitmap_mutex, so no needs to store it in BdrvDirtyBitmap when we have bs pointer in it (since previous patch). Drop mutex field. Constantly use bdrv_dirty_bitmaps_lock/unlock in block/dirty-bitmap.c to make it more obvious that it's not per-bitmap lock. Still, for simplicity, leave bdrv_dirty_bitmap_lock/unlock functions as an external API. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20190916141911.5255-4-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-10-17 17:02:32 -04:00
Vladimir Sementsov-Ogievskiy	5deb6cbd1f	block/dirty-bitmap: add bs link Add bs field to BdrvDirtyBitmap structure. Drop BlockDriverState parameter from bitmap APIs where possible. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20190916141911.5255-3-vsementsov@virtuozzo.com [Rebased on top of block-copy. --js] Signed-off-by: John Snow <jsnow@redhat.com>	2019-10-17 17:02:32 -04:00
Vladimir Sementsov-Ogievskiy	767db3aad8	block/dirty-bitmap: drop meta Drop meta bitmaps, as they are unused. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20190916141911.5255-2-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-10-17 17:02:32 -04:00
Vladimir Sementsov-Ogievskiy	d2c3080e41	block/qcow2: proper locking on bitmap add/remove paths qmp_block_dirty_bitmap_add and do_block_dirty_bitmap_remove do acquire aio context since `0a6c86d024`. But this is not enough: we also must lock qcow2 mutex when access in-image metadata. Especially it concerns freeing qcow2 clusters. To achieve this, move qcow2_can_store_new_dirty_bitmap and qcow2_remove_persistent_dirty_bitmap to coroutine context. Since we work in coroutines in correct aio context, we don't need context acquiring in blockdev.c anymore, drop it. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20190920082543.23444-4-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-10-17 17:02:32 -04:00
Vladimir Sementsov-Ogievskiy	b56a1e3175	block/dirty-bitmap: return int from bdrv_remove_persistent_dirty_bitmap It's more comfortable to not deal with local_err. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20190920082543.23444-3-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-10-17 17:02:32 -04:00
Vladimir Sementsov-Ogievskiy	85cc8a4f6b	block: move bdrv_can_store_new_dirty_bitmap to block/dirty-bitmap.c block/dirty-bitmap.c seems to be more appropriate for it and bdrv_remove_persistent_dirty_bitmap already in it. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20190920082543.23444-2-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-10-17 17:02:32 -04:00
Max Reitz	d1b9d19f99	qcow2: Limit total allocation range to INT_MAX When the COW areas are included, the size of an allocation can exceed INT_MAX. This is kind of limited by handle_alloc() in that it already caps avail_bytes at INT_MAX, but the number of clusters still reflects the original length. This can have all sorts of effects, ranging from the storage layer write call failing to image corruption. (If there were no image corruption, then I suppose there would be data loss because the .cow_end area is forced to be empty, even though there might be something we need to COW.) Fix all of it by limiting nb_clusters so the equivalent number of bytes will not exceed INT_MAX. Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-10-14 17:12:48 +02:00
Alberto Garcia	f2208fdc5b	block: Reject misaligned write requests with BDRV_REQ_NO_FALLBACK The BDRV_REQ_NO_FALLBACK flag means that an operation should only be performed if it can be offloaded or otherwise performed efficiently. However a misaligned write request requires a RMW so we should return an error and let the caller decide how to proceed. This hits an assertion since commit `c8bb23cbdb` if the required alignment is larger than the cluster size: qemu-img create -f qcow2 -o cluster_size=2k img.qcow2 4G qemu-io -c "open -o driver=qcow2,file.align=4k blkdebug::img.qcow2" \ -c 'write 0 512' qemu-io: block/io.c:1127: bdrv_driver_pwritev: Assertion `!(flags & BDRV_REQ_NO_FALLBACK)' failed. Aborted The reason is that when writing to an unallocated cluster we try to skip the copy-on-write part and zeroize it using BDRV_REQ_NO_FALLBACK instead, resulting in a write request that is too small (2KB cluster size vs 4KB required alignment). Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-10-14 17:12:48 +02:00
Pavel Dovgalyuk	e4ec5ad464	replay: add BH oneshot event for block layer Replay is capable of recording normal BH events, but sometimes there are single use callbacks scheduled with aio_bh_schedule_oneshot function. This patch enables recording and replaying such callbacks. Block layer uses these events for calling the completion function. Replaying these calls makes the execution deterministic. Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru> Acked-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-10-14 17:12:48 +02:00
Pavel Dovgalyuk	c8aa7895eb	replay: don't drain/flush bdrv queue while RR is working In record/replay mode bdrv queue is controlled by replay mechanism. It does not allow saving or loading the snapshots when bdrv queue is not empty. Stopping the VM is not blocked by nonempty queue, but flushing the queue is still impossible there, because it may cause deadlocks in replay mode. This patch disables bdrv_drain_all and bdrv_flush_all in record/replay mode. Stopping the machine when the IO requests are not finished is needed for the debugging. E.g., breakpoint may be set at the specified step, and forcing the IO requests to finish may break the determinism of the execution. Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru> Acked-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-10-14 17:12:48 +02:00
Pavel Dovgalyuk	3c6c4348f2	block: implement bdrv_snapshot_goto for blkreplay This patch enables making snapshots with blkreplay used in block devices. This function is required to make bdrv_snapshot_goto without calling .bdrv_open which is not implemented. Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru> Acked-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-10-14 17:12:48 +02:00
Peter Lieven	6caaad46de	block/vhdx: add check for truncated image files qemu is currently not able to detect truncated vhdx image files. Add a basic check if all allocated blocks are reachable at open and report all errors during bdrv_co_check. Signed-off-by: Peter Lieven <pl@kamp.de> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-10-14 17:12:48 +02:00
Maxim Levitsky	e99754b42e	nbd: add empty .bdrv_reopen_prepare Fixes commit job / qemu-img commit, when commiting qcow2 file which is based on nbd export. Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1718727 Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-id: 20190930213820.29777-2-mlevitsk@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-10 10:56:18 +02:00
Vladimir Sementsov-Ogievskiy	00e30f05de	block/backup: use backup-top instead of write notifiers Drop write notifiers and use filter node instead. = Changes = 1. Add filter-node-name argument for backup qmp api. We have to do it in this commit, as 257 needs to be fixed. 2. There are no more write notifiers here, so is_write_notifier parameter is dropped from block-copy paths. 3. To sync with in-flight requests at job finish we now have drained removing of the filter, we don't need rw-lock. 4. Block-copy is now using BdrvChildren instead of BlockBackends 5. As backup-top owns these children, we also move block-copy state into backup-top's ownership. = Iotest changes = 56: op-blocker doesn't shoot now, as we set it on source, but then check on filter, when trying to start second backup. To keep the test we instead can catch another collision: both jobs will get 'drive0' job-id, as job-id parameter is unspecified. To prevent interleaving with file-posix locks (as they are dependent on config) let's use another target for second backup. Also, it's obvious now that we'd like to drop this op-blocker at all and add a test-case for two backups from one node (to different destinations) actually works. But not in these series. 141: Output changed: prepatch, "Node is in use" comes from bdrv_has_blk check inside qmp_blockdev_del. But we've dropped block-copy blk objects, so no more blk objects on source bs (job blk is on backup-top filter bs). New message is from op-blocker, which is the next check in qmp_blockdev_add. 257: The test wants to emulate guest write during backup. They should go to filter node, not to original source node, of course. Therefore we need to specify filter node name and use it. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20191001131409.14202-6-vsementsov@virtuozzo.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-10 10:56:18 +02:00
Vladimir Sementsov-Ogievskiy	7df7868b96	block: introduce backup-top filter driver Backup-top filter caches write operations and does copy-before-write operations. The driver will be used in backup instead of write-notifiers. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20191001131409.14202-5-vsementsov@virtuozzo.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-10 10:56:18 +02:00
Vladimir Sementsov-Ogievskiy	0f4b02b73e	block/block-copy: split block_copy_set_callbacks function Split block_copy_set_callbacks out of block_copy_state_new. It's needed for further commit: block-copy will use BdrvChildren of backup-top filter, so it will be created from backup-top filter creation function. But callbacks will still belong to backup job and will be set in separate. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20191001131409.14202-4-vsementsov@virtuozzo.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-10 10:56:18 +02:00
Vladimir Sementsov-Ogievskiy	843670f30f	block/backup: move write_flags calculation inside backup_job_create This is logic-less refactoring, which simplifies further patch, as we'll need write_flags for backup-top filter creation and backup-top should be created before block job creation. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20191001131409.14202-3-vsementsov@virtuozzo.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-10 10:56:18 +02:00
Vladimir Sementsov-Ogievskiy	a6ffe1998c	block/backup: move in-flight requests handling from backup to block-copy Move synchronization mechanism to block-copy, to be able to use one block-copy instance from backup job and backup-top filter in parallel. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20191001131409.14202-2-vsementsov@virtuozzo.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-10 10:56:18 +02:00
Anton Nefedov	d924559953	qapi: query-blockstat: add driver specific file-posix stats A block driver can provide a callback to report driver-specific statistics. file-posix driver now reports discard statistics Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Acked-by: Markus Armbruster <armbru@redhat.com> Message-id: 20190923121737.83281-10-anton.nefedov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-10 10:56:18 +02:00
Anton Nefedov	1c45036636	file-posix: account discard operations This will help to identify how many of the user-issued discard operations (accounted on a device level) have actually suceeded down on the host file (even though the numbers will not be exactly the same if non-raw format driver is used (e.g. qcow2 sending metadata discards)). Note that these numbers will not include discards triggered by write-zeroes + MAY_UNMAP calls. Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20190923121737.83281-9-anton.nefedov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-10 10:56:18 +02:00
Anton Nefedov	f344446654	block: add empty account cookie type Each block_acct_done/failed call is designed to correspond to a previous block_acct_start call, which initializes the stats cookie. However sometimes it is not the case, e.g. some error paths might report the same cookie twice because it is hard to accurately track if the cookie was reported yet or not. This patch cleans the cookie after report. (Note: block_acct_failed/done without a previous block_acct_start at all should be avoided. Uninitialized cookie might hold a garbage value and there is still "< BLOCK_MAX_IOTYPE" assertion for that) It will be particularly useful in ide code where it's hard to keep track whether the request done its accounting or not: in the following patch of the series, trim requests will do the accounting separately. Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20190923121737.83281-4-anton.nefedov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-10 10:56:18 +02:00
Anton Nefedov	159f85ddc8	qapi: add unmap to BlockDeviceStats Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20190923121737.83281-3-anton.nefedov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-10 10:56:18 +02:00
Vladimir Sementsov-Ogievskiy	beb5f5450d	block: move block_copy from block/backup.c to separate file Split block_copy to separate file, to be cleanly shared with backup-top filter driver in further commits. It's a clean movement, the only change is drop "static" from interface functions. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190920142056.12778-8-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-10 10:56:17 +02:00
Vladimir Sementsov-Ogievskiy	0e23e382b7	block/backup: fix block-comment style We need to fix comment style around block-copy functions before further moving them to separate file to satisfy checkpatch. But do more: fix all comments style. Also, seems like doubled first asterisk is not forbidden, but drop it too for consistency. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190920142056.12778-7-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-10 10:56:17 +02:00
Vladimir Sementsov-Ogievskiy	2c8074c453	block/backup: introduce BlockCopyState Split copying code part from backup to "block-copy", including separate state structure and function renaming. This is needed to share it with backup-top filter driver in further commits. Notes: 1. As BlockCopyState keeps own BlockBackend objects, remaining job->common.blk users only use it to get bs by blk_bs() call, so clear job->commen.blk permissions set in block_job_create and add job->source_bs to be used instead of blk_bs(job->common.blk), to keep it more clear which bs we use when introduce backup-top filter in further commit. 2. Rename s/initializing_bitmap/skip_unallocated/ to sound a bit better as interface to BlockCopyState 3. Split is not very clean: there left some duplicated fields, backup code uses some BlockCopyState fields directly, let's postpone it for further improvements and keep this comment simpler for review. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20190920142056.12778-6-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-10 10:56:17 +02:00
Vladimir Sementsov-Ogievskiy	372c67ea61	block/backup: improve comment about image fleecing Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190920142056.12778-5-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-10-10 10:56:17 +02:00

... 16 17 18 19 20 ...

6207 Commits